From 387b14684f94483cbbb72843db406ec9a8d0d6d2 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Wed, 10 Apr 2019 08:32:41 -0300
Subject: docs: locking: convert docs to ReST and rename to *.rst

Convert the locking documents to ReST and add them to the
kernel development book where they belong.

Most of the changes here just make Sphinx parse the text files
properly; the files are already in good shape and do not
require massive changes in order to be parsed.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix table markup;
  - add some list markup;
  - mark literal blocks;
  - adjust title markup.

In the new index.rst, add an :orphan: tag while it is not yet linked
from the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
---
 Documentation/kernel-hacking/locking.rst           |   2 +-
 Documentation/locking/index.rst                    |  24 +
 Documentation/locking/lockdep-design.rst           | 394 ++++++++++++++
 Documentation/locking/lockdep-design.txt           | 389 --------------
 Documentation/locking/lockstat.rst                 | 204 ++++++++
 Documentation/locking/lockstat.txt                 | 183 -------
 Documentation/locking/locktorture.rst              | 170 ++++++
 Documentation/locking/locktorture.txt              | 145 ------
 Documentation/locking/mutex-design.rst             | 152 ++++++
 Documentation/locking/mutex-design.txt             | 142 -----
 Documentation/locking/rt-mutex-design.rst          | 574 +++++++++++++++++++++
 Documentation/locking/rt-mutex-design.txt          | 559 --------------------
 Documentation/locking/rt-mutex.rst                 |  77 +++
 Documentation/locking/rt-mutex.txt                 |  73 ---
 Documentation/locking/spinlocks.rst                | 177 +++++++
 Documentation/locking/spinlocks.txt                | 167 ------
 Documentation/locking/ww-mutex-design.rst          | 393 ++++++++++++++
 Documentation/locking/ww-mutex-design.txt          | 383 --------------
 Documentation/pi-futex.txt                         |   2 +-
 .../translations/it_IT/kernel-hacking/locking.rst  |   2 +-
 drivers/gpu/drm/drm_modeset_lock.c                 |   2 +-
 include/linux/lockdep.h                            |   2 +-
 include/linux/mutex.h                              |   2 +-
 include/linux/rwsem.h                              |   2 +-
 kernel/locking/mutex.c                             |   2 +-
 kernel/locking/rtmutex.c                           |   2 +-
 lib/Kconfig.debug                                  |   4 +-
 27 files changed, 2176 insertions(+), 2052 deletions(-)
 create mode 100644 Documentation/locking/index.rst
 create mode 100644 Documentation/locking/lockdep-design.rst
 delete mode 100644 Documentation/locking/lockdep-design.txt
 create mode 100644 Documentation/locking/lockstat.rst
 delete mode 100644 Documentation/locking/lockstat.txt
 create mode 100644 Documentation/locking/locktorture.rst
 delete mode 100644 Documentation/locking/locktorture.txt
 create mode 100644 Documentation/locking/mutex-design.rst
 delete mode 100644 Documentation/locking/mutex-design.txt
 create mode 100644 Documentation/locking/rt-mutex-design.rst
 delete mode 100644 Documentation/locking/rt-mutex-design.txt
 create mode 100644 Documentation/locking/rt-mutex.rst
 delete mode 100644 Documentation/locking/rt-mutex.txt
 create mode 100644 Documentation/locking/spinlocks.rst
 delete mode 100644 Documentation/locking/spinlocks.txt
 create mode 100644 Documentation/locking/ww-mutex-design.rst
 delete mode 100644 Documentation/locking/ww-mutex-design.txt

diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst
index dc698ea456e0..a8518ac0d31d 100644
--- a/Documentation/kernel-hacking/locking.rst
+++ b/Documentation/kernel-hacking/locking.rst
@@ -1364,7 +1364,7 @@ Futex API reference
 Further reading
 ===============
 
--  ``Documentation/locking/spinlocks.txt``: Linus Torvalds' spinlocking
+-  ``Documentation/locking/spinlocks.rst``: Linus Torvalds' spinlocking
    tutorial in the kernel sources.
 
 -  Unix Systems for Modern Architectures: Symmetric Multiprocessing and
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
new file mode 100644
index 000000000000..ef5da7fe9aac
--- /dev/null
+++ b/Documentation/locking/index.rst
@@ -0,0 +1,24 @@
+:orphan:
+
+=======
+locking
+=======
+
+.. toctree::
+    :maxdepth: 1
+
+    lockdep-design
+    lockstat
+    locktorture
+    mutex-design
+    rt-mutex-design
+    rt-mutex
+    spinlocks
+    ww-mutex-design
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/locking/lockdep-design.rst b/Documentation/locking/lockdep-design.rst
new file mode 100644
index 000000000000..23fcbc4d3fc0
--- /dev/null
+++ b/Documentation/locking/lockdep-design.rst
@@ -0,0 +1,394 @@
+Runtime locking correctness validator
+=====================================
+
+started by Ingo Molnar <mingo@redhat.com>
+
+additions by Arjan van de Ven <arjan@linux.intel.com>
+
+Lock-class
+----------
+
+The basic object the validator operates upon is a 'class' of locks.
+
+A class of locks is a group of locks that are logically the same with
+respect to locking rules, even if the locks may have multiple (possibly
+tens of thousands of) instantiations. For example a lock in the inode
+struct is one class, while each inode has its own instantiation of that
+lock class.
+
+The validator tracks the 'usage state' of lock-classes, and it tracks
+the dependencies between different lock-classes. Lock usage indicates
+how a lock is used with regard to its IRQ contexts, while lock
+dependency can be understood as lock order, where L1 -> L2 suggests that
+a task is attempting to acquire L2 while holding L1. From lockdep's
+perspective, the two locks (L1 and L2) are not necessarily related; that
+dependency just means that this order has occurred at some point. The
+validator continuously tries to prove that lock usages and dependencies
+are correct, and prints a splat if they are not.
+
+A lock-class's behavior is constructed by its instances collectively:
+when the first instance of a lock-class is used after bootup the class
+gets registered, then all (subsequent) instances will be mapped to the
+class and hence their usages and dependencies will contribute to those of
+the class. A lock-class does not go away when a lock instance does, but
+it can be removed if the memory space of the lock class (static or
+dynamic) is reclaimed; this happens, for example, when a module is
+unloaded or a workqueue is destroyed.
+
+State
+-----
+
+The validator tracks lock-class usage history and divides the usage into
+(4 usages * n STATEs + 1) categories:
+
+where the 4 usages can be:
+- 'ever held in STATE context'
+- 'ever held as readlock in STATE context'
+- 'ever held with STATE enabled'
+- 'ever held as readlock with STATE enabled'
+
+where the n STATEs are coded in kernel/locking/lockdep_states.h and as of
+now they include:
+- hardirq
+- softirq
+
+where the last 1 category is:
+- 'ever used'                                       [ == !unused        ]
+
+When locking rules are violated, these usage bits are presented in the
+locking error messages, inside curlies, with a total of 2 * n STATEs bits.
+A contrived example::
+
+   modprobe/2287 is trying to acquire lock:
+    (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
+
+   but task is already holding lock:
+    (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
+
+
+For a given lock, the bit positions from left to right indicate the usage
+of the lock and readlock (if it exists), for each of the n STATEs listed
+above respectively, and the character displayed at each bit position
+indicates:
+
+   ===  ===================================================
+   '.'  acquired while irqs disabled and not in irq context
+   '-'  acquired in irq context
+   '+'  acquired with irqs enabled
+   '?'  acquired in irq context with irqs enabled.
+   ===  ===================================================
+
+The bits are illustrated with an example::
+
+    (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
+                         ||||
+                         ||| \-> softirq disabled and not in softirq context
+                         || \--> acquired in softirq context
+                         | \---> hardirq disabled and not in hardirq context
+                          \----> acquired in hardirq context
+
+
+For a given STATE, whether the lock is ever acquired in that STATE
+context and whether that STATE is enabled yields four possible cases as
+shown in the table below. The bit character indicates exactly which
+case applies to the lock as of the reporting time.
+
+  +--------------+-------------+--------------+
+  |              | irq enabled | irq disabled |
+  +--------------+-------------+--------------+
+  | ever in irq  |      ?      |       -      |
+  +--------------+-------------+--------------+
+  | never in irq |      +      |       .      |
+  +--------------+-------------+--------------+
+
+The character '-' suggests irq is disabled, because otherwise the
+character '?' would have been shown instead. A similar deduction can be
+applied for '+' too.
+
+Unused locks (e.g., mutexes) cannot be part of the cause of an error.
+
+
+Single-lock state rules:
+------------------------
+
+A lock is irq-safe if it was ever used in an irq context, while a lock
+is irq-unsafe if it was ever acquired with irqs enabled.
+
+A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
+following states must be exclusive: only one of them is allowed to be set
+for any lock-class based on its usage::
+
+ <hardirq-safe> or <hardirq-unsafe>
+ <softirq-safe> or <softirq-unsafe>
+
+This is because if a lock can be used in irq context (irq-safe) then it
+cannot be ever acquired with irq enabled (irq-unsafe). Otherwise, a
+deadlock may happen. For example, if the context is interrupted after
+this lock was acquired but before it was released, the interrupting
+context may attempt to acquire the same lock again, which creates a
+deadlock, referred to as lock recursion deadlock.
+
+The validator detects and reports lock usage that violates these
+single-lock state rules.
+
+Multi-lock dependency rules:
+----------------------------
+
+The same lock-class must not be acquired twice, because this could lead
+to lock recursion deadlocks.
+
+Furthermore, two locks can not be taken in inverse order::
+
+ <L1> -> <L2>
+ <L2> -> <L1>
+
+because this could lead to a deadlock - referred to as lock inversion
+deadlock - as attempts to acquire the two locks form a circle which
+could lead to the two contexts waiting for each other permanently. The
+validator will find such a dependency circle of arbitrary complexity,
+i.e., there can be any other locking sequence between the acquire-lock
+operations; the validator will still find whether these locks can be
+acquired in a circular fashion.
+
+Furthermore, the following usage based lock dependencies are not allowed
+between any two lock-classes::
+
+   <hardirq-safe>   ->  <hardirq-unsafe>
+   <softirq-safe>   ->  <softirq-unsafe>
+
+The first rule comes from the fact that a hardirq-safe lock could be
+taken by a hardirq context, interrupting a hardirq-unsafe lock - and
+thus could result in a lock inversion deadlock. Likewise, a softirq-safe
+lock could be taken by a softirq context, interrupting a softirq-unsafe
+lock.
+
+The above rules are enforced for any locking sequence that occurs in the
+kernel: when acquiring a new lock, the validator checks whether there is
+any rule violation between the new lock and any of the held locks.
+
+When a lock-class changes its state, the following aspects of the above
+dependency rules are enforced:
+
+- if a new hardirq-safe lock is discovered, we check whether it
+  took any hardirq-unsafe lock in the past.
+
+- if a new softirq-safe lock is discovered, we check whether it took
+  any softirq-unsafe lock in the past.
+
+- if a new hardirq-unsafe lock is discovered, we check whether any
+  hardirq-safe lock took it in the past.
+
+- if a new softirq-unsafe lock is discovered, we check whether any
+  softirq-safe lock took it in the past.
+
+(Again, we do these checks too on the basis that an interrupt context
+could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
+could lead to a lock inversion deadlock - even if that lock scenario did
+not trigger in practice yet.)
+
+Exception: Nested data dependencies leading to nested locking
+-------------------------------------------------------------
+
+There are a few cases where the Linux kernel acquires more than one
+instance of the same lock-class. Such cases typically happen when there
+is some sort of hierarchy within objects of the same type. In these
+cases there is an inherent "natural" ordering between the two objects
+(defined by the properties of the hierarchy), and the kernel grabs the
+locks in this fixed order on each of the objects.
+
+An example of such an object hierarchy that results in "nested locking"
+is that of a "whole disk" block-dev object and a "partition" block-dev
+object; the partition is "part of" the whole device and as long as one
+always takes the whole disk lock as a higher lock than the partition
+lock, the lock ordering is fully correct. The validator does not
+automatically detect this natural ordering, as the locking rule behind
+the ordering is not static.
+
+In order to teach the validator about this correct usage model, new
+versions of the various locking primitives were added that allow you to
+specify a "nesting level". An example call, for the block device mutex,
+looks like this::
+
+  enum bdev_bd_mutex_lock_class
+  {
+       BD_MUTEX_NORMAL,
+       BD_MUTEX_WHOLE,
+       BD_MUTEX_PARTITION
+  };
+
+  mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
+
+In this case the locking is done on a bdev object that is known to be a
+partition.
+
+The validator treats a lock that is taken in such a nested fashion as a
+separate (sub)class for the purposes of validation.
+
+Note: When changing code to use the _nested() primitives, be careful and
+check really thoroughly that the hierarchy is correctly mapped; otherwise
+you can get false positives or false negatives.
+
+Annotations
+-----------
+
+Two constructs can be used to annotate and check where and if certain locks
+must be held: lockdep_assert_held*(&lock) and lockdep_*pin_lock(&lock).
+
+As the name suggests, lockdep_assert_held* family of macros assert that a
+particular lock is held at a certain time (and generate a WARN() otherwise).
+This annotation is widely used all over the kernel, e.g. in
+kernel/sched/core.c::
+
+  void update_rq_clock(struct rq *rq)
+  {
+	s64 delta;
+
+	lockdep_assert_held(&rq->lock);
+	[...]
+  }
+
+where holding rq->lock is required to safely update a rq's clock.
+
+The other family of macros is lockdep_*pin_lock(), which is admittedly only
+used for rq->lock ATM. Despite their limited adoption these annotations
+generate a WARN() if the lock of interest is "accidentally" unlocked. This turns
+out to be especially helpful to debug code with callbacks, where an upper
+layer assumes a lock remains taken, but a lower layer thinks it can maybe drop
+and reacquire the lock ("unwittingly" introducing races). lockdep_pin_lock()
+returns a 'struct pin_cookie' that is then used by lockdep_unpin_lock() to check
+that nobody tampered with the lock, e.g. kernel/sched/sched.h::
+
+  static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
+  {
+	rf->cookie = lockdep_pin_lock(&rq->lock);
+	[...]
+  }
+
+  static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
+  {
+	[...]
+	lockdep_unpin_lock(&rq->lock, rf->cookie);
+  }
+
+While comments about locking requirements might provide useful information,
+the runtime checks performed by annotations are invaluable when debugging
+locking problems and they carry the same level of details when inspecting
+code.  Always prefer annotations when in doubt!
+
+Proof of 100% correctness:
+--------------------------
+
+The validator achieves perfect, mathematical 'closure' (proof of locking
+correctness) in the sense that for every simple, standalone single-task
+locking sequence that occurred at least once during the lifetime of the
+kernel, the validator proves with 100% certainty that no
+combination and timing of these locking sequences can cause any class of
+lock related deadlock. [1]_
+
+I.e. complex multi-CPU and multi-task locking scenarios do not have to
+occur in practice to prove a deadlock: only the simple 'component'
+locking chains have to occur at least once (anytime, in any
+task/context) for the validator to be able to prove correctness. (For
+example, complex deadlocks that would normally need more than 3 CPUs and
+a very unlikely constellation of tasks, irq-contexts and timings to
+occur, can be detected on a plain, lightly loaded single-CPU system as
+well!)
+
+This radically decreases the complexity of locking related QA of the
+kernel: what has to be done during QA is to trigger as many "simple"
+single-task locking dependencies in the kernel as possible, at least
+once, to prove locking correctness - instead of having to trigger every
+possible combination of locking interaction between CPUs, combined with
+every possible hardirq and softirq nesting scenario (which is impossible
+to do in practice).
+
+.. [1]
+
+    assuming that the validator itself is 100% correct, and no other
+    part of the system corrupts the state of the validator in any way.
+    We also assume that all NMI/SMM paths [which could interrupt
+    even hardirq-disabled codepaths] are correct and do not interfere
+    with the validator. We also assume that the 64-bit 'chain hash'
+    value is unique for every lock-chain in the system. Also, lock
+    recursion must not be higher than 20.
+
+Performance:
+------------
+
+The above rules require **massive** amounts of runtime checking. If we did
+that for every lock taken and for every irqs-enable event, it would
+render the system practically unusably slow. The complexity of checking
+is O(N^2), so even with just a few hundred lock-classes we'd have to do
+tens of thousands of checks for every event.
+
+This problem is solved by checking any given 'locking scenario' (unique
+sequence of locks taken after each other) only once. A simple stack of
+held locks is maintained, and a lightweight 64-bit hash value is
+calculated, which hash is unique for every lock chain. The hash value,
+when the chain is validated for the first time, is then put into a hash
+table, which hash-table can be checked in a lockfree manner. If the
+locking chain occurs again later on, the hash table tells us that we
+don't have to validate the chain again.
+
+Troubleshooting:
+----------------
+
+The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
+Exceeding this number will trigger the following lockdep warning::
+
+	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+
+By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
+desktop systems have less than 1,000 lock classes, so this warning
+normally results from lock-class leakage or failure to properly
+initialize locks.  These two problems are illustrated below:
+
+1.	Repeated module loading and unloading while running the validator
+	will result in lock-class leakage.  The issue here is that each
+	load of the module will create a new set of lock classes for
+	that module's locks, but module unloading does not remove old
+	classes (see below discussion of reuse of lock classes for why).
+	Therefore, if that module is loaded and unloaded repeatedly,
+	the number of lock classes will eventually reach the maximum.
+
+2.	Using structures such as arrays that have large numbers of
+	locks that are not explicitly initialized.  For example,
+	a hash table with 8192 buckets where each bucket has its own
+	spinlock_t will consume 8192 lock classes -unless- each spinlock
+	is explicitly initialized at runtime, for example, using the
+	run-time spin_lock_init() as opposed to compile-time initializers
+	such as __SPIN_LOCK_UNLOCKED().  Failure to properly initialize
+	the per-bucket spinlocks would guarantee lock-class overflow.
+	In contrast, a loop that called spin_lock_init() on each lock
+	would place all 8192 locks into a single lock class.
+
+	The moral of this story is that you should always explicitly
+	initialize your locks.
+
+One might argue that the validator should be modified to allow
+lock classes to be reused.  However, if you are tempted to make this
+argument, first review the code and think through the changes that would
+be required, keeping in mind that the lock classes to be removed are
+likely to be linked into the lock-dependency graph.  This turns out to
+be harder to do than to say.
+
+Of course, if you do run out of lock classes, the next thing to do is
+to find the offending lock classes.  First, the following command gives
+you the number of lock classes currently in use along with the maximum::
+
+	grep "lock-classes" /proc/lockdep_stats
+
+This command produces the following output on a modest system::
+
+	lock-classes:                          748 [max: 8191]
+
+If the number allocated (748 above) increases continually over time,
+then there is likely a leak.  The following command can be used to
+identify the leaking lock classes::
+
+	grep "BD" /proc/lockdep
+
+Run the command and save the output, then compare against the output from
+a later run of this command to identify the leakers.  This same output
+can also help you find situations where runtime lock initialization has
+been omitted.
diff --git a/Documentation/locking/lockdep-design.txt b/Documentation/locking/lockdep-design.txt
deleted file mode 100644
index f189d130e543..000000000000
--- a/Documentation/locking/lockdep-design.txt
+++ /dev/null
@@ -1,389 +0,0 @@
-Runtime locking correctness validator
-=====================================
-
-started by Ingo Molnar <mingo@redhat.com>
-additions by Arjan van de Ven <arjan@linux.intel.com>
-
-Lock-class
-----------
-
-The basic object the validator operates upon is a 'class' of locks.
-
-A class of locks is a group of locks that are logically the same with
-respect to locking rules, even if the locks may have multiple (possibly
-tens of thousands of) instantiations. For example a lock in the inode
-struct is one class, while each inode has its own instantiation of that
-lock class.
-
-The validator tracks the 'usage state' of lock-classes, and it tracks
-the dependencies between different lock-classes. Lock usage indicates
-how a lock is used with regard to its IRQ contexts, while lock
-dependency can be understood as lock order, where L1 -> L2 suggests that
-a task is attempting to acquire L2 while holding L1. From lockdep's
-perspective, the two locks (L1 and L2) are not necessarily related; that
-dependency just means the order ever happened. The validator maintains a
-continuing effort to prove lock usages and dependencies are correct or
-the validator will shoot a splat if incorrect.
-
-A lock-class's behavior is constructed by its instances collectively:
-when the first instance of a lock-class is used after bootup the class
-gets registered, then all (subsequent) instances will be mapped to the
-class and hence their usages and dependecies will contribute to those of
-the class. A lock-class does not go away when a lock instance does, but
-it can be removed if the memory space of the lock class (static or
-dynamic) is reclaimed, this happens for example when a module is
-unloaded or a workqueue is destroyed.
-
-State
------
-
-The validator tracks lock-class usage history and divides the usage into
-(4 usages * n STATEs + 1) categories:
-
-where the 4 usages can be:
-- 'ever held in STATE context'
-- 'ever held as readlock in STATE context'
-- 'ever held with STATE enabled'
-- 'ever held as readlock with STATE enabled'
-
-where the n STATEs are coded in kernel/locking/lockdep_states.h and as of
-now they include:
-- hardirq
-- softirq
-
-where the last 1 category is:
-- 'ever used'                                       [ == !unused        ]
-
-When locking rules are violated, these usage bits are presented in the
-locking error messages, inside curlies, with a total of 2 * n STATEs bits.
-A contrived example:
-
-   modprobe/2287 is trying to acquire lock:
-    (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
-
-   but task is already holding lock:
-    (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
-
-
-For a given lock, the bit positions from left to right indicate the usage
-of the lock and readlock (if exists), for each of the n STATEs listed
-above respectively, and the character displayed at each bit position
-indicates:
-
-   '.'  acquired while irqs disabled and not in irq context
-   '-'  acquired in irq context
-   '+'  acquired with irqs enabled
-   '?'  acquired in irq context with irqs enabled.
-
-The bits are illustrated with an example:
-
-    (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
-                         ||||
-                         ||| \-> softirq disabled and not in softirq context
-                         || \--> acquired in softirq context
-                         | \---> hardirq disabled and not in hardirq context
-                          \----> acquired in hardirq context
-
-
-For a given STATE, whether the lock is ever acquired in that STATE
-context and whether that STATE is enabled yields four possible cases as
-shown in the table below. The bit character is able to indicate which
-exact case is for the lock as of the reporting time.
-
-   -------------------------------------------
-  |              | irq enabled | irq disabled |
-  |-------------------------------------------|
-  | ever in irq  |      ?      |       -      |
-  |-------------------------------------------|
-  | never in irq |      +      |       .      |
-   -------------------------------------------
-
-The character '-' suggests irq is disabled because if otherwise the
-charactor '?' would have been shown instead. Similar deduction can be
-applied for '+' too.
-
-Unused locks (e.g., mutexes) cannot be part of the cause of an error.
-
-
-Single-lock state rules:
-------------------------
-
-A lock is irq-safe means it was ever used in an irq context, while a lock
-is irq-unsafe means it was ever acquired with irq enabled.
-
-A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
-following states must be exclusive: only one of them is allowed to be set
-for any lock-class based on its usage:
-
- <hardirq-safe> or <hardirq-unsafe>
- <softirq-safe> or <softirq-unsafe>
-
-This is because if a lock can be used in irq context (irq-safe) then it
-cannot be ever acquired with irq enabled (irq-unsafe). Otherwise, a
-deadlock may happen. For example, in the scenario that after this lock
-was acquired but before released, if the context is interrupted this
-lock will be attempted to acquire twice, which creates a deadlock,
-referred to as lock recursion deadlock.
-
-The validator detects and reports lock usage that violates these
-single-lock state rules.
-
-Multi-lock dependency rules:
-----------------------------
-
-The same lock-class must not be acquired twice, because this could lead
-to lock recursion deadlocks.
-
-Furthermore, two locks can not be taken in inverse order:
-
- <L1> -> <L2>
- <L2> -> <L1>
-
-because this could lead to a deadlock - referred to as lock inversion
-deadlock - as attempts to acquire the two locks form a circle which
-could lead to the two contexts waiting for each other permanently. The
-validator will find such dependency circle in arbitrary complexity,
-i.e., there can be any other locking sequence between the acquire-lock
-operations; the validator will still find whether these locks can be
-acquired in a circular fashion.
-
-Furthermore, the following usage based lock dependencies are not allowed
-between any two lock-classes:
-
-   <hardirq-safe>   ->  <hardirq-unsafe>
-   <softirq-safe>   ->  <softirq-unsafe>
-
-The first rule comes from the fact that a hardirq-safe lock could be
-taken by a hardirq context, interrupting a hardirq-unsafe lock - and
-thus could result in a lock inversion deadlock. Likewise, a softirq-safe
-lock could be taken by an softirq context, interrupting a softirq-unsafe
-lock.
-
-The above rules are enforced for any locking sequence that occurs in the
-kernel: when acquiring a new lock, the validator checks whether there is
-any rule violation between the new lock and any of the held locks.
-
-When a lock-class changes its state, the following aspects of the above
-dependency rules are enforced:
-
-- if a new hardirq-safe lock is discovered, we check whether it
-  took any hardirq-unsafe lock in the past.
-
-- if a new softirq-safe lock is discovered, we check whether it took
-  any softirq-unsafe lock in the past.
-
-- if a new hardirq-unsafe lock is discovered, we check whether any
-  hardirq-safe lock took it in the past.
-
-- if a new softirq-unsafe lock is discovered, we check whether any
-  softirq-safe lock took it in the past.
-
-(Again, we do these checks too on the basis that an interrupt context
-could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
-could lead to a lock inversion deadlock - even if that lock scenario did
-not trigger in practice yet.)
-
-Exception: Nested data dependencies leading to nested locking
--------------------------------------------------------------
-
-There are a few cases where the Linux kernel acquires more than one
-instance of the same lock-class. Such cases typically happen when there
-is some sort of hierarchy within objects of the same type. In these
-cases there is an inherent "natural" ordering between the two objects
-(defined by the properties of the hierarchy), and the kernel grabs the
-locks in this fixed order on each of the objects.
-
-An example of such an object hierarchy that results in "nested locking"
-is that of a "whole disk" block-dev object and a "partition" block-dev
-object; the partition is "part of" the whole device and as long as one
-always takes the whole disk lock as a higher lock than the partition
-lock, the lock ordering is fully correct. The validator does not
-automatically detect this natural ordering, as the locking rule behind
-the ordering is not static.
-
-In order to teach the validator about this correct usage model, new
-versions of the various locking primitives were added that allow you to
-specify a "nesting level". An example call, for the block device mutex,
-looks like this:
-
-enum bdev_bd_mutex_lock_class
-{
-       BD_MUTEX_NORMAL,
-       BD_MUTEX_WHOLE,
-       BD_MUTEX_PARTITION
-};
-
- mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
-
-In this case the locking is done on a bdev object that is known to be a
-partition.
-
-The validator treats a lock that is taken in such a nested fashion as a
-separate (sub)class for the purposes of validation.
-
-Note: When changing code to use the _nested() primitives, be careful and
-check really thoroughly that the hierarchy is correctly mapped; otherwise
-you can get false positives or false negatives.
-
-Annotations
------------
-
-Two constructs can be used to annotate and check where and if certain locks
-must be held: lockdep_assert_held*(&lock) and lockdep_*pin_lock(&lock).
-
-As the name suggests, lockdep_assert_held* family of macros assert that a
-particular lock is held at a certain time (and generate a WARN() otherwise).
-This annotation is largely used all over the kernel, e.g. kernel/sched/
-core.c
-
-  void update_rq_clock(struct rq *rq)
-  {
-	s64 delta;
-
-	lockdep_assert_held(&rq->lock);
-	[...]
-  }
-
-where holding rq->lock is required to safely update a rq's clock.
-
-The other family of macros is lockdep_*pin_lock(), which is admittedly only
-used for rq->lock ATM. Despite their limited adoption these annotations
-generate a WARN() if the lock of interest is "accidentally" unlocked. This turns
-out to be especially helpful to debug code with callbacks, where an upper
-layer assumes a lock remains taken, but a lower layer thinks it can maybe drop
-and reacquire the lock ("unwittingly" introducing races). lockdep_pin_lock()
-returns a 'struct pin_cookie' that is then used by lockdep_unpin_lock() to check
-that nobody tampered with the lock, e.g. kernel/sched/sched.h
-
-  static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
-  {
-	rf->cookie = lockdep_pin_lock(&rq->lock);
-	[...]
-  }
-
-  static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
-  {
-	[...]
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
-  }
-
-While comments about locking requirements might provide useful information,
-the runtime checks performed by annotations are invaluable when debugging
-locking problems and they carry the same level of details when inspecting
-code.  Always prefer annotations when in doubt!
-
-Proof of 100% correctness:
---------------------------
-
-The validator achieves perfect, mathematical 'closure' (proof of locking
-correctness) in the sense that for every simple, standalone single-task
-locking sequence that occurred at least once during the lifetime of the
-kernel, the validator proves it with a 100% certainty that no
-combination and timing of these locking sequences can cause any class of
-lock related deadlock. [*]
-
-I.e. complex multi-CPU and multi-task locking scenarios do not have to
-occur in practice to prove a deadlock: only the simple 'component'
-locking chains have to occur at least once (anytime, in any
-task/context) for the validator to be able to prove correctness. (For
-example, complex deadlocks that would normally need more than 3 CPUs and
-a very unlikely constellation of tasks, irq-contexts and timings to
-occur, can be detected on a plain, lightly loaded single-CPU system as
-well!)
-
-This radically decreases the complexity of locking related QA of the
-kernel: what has to be done during QA is to trigger as many "simple"
-single-task locking dependencies in the kernel as possible, at least
-once, to prove locking correctness - instead of having to trigger every
-possible combination of locking interaction between CPUs, combined with
-every possible hardirq and softirq nesting scenario (which is impossible
-to do in practice).
-
-[*] assuming that the validator itself is 100% correct, and no other
-    part of the system corrupts the state of the validator in any way.
-    We also assume that all NMI/SMM paths [which could interrupt
-    even hardirq-disabled codepaths] are correct and do not interfere
-    with the validator. We also assume that the 64-bit 'chain hash'
-    value is unique for every lock-chain in the system. Also, lock
-    recursion must not be higher than 20.
-
-Performance:
-------------
-
-The above rules require _massive_ amounts of runtime checking. If we did
-that for every lock taken and for every irqs-enable event, it would
-render the system practically unusably slow. The complexity of checking
-is O(N^2), so even with just a few hundred lock-classes we'd have to do
-tens of thousands of checks for every event.
-
-This problem is solved by checking any given 'locking scenario' (unique
-sequence of locks taken after each other) only once. A simple stack of
-held locks is maintained, and a lightweight 64-bit hash value is
-calculated, which hash is unique for every lock chain. The hash value,
-when the chain is validated for the first time, is then put into a hash
-table, which hash-table can be checked in a lockfree manner. If the
-locking chain occurs again later on, the hash table tells us that we
-don't have to validate the chain again.
-
-Troubleshooting:
-----------------
-
-The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
-Exceeding this number will trigger the following lockdep warning:
-
-	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
-
-By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
-desktop systems have less than 1,000 lock classes, so this warning
-normally results from lock-class leakage or failure to properly
-initialize locks.  These two problems are illustrated below:
-
-1.	Repeated module loading and unloading while running the validator
-	will result in lock-class leakage.  The issue here is that each
-	load of the module will create a new set of lock classes for
-	that module's locks, but module unloading does not remove old
-	classes (see below discussion of reuse of lock classes for why).
-	Therefore, if that module is loaded and unloaded repeatedly,
-	the number of lock classes will eventually reach the maximum.
-
-2.	Using structures such as arrays that have large numbers of
-	locks that are not explicitly initialized.  For example,
-	a hash table with 8192 buckets where each bucket has its own
-	spinlock_t will consume 8192 lock classes -unless- each spinlock
-	is explicitly initialized at runtime, for example, using the
-	run-time spin_lock_init() as opposed to compile-time initializers
-	such as __SPIN_LOCK_UNLOCKED().  Failure to properly initialize
-	the per-bucket spinlocks would guarantee lock-class overflow.
-	In contrast, a loop that called spin_lock_init() on each lock
-	would place all 8192 locks into a single lock class.
-
-	The moral of this story is that you should always explicitly
-	initialize your locks.
-
-One might argue that the validator should be modified to allow
-lock classes to be reused.  However, if you are tempted to make this
-argument, first review the code and think through the changes that would
-be required, keeping in mind that the lock classes to be removed are
-likely to be linked into the lock-dependency graph.  This turns out to
-be harder to do than to say.
-
-Of course, if you do run out of lock classes, the next thing to do is
-to find the offending lock classes.  First, the following command gives
-you the number of lock classes currently in use along with the maximum:
-
-	grep "lock-classes" /proc/lockdep_stats
-
-This command produces the following output on a modest system:
-
-	 lock-classes:                          748 [max: 8191]
-
-If the number allocated (748 above) increases continually over time,
-then there is likely a leak.  The following command can be used to
-identify the leaking lock classes:
-
-	grep "BD" /proc/lockdep
-
-Run the command and save the output, then compare against the output from
-a later run of this command to identify the leakers.  This same output
-can also help you find situations where runtime lock initialization has
-been omitted.
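Note on the troubleshooting steps carried over from lockdep-design.txt: the procedure (read the "lock-classes" line from /proc/lockdep_stats, then compare two snapshots of /proc/lockdep) can be sketched in userspace Python. This is only an illustration. The helper names are hypothetical, the stats line format is taken from the sample output quoted in the document, and the snapshot comparison treats every line as opaque text, so lines whose counters merely changed would also be reported.

```python
import re


def parse_lock_classes(line):
    """Parse a line such as ' lock-classes:   748 [max: 8191]'.

    Returns (in_use, maximum). The layout is assumed from the sample
    output shown in the document, not from the kernel source.
    """
    match = re.search(r"lock-classes:\s+(\d+)\s+\[max:\s+(\d+)\]", line)
    if match is None:
        raise ValueError("not a lock-classes line: %r" % line)
    return int(match.group(1)), int(match.group(2))


def grows_continually(samples):
    """True if every sample is strictly larger than the one before it."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


def new_entries(before, after):
    """Return the lines of the later snapshot that the earlier one lacks."""
    seen = set(before.splitlines())
    return [line for line in after.splitlines() if line not in seen]


if __name__ == "__main__":
    in_use, maximum = parse_lock_classes(
        " lock-classes:                          748 [max: 8191]")
    print("in use:", in_use, "headroom:", maximum - in_use)
```

In practice the two snapshots would be saved some time apart (for example before and after a module load/unload cycle) and passed to new_entries() as strings.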
diff --git a/Documentation/locking/lockstat.rst b/Documentation/locking/lockstat.rst
new file mode 100644
index 000000000000..536eab8dbd99
--- /dev/null
+++ b/Documentation/locking/lockstat.rst
@@ -0,0 +1,204 @@
+===============
+Lock Statistics
+===============
+
+What
+====
+
+As the name suggests, it provides statistics on locks.
+
+
+Why
+===
+
+Because things like lock contention can severely impact performance.
+
+How
+===
+
+Lockdep already has hooks in the lock functions and maps lock instances to
+lock classes. We build on that (see Documentation/locking/lockdep-design.rst).
+The graph below shows the relation between the lock functions and the various
+hooks therein::
+
+        __acquire
+            |
+           lock _____
+            |        \
+            |    __contended
+            |         |
+            |       <wait>
+            | _______/
+            |/
+            |
+       __acquired
+            |
+            .
+          <hold>
+            .
+            |
+       __release
+            |
+         unlock
+
+  lock, unlock	- the regular lock functions
+  __*		- the hooks
+  <> 		- states
+
+With these hooks we provide the following statistics:
+
+ con-bounces
+	- number of lock contentions that involved cross-CPU data
+ contentions
+	- number of lock acquisitions that had to wait
+ wait time
+     min
+	- shortest (non-0) time we ever had to wait for a lock
+     max
+	- longest time we ever had to wait for a lock
+     total
+	- total time we spent waiting on this lock
+     avg
+	- average time spent waiting on this lock
+ acq-bounces
+	- number of lock acquisitions that involved cross-CPU data
+ acquisitions
+	- number of times we took the lock
+ hold time
+     min
+	- shortest (non-0) time we ever held the lock
+     max
+	- longest time we ever held the lock
+     total
+	- total time this lock was held
+     avg
+	- average time this lock was held
+
+These numbers are gathered per lock class, per read/write state (when
+applicable).
+
+It also tracks 4 contention points per class. A contention point is a call site
+that had to wait on lock acquisition.
+
+Configuration
+-------------
+
+Lock statistics are enabled via CONFIG_LOCK_STAT.
+
+Usage
+-----
+
+Enable collection of statistics::
+
+	# echo 1 >/proc/sys/kernel/lock_stat
+
+Disable collection of statistics::
+
+	# echo 0 >/proc/sys/kernel/lock_stat
+
+Look at the current lock statistics::
+
+  ( line numbers not part of actual output, done for clarity in the explanation
+    below )
+
+  # less /proc/lock_stat
+
+  01 lock_stat version 0.4
+  02-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  03                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
+  04-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  05
+  06                         &mm->mmap_sem-W:            46             84           0.26         939.10       16371.53         194.90          47291        2922365           0.16     2220301.69 17464026916.32        5975.99
+  07                         &mm->mmap_sem-R:            37            100           1.31      299502.61      325629.52        3256.30         212344       34316685           0.10        7744.91    95016910.20           2.77
+  08                         ---------------
+  09                           &mm->mmap_sem              1          [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280
+  10                           &mm->mmap_sem             96          [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
+  11                           &mm->mmap_sem             34          [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
+  12                           &mm->mmap_sem             17          [<ffffffff81127e71>] vm_munmap+0x41/0x80
+  13                         ---------------
+  14                           &mm->mmap_sem              1          [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0
+  15                           &mm->mmap_sem             60          [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250
+  16                           &mm->mmap_sem             41          [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
+  17                           &mm->mmap_sem             68          [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
+  18
+  19.............................................................................................................................................................................................................................
+  20
+  21                         unix_table_lock:           110            112           0.21          49.24         163.91           1.46          21094          66312           0.12         624.42       31589.81           0.48
+  22                         ---------------
+  23                         unix_table_lock             45          [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
+  24                         unix_table_lock             47          [<ffffffff8150b111>] unix_release_sock+0x31/0x250
+  25                         unix_table_lock             15          [<ffffffff8150ca37>] unix_find_other+0x117/0x230
+  26                         unix_table_lock              5          [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
+  27                         ---------------
+  28                         unix_table_lock             39          [<ffffffff8150b111>] unix_release_sock+0x31/0x250
+  29                         unix_table_lock             49          [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
+  30                         unix_table_lock             20          [<ffffffff8150ca37>] unix_find_other+0x117/0x230
+  31                         unix_table_lock              4          [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
+
+
+This excerpt shows the first two lock class statistics. Line 01 shows the
+output version - each time the format changes this will be updated. Lines 02-04
+show the header with column descriptions. Lines 05-18 and 20-31 show the actual
+statistics. These statistics come in two parts: the actual stats, separated by a
+short separator (lines 08 and 13) from the contention points.
+
+Lines 09-12 show the first 4 recorded contention points (the code
+which tries to get the lock) and lines 14-17 show the first 4 recorded
+contended points (the lock holder). It is possible that the max
+con-bounces point is missing in the statistics.
+
+The first lock (05-18) is a read/write lock, and shows two lines above the
+short separator. The contention points don't match the column descriptors;
+they have only two columns: contentions and [<IP>] symbol. The second set of
+contention points are the points we're contending with.
+
+The integer part of the time values is in microseconds (us).
+
+When dealing with nested locks, subclasses may appear::
+
+  32...........................................................................................................................................................................................................................
+  33
+  34                               &rq->lock:       13128          13128           0.43         190.53      103881.26           7.91          97454        3453404           0.00         401.11    13224683.11           3.82
+  35                               ---------
+  36                               &rq->lock          645          [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
+  37                               &rq->lock          297          [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+  38                               &rq->lock          360          [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a
+  39                               &rq->lock          428          [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb
+  40                               ---------
+  41                               &rq->lock           77          [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
+  42                               &rq->lock          174          [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+  43                               &rq->lock         4715          [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
+  44                               &rq->lock          893          [<ffffffff81340524>] schedule+0x157/0x7b8
+  45
+  46...........................................................................................................................................................................................................................
+  47
+  48                             &rq->lock/1:        1526          11488           0.33         388.73      136294.31          11.86          21461          38404           0.00          37.93      109388.53           2.84
+  49                             -----------
+  50                             &rq->lock/1        11526          [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
+  51                             -----------
+  52                             &rq->lock/1         5645          [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
+  53                             &rq->lock/1         1224          [<ffffffff81340524>] schedule+0x157/0x7b8
+  54                             &rq->lock/1         4336          [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
+  55                             &rq->lock/1          181          [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
+
+Line 48 shows statistics for the second subclass (/1) of the &rq->lock class
+(subclass numbering starts from 0). It appears because, as line 50 suggests,
+double_rq_lock() acquires two rq->lock spinlocks in a nested fashion.
+
+View the top contending locks::
+
+  # grep : /proc/lock_stat | head
+			clockevents_lock:       2926159        2947636           0.15       46882.81  1784540466.34         605.41        3381345        3879161           0.00        2260.97    53178395.68          13.71
+		     tick_broadcast_lock:        346460         346717           0.18        2257.43    39364622.71         113.54        3642919        4242696           0.00        2263.79    49173646.60          11.59
+		  &mapping->i_mmap_mutex:        203896         203899           3.36      645530.05 31767507988.39      155800.21        3361776        8893984           0.17        2254.15    14110121.02           1.59
+			       &rq->lock:        135014         136909           0.18         606.09      842160.68           6.15        1540728       10436146           0.00         728.72    17606683.41           1.69
+	       &(&zone->lru_lock)->rlock:         93000          94934           0.16          59.18      188253.78           1.98        1199912        3809894           0.15         391.40     3559518.81           0.93
+			 tasklist_lock-W:         40667          41130           0.23        1189.42      428980.51          10.43         270278         510106           0.16         653.51     3939674.91           7.72
+			 tasklist_lock-R:         21298          21305           0.20        1310.05      215511.12          10.12         186204         241258           0.14        1162.33     1179779.23           4.89
+			      rcu_node_1:         47656          49022           0.16         635.41      193616.41           3.95         844888        1865423           0.00         764.26     1656226.96           0.89
+       &(&dentry->d_lockref.lock)->rlock:         39791          40179           0.15        1302.08       88851.96           2.21        2790851       12527025           0.10        1910.75     3379714.27           0.27
+			      rcu_node_0:         29203          30064           0.16         786.55     1555573.00          51.74          88963         244254           0.00         398.87      428872.51           1.76
+
+Clear the statistics::
+
+  # echo 0 > /proc/lock_stat
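Note on lockstat.rst: the per-class lines shown in the sample output can also be post-processed in userspace, which generalizes the "grep : /proc/lock_stat | head" recipe to any column. The following is a minimal Python sketch, not part of the kernel tree. The function names are hypothetical, and the parser assumes the version 0.4 layout quoted in the document (class name, a colon, then the twelve numeric columns from the header line).

```python
# Column names as they appear in the lock_stat header line of the document.
COLUMNS = [
    "con-bounces", "contentions",
    "waittime-min", "waittime-max", "waittime-total", "waittime-avg",
    "acq-bounces", "acquisitions",
    "holdtime-min", "holdtime-max", "holdtime-total", "holdtime-avg",
]


def parse_lock_stat(text):
    """Return {class name: {column: value}} for every per-class line.

    Header, separator and contention-point lines are skipped because
    they do not consist of 'name:' followed by twelve numbers.
    """
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, rest = line.rpartition(":")
        fields = rest.split()
        if len(fields) != len(COLUMNS):
            continue
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue
        stats[name.strip()] = dict(zip(COLUMNS, values))
    return stats


def top_by(stats, column, n=10):
    """Return up to n class names ordered by the given column, largest first."""
    return sorted(stats, key=lambda name: stats[name][column], reverse=True)[:n]


# Three per-class lines copied from the sample output in the document.
SAMPLE = """
                         &mm->mmap_sem-W:            46             84           0.26         939.10       16371.53         194.90          47291        2922365           0.16     2220301.69 17464026916.32        5975.99
                         &mm->mmap_sem-R:            37            100           1.31      299502.61      325629.52        3256.30         212344       34316685           0.10        7744.91    95016910.20           2.77
                           &mm->mmap_sem              1          [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280
                         unix_table_lock:           110            112           0.21          49.24         163.91           1.46          21094          66312           0.12         624.42       31589.81           0.48
"""

if __name__ == "__main__":
    for name in top_by(parse_lock_stat(SAMPLE), "waittime-total", n=3):
        print(name)
```

On a kernel built with CONFIG_LOCK_STAT, the text would come from reading /proc/lock_stat instead of SAMPLE; sorting by waittime-total rather than contentions tends to highlight locks that are contended rarely but waited on for a long time.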
diff --git a/Documentation/locking/lockstat.txt b/Documentation/locking/lockstat.txt
deleted file mode 100644
index fdbeb0c45ef3..000000000000
--- a/Documentation/locking/lockstat.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-
-LOCK STATISTICS
-
-- WHAT
-
-As the name suggests, it provides statistics on locks.
-
-- WHY
-
-Because things like lock contention can severely impact performance.
-
-- HOW
-
-Lockdep already has hooks in the lock functions and maps lock instances to
-lock classes. We build on that (see Documentation/locking/lockdep-design.txt).
-The graph below shows the relation between the lock functions and the various
-hooks therein.
-
-        __acquire
-            |
-           lock _____
-            |        \
-            |    __contended
-            |         |
-            |       <wait>
-            | _______/
-            |/
-            |
-       __acquired
-            |
-            .
-          <hold>
-            .
-            |
-       __release
-            |
-         unlock
-
-lock, unlock	- the regular lock functions
-__*		- the hooks
-<> 		- states
-
-With these hooks we provide the following statistics:
-
- con-bounces       - number of lock contention that involved x-cpu data
- contentions       - number of lock acquisitions that had to wait
- wait time min     - shortest (non-0) time we ever had to wait for a lock
-           max     - longest time we ever had to wait for a lock
-	   total   - total time we spend waiting on this lock
-	   avg     - average time spent waiting on this lock
- acq-bounces       - number of lock acquisitions that involved x-cpu data
- acquisitions      - number of times we took the lock
- hold time min     - shortest (non-0) time we ever held the lock
-	   max     - longest time we ever held the lock
-	   total   - total time this lock was held
-	   avg     - average time this lock was held
-
-These numbers are gathered per lock class, per read/write state (when
-applicable).
-
-It also tracks 4 contention points per class. A contention point is a call site
-that had to wait on lock acquisition.
-
- - CONFIGURATION
-
-Lock statistics are enabled via CONFIG_LOCK_STAT.
-
- - USAGE
-
-Enable collection of statistics:
-
-# echo 1 >/proc/sys/kernel/lock_stat
-
-Disable collection of statistics:
-
-# echo 0 >/proc/sys/kernel/lock_stat
-
-Look at the current lock statistics:
-
-( line numbers not part of actual output, done for clarity in the explanation
-  below )
-
-# less /proc/lock_stat
-
-01 lock_stat version 0.4
-02-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-03                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
-04-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-05
-06                         &mm->mmap_sem-W:            46             84           0.26         939.10       16371.53         194.90          47291        2922365           0.16     2220301.69 17464026916.32        5975.99
-07                         &mm->mmap_sem-R:            37            100           1.31      299502.61      325629.52        3256.30         212344       34316685           0.10        7744.91    95016910.20           2.77
-08                         ---------------
-09                           &mm->mmap_sem              1          [<ffffffff811502a7>] khugepaged_scan_mm_slot+0x57/0x280
-10                           &mm->mmap_sem             96          [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
-11                           &mm->mmap_sem             34          [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
-12                           &mm->mmap_sem             17          [<ffffffff81127e71>] vm_munmap+0x41/0x80
-13                         ---------------
-14                           &mm->mmap_sem              1          [<ffffffff81046fda>] dup_mmap+0x2a/0x3f0
-15                           &mm->mmap_sem             60          [<ffffffff81129e29>] SyS_mprotect+0xe9/0x250
-16                           &mm->mmap_sem             41          [<ffffffff815351c4>] __do_page_fault+0x1d4/0x510
-17                           &mm->mmap_sem             68          [<ffffffff81113d77>] vm_mmap_pgoff+0x87/0xd0
-18
-19.............................................................................................................................................................................................................................
-20
-21                         unix_table_lock:           110            112           0.21          49.24         163.91           1.46          21094          66312           0.12         624.42       31589.81           0.48
-22                         ---------------
-23                         unix_table_lock             45          [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
-24                         unix_table_lock             47          [<ffffffff8150b111>] unix_release_sock+0x31/0x250
-25                         unix_table_lock             15          [<ffffffff8150ca37>] unix_find_other+0x117/0x230
-26                         unix_table_lock              5          [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
-27                         ---------------
-28                         unix_table_lock             39          [<ffffffff8150b111>] unix_release_sock+0x31/0x250
-29                         unix_table_lock             49          [<ffffffff8150ad8e>] unix_create1+0x16e/0x1b0
-30                         unix_table_lock             20          [<ffffffff8150ca37>] unix_find_other+0x117/0x230
-31                         unix_table_lock              4          [<ffffffff8150a09f>] unix_autobind+0x11f/0x1b0
-
-
-This excerpt shows the first two lock class statistics. Line 01 shows the
-output version - each time the format changes this will be updated. Line 02-04
-show the header with column descriptions. Lines 05-18 and 20-31 show the actual
-statistics. These statistics come in two parts; the actual stats separated by a
-short separator (line 08, 13) from the contention points.
-
-Lines 09-12 show the first 4 recorded contention points (the code
-which tries to get the lock) and lines 14-17 show the first 4 recorded
-contended points (the lock holder). It is possible that the max
-con-bounces point is missing in the statistics.
-
-The first lock (05-18) is a read/write lock, and shows two lines above the
-short separator. The contention points don't match the column descriptors,
-they have two: contentions and [<IP>] symbol. The second set of contention
-points are the points we're contending with.
-
-The integer part of the time values is in us.
-
-Dealing with nested locks, subclasses may appear:
-
-32...........................................................................................................................................................................................................................
-33
-34                               &rq->lock:       13128          13128           0.43         190.53      103881.26           7.91          97454        3453404           0.00         401.11    13224683.11           3.82
-35                               ---------
-36                               &rq->lock          645          [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
-37                               &rq->lock          297          [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-38                               &rq->lock          360          [<ffffffff8103c4c5>] select_task_rq_fair+0x1f0/0x74a
-39                               &rq->lock          428          [<ffffffff81045f98>] scheduler_tick+0x46/0x1fb
-40                               ---------
-41                               &rq->lock           77          [<ffffffff8103bfc4>] task_rq_lock+0x43/0x75
-42                               &rq->lock          174          [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-43                               &rq->lock         4715          [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
-44                               &rq->lock          893          [<ffffffff81340524>] schedule+0x157/0x7b8
-45
-46...........................................................................................................................................................................................................................
-47
-48                             &rq->lock/1:        1526          11488           0.33         388.73      136294.31          11.86          21461          38404           0.00          37.93      109388.53           2.84
-49                             -----------
-50                             &rq->lock/1        11526          [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
-51                             -----------
-52                             &rq->lock/1         5645          [<ffffffff8103ed4b>] double_rq_lock+0x42/0x54
-53                             &rq->lock/1         1224          [<ffffffff81340524>] schedule+0x157/0x7b8
-54                             &rq->lock/1         4336          [<ffffffff8103ed58>] double_rq_lock+0x4f/0x54
-55                             &rq->lock/1          181          [<ffffffff8104ba65>] try_to_wake_up+0x127/0x25a
-
-Line 48 shows statistics for the second subclass (/1) of &rq->lock class
-(subclass starts from 0), since in this case, as line 50 suggests,
-double_rq_lock actually acquires a nested lock of two spinlocks.
-
-View the top contending locks:
-
-# grep : /proc/lock_stat | head
-			clockevents_lock:       2926159        2947636           0.15       46882.81  1784540466.34         605.41        3381345        3879161           0.00        2260.97    53178395.68          13.71
-		     tick_broadcast_lock:        346460         346717           0.18        2257.43    39364622.71         113.54        3642919        4242696           0.00        2263.79    49173646.60          11.59
-		  &mapping->i_mmap_mutex:        203896         203899           3.36      645530.05 31767507988.39      155800.21        3361776        8893984           0.17        2254.15    14110121.02           1.59
-			       &rq->lock:        135014         136909           0.18         606.09      842160.68           6.15        1540728       10436146           0.00         728.72    17606683.41           1.69
-	       &(&zone->lru_lock)->rlock:         93000          94934           0.16          59.18      188253.78           1.98        1199912        3809894           0.15         391.40     3559518.81           0.93
-			 tasklist_lock-W:         40667          41130           0.23        1189.42      428980.51          10.43         270278         510106           0.16         653.51     3939674.91           7.72
-			 tasklist_lock-R:         21298          21305           0.20        1310.05      215511.12          10.12         186204         241258           0.14        1162.33     1179779.23           4.89
-			      rcu_node_1:         47656          49022           0.16         635.41      193616.41           3.95         844888        1865423           0.00         764.26     1656226.96           0.89
-       &(&dentry->d_lockref.lock)->rlock:         39791          40179           0.15        1302.08       88851.96           2.21        2790851       12527025           0.10        1910.75     3379714.27           0.27
-			      rcu_node_0:         29203          30064           0.16         786.55     1555573.00          51.74          88963         244254           0.00         398.87      428872.51           1.76
-
-Clear the statistics:
-
-# echo 0 > /proc/lock_stat
diff --git a/Documentation/locking/locktorture.rst b/Documentation/locking/locktorture.rst
new file mode 100644
index 000000000000..e79eeeca3ac6
--- /dev/null
+++ b/Documentation/locking/locktorture.rst
@@ -0,0 +1,170 @@
+==================================
+Kernel Lock Torture Test Operation
+==================================
+
+CONFIG_LOCK_TORTURE_TEST
+========================
+
+The CONFIG_LOCK_TORTURE_TEST config option provides a kernel module
+that runs torture tests on core kernel locking primitives. The kernel
+module, 'locktorture', may be built after the fact on the running
+kernel to be tested, if desired. The tests periodically output status
+messages via printk(), which can be examined via dmesg (perhaps
+grepping for "torture").  The test is started when the module is loaded,
+and stops when the module is unloaded. This program is based on how RCU
+is tortured, via rcutorture.
+
+This torture test consists of creating a number of kernel threads which
+acquire the lock and hold it for a specific amount of time, thus simulating
+different critical region behaviors. The amount of contention on the lock
+can be increased by enlarging this critical region hold time and/or
+creating more kthreads.
+
+
+Module Parameters
+=================
+
+This module has the following parameters:
+
+
+Locktorture-specific
+--------------------
+
+nwriters_stress
+		  Number of kernel threads that will stress exclusive lock
+		  ownership (writers). The default value is twice the number
+		  of online CPUs.
+
+nreaders_stress
+		  Number of kernel threads that will stress shared lock
+		  ownership (readers). The default is the same as the number
+		  of writer threads. If the user did not specify nwriters_stress,
+		  then both readers and writers will be the number of online CPUs.
+
+torture_type
+		  Type of lock to torture. By default, only spinlocks will
+		  be tortured. This module can torture the following locks,
+		  with string values as follows:
+
+		     - "lock_busted":
+				Simulates a buggy lock implementation.
+
+		     - "spin_lock":
+				spin_lock() and spin_unlock() pairs.
+
+		     - "spin_lock_irq":
+				spin_lock_irq() and spin_unlock_irq() pairs.
+
+		     - "rw_lock":
+				read/write lock() and unlock() rwlock pairs.
+
+		     - "rw_lock_irq":
+				read/write lock_irq() and unlock_irq()
+				rwlock pairs.
+
+		     - "mutex_lock":
+				mutex_lock() and mutex_unlock() pairs.
+
+		     - "rtmutex_lock":
+				rtmutex_lock() and rtmutex_unlock() pairs.
+				Kernel must have CONFIG_RT_MUTEX=y.
+
+		     - "rwsem_lock":
+				read/write down() and up() semaphore pairs.
+
+
+Torture-framework (RCU + locking)
+---------------------------------
+
+shutdown_secs
+		  The number of seconds to run the test before terminating
+		  the test and powering off the system.  The default is
+		  zero, which disables test termination and system shutdown.
+		  This capability is useful for automated testing.
+
+onoff_interval
+		  The number of seconds between each attempt to execute a
+		  randomly selected CPU-hotplug operation.  Defaults
+		  to zero, which disables CPU hotplugging.  In
+		  CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently
+		  refuse to do any CPU-hotplug operations regardless of
+		  what value is specified for onoff_interval.
+
+onoff_holdoff
+		  The number of seconds to wait until starting CPU-hotplug
+		  operations.  This would normally only be used when
+		  locktorture was built into the kernel and started
+		  automatically at boot time, in which case it is useful
+		  in order to avoid confusing boot-time code with CPUs
+		  coming and going. This parameter is only useful if
+		  CONFIG_HOTPLUG_CPU is enabled.
+
+stat_interval
+		  Number of seconds between statistics-related printk()s.
+		  By default, locktorture will report stats every 60 seconds.
+		  Setting the interval to zero causes the statistics to
+		  be printed -only- when the module is unloaded, rather
+		  than periodically.
+
+stutter
+		  The length of time to run the test before pausing for this
+		  same period of time.  Defaults to "stutter=5", so as
+		  to run and pause for (roughly) five-second intervals.
+		  Specifying "stutter=0" causes the test to run continuously
+		  without pausing, which is the old default behavior.
+
+shuffle_interval
+		  The number of seconds to keep the test threads affinitied
+		  to a particular subset of the CPUs. Defaults to 3 seconds.
+		  Used in conjunction with test_no_idle_hz.
+
+verbose
+		  Enable verbose debugging printing, via printk(). Enabled
+		  by default. This extra information is mostly related to
+		  high-level errors and reports from the main 'torture'
+		  framework.
+
+
+Statistics
+==========
+
+Statistics are printed in the following format::
+
+  spin_lock-torture: Writes:  Total: 93746064  Max/Min: 0/0   Fail: 0
+     (A)		    (B)		   (C)		  (D)	       (E)
+
+  (A): Lock type that is being tortured -- torture_type parameter.
+
+  (B): Number of writer lock acquisitions. If dealing with a read/write
+       primitive a second "Reads" statistics line is printed.
+
+  (C): Number of times the lock was acquired.
+
+  (D): Max and min number of times threads failed to acquire the lock.
+
+  (E): true/false values if there were errors acquiring the lock. This should
+       -only- be positive if there is a bug in the locking primitive's
+       implementation. Otherwise a lock should never fail (e.g., spin_lock()).
+       Of course, the same applies for (C), above. A dummy example of this is
+       the "lock_busted" type.
+
+Usage
+=====
+
+The following script may be used to torture locks::
+
+	#!/bin/sh
+
+	modprobe locktorture
+	sleep 3600
+	rmmod locktorture
+	dmesg | grep torture:
+
+The output can be manually inspected for the error flag of "!!!".
+One could of course create a more elaborate script that automatically
+checks for such errors.  The "rmmod" command forces a "SUCCESS",
+"FAILURE", or "RCU_HOTPLUG" indication to be printk()ed.  The first
+two are self-explanatory, while the last indicates that while there
+were no locking failures, CPU-hotplug problems were detected.
+
+Also see: Documentation/RCU/torture.txt
diff --git a/Documentation/locking/locktorture.txt b/Documentation/locking/locktorture.txt
deleted file mode 100644
index 6a8df4cd19bf..000000000000
--- a/Documentation/locking/locktorture.txt
+++ /dev/null
@@ -1,145 +0,0 @@
-Kernel Lock Torture Test Operation
-
-CONFIG_LOCK_TORTURE_TEST
-
-The CONFIG LOCK_TORTURE_TEST config option provides a kernel module
-that runs torture tests on core kernel locking primitives. The kernel
-module, 'locktorture', may be built after the fact on the running
-kernel to be tested, if desired. The tests periodically output status
-messages via printk(), which can be examined via the dmesg (perhaps
-grepping for "torture").  The test is started when the module is loaded,
-and stops when the module is unloaded. This program is based on how RCU
-is tortured, via rcutorture.
-
-This torture test consists of creating a number of kernel threads which
-acquire the lock and hold it for specific amount of time, thus simulating
-different critical region behaviors. The amount of contention on the lock
-can be simulated by either enlarging this critical region hold time and/or
-creating more kthreads.
-
-
-MODULE PARAMETERS
-
-This module has the following parameters:
-
-
-	    ** Locktorture-specific **
-
-nwriters_stress   Number of kernel threads that will stress exclusive lock
-		  ownership (writers). The default value is twice the number
-		  of online CPUs.
-
-nreaders_stress   Number of kernel threads that will stress shared lock
-		  ownership (readers). The default is the same amount of writer
-		  locks. If the user did not specify nwriters_stress, then
-		  both readers and writers be the amount of online CPUs.
-
-torture_type	  Type of lock to torture. By default, only spinlocks will
-		  be tortured. This module can torture the following locks,
-		  with string values as follows:
-
-		     o "lock_busted": Simulates a buggy lock implementation.
-
-		     o "spin_lock": spin_lock() and spin_unlock() pairs.
-
-		     o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq()
-					pairs.
-
-		     o "rw_lock": read/write lock() and unlock() rwlock pairs.
-
-		     o "rw_lock_irq": read/write lock_irq() and unlock_irq()
-				      rwlock pairs.
-
-		     o "mutex_lock": mutex_lock() and mutex_unlock() pairs.
-
-		     o "rtmutex_lock": rtmutex_lock() and rtmutex_unlock()
-				       pairs. Kernel must have CONFIG_RT_MUTEX=y.
-
-		     o "rwsem_lock": read/write down() and up() semaphore pairs.
-
-
-	    ** Torture-framework (RCU + locking) **
-
-shutdown_secs	  The number of seconds to run the test before terminating
-		  the test and powering off the system.  The default is
-		  zero, which disables test termination and system shutdown.
-		  This capability is useful for automated testing.
-
-onoff_interval	  The number of seconds between each attempt to execute a
-		  randomly selected CPU-hotplug operation.  Defaults
-		  to zero, which disables CPU hotplugging.  In
-		  CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently
-		  refuse to do any CPU-hotplug operations regardless of
-		  what value is specified for onoff_interval.
-
-onoff_holdoff	  The number of seconds to wait until starting CPU-hotplug
-		  operations.  This would normally only be used when
-		  locktorture was built into the kernel and started
-		  automatically at boot time, in which case it is useful
-		  in order to avoid confusing boot-time code with CPUs
-		  coming and going. This parameter is only useful if
-		  CONFIG_HOTPLUG_CPU is enabled.
-
-stat_interval	  Number of seconds between statistics-related printk()s.
-		  By default, locktorture will report stats every 60 seconds.
-		  Setting the interval to zero causes the statistics to
-		  be printed -only- when the module is unloaded, and this
-		  is the default.
-
-stutter		  The length of time to run the test before pausing for this
-		  same period of time.  Defaults to "stutter=5", so as
-		  to run and pause for (roughly) five-second intervals.
-		  Specifying "stutter=0" causes the test to run continuously
-		  without pausing, which is the old default behavior.
-
-shuffle_interval  The number of seconds to keep the test threads affinitied
-		  to a particular subset of the CPUs, defaults to 3 seconds.
-		  Used in conjunction with test_no_idle_hz.
-
-verbose		  Enable verbose debugging printing, via printk(). Enabled
-		  by default. This extra information is mostly related to
-		  high-level errors and reports from the main 'torture'
-		  framework.
-
-
-STATISTICS
-
-Statistics are printed in the following format:
-
-spin_lock-torture: Writes:  Total: 93746064  Max/Min: 0/0   Fail: 0
-   (A)		    (B)		   (C)		  (D)	       (E)
-
-(A): Lock type that is being tortured -- torture_type parameter.
-
-(B): Number of writer lock acquisitions. If dealing with a read/write primitive
-     a second "Reads" statistics line is printed.
-
-(C): Number of times the lock was acquired.
-
-(D): Min and max number of times threads failed to acquire the lock.
-
-(E): true/false values if there were errors acquiring the lock. This should
-     -only- be positive if there is a bug in the locking primitive's
-     implementation. Otherwise a lock should never fail (i.e., spin_lock()).
-     Of course, the same applies for (C), above. A dummy example of this is
-     the "lock_busted" type.
-
-USAGE
-
-The following script may be used to torture locks:
-
-	#!/bin/sh
-
-	modprobe locktorture
-	sleep 3600
-	rmmod locktorture
-	dmesg | grep torture:
-
-The output can be manually inspected for the error flag of "!!!".
-One could of course create a more elaborate script that automatically
-checked for such errors.  The "rmmod" command forces a "SUCCESS",
-"FAILURE", or "RCU_HOTPLUG" indication to be printk()ed.  The first
-two are self-explanatory, while the last indicates that while there
-were no locking failures, CPU-hotplug problems were detected.
-
-Also see: Documentation/RCU/torture.txt
diff --git a/Documentation/locking/mutex-design.rst b/Documentation/locking/mutex-design.rst
new file mode 100644
index 000000000000..4d8236b81fa5
--- /dev/null
+++ b/Documentation/locking/mutex-design.rst
@@ -0,0 +1,152 @@
+=======================
+Generic Mutex Subsystem
+=======================
+
+started by Ingo Molnar <mingo@redhat.com>
+
+updated by Davidlohr Bueso <davidlohr@hp.com>
+
+What are mutexes?
+-----------------
+
+In the Linux kernel, mutexes refer to a particular locking primitive
+that enforces serialization on shared memory systems, and not only to
+the generic term referring to 'mutual exclusion' found in academia
+or similar theoretical text books. Mutexes are sleeping locks which
+behave similarly to binary semaphores, and were introduced in 2006[1]
+as an alternative to these. This new data structure provided a number
+of advantages, including simpler interfaces, and at that time smaller
+code (see Disadvantages).
+
+[1] http://lwn.net/Articles/164802/
+
+Implementation
+--------------
+
+Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
+and implemented in kernel/locking/mutex.c. These locks use an atomic variable
+(->owner) to keep track of the lock state during its lifetime.  The owner field
+actually contains a `struct task_struct *` to the current lock owner and it is
+therefore NULL if not currently owned. Since task_struct pointers are aligned
+to at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
+if the waiter list is non-empty).  In its most basic form it also includes a
+wait-queue and a spinlock that serializes access to it. Furthermore,
+CONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described
+below in (ii).
+
+When acquiring a mutex, there are three possible paths that can be
+taken, depending on the state of the lock:
+
+(i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with
+    the current task. This only works in the uncontended case (cmpxchg() checks
+    against 0UL, so all 3 state bits above have to be 0). If the lock is
+    contended it goes to the next possible path.
+
+(ii) midpath: aka optimistic spinning, tries to spin for acquisition
+     while the lock owner is running and there are no other tasks ready
+     to run that have higher priority (need_resched). The rationale is
+     that if the lock owner is running, it is likely to release the lock
+     soon. The mutex spinners are queued up using an MCS lock so that only
+     one spinner can compete for the mutex.
+
+     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
+     with the desirable properties of being fair and with each cpu trying
+     to acquire the lock spinning on a local variable. It avoids expensive
+     cacheline bouncing that common test-and-set spinlock implementations
+     incur. An MCS-like lock is specially tailored for optimistic spinning
+     in sleeping lock implementations. An important feature of the customized
+     MCS lock is that it has the extra property that spinners are able to exit
+     the MCS spinlock queue when they need to reschedule. This further helps
+     avoid situations where MCS spinners that need to reschedule would continue
+     waiting to spin on mutex owner, only to go directly to slowpath upon
+     obtaining the MCS lock.
+
+
+(iii) slowpath: last resort, if the lock is still unable to be acquired,
+      the task is added to the wait-queue and sleeps until woken up by the
+      unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
+
+While formally kernel mutexes are sleepable locks, it is path (ii) that
+makes them more practically a hybrid type. By simply not interrupting a
+task and busy-waiting for a few cycles instead of immediately sleeping,
+the performance of this lock has been seen to significantly improve a
+number of workloads. Note that this technique is also used for rw-semaphores.
+
+Semantics
+---------
+
+The mutex subsystem checks and enforces the following rules:
+
+    - Only one task can hold the mutex at a time.
+    - Only the owner can unlock the mutex.
+    - Multiple unlocks are not permitted.
+    - Recursive locking/unlocking is not permitted.
+    - A mutex must only be initialized via the API (see below).
+    - A task may not exit with a mutex held.
+    - Memory areas where held locks reside must not be freed.
+    - Held mutexes must not be reinitialized.
+    - Mutexes may not be used in hardware or software interrupt
+      contexts such as tasklets and timers.
+
+These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled.
+In addition, the mutex debugging code also implements a number of other
+features that make lock debugging easier and faster:
+
+    - Uses symbolic names of mutexes, whenever they are printed
+      in debug output.
+    - Point-of-acquire tracking, symbolic lookup of function names,
+      list of all locks held in the system, printout of them.
+    - Owner tracking.
+    - Detects self-recursing locks and prints out all relevant info.
+    - Detects multi-task circular deadlocks and prints out all affected
+      locks and tasks (and only those tasks).
+
+
+Interfaces
+----------
+Statically define the mutex::
+
+   DEFINE_MUTEX(name);
+
+Dynamically initialize the mutex::
+
+   mutex_init(mutex);
+
+Acquire the mutex, uninterruptible::
+
+   void mutex_lock(struct mutex *lock);
+   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
+   int  mutex_trylock(struct mutex *lock);
+
+Acquire the mutex, interruptible::
+
+   int mutex_lock_interruptible_nested(struct mutex *lock,
+				       unsigned int subclass);
+   int mutex_lock_interruptible(struct mutex *lock);
+
+Acquire the mutex, uninterruptible, if the counter decrements to 0::
+
+   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+
+Unlock the mutex::
+
+   void mutex_unlock(struct mutex *lock);
+
+Test if the mutex is taken::
+
+   int mutex_is_locked(struct mutex *lock);
+
+Disadvantages
+-------------
+
+Unlike its original design and purpose, 'struct mutex' is among the largest
+locks in the kernel. E.g., on x86-64 it is 32 bytes, whereas 'struct semaphore'
+is 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean a larger
+CPU cache and memory footprint.
+
+When to use mutexes
+-------------------
+
+Unless the strict semantics of mutexes are unsuitable and/or the critical
+region prevents the lock from being shared, always prefer them to any other
+locking primitive.
diff --git a/Documentation/locking/mutex-design.txt b/Documentation/locking/mutex-design.txt
deleted file mode 100644
index 818aca19612f..000000000000
--- a/Documentation/locking/mutex-design.txt
+++ /dev/null
@@ -1,142 +0,0 @@
-Generic Mutex Subsystem
-
-started by Ingo Molnar <mingo@redhat.com>
-updated by Davidlohr Bueso <davidlohr@hp.com>
-
-What are mutexes?
------------------
-
-In the Linux kernel, mutexes refer to a particular locking primitive
-that enforces serialization on shared memory systems, and not only to
-the generic term referring to 'mutual exclusion' found in academia
-or similar theoretical text books. Mutexes are sleeping locks which
-behave similarly to binary semaphores, and were introduced in 2006[1]
-as an alternative to these. This new data structure provided a number
-of advantages, including simpler interfaces, and at that time smaller
-code (see Disadvantages).
-
-[1] http://lwn.net/Articles/164802/
-
-Implementation
---------------
-
-Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
-and implemented in kernel/locking/mutex.c. These locks use an atomic variable
-(->owner) to keep track of the lock state during its lifetime.  Field owner
-actually contains 'struct task_struct *' to the current lock owner and it is
-therefore NULL if not currently owned. Since task_struct pointers are aligned
-at at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
-if waiter list is non-empty).  In its most basic form it also includes a
-wait-queue and a spinlock that serializes access to it. Furthermore,
-CONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described
-below in (ii).
-
-When acquiring a mutex, there are three possible paths that can be
-taken, depending on the state of the lock:
-
-(i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with
-    the current task. This only works in the uncontended case (cmpxchg() checks
-    against 0UL, so all 3 state bits above have to be 0). If the lock is
-    contended it goes to the next possible path.
-
-(ii) midpath: aka optimistic spinning, tries to spin for acquisition
-     while the lock owner is running and there are no other tasks ready
-     to run that have higher priority (need_resched). The rationale is
-     that if the lock owner is running, it is likely to release the lock
-     soon. The mutex spinners are queued up using MCS lock so that only
-     one spinner can compete for the mutex.
-
-     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
-     with the desirable properties of being fair and with each cpu trying
-     to acquire the lock spinning on a local variable. It avoids expensive
-     cacheline bouncing that common test-and-set spinlock implementations
-     incur. An MCS-like lock is specially tailored for optimistic spinning
-     for sleeping lock implementation. An important feature of the customized
-     MCS lock is that it has the extra property that spinners are able to exit
-     the MCS spinlock queue when they need to reschedule. This further helps
-     avoid situations where MCS spinners that need to reschedule would continue
-     waiting to spin on mutex owner, only to go directly to slowpath upon
-     obtaining the MCS lock.
-
-
-(iii) slowpath: last resort, if the lock is still unable to be acquired,
-      the task is added to the wait-queue and sleeps until woken up by the
-      unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.
-
-While formally kernel mutexes are sleepable locks, it is path (ii) that
-makes them more practically a hybrid type. By simply not interrupting a
-task and busy-waiting for a few cycles instead of immediately sleeping,
-the performance of this lock has been seen to significantly improve a
-number of workloads. Note that this technique is also used for rw-semaphores.
-
-Semantics
----------
-
-The mutex subsystem checks and enforces the following rules:
-
-    - Only one task can hold the mutex at a time.
-    - Only the owner can unlock the mutex.
-    - Multiple unlocks are not permitted.
-    - Recursive locking/unlocking is not permitted.
-    - A mutex must only be initialized via the API (see below).
-    - A task may not exit with a mutex held.
-    - Memory areas where held locks reside must not be freed.
-    - Held mutexes must not be reinitialized.
-    - Mutexes may not be used in hardware or software interrupt
-      contexts such as tasklets and timers.
-
-These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled.
-In addition, the mutex debugging code also implements a number of other
-features that make lock debugging easier and faster:
-
-    - Uses symbolic names of mutexes, whenever they are printed
-      in debug output.
-    - Point-of-acquire tracking, symbolic lookup of function names,
-      list of all locks held in the system, printout of them.
-    - Owner tracking.
-    - Detects self-recursing locks and prints out all relevant info.
-    - Detects multi-task circular deadlocks and prints out all affected
-      locks and tasks (and only those tasks).
-
-
-Interfaces
-----------
-Statically define the mutex:
-   DEFINE_MUTEX(name);
-
-Dynamically initialize the mutex:
-   mutex_init(mutex);
-
-Acquire the mutex, uninterruptible:
-   void mutex_lock(struct mutex *lock);
-   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
-   int  mutex_trylock(struct mutex *lock);
-
-Acquire the mutex, interruptible:
-   int mutex_lock_interruptible_nested(struct mutex *lock,
-				       unsigned int subclass);
-   int mutex_lock_interruptible(struct mutex *lock);
-
-Acquire the mutex, interruptible, if dec to 0:
-   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
-
-Unlock the mutex:
-   void mutex_unlock(struct mutex *lock);
-
-Test if the mutex is taken:
-   int mutex_is_locked(struct mutex *lock);
-
-Disadvantages
--------------
-
-Unlike its original design and purpose, 'struct mutex' is among the largest
-locks in the kernel. E.g: on x86-64 it is 32 bytes, where 'struct semaphore'
-is 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean more CPU
-cache and memory footprint.
-
-When to use mutexes
--------------------
-
-Unless the strict semantics of mutexes are unsuitable and/or the critical
-region prevents the lock from being shared, always prefer them to any other
-locking primitive.
diff --git a/Documentation/locking/rt-mutex-design.rst b/Documentation/locking/rt-mutex-design.rst
new file mode 100644
index 000000000000..59c2a64efb21
--- /dev/null
+++ b/Documentation/locking/rt-mutex-design.rst
@@ -0,0 +1,574 @@
+==============================
+RT-mutex implementation design
+==============================
+
+Copyright (c) 2006 Steven Rostedt
+
+Licensed under the GNU Free Documentation License, Version 1.2
+
+
+This document tries to describe the design of the rtmutex.c implementation.
+It doesn't describe the reasons why rtmutex.c exists. For that please see
+Documentation/locking/rt-mutex.rst.  This document does explain problems
+that happen without this code, but only to provide the concepts needed to
+understand what the code actually is doing.
+
+The goal of this document is to help others understand the priority
+inheritance (PI) algorithm that is used, as well as reasons for the
+decisions that were made to implement PI in the manner that was done.
+
+
+Unbounded Priority Inversion
+----------------------------
+
+Priority inversion is when a lower priority process executes while a higher
+priority process wants to run.  This happens for several reasons, and
+most of the time it can't be helped.  Anytime a high priority process wants
+to use a resource that a lower priority process has (a mutex for example),
+the high priority process must wait until the lower priority process is done
+with the resource.  This is a priority inversion.  What we want to prevent
+is something called unbounded priority inversion.  That is when the high
+priority process is prevented from running by a lower priority process for
+an undetermined amount of time.
+
+The classic example of unbounded priority inversion is where you have three
+processes, let's call them processes A, B, and C, where A is the highest
+priority process, C is the lowest, and B is in between. A tries to grab a lock
+that C owns and must wait and lets C run to release the lock. But in the
+meantime, B executes, and since B is of a higher priority than C, it preempts C,
+but by doing so, it is in fact preempting A which is a higher priority process.
+Now there's no way of knowing how long A will be sleeping waiting for C
+to release the lock, because for all we know, B is a CPU hog and will
+never give C a chance to release the lock.  This is called unbounded priority
+inversion.
+
+Here's a little ASCII art to show the problem::
+
+     grab lock L1 (owned by C)
+       |
+  A ---+
+          C preempted by B
+            |
+  C    +----+
+
+  B         +-------->
+                  B now keeps A from running.
+
+
+Priority Inheritance (PI)
+-------------------------
+
+There are several ways to solve this issue, but other ways are out of scope
+for this document.  Here we only discuss PI.
+
+PI is where a process inherits the priority of another process if the other
+process blocks on a lock owned by the current process.  To make this easier
+to understand, let's use the previous example, with processes A, B, and C again.
+
+This time, when A blocks on the lock owned by C, C would inherit the priority
+of A.  So now if B becomes runnable, it would not preempt C, since C now has
+the high priority of A.  As soon as C releases the lock, it loses its
+inherited priority, and A then can continue with the resource that C had.
+
+Terminology
+-----------
+
+Here I explain some terminology that is used in this document to help describe
+the design that is used to implement PI.
+
+PI chain
+         - The PI chain is an ordered series of locks and processes that cause
+           processes to inherit priorities from a previous process that is
+           blocked on one of its locks.  This is described in more detail
+           later in this document.
+
+mutex
+         - In this document, to differentiate between locks that implement
+           PI and spin locks that are used in the PI code, from now on
+           the PI locks will be called mutexes.
+
+lock
+         - In this document from now on, I will use the term lock when
+           referring to spin locks that are used to protect parts of the PI
+           algorithm.  These locks disable preemption for UP (when
+           CONFIG_PREEMPT is enabled) and on SMP prevent multiple CPUs from
+           entering critical sections simultaneously.
+
+spin lock
+         - Same as lock above.
+
+waiter
+         - A waiter is a struct that is stored on the stack of a blocked
+           process.  Since the scope of the waiter is within the code for
+           a process being blocked on the mutex, it is fine to allocate
+           the waiter on the process's stack (local variable).  This
+           structure holds a pointer to the task, as well as the mutex that
+           the task is blocked on.  It also has rbtree node structures to
+           place the task in the waiters rbtree of a mutex as well as the
+           pi_waiters rbtree of a mutex owner task (described below).
+
+           waiter is sometimes used in reference to the task that is waiting
+           on a mutex. This is the same as waiter->task.
+
+waiters
+         - A list of processes that are blocked on a mutex.
+
+top waiter
+         - The highest priority process waiting on a specific mutex.
+
+top pi waiter
+              - The highest priority process waiting on one of the mutexes
+                that a specific process owns.
+
+Note:
+       task and process are used interchangeably in this document, mostly to
+       differentiate between two processes that are being described together.
+
+
+PI chain
+--------
+
+The PI chain is a list of processes and mutexes that may cause priority
+inheritance to take place.  Multiple chains may converge, but a chain
+would never diverge, since a process can't be blocked on more than one
+mutex at a time.
+
+Example::
+
+   Process:  A, B, C, D, E
+   Mutexes:  L1, L2, L3, L4
+
+   A owns: L1
+           B blocked on L1
+           B owns L2
+                  C blocked on L2
+                  C owns L3
+                         D blocked on L3
+                         D owns L4
+                                E blocked on L4
+
+The chain would be::
+
+   E->L4->D->L3->C->L2->B->L1->A
+
+To show where two chains merge, we could add another process F and
+another mutex L5 where B owns L5 and F is blocked on mutex L5.
+
+The chain for F would be::
+
+   F->L5->B->L1->A
+
+Since a process may own more than one mutex, but never be blocked on more than
+one, the chains merge.
+
+Here we show both chains::
+
+   E->L4->D->L3->C->L2-+
+                       |
+                       +->B->L1->A
+                       |
+                 F->L5-+
+
+For PI to work, the processes at the right end of these chains (or we may
+also call it the Top of the chain) must be equal to or higher in priority
+than the processes to the left or below in the chain.
+
+Also since a mutex may have more than one process blocked on it, we can
+have multiple chains merge at mutexes.  If we add another process G that is
+blocked on mutex L2::
+
+  G->L2->B->L1->A
+
+And once again, to show how this can grow I will show the merging chains
+again::
+
+   E->L4->D->L3->C-+
+                   +->L2-+
+                   |     |
+                 G-+     +->B->L1->A
+                         |
+                   F->L5-+
+
+If process G has the highest priority in the chain, then all the tasks up
+the chain (A and B in this example), must have their priorities increased
+to that of G.
+
+Mutex Waiters Tree
+------------------
+
+Every mutex keeps track of all the waiters that are blocked on itself. The
+mutex has an rbtree to store these waiters by priority.  This tree is protected
+by a spin lock that is located in the struct of the mutex. This lock is called
+wait_lock.
+
+
+Task PI Tree
+------------
+
+To keep track of the PI chains, each process has its own PI rbtree.  This is
+a tree of all top waiters of the mutexes that are owned by the process.
+Note that this tree only holds the top waiters and not all waiters that are
+blocked on mutexes owned by the process.
+
+The top of the task's PI tree is always the highest priority task that
+is waiting on a mutex that is owned by the task.  So if the task has
+inherited a priority, it will always be the priority of the task that is
+at the top of this tree.
+
+This tree is stored in the task structure of a process as an rbtree called
+pi_waiters.  It is protected by a spin lock also in the task structure,
+called pi_lock.  This lock may also be taken in interrupt context, so when
+locking the pi_lock, interrupts must be disabled.
+
+
+Depth of the PI Chain
+---------------------
+
+The maximum depth of the PI chain is not dynamic, and could actually be
+defined.  But it is very complex to figure out, since it depends on all
+the nesting of mutexes.  Let's look at the example where we have 3 mutexes,
+L1, L2, and L3, and four separate functions func1, func2, func3 and func4.
+The following shows a locking order of L1->L2->L3, but may not actually
+be directly nested that way::
+
+  void func1(void)
+  {
+	mutex_lock(L1);
+
+	/* do anything */
+
+	mutex_unlock(L1);
+  }
+
+  void func2(void)
+  {
+	mutex_lock(L1);
+	mutex_lock(L2);
+
+	/* do something */
+
+	mutex_unlock(L2);
+	mutex_unlock(L1);
+  }
+
+  void func3(void)
+  {
+	mutex_lock(L2);
+	mutex_lock(L3);
+
+	/* do something else */
+
+	mutex_unlock(L3);
+	mutex_unlock(L2);
+  }
+
+  void func4(void)
+  {
+	mutex_lock(L3);
+
+	/* do something again */
+
+	mutex_unlock(L3);
+  }
+
+Now we add 4 processes that run each of these functions separately.
+Processes A, B, C, and D which run functions func1, func2, func3 and func4
+respectively, and such that D runs first and A last.  With D being preempted
+in func4 in the "do something again" area, we have the following locking::
+
+  D owns L3
+         C blocked on L3
+         C owns L2
+                B blocked on L2
+                B owns L1
+                       A blocked on L1
+
+And thus we have the chain A->L1->B->L2->C->L3->D.
+
+This gives us a PI depth of 4 (four processes), but looking at any of the
+functions individually, it seems as though they only have at most a locking
+depth of two.  So, although the locking depth is defined at compile time,
+it is still very difficult to determine the maximum possible depth.
+
+Now since mutexes can be defined by user-land applications, we don't want a DoS
+(denial of service) application that nests a large number of mutexes to create
+a large PI chain, and have the code holding spin locks while looking at a large
+amount of data.  So to prevent this, the implementation not only implements
+a maximum lock depth, but also only holds at most two different locks at a
+time, as it walks the PI chain.  More about this below.
+
+
+Mutex owner and flags
+---------------------
+
+The mutex structure contains a pointer to the owner of the mutex.  If the
+mutex is not owned, this owner is set to NULL.  Since all architectures
+have the task structure on at least a two-byte alignment (and if this is
+not true, the rtmutex.c code will be broken!), this allows for the least
+significant bit to be used as a flag.  Bit 0 is used as the "Has Waiters"
+flag. It's set whenever there are waiters on a mutex.
+
+See Documentation/locking/rt-mutex.rst for further details.
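+
+A minimal sketch of how a flag can share a word with an aligned pointer is
+shown below.  The macro and helper names are made up for illustration and
+need not match those used in rtmutex.c::
+
+  #include <stdint.h>
+
+  #define SKETCH_HAS_WAITERS	((uintptr_t)1)
+
+  struct sketch_task;
+
+  /* Strip the flag bit to recover the owner pointer. */
+  static inline struct sketch_task *sketch_owner(uintptr_t owner_word)
+  {
+	return (struct sketch_task *)(owner_word & ~SKETCH_HAS_WAITERS);
+  }
+
+  /* Test the "Has Waiters" flag stored in bit 0. */
+  static inline int sketch_has_waiters(uintptr_t owner_word)
+  {
+	return (owner_word & SKETCH_HAS_WAITERS) != 0;
+  }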
+
+cmpxchg Tricks
+--------------
+
+Some architectures implement an atomic cmpxchg (Compare and Exchange).  This
+is used (when applicable) to keep the fast path of grabbing and releasing
+mutexes short.
+
+cmpxchg is basically the following function performed atomically::
+
+  unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C)
+  {
+	unsigned long T = *A;
+	if (*A == *B) {
+		*A = *C;
+	}
+	return T;
+  }
+  #define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c)
+
+This is really nice to have, since it allows you to only update a variable
+if the variable is what you expect it to be.  You know if it succeeded if
+the return value (the old value of A) is equal to B.
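+
+For comparison, user-space C11 offers a similar primitive,
+atomic_compare_exchange_strong() from <stdatomic.h>.  The sketch below is
+not kernel code; it uses that primitive to claim an unowned word, and the
+names sketch_try_lock and owner_word are hypothetical.  Unlike the function
+above, the C11 call returns true or false rather than the old value::
+
+  #include <stdatomic.h>
+  #include <stdbool.h>
+  #include <stdint.h>
+
+  /* Set *owner_word to "me" only if it is currently 0 (unowned). */
+  static bool sketch_try_lock(_Atomic uintptr_t *owner_word, uintptr_t me)
+  {
+	uintptr_t expected = 0;
+
+	return atomic_compare_exchange_strong(owner_word, &expected, me);
+  }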
+
+The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If
+the architecture does not support CMPXCHG, then this macro is simply set
+to fail every time.  But if CMPXCHG is supported, then it helps
+considerably to keep the fast path short.
+
+The use of rt_mutex_cmpxchg with the flags in the owner field helps optimize
+the system for architectures that support it.  This will also be explained
+later in this document.
+
+
+Priority adjustments
+--------------------
+
+The implementation of the PI code in rtmutex.c has several places where a
+process must adjust its priority.  With the help of the pi_waiters of a
+process, it is rather easy to know what needs to be adjusted.
+
+The functions implementing the task adjustments are rt_mutex_adjust_prio
+and rt_mutex_setprio. rt_mutex_setprio is only used in rt_mutex_adjust_prio.
+
+rt_mutex_adjust_prio examines the priority of the task, and the highest
+priority process that is waiting on any of the mutexes owned by the task. Since
+the pi_waiters of a task holds an order by priority of all the top waiters
+of all the mutexes that the task owns, we simply need to compare the top
+pi waiter to its own normal/deadline priority and take the higher one.
+Then rt_mutex_setprio is called to adjust the priority of the task to the
+new priority. Note that rt_mutex_setprio is defined in kernel/sched/core.c
+to implement the actual change in priority.
+
+Note:
+	For the "prio" field in task_struct, the lower the number, the
+	higher the priority. A "prio" of 5 is of higher priority than a
+	"prio" of 10.
+
+It is interesting to note that rt_mutex_adjust_prio can either increase
+or decrease the priority of the task.  In the case that a higher priority
+process has just blocked on a mutex owned by the task, rt_mutex_adjust_prio
+would increase/boost the task's priority.  But if a higher priority task
+were for some reason to leave the mutex (timeout or signal), this same function
+would decrease/unboost the priority of the task.  That is because the pi_waiters
+always contains the highest priority task that is waiting on a mutex owned
+by the task, so we only need to compare the priority of that top pi waiter
+to the normal priority of the given task.
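+
+The comparison can be sketched as below.  This assumes the convention from
+the note above (a lower "prio" number is a higher priority) and ignores
+deadline tasks.  The function and parameter names are hypothetical::
+
+  /*
+   * Return the priority a task should run at: its own normal
+   * priority, or that of its top pi waiter if that is higher.
+   * has_pi_waiter is zero when the pi_waiters tree is empty.
+   */
+  static int sketch_effective_prio(int normal_prio, int has_pi_waiter,
+				   int top_pi_waiter_prio)
+  {
+	if (has_pi_waiter && top_pi_waiter_prio < normal_prio)
+		return top_pi_waiter_prio;
+	return normal_prio;
+  }
+
+With a normal priority of 10 and a top pi waiter of priority 5, this
+yields 5 (boost).  Once that waiter leaves and the tree is empty, it yields
+10 again (unboost).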
+
+
+High level overview of the PI chain walk
+----------------------------------------
+
+The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain.
+
+The implementation has gone through several iterations, and has ended up
+with what we believe is the best.  It walks the PI chain by only grabbing
+at most two locks at a time, and is very efficient.
+
+The rt_mutex_adjust_prio_chain can be used either to boost or lower process
+priorities.
+
+rt_mutex_adjust_prio_chain is called with a task to be checked for PI
+(de)boosting (the owner of a mutex that a process is blocking on), a flag to
+check for deadlocking, the mutex that the task owns, a pointer to a waiter
+that is the process's waiter struct that is blocked on the mutex (although this
+parameter may be NULL for deboosting), a pointer to the mutex on which the task
+is blocked, and a top_task as the top waiter of the mutex.
+
+For this explanation, I will not mention deadlock detection. This explanation
+will try to stay at a high level.
+
+When this function is called, there are no locks held.  That also means
+that the state of the owner and lock can change once this function is entered.
+
+Before this function is called, the task has already had rt_mutex_adjust_prio
+performed on it.  This means that the task is set to the priority that it
+should be at, but the rbtree nodes of the task's waiter have not been updated
+with the new priorities, and this task may not be in the proper locations
+in the pi_waiters and waiters trees that the task is blocked on. This function
+solves all that.
+
+The main operation of this function is summarized by Thomas Gleixner in
+rtmutex.c. See the 'Chain walk basics and protection scope' comment for further
+details.
+
+Taking of a mutex (The walk through)
+------------------------------------
+
+OK, now let's take a look at the detailed walk through of what happens when
+taking a mutex.
+
+The first thing that is tried is the fast taking of the mutex.  This is
+done when we have CMPXCHG enabled (otherwise the fast taking automatically
+fails).  Only when the owner field of the mutex is NULL can the lock be
+taken with the CMPXCHG and nothing else needs to be done.
+
+If there is contention on the lock, we go about the slow path
+(rt_mutex_slowlock).
+
+The slow path function is where the task's waiter structure is created on
+the stack.  This is because the waiter structure is only needed for the
+scope of this function.  The waiter structure holds the nodes to store
+the task on the waiters tree of the mutex, and if need be, the pi_waiters
+tree of the owner.
+
+The wait_lock of the mutex is taken since the slow path of unlocking the
+mutex also takes this lock.
+
+We then call try_to_take_rt_mutex.  This is where the architecture that
+does not implement CMPXCHG would always grab the lock (if there's no
+contention).
+
+try_to_take_rt_mutex is used every time the task tries to grab a mutex in the
+slow path.  The first thing that is done here is an atomic setting of
+the "Has Waiters" flag of the mutex's owner field. By setting this flag
+now, the current owner of the mutex being contended for can't release the mutex
+without going into the slow unlock path, and it would then need to grab the
+wait_lock, which this code currently holds. So setting the "Has Waiters" flag
+forces the current owner to synchronize with this code.
+
+The lock is taken if the following are true:
+
+   1) The lock has no owner
+   2) The current task is the highest priority against all other
+      waiters of the lock
+
+If the task succeeds in acquiring the lock, then the task is set as the
+owner of the lock, and if the lock still has waiters, the top_waiter
+(highest priority task waiting on the lock) is added to this task's
+pi_waiters tree.
+
+If the lock is not taken by try_to_take_rt_mutex(), then the
+task_blocks_on_rt_mutex() function is called. This will add the task to
+the lock's waiter tree and propagate the pi chain of the lock as well
+as the lock's owner's pi_waiters tree. This is described in the next
+section.
+
+Task blocks on mutex
+--------------------
+
+The accounting of a mutex and process is done with the waiter structure of
+the process.  The "task" field is set to the process, and the "lock" field
+to the mutex.  The rbtree nodes of the waiter are initialized to the process's
+current priority.
+
+Since the wait_lock was taken at the entry of the slow lock, we can safely
+add the waiter to the mutex's waiters tree.  If the current process is the
+highest priority process currently waiting on this mutex, then we remove the
+previous top waiter process (if it exists) from the pi_waiters of the owner,
+and add the current process to that tree.  Since the pi_waiter of the owner
+has changed, we call rt_mutex_adjust_prio on the owner to see if the owner
+should adjust its priority accordingly.
+
+If the owner is also blocked on a lock, and had its pi_waiters changed
+(or deadlock checking is on), we unlock the wait_lock of the mutex and go ahead
+and run rt_mutex_adjust_prio_chain on the owner, as described earlier.
+
+Now all locks are released, and if the current process is still blocked on a
+mutex (waiter "task" field is not NULL), then we go to sleep (call schedule).
+
+Waking up in the loop
+---------------------
+
+The task can then wake up for a couple of reasons:
+
+   1) The previous lock owner released the lock, and the task now is top_waiter
+   2) The task received a signal or timeout
+
+In both cases, the task will try again to acquire the lock. If it
+does, then it will take itself off the waiters tree and set itself back
+to the TASK_RUNNING state.
+
+In the first case, if the lock was acquired by another task before this task
+could get the lock, then it will go back to sleep and wait to be woken again.
+
+The second case is only applicable for tasks that are grabbing a mutex
+that can wake up before getting the lock, either due to a signal or
+a timeout (i.e. rt_mutex_timed_futex_lock()). When woken, it will try to
+take the lock again.  If it succeeds, then the task will return with the
+lock held, otherwise it will return with -EINTR if the task was woken
+by a signal, or -ETIMEDOUT if it timed out.
+
+
+Unlocking the Mutex
+-------------------
+
+The unlocking of a mutex also has a fast path for those architectures with
+CMPXCHG.  Since the taking of a mutex on contention always sets the
+"Has Waiters" flag of the mutex's owner, we use this to know if we need to
+take the slow path when unlocking the mutex.  If the mutex doesn't have any
+waiters, the owner field of the mutex would equal the current process and
+the mutex can be unlocked by just replacing the owner field with NULL.
+
+If the owner field has the "Has Waiters" bit set (or CMPXCHG is not available),
+the slow unlock path is taken.
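+
+The fast unlock can be pictured with the user-space C11 sketch below.  It
+is not kernel code and the names are hypothetical.  The compare-and-exchange
+succeeds only when the owner word is exactly the caller, which implies that
+bit 0 ("Has Waiters") is clear.  A false return means the caller must take
+the slow path::
+
+  #include <stdatomic.h>
+  #include <stdbool.h>
+  #include <stdint.h>
+
+  /* Replace the owner word with 0 (NULL) only if it equals "me". */
+  static bool sketch_fast_unlock(_Atomic uintptr_t *owner_word, uintptr_t me)
+  {
+	uintptr_t expected = me;
+
+	return atomic_compare_exchange_strong(owner_word, &expected, 0);
+  }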
+
+The first thing done in the slow unlock path is to take the wait_lock of the
+mutex.  This synchronizes the locking and unlocking of the mutex.
+
+A check is made to see if the mutex has waiters or not.  On architectures that
+do not have CMPXCHG, this is where the owner of the mutex will
+determine if a waiter needs to be awoken or not.  On architectures that
+do have CMPXCHG, that check is done in the fast path, but it is still needed
+in the slow path too.  If a waiter of a mutex woke up because of a signal
+or timeout between the time the owner failed the fast path CMPXCHG check and
+the grabbing of the wait_lock, the mutex may not have any waiters, thus the
+owner still needs to make this check. If there are no waiters then the mutex
+owner field is set to NULL, the wait_lock is released and nothing more is
+needed.
+
+If there are waiters, then we need to wake one up.
+
+In the wake-up code, the pi_lock of the current owner is taken.  The top
+waiter of the lock is found and removed from the waiters tree of the mutex
+as well as the pi_waiters tree of the current owner. The "Has Waiters" bit is
+marked to prevent lower priority tasks from stealing the lock.
+
+Finally we unlock the pi_lock of the current owner and wake up the top waiter.
+
+
+Contact
+-------
+
+For updates on this document, please email Steven Rostedt <rostedt@goodmis.org>
+
+
+Credits
+-------
+
+Author:  Steven Rostedt <rostedt@goodmis.org>
+
+Updated: Alex Shi <alex.shi@linaro.org>	- 7/6/2017
+
+Original Reviewers:
+		     Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and
+		     Randy Dunlap
+
+Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior
+
+Updates
+-------
+
+This document was originally written for 2.6.17-rc3-mm1 and
+was updated for 4.12.
diff --git a/Documentation/locking/rt-mutex-design.txt b/Documentation/locking/rt-mutex-design.txt
deleted file mode 100644
index 3d7b865539cc..000000000000
--- a/Documentation/locking/rt-mutex-design.txt
+++ /dev/null
@@ -1,559 +0,0 @@
-#
-# Copyright (c) 2006 Steven Rostedt
-# Licensed under the GNU Free Documentation License, Version 1.2
-#
-
-RT-mutex implementation design
-------------------------------
-
-This document tries to describe the design of the rtmutex.c implementation.
-It doesn't describe the reasons why rtmutex.c exists. For that please see
-Documentation/locking/rt-mutex.txt.  Although this document does explain problems
-that happen without this code, but that is in the concept to understand
-what the code actually is doing.
-
-The goal of this document is to help others understand the priority
-inheritance (PI) algorithm that is used, as well as reasons for the
-decisions that were made to implement PI in the manner that was done.
-
-
-Unbounded Priority Inversion
-----------------------------
-
-Priority inversion is when a lower priority process executes while a higher
-priority process wants to run.  This happens for several reasons, and
-most of the time it can't be helped.  Anytime a high priority process wants
-to use a resource that a lower priority process has (a mutex for example),
-the high priority process must wait until the lower priority process is done
-with the resource.  This is a priority inversion.  What we want to prevent
-is something called unbounded priority inversion.  That is when the high
-priority process is prevented from running by a lower priority process for
-an undetermined amount of time.
-
-The classic example of unbounded priority inversion is where you have three
-processes, let's call them processes A, B, and C, where A is the highest
-priority process, C is the lowest, and B is in between. A tries to grab a lock
-that C owns and must wait and lets C run to release the lock. But in the
-meantime, B executes, and since B is of a higher priority than C, it preempts C,
-but by doing so, it is in fact preempting A which is a higher priority process.
-Now there's no way of knowing how long A will be sleeping waiting for C
-to release the lock, because for all we know, B is a CPU hog and will
-never give C a chance to release the lock.  This is called unbounded priority
-inversion.
-
-Here's a little ASCII art to show the problem.
-
-   grab lock L1 (owned by C)
-     |
-A ---+
-        C preempted by B
-          |
-C    +----+
-
-B         +-------->
-                B now keeps A from running.
-
-
-Priority Inheritance (PI)
--------------------------
-
-There are several ways to solve this issue, but other ways are out of scope
-for this document.  Here we only discuss PI.
-
-PI is where a process inherits the priority of another process if the other
-process blocks on a lock owned by the current process.  To make this easier
-to understand, let's use the previous example, with processes A, B, and C again.
-
-This time, when A blocks on the lock owned by C, C would inherit the priority
-of A.  So now if B becomes runnable, it would not preempt C, since C now has
-the high priority of A.  As soon as C releases the lock, it loses its
-inherited priority, and A then can continue with the resource that C had.
-
-Terminology
------------
-
-Here I explain some terminology that is used in this document to help describe
-the design that is used to implement PI.
-
-PI chain - The PI chain is an ordered series of locks and processes that cause
-           processes to inherit priorities from a previous process that is
-           blocked on one of its locks.  This is described in more detail
-           later in this document.
-
-mutex    - In this document, to differentiate from locks that implement
-           PI and spin locks that are used in the PI code, from now on
-           the PI locks will be called a mutex.
-
-lock     - In this document from now on, I will use the term lock when
-           referring to spin locks that are used to protect parts of the PI
-           algorithm.  These locks disable preemption for UP (when
-           CONFIG_PREEMPT is enabled) and on SMP prevents multiple CPUs from
-           entering critical sections simultaneously.
-
-spin lock - Same as lock above.
-
-waiter   - A waiter is a struct that is stored on the stack of a blocked
-           process.  Since the scope of the waiter is within the code for
-           a process being blocked on the mutex, it is fine to allocate
-           the waiter on the process's stack (local variable).  This
-           structure holds a pointer to the task, as well as the mutex that
-           the task is blocked on.  It also has rbtree node structures to
-           place the task in the waiters rbtree of a mutex as well as the
-           pi_waiters rbtree of a mutex owner task (described below).
-
-           waiter is sometimes used in reference to the task that is waiting
-           on a mutex. This is the same as waiter->task.
-
-waiters  - A list of processes that are blocked on a mutex.
-
-top waiter - The highest priority process waiting on a specific mutex.
-
-top pi waiter - The highest priority process waiting on one of the mutexes
-                that a specific process owns.
-
-Note:  task and process are used interchangeably in this document, mostly to
-       differentiate between two processes that are being described together.
-
-
-PI chain
---------
-
-The PI chain is a list of processes and mutexes that may cause priority
-inheritance to take place.  Multiple chains may converge, but a chain
-would never diverge, since a process can't be blocked on more than one
-mutex at a time.
-
-Example:
-
-   Process:  A, B, C, D, E
-   Mutexes:  L1, L2, L3, L4
-
-   A owns: L1
-           B blocked on L1
-           B owns L2
-                  C blocked on L2
-                  C owns L3
-                         D blocked on L3
-                         D owns L4
-                                E blocked on L4
-
-The chain would be:
-
-   E->L4->D->L3->C->L2->B->L1->A
-
-To show where two chains merge, we could add another process F and
-another mutex L5 where B owns L5 and F is blocked on mutex L5.
-
-The chain for F would be:
-
-   F->L5->B->L1->A
-
-Since a process may own more than one mutex, but never be blocked on more than
-one, the chains merge.
-
-Here we show both chains:
-
-   E->L4->D->L3->C->L2-+
-                       |
-                       +->B->L1->A
-                       |
-                 F->L5-+
-
-For PI to work, the processes at the right end of these chains (or we may
-also call it the Top of the chain) must be equal to or higher in priority
-than the processes to the left or below in the chain.
-
-Also since a mutex may have more than one process blocked on it, we can
-have multiple chains merge at mutexes.  If we add another process G that is
-blocked on mutex L2:
-
-  G->L2->B->L1->A
-
-And once again, to show how this can grow I will show the merging chains
-again.
-
-   E->L4->D->L3->C-+
-                   +->L2-+
-                   |     |
-                 G-+     +->B->L1->A
-                         |
-                   F->L5-+
-
-If process G has the highest priority in the chain, then all the tasks up
-the chain (A and B in this example), must have their priorities increased
-to that of G.
-
-Mutex Waiters Tree
------------------
-
-Every mutex keeps track of all the waiters that are blocked on itself. The
-mutex has a rbtree to store these waiters by priority.  This tree is protected
-by a spin lock that is located in the struct of the mutex. This lock is called
-wait_lock.
-
-
-Task PI Tree
-------------
-
-To keep track of the PI chains, each process has its own PI rbtree.  This is
-a tree of all top waiters of the mutexes that are owned by the process.
-Note that this tree only holds the top waiters and not all waiters that are
-blocked on mutexes owned by the process.
-
-The top of the task's PI tree is always the highest priority task that
-is waiting on a mutex that is owned by the task.  So if the task has
-inherited a priority, it will always be the priority of the task that is
-at the top of this tree.
-
-This tree is stored in the task structure of a process as a rbtree called
-pi_waiters.  It is protected by a spin lock also in the task structure,
-called pi_lock.  This lock may also be taken in interrupt context, so when
-locking the pi_lock, interrupts must be disabled.
-
-
-Depth of the PI Chain
----------------------
-
-The maximum depth of the PI chain is not dynamic, and could actually be
-defined.  But is very complex to figure it out, since it depends on all
-the nesting of mutexes.  Let's look at the example where we have 3 mutexes,
-L1, L2, and L3, and four separate functions func1, func2, func3 and func4.
-The following shows a locking order of L1->L2->L3, but may not actually
-be directly nested that way.
-
-void func1(void)
-{
-	mutex_lock(L1);
-
-	/* do anything */
-
-	mutex_unlock(L1);
-}
-
-void func2(void)
-{
-	mutex_lock(L1);
-	mutex_lock(L2);
-
-	/* do something */
-
-	mutex_unlock(L2);
-	mutex_unlock(L1);
-}
-
-void func3(void)
-{
-	mutex_lock(L2);
-	mutex_lock(L3);
-
-	/* do something else */
-
-	mutex_unlock(L3);
-	mutex_unlock(L2);
-}
-
-void func4(void)
-{
-	mutex_lock(L3);
-
-	/* do something again */
-
-	mutex_unlock(L3);
-}
-
-Now we add 4 processes that run each of these functions separately.
-Processes A, B, C, and D which run functions func1, func2, func3 and func4
-respectively, and such that D runs first and A last.  With D being preempted
-in func4 in the "do something again" area, we have a locking that follows:
-
-D owns L3
-       C blocked on L3
-       C owns L2
-              B blocked on L2
-              B owns L1
-                     A blocked on L1
-
-And thus we have the chain A->L1->B->L2->C->L3->D.
-
-This gives us a PI depth of 4 (four processes), but looking at any of the
-functions individually, it seems as though they only have at most a locking
-depth of two.  So, although the locking depth is defined at compile time,
-it still is very difficult to find the possibilities of that depth.
-
-Now since mutexes can be defined by user-land applications, we don't want a DOS
-type of application that nests large amounts of mutexes to create a large
-PI chain, and have the code holding spin locks while looking at a large
-amount of data.  So to prevent this, the implementation not only implements
-a maximum lock depth, but also only holds at most two different locks at a
-time, as it walks the PI chain.  More about this below.
-
-
-Mutex owner and flags
----------------------
-
-The mutex structure contains a pointer to the owner of the mutex.  If the
-mutex is not owned, this owner is set to NULL.  Since all architectures
-have the task structure on at least a two byte alignment (and if this is
-not true, the rtmutex.c code will be broken!), this allows for the least
-significant bit to be used as a flag.  Bit 0 is used as the "Has Waiters"
-flag. It's set whenever there are waiters on a mutex.
-
-See Documentation/locking/rt-mutex.txt for further details.
-
-cmpxchg Tricks
---------------
-
-Some architectures implement an atomic cmpxchg (Compare and Exchange).  This
-is used (when applicable) to keep the fast path of grabbing and releasing
-mutexes short.
-
-cmpxchg is basically the following function performed atomically:
-
-unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C)
-{
-	unsigned long T = *A;
-	if (*A == *B) {
-		*A = *C;
-	}
-	return T;
-}
-#define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c)
-
-This is really nice to have, since it allows you to only update a variable
-if the variable is what you expect it to be.  You know if it succeeded if
-the return value (the old value of A) is equal to B.
-
-The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If
-the architecture does not support CMPXCHG, then this macro is simply set
-to fail every time.  But if CMPXCHG is supported, then this will
-help out extremely to keep the fast path short.
-
-The use of rt_mutex_cmpxchg with the flags in the owner field help optimize
-the system for architectures that support it.  This will also be explained
-later in this document.
-
-
-Priority adjustments
---------------------
-
-The implementation of the PI code in rtmutex.c has several places that a
-process must adjust its priority.  With the help of the pi_waiters of a
-process this is rather easy to know what needs to be adjusted.
-
-The functions implementing the task adjustments are rt_mutex_adjust_prio
-and rt_mutex_setprio. rt_mutex_setprio is only used in rt_mutex_adjust_prio.
-
-rt_mutex_adjust_prio examines the priority of the task, and the highest
-priority process that is waiting any of mutexes owned by the task. Since
-the pi_waiters of a task holds an order by priority of all the top waiters
-of all the mutexes that the task owns, we simply need to compare the top
-pi waiter to its own normal/deadline priority and take the higher one.
-Then rt_mutex_setprio is called to adjust the priority of the task to the
-new priority. Note that rt_mutex_setprio is defined in kernel/sched/core.c
-to implement the actual change in priority.
-
-(Note:  For the "prio" field in task_struct, the lower the number, the
-	higher the priority. A "prio" of 5 is of higher priority than a
-	"prio" of 10.)
-
-It is interesting to note that rt_mutex_adjust_prio can either increase
-or decrease the priority of the task.  In the case that a higher priority
-process has just blocked on a mutex owned by the task, rt_mutex_adjust_prio
-would increase/boost the task's priority.  But if a higher priority task
-were for some reason to leave the mutex (timeout or signal), this same function
-would decrease/unboost the priority of the task.  That is because the pi_waiters
-always contains the highest priority task that is waiting on a mutex owned
-by the task, so we only need to compare the priority of that top pi waiter
-to the normal priority of the given task.
-
-
-High level overview of the PI chain walk
-----------------------------------------
-
-The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain.
-
-The implementation has gone through several iterations, and has ended up
-with what we believe is the best.  It walks the PI chain by only grabbing
-at most two locks at a time, and is very efficient.
-
-The rt_mutex_adjust_prio_chain can be used either to boost or lower process
-priorities.
-
-rt_mutex_adjust_prio_chain is called with a task to be checked for PI
-(de)boosting (the owner of a mutex that a process is blocking on), a flag to
-check for deadlocking, the mutex that the task owns, a pointer to a waiter
-that is the process's waiter struct that is blocked on the mutex (although this
-parameter may be NULL for deboosting), a pointer to the mutex on which the task
-is blocked, and a top_task as the top waiter of the mutex.
-
-For this explanation, I will not mention deadlock detection. This explanation
-will try to stay at a high level.
-
-When this function is called, there are no locks held.  That also means
-that the state of the owner and lock can change when entered into this function.
-
-Before this function is called, the task has already had rt_mutex_adjust_prio
-performed on it.  This means that the task is set to the priority that it
-should be at, but the rbtree nodes of the task's waiter have not been updated
-with the new priorities, and this task may not be in the proper locations
-in the pi_waiters and waiters trees that the task is blocked on. This function
-solves all that.
-
-The main operation of this function is summarized by Thomas Gleixner in
-rtmutex.c. See the 'Chain walk basics and protection scope' comment for further
-details.
-
-Taking of a mutex (The walk through)
-------------------------------------
-
-OK, now let's take a look at the detailed walk through of what happens when
-taking a mutex.
-
-The first thing that is tried is the fast taking of the mutex.  This is
-done when we have CMPXCHG enabled (otherwise the fast taking automatically
-fails).  Only when the owner field of the mutex is NULL can the lock be
-taken with the CMPXCHG and nothing else needs to be done.
-
-If there is contention on the lock, we go about the slow path
-(rt_mutex_slowlock).
-
-The slow path function is where the task's waiter structure is created on
-the stack.  This is because the waiter structure is only needed for the
-scope of this function.  The waiter structure holds the nodes to store
-the task on the waiters tree of the mutex, and if need be, the pi_waiters
-tree of the owner.
-
-The wait_lock of the mutex is taken since the slow path of unlocking the
-mutex also takes this lock.
-
-We then call try_to_take_rt_mutex.  This is where the architecture that
-does not implement CMPXCHG would always grab the lock (if there's no
-contention).
-
-try_to_take_rt_mutex is used every time the task tries to grab a mutex in the
-slow path.  The first thing that is done here is an atomic setting of
-the "Has Waiters" flag of the mutex's owner field. By setting this flag
-now, the current owner of the mutex being contended for can't release the mutex
-without going into the slow unlock path, and it would then need to grab the
-wait_lock, which this code currently holds. So setting the "Has Waiters" flag
-forces the current owner to synchronize with this code.
-
-The lock is taken if the following are true:
-   1) The lock has no owner
-   2) The current task is the highest priority against all other
-      waiters of the lock
-
-If the task succeeds to acquire the lock, then the task is set as the
-owner of the lock, and if the lock still has waiters, the top_waiter
-(highest priority task waiting on the lock) is added to this task's
-pi_waiters tree.
-
-If the lock is not taken by try_to_take_rt_mutex(), then the
-task_blocks_on_rt_mutex() function is called. This will add the task to
-the lock's waiter tree and propagate the pi chain of the lock as well
-as the lock's owner's pi_waiters tree. This is described in the next
-section.
-
-Task blocks on mutex
---------------------
-
-The accounting of a mutex and process is done with the waiter structure of
-the process.  The "task" field is set to the process, and the "lock" field
-to the mutex.  The rbtree node of waiter are initialized to the processes
-current priority.
-
-Since the wait_lock was taken at the entry of the slow lock, we can safely
-add the waiter to the task waiter tree.  If the current process is the
-highest priority process currently waiting on this mutex, then we remove the
-previous top waiter process (if it exists) from the pi_waiters of the owner,
-and add the current process to that tree.  Since the pi_waiter of the owner
-has changed, we call rt_mutex_adjust_prio on the owner to see if the owner
-should adjust its priority accordingly.
-
-If the owner is also blocked on a lock, and had its pi_waiters changed
-(or deadlock checking is on), we unlock the wait_lock of the mutex and go ahead
-and run rt_mutex_adjust_prio_chain on the owner, as described earlier.
-
-Now all locks are released, and if the current process is still blocked on a
-mutex (waiter "task" field is not NULL), then we go to sleep (call schedule).
-
-Waking up in the loop
----------------------
-
-The task can then wake up for a couple of reasons:
-  1) The previous lock owner released the lock, and the task now is top_waiter
-  2) we received a signal or timeout
-
-In both cases, the task will try again to acquire the lock. If it
-does, then it will take itself off the waiters tree and set itself back
-to the TASK_RUNNING state.
-
-In first case, if the lock was acquired by another task before this task
-could get the lock, then it will go back to sleep and wait to be woken again.
-
-The second case is only applicable for tasks that are grabbing a mutex
-that can wake up before getting the lock, either due to a signal or
-a timeout (i.e. rt_mutex_timed_futex_lock()). When woken, it will try to
-take the lock again, if it succeeds, then the task will return with the
-lock held, otherwise it will return with -EINTR if the task was woken
-by a signal, or -ETIMEDOUT if it timed out.
-
-
-Unlocking the Mutex
--------------------
-
-The unlocking of a mutex also has a fast path for those architectures with
-CMPXCHG.  Since the taking of a mutex on contention always sets the
-"Has Waiters" flag of the mutex's owner, we use this to know if we need to
-take the slow path when unlocking the mutex.  If the mutex doesn't have any
-waiters, the owner field of the mutex would equal the current process and
-the mutex can be unlocked by just replacing the owner field with NULL.
-
-If the owner field has the "Has Waiters" bit set (or CMPXCHG is not available),
-the slow unlock path is taken.
-
-The first thing done in the slow unlock path is to take the wait_lock of the
-mutex.  This synchronizes the locking and unlocking of the mutex.
-
-A check is made to see if the mutex has waiters or not.  On architectures that
-do not have CMPXCHG, this is the location that the owner of the mutex will
-determine if a waiter needs to be awoken or not.  On architectures that
-do have CMPXCHG, that check is done in the fast path, but it is still needed
-in the slow path too.  If a waiter of a mutex woke up because of a signal
-or timeout between the time the owner failed the fast path CMPXCHG check and
-the grabbing of the wait_lock, the mutex may not have any waiters, thus the
-owner still needs to make this check. If there are no waiters then the mutex
-owner field is set to NULL, the wait_lock is released and nothing more is
-needed.
-
-If there are waiters, then we need to wake one up.
-
-On the wake up code, the pi_lock of the current owner is taken.  The top
-waiter of the lock is found and removed from the waiters tree of the mutex
-as well as the pi_waiters tree of the current owner. The "Has Waiters" bit is
-marked to prevent lower priority tasks from stealing the lock.
-
-Finally we unlock the pi_lock of the pending owner and wake it up.
-
-
-Contact
--------
-
-For updates on this document, please email Steven Rostedt <rostedt@goodmis.org>
-
-
-Credits
--------
-
-Author:  Steven Rostedt <rostedt@goodmis.org>
-Updated: Alex Shi <alex.shi@linaro.org>	- 7/6/2017
-
-Original Reviewers:  Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and
-		     Randy Dunlap
-Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior
-
-Updates
--------
-
-This document was originally written for 2.6.17-rc3-mm1
-was updated on 4.12
diff --git a/Documentation/locking/rt-mutex.rst b/Documentation/locking/rt-mutex.rst
new file mode 100644
index 000000000000..c365dc302081
--- /dev/null
+++ b/Documentation/locking/rt-mutex.rst
@@ -0,0 +1,77 @@
+==================================
+RT-mutex subsystem with PI support
+==================================
+
+RT-mutexes with priority inheritance are used to support PI-futexes,
+which enable pthread_mutex_t priority inheritance attributes
+(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details
+about PI-futexes.]
+
+This technology was developed in the -rt tree and streamlined for
+pthread_mutex support.
+
+Basic principles:
+-----------------
+
+RT-mutexes extend the semantics of simple mutexes by the priority
+inheritance protocol.
+
+A low priority owner of a rt-mutex inherits the priority of a higher
+priority waiter until the rt-mutex is released. If the temporarily
+boosted owner blocks on a rt-mutex itself, it propagates the priority
+boosting to the owner of the other rt_mutex it gets blocked on. The
+priority boosting is immediately removed once the rt_mutex has been
+unlocked.
+
+This approach allows us to shorten the block of high-prio tasks on
+mutexes which protect shared resources. Priority inheritance is not a
+magic bullet for poorly designed applications, but it allows
+well-designed applications to use userspace locks in critical parts of
+a high priority thread, without losing determinism.
+
+The enqueueing of the waiters into the rtmutex waiter tree is done in
+priority order. For same priorities FIFO order is chosen. For each
+rtmutex, only the top priority waiter is enqueued into the owner's
+priority waiters tree. This tree too queues in priority order. Whenever
+the top priority waiter of a task changes (for example it timed out or
+got a signal), the priority of the owner task is readjusted. The
+priority enqueueing is handled by "pi_waiters".
+
+RT-mutexes are optimized for fastpath operations and have no internal
+locking overhead when locking an uncontended mutex or unlocking a mutex
+without waiters. The optimized fastpath operations require cmpxchg
+support. [If that is not available then the rt-mutex internal spinlock
+is used]
+
+The state of the rt-mutex is tracked via the owner field of the rt-mutex
+structure:
+
+lock->owner holds the task_struct pointer of the owner. Bit 0 is used to
+keep track of the "lock has waiters" state:
+
+ ============ ======= ================================================
+ owner        bit0    Notes
+ ============ ======= ================================================
+ NULL         0       lock is free (fast acquire possible)
+ NULL         1       lock is free and has waiters and the top waiter
+		      is going to take the lock [1]_
+ taskpointer  0       lock is held (fast release possible)
+ taskpointer  1       lock is held and has waiters [2]_
+ ============ ======= ================================================
+
+The fast atomic compare exchange based acquire and release is only
+possible when bit 0 of lock->owner is 0.
+
+.. [1] It can also be a transitional state when grabbing the lock
+       with ->wait_lock held. To prevent any fast path cmpxchg on the lock,
+       we need to set bit 0 before looking at the lock, and the owner may
+       be NULL during this short window, hence this can be a transitional state.
+
+.. [2] There is a small time when bit 0 is set but there are no
+       waiters. This can happen when grabbing the lock in the slow path.
+       To prevent a cmpxchg of the owner releasing the lock, we need to
+       set this bit before looking at the lock.
+
+BTW, there is still technically a "Pending Owner", it's just not called
+that anymore. The pending owner happens to be the top_waiter of a lock
+that has no owner and has been woken up to grab the lock.
diff --git a/Documentation/locking/rt-mutex.txt b/Documentation/locking/rt-mutex.txt
deleted file mode 100644
index 35793e003041..000000000000
--- a/Documentation/locking/rt-mutex.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-RT-mutex subsystem with PI support
-----------------------------------
-
-RT-mutexes with priority inheritance are used to support PI-futexes,
-which enable pthread_mutex_t priority inheritance attributes
-(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details
-about PI-futexes.]
-
-This technology was developed in the -rt tree and streamlined for
-pthread_mutex support.
-
-Basic principles:
------------------
-
-RT-mutexes extend the semantics of simple mutexes by the priority
-inheritance protocol.
-
-A low priority owner of a rt-mutex inherits the priority of a higher
-priority waiter until the rt-mutex is released. If the temporarily
-boosted owner blocks on a rt-mutex itself it propagates the priority
-boosting to the owner of the other rt_mutex it gets blocked on. The
-priority boosting is immediately removed once the rt_mutex has been
-unlocked.
-
-This approach allows us to shorten the block of high-prio tasks on
-mutexes which protect shared resources. Priority inheritance is not a
-magic bullet for poorly designed applications, but it allows
-well-designed applications to use userspace locks in critical parts of
-an high priority thread, without losing determinism.
-
-The enqueueing of the waiters into the rtmutex waiter tree is done in
-priority order. For same priorities FIFO order is chosen. For each
-rtmutex, only the top priority waiter is enqueued into the owner's
-priority waiters tree. This tree too queues in priority order. Whenever
-the top priority waiter of a task changes (for example it timed out or
-got a signal), the priority of the owner task is readjusted. The
-priority enqueueing is handled by "pi_waiters".
-
-RT-mutexes are optimized for fastpath operations and have no internal
-locking overhead when locking an uncontended mutex or unlocking a mutex
-without waiters. The optimized fastpath operations require cmpxchg
-support. [If that is not available then the rt-mutex internal spinlock
-is used]
-
-The state of the rt-mutex is tracked via the owner field of the rt-mutex
-structure:
-
-lock->owner holds the task_struct pointer of the owner. Bit 0 is used to
-keep track of the "lock has waiters" state.
-
- owner        bit0
- NULL         0       lock is free (fast acquire possible)
- NULL         1       lock is free and has waiters and the top waiter
-			is going to take the lock*
- taskpointer  0       lock is held (fast release possible)
- taskpointer  1       lock is held and has waiters**
-
-The fast atomic compare exchange based acquire and release is only
-possible when bit 0 of lock->owner is 0.
-
-(*) It also can be a transitional state when grabbing the lock
-with ->wait_lock is held. To prevent any fast path cmpxchg to the lock,
-we need to set the bit0 before looking at the lock, and the owner may be
-NULL in this small time, hence this can be a transitional state.
-
-(**) There is a small time when bit 0 is set but there are no
-waiters. This can happen when grabbing the lock in the slow path.
-To prevent a cmpxchg of the owner releasing the lock, we need to
-set this bit before looking at the lock.
-
-BTW, there is still technically a "Pending Owner", it's just not called
-that anymore. The pending owner happens to be the top_waiter of a lock
-that has no owner and has been woken up to grab the lock.
diff --git a/Documentation/locking/spinlocks.rst b/Documentation/locking/spinlocks.rst
new file mode 100644
index 000000000000..098107fb7d86
--- /dev/null
+++ b/Documentation/locking/spinlocks.rst
@@ -0,0 +1,177 @@
+===============
+Locking lessons
+===============
+
+Lesson 1: Spin locks
+====================
+
+The most basic primitive for locking is spinlock::
+
+  static DEFINE_SPINLOCK(xxx_lock);
+
+	unsigned long flags;
+
+	spin_lock_irqsave(&xxx_lock, flags);
+	... critical section here ..
+	spin_unlock_irqrestore(&xxx_lock, flags);
+
+The above is always safe. It will disable interrupts _locally_, but the
+spinlock itself will guarantee the global lock, so it will guarantee that
+there is only one thread-of-control within the region(s) protected by that
+lock. This works well under UP too, so the code does _not_ need to
+worry about UP vs SMP issues: the spinlocks work correctly under both.
+
+   NOTE! Implications of spin_locks for memory are further described in:
+
+     Documentation/memory-barriers.txt
+
+       (5) LOCK operations.
+
+       (6) UNLOCK operations.
+
+The above is usually pretty simple (you usually need and want only one
+spinlock for most things - using more than one spinlock can make things a
+lot more complex and even slower and is usually worth it only for
+sequences that you **know** need to be split up: avoid it at all cost if you
+aren't sure).
+
+This is really the only really hard part about spinlocks: once you start
+using spinlocks they tend to expand to areas you might not have noticed
+before, because you have to make sure the spinlocks correctly protect the
+shared data structures **everywhere** they are used. The spinlocks are most
+easily added to places that are completely independent of other code (for
+example, internal driver data structures that nobody else ever touches).
+
+   NOTE! The spin-lock is safe only when you **also** use the lock itself
+   to do locking across CPU's, which implies that EVERYTHING that
+   touches a shared variable has to agree about the spinlock they want
+   to use.
+
+----
+
+Lesson 2: reader-writer spinlocks.
+==================================
+
+If your data accesses have a very natural pattern where you usually tend
+to mostly read from the shared variables, the reader-writer locks
+(rw_lock) versions of the spinlocks are sometimes useful. They allow multiple
+readers to be in the same critical region at once, but if somebody wants
+to change the variables it has to get an exclusive write lock.
+
+   NOTE! reader-writer locks require more atomic memory operations than
+   simple spinlocks.  Unless the reader critical section is long, you
+   are better off just using spinlocks.
+
+The routines look the same as above::
+
+   rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
+
+	unsigned long flags;
+
+	read_lock_irqsave(&xxx_lock, flags);
+	.. critical section that only reads the info ...
+	read_unlock_irqrestore(&xxx_lock, flags);
+
+	write_lock_irqsave(&xxx_lock, flags);
+	.. read and write exclusive access to the info ...
+	write_unlock_irqrestore(&xxx_lock, flags);
+
+The above kind of lock may be useful for complex data structures like
+linked lists, especially searching for entries without changing the list
+itself.  The read lock allows many concurrent readers.  Anything that
+**changes** the list will have to get the write lock.
+
+   NOTE! RCU is better for list traversal, but requires careful
+   attention to design detail (see Documentation/RCU/listRCU.txt).
+
+Also, you cannot "upgrade" a read-lock to a write-lock, so if you at _any_
+time need to do any changes (even if you don't do it every time), you have
+to get the write-lock at the very beginning.
+
+   NOTE! We are working hard to remove reader-writer spinlocks in most
+   cases, so please don't add a new one without consensus.  (Instead, see
+   Documentation/RCU/rcu.txt for complete information.)
+
+----
+
+Lesson 3: spinlocks revisited.
+==============================
+
+The single spin-lock primitives above are by no means the only ones. They
+are the safest ones, and the ones that work under all circumstances,
+but partly **because** they are safe they are also fairly slow. They are slower
+than they'd need to be, because they do have to disable interrupts
+(which is just a single instruction on an x86, but it's an expensive one -
+and on other architectures it can be worse).
+
+If you have a case where you have to protect a data structure across
+several CPU's and you want to use spinlocks you can potentially use
+cheaper versions of the spinlocks. IFF you know that the spinlocks are
+never used in interrupt handlers, you can use the non-irq versions::
+
+	spin_lock(&lock);
+	...
+	spin_unlock(&lock);
+
+(and the equivalent read-write versions too, of course). The spinlock will
+guarantee the same kind of exclusive access, and it will be much faster.
+This is useful if you know that the data in question is only ever
+manipulated from a "process context", i.e. no interrupts involved.
+
+The reason you mustn't use these versions if you have interrupts that
+play with the spinlock is that you can get deadlocks::
+
+	spin_lock(&lock);
+	...
+		<- interrupt comes in:
+			spin_lock(&lock);
+
+where an interrupt tries to lock an already locked variable. This is ok if
+the other interrupt happens on another CPU, but it is _not_ ok if the
+interrupt happens on the same CPU that already holds the lock, because the
+lock will obviously never be released (because the interrupt is waiting
+for the lock, and the lock-holder is interrupted by the interrupt and will
+not continue until the interrupt has been processed).
+
+(This is also the reason why the irq-versions of the spinlocks only need
+to disable the _local_ interrupts - it's ok to use spinlocks in interrupts
+on other CPU's, because an interrupt on another CPU doesn't interrupt the
+CPU that holds the lock, so the lock-holder can continue and eventually
+release the lock).
+
+Note that you can be clever with read-write locks and interrupts. For
+example, if you know that the interrupt only ever gets a read-lock, then
+you can use a non-irq version of read locks everywhere - because they
+don't block on each other (and thus there is no dead-lock wrt interrupts).
+But when you do the write-lock, you have to use the irq-safe version.
+
+For an example of being clever with rw-locks, see the "waitqueue_lock"
+handling in kernel/sched/core.c - nothing ever _changes_ a wait-queue from
+within an interrupt, they only read the queue in order to know whom to
+wake up. So read-locks are safe (which is good: they are very common
+indeed), while write-locks need to protect themselves against interrupts.
+
+		Linus
+
+----
+
+Reference information:
+======================
+
+For dynamic initialization, use spin_lock_init() or rwlock_init() as
+appropriate::
+
+   spinlock_t xxx_lock;
+   rwlock_t xxx_rw_lock;
+
+   static int __init xxx_init(void)
+   {
+	spin_lock_init(&xxx_lock);
+	rwlock_init(&xxx_rw_lock);
+	...
+   }
+
+   module_init(xxx_init);
+
+For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or
+__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate.
diff --git a/Documentation/locking/spinlocks.txt b/Documentation/locking/spinlocks.txt
deleted file mode 100644
index ff35e40bdf5b..000000000000
--- a/Documentation/locking/spinlocks.txt
+++ /dev/null
@@ -1,167 +0,0 @@
-Lesson 1: Spin locks
-
-The most basic primitive for locking is spinlock.
-
-static DEFINE_SPINLOCK(xxx_lock);
-
-	unsigned long flags;
-
-	spin_lock_irqsave(&xxx_lock, flags);
-	... critical section here ..
-	spin_unlock_irqrestore(&xxx_lock, flags);
-
-The above is always safe. It will disable interrupts _locally_, but the
-spinlock itself will guarantee the global lock, so it will guarantee that
-there is only one thread-of-control within the region(s) protected by that
-lock. This works well even under UP also, so the code does _not_ need to
-worry about UP vs SMP issues: the spinlocks work correctly under both.
-
-   NOTE! Implications of spin_locks for memory are further described in:
-
-     Documentation/memory-barriers.txt
-       (5) LOCK operations.
-       (6) UNLOCK operations.
-
-The above is usually pretty simple (you usually need and want only one
-spinlock for most things - using more than one spinlock can make things a
-lot more complex and even slower and is usually worth it only for
-sequences that you _know_ need to be split up: avoid it at all cost if you
-aren't sure).
-
-This is really the only really hard part about spinlocks: once you start
-using spinlocks they tend to expand to areas you might not have noticed
-before, because you have to make sure the spinlocks correctly protect the
-shared data structures _everywhere_ they are used. The spinlocks are most
-easily added to places that are completely independent of other code (for
-example, internal driver data structures that nobody else ever touches).
-
-   NOTE! The spin-lock is safe only when you _also_ use the lock itself
-   to do locking across CPU's, which implies that EVERYTHING that
-   touches a shared variable has to agree about the spinlock they want
-   to use.
-
-----
-
-Lesson 2: reader-writer spinlocks.
-
-If your data accesses have a very natural pattern where you usually tend
-to mostly read from the shared variables, the reader-writer locks
-(rw_lock) versions of the spinlocks are sometimes useful. They allow multiple
-readers to be in the same critical region at once, but if somebody wants
-to change the variables it has to get an exclusive write lock.
-
-   NOTE! reader-writer locks require more atomic memory operations than
-   simple spinlocks.  Unless the reader critical section is long, you
-   are better off just using spinlocks.
-
-The routines look the same as above:
-
-   rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
-
-	unsigned long flags;
-
-	read_lock_irqsave(&xxx_lock, flags);
-	.. critical section that only reads the info ...
-	read_unlock_irqrestore(&xxx_lock, flags);
-
-	write_lock_irqsave(&xxx_lock, flags);
-	.. read and write exclusive access to the info ...
-	write_unlock_irqrestore(&xxx_lock, flags);
-
-The above kind of lock may be useful for complex data structures like
-linked lists, especially searching for entries without changing the list
-itself.  The read lock allows many concurrent readers.  Anything that
-_changes_ the list will have to get the write lock.
-
-   NOTE! RCU is better for list traversal, but requires careful
-   attention to design detail (see Documentation/RCU/listRCU.txt).
-
-Also, you cannot "upgrade" a read-lock to a write-lock, so if you at _any_
-time need to do any changes (even if you don't do it every time), you have
-to get the write-lock at the very beginning.
-
-   NOTE! We are working hard to remove reader-writer spinlocks in most
-   cases, so please don't add a new one without consensus.  (Instead, see
-   Documentation/RCU/rcu.txt for complete information.)
-
-----
-
-Lesson 3: spinlocks revisited.
-
-The single spin-lock primitives above are by no means the only ones. They
-are the most safe ones, and the ones that work under all circumstances,
-but partly _because_ they are safe they are also fairly slow. They are slower
-than they'd need to be, because they do have to disable interrupts
-(which is just a single instruction on a x86, but it's an expensive one -
-and on other architectures it can be worse).
-
-If you have a case where you have to protect a data structure across
-several CPU's and you want to use spinlocks you can potentially use
-cheaper versions of the spinlocks. IFF you know that the spinlocks are
-never used in interrupt handlers, you can use the non-irq versions:
-
-	spin_lock(&lock);
-	...
-	spin_unlock(&lock);
-
-(and the equivalent read-write versions too, of course). The spinlock will
-guarantee the same kind of exclusive access, and it will be much faster.
-This is useful if you know that the data in question is only ever
-manipulated from a "process context", ie no interrupts involved.
-
-The reasons you mustn't use these versions if you have interrupts that
-play with the spinlock is that you can get deadlocks:
-
-	spin_lock(&lock);
-	...
-		<- interrupt comes in:
-			spin_lock(&lock);
-
-where an interrupt tries to lock an already locked variable. This is ok if
-the other interrupt happens on another CPU, but it is _not_ ok if the
-interrupt happens on the same CPU that already holds the lock, because the
-lock will obviously never be released (because the interrupt is waiting
-for the lock, and the lock-holder is interrupted by the interrupt and will
-not continue until the interrupt has been processed).
-
-(This is also the reason why the irq-versions of the spinlocks only need
-to disable the _local_ interrupts - it's ok to use spinlocks in interrupts
-on other CPU's, because an interrupt on another CPU doesn't interrupt the
-CPU that holds the lock, so the lock-holder can continue and eventually
-releases the lock).
-
-Note that you can be clever with read-write locks and interrupts. For
-example, if you know that the interrupt only ever gets a read-lock, then
-you can use a non-irq version of read locks everywhere - because they
-don't block on each other (and thus there is no dead-lock wrt interrupts.
-But when you do the write-lock, you have to use the irq-safe version.
-
-For an example of being clever with rw-locks, see the "waitqueue_lock"
-handling in kernel/sched/core.c - nothing ever _changes_ a wait-queue from
-within an interrupt, they only read the queue in order to know whom to
-wake up. So read-locks are safe (which is good: they are very common
-indeed), while write-locks need to protect themselves against interrupts.
-
-		Linus
-
-----
-
-Reference information:
-
-For dynamic initialization, use spin_lock_init() or rwlock_init() as
-appropriate:
-
-   spinlock_t xxx_lock;
-   rwlock_t xxx_rw_lock;
-
-   static int __init xxx_init(void)
-   {
-	spin_lock_init(&xxx_lock);
-	rwlock_init(&xxx_rw_lock);
-	...
-   }
-
-   module_init(xxx_init);
-
-For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or
-__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate.
diff --git a/Documentation/locking/ww-mutex-design.rst b/Documentation/locking/ww-mutex-design.rst
new file mode 100644
index 000000000000..1846c199da23
--- /dev/null
+++ b/Documentation/locking/ww-mutex-design.rst
@@ -0,0 +1,393 @@
+======================================
+Wound/Wait Deadlock-Proof Mutex Design
+======================================
+
+Please read mutex-design.rst first, as it applies to wait/wound mutexes too.
+
+Motivation for WW-Mutexes
+-------------------------
+
+GPUs do operations that commonly involve many buffers.  Those buffers
+can be shared across contexts/processes, exist in different memory
+domains (for example VRAM vs system memory), and so on.  And with
+PRIME / dmabuf, they can even be shared across devices.  So there are
+a handful of situations where the driver needs to wait for buffers to
+become ready.  If you think about this in terms of waiting on a buffer
+mutex for it to become available, this presents a problem because
+there is no way to guarantee that buffers appear in an execbuf/batch in
+the same order in all contexts.  That is directly under control of
+userspace, and a result of the sequence of GL calls that an application
+makes.  This creates the potential for deadlock.  The problem gets
+more complex when you consider that the kernel may need to migrate the
+buffer(s) into VRAM before the GPU operates on the buffer(s), which
+may in turn require evicting some other buffers (and you don't want to
+evict other buffers which are already queued up to the GPU), but for a
+simplified understanding of the problem you can ignore this.
+
+The algorithm that the TTM graphics subsystem came up with for dealing with
+this problem is quite simple.  For each group of buffers (execbuf) that need
+to be locked, the caller would be assigned a unique reservation id/ticket,
+from a global counter.  In case of deadlock while locking all the buffers
+associated with an execbuf, the one with the lowest reservation ticket (i.e.
+the oldest task) wins, and the one with the higher reservation id (i.e. the
+younger task) unlocks all of the buffers that it has already locked, and then
+tries again.
+
+In the RDBMS literature, a reservation ticket is associated with a transaction,
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+wounds the transaction holding the lock, requesting it to die.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm in that transactions are wounded by other transactions, and that
+requires a reliable way to pick up the wounded condition and preempt the
+running transaction. Note that this is not the same as process preemption. A
+Wound-Wait transaction is considered preempted when it dies (returning
+-EDEADLK) following a wound.
+
+Concepts
+--------
+
+Compared to normal mutexes two additional concepts/objects show up in the lock
+interface for w/w mutexes:
+
+Acquire context: To ensure eventual forward progress it is important that a task
+trying to acquire locks doesn't grab a new reservation id, but keeps the one it
+acquired when starting the lock acquisition. This ticket is stored in the
+acquire context. Furthermore the acquire context keeps track of debugging state
+to catch w/w mutex interface abuse. An acquire context represents a
+transaction.
+
+W/w class: In contrast to normal mutexes the lock class needs to be explicit for
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
+
+Furthermore there are three different classes of w/w lock acquire functions:
+
+* Normal lock acquisition with a context, using ww_mutex_lock.
+
+* Slowpath lock acquisition on the contending lock, used by the task that just
+  killed its transaction after having dropped all already acquired locks.
+  These functions have the _slow postfix.
+
+  From a simple semantics point-of-view the _slow functions are not strictly
+  required, since simply calling the normal ww_mutex_lock functions on the
+  contending lock (after having dropped all other already acquired locks) will
+  work correctly. After all if no other ww mutex has been acquired yet there's
+  no deadlock potential and hence the ww_mutex_lock call will block and not
+  prematurely return -EDEADLK. The advantage of the _slow functions is in
+  interface safety:
+
+  - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
+    has a void return type. Note that since ww mutex code needs loops/retries
+    anyway the __must_check doesn't result in spurious warnings, even though the
+    very first lock operation can never fail.
+  - When full debugging is enabled ww_mutex_lock_slow checks that all acquired
+    ww mutexes have been released (preventing deadlocks) and makes sure that we
+    block on the contending lock (preventing spinning through the -EDEADLK
+    slowpath until the contended lock can be acquired).
+
+* Functions to only acquire a single w/w mutex, which results in the exact same
+  semantics as a normal mutex. This is done by calling ww_mutex_lock with a NULL
+  context.
+
+  Again this is not strictly required. But often you only want to acquire a
+  single lock in which case it's pointless to set up an acquire context (and so
+  better to avoid grabbing a deadlock avoidance ticket).
+
+Of course, all the usual variants for handling wake-ups due to signals are also
+provided.
+
+Usage
+-----
+
+The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
+DEFINE_WW_CLASS() (Wound-Wait) or DEFINE_WD_CLASS() (Wait-Die).
+As a rough rule of thumb, use Wound-Wait iff you
+expect the number of simultaneous competing transactions to be typically small,
+and you want to reduce the number of rollbacks.
+
+There are three different ways to acquire locks within the same w/w class.
+Common definitions for methods #1 and #2::
+
+  static DEFINE_WW_CLASS(ww_class);
+
+  struct obj {
+	struct ww_mutex lock;
+	/* obj data */
+  };
+
+  struct obj_entry {
+	struct list_head head;
+	struct obj *obj;
+  };
+
+Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
+This is useful if a list of required objects is already tracked somewhere.
+Furthermore the lock helper can propagate the -EALREADY return code back to
+the caller as a signal that an object is twice on the list. This is useful if
+the list is constructed from userspace input and the ABI requires userspace to
+not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl)::
+
+  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
+	struct obj_entry *contended_entry = NULL, *entry;
+	struct obj *res_obj = NULL;
+	int ret;
+
+	ww_acquire_init(ctx, &ww_class);
+
+  retry:
+	list_for_each_entry (entry, list, head) {
+		if (entry->obj == res_obj) {
+			res_obj = NULL;
+			continue;
+		}
+		ret = ww_mutex_lock(&entry->obj->lock, ctx);
+		if (ret < 0) {
+			contended_entry = entry;
+			goto err;
+		}
+	}
+
+	ww_acquire_done(ctx);
+	return 0;
+
+  err:
+	list_for_each_entry_continue_reverse (entry, list, head)
+		ww_mutex_unlock(&entry->obj->lock);
+
+	if (res_obj)
+		ww_mutex_unlock(&res_obj->lock);
+
+	if (ret == -EDEADLK) {
+		/* we lost out in a seqno race, lock and retry.. */
+		ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
+		res_obj = contended_entry->obj;
+		goto retry;
+	}
+	ww_acquire_fini(ctx);
+
+	return ret;
+  }
+
+Method 2, using a list in execbuf->buffers that can be reordered. Same semantics
+of duplicate entry detection using -EALREADY as method 1 above. But the
+list-reordering allows for a bit more idiomatic code::
+
+  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
+	struct obj_entry *entry, *entry2;
+	int ret;
+
+	ww_acquire_init(ctx, &ww_class);
+
+	list_for_each_entry (entry, list, head) {
+		ret = ww_mutex_lock(&entry->obj->lock, ctx);
+		if (ret < 0) {
+			entry2 = entry;
+
+			list_for_each_entry_continue_reverse (entry2, list, head)
+				ww_mutex_unlock(&entry2->obj->lock);
+
+			if (ret != -EDEADLK) {
+				ww_acquire_fini(ctx);
+				return ret;
+			}
+
+			/* we lost out in a seqno race, lock and retry.. */
+			ww_mutex_lock_slow(&entry->obj->lock, ctx);
+
+			/*
+			 * Move entry to the list head: entry->next is then
+			 * the first unlocked entry, restarting the for loop.
+			 */
+			list_del(&entry->head);
+			list_add(&entry->head, list);
+		}
+	}
+
+	ww_acquire_done(ctx);
+	return 0;
+  }
+
+Unlocking works the same way for both methods #1 and #2::
+
+  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
+	struct obj_entry *entry;
+
+	list_for_each_entry (entry, list, head)
+		ww_mutex_unlock(&entry->obj->lock);
+
+	ww_acquire_fini(ctx);
+  }
+
+Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
+e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
+and edges can only be changed when holding the locks of all involved nodes. w/w
+mutexes are a natural fit for such a case for two reasons:
+
+- They can handle lock-acquisition in any order which allows us to start walking
+  a graph from a starting point and then iteratively discovering new edges and
+  locking down the nodes those edges connect to.
+- Due to the -EALREADY return code signalling that a given object is already
+  held there's no need for additional book-keeping to break cycles in the graph
+  or keep track of which locks are already held (when using more than one node
+  as a starting point).
+
+Note that this approach differs in two important ways from the above methods:
+
+- Since the list of objects is dynamically constructed (and might very well be
+  different when retrying due to hitting the -EDEADLK die condition) there's
+  no need to keep any object on a persistent list when it's not locked. We can
+  therefore move the list_head into the object itself.
+- On the other hand the dynamic object list construction also means that the
+  -EALREADY return code can't be propagated.
+
+Note also that methods #1 and #2 and method #3 can be combined, e.g. to first
+lock a list of starting nodes (passed in from userspace) using one of the above
+methods, and then lock any additional objects affected by the operations using
+method #3 below. The backoff/retry procedure will be a bit more involved, since
+when the dynamic locking step hits -EDEADLK we also need to unlock all the
+objects acquired with the fixed list. But the w/w mutex debug checks will catch
+any interface misuse for these cases.
+
+Also, method 3 can't fail the lock acquisition step since it doesn't propagate
+-EALREADY to the caller. Of course this would be different when using the
+_interruptible variants, but that's outside the scope of these examples::
+
+  struct obj {
+	struct ww_mutex ww_mutex;
+	struct list_head locked_list;
+  };
+
+  static DEFINE_WW_CLASS(ww_class);
+
+  void __unlock_objs(struct list_head *list)
+  {
+	struct obj *entry, *temp;
+
+	list_for_each_entry_safe (entry, temp, list, locked_list) {
+		/* need to do that before unlocking, since only the current
+		 * lock holder is allowed to use the object */
+		list_del(&entry->locked_list);
+		ww_mutex_unlock(&entry->ww_mutex);
+	}
+  }
+
+  void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
+	struct obj *obj;
+	int ret;
+
+	ww_acquire_init(ctx, &ww_class);
+
+  retry:
+	/* re-init loop start state */
+	loop {
+		/* magic code which walks over a graph and decides which objects
+		 * to lock */
+
+		ret = ww_mutex_lock(&obj->ww_mutex, ctx);
+		if (ret == -EALREADY) {
+			/* we have that one already, get to the next object */
+			continue;
+		}
+		if (ret == -EDEADLK) {
+			__unlock_objs(list);
+
+			ww_mutex_lock_slow(&obj->ww_mutex, ctx);
+			list_add(&obj->locked_list, list);
+			goto retry;
+		}
+
+		/* locked a new object, add it to the list */
+		list_add_tail(&obj->locked_list, list);
+	}
+
+	ww_acquire_done(ctx);
+  }
+
+  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
+  {
+	__unlock_objs(list);
+	ww_acquire_fini(ctx);
+  }
+
+Method 4: Only lock one single object. In that case deadlock detection and
+prevention is obviously overkill, since with grabbing just one lock you can't
+produce a deadlock within just one class. To simplify this case the w/w mutex
+api can be used with a NULL context.
+
+Implementation Details
+----------------------
+
+Design:
+^^^^^^^
+
+  ww_mutex currently encapsulates a struct mutex; this means no extra overhead for
+  normal mutex locks, which are far more common. As such there is only a small
+  increase in code size if wait/wound mutexes are not used.
+
+  We maintain the following invariants for the wait list:
+
+  (1) Waiters with an acquire context are sorted by stamp order; waiters
+      without an acquire context are interspersed in FIFO order.
+  (2) For Wait-Die, among waiters with contexts, only the first one can have
+      other locks acquired already (ctx->acquired > 0). Note that this waiter
+      may come after other waiters without contexts in the list.
+
+  The Wound-Wait preemption is implemented with a lazy-preemption scheme:
+  The wounded status of the transaction is checked only when there is
+  contention for a new lock and hence a true chance of deadlock. In that
+  situation, if the transaction is wounded, it backs off, clears the
+  wounded status and retries. A great benefit of implementing preemption in
+  this way is that the wounded transaction can identify a contending lock to
+  wait for before restarting the transaction. Just blindly restarting the
+  transaction would likely make the transaction end up in a situation where
+  it would have to back off again.
+
+  In general, not much contention is expected. The locks are typically used to
+  serialize access to resources for devices, and optimization focus should
+  therefore be directed towards the uncontended cases.
+
+Lockdep:
+^^^^^^^^
+
+  Special care has been taken to warn for as many cases of api abuse
+  as possible. Some common api abuses will be caught with
+  CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended.
+
+  Some of the errors which will be warned about:
+   - Forgetting to call ww_acquire_fini or ww_acquire_init.
+   - Attempting to lock more mutexes after ww_acquire_done.
+   - Attempting to lock the wrong mutex after -EDEADLK and
+     unlocking all mutexes.
+   - Attempting to lock the right mutex after -EDEADLK,
+     before unlocking all mutexes.
+
+   - Calling ww_mutex_lock_slow before -EDEADLK was returned.
+
+   - Unlocking mutexes with the wrong unlock function.
+   - Calling one of the ww_acquire_* twice on the same context.
+   - Using a different ww_class for the mutex than for the ww_acquire_ctx.
+   - Normal lockdep errors that can result in deadlocks.
+
+  Some of the lockdep errors that can result in deadlocks:
+   - Calling ww_acquire_init to initialize a second ww_acquire_ctx before
+     having called ww_acquire_fini on the first.
+   - 'normal' deadlocks that can occur.
+
+FIXME:
+  Update this section once we have the TASK_DEADLOCK task state flag magic
+  implemented.
diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
deleted file mode 100644
index f0ed7c30e695..000000000000
--- a/Documentation/locking/ww-mutex-design.txt
+++ /dev/null
@@ -1,383 +0,0 @@
-Wound/Wait Deadlock-Proof Mutex Design
-======================================
-
-Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
-
-Motivation for WW-Mutexes
--------------------------
-
-GPU's do operations that commonly involve many buffers.  Those buffers
-can be shared across contexts/processes, exist in different memory
-domains (for example VRAM vs system memory), and so on.  And with
-PRIME / dmabuf, they can even be shared across devices.  So there are
-a handful of situations where the driver needs to wait for buffers to
-become ready.  If you think about this in terms of waiting on a buffer
-mutex for it to become available, this presents a problem because
-there is no way to guarantee that buffers appear in a execbuf/batch in
-the same order in all contexts.  That is directly under control of
-userspace, and a result of the sequence of GL calls that an application
-makes.	Which results in the potential for deadlock.  The problem gets
-more complex when you consider that the kernel may need to migrate the
-buffer(s) into VRAM before the GPU operates on the buffer(s), which
-may in turn require evicting some other buffers (and you don't want to
-evict other buffers which are already queued up to the GPU), but for a
-simplified understanding of the problem you can ignore this.
-
-The algorithm that the TTM graphics subsystem came up with for dealing with
-this problem is quite simple.  For each group of buffers (execbuf) that need
-to be locked, the caller would be assigned a unique reservation id/ticket,
-from a global counter.  In case of deadlock while locking all the buffers
-associated with a execbuf, the one with the lowest reservation ticket (i.e.
-the oldest task) wins, and the one with the higher reservation id (i.e. the
-younger task) unlocks all of the buffers that it has already locked, and then
-tries again.
-
-In the RDBMS literature, a reservation ticket is associated with a transaction.
-and the deadlock handling approach is called Wait-Die. The name is based on
-the actions of a locking thread when it encounters an already locked mutex.
-If the transaction holding the lock is younger, the locking transaction waits.
-If the transaction holding the lock is older, the locking transaction backs off
-and dies. Hence Wait-Die.
-There is also another algorithm called Wound-Wait:
-If the transaction holding the lock is younger, the locking transaction
-wounds the transaction holding the lock, requesting it to die.
-If the transaction holding the lock is older, it waits for the other
-transaction. Hence Wound-Wait.
-The two algorithms are both fair in that a transaction will eventually succeed.
-However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
-compared to Wait-Die, but is, on the other hand, associated with more work than
-Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
-algorithm in that transactions are wounded by other transactions, and that
-requires a reliable way to pick up up the wounded condition and preempt the
-running transaction. Note that this is not the same as process preemption. A
-Wound-Wait transaction is considered preempted when it dies (returning
--EDEADLK) following a wound.
-
-Concepts
---------
-
-Compared to normal mutexes two additional concepts/objects show up in the lock
-interface for w/w mutexes:
-
-Acquire context: To ensure eventual forward progress it is important the a task
-trying to acquire locks doesn't grab a new reservation id, but keeps the one it
-acquired when starting the lock acquisition. This ticket is stored in the
-acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse. An acquire context is representing a
-transaction.
-
-W/w class: In contrast to normal mutexes the lock class needs to be explicit for
-w/w mutexes, since it is required to initialize the acquire context. The lock
-class also specifies what algorithm to use, Wound-Wait or Wait-Die.
-
-Furthermore there are three different class of w/w lock acquire functions:
-
-* Normal lock acquisition with a context, using ww_mutex_lock.
-
-* Slowpath lock acquisition on the contending lock, used by the task that just
-  killed its transaction after having dropped all already acquired locks.
-  These functions have the _slow postfix.
-
-  From a simple semantics point-of-view the _slow functions are not strictly
-  required, since simply calling the normal ww_mutex_lock functions on the
-  contending lock (after having dropped all other already acquired locks) will
-  work correctly. After all if no other ww mutex has been acquired yet there's
-  no deadlock potential and hence the ww_mutex_lock call will block and not
-  prematurely return -EDEADLK. The advantage of the _slow functions is in
-  interface safety:
-  - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
-    has a void return type. Note that since ww mutex code needs loops/retries
-    anyway the __must_check doesn't result in spurious warnings, even though the
-    very first lock operation can never fail.
-  - When full debugging is enabled ww_mutex_lock_slow checks that all acquired
-    ww mutex have been released (preventing deadlocks) and makes sure that we
-    block on the contending lock (preventing spinning through the -EDEADLK
-    slowpath until the contended lock can be acquired).
-
-* Functions to only acquire a single w/w mutex, which results in the exact same
-  semantics as a normal mutex. This is done by calling ww_mutex_lock with a NULL
-  context.
-
-  Again this is not strictly required. But often you only want to acquire a
-  single lock in which case it's pointless to set up an acquire context (and so
-  better to avoid grabbing a deadlock avoidance ticket).
-
-Of course, all the usual variants for handling wake-ups due to signals are also
-provided.
-
-Usage
------
-
-The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
-DEFINE_WW_CLASS() (Wound-Wait) or DEFINE_WD_CLASS() (Wait-Die)
-As a rough rule of thumb, use Wound-Wait iff you
-expect the number of simultaneous competing transactions to be typically small,
-and you want to reduce the number of rollbacks.
-
-Three different ways to acquire locks within the same w/w class. Common
-definitions for methods #1 and #2:
-
-static DEFINE_WW_CLASS(ww_class);
-
-struct obj {
-	struct ww_mutex lock;
-	/* obj data */
-};
-
-struct obj_entry {
-	struct list_head head;
-	struct obj *obj;
-};
-
-Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
-This is useful if a list of required objects is already tracked somewhere.
-Furthermore the lock helper can use propagate the -EALREADY return code back to
-the caller as a signal that an object is twice on the list. This is useful if
-the list is constructed from userspace input and the ABI requires userspace to
-not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl).
-
-int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
-	struct obj *res_obj = NULL;
-	struct obj_entry *contended_entry = NULL;
-	struct obj_entry *entry;
-
-	ww_acquire_init(ctx, &ww_class);
-
-retry:
-	list_for_each_entry (entry, list, head) {
-		if (entry->obj == res_obj) {
-			res_obj = NULL;
-			continue;
-		}
-		ret = ww_mutex_lock(&entry->obj->lock, ctx);
-		if (ret < 0) {
-			contended_entry = entry;
-			goto err;
-		}
-	}
-
-	ww_acquire_done(ctx);
-	return 0;
-
-err:
-	list_for_each_entry_continue_reverse (entry, list, head)
-		ww_mutex_unlock(&entry->obj->lock);
-
-	if (res_obj)
-		ww_mutex_unlock(&res_obj->lock);
-
-	if (ret == -EDEADLK) {
-		/* we lost out in a seqno race, lock and retry.. */
-		ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
-		res_obj = contended_entry->obj;
-		goto retry;
-	}
-	ww_acquire_fini(ctx);
-
-	return ret;
-}
-
-Method 2, using a list in execbuf->buffers that can be reordered. Same semantics
-of duplicate entry detection using -EALREADY as method 1 above. But the
-list-reordering allows for a bit more idiomatic code.
-
-int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
-	struct obj_entry *entry, *entry2;
-
-	ww_acquire_init(ctx, &ww_class);
-
-	list_for_each_entry (entry, list, head) {
-		ret = ww_mutex_lock(&entry->obj->lock, ctx);
-		if (ret < 0) {
-			entry2 = entry;
-
-			list_for_each_entry_continue_reverse (entry2, list, head)
-				ww_mutex_unlock(&entry2->obj->lock);
-
-			if (ret != -EDEADLK) {
-				ww_acquire_fini(ctx);
-				return ret;
-			}
-
-			/* we lost out in a seqno race, lock and retry.. */
-			ww_mutex_lock_slow(&entry->obj->lock, ctx);
-
-			/*
-			 * Move buf to head of the list, this will point
-			 * buf->next to the first unlocked entry,
-			 * restarting the for loop.
-			 */
-			list_del(&entry->head);
-			list_add(&entry->head, list);
-		}
-	}
-
-	ww_acquire_done(ctx);
-	return 0;
-}
-
-Unlocking works the same way for both methods #1 and #2:
-
-void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
-	struct obj_entry *entry;
-
-	list_for_each_entry (entry, list, head)
-		ww_mutex_unlock(&entry->obj->lock);
-
-	ww_acquire_fini(ctx);
-}
-
-Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
-e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
-and edges can only be changed when holding the locks of all involved nodes. w/w
-mutexes are a natural fit for such a case for two reasons:
-- They can handle lock-acquisition in any order which allows us to start walking
-  a graph from a starting point and then iteratively discovering new edges and
-  locking down the nodes those edges connect to.
-- Due to the -EALREADY return code signalling that a given objects is already
-  held there's no need for additional book-keeping to break cycles in the graph
-  or keep track off which looks are already held (when using more than one node
-  as a starting point).
-
-Note that this approach differs in two important ways from the above methods:
-- Since the list of objects is dynamically constructed (and might very well be
-  different when retrying due to hitting the -EDEADLK die condition) there's
-  no need to keep any object on a persistent list when it's not locked. We can
-  therefore move the list_head into the object itself.
-- On the other hand the dynamic object list construction also means that the -EALREADY return
-  code can't be propagated.
-
-Note also that methods #1 and #2 and method #3 can be combined, e.g. to first lock a
-list of starting nodes (passed in from userspace) using one of the above
-methods. And then lock any additional objects affected by the operations using
-method #3 below. The backoff/retry procedure will be a bit more involved, since
-when the dynamic locking step hits -EDEADLK we also need to unlock all the
-objects acquired with the fixed list. But the w/w mutex debug checks will catch
-any interface misuse for these cases.
-
-Also, method 3 can't fail the lock acquisition step since it doesn't return
--EALREADY. Of course this would be different when using the _interruptible
-variants, but that's outside of the scope of these examples here.
-
-struct obj {
-	struct ww_mutex ww_mutex;
-	struct list_head locked_list;
-};
-
-static DEFINE_WW_CLASS(ww_class);
-
-void __unlock_objs(struct list_head *list)
-{
-	struct obj *entry, *temp;
-
-	list_for_each_entry_safe (entry, temp, list, locked_list) {
-		/* need to do that before unlocking, since only the current lock holder is
-		allowed to use object */
-		list_del(&entry->locked_list);
-		ww_mutex_unlock(entry->ww_mutex)
-	}
-}
-
-void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
-	struct obj *obj;
-
-	ww_acquire_init(ctx, &ww_class);
-
-retry:
-	/* re-init loop start state */
-	loop {
-		/* magic code which walks over a graph and decides which objects
-		 * to lock */
-
-		ret = ww_mutex_lock(obj->ww_mutex, ctx);
-		if (ret == -EALREADY) {
-			/* we have that one already, get to the next object */
-			continue;
-		}
-		if (ret == -EDEADLK) {
-			__unlock_objs(list);
-
-			ww_mutex_lock_slow(obj, ctx);
-			list_add(&entry->locked_list, list);
-			goto retry;
-		}
-
-		/* locked a new object, add it to the list */
-		list_add_tail(&entry->locked_list, list);
-	}
-
-	ww_acquire_done(ctx);
-	return 0;
-}
-
-void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
-{
-	__unlock_objs(list);
-	ww_acquire_fini(ctx);
-}
-
-Method 4: Only lock one single objects. In that case deadlock detection and
-prevention is obviously overkill, since with grabbing just one lock you can't
-produce a deadlock within just one class. To simplify this case the w/w mutex
-api can be used with a NULL context.
-
-Implementation Details
-----------------------
-
-Design:
-  ww_mutex currently encapsulates a struct mutex, this means no extra overhead for
-  normal mutex locks, which are far more common. As such there is only a small
-  increase in code size if wait/wound mutexes are not used.
-
-  We maintain the following invariants for the wait list:
-  (1) Waiters with an acquire context are sorted by stamp order; waiters
-      without an acquire context are interspersed in FIFO order.
-  (2) For Wait-Die, among waiters with contexts, only the first one can have
-      other locks acquired already (ctx->acquired > 0). Note that this waiter
-      may come after other waiters without contexts in the list.
-
-  The Wound-Wait preemption is implemented with a lazy-preemption scheme:
-  The wounded status of the transaction is checked only when there is
-  contention for a new lock and hence a true chance of deadlock. In that
-  situation, if the transaction is wounded, it backs off, clears the
-  wounded status and retries. A great benefit of implementing preemption in
-  this way is that the wounded transaction can identify a contending lock to
-  wait for before restarting the transaction. Just blindly restarting the
-  transaction would likely make the transaction end up in a situation where
-  it would have to back off again.
-
-  In general, not much contention is expected. The locks are typically used to
-  serialize access to resources for devices, and optimization focus should
-  therefore be directed towards the uncontended cases.
-
-Lockdep:
-  Special care has been taken to warn for as many cases of api abuse
-  as possible. Some common api abuses will be caught with
-  CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended.
-
-  Some of the errors which will be warned about:
-   - Forgetting to call ww_acquire_fini or ww_acquire_init.
-   - Attempting to lock more mutexes after ww_acquire_done.
-   - Attempting to lock the wrong mutex after -EDEADLK and
-     unlocking all mutexes.
-   - Attempting to lock the right mutex after -EDEADLK,
-     before unlocking all mutexes.
-
-   - Calling ww_mutex_lock_slow before -EDEADLK was returned.
-
-   - Unlocking mutexes with the wrong unlock function.
-   - Calling one of the ww_acquire_* twice on the same context.
-   - Using a different ww_class for the mutex than for the ww_acquire_ctx.
-   - Normal lockdep errors that can result in deadlocks.
-
-  Some of the lockdep errors that can result in deadlocks:
-   - Calling ww_acquire_init to initialize a second ww_acquire_ctx before
-     having called ww_acquire_fini on the first.
-   - 'normal' deadlocks that can occur.
-
-FIXME: Update this section once we have the TASK_DEADLOCK task state flag magic
-implemented.
diff --git a/Documentation/pi-futex.txt b/Documentation/pi-futex.txt
index b154f6c0c36e..c33ba2befbf8 100644
--- a/Documentation/pi-futex.txt
+++ b/Documentation/pi-futex.txt
@@ -119,4 +119,4 @@ properties of futexes, and all four combinations are possible: futex,
 robust-futex, PI-futex, robust+PI-futex.
 
 More details about priority inheritance can be found in
-Documentation/locking/rt-mutex.txt.
+Documentation/locking/rt-mutex.rst.
diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst
index 5fd8a1abd2be..b9a6be4b8499 100644
--- a/Documentation/translations/it_IT/kernel-hacking/locking.rst
+++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst
@@ -1404,7 +1404,7 @@ Riferimento per l'API dei Futex
 Approfondimenti
 ===============
 
--  ``Documentation/locking/spinlocks.txt``: la guida di Linus Torvalds agli
+-  ``Documentation/locking/spinlocks.rst``: la guida di Linus Torvalds agli
    spinlock del kernel.
 
 -  Unix Systems for Modern Architectures: Symmetric Multiprocessing and
diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index 81dd11901ffd..cb5671d32ada 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -36,7 +36,7 @@
  * of extra utility/tracking out of our acquire-ctx.  This is provided
  * by &struct drm_modeset_lock and &struct drm_modeset_acquire_ctx.
  *
- * For basic principles of &ww_mutex, see: Documentation/locking/ww-mutex-design.txt
+ * For basic principles of &ww_mutex, see: Documentation/locking/ww-mutex-design.rst
  *
  * The basic usage pattern is to::
  *
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 57baa27f238c..0b0d7259276d 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -5,7 +5,7 @@
  *  Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
  *  Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
  *
- * see Documentation/locking/lockdep-design.txt for more details.
+ * see Documentation/locking/lockdep-design.rst for more details.
  */
 #ifndef __LINUX_LOCKDEP_H
 #define __LINUX_LOCKDEP_H
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 3093dd162424..dcd03fee6e01 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -151,7 +151,7 @@ static inline bool mutex_is_locked(struct mutex *lock)
 
 /*
  * See kernel/locking/mutex.c for detailed documentation of these APIs.
- * Also see Documentation/locking/mutex-design.txt.
+ * Also see Documentation/locking/mutex-design.rst.
  */
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 extern void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index e401358c4e7e..9d9c663987d8 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -160,7 +160,7 @@ extern void downgrade_write(struct rw_semaphore *sem);
  * static then another method for expressing nested locking is
  * the explicit definition of lock class keys and the use of
  * lockdep_set_class() at lock initialization time.
- * See Documentation/locking/lockdep-design.txt for more details.)
+ * See Documentation/locking/lockdep-design.rst for more details.)
  */
 extern void down_read_nested(struct rw_semaphore *sem, int subclass);
 extern void down_write_nested(struct rw_semaphore *sem, int subclass);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0c601ae072b3..edd1c082dbf5 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -16,7 +16,7 @@
  *    by Steven Rostedt, based on work by Gregory Haskins, Peter Morreale
  *    and Sven Dietrich.
  *
- * Also see Documentation/locking/mutex-design.txt.
+ * Also see Documentation/locking/mutex-design.rst.
  */
 #include <linux/mutex.h>
 #include <linux/ww_mutex.h>
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 38fbf9fa7f1b..fa83d36e30c6 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -9,7 +9,7 @@
  *  Copyright (C) 2005 Kihon Technologies Inc., Steven Rostedt
  *  Copyright (C) 2006 Esben Nielsen
  *
- *  See Documentation/locking/rt-mutex-design.txt for details.
+ *  See Documentation/locking/rt-mutex-design.rst for details.
  */
 #include <linux/spinlock.h>
 #include <linux/export.h>
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4ac4ca21a30a..a858b55e8ac7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1139,7 +1139,7 @@ config PROVE_LOCKING
 	 the proof of observed correctness is also maintained for an
 	 arbitrary combination of these separate locking variants.
 
-	 For more details, see Documentation/locking/lockdep-design.txt.
+	 For more details, see Documentation/locking/lockdep-design.rst.
 
 config LOCK_STAT
 	bool "Lock usage statistics"
@@ -1153,7 +1153,7 @@ config LOCK_STAT
 	help
 	 This feature enables tracking lock contention points
 
-	 For more details, see Documentation/locking/lockstat.txt
+	 For more details, see Documentation/locking/lockstat.rst
 
 	 This also enables lock events required by "perf lock",
 	 subcommand of perf.
-- 
cgit v1.2.3-55-g7522


From 720594f691e5c8fb0624f3653b20b24ba8e57742 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sat, 13 Apr 2019 22:54:53 -0300
Subject: docs: connector: convert to ReST and rename to connector.rst

As it has some function definitions, move them to connector.h.

The remaining conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

In the new connector.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/connector/connector.rst | 156 +++++++++++++++++++++++++++
 Documentation/connector/connector.txt | 196 ----------------------------------
 drivers/w1/Kconfig                    |   2 +-
 include/linux/connector.h             |  63 ++++++++++-
 samples/Kconfig                       |   2 +-
 5 files changed, 220 insertions(+), 199 deletions(-)
 create mode 100644 Documentation/connector/connector.rst
 delete mode 100644 Documentation/connector/connector.txt

diff --git a/Documentation/connector/connector.rst b/Documentation/connector/connector.rst
new file mode 100644
index 000000000000..24e26dc22dbf
--- /dev/null
+++ b/Documentation/connector/connector.rst
@@ -0,0 +1,156 @@
+:orphan:
+
+================
+Kernel Connector
+================
+
+Kernel connector - new netlink based userspace <-> kernel space easy
+to use communication module.
+
+The Connector driver makes it easy to connect various agents using a
+netlink based network.  One must register a callback and an identifier.
+When the driver receives a special netlink message with the appropriate
+identifier, the appropriate callback will be called.
+
+From the userspace point of view it's quite straightforward:
+
+	- socket();
+	- bind();
+	- send();
+	- recv();
+
+But if kernelspace wants to use the full power of such connections, the
+driver writer must create special sockets, must know about struct sk_buff
+handling, etc...  The Connector driver allows any kernelspace agents to use
+netlink based networking for inter-process communication in a significantly
+easier way::
+
+  int cn_add_callback(struct cb_id *id, const char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
+  int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 group, gfp_t gfp_mask);
+  int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 group, gfp_t gfp_mask);
+
+  struct cb_id
+  {
+	__u32			idx;
+	__u32			val;
+  };
+
+idx and val are unique identifiers which must be registered in the
+connector.h header for in-kernel usage.  `callback` is the function
+which will be called when a message with the above idx.val is received
+by the connector core.  It is passed the message as a `struct cn_msg *`
+and the sender's credentials as a `struct netlink_skb_parms *`::
+
+  struct cn_msg
+  {
+	struct cb_id		id;
+
+	__u32			seq;
+	__u32			ack;
+
+	__u32			len;	/* Length of the following data */
+	__u8			data[0];
+  };
+
+Connector interfaces
+====================
+
+ .. kernel-doc:: include/linux/connector.h
+
+ Note:
+   When registering a new callback user, the connector core assigns a
+   netlink group to the user which is equal to its id.idx.
+
+Protocol description
+====================
+
+The current framework offers a transport layer with fixed headers.  The
+recommended protocol which uses such a header is as follows:
+
+msg->seq and msg->ack are used to determine message genealogy.  When
+someone sends a message, they use a locally unique sequence and random
+acknowledge number.  The sequence number may be copied into
+nlmsghdr->nlmsg_seq too.
+
+The sequence number is incremented with each message sent.
+
+If you expect a reply to the message, then the sequence number in the
+received message MUST be the same as in the original message, and the
+acknowledge number MUST be the same + 1.
+
+If we receive a message and its sequence number is not equal to one we
+are expecting, then it is a new message.  If we receive a message and
+its sequence number is the same as one we are expecting, but its
+acknowledge is not equal to the sequence number in the original
+message + 1, then it is a new message.
+
+Obviously, the protocol header contains the above id.
+
+The connector allows event notification in the following form: a kernel
+driver or userspace process can ask the connector to notify it when
+selected ids are turned on or off (that is, when their callbacks are
+registered or unregistered).  This is done by sending a special command
+to the connector driver (it also registers itself with id={-1, -1}).
+
+An example of this usage can be found in the cn_test.c module which
+uses the connector to request notification and to send messages.
+
+Reliability
+===========
+
+Netlink itself is not a reliable protocol.  That means that messages can
+be lost due to memory pressure or an overflowed process receive queue,
+so the caller must be prepared for that.  That is why the struct
+cn_msg [main connector's message header] contains u32 seq and u32 ack
+fields.
+
+Userspace usage
+===============
+
+2.6.14 has a new netlink socket implementation, which by default does not
+allow people to send data to netlink groups other than 1.
+So, if you wish to use a netlink socket (for example using connector)
+with a different group number, the userspace application must subscribe to
+that group first.  It can be achieved by the following pseudocode::
+
+  s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
+
+  l_local.nl_family = AF_NETLINK;
+  l_local.nl_groups = 12345;
+  l_local.nl_pid = 0;
+
+  if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
+	perror("bind");
+	close(s);
+	return -1;
+  }
+
+  {
+	int on = l_local.nl_groups;
+	setsockopt(s, 270, 1, &on, sizeof(on));
+  }
+
+Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket
+option.  To drop a multicast subscription, one should call the above socket
+option with the NETLINK_DROP_MEMBERSHIP parameter which is defined as 0.
+
+2.6.14 netlink code only allows selecting a group which is less than or
+equal to the maximum group number, which is used at netlink_kernel_create()
+time.  In case of connector it is CN_NETLINK_USERS + 0xf, so if you want to
+use group number 12345, you must increment CN_NETLINK_USERS to that number.
+Additional 0xf numbers are allocated to be used by non-in-kernel users.
+
+Due to this limitation, group 0xffffffff does not work now, so one can
+not use add/remove connector's group notifications, but as far as I know,
+only cn_test.c test module used it.
+
+Some work in the netlink area is still being done, so things may change in
+the 2.6.15 timeframe.  If that happens, the documentation will be updated
+for that kernel.
+
+Code samples
+============
+
+Sample code for a connector test module and user space can be found
+in samples/connector/. To build this code, enable CONFIG_CONNECTOR
+and CONFIG_SAMPLES.
diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt
deleted file mode 100644
index ab7ca897fab7..000000000000
--- a/Documentation/connector/connector.txt
+++ /dev/null
@@ -1,196 +0,0 @@
-/*****************************************/
-Kernel Connector.
-/*****************************************/
-
-Kernel connector - new netlink based userspace <-> kernel space easy
-to use communication module.
-
-The Connector driver makes it easy to connect various agents using a
-netlink based network.  One must register a callback and an identifier.
-When the driver receives a special netlink message with the appropriate
-identifier, the appropriate callback will be called.
-
-From the userspace point of view it's quite straightforward:
-
-	socket();
-	bind();
-	send();
-	recv();
-
-But if kernelspace wants to use the full power of such connections, the
-driver writer must create special sockets, must know about struct sk_buff
-handling, etc...  The Connector driver allows any kernelspace agents to use
-netlink based networking for inter-process communication in a significantly
-easier way:
-
-int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
-void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask);
-void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask);
-
-struct cb_id
-{
-	__u32			idx;
-	__u32			val;
-};
-
-idx and val are unique identifiers which must be registered in the
-connector.h header for in-kernel usage.  void (*callback) (void *) is a
-callback function which will be called when a message with above idx.val
-is received by the connector core.  The argument for that function must
-be dereferenced to struct cn_msg *.
-
-struct cn_msg
-{
-	struct cb_id		id;
-
-	__u32			seq;
-	__u32			ack;
-
-	__u32			len;		/* Length of the following data */
-	__u8			data[0];
-};
-
-/*****************************************/
-Connector interfaces.
-/*****************************************/
-
-int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
-
- Registers new callback with connector core.
-
- struct cb_id *id		- unique connector's user identifier.
-				  It must be registered in connector.h for legal in-kernel users.
- char *name			- connector's callback symbolic name.
- void (*callback) (struct cn..)	- connector's callback.
-				  cn_msg and the sender's credentials
-
-
-void cn_del_callback(struct cb_id *id);
-
- Unregisters new callback with connector core.
-
- struct cb_id *id		- unique connector's user identifier.
-
-
-int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __groups, int gfp_mask);
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int gfp_mask);
-
- Sends message to the specified groups.  It can be safely called from
- softirq context, but may silently fail under strong memory pressure.
- If there are no listeners for given group -ESRCH can be returned.
-
- struct cn_msg *		- message header(with attached data).
- u16 len			- for *_multi multiple cn_msg messages can be sent
- u32 port			- destination port.
- 				  If non-zero the message will be sent to the
-				  given port, which should be set to the
-				  original sender.
- u32 __group			- destination group.
-				  If port and __group is zero, then appropriate group will
-				  be searched through all registered connector users,
-				  and message will be delivered to the group which was
-				  created for user with the same ID as in msg.
-				  If __group is not zero, then message will be delivered
-				  to the specified group.
- int gfp_mask			- GFP mask.
-
- Note: When registering new callback user, connector core assigns
- netlink group to the user which is equal to its id.idx.
-
-/*****************************************/
-Protocol description.
-/*****************************************/
-
-The current framework offers a transport layer with fixed headers.  The
-recommended protocol which uses such a header is as following:
-
-msg->seq and msg->ack are used to determine message genealogy.  When
-someone sends a message, they use a locally unique sequence and random
-acknowledge number.  The sequence number may be copied into
-nlmsghdr->nlmsg_seq too.
-
-The sequence number is incremented with each message sent.
-
-If you expect a reply to the message, then the sequence number in the
-received message MUST be the same as in the original message, and the
-acknowledge number MUST be the same + 1.
-
-If we receive a message and its sequence number is not equal to one we
-are expecting, then it is a new message.  If we receive a message and
-its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the sequence number in the original
-message + 1, then it is a new message.
-
-Obviously, the protocol header contains the above id.
-
-The connector allows event notification in the following form: kernel
-driver or userspace process can ask connector to notify it when
-selected ids will be turned on or off (registered or unregistered its
-callback).  It is done by sending a special command to the connector
-driver (it also registers itself with id={-1, -1}).
-
-As example of this usage can be found in the cn_test.c module which
-uses the connector to request notification and to send messages.
-
-/*****************************************/
-Reliability.
-/*****************************************/
-
-Netlink itself is not a reliable protocol.  That means that messages can
-be lost due to memory pressure or process' receiving queue overflowed,
-so caller is warned that it must be prepared.  That is why the struct
-cn_msg [main connector's message header] contains u32 seq and u32 ack
-fields.
-
-/*****************************************/
-Userspace usage.
-/*****************************************/
-
-2.6.14 has a new netlink socket implementation, which by default does not
-allow people to send data to netlink groups other than 1.
-So, if you wish to use a netlink socket (for example using connector)
-with a different group number, the userspace application must subscribe to
-that group first.  It can be achieved by the following pseudocode:
-
-s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
-
-l_local.nl_family = AF_NETLINK;
-l_local.nl_groups = 12345;
-l_local.nl_pid = 0;
-
-if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
-	perror("bind");
-	close(s);
-	return -1;
-}
-
-{
-	int on = l_local.nl_groups;
-	setsockopt(s, 270, 1, &on, sizeof(on));
-}
-
-Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket
-option.  To drop a multicast subscription, one should call the above socket
-option with the NETLINK_DROP_MEMBERSHIP parameter which is defined as 0.
-
-2.6.14 netlink code only allows to select a group which is less or equal to
-the maximum group number, which is used at netlink_kernel_create() time.
-In case of connector it is CN_NETLINK_USERS + 0xf, so if you want to use
-group number 12345, you must increment CN_NETLINK_USERS to that number.
-Additional 0xf numbers are allocated to be used by non-in-kernel users.
-
-Due to this limitation, group 0xffffffff does not work now, so one can
-not use add/remove connector's group notifications, but as far as I know, 
-only cn_test.c test module used it.
-
-Some work in netlink area is still being done, so things can be changed in
-2.6.15 timeframe, if it will happen, documentation will be updated for that
-kernel.
-
-/*****************************************/
-Code samples
-/*****************************************/
-
-Sample code for a connector test module and user space can be found
-in samples/connector/. To build this code, enable CONFIG_CONNECTOR
-and CONFIG_SAMPLES.
diff --git a/drivers/w1/Kconfig b/drivers/w1/Kconfig
index 03dd57581df7..160053c0baea 100644
--- a/drivers/w1/Kconfig
+++ b/drivers/w1/Kconfig
@@ -19,7 +19,7 @@ config W1_CON
 	default y
 	---help---
 	  This allows to communicate with userspace using connector. For more
-	  information see <file:Documentation/connector/connector.txt>.
+	  information see <file:Documentation/connector/connector.rst>.
 	  There are three types of messages between w1 core and userspace:
 	  1. Events. They are generated each time new master or slave device found
 		either due to automatic or requested search.
diff --git a/include/linux/connector.h b/include/linux/connector.h
index 1d72ef76f24f..6b6c7396a584 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -55,10 +55,71 @@ struct cn_dev {
 	struct cn_queue_dev *cbdev;
 };
 
+/**
+ * cn_add_callback() - Registers new callback with connector core.
+ *
+ * @id:		unique connector's user identifier.
+ *		It must be registered in connector.h for legal
+ *		in-kernel users.
+ * @name:	connector's callback symbolic name.
+ * @callback:	connector's callback.
+ * 		parameters are %cn_msg and the sender's credentials
+ */
 int cn_add_callback(struct cb_id *id, const char *name,
 		    void (*callback)(struct cn_msg *, struct netlink_skb_parms *));
-void cn_del_callback(struct cb_id *);
+/**
+ * cn_del_callback() - Unregisters a callback from connector core.
+ *
+ * @id:		unique connector's user identifier.
+ */
+void cn_del_callback(struct cb_id *id);
+
+
+/**
+ * cn_netlink_send_mult - Sends message to the specified groups.
+ *
+ * @msg: 	message header(with attached data).
+ * @len:	Number of @msg to be sent.
+ * @portid:	destination port.
+ *		If non-zero the message will be sent to the given port,
+ *		which should be set to the original sender.
+ * @group:	destination group.
+ *		If @portid and @group are zero, then appropriate group will
+ *		be searched through all registered connector users, and
+ *		message will be delivered to the group which was created
+ *		for user with the same ID as in @msg.
+ *		If @group is not zero, then message will be delivered
+ *		to the specified group.
+ * @gfp_mask:	GFP mask.
+ *
+ * It can be safely called from softirq context, but may silently
+ * fail under strong memory pressure.
+ *
+ * If there are no listeners for given group %-ESRCH can be returned.
+ */
 int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 group, gfp_t gfp_mask);
+
+/**
+ * cn_netlink_send - Sends message to the specified groups.
+ *
+ * @msg:	message header(with attached data).
+ * @portid:	destination port.
+ *		If non-zero the message will be sent to the given port,
+ *		which should be set to the original sender.
+ * @group:	destination group.
+ *		If @portid and @group are zero, then appropriate group will
+ *		be searched through all registered connector users, and
+ *		message will be delivered to the group which was created
+ *		for user with the same ID as in @msg.
+ *		If @group is not zero, then message will be delivered
+ *		to the specified group.
+ * @gfp_mask:	GFP mask.
+ *
+ * It can be safely called from softirq context, but may silently
+ * fail under strong memory pressure.
+ *
+ * If there are no listeners for given group %-ESRCH can be returned.
+ */
 int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 group, gfp_t gfp_mask);
 
 int cn_queue_add_callback(struct cn_queue_dev *dev, const char *name,
diff --git a/samples/Kconfig b/samples/Kconfig
index 71b5e833dd9e..155da47dc6a4 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -99,7 +99,7 @@ config SAMPLE_CONNECTOR
 	  When enabled, this builds both a sample kernel module for
 	  the connector interface and a user space tool to communicate
 	  with it.
-	  See also Documentation/connector/connector.txt
+	  See also Documentation/connector/connector.rst
 
 config SAMPLE_HIDRAW
 	bool "hidraw sample"
-- 
cgit v1.2.3-55-g7522


From 065504d5b45bc780b8da221162145a4c9ec67ffc Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 07:59:32 -0300
Subject: docs: lcd-panel-cgram.txt: convert docs to ReST and rename to *.rst

This small text file describes the usage of parallel port LCD
displays from userspace PoV. So, a good candidate for the
admin guide.

While this is not part of the admin-guide book, mark it as
:orphan:, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/auxdisplay/lcd-panel-cgram.rst | 29 ++++++++++++++++++++++++++++
 Documentation/auxdisplay/lcd-panel-cgram.txt | 24 -----------------------
 MAINTAINERS                                  |  2 +-
 3 files changed, 30 insertions(+), 25 deletions(-)
 create mode 100644 Documentation/auxdisplay/lcd-panel-cgram.rst
 delete mode 100644 Documentation/auxdisplay/lcd-panel-cgram.txt

diff --git a/Documentation/auxdisplay/lcd-panel-cgram.rst b/Documentation/auxdisplay/lcd-panel-cgram.rst
new file mode 100644
index 000000000000..dfef50286018
--- /dev/null
+++ b/Documentation/auxdisplay/lcd-panel-cgram.rst
@@ -0,0 +1,29 @@
+:orphan:
+
+======================================
+Parallel port LCD/Keypad Panel support
+======================================
+
+Some LCDs allow you to define up to 8 characters, mapped to ASCII
+characters 0 to 7. The escape code to define a new character is
+'\e[LG' followed by one digit from 0 to 7, representing the character
+number, and up to 8 pairs of hex digits terminated by a semi-colon
+(';'). Each pair of digits represents a line, with 1-bits for each
+illuminated pixel with LSB on the right. Lines are numbered from the
+top of the character to the bottom. On a 5x7 matrix, only the 5 lower
+bits of the first 7 bytes are used for each character. If the string
+is incomplete, only complete lines will be redefined. Here are some
+examples::
+
+  printf "\e[LG0010101050D1F0C04;"  => 0 = [enter]
+  printf "\e[LG1040E1F0000000000;"  => 1 = [up]
+  printf "\e[LG2000000001F0E0400;"  => 2 = [down]
+  printf "\e[LG3040E1F001F0E0400;"  => 3 = [up-down]
+  printf "\e[LG40002060E1E0E0602;"  => 4 = [left]
+  printf "\e[LG500080C0E0F0E0C08;"  => 5 = [right]
+  printf "\e[LG60016051516141400;"  => 6 = "IP"
+
+  printf "\e[LG00103071F1F070301;"  => big speaker
+  printf "\e[LG00002061E1E060200;"  => small speaker
+
+Willy
diff --git a/Documentation/auxdisplay/lcd-panel-cgram.txt b/Documentation/auxdisplay/lcd-panel-cgram.txt
deleted file mode 100644
index 7f82c905763d..000000000000
--- a/Documentation/auxdisplay/lcd-panel-cgram.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Some LCDs allow you to define up to 8 characters, mapped to ASCII
-characters 0 to 7. The escape code to define a new character is
-'\e[LG' followed by one digit from 0 to 7, representing the character
-number, and up to 8 couples of hex digits terminated by a semi-colon
-(';'). Each couple of digits represents a line, with 1-bits for each
-illuminated pixel with LSB on the right. Lines are numbered from the
-top of the character to the bottom. On a 5x7 matrix, only the 5 lower
-bits of the 7 first bytes are used for each character. If the string
-is incomplete, only complete lines will be redefined. Here are some
-examples :
-
-  printf "\e[LG0010101050D1F0C04;"  => 0 = [enter]
-  printf "\e[LG1040E1F0000000000;"  => 1 = [up]
-  printf "\e[LG2000000001F0E0400;"  => 2 = [down]
-  printf "\e[LG3040E1F001F0E0400;"  => 3 = [up-down]
-  printf "\e[LG40002060E1E0E0602;"  => 4 = [left]
-  printf "\e[LG500080C0E0F0E0C08;"  => 5 = [right]
-  printf "\e[LG60016051516141400;"  => 6 = "IP"
-
-  printf "\e[LG00103071F1F070301;"  => big speaker
-  printf "\e[LG00002061E1E060200;"  => small speaker
-
-Willy
-
diff --git a/MAINTAINERS b/MAINTAINERS
index f5533d1bda2e..01c83e294f5f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12058,7 +12058,7 @@ PARALLEL LCD/KEYPAD PANEL DRIVER
 M:	Willy Tarreau <willy@haproxy.com>
 M:	Ksenija Stanojevic <ksenija.stanojevic@gmail.com>
 S:	Odd Fixes
-F:	Documentation/auxdisplay/lcd-panel-cgram.txt
+F:	Documentation/auxdisplay/lcd-panel-cgram.rst
 F:	drivers/auxdisplay/panel.c
 
 PARALLEL PORT SUBSYSTEM
-- 
cgit v1.2.3-55-g7522


From 6f2846cc2ebae4a8c875389e3aedb0cda3c4f462 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 08:03:23 -0300
Subject: docs: lp855x-driver.txt: convert to ReST and move to kernel-api

This small file seems to be an attempt to start documenting
backlight drivers.

It contains descriptions of the controls for the driver,
which could sound like a somewhat user-facing description, but
its main focus is instead to describe the data that should
be passed via platform data and some driver-specific stuff.

While this is not part of the driver-api book, mark it as
:orphan:, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/backlight/lp855x-driver.rst | 83 +++++++++++++++++++++++++++++++
 Documentation/backlight/lp855x-driver.txt | 66 ------------------------
 MAINTAINERS                               |  2 +-
 3 files changed, 84 insertions(+), 67 deletions(-)
 create mode 100644 Documentation/backlight/lp855x-driver.rst
 delete mode 100644 Documentation/backlight/lp855x-driver.txt

diff --git a/Documentation/backlight/lp855x-driver.rst b/Documentation/backlight/lp855x-driver.rst
new file mode 100644
index 000000000000..62b7ed847a77
--- /dev/null
+++ b/Documentation/backlight/lp855x-driver.rst
@@ -0,0 +1,83 @@
+:orphan:
+
+====================
+Kernel driver lp855x
+====================
+
+Backlight driver for LP855x ICs
+
+Supported chips:
+
+	Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8555, LP8556 and
+	LP8557
+
+Author: Milo(Woogyom) Kim <milo.kim@ti.com>
+
+Description
+-----------
+
+* Brightness control
+
+  Brightness can be controlled by the pwm input or the i2c command.
+  The lp855x driver supports both cases.
+
+* Device attributes
+
+  1) bl_ctl_mode
+
+  Backlight control mode.
+
+  Value: pwm based or register based
+
+  2) chip_id
+
+  The lp855x chip id.
+
+  Value: lp8550/lp8551/lp8552/lp8553/lp8555/lp8556/lp8557
+
+Platform data for lp855x
+------------------------
+
+For supporting platform specific data, the lp855x platform data can be used.
+
+* name:
+	Backlight driver name. If it is not defined, default name is set.
+* device_control:
+	Value of DEVICE CONTROL register.
+* initial_brightness:
+	Initial value of backlight brightness.
+* period_ns:
+	Platform specific PWM period value. The unit is nanoseconds.
+	Only valid when brightness is in pwm input mode.
+* size_program:
+	Total size of lp855x_rom_data.
+* rom_data:
+	List of new eeprom/eprom registers.
+
+Examples
+--------
+
+1) lp8552 platform data: i2c register mode with new eeprom data::
+
+    #define EEPROM_A5_ADDR	0xA5
+    #define EEPROM_A5_VAL	0x4f	/* EN_VSYNC=0 */
+
+    static struct lp855x_rom_data lp8552_eeprom_arr[] = {
+	{EEPROM_A5_ADDR, EEPROM_A5_VAL},
+    };
+
+    static struct lp855x_platform_data lp8552_pdata = {
+	.name = "lcd-bl",
+	.device_control = I2C_CONFIG(LP8552),
+	.initial_brightness = INITIAL_BRT,
+	.size_program = ARRAY_SIZE(lp8552_eeprom_arr),
+	.rom_data = lp8552_eeprom_arr,
+    };
+
+2) lp8556 platform data: pwm input mode with default rom data::
+
+    static struct lp855x_platform_data lp8556_pdata = {
+	.device_control = PWM_CONFIG(LP8556),
+	.initial_brightness = INITIAL_BRT,
+	.period_ns = 1000000,
+    };
diff --git a/Documentation/backlight/lp855x-driver.txt b/Documentation/backlight/lp855x-driver.txt
deleted file mode 100644
index 01bce243d3d7..000000000000
--- a/Documentation/backlight/lp855x-driver.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-Kernel driver lp855x
-====================
-
-Backlight driver for LP855x ICs
-
-Supported chips:
-	Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8555, LP8556 and
-	LP8557
-
-Author: Milo(Woogyom) Kim <milo.kim@ti.com>
-
-Description
------------
-
-* Brightness control
-
-Brightness can be controlled by the pwm input or the i2c command.
-The lp855x driver supports both cases.
-
-* Device attributes
-
-1) bl_ctl_mode
-Backlight control mode.
-Value : pwm based or register based
-
-2) chip_id
-The lp855x chip id.
-Value : lp8550/lp8551/lp8552/lp8553/lp8555/lp8556/lp8557
-
-Platform data for lp855x
-------------------------
-
-For supporting platform specific data, the lp855x platform data can be used.
-
-* name : Backlight driver name. If it is not defined, default name is set.
-* device_control : Value of DEVICE CONTROL register.
-* initial_brightness : Initial value of backlight brightness.
-* period_ns : Platform specific PWM period value. unit is nano.
-	     Only valid when brightness is pwm input mode.
-* size_program : Total size of lp855x_rom_data.
-* rom_data : List of new eeprom/eprom registers.
-
-example 1) lp8552 platform data : i2c register mode with new eeprom data
-
-#define EEPROM_A5_ADDR	0xA5
-#define EEPROM_A5_VAL	0x4f	/* EN_VSYNC=0 */
-
-static struct lp855x_rom_data lp8552_eeprom_arr[] = {
-	{EEPROM_A5_ADDR, EEPROM_A5_VAL},
-};
-
-static struct lp855x_platform_data lp8552_pdata = {
-	.name = "lcd-bl",
-	.device_control = I2C_CONFIG(LP8552),
-	.initial_brightness = INITIAL_BRT,
-	.size_program = ARRAY_SIZE(lp8552_eeprom_arr),
-	.rom_data = lp8552_eeprom_arr,
-};
-
-example 2) lp8556 platform data : pwm input mode with default rom data
-
-static struct lp855x_platform_data lp8556_pdata = {
-	.device_control = PWM_CONFIG(LP8556),
-	.initial_brightness = INITIAL_BRT,
-	.period_ns = 1000000,
-};
diff --git a/MAINTAINERS b/MAINTAINERS
index 01c83e294f5f..37ba75bae7aa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15964,7 +15964,7 @@ F:	sound/soc/codecs/isabelle*
 TI LP855x BACKLIGHT DRIVER
 M:	Milo Kim <milo.kim@ti.com>
 S:	Maintained
-F:	Documentation/backlight/lp855x-driver.txt
+F:	Documentation/backlight/lp855x-driver.rst
 F:	drivers/video/backlight/lp855x_bl.c
 F:	include/linux/platform_data/lp855x.h
 
-- 
cgit v1.2.3-55-g7522


From 23e02422877b7fac868d8610a4265003da4ac0f4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 08:27:15 -0300
Subject: docs: m68k: convert docs to ReST and rename to *.rst

Convert the m68k kernel-options.txt file to ReST.

The conversion is trivial, as the document is already in a format
close enough to ReST. Just some small adjustments were needed in
order to make it parseable while keeping it in good shape as
plain text.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.rst |   2 +-
 Documentation/m68k/index.rst                    |  17 +
 Documentation/m68k/kernel-options.rst           | 911 ++++++++++++++++++++++++
 Documentation/m68k/kernel-options.txt           | 884 -----------------------
 4 files changed, 929 insertions(+), 885 deletions(-)
 create mode 100644 Documentation/m68k/index.rst
 create mode 100644 Documentation/m68k/kernel-options.rst
 delete mode 100644 Documentation/m68k/kernel-options.txt

diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 5d29ba5ad88c..d05d531b4ec9 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -118,7 +118,7 @@ parameter is applicable::
 	LOOP	Loopback device support is enabled.
 	M68k	M68k architecture is enabled.
 			These options have more detailed description inside of
-			Documentation/m68k/kernel-options.txt.
+			Documentation/m68k/kernel-options.rst.
 	MDA	MDA console support is enabled.
 	MIPS	MIPS architecture is enabled.
 	MOUSE	Appropriate mouse support is enabled.
diff --git a/Documentation/m68k/index.rst b/Documentation/m68k/index.rst
new file mode 100644
index 000000000000..f3273ec075c3
--- /dev/null
+++ b/Documentation/m68k/index.rst
@@ -0,0 +1,17 @@
+:orphan:
+
+=================
+m68k Architecture
+=================
+
+.. toctree::
+   :maxdepth: 2
+
+   kernel-options
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/m68k/kernel-options.rst b/Documentation/m68k/kernel-options.rst
new file mode 100644
index 000000000000..cabd9419740d
--- /dev/null
+++ b/Documentation/m68k/kernel-options.rst
@@ -0,0 +1,911 @@
+===================================
+Command Line Options for Linux/m68k
+===================================
+
+Last Update: 2 May 1999
+
+Linux/m68k version: 2.2.6
+
+Author: Roman.Hodek@informatik.uni-erlangen.de (Roman Hodek)
+
+Update: jds@kom.auc.dk (Jes Sorensen) and faq@linux-m68k.org (Chris Lawrence)
+
+0) Introduction
+===============
+
+Often I've been asked which command line options the Linux/m68k
+kernel understands, or how the exact syntax for the ... option is, or
+... about the option ... . I hope this document supplies all the
+answers...
+
+Note that some options might be outdated, their descriptions being
+incomplete or missing. Please update the information and send in the
+patches.
+
+
+1) Overview of the Kernel's Option Processing
+=============================================
+
+The kernel knows three kinds of options on its command line:
+
+  1) kernel options
+  2) environment settings
+  3) arguments for init
+
+To which of these classes an argument belongs is determined as
+follows: If the option is known to the kernel itself, i.e. if the name
+(the part before the '=') or, in some cases, the whole argument string
+is known to the kernel, it belongs to class 1. Otherwise, if the
+argument contains an '=', it is of class 2, and the definition is put
+into init's environment. All other arguments are passed to init as
+command line options.
+
+This document describes the valid kernel options for Linux/m68k in
+the version mentioned at the start of this file. Later revisions may
+add new such options, and some may be missing in older versions.
+
+In general, the value (the part after the '=') of an option is a
+list of values separated by commas. The interpretation of these values
+is up to the driver that "owns" the option. This association of
+options with drivers is also the reason that some are further
+subdivided.
+
+
+2) General Kernel Options
+=========================
+
+2.1) root=
+----------
+
+:Syntax: root=/dev/<device>
+:or:     root=<hex_number>
+
+This tells the kernel which device it should mount as the root
+filesystem. The device must be a block device with a valid filesystem
+on it.
+
+The first syntax gives the device by name. These names are converted
+into a major/minor number internally in the kernel in an unusual way.
+Normally, this "conversion" is done by the device files in /dev, but
+this isn't possible here, because the root filesystem (with /dev)
+isn't mounted yet... So the kernel parses the name itself, with some
+hardcoded name to number mappings. The name must always be a
+combination of two or three letters, followed by a decimal number.
+Valid names are::
+
+  /dev/ram: -> 0x0100 (initial ramdisk)
+  /dev/hda: -> 0x0300 (first IDE disk)
+  /dev/hdb: -> 0x0340 (second IDE disk)
+  /dev/sda: -> 0x0800 (first SCSI disk)
+  /dev/sdb: -> 0x0810 (second SCSI disk)
+  /dev/sdc: -> 0x0820 (third SCSI disk)
+  /dev/sdd: -> 0x0830 (fourth SCSI disk)
+  /dev/sde: -> 0x0840 (fifth SCSI disk)
+  /dev/fd : -> 0x0200 (floppy disk)
+
+The name must be followed by a decimal number that stands for the
+partition number. Internally, the value of the number is just
+added to the device number mentioned in the table above. The
+exceptions are /dev/ram and /dev/fd, where /dev/ram refers to an
+initial ramdisk loaded by your bootstrap program (please consult the
+instructions for your bootstrap program to find out how to load an
+initial ramdisk). As of kernel version 2.0.18 you must specify
+/dev/ram as the root device if you want to boot from an initial
+ramdisk. For the floppy devices, /dev/fd, the number stands for the
+floppy drive number (there are no partitions on floppy disks). I.e.,
+/dev/fd0 stands for the first drive, /dev/fd1 for the second, and so
+on. Since the number is just added, you can also force the disk format
+by adding a number greater than 3. If you look into your /dev
+directory, you can see that /dev/fd0D720 has major 2 and minor 16. You
+can specify this device for the root FS by writing "root=/dev/fd16" on
+the kernel command line.
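+
+As a worked illustration of the addition described above (the device
+choices are only examples), the table entry and the number following
+the name combine like this::
+
+  root=/dev/sda3 -> 0x0800 + 3  = 0x0803 (partition 3, first SCSI disk)
+  root=/dev/hdb1 -> 0x0340 + 1  = 0x0341 (partition 1, second IDE disk)
+  root=/dev/fd1  -> 0x0200 + 1  = 0x0201 (second floppy drive)
+  root=/dev/fd16 -> 0x0200 + 16 = 0x0210 (/dev/fd0D720)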
+
+[Strange and maybe uninteresting stuff ON]
+
+This unusual translation of device names has some strange
+consequences: If, for example, you have a symbolic link from /dev/fd
+to /dev/fd0D720 as an abbreviation for floppy drive #0 in DD format,
+you cannot use this name for specifying the root device, because the
+kernel cannot see this symlink before mounting the root FS and it
+isn't in the table above. If you use it, the root device will not be
+set at all, without an error message. Another example: You cannot use a
+partition on e.g. the sixth SCSI disk as the root filesystem, if you
+want to specify it by name. This is because only the devices up to
+/dev/sde are in the table above, but not /dev/sdf. You can still
+use the sixth SCSI disk for the root FS, but you have to specify the
+device by number... (see below). Or, even stranger, you can use the
+fact that there is no range checking of the partition number, and your
+knowledge that each disk uses 16 minors, and write "root=/dev/sde17"
+(for /dev/sdf1).
+
+[Strange and maybe uninteresting stuff OFF]
+
+If the device containing your root partition isn't in the table
+above, you can also specify it by major and minor numbers. These are
+written in hex, with no prefix and no separator between. E.g., if you
+have a CD with contents appropriate as a root filesystem in the first
+SCSI CD-ROM drive, you boot from it by "root=0b00". Here, hex "0b" =
+decimal 11 is the major of SCSI CD-ROMs, and the minor 0 stands for
+the first of these. You can find out all valid major numbers by
+looking into include/linux/major.h.
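+
+To illustrate the hex notation in the four-digit form used here, the
+first two digits are the major and the last two the minor. The second
+line is an assumed example, derived from the table above and the 16
+minors per disk mentioned earlier::
+
+  root=0b00 -> major 0x0b (11), minor 0x00: first SCSI CD-ROM
+  root=0851 -> major 0x08 (8),  minor 0x51: /dev/sdf1 (0x0840 + 17)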
+
+In addition to major and minor numbers, if the device containing your
+root partition uses a partition table format with unique partition
+identifiers, then you may use them.  For instance,
+"root=PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF".  It is also
+possible to reference another partition on the same device using a
+known partition UUID as the starting point.  For example,
+if partition 5 of the device has the UUID of
+00112233-4455-6677-8899-AABBCCDDEEFF then partition 3 may be found as
+follows::
+
+  PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF/PARTNROFF=-2
+
+Authoritative information can be found in
+"Documentation/admin-guide/kernel-parameters.rst".
+
+
+2.2) ro, rw
+-----------
+
+:Syntax: ro
+:or:     rw
+
+These two options tell the kernel whether it should mount the root
+filesystem read-only or read-write. The default is read-only, except
+for ramdisks, which default to read-write.
+
+
+2.3) debug
+----------
+
+:Syntax: debug
+
+This raises the kernel log level to 10 (the default is 7). This is the
+same level as set by the "dmesg" command, just that the maximum level
+selectable by dmesg is 8.
+
+
+2.4) debug=
+-----------
+
+:Syntax: debug=<device>
+
+This option causes certain kernel messages to be printed to the selected
+debugging device. This can aid debugging the kernel, since the
+messages can be captured and analyzed on some other machine. Which
+devices are possible depends on the machine type. There are no checks
+for the validity of the device name. If the device isn't implemented,
+nothing happens.
+
+Messages logged this way are in general stack dumps after kernel
+memory faults or bad kernel traps, and kernel panics. To be exact: all
+messages of level 0 (panic messages) and all messages printed while
+the log level is 8 or more (their level doesn't matter). Before stack
+dumps, the kernel sets the log level to 10 automatically. A level of
+at least 8 can also be set by the "debug" command line option (see
+2.3) and at run time with "dmesg -n 8".
+
+Devices possible for Amiga:
+
+ - "ser":
+	  built-in serial port; parameters: 9600bps, 8N1
+ - "mem":
+	  Save the messages to a reserved area in chip mem. After
+          rebooting, they can be read under AmigaOS with the tool
+          'dmesg'.
+
+Devices possible for Atari:
+
+ - "ser1":
+	   ST-MFP serial port ("Modem1"); parameters: 9600bps, 8N1
+ - "ser2":
+	   SCC channel B serial port ("Modem2"); parameters: 9600bps, 8N1
+ - "ser" :
+	   default serial port
+           This is "ser2" for a Falcon, and "ser1" for any other machine
+ - "midi":
+	   The MIDI port; parameters: 31250bps, 8N1
+ - "par" :
+	   parallel port
+
+           The printing routine for this implements a timeout for the
+           case there's no printer connected (else the kernel would
+           lock up). The timeout is not exact, but usually a few
+           seconds.
+
+
+2.6) ramdisk_size=
+------------------
+
+:Syntax: ramdisk_size=<size>
+
+This option instructs the kernel to set up a ramdisk of the given
+size in KBytes. Do not use this option if the ramdisk contents are
+passed by bootstrap! In this case, the size is selected automatically
+and should not be overwritten.
+
+The only application is for root filesystems on floppy disks that
+should be loaded into memory. To do that, select the corresponding
+size of the disk as ramdisk size, and set the root device to the disk
+drive (with "root=").
+
+
+2.7) swap=
+----------
+  I can't find any sign of this option in 2.2.6.
+
+2.8) buff=
+-----------
+
+  I can't find any sign of this option in 2.2.6.
+
+
+3) General Device Options (Amiga and Atari)
+===========================================
+
+3.1) ether=
+-----------
+
+:Syntax: ether=[<irq>[,<base_addr>[,<mem_start>[,<mem_end>]]]],<dev-name>
+
+<dev-name> is the name of a net driver, as specified in
+drivers/net/Space.c in the Linux source. Most prominent are eth0, ...
+eth3, sl0, ... sl3, ppp0, ..., ppp3, dummy, and lo.
+
+The non-ethernet drivers (sl, ppp, dummy, lo) obviously ignore the
+settings made by this option. Also, the existing ethernet drivers for
+Linux/m68k (ariadne, a2065, hydra) don't use them because Zorro boards
+are really Plug-'n-Play, so the "ether=" option is useless altogether
+for Linux/m68k.
+
+
+3.2) hd=
+--------
+
+:Syntax: hd=<cylinders>,<heads>,<sectors>
+
+This option sets the disk geometry of an IDE disk. The first hd=
+option is for the first IDE disk, the second for the second one.
+(I.e., you can give this option twice.) In most cases, you won't have
+to use this option, since the kernel can obtain the geometry data
+itself. It exists just for the case that this fails for one of your
+disks.
+
+
+3.3) max_scsi_luns=
+-------------------
+
+:Syntax: max_scsi_luns=<n>
+
+Sets the maximum number of LUNs (logical units) of SCSI devices to
+be scanned. Valid values for <n> are between 1 and 8. Default is 8 if
+"Probe all LUNs on each SCSI device" was selected during the kernel
+configuration, else 1.
+
+
+3.4) st=
+--------
+
+:Syntax: st=<buffer_size>,[<write_thres>,[<max_buffers>]]
+
+Sets several parameters of the SCSI tape driver. <buffer_size> is
+the number of 512-byte buffers reserved for tape operations for each
+device. <write_thres> sets the number of blocks which must be filled
+to start an actual write operation to the tape. Maximum value is the
+total number of buffers. <max_buffers> limits the total number of
+buffers allocated for all tape devices.
+
+
+3.5) dmasound=
+--------------
+
+:Syntax: dmasound=[<buffers>,<buffer-size>[,<catch-radius>]]
+
+This option controls some configurations of the Linux/m68k DMA sound
+driver (Amiga and Atari): <buffers> is the number of buffers you want
+to use (minimum 4, default 4), <buffer-size> is the size of each
+buffer in kilobytes (minimum 4, default 32) and <catch-radius> says
+what percentage of error will be tolerated when setting a frequency
+(maximum 10, default 0). For example with 3% you can play 8000Hz
+AU-Files on the Falcon with its hardware frequency of 8195Hz and thus
+don't need to expand the sound.
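+
+As a rough check of the figures above (which frequency the driver takes
+as the base of the percentage is not stated here, so both are shown)::
+
+  8195 Hz - 8000 Hz = 195 Hz
+  195 / 8000 = 2.44 %
+  195 / 8195 = 2.38 %
+
+Either way the deviation is below 3 %, so a setting such as
+"dmasound=4,32,3" (default buffers and buffer size, catch radius 3)
+would be expected to accept the hardware frequency.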
+
+
+
+4) Options for Atari Only
+=========================
+
+4.1) video=
+-----------
+
+:Syntax: video=<fbname>:<sub-options...>
+
+The <fbname> parameter specifies the name of the frame buffer,
+e.g. most Atari users will want to specify `atafb` here. The
+<sub-options> is a comma-separated list of the sub-options listed
+below.
+
+NB:
+    Please notice that this option was renamed from `atavideo` to
+    `video` during the development of the 1.3.x kernels, thus you
+    might need to update your boot-scripts if upgrading to 2.x from
+    a 1.2.x kernel.
+
+NBB:
+    The behavior of video= was changed in 2.1.57 so the recommended
+    option is to specify the name of the frame buffer.
+
+4.1.1) Video Mode
+-----------------
+
+This sub-option may be any of the predefined video modes, as listed
+in atari/atafb.c in the Linux/m68k source tree. The kernel will
+activate the given video mode at boot time and make it the default
+mode, if the hardware allows. Currently defined names are:
+
+ - stlow           : 320x200x4
+ - stmid, default5 : 640x200x2
+ - sthigh, default4: 640x400x1
+ - ttlow           : 320x480x8, TT only
+ - ttmid, default1 : 640x480x4, TT only
+ - tthigh, default2: 1280x960x1, TT only
+ - vga2            : 640x480x1, Falcon only
+ - vga4            : 640x480x2, Falcon only
+ - vga16, default3 : 640x480x4, Falcon only
+ - vga256          : 640x480x8, Falcon only
+ - falh2           : 896x608x1, Falcon only
+ - falh16          : 896x608x4, Falcon only
+
+If no video mode is given on the command line, the kernel tries the
+modes named "default<n>" in turn, until one is possible with the
+hardware in use.
+
+A video mode setting doesn't make sense if the external driver is
+activated by an "external:" sub-option.
+
+4.1.2) inverse
+--------------
+
+Invert the display. This affects both text (consoles) and graphics
+(X) display. Usually, the background is chosen to be black. With this
+option, you can make the background white.
+
+4.1.3) font
+-----------
+
+:Syntax: font:<fontname>
+
+Specify the font to use in text modes. Currently you can choose only
+between `VGA8x8`, `VGA8x16` and `PEARL8x8`. `VGA8x8` is default, if the
+vertical size of the display is less than 400 pixel rows. Otherwise, the
+`VGA8x16` font is the default.
+
+4.1.4) `hwscroll_`
+------------------
+
+:Syntax: `hwscroll_<n>`
+
+The number of additional lines of video memory to reserve for
+speeding up the scrolling ("hardware scrolling"). Hardware scrolling
+is possible only if the kernel can set the video base address in steps
+fine enough. This is true for STE, MegaSTE, TT, and Falcon. It is not
+possible with plain STs and graphics cards (The former because the
+base address must be on a 256 byte boundary there, the latter because
+the kernel doesn't know how to set the base address at all.)
+
+By default, <n> is set to the number of visible text lines on the
+display. Thus, the amount of video memory is doubled, compared to no
+hardware scrolling. You can turn off the hardware scrolling altogether
+by setting <n> to 0.
+
+4.1.5) internal:
+----------------
+
+:Syntax: internal:<xres>;<yres>[;<xres_max>;<yres_max>;<offset>]
+
+This option specifies the capabilities of some extended internal video
+hardware, like e.g. OverScan. <xres> and <yres> give the (extended)
+dimensions of the screen.
+
+If your OverScan needs a black border, you have to write the last
+three arguments of the "internal:". <xres_max> is the maximum line
+length the hardware allows, <yres_max> the maximum number of lines.
+<offset> is the offset of the visible part of the screen memory to its
+physical start, in bytes.
+
+Often, extended internal video hardware has to be activated somehow.
+For this, see the "sw_*" options below.
+
+4.1.6) external:
+----------------
+
+:Syntax:
+  external:<xres>;<yres>;<depth>;<org>;<scrmem>[;<scrlen>[;<vgabase>
+  [;<colw>[;<coltype>[;<xres_virtual>]]]]]
+
+.. I had to break this line...
+
+This is probably the most complicated parameter... It specifies that
+you have some external video hardware (a graphics board), and how to
+use it under Linux/m68k. The kernel cannot know more about the hardware
+than you tell it here! The kernel also is unable to set or change any
+video modes, since it doesn't know any board internals. So, you
+have to switch to that video mode before you start Linux, and cannot
+switch to another mode once Linux has started.
+
+The first 3 parameters of this sub-option should be obvious: <xres>,
+<yres> and <depth> give the dimensions of the screen and the number of
+planes (depth). The depth is the logarithm to base 2 of the number
+of colors possible. (Or, the other way round: The number of colors is
+2^depth).
+
+You have to tell the kernel furthermore how the video memory is
+organized. This is done by a letter as <org> parameter:
+
+ 'n':
+      "normal planes", i.e. one whole plane after another
+ 'i':
+      "interleaved planes", i.e. 16 bits of the first plane, then 16 bits
+      of the next, and so on... This mode is used only with the
+      built-in Atari video modes, I think there is no card that
+      supports this mode.
+ 'p':
+      "packed pixels", i.e. <depth> consecutive bits stand for all
+      planes of one pixel; this is the most common mode for 8 planes
+      (256 colors) on graphic cards
+ 't':
+      "true color" (more or less packed pixels, but without a color
+      lookup table); usually depth is 24
+
+For monochrome modes (i.e., <depth> is 1), the <org> letter has a
+different meaning:
+
+ 'n':
+      normal colors, i.e. 0=white, 1=black
+ 'i':
+      inverted colors, i.e. 0=black, 1=white
+
+The next important information about the video hardware is the base
+address of the video memory. That is given in the <scrmem> parameter,
+as a hexadecimal number with a "0x" prefix. You have to find out this
+address in the documentation of your hardware.
+
+The next parameter, <scrlen>, tells the kernel about the size of the
+video memory. If it's missing, the size is calculated from <xres>,
+<yres>, and <depth>. For now, it is not useful to write a value here.
+It would be used only for hardware scrolling (which isn't possible
+with the external driver, because the kernel cannot set the video base
+address), or for virtual resolutions under X (which the X server
+doesn't support yet). So, it's currently best to leave this field
+empty, either by ending the "external:" after the video address or by
+writing two consecutive semicolons, if you want to give a <vgabase>
+(it is allowed to leave this parameter empty).
+
+The <vgabase> parameter is optional. If it is not given, the kernel
+cannot read or write any color registers of the video hardware, and
+thus you have to set appropriate colors before you start Linux. But if
+your card is somehow VGA compatible, you can tell the kernel the base
+address of the VGA register set, so it can change the color lookup
+table. You have to look up this address in your board's documentation.
+To avoid misunderstandings: <vgabase> is the _base_ address, i.e. a 4k
+aligned address. For reading/writing the color registers, the kernel
+uses the addresses vgabase+0x3c7...vgabase+0x3c9. The <vgabase>
+parameter is written in hexadecimal with a "0x" prefix, just as
+<scrmem>.
+
+<colw> is meaningful only if <vgabase> is specified. It tells the
+kernel how wide each of the color registers is, i.e. the number of bits
+per single color (red/green/blue). Default is 6, another quite usual
+value is 8.
+
+Also <coltype> is used together with <vgabase>. It tells the kernel
+about the color register model of your gfx board. Currently, the types
+"vga" (which is also the default) and "mv300" (SANG MV300) are
+implemented.
+
+Parameter <xres_virtual> is required for ProMST or ET4000 cards where
+the physical linelength differs from the visible length. With ProMST,
+xres_virtual must be set to 2048. For ET4000, xres_virtual depends on the
+initialisation of the video-card.
+If you're missing a corresponding yres_virtual: the external part is legacy,
+therefore we don't support hardware-dependent functions like hardware-scroll,
+panning or blanking.
+
+4.1.7) eclock:
+--------------
+
+The external pixel clock attached to the Falcon VIDEL shifter. This
+currently works only with the ScreenWonder!
+
+4.1.8) monitorcap:
+-------------------
+
+:Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax>
+
+This describes the capabilities of a multisync monitor. Don't use it
+with a fixed-frequency monitor! For now, only the Falcon frame buffer
+uses the settings of "monitorcap:".
+
+<vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies
+your monitor can work with, in Hz. <hmin> and <hmax> are the same for
+the horizontal frequency, in kHz.
+
+  The defaults are 58;62;31;32 (VGA compatible).
+
+  The defaults for TV/SC1224/SC1435 cover both PAL and NTSC standards.
+
+4.1.9) keep
+------------
+
+If this option is given, the framebuffer device doesn't do any video
+mode calculations and settings on its own. The only Atari fb device
+that does this currently is the Falcon.
+
+What you achieve with this: Settings for unknown video extensions
+aren't overridden by the driver, so you can still use the mode found
+when booting, when the driver doesn't know how to set this mode itself.
+But this also means, that you can't switch video modes anymore...
+
+An example where you may want to use "keep" is the ScreenBlaster for
+the Falcon.
+
+
+4.2) atamouse=
+--------------
+
+:Syntax: atamouse=<x-threshold>,[<y-threshold>]
+
+With this option, you can set the mouse movement reporting threshold.
+This is the number of pixels of mouse movement that have to accumulate
+before the IKBD sends a new mouse packet to the kernel. Higher values
+reduce the mouse interrupt load and thus reduce the chance of keyboard
+overruns. Lower values give slightly faster mouse response and
+slightly better mouse tracking.
+
+You can set the threshold in x and y separately, but usually this is
+of little practical use. If there's just one number in the option, it
+is used for both dimensions. The default value is 2 for both
+thresholds.
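+
+For illustration only (the values are arbitrary, not recommendations)::
+
+  atamouse=4      x and y threshold both 4
+  atamouse=4,2    x threshold 4, y threshold 2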
+
+
+4.3) ataflop=
+-------------
+
+:Syntax: ataflop=<drive type>[,<trackbuffering>[,<steprateA>[,<steprateB>]]]
+
+   The drive type may be 0, 1, or 2, for DD, HD, and ED, resp. This
+   setting affects how many buffers are reserved and which formats are
+   probed (see also below). The default is 1 (HD). Only one drive type
+   can be selected. If you have two disk drives, select the "better"
+   type.
+
+   The second parameter <trackbuffering> tells the kernel whether to use
+   track buffering (1) or not (0). The default is machine-dependent:
+   no for the Medusa and yes for all others.
+
+   With the two following parameters, you can change the default
+   steprate used for drive A and B, resp.
+
+
+4.4) atascsi=
+-------------
+
+:Syntax: atascsi=<can_queue>[,<cmd_per_lun>[,<scat-gat>[,<host-id>[,<tagged>]]]]
+
+This option sets some parameters for the Atari native SCSI driver.
+Generally, any number of arguments can be omitted from the end. And
+for each of the numbers, a negative value means "use default". The
+defaults depend on whether TT-style or Falcon-style SCSI is used.
+Below, defaults are noted as n/m, where the first value refers to
+TT-SCSI and the latter to Falcon-SCSI. If an illegal value is given
+for one parameter, an error message is printed and that one setting is
+ignored (others aren't affected).
+
+  <can_queue>:
+    This is the maximum number of SCSI commands queued internally to the
+    Atari SCSI driver. A value of 1 effectively turns off the driver
+    internal multitasking (if it causes problems). Legal values are >=
+    1. <can_queue> can be as high as you like, but values greater than
+    <cmd_per_lun> times the number of SCSI targets (LUNs) you have
+    don't make sense. Default: 16/8.
+
+  <cmd_per_lun>:
+    Maximum number of SCSI commands issued to the driver for one
+    logical unit (LUN, usually one SCSI target). Legal values start
+    from 1. If tagged queuing (see below) is not used, values greater
+    than 2 don't make sense, but waste memory. Otherwise, the maximum
+    is the number of command tags available to the driver (currently
+    32). Default: 8/1. (Note: Values > 1 seem to cause problems on a
+    Falcon, cause not yet known.)
+
+    The <cmd_per_lun> value largely determines the amount of
+    memory SCSI reserves for itself. The formula is rather
+    complicated, but I can give you some hints:
+
+      no scatter-gather:
+	cmd_per_lun * 232 bytes
+      full scatter-gather:
+	cmd_per_lun * approx. 17 Kbytes
+
+  <scat-gat>:
+    Size of the scatter-gather table, i.e. the number of requests
+    consecutive on the disk that can be merged into one SCSI command.
+    Legal values are between 0 and 255. Default: 255/0. Note: This
+    value is forced to 0 on a Falcon, since scatter-gather isn't
+    possible with the ST-DMA. Not using scatter-gather hurts
+    performance significantly.
+
+  <host-id>:
+    The SCSI ID to be used by the initiator (your Atari). This is
+    usually 7, the highest possible ID. Every ID on the SCSI bus must
+    be unique. Default: determined at run time: If the NV-RAM checksum
+    is valid, and bit 7 in byte 30 of the NV-RAM is set, the lower 3
+    bits of this byte are used as the host ID. (This method is defined
+    by Atari and also used by some TOS HD drivers.) If the above
+    isn't given, the default ID is 7 (both TT and Falcon).
+
+  <tagged>:
+    0 means turn off tagged queuing support, all other values > 0 mean
+    use tagged queuing for targets that support it. Default: currently
+    off, but this may change when tagged queuing handling has been
+    proved to be reliable.
+
+    Tagged queuing means that more than one command can be issued to
+    one LUN, and the SCSI device itself orders the requests so they
+    can be performed in optimal order. Not all SCSI devices support
+    tagged queuing (:-().
+
+4.5) switches=
+--------------
+
+:Syntax: switches=<list of switches>
+
+With this option you can switch some hardware lines that are often
+used to enable/disable certain hardware extensions. Examples are
+OverScan, overclocking, ...
+
+The <list of switches> is a comma-separated list of the following
+items:
+
+  ikbd:
+	set RTS of the keyboard ACIA high
+  midi:
+	set RTS of the MIDI ACIA high
+  snd6:
+	set bit 6 of the PSG port A
+  snd7:
+	set bit 7 of the PSG port A
+
+It doesn't make sense to mention a switch more than once (no
+difference to only once), but you can give as many switches as you
+want to enable different features. The switch lines are set as early
+as possible during kernel initialization (even before determining the
+present hardware.)
+
+All of the items can also be prefixed with `ov_`, i.e. `ov_ikbd`,
+`ov_midi`, ... These options are meant for switching on an OverScan
+video extension. The difference to the bare option is that the
+switch-on is done after video initialization, and somehow synchronized
+to the HBLANK. A speciality is that ov_ikbd and ov_midi are switched
+off before rebooting, so that OverScan is disabled and TOS boots
+correctly.
+
+If you give an option both with and without the `ov_` prefix, the
+earlier initialization (`ov_`-less) takes precedence. But the
+switching-off on reset still happens in this case.
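+
+A hypothetical combination, only to show the syntax; which lines
+actually control a given extension depends on your hardware::
+
+  switches=ov_ikbd,snd6
+
+Here RTS of the keyboard ACIA is raised after video initialization (and
+dropped again before rebooting), while bit 6 of PSG port A is set early
+during kernel initialization.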
+
+5) Options for Amiga Only
+=========================
+
+5.1) video=
+-----------
+
+:Syntax: video=<fbname>:<sub-options...>
+
+The <fbname> parameter specifies the name of the frame buffer, valid
+options are `amifb`, `cyber`, `virge`, `retz3` and `clgen`, provided
+that the respective frame buffer devices have been compiled into the
+kernel (or compiled as loadable modules). The behavior of the <fbname>
+option was changed in 2.1.57 so it is now recommended to specify this
+option.
+
+The <sub-options> is a comma-separated list of the sub-options listed
+below. This option is organized similarly to the Atari version of the
+"video"-option (4.1), but knows fewer sub-options.
+
+5.1.1) video mode
+-----------------
+
+Again, similar to the video mode for the Atari (see 4.1.1). Predefined
+modes depend on the frame buffer device in use.
+
+OCS, ECS and AGA machines all use the color frame buffer. The following
+predefined video modes are available:
+
+NTSC modes:
+ - ntsc            : 640x200, 15 kHz, 60 Hz
+ - ntsc-lace       : 640x400, 15 kHz, 60 Hz interlaced
+
+PAL modes:
+ - pal             : 640x256, 15 kHz, 50 Hz
+ - pal-lace        : 640x512, 15 kHz, 50 Hz interlaced
+
+ECS modes:
+ - multiscan       : 640x480, 29 kHz, 57 Hz
+ - multiscan-lace  : 640x960, 29 kHz, 57 Hz interlaced
+ - euro36          : 640x200, 15 kHz, 72 Hz
+ - euro36-lace     : 640x400, 15 kHz, 72 Hz interlaced
+ - euro72          : 640x400, 29 kHz, 68 Hz
+ - euro72-lace     : 640x800, 29 kHz, 68 Hz interlaced
+ - super72         : 800x300, 23 kHz, 70 Hz
+ - super72-lace    : 800x600, 23 kHz, 70 Hz interlaced
+ - dblntsc-ff      : 640x400, 27 kHz, 57 Hz
+ - dblntsc-lace    : 640x800, 27 kHz, 57 Hz interlaced
+ - dblpal-ff       : 640x512, 27 kHz, 47 Hz
+ - dblpal-lace     : 640x1024, 27 kHz, 47 Hz interlaced
+ - dblntsc         : 640x200, 27 kHz, 57 Hz doublescan
+ - dblpal          : 640x256, 27 kHz, 47 Hz doublescan
+
+VGA modes:
+ - vga             : 640x480, 31 kHz, 60 Hz
+ - vga70           : 640x400, 31 kHz, 70 Hz
+
+Please notice that the ECS and VGA modes require either an ECS or AGA
+chipset, and that these modes are limited to 2-bit color for the ECS
+chipset and 8-bit color for the AGA chipset.
+
+5.1.2) depth
+------------
+
+:Syntax: depth:<nr. of bit-planes>
+
+Specify the number of bit-planes for the selected video-mode.
+
+5.1.3) inverse
+--------------
+
+Use inverted display (black on white). Functionally the same as the
+"inverse" sub-option for the Atari.
+
+5.1.4) font
+-----------
+
+:Syntax: font:<fontname>
+
+Specify the font to use in text modes. Functionally the same as the
+"font" sub-option for the Atari, except that `PEARL8x8` is used instead
+of `VGA8x8` if the vertical size of the display is less than 400 pixel
+rows.
+
+5.1.5) monitorcap:
+-------------------
+
+:Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax>
+
+This describes the capabilities of a multisync monitor. For now, only
+the color frame buffer uses the settings of "monitorcap:".
+
+<vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies
+your monitor can work with, in Hz. <hmin> and <hmax> are the same for
+the horizontal frequency, in kHz.
+
+The defaults are 50;90;15;38 (Generic Amiga multisync monitor).
+
+
+5.2) fd_def_df0=
+----------------
+
+:Syntax: fd_def_df0=<value>
+
+Sets the df0 value for "silent" floppy drives. The value should be in
+hexadecimal with "0x" prefix.
+
+
+5.3) wd33c93=
+-------------
+
+:Syntax: wd33c93=<sub-options...>
+
+These options affect the A590/A2091, A3000 and GVP Series II SCSI
+controllers.
+
+The <sub-options> is a comma-separated list of the sub-options listed
+below.
+
+5.3.1) nosync
+-------------
+
+:Syntax: nosync:bitmask
+
+bitmask is a byte where the 1st 7 bits correspond with the 7
+possible SCSI devices. Set a bit to prevent sync negotiation on that
+device. To maintain backwards compatibility, a command-line such as
+"wd33c93=255" will be automatically translated to
+"wd33c93=nosync:0xff". The default is to disable sync negotiation for
+all devices, i.e. nosync:0xff.
+
+5.3.2) period
+-------------
+
+:Syntax: period:ns
+
+`ns` is the minimum # of nanoseconds in a SCSI data transfer
+period. Default is 500; acceptable values are 250 - 1000.
+
+5.3.3) disconnect
+-----------------
+
+:Syntax: disconnect:x
+
+Specify x = 0 to never allow disconnects, 2 to always allow them.
+x = 1 does 'adaptive' disconnects, which is the default and generally
+the best choice.
+
+5.3.4) debug
+------------
+
+:Syntax: debug:x
+
+If `DEBUGGING_ON` is defined, x is a bit mask that causes various
+types of debug output to be printed - see the DB_xxx defines in
+wd33c93.h.
+
+5.3.5) clock
+------------
+
+:Syntax: clock:x
+
+x = clock input in MHz for the WD33c93 chip. Normal values would be
+from 8 through 20. The default value depends on your host adapter(s):
+for the A3000 internal controller it is 14, for the A2091 it's 8, and
+for the GVP host adapters it's either 8 or 14, depending on the host
+adapter and the SCSI-clock jumper present on some GVP host
+adapters.
+
+5.3.6) next
+-----------
+
+No argument. Used to separate blocks of keywords when there's more
+than one wd33c93-based host adapter in the system.
+
+5.3.7) nodma
+------------
+
+:Syntax: nodma:x
+
+If x is 1 (or if the option is just written as "nodma"), the WD33c93
+controller will not use DMA (= direct memory access) to access the
+Amiga's memory.  This is useful for some systems (like A3000s and
+A4000s with the A3640 accelerator, revision 3.0) that have problems
+using DMA to chip memory.  The default is 0, i.e. to use DMA if
+possible.
+
+
+5.4) gvp11=
+-----------
+
+:Syntax: gvp11=<addr-mask>
+
+Earlier versions of the GVP driver did not handle DMA address-mask
+settings correctly, which made it necessary for some people to use
+this option in order to get their GVP controller running under
+Linux. These problems have hopefully been solved, and the use of
+this option is now strongly discouraged!
+
+Incorrect use can lead to unpredictable behavior, so please only use
+this option if you *know* what you are doing and have a reason to do
+so. In any case if you experience problems and need to use this
+option, please inform us about it by mailing to the Linux/68k kernel
+mailing list.
+
+The address mask set by this option specifies which addresses are
+valid for DMA with the GVP Series II SCSI controller. An address is
+valid if it has no bits set other than those that are also set in
+the mask.
+
+Some versions of the GVP can only DMA into a 24-bit address range,
+some can address a 25-bit address range, while others can use the whole
+32-bit address range for DMA. The correct setting depends on your
+controller and should be autodetected by the driver. An example is the
+24-bit region, which is specified by a mask of 0x00fffffe.
diff --git a/Documentation/m68k/kernel-options.txt b/Documentation/m68k/kernel-options.txt
deleted file mode 100644
index 79d21246c75a..000000000000
--- a/Documentation/m68k/kernel-options.txt
+++ /dev/null
@@ -1,884 +0,0 @@
-
-
-				  Command Line Options for Linux/m68k
-				  ===================================
-
-Last Update: 2 May 1999
-Linux/m68k version: 2.2.6
-Author: Roman.Hodek@informatik.uni-erlangen.de (Roman Hodek)
-Update: jds@kom.auc.dk (Jes Sorensen) and faq@linux-m68k.org (Chris Lawrence)
-
-0) Introduction
-===============
-
-  Often I've been asked which command line options the Linux/m68k
-kernel understands, or how the exact syntax for the ... option is, or
-... about the option ... . I hope, this document supplies all the
-answers...
-
-  Note that some options might be outdated, their descriptions being
-incomplete or missing. Please update the information and send in the
-patches.
-
-
-1) Overview of the Kernel's Option Processing
-=============================================
-
-The kernel knows three kinds of options on its command line:
-
-  1) kernel options
-  2) environment settings
-  3) arguments for init
-
-To which of these classes an argument belongs is determined as
-follows: If the option is known to the kernel itself, i.e. if the name
-(the part before the '=') or, in some cases, the whole argument string
-is known to the kernel, it belongs to class 1. Otherwise, if the
-argument contains an '=', it is of class 2, and the definition is put
-into init's environment. All other arguments are passed to init as
-command line options.
-
-  This document describes the valid kernel options for Linux/m68k in
-the version mentioned at the start of this file. Later revisions may
-add new such options, and some may be missing in older versions.
-
-  In general, the value (the part after the '=') of an option is a
-list of values separated by commas. The interpretation of these values
-is up to the driver that "owns" the option. This association of
-options with drivers is also the reason that some are further
-subdivided.
-
-
-2) General Kernel Options
-=========================
-
-2.1) root=
-----------
-
-Syntax: root=/dev/<device>
-    or: root=<hex_number>
-
-This tells the kernel which device it should mount as the root
-filesystem. The device must be a block device with a valid filesystem
-on it.
-
-  The first syntax gives the device by name. These names are converted
-into a major/minor number internally in the kernel in an unusual way.
-Normally, this "conversion" is done by the device files in /dev, but
-this isn't possible here, because the root filesystem (with /dev)
-isn't mounted yet... So the kernel parses the name itself, with some
-hardcoded name to number mappings. The name must always be a
-combination of two or three letters, followed by a decimal number.
-Valid names are:
-
-  /dev/ram: -> 0x0100 (initial ramdisk)
-  /dev/hda: -> 0x0300 (first IDE disk)
-  /dev/hdb: -> 0x0340 (second IDE disk)
-  /dev/sda: -> 0x0800 (first SCSI disk)
-  /dev/sdb: -> 0x0810 (second SCSI disk)
-  /dev/sdc: -> 0x0820 (third SCSI disk)
-  /dev/sdd: -> 0x0830 (forth SCSI disk)
-  /dev/sde: -> 0x0840 (fifth SCSI disk)
-  /dev/fd : -> 0x0200 (floppy disk)
-
-  The name must be followed by a decimal number, that stands for the
-partition number. Internally, the value of the number is just
-added to the device number mentioned in the table above. The
-exceptions are /dev/ram and /dev/fd, where /dev/ram refers to an
-initial ramdisk loaded by your bootstrap program (please consult the
-instructions for your bootstrap program to find out how to load an
-initial ramdisk). As of kernel version 2.0.18 you must specify
-/dev/ram as the root device if you want to boot from an initial
-ramdisk. For the floppy devices, /dev/fd, the number stands for the
-floppy drive number (there are no partitions on floppy disks). I.e.,
-/dev/fd0 stands for the first drive, /dev/fd1 for the second, and so
-on. Since the number is just added, you can also force the disk format
-by adding a number greater than 3. If you look into your /dev
-directory, use can see the /dev/fd0D720 has major 2 and minor 16. You
-can specify this device for the root FS by writing "root=/dev/fd16" on
-the kernel command line.
-
-[Strange and maybe uninteresting stuff ON]
-
-  This unusual translation of device names has some strange
-consequences: If, for example, you have a symbolic link from /dev/fd
-to /dev/fd0D720 as an abbreviation for floppy driver #0 in DD format,
-you cannot use this name for specifying the root device, because the
-kernel cannot see this symlink before mounting the root FS and it
-isn't in the table above. If you use it, the root device will not be 
-set at all, without an error message. Another example: You cannot use a
-partition on e.g. the sixth SCSI disk as the root filesystem, if you
-want to specify it by name. This is, because only the devices up to
-/dev/sde are in the table above, but not /dev/sdf. Although, you can
-use the sixth SCSI disk for the root FS, but you have to specify the
-device by number... (see below). Or, even more strange, you can use the
-fact that there is no range checking of the partition number, and your
-knowledge that each disk uses 16 minors, and write "root=/dev/sde17"
-(for /dev/sdf1).
-
-[Strange and maybe uninteresting stuff OFF]
-
-  If the device containing your root partition isn't in the table
-above, you can also specify it by major and minor numbers. These are
-written in hex, with no prefix and no separator between. E.g., if you
-have a CD with contents appropriate as a root filesystem in the first
-SCSI CD-ROM drive, you boot from it by "root=0b00". Here, hex "0b" =
-decimal 11 is the major of SCSI CD-ROMs, and the minor 0 stands for
-the first of these. You can find out all valid major numbers by
-looking into include/linux/major.h.
-
-In addition to major and minor numbers, if the device containing your
-root partition uses a partition table format with unique partition
-identifiers, then you may use them.  For instance,
-"root=PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF".  It is also
-possible to reference another partition on the same device using a
-known partition UUID as the starting point.  For example,
-if partition 5 of the device has the UUID of
-00112233-4455-6677-8899-AABBCCDDEEFF then partition 3 may be found as
-follows:
-  PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF/PARTNROFF=-2
-
-Authoritative information can be found in
-"Documentation/admin-guide/kernel-parameters.rst".
-
-
-2.2) ro, rw
------------
-
-Syntax: ro
-    or: rw
-
-These two options tell the kernel whether it should mount the root
-filesystem read-only or read-write. The default is read-only, except
-for ramdisks, which default to read-write.
-
-
-2.3) debug
-----------
-
-Syntax: debug
-
-This raises the kernel log level to 10 (the default is 7). This is the
-same level as set by the "dmesg" command, just that the maximum level
-selectable by dmesg is 8.
-
-
-2.4) debug=
------------
-
-Syntax: debug=<device>
-
-This option causes certain kernel messages be printed to the selected
-debugging device. This can aid debugging the kernel, since the
-messages can be captured and analyzed on some other machine. Which
-devices are possible depends on the machine type. There are no checks
-for the validity of the device name. If the device isn't implemented,
-nothing happens.
-
-  Messages logged this way are in general stack dumps after kernel
-memory faults or bad kernel traps, and kernel panics. To be exact: all
-messages of level 0 (panic messages) and all messages printed while
-the log level is 8 or more (their level doesn't matter). Before stack
-dumps, the kernel sets the log level to 10 automatically. A level of
-at least 8 can also be set by the "debug" command line option (see
-2.3) and at run time with "dmesg -n 8".
-
-Devices possible for Amiga:
-
- - "ser": built-in serial port; parameters: 9600bps, 8N1
- - "mem": Save the messages to a reserved area in chip mem. After
-          rebooting, they can be read under AmigaOS with the tool
-          'dmesg'.
-
-Devices possible for Atari:
-
- - "ser1": ST-MFP serial port ("Modem1"); parameters: 9600bps, 8N1
- - "ser2": SCC channel B serial port ("Modem2"); parameters: 9600bps, 8N1
- - "ser" : default serial port
-           This is "ser2" for a Falcon, and "ser1" for any other machine
- - "midi": The MIDI port; parameters: 31250bps, 8N1
- - "par" : parallel port
-           The printing routine for this implements a timeout for the
-           case there's no printer connected (else the kernel would
-           lock up). The timeout is not exact, but usually a few
-           seconds.
-
-
-2.6) ramdisk_size=
--------------
-
-Syntax: ramdisk_size=<size>
-
-  This option instructs the kernel to set up a ramdisk of the given
-size in KBytes. Do not use this option if the ramdisk contents are
-passed by bootstrap! In this case, the size is selected automatically
-and should not be overwritten.
-
-  The only application is for root filesystems on floppy disks, that
-should be loaded into memory. To do that, select the corresponding
-size of the disk as ramdisk size, and set the root device to the disk
-drive (with "root=").
-
-
-2.7) swap=
-2.8) buff=
------------
-
-  I can't find any sign of these options in 2.2.6.
-
-
-3) General Device Options (Amiga and Atari)
-===========================================
-
-3.1) ether=
------------
-
-Syntax: ether=[<irq>[,<base_addr>[,<mem_start>[,<mem_end>]]]],<dev-name>
-
-  <dev-name> is the name of a net driver, as specified in
-drivers/net/Space.c in the Linux source. Most prominent are eth0, ...
-eth3, sl0, ... sl3, ppp0, ..., ppp3, dummy, and lo.
-
-  The non-ethernet drivers (sl, ppp, dummy, lo) obviously ignore the
-settings by this options. Also, the existing ethernet drivers for
-Linux/m68k (ariadne, a2065, hydra) don't use them because Zorro boards
-are really Plug-'n-Play, so the "ether=" option is useless altogether
-for Linux/m68k.
-
-
-3.2) hd=
---------
-
-Syntax: hd=<cylinders>,<heads>,<sectors>
-
-  This option sets the disk geometry of an IDE disk. The first hd=
-option is for the first IDE disk, the second for the second one.
-(I.e., you can give this option twice.) In most cases, you won't have
-to use this option, since the kernel can obtain the geometry data
-itself. It exists just for the case that this fails for one of your
-disks.
-
-
-3.3) max_scsi_luns=
--------------------
-
-Syntax: max_scsi_luns=<n>
-
-  Sets the maximum number of LUNs (logical units) of SCSI devices to
-be scanned. Valid values for <n> are between 1 and 8. Default is 8 if
-"Probe all LUNs on each SCSI device" was selected during the kernel
-configuration, else 1.
-
-
-3.4) st=
---------
-
-Syntax: st=<buffer_size>,[<write_thres>,[<max_buffers>]]
-
-  Sets several parameters of the SCSI tape driver. <buffer_size> is
-the number of 512-byte buffers reserved for tape operations for each
-device. <write_thres> sets the number of blocks which must be filled
-to start an actual write operation to the tape. Maximum value is the
-total number of buffers. <max_buffer> limits the total number of
-buffers allocated for all tape devices.
-
-
-3.5) dmasound=
---------------
-
-Syntax: dmasound=[<buffers>,<buffer-size>[,<catch-radius>]]
-
-  This option controls some configurations of the Linux/m68k DMA sound
-driver (Amiga and Atari): <buffers> is the number of buffers you want
-to use (minimum 4, default 4), <buffer-size> is the size of each
-buffer in kilobytes (minimum 4, default 32) and <catch-radius> says
-how much percent of error will be tolerated when setting a frequency
-(maximum 10, default 0). For example with 3% you can play 8000Hz
-AU-Files on the Falcon with its hardware frequency of 8195Hz and thus
-don't need to expand the sound.
-
-
-
-4) Options for Atari Only
-=========================
-
-4.1) video=
------------
-
-Syntax: video=<fbname>:<sub-options...>
-
-The <fbname> parameter specifies the name of the frame buffer,
-eg. most atari users will want to specify `atafb' here. The
-<sub-options> is a comma-separated list of the sub-options listed
-below.
-
-NB: Please notice that this option was renamed from `atavideo' to
-    `video' during the development of the 1.3.x kernels, thus you
-    might need to update your boot-scripts if upgrading to 2.x from
-    an 1.2.x kernel.
-
-NBB: The behavior of video= was changed in 2.1.57 so the recommended
-option is to specify the name of the frame buffer.
-
-4.1.1) Video Mode
------------------
-
-This sub-option may be any of the predefined video modes, as listed
-in atari/atafb.c in the Linux/m68k source tree. The kernel will
-activate the given video mode at boot time and make it the default
-mode, if the hardware allows. Currently defined names are:
-
- - stlow           : 320x200x4
- - stmid, default5 : 640x200x2
- - sthigh, default4: 640x400x1
- - ttlow           : 320x480x8, TT only
- - ttmid, default1 : 640x480x4, TT only
- - tthigh, default2: 1280x960x1, TT only
- - vga2            : 640x480x1, Falcon only
- - vga4            : 640x480x2, Falcon only
- - vga16, default3 : 640x480x4, Falcon only
- - vga256          : 640x480x8, Falcon only
- - falh2           : 896x608x1, Falcon only
- - falh16          : 896x608x4, Falcon only
-
-  If no video mode is given on the command line, the kernel tries the
-modes names "default<n>" in turn, until one is possible with the
-hardware in use.
-
-  A video mode setting doesn't make sense, if the external driver is
-activated by a "external:" sub-option.
-
-4.1.2) inverse
---------------
-
-Invert the display. This affects both, text (consoles) and graphics
-(X) display. Usually, the background is chosen to be black. With this
-option, you can make the background white.
-
-4.1.3) font
------------
-
-Syntax: font:<fontname>
-
-Specify the font to use in text modes. Currently you can choose only
-between `VGA8x8', `VGA8x16' and `PEARL8x8'. `VGA8x8' is default, if the
-vertical size of the display is less than 400 pixel rows. Otherwise, the
-`VGA8x16' font is the default.
-
-4.1.4) hwscroll_
-----------------
-
-Syntax: hwscroll_<n>
-
-The number of additional lines of video memory to reserve for
-speeding up the scrolling ("hardware scrolling"). Hardware scrolling
-is possible only if the kernel can set the video base address in steps
-fine enough. This is true for STE, MegaSTE, TT, and Falcon. It is not
-possible with plain STs and graphics cards (The former because the
-base address must be on a 256 byte boundary there, the latter because
-the kernel doesn't know how to set the base address at all.)
-
-  By default, <n> is set to the number of visible text lines on the
-display. Thus, the amount of video memory is doubled, compared to no
-hardware scrolling. You can turn off the hardware scrolling altogether
-by setting <n> to 0.
-
-4.1.5) internal:
-----------------
-
-Syntax: internal:<xres>;<yres>[;<xres_max>;<yres_max>;<offset>]
-
-This option specifies the capabilities of some extended internal video
-hardware, like e.g. OverScan. <xres> and <yres> give the (extended)
-dimensions of the screen.
-
-  If your OverScan needs a black border, you have to write the last
-three arguments of the "internal:". <xres_max> is the maximum line
-length the hardware allows, <yres_max> the maximum number of lines.
-<offset> is the offset of the visible part of the screen memory to its
-physical start, in bytes.
-
-  Often, extended interval video hardware has to be activated somehow.
-For this, see the "sw_*" options below.
-
-4.1.6) external:
-----------------
-
-Syntax:
-  external:<xres>;<yres>;<depth>;<org>;<scrmem>[;<scrlen>[;<vgabase>\
-           [;<colw>[;<coltype>[;<xres_virtual>]]]]]
-
-[I had to break this line...]
-
-  This is probably the most complicated parameter... It specifies that
-you have some external video hardware (a graphics board), and how to
-use it under Linux/m68k. The kernel cannot know more about the hardware
-than you tell it here! The kernel also is unable to set or change any
-video modes, since it doesn't know about any board internal. So, you
-have to switch to that video mode before you start Linux, and cannot
-switch to another mode once Linux has started.
-
-  The first 3 parameters of this sub-option should be obvious: <xres>,
-<yres> and <depth> give the dimensions of the screen and the number of
-planes (depth). The depth is the logarithm to base 2 of the number
-of colors possible. (Or, the other way round: The number of colors is
-2^depth).
-
-  You have to tell the kernel furthermore how the video memory is
-organized. This is done by a letter as <org> parameter:
-
- 'n': "normal planes", i.e. one whole plane after another
- 'i': "interleaved planes", i.e. 16 bit of the first plane, than 16 bit
-      of the next, and so on... This mode is used only with the
-	  built-in Atari video modes, I think there is no card that
-	  supports this mode.
- 'p': "packed pixels", i.e. <depth> consecutive bits stand for all
-	  planes of one pixel; this is the most common mode for 8 planes
-	  (256 colors) on graphic cards
- 't': "true color" (more or less packed pixels, but without a color
-	  lookup table); usually depth is 24
-
-For monochrome modes (i.e., <depth> is 1), the <org> letter has a
-different meaning:
-
- 'n': normal colors, i.e. 0=white, 1=black
- 'i': inverted colors, i.e. 0=black, 1=white
-
-  The next important information about the video hardware is the base
-address of the video memory. That is given in the <scrmem> parameter,
-as a hexadecimal number with a "0x" prefix. You have to find out this
-address in the documentation of your hardware.
-
-  The next parameter, <scrlen>, tells the kernel about the size of the
-video memory. If it's missing, the size is calculated from <xres>,
-<yres>, and <depth>. For now, it is not useful to write a value here.
-It would be used only for hardware scrolling (which isn't possible
-with the external driver, because the kernel cannot set the video base
-address), or for virtual resolutions under X (which the X server
-doesn't support yet). So, it's currently best to leave this field
-empty, either by ending the "external:" after the video address or by
-writing two consecutive semicolons, if you want to give a <vgabase>
-(it is allowed to leave this parameter empty).
-
-  The <vgabase> parameter is optional. If it is not given, the kernel
-cannot read or write any color registers of the video hardware, and
-thus you have to set appropriate colors before you start Linux. But if
-your card is somehow VGA compatible, you can tell the kernel the base
-address of the VGA register set, so it can change the color lookup
-table. You have to look up this address in your board's documentation.
-To avoid misunderstandings: <vgabase> is the _base_ address, i.e. a 4k
-aligned address. For read/writing the color registers, the kernel
-uses the addresses vgabase+0x3c7...vgabase+0x3c9. The <vgabase>
-parameter is written in hexadecimal with a "0x" prefix, just as
-<scrmem>.
-
-  <colw> is meaningful only if <vgabase> is specified. It tells the
-kernel how wide each of the color register is, i.e. the number of bits
-per single color (red/green/blue). Default is 6, another quite usual
-value is 8.
-
-  Also <coltype> is used together with <vgabase>. It tells the kernel
-about the color register model of your gfx board. Currently, the types
-"vga" (which is also the default) and "mv300" (SANG MV300) are
-implemented.
-
-  Parameter <xres_virtual> is required for ProMST or ET4000 cards where
-the physical linelength differs from the visible length. With ProMST, 
-xres_virtual must be set to 2048. For ET4000, xres_virtual depends on the
-initialisation of the video-card.
-If you're missing a corresponding yres_virtual: the external part is legacy,
-therefore we don't support hardware-dependent functions like hardware-scroll,
-panning or blanking.
-
-4.1.7) eclock:
---------------
-
-The external pixel clock attached to the Falcon VIDEL shifter. This
-currently works only with the ScreenWonder!
-
-4.1.8) monitorcap:
--------------------
-
-Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax>
-
-This describes the capabilities of a multisync monitor. Don't use it
-with a fixed-frequency monitor! For now, only the Falcon frame buffer
-uses the settings of "monitorcap:".
-
-  <vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies
-your monitor can work with, in Hz. <hmin> and <hmax> are the same for
-the horizontal frequency, in kHz.
-
-  The defaults are 58;62;31;32 (VGA compatible).
-
-  The defaults for TV/SC1224/SC1435 cover both PAL and NTSC standards.
-
-4.1.9) keep
-------------
-
-If this option is given, the framebuffer device doesn't do any video
-mode calculations and settings on its own. The only Atari fb device
-that does this currently is the Falcon.
-
-  What you reach with this: Settings for unknown video extensions
-aren't overridden by the driver, so you can still use the mode found
-when booting, when the driver doesn't know to set this mode itself.
-But this also means, that you can't switch video modes anymore...
-
-  An example where you may want to use "keep" is the ScreenBlaster for
-the Falcon.
-
-
-4.2) atamouse=
---------------
-
-Syntax: atamouse=<x-threshold>,[<y-threshold>]
-
-  With this option, you can set the mouse movement reporting threshold.
-This is the number of pixels of mouse movement that have to accumulate
-before the IKBD sends a new mouse packet to the kernel. Higher values
-reduce the mouse interrupt load and thus reduce the chance of keyboard
-overruns. Lower values give a slightly faster mouse responses and
-slightly better mouse tracking.
-
-  You can set the threshold in x and y separately, but usually this is
-of little practical use. If there's just one number in the option, it
-is used for both dimensions. The default value is 2 for both
-thresholds.
-
-
-4.3) ataflop=
--------------
-
-Syntax: ataflop=<drive type>[,<trackbuffering>[,<steprateA>[,<steprateB>]]]
-
-   The drive type may be 0, 1, or 2, for DD, HD, and ED, resp. This
-   setting affects how many buffers are reserved and which formats are
-   probed (see also below). The default is 1 (HD). Only one drive type
-   can be selected. If you have two disk drives, select the "better"
-   type.
-
-   The second parameter <trackbuffer> tells the kernel whether to use
-   track buffering (1) or not (0). The default is machine-dependent:
-   no for the Medusa and yes for all others.
-
-   With the two following parameters, you can change the default
-   steprate used for drive A and B, resp. 
-
-
-4.4) atascsi=
--------------
-
-Syntax: atascsi=<can_queue>[,<cmd_per_lun>[,<scat-gat>[,<host-id>[,<tagged>]]]]
-
-  This option sets some parameters for the Atari native SCSI driver.
-Generally, any number of arguments can be omitted from the end. And
-for each of the numbers, a negative value means "use default". The
-defaults depend on whether TT-style or Falcon-style SCSI is used.
-Below, defaults are noted as n/m, where the first value refers to
-TT-SCSI and the latter to Falcon-SCSI. If an illegal value is given
-for one parameter, an error message is printed and that one setting is
-ignored (others aren't affected).
-
-  <can_queue>:
-    This is the maximum number of SCSI commands queued internally to the
-    Atari SCSI driver. A value of 1 effectively turns off the driver
-    internal multitasking (if it causes problems). Legal values are >=
-    1. <can_queue> can be as high as you like, but values greater than
-    <cmd_per_lun> times the number of SCSI targets (LUNs) you have
-    don't make sense. Default: 16/8.
-
-  <cmd_per_lun>:
-    Maximum number of SCSI commands issued to the driver for one
-    logical unit (LUN, usually one SCSI target). Legal values start
-    from 1. If tagged queuing (see below) is not used, values greater
-    than 2 don't make sense, but waste memory. Otherwise, the maximum
-    is the number of command tags available to the driver (currently
-    32). Default: 8/1. (Note: Values > 1 seem to cause problems on a
-    Falcon, cause not yet known.)
-
-      The <cmd_per_lun> value at a great part determines the amount of
-    memory SCSI reserves for itself. The formula is rather
-    complicated, but I can give you some hints:
-      no scatter-gather  : cmd_per_lun * 232 bytes
-      full scatter-gather: cmd_per_lun * approx. 17 Kbytes
-
-  <scat-gat>:
-    Size of the scatter-gather table, i.e. the number of requests
-    consecutive on the disk that can be merged into one SCSI command.
-    Legal values are between 0 and 255. Default: 255/0. Note: This
-    value is forced to 0 on a Falcon, since scatter-gather isn't
-    possible with the ST-DMA. Not using scatter-gather hurts
-    performance significantly.
-
-  <host-id>:
-    The SCSI ID to be used by the initiator (your Atari). This is
-    usually 7, the highest possible ID. Every ID on the SCSI bus must
-    be unique. Default: determined at run time: If the NV-RAM checksum
-    is valid, and bit 7 in byte 30 of the NV-RAM is set, the lower 3
-    bits of this byte are used as the host ID. (This method is defined
-    by Atari and also used by some TOS HD drivers.) If the above
-    isn't given, the default ID is 7. (both, TT and Falcon).
-
-  <tagged>:
-    0 means turn off tagged queuing support, all other values > 0 mean
-    use tagged queuing for targets that support it. Default: currently
-    off, but this may change when tagged queuing handling has been
-    proved to be reliable.
-
-    Tagged queuing means that more than one command can be issued to
-    one LUN, and the SCSI device itself orders the requests so they
-    can be performed in optimal order. Not all SCSI devices support
-    tagged queuing (:-().
-
-4.5 switches=
--------------
-
-Syntax: switches=<list of switches>
-
-  With this option you can switch some hardware lines that are often
-used to enable/disable certain hardware extensions. Examples are
-OverScan, overclocking, ...
-
-  The <list of switches> is a comma-separated list of the following
-items:
-
-  ikbd: set RTS of the keyboard ACIA high
-  midi: set RTS of the MIDI ACIA high
-  snd6: set bit 6 of the PSG port A
-  snd7: set bit 6 of the PSG port A
-
-It doesn't make sense to mention a switch more than once (no
-difference to only once), but you can give as many switches as you
-want to enable different features. The switch lines are set as early
-as possible during kernel initialization (even before determining the
-present hardware.)
-
-  All of the items can also be prefixed with "ov_", i.e. "ov_ikbd",
-"ov_midi", ... These options are meant for switching on an OverScan
-video extension. The difference to the bare option is that the
-switch-on is done after video initialization, and somehow synchronized
-to the HBLANK. A speciality is that ov_ikbd and ov_midi are switched
-off before rebooting, so that OverScan is disabled and TOS boots
-correctly.
-
-  If you give an option both, with and without the "ov_" prefix, the
-earlier initialization ("ov_"-less) takes precedence. But the
-switching-off on reset still happens in this case.
-
-5) Options for Amiga Only:
-==========================
-
-5.1) video=
------------
-
-Syntax: video=<fbname>:<sub-options...>
-
-The <fbname> parameter specifies the name of the frame buffer, valid
-options are `amifb', `cyber', 'virge', `retz3' and `clgen', provided
-that the respective frame buffer devices have been compiled into the
-kernel (or compiled as loadable modules). The behavior of the <fbname>
-option was changed in 2.1.57 so it is now recommended to specify this
-option.
-
-The <sub-options> is a comma-separated list of the sub-options listed
-below. This option is organized similar to the Atari version of the
-"video"-option (4.1), but knows fewer sub-options.
-
-5.1.1) video mode
------------------
-
-Again, similar to the video mode for the Atari (see 4.1.1). Predefined
-modes depend on the used frame buffer device.
-
-OCS, ECS and AGA machines all use the color frame buffer. The following
-predefined video modes are available:
-
-NTSC modes:
- - ntsc            : 640x200, 15 kHz, 60 Hz
- - ntsc-lace       : 640x400, 15 kHz, 60 Hz interlaced
-PAL modes:
- - pal             : 640x256, 15 kHz, 50 Hz
- - pal-lace        : 640x512, 15 kHz, 50 Hz interlaced
-ECS modes:
- - multiscan       : 640x480, 29 kHz, 57 Hz
- - multiscan-lace  : 640x960, 29 kHz, 57 Hz interlaced
- - euro36          : 640x200, 15 kHz, 72 Hz
- - euro36-lace     : 640x400, 15 kHz, 72 Hz interlaced
- - euro72          : 640x400, 29 kHz, 68 Hz
- - euro72-lace     : 640x800, 29 kHz, 68 Hz interlaced
- - super72         : 800x300, 23 kHz, 70 Hz
- - super72-lace    : 800x600, 23 kHz, 70 Hz interlaced
- - dblntsc-ff      : 640x400, 27 kHz, 57 Hz
- - dblntsc-lace    : 640x800, 27 kHz, 57 Hz interlaced
- - dblpal-ff       : 640x512, 27 kHz, 47 Hz
- - dblpal-lace     : 640x1024, 27 kHz, 47 Hz interlaced
- - dblntsc         : 640x200, 27 kHz, 57 Hz doublescan
- - dblpal          : 640x256, 27 kHz, 47 Hz doublescan
-VGA modes:
- - vga             : 640x480, 31 kHz, 60 Hz
- - vga70           : 640x400, 31 kHz, 70 Hz
-
-Please notice that the ECS and VGA modes require either an ECS or AGA
-chipset, and that these modes are limited to 2-bit color for the ECS
-chipset and 8-bit color for the AGA chipset.
-
-5.1.2) depth
-------------
-
-Syntax: depth:<nr. of bit-planes>
-
-Specify the number of bit-planes for the selected video-mode.
-
-5.1.3) inverse
---------------
-
-Use inverted display (black on white). Functionally the same as the
-"inverse" sub-option for the Atari.
-
-5.1.4) font
------------
-
-Syntax: font:<fontname>
-
-Specify the font to use in text modes. Functionally the same as the
-"font" sub-option for the Atari, except that `PEARL8x8' is used instead
-of `VGA8x8' if the vertical size of the display is less than 400 pixel
-rows.
-
-5.1.5) monitorcap:
--------------------
-
-Syntax: monitorcap:<vmin>;<vmax>;<hmin>;<hmax>
-
-This describes the capabilities of a multisync monitor. For now, only
-the color frame buffer uses the settings of "monitorcap:".
-
-  <vmin> and <vmax> are the minimum and maximum, resp., vertical frequencies
-your monitor can work with, in Hz. <hmin> and <hmax> are the same for
-the horizontal frequency, in kHz.
-
-  The defaults are 50;90;15;38 (Generic Amiga multisync monitor).
-
-
-5.2) fd_def_df0=
-----------------
-
-Syntax: fd_def_df0=<value>
-
-Sets the df0 value for "silent" floppy drives. The value should be in
-hexadecimal with "0x" prefix.
-
-
-5.3) wd33c93=
--------------
-
-Syntax: wd33c93=<sub-options...>
-
-These options affect the A590/A2091, A3000 and GVP Series II SCSI
-controllers.
-
-The <sub-options> is a comma-separated list of the sub-options listed
-below.
-
-5.3.1) nosync
--------------
-
-Syntax: nosync:bitmask
-
-  bitmask is a byte where the 1st 7 bits correspond with the 7
-possible SCSI devices. Set a bit to prevent sync negotiation on that
-device. To maintain backwards compatibility, a command-line such as
-"wd33c93=255" will be automatically translated to
-"wd33c93=nosync:0xff". The default is to disable sync negotiation for
-all devices, eg. nosync:0xff.
-
-5.3.2) period
--------------
-
-Syntax: period:ns
-
-  `ns' is the minimum # of nanoseconds in a SCSI data transfer
-period. Default is 500; acceptable values are 250 - 1000.
-
-5.3.3) disconnect
------------------
-
-Syntax: disconnect:x
-
-  Specify x = 0 to never allow disconnects, 2 to always allow them.
-x = 1 does 'adaptive' disconnects, which is the default and generally
-the best choice.
-
-5.3.4) debug
-------------
-
-Syntax: debug:x
-
-  If `DEBUGGING_ON' is defined, x is a bit mask that causes various
-types of debug output to printed - see the DB_xxx defines in
-wd33c93.h.
-
-5.3.5) clock
-------------
-
-Syntax: clock:x
-
-  x = clock input in MHz for WD33c93 chip. Normal values would be from
-8 through 20. The default value depends on your hostadapter(s),
-default for the A3000 internal controller is 14, for the A2091 it's 8
-and for the GVP hostadapters it's either 8 or 14, depending on the
-hostadapter and the SCSI-clock jumper present on some GVP
-hostadapters.
-
-5.3.6) next
------------
-
-  No argument. Used to separate blocks of keywords when there's more
-than one wd33c93-based host adapter in the system.
-
-5.3.7) nodma
-------------
-
-Syntax: nodma:x
-
-  If x is 1 (or if the option is just written as "nodma"), the WD33c93
-controller will not use DMA (= direct memory access) to access the
-Amiga's memory.  This is useful for some systems (like A3000's and
-A4000's with the A3640 accelerator, revision 3.0) that have problems
-using DMA to chip memory.  The default is 0, i.e. to use DMA if
-possible.
-
-
-5.4) gvp11=
------------
-
-Syntax: gvp11=<addr-mask>
-
-  The earlier versions of the GVP driver did not handle DMA
-address-mask settings correctly which made it necessary for some
-people to use this option, in order to get their GVP controller
-running under Linux. These problems have hopefully been solved and the
-use of this option is now highly unrecommended!
-
-  Incorrect use can lead to unpredictable behavior, so please only use
-this option if you *know* what you are doing and have a reason to do
-so. In any case if you experience problems and need to use this
-option, please inform us about it by mailing to the Linux/68k kernel
-mailing list.
-
-  The address mask set by this option specifies which addresses are
-valid for DMA with the GVP Series II SCSI controller. An address is
-valid, if no bits are set except the bits that are set in the mask,
-too.
-
-  Some versions of the GVP can only DMA into a 24 bit address range,
-some can address a 25 bit address range while others can use the whole
-32 bit address range for DMA. The correct setting depends on your
-controller and should be autodetected by the driver. An example is the
-24 bit region which is specified by a mask of 0x00fffffe.
-
-
-/* Local Variables: */
-/* mode: text       */
-/* End:             */
-- 
cgit v1.2.3-55-g7522


From 01c0aa794305ae08eb977d0719e43577e93f9ef5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 08:37:58 -0300
Subject: docs: cma/debugfs.txt: convert docs to ReST and rename to *.rst

The debugfs interface for CMA should be there together with other
mm-related documents.

Convert this small file to ReST and move it to its rightful place.

The conversion is actually quite simple: just add a title for the
document. To make it look better for the audience,
also mark the "echo" command as a literal block.

While this is not part of any book, mark it as :orphan:,
in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/cma/debugfs.rst | 27 +++++++++++++++++++++++++++
 Documentation/cma/debugfs.txt | 21 ---------------------
 2 files changed, 27 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/cma/debugfs.rst
 delete mode 100644 Documentation/cma/debugfs.txt

diff --git a/Documentation/cma/debugfs.rst b/Documentation/cma/debugfs.rst
new file mode 100644
index 000000000000..518fe401b5ee
--- /dev/null
+++ b/Documentation/cma/debugfs.rst
@@ -0,0 +1,27 @@
+:orphan:
+
+=====================
+CMA Debugfs Interface
+=====================
+
+The CMA debugfs interface is useful to retrieve basic information out of the
+different CMA areas and to test allocation/release in each of the areas.
+
+Each CMA zone is represented by a directory under <debugfs>/cma/, indexed by
+the kernel's CMA index. So the first CMA zone would be:
+
+	<debugfs>/cma/cma-0
+
+The structure of the files created under that directory is as follows:
+
+ - [RO] base_pfn: The base PFN (Page Frame Number) of the zone.
+ - [RO] count: Amount of memory in the CMA area.
+ - [RO] order_per_bit: Order of pages represented by one bit.
+ - [RO] bitmap: The bitmap of page states in the zone.
+ - [WO] alloc: Allocate N pages from that CMA area. For example::
+
+	echo 5 > <debugfs>/cma/cma-2/alloc
+
+would try to allocate 5 pages from the cma-2 area.
+
+ - [WO] free: Free N pages from that CMA area, similar to the above.
diff --git a/Documentation/cma/debugfs.txt b/Documentation/cma/debugfs.txt
deleted file mode 100644
index 6cef20a8cedc..000000000000
--- a/Documentation/cma/debugfs.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-The CMA debugfs interface is useful to retrieve basic information out of the
-different CMA areas and to test allocation/release in each of the areas.
-
-Each CMA zone represents a directory under <debugfs>/cma/, indexed by the
-kernel's CMA index. So the first CMA zone would be:
-
-	<debugfs>/cma/cma-0
-
-The structure of the files created under that directory is as follows:
-
- - [RO] base_pfn: The base PFN (Page Frame Number) of the zone.
- - [RO] count: Amount of memory in the CMA area.
- - [RO] order_per_bit: Order of pages represented by one bit.
- - [RO] bitmap: The bitmap of page states in the zone.
- - [WO] alloc: Allocate N pages from that CMA area. For example:
-
-	echo 5 > <debugfs>/cma/cma-2/alloc
-
-would try to allocate 5 pages from the cma-2 area.
-
- - [WO] free: Free N pages from that CMA area, similar to the above.
-- 
cgit v1.2.3-55-g7522


From 8db8acee4b326bfd5bc9a164a7f9ef844ec0fd2e Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 08:44:17 -0300
Subject: docs: console.txt: convert docs to ReST and rename to *.rst

Convert this small file to ReST in preparation for adding it to
the driver-api book.

While this is not part of the driver-api book, mark it as
:orphan:, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
---
 Documentation/console/console.rst | 152 ++++++++++++++++++++++++++++++++++++++
 Documentation/console/console.txt | 145 ------------------------------------
 Documentation/fb/fbcon.rst        |   4 +-
 drivers/tty/Kconfig               |   2 +-
 4 files changed, 155 insertions(+), 148 deletions(-)
 create mode 100644 Documentation/console/console.rst
 delete mode 100644 Documentation/console/console.txt

diff --git a/Documentation/console/console.rst b/Documentation/console/console.rst
new file mode 100644
index 000000000000..b374141b027e
--- /dev/null
+++ b/Documentation/console/console.rst
@@ -0,0 +1,152 @@
+:orphan:
+
+===============
+Console Drivers
+===============
+
+The Linux kernel has 2 general types of console drivers.  The first type is
+assigned by the kernel to all the virtual consoles during the boot process.
+This type will be called 'system driver', and only one system driver is allowed
+to exist. The system driver is persistent and it can never be unloaded, though
+it may become inactive.
+
+The second type has to be explicitly loaded and unloaded. This will be called
+'modular driver' by this document. Multiple modular drivers can coexist at
+any time with each driver sharing the console with other drivers including
+the system driver. However, modular drivers cannot take over the console
+that is currently occupied by another modular driver. (Exception: Drivers that
+call do_take_over_console() will succeed in the takeover regardless of the type
+of driver occupying the consoles.) They can only take over the console that is
+occupied by the system driver. By the same token, if the modular driver is
+released by the console, the system driver will take over.
+
+Modular drivers, from the programmer's point of view, have to call::
+
+	 do_take_over_console() - load and bind driver to console layer
+	 give_up_console() - unload driver; it will only work if driver
+			     is fully unbound
+
+In newer kernels, the following are also available::
+
+	 do_register_con_driver()
+	 do_unregister_con_driver()
+
+If sysfs is enabled, the contents of /sys/class/vtconsole can be
+examined. This shows the console backends currently registered by the
+system which are named vtcon<n> where <n> is an integer from 0 to 15.
+Thus::
+
+       ls /sys/class/vtconsole
+       .  ..  vtcon0  vtcon1
+
+Each directory in /sys/class/vtconsole has 3 files::
+
+     ls /sys/class/vtconsole/vtcon0
+     .  ..  bind  name  uevent
+
+What do these files signify?
+
+     1. bind - this is a read/write file. It shows the status of the driver if
+        read, or acts to bind or unbind the driver to the virtual consoles
+        when written to. The possible values are:
+
+	0
+	  - means the driver is not bound and if echo'ed, commands the driver
+	    to unbind
+
+        1
+	  - means the driver is bound and if echo'ed, commands the driver to
+	    bind
+
+     2. name - read-only file. Shows the name of the driver in this format::
+
+	  cat /sys/class/vtconsole/vtcon0/name
+	  (S) VGA+
+
+	      '(S)' stands for a (S)ystem driver, i.e., it cannot be directly
+	      commanded to bind or unbind
+
+	      'VGA+' is the name of the driver
+
+	  cat /sys/class/vtconsole/vtcon1/name
+	  (M) frame buffer device
+
+	      In this case, '(M)' stands for a (M)odular driver, one that can be
+	      directly commanded to bind or unbind.
+
+     3. uevent - ignore this file
+
+When unbinding, the modular driver is detached first, and then the system
+driver takes over the consoles vacated by the driver. Binding, on the other
+hand, will bind the driver to the consoles that are currently occupied by a
+system driver.
+
+NOTE1:
+  Binding and unbinding must be selected in Kconfig. It's under::
+
+    Device Drivers ->
+	Character devices ->
+		Support for binding and unbinding console drivers
+
+NOTE2:
+  If any of the virtual consoles are in KD_GRAPHICS mode, then binding or
+  unbinding will not succeed. An example of an application that sets the
+  console to KD_GRAPHICS is X.
+
+How useful is this feature? This is very useful for console driver
+developers. By unbinding the driver from the console layer, one can unload the
+driver, make changes, recompile, reload and rebind the driver without any need
+for rebooting the kernel. For regular users who may want to switch from
+framebuffer console to VGA console and vice versa, this feature also makes
+this possible. (NOTE: Please read fbcon.rst under Documentation/fb
+for more details.)
+
+Notes for developers
+====================
+
+do_take_over_console() is now broken up into::
+
+     do_register_con_driver()
+     do_bind_con_driver() - private function
+
+give_up_console() is a wrapper to do_unregister_con_driver(), and a driver must
+be fully unbound for this call to succeed. con_is_bound() will check if the
+driver is bound or not.
+
+Guidelines for console driver writers
+=====================================
+
+In order for binding to and unbinding from the console to properly work,
+console drivers must follow these guidelines:
+
+1. All drivers, except system drivers, must call either do_register_con_driver()
+   or do_take_over_console(). do_register_con_driver() will just add the driver
+   to the console's internal list. It won't take over the
+   console. do_take_over_console(), as its name implies, will also take over (or
+   bind to) the console.
+
+2. All resources allocated during con->con_init() must be released in
+   con->con_deinit().
+
+3. All resources allocated in con->con_startup() must be released when the
+   driver, which was previously bound, becomes unbound.  The console layer
+   does not have a complementary call to con->con_startup() so it's up to the
+   driver to check when it's legal to release these resources. Calling
+   con_is_bound() in con->con_deinit() will help.  If the call returns
+   false, then it's safe to release the resources.  This balance has to be
+   ensured because con->con_startup() can be called again when a request to
+   rebind the driver to the console arrives.
+
+4. Upon exit of the driver, ensure that the driver is totally unbound. If the
+   condition is satisfied, then the driver must call do_unregister_con_driver()
+   or give_up_console().
+
+5. do_unregister_con_driver() can also be called on conditions which make it
+   impossible for the driver to service console requests.  This can happen
+   with the framebuffer console that suddenly lost all of its drivers.
+
+The current crop of console drivers should still work correctly, but binding
+and unbinding them may cause problems. With minimal fixes, these drivers can
+be made to work correctly.
+
+Antonino Daplas <adaplas@pol.net>
diff --git a/Documentation/console/console.txt b/Documentation/console/console.txt
deleted file mode 100644
index d73c2ab4beda..000000000000
--- a/Documentation/console/console.txt
+++ /dev/null
@@ -1,145 +0,0 @@
-Console Drivers
-===============
-
-The Linux kernel has 2 general types of console drivers.  The first type is
-assigned by the kernel to all the virtual consoles during the boot process.
-This type will be called 'system driver', and only one system driver is allowed
-to exist. The system driver is persistent and it can never be unloaded, though
-it may become inactive.
-
-The second type has to be explicitly loaded and unloaded. This will be called
-'modular driver' by this document. Multiple modular drivers can coexist at
-any time with each driver sharing the console with other drivers including
-the system driver. However, modular drivers cannot take over the console
-that is currently occupied by another modular driver. (Exception: Drivers that
-call do_take_over_console() will succeed in the takeover regardless of the type
-of driver occupying the consoles.) They can only take over the console that is
-occupied by the system driver. In the same token, if the modular driver is
-released by the console, the system driver will take over.
-
-Modular drivers, from the programmer's point of view, have to call:
-
-	 do_take_over_console() - load and bind driver to console layer
-	 give_up_console() - unload driver; it will only work if driver
-			     is fully unbound
-
-In newer kernels, the following are also available:
-
-	 do_register_con_driver()
-	 do_unregister_con_driver()
-
-If sysfs is enabled, the contents of /sys/class/vtconsole can be
-examined. This shows the console backends currently registered by the
-system which are named vtcon<n> where <n> is an integer from 0 to 15. Thus:
-
-       ls /sys/class/vtconsole
-       .  ..  vtcon0  vtcon1
-
-Each directory in /sys/class/vtconsole has 3 files:
-
-     ls /sys/class/vtconsole/vtcon0
-     .  ..  bind  name  uevent
-
-What do these files signify?
-
-     1. bind - this is a read/write file. It shows the status of the driver if
-        read, or acts to bind or unbind the driver to the virtual consoles
-        when written to. The possible values are:
-
-	0 - means the driver is not bound and if echo'ed, commands the driver
-	    to unbind
-
-        1 - means the driver is bound and if echo'ed, commands the driver to
-	    bind
-
-     2. name - read-only file. Shows the name of the driver in this format:
-
-	cat /sys/class/vtconsole/vtcon0/name
-	(S) VGA+
-
-	    '(S)' stands for a (S)ystem driver, i.e., it cannot be directly
-	    commanded to bind or unbind
-
-	    'VGA+' is the name of the driver
-
-	cat /sys/class/vtconsole/vtcon1/name
-	(M) frame buffer device
-
-	    In this case, '(M)' stands for a (M)odular driver, one that can be
-	    directly commanded to bind or unbind.
-
-     3. uevent - ignore this file
-
-When unbinding, the modular driver is detached first, and then the system
-driver takes over the consoles vacated by the driver. Binding, on the other
-hand, will bind the driver to the consoles that are currently occupied by a
-system driver.
-
-NOTE1: Binding and unbinding must be selected in Kconfig. It's under:
-
-Device Drivers -> Character devices -> Support for binding and unbinding
-console drivers
-
-NOTE2: If any of the virtual consoles are in KD_GRAPHICS mode, then binding or
-unbinding will not succeed. An example of an application that sets the console
-to KD_GRAPHICS is X.
-
-How useful is this feature? This is very useful for console driver
-developers. By unbinding the driver from the console layer, one can unload the
-driver, make changes, recompile, reload and rebind the driver without any need
-for rebooting the kernel. For regular users who may want to switch from
-framebuffer console to VGA console and vice versa, this feature also makes
-this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb
-for more details.)
-
-Notes for developers:
-=====================
-
-do_take_over_console() is now broken up into:
-
-     do_register_con_driver()
-     do_bind_con_driver() - private function
-
-give_up_console() is a wrapper to do_unregister_con_driver(), and a driver must
-be fully unbound for this call to succeed. con_is_bound() will check if the
-driver is bound or not.
-
-Guidelines for console driver writers:
-=====================================
-
-In order for binding to and unbinding from the console to properly work,
-console drivers must follow these guidelines:
-
-1. All drivers, except system drivers, must call either do_register_con_driver()
-   or do_take_over_console(). do_register_con_driver() will just add the driver
-   to the console's internal list. It won't take over the
-   console. do_take_over_console(), as it name implies, will also take over (or
-   bind to) the console.
-
-2. All resources allocated during con->con_init() must be released in
-   con->con_deinit().
-
-3. All resources allocated in con->con_startup() must be released when the
-   driver, which was previously bound, becomes unbound.  The console layer
-   does not have a complementary call to con->con_startup() so it's up to the
-   driver to check when it's legal to release these resources. Calling
-   con_is_bound() in con->con_deinit() will help.  If the call returned
-   false(), then it's safe to release the resources.  This balance has to be
-   ensured because con->con_startup() can be called again when a request to
-   rebind the driver to the console arrives.
-
-4. Upon exit of the driver, ensure that the driver is totally unbound. If the
-   condition is satisfied, then the driver must call do_unregister_con_driver()
-   or give_up_console().
-
-5. do_unregister_con_driver() can also be called on conditions which make it
-   impossible for the driver to service console requests.  This can happen
-   with the framebuffer console that suddenly lost all of its drivers.
-
-The current crop of console drivers should still work correctly, but binding
-and unbinding them may cause problems. With minimal fixes, these drivers can
-be made to work correctly.
-
-==========================
-Antonino Daplas <adaplas@pol.net>
-
diff --git a/Documentation/fb/fbcon.rst b/Documentation/fb/fbcon.rst
index 1da65b9000de..26bc5cdaabab 100644
--- a/Documentation/fb/fbcon.rst
+++ b/Documentation/fb/fbcon.rst
@@ -187,7 +187,7 @@ the hardware. Thus, in a VGA console::
 Assuming the VGA driver can be unloaded, one must first unbind the VGA driver
 from the console layer before unloading the driver.  The VGA driver cannot be
 unloaded if it is still bound to the console layer. (See
-Documentation/console/console.txt for more information).
+Documentation/console/console.rst for more information).
 
 This is more complicated in the case of the framebuffer console (fbcon),
 because fbcon is an intermediate layer between the console and the drivers::
@@ -204,7 +204,7 @@ fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from
 fbcon.
 
 So, how do we unbind fbcon from the console? Part of the answer is in
-Documentation/console/console.txt. To summarize:
+Documentation/console/console.rst. To summarize:
 
 Echo a value to the bind file that represents the framebuffer console
 driver. So assuming vtcon1 represents fbcon, then::
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index 0e3e4dacbc12..1cb50f19d58c 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -93,7 +93,7 @@ config VT_HW_CONSOLE_BINDING
          select the console driver that will serve as the backend for the
          virtual terminals.
 
-	 See <file:Documentation/console/console.txt> for more
+	 See <file:Documentation/console/console.rst> for more
 	 information. For framebuffer console users, please refer to
 	 <file:Documentation/fb/fbcon.rst>.
 
-- 
cgit v1.2.3-55-g7522


From 93d2c159673325624ef3f2d14ededfcdf76f948b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 08:50:59 -0300
Subject: docs: pti_intel_mid.txt: convert it to pti_intel_mid.rst

Convert this small file to ReST format and rename it.

Most of the conversion was related to adjusting whitespace
in order for each section to be properly parsed.

While this is not part of any book, mark it as :orphan:, in order
to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/pti/pti_intel_mid.rst | 106 ++++++++++++++++++++++++++++++++++++
 Documentation/pti/pti_intel_mid.txt |  99 ---------------------------------
 2 files changed, 106 insertions(+), 99 deletions(-)
 create mode 100644 Documentation/pti/pti_intel_mid.rst
 delete mode 100644 Documentation/pti/pti_intel_mid.txt

diff --git a/Documentation/pti/pti_intel_mid.rst b/Documentation/pti/pti_intel_mid.rst
new file mode 100644
index 000000000000..ea05725174cb
--- /dev/null
+++ b/Documentation/pti/pti_intel_mid.rst
@@ -0,0 +1,106 @@
+:orphan:
+
+=============
+Intel MID PTI
+=============
+
+The Intel MID PTI project is HW implemented in Intel Atom
+system-on-a-chip designs based on the Parallel Trace
+Interface for MIPI P1149.7 cJTAG standard.  The kernel solution
+for this platform involves the following files::
+
+	./include/linux/pti.h
+	./drivers/.../n_tracesink.h
+	./drivers/.../n_tracerouter.c
+	./drivers/.../n_tracesink.c
+	./drivers/.../pti.c
+
+pti.c is the driver that enables various debugging features
+popular on platforms from certain mobile manufacturers.
+n_tracerouter.c and n_tracesink.c allow extra system information to
+be collected and routed to the pti driver, such as trace
+debugging data from a modem.  Although n_tracerouter
+and n_tracesink are a part of the complete PTI solution,
+these two line disciplines can work separately from
+pti.c and route any data stream from one /dev/tty node
+to another /dev/tty node via kernel-space.  This provides
+a stable, reliable connection that will not break unless
+the user-space application shuts down (plus avoids
+kernel->user->kernel context switch overheads of routing
+data).
+
+An example debugging usage for this driver system:
+
+  * Hook /dev/ttyPTI0 to syslogd.  Opening this port will also start
+    a console device to further capture debugging messages to PTI.
+  * Hook /dev/ttyPTI1 to modem debugging data to write to PTI HW.
+    This is where n_tracerouter and n_tracesink are used.
+  * Hook /dev/pti to a user-level debugging application for writing
+    to PTI HW.
+  * Use ``mipi_*`` Kernel Driver API in other device drivers for
+    debugging to PTI by first requesting a PTI write address via
+    mipi_request_masterchannel(1).
+
+Below is example pseudo-code on how a 'privileged' application
+can hook up n_tracerouter and n_tracesink to any tty on
+a system.  'Privileged' means the application has enough
+privileges to successfully manipulate the ldisc drivers
+but is not just blindly executing as 'root'. Keep in mind
+the use of ioctl(,TIOCSETD,) is not specific to the n_tracerouter
+and n_tracesink line discipline drivers but is a generic
+operation for a program to use a line discipline driver
+on a tty port other than the default n_tty::
+
+  /////////// To hook up n_tracerouter and n_tracesink /////////
+
+  // Note that n_tracerouter depends on n_tracesink.
+  #include <errno.h>
+  #define ONE_TTY "/dev/ttyOne"
+  #define TWO_TTY "/dev/ttyTwo"
+
+  // globals needed to hang onto the ldisc connection
+  static int g_fd_source = -1;
+  static int g_fd_sink  = -1;
+
+  // these two vars used to grab LDISC values from loaded ldisc drivers
+  // in OS.  Look at /proc/tty/ldiscs to get the right numbers from
+  // the ldiscs loaded in the system.
+  int source_ldisc_num = -1, sink_ldisc_num = -1;
+  int retval;
+
+  g_fd_source = open(ONE_TTY, O_RDWR); // must be R/W
+  g_fd_sink   = open(TWO_TTY, O_RDWR); // must be R/W
+
+  if ((g_fd_source < 0) || (g_fd_sink < 0)) {
+     // doubt you'll want to use these exact error lines of code
+     printf("Error on open(). errno: %d\n",errno);
+     return errno;
+  }
+
+  retval = ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
+  if (retval < 0) {
+     printf("Error on ioctl().  errno: %d\n", errno);
+     return errno;
+  }
+
+  retval = ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
+  if (retval < 0) {
+     printf("Error on ioctl().  errno: %d\n", errno);
+     return errno;
+  }
+
+  /////////// To disconnect n_tracerouter and n_tracesink ////////
+
+  // First make sure data through the ldiscs has stopped.
+
+  // Second, disconnect ldiscs.  This provides a
+  // little cleaner shutdown on tty stack.
+  sink_ldisc_num = 0;
+  source_ldisc_num = 0;
+  ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
+  ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
+
+  // Third, close the connections and clean up:
+  close(g_fd_sink);
+  close(g_fd_source);
+  g_fd_sink = g_fd_source = -1;
diff --git a/Documentation/pti/pti_intel_mid.txt b/Documentation/pti/pti_intel_mid.txt
deleted file mode 100644
index e7a5b6d1f7a9..000000000000
--- a/Documentation/pti/pti_intel_mid.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-The Intel MID PTI project is HW implemented in Intel Atom
-system-on-a-chip designs based on the Parallel Trace
-Interface for MIPI P1149.7 cJTAG standard.  The kernel solution
-for this platform involves the following files:
-
-./include/linux/pti.h
-./drivers/.../n_tracesink.h
-./drivers/.../n_tracerouter.c
-./drivers/.../n_tracesink.c
-./drivers/.../pti.c
-
-pti.c is the driver that enables various debugging features
-popular on platforms from certain mobile manufacturers.
-n_tracerouter.c and n_tracesink.c allow extra system information to
-be collected and routed to the pti driver, such as trace
-debugging data from a modem.  Although n_tracerouter
-and n_tracesink are a part of the complete PTI solution,
-these two line disciplines can work separately from
-pti.c and route any data stream from one /dev/tty node
-to another /dev/tty node via kernel-space.  This provides
-a stable, reliable connection that will not break unless
-the user-space application shuts down (plus avoids
-kernel->user->kernel context switch overheads of routing
-data).
-
-An example debugging usage for this driver system:
-   *Hook /dev/ttyPTI0 to syslogd.  Opening this port will also start
-    a console device to further capture debugging messages to PTI.
-   *Hook /dev/ttyPTI1 to modem debugging data to write to PTI HW.
-    This is where n_tracerouter and n_tracesink are used.
-   *Hook /dev/pti to a user-level debugging application for writing
-    to PTI HW.
-   *Use mipi_* Kernel Driver API in other device drivers for
-    debugging to PTI by first requesting a PTI write address via
-    mipi_request_masterchannel(1).
-
-Below is example pseudo-code on how a 'privileged' application
-can hook up n_tracerouter and n_tracesink to any tty on
-a system.  'Privileged' means the application has enough
-privileges to successfully manipulate the ldisc drivers
-but is not just blindly executing as 'root'. Keep in mind
-the use of ioctl(,TIOCSETD,) is not specific to the n_tracerouter
-and n_tracesink line discpline drivers but is a generic
-operation for a program to use a line discpline driver
-on a tty port other than the default n_tty.
-
-/////////// To hook up n_tracerouter and n_tracesink /////////
-
-// Note that n_tracerouter depends on n_tracesink.
-#include <errno.h>
-#define ONE_TTY "/dev/ttyOne"
-#define TWO_TTY "/dev/ttyTwo"
-
-// needed global to hand onto ldisc connection
-static int g_fd_source = -1;
-static int g_fd_sink  = -1;
-
-// these two vars used to grab LDISC values from loaded ldisc drivers
-// in OS.  Look at /proc/tty/ldiscs to get the right numbers from
-// the ldiscs loaded in the system.
-int source_ldisc_num, sink_ldisc_num = -1;
-int retval;
-
-g_fd_source = open(ONE_TTY, O_RDWR); // must be R/W
-g_fd_sink   = open(TWO_TTY, O_RDWR); // must be R/W
-
-if (g_fd_source <= 0) || (g_fd_sink <= 0) {
-   // doubt you'll want to use these exact error lines of code
-   printf("Error on open(). errno: %d\n",errno);
-   return errno;
-}
-
-retval = ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
-if (retval < 0) {
-   printf("Error on ioctl().  errno: %d\n", errno);
-   return errno;
-}
-
-retval = ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
-if (retval < 0) {
-   printf("Error on ioctl().  errno: %d\n", errno);
-   return errno;
-}
-
-/////////// To disconnect n_tracerouter and n_tracesink ////////
-
-// First make sure data through the ldiscs has stopped.
-
-// Second, disconnect ldiscs.  This provides a
-// little cleaner shutdown on tty stack.
-sink_ldisc_num = 0;
-source_ldisc_num = 0;
-ioctl(g_fd_uart, TIOCSETD, &sink_ldisc_num);
-ioctl(g_fd_gadget, TIOCSETD, &source_ldisc_num);
-
-// Three, program closes connection, and cleanup:
-close(g_fd_uart);
-close(g_fd_gadget);
-g_fd_uart = g_fd_gadget = NULL;
-- 
cgit v1.2.3-55-g7522


From 0d07cf5e53a21e35289adc3ab99b6804ff0c3833 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 08:58:05 -0300
Subject: docs: early-userspace: convert docs to ReST and rename to *.rst

The two files there describe a kernel API feature, used to
support early userspace stuff. Prepare for moving them to
the kernel API book by converting to ReST format.

The conversion itself was quite trivial: just add/mark a few
titles as such, add a literal block markup, add a table markup
and a few blank lines, in order to make Sphinx properly parse it.

At its new index.rst, let's add an :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/early-userspace/README               | 151 --------------------
 Documentation/early-userspace/buffer-format.rst    | 119 ++++++++++++++++
 Documentation/early-userspace/buffer-format.txt    | 112 ---------------
 .../early-userspace/early_userspace_support.rst    | 154 +++++++++++++++++++++
 Documentation/early-userspace/index.rst            |  18 +++
 Documentation/filesystems/nfs/nfsroot.txt          |   2 +-
 .../filesystems/ramfs-rootfs-initramfs.txt         |   4 +-
 usr/Kconfig                                        |   2 +-
 8 files changed, 295 insertions(+), 267 deletions(-)
 delete mode 100644 Documentation/early-userspace/README
 create mode 100644 Documentation/early-userspace/buffer-format.rst
 delete mode 100644 Documentation/early-userspace/buffer-format.txt
 create mode 100644 Documentation/early-userspace/early_userspace_support.rst
 create mode 100644 Documentation/early-userspace/index.rst

diff --git a/Documentation/early-userspace/README b/Documentation/early-userspace/README
deleted file mode 100644
index 955d667dc87e..000000000000
--- a/Documentation/early-userspace/README
+++ /dev/null
@@ -1,151 +0,0 @@
-Early userspace support
-=======================
-
-Last update: 2004-12-20 tlh
-
-
-"Early userspace" is a set of libraries and programs that provide
-various pieces of functionality that are important enough to be
-available while a Linux kernel is coming up, but that don't need to be
-run inside the kernel itself.
-
-It consists of several major infrastructure components:
-
-- gen_init_cpio, a program that builds a cpio-format archive
-  containing a root filesystem image.  This archive is compressed, and
-  the compressed image is linked into the kernel image.
-- initramfs, a chunk of code that unpacks the compressed cpio image
-  midway through the kernel boot process.
-- klibc, a userspace C library, currently packaged separately, that is
-  optimized for correctness and small size.
-
-The cpio file format used by initramfs is the "newc" (aka "cpio -H newc")
-format, and is documented in the file "buffer-format.txt".  There are
-two ways to add an early userspace image: specify an existing cpio
-archive to be used as the image or have the kernel build process build
-the image from specifications.
-
-CPIO ARCHIVE method
-
-You can create a cpio archive that contains the early userspace image.
-Your cpio archive should be specified in CONFIG_INITRAMFS_SOURCE and it
-will be used directly.  Only a single cpio file may be specified in
-CONFIG_INITRAMFS_SOURCE and directory and file names are not allowed in
-combination with a cpio archive.
-
-IMAGE BUILDING method
-
-The kernel build process can also build an early userspace image from
-source parts rather than supplying a cpio archive.  This method provides
-a way to create images with root-owned files even though the image was
-built by an unprivileged user.
-
-The image is specified as one or more sources in
-CONFIG_INITRAMFS_SOURCE.  Sources can be either directories or files -
-cpio archives are *not* allowed when building from sources.
-
-A source directory will have it and all of its contents packaged.  The
-specified directory name will be mapped to '/'.  When packaging a
-directory, limited user and group ID translation can be performed.
-INITRAMFS_ROOT_UID can be set to a user ID that needs to be mapped to
-user root (0).  INITRAMFS_ROOT_GID can be set to a group ID that needs
-to be mapped to group root (0).
-
-A source file must be directives in the format required by the
-usr/gen_init_cpio utility (run 'usr/gen_init_cpio -h' to get the
-file format).  The directives in the file will be passed directly to
-usr/gen_init_cpio.
-
-When a combination of directories and files are specified then the
-initramfs image will be an aggregate of all of them.  In this way a user
-can create a 'root-image' directory and install all files into it.
-Because device-special files cannot be created by a unprivileged user,
-special files can be listed in a 'root-files' file.  Both 'root-image'
-and 'root-files' can be listed in CONFIG_INITRAMFS_SOURCE and a complete
-early userspace image can be built by an unprivileged user.
-
-As a technical note, when directories and files are specified, the
-entire CONFIG_INITRAMFS_SOURCE is passed to
-usr/gen_initramfs_list.sh.  This means that CONFIG_INITRAMFS_SOURCE
-can really be interpreted as any legal argument to
-gen_initramfs_list.sh.  If a directory is specified as an argument then
-the contents are scanned, uid/gid translation is performed, and
-usr/gen_init_cpio file directives are output.  If a directory is
-specified as an argument to usr/gen_initramfs_list.sh then the
-contents of the file are simply copied to the output.  All of the output
-directives from directory scanning and file contents copying are
-processed by usr/gen_init_cpio.
-
-See also 'usr/gen_initramfs_list.sh -h'.
-
-Where's this all leading?
-=========================
-
-The klibc distribution contains some of the necessary software to make
-early userspace useful.  The klibc distribution is currently
-maintained separately from the kernel.
-
-You can obtain somewhat infrequent snapshots of klibc from
-https://www.kernel.org/pub/linux/libs/klibc/
-
-For active users, you are better off using the klibc git
-repository, at http://git.kernel.org/?p=libs/klibc/klibc.git
-
-The standalone klibc distribution currently provides three components,
-in addition to the klibc library:
-
-- ipconfig, a program that configures network interfaces.  It can
-  configure them statically, or use DHCP to obtain information
-  dynamically (aka "IP autoconfiguration").
-- nfsmount, a program that can mount an NFS filesystem.
-- kinit, the "glue" that uses ipconfig and nfsmount to replace the old
-  support for IP autoconfig, mount a filesystem over NFS, and continue
-  system boot using that filesystem as root.
-
-kinit is built as a single statically linked binary to save space.
-
-Eventually, several more chunks of kernel functionality will hopefully
-move to early userspace:
-
-- Almost all of init/do_mounts* (the beginning of this is already in
-  place)
-- ACPI table parsing
-- Insert unwieldy subsystem that doesn't really need to be in kernel
-  space here
-
-If kinit doesn't meet your current needs and you've got bytes to burn,
-the klibc distribution includes a small Bourne-compatible shell (ash)
-and a number of other utilities, so you can replace kinit and build
-custom initramfs images that meet your needs exactly.
-
-For questions and help, you can sign up for the early userspace
-mailing list at http://www.zytor.com/mailman/listinfo/klibc
-
-How does it work?
-=================
-
-The kernel has currently 3 ways to mount the root filesystem:
-
-a) all required device and filesystem drivers compiled into the kernel, no
-   initrd.  init/main.c:init() will call prepare_namespace() to mount the
-   final root filesystem, based on the root= option and optional init= to run
-   some other init binary than listed at the end of init/main.c:init().
-
-b) some device and filesystem drivers built as modules and stored in an
-   initrd.  The initrd must contain a binary '/linuxrc' which is supposed to
-   load these driver modules.  It is also possible to mount the final root
-   filesystem via linuxrc and use the pivot_root syscall.  The initrd is
-   mounted and executed via prepare_namespace().
-
-c) using initramfs.  The call to prepare_namespace() must be skipped.
-   This means that a binary must do all the work.  Said binary can be stored
-   into initramfs either via modifying usr/gen_init_cpio.c or via the new
-   initrd format, an cpio archive.  It must be called "/init".  This binary
-   is responsible to do all the things prepare_namespace() would do.
-
-   To maintain backwards compatibility, the /init binary will only run if it
-   comes via an initramfs cpio archive.  If this is not the case,
-   init/main.c:init() will run prepare_namespace() to mount the final root
-   and exec one of the predefined init binaries.
-
-Bryan O'Sullivan <bos@serpentine.com>
diff --git a/Documentation/early-userspace/buffer-format.rst b/Documentation/early-userspace/buffer-format.rst
new file mode 100644
index 000000000000..7f74e301fdf3
--- /dev/null
+++ b/Documentation/early-userspace/buffer-format.rst
@@ -0,0 +1,119 @@
+=======================
+initramfs buffer format
+=======================
+
+Al Viro, H. Peter Anvin
+
+Last revision: 2002-01-13
+
+Starting with kernel 2.5.x, the old "initial ramdisk" protocol is
+getting {replaced/complemented} with the new "initial ramfs"
+(initramfs) protocol.  The initramfs contents is passed using the same
+memory buffer protocol used by the initrd protocol, but the contents
+is different.  The initramfs buffer contains an archive which is
+expanded into a ramfs filesystem; this document details the
+initramfs buffer format.
+
+The initramfs buffer format is based around the "newc" or "crc" CPIO
+formats, and can be created with the cpio(1) utility.  The cpio
+archive can be compressed using gzip(1).  One valid version of an
+initramfs buffer is thus a single .cpio.gz file.
+
+The full format of the initramfs buffer is defined by the following
+grammar, where::
+
+	*	is used to indicate "0 or more occurrences of"
+	(|)	indicates alternatives
+	+	indicates concatenation
+	GZIP()	indicates the gzip(1) of the operand
+	ALGN(n)	means padding with null bytes to an n-byte boundary
+
+	initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
+
+	cpio_gzip_archive := GZIP(cpio_archive)
+
+	cpio_archive := cpio_file* + (<nothing> | cpio_trailer)
+
+	cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
+
+	cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4)
+
+
+In human terms, the initramfs buffer contains a collection of
+compressed and/or uncompressed cpio archives (in the "newc" or "crc"
+formats); arbitrary amounts of zero bytes (for padding) can be added
+between members.
+
+The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
+not ignored; see "handling of hard links" below.
+
+The structure of the cpio_header is as follows (all fields contain
+hexadecimal ASCII numbers fully padded with '0' on the left to the
+full width of the field, for example, the integer 4780 is represented
+by the ASCII string "000012ac"):
+
+============= ================== ==============================================
+Field name    Field size	 Meaning
+============= ================== ==============================================
+c_magic	      6 bytes		 The string "070701" or "070702"
+c_ino	      8 bytes		 File inode number
+c_mode	      8 bytes		 File mode and permissions
+c_uid	      8 bytes		 File uid
+c_gid	      8 bytes		 File gid
+c_nlink	      8 bytes		 Number of links
+c_mtime	      8 bytes		 Modification time
+c_filesize    8 bytes		 Size of data field
+c_maj	      8 bytes		 Major part of file device number
+c_min	      8 bytes		 Minor part of file device number
+c_rmaj	      8 bytes		 Major part of device node reference
+c_rmin	      8 bytes		 Minor part of device node reference
+c_namesize    8 bytes		 Length of filename, including final \0
+c_chksum      8 bytes		 Checksum of data field if c_magic is 070702;
+				 otherwise zero
+============= ================== ==============================================
+
+The c_mode field matches the contents of st_mode returned by stat(2)
+on Linux, and encodes the file type and file permissions.
+
+The c_filesize should be zero for any file which is not a regular file
+or symlink.
+
+The c_chksum field contains a simple 32-bit unsigned sum of all the
+bytes in the data field.  cpio(1) refers to this as "crc", which is
+clearly incorrect (a cyclic redundancy check is a different and
+significantly stronger integrity check); however, this is the
+algorithm used.
+
+If the filename is "TRAILER!!!" this is actually an end-of-archive
+marker; the c_filesize for an end-of-archive marker must be zero.
+
+
+Handling of hard links
+======================
+
+When a nondirectory with c_nlink > 1 is seen, the (c_maj,c_min,c_ino)
+tuple is looked up in a tuple buffer.  If not found, it is entered in
+the tuple buffer and the entry is created as usual; if found, a hard
+link rather than a second copy of the file is created.  It is not
+necessary (but permitted) to include a second copy of the file
+contents; if the file contents is not included, the c_filesize field
+should be set to zero to indicate no data section follows.  If data is
+present, the previous instance of the file is overwritten; this allows
+the data-carrying instance of a file to occur anywhere in the sequence
+(GNU cpio is reported to attach the data to the last instance of a
+file only).
+
+c_filesize must not be zero for a symlink.
+
+When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is
+reset.  This permits archives which are generated independently to be
+concatenated.
+
+To combine file data from different sources (without having to
+regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
+the following techniques can be used:
+
+a) Separate the different file data sources with a "TRAILER!!!"
+   end-of-archive marker, or
+
+b) Make sure c_nlink == 1 for all nondirectory entries.
diff --git a/Documentation/early-userspace/buffer-format.txt b/Documentation/early-userspace/buffer-format.txt
deleted file mode 100644
index e1fd7f9dad16..000000000000
--- a/Documentation/early-userspace/buffer-format.txt
+++ /dev/null
@@ -1,112 +0,0 @@
-		       initramfs buffer format
-		       -----------------------
-
-		       Al Viro, H. Peter Anvin
-		      Last revision: 2002-01-13
-
-Starting with kernel 2.5.x, the old "initial ramdisk" protocol is
-getting {replaced/complemented} with the new "initial ramfs"
-(initramfs) protocol.  The initramfs contents is passed using the same
-memory buffer protocol used by the initrd protocol, but the contents
-is different.  The initramfs buffer contains an archive which is
-expanded into a ramfs filesystem; this document details the format of
-the initramfs buffer format.
-
-The initramfs buffer format is based around the "newc" or "crc" CPIO
-formats, and can be created with the cpio(1) utility.  The cpio
-archive can be compressed using gzip(1).  One valid version of an
-initramfs buffer is thus a single .cpio.gz file.
-
-The full format of the initramfs buffer is defined by the following
-grammar, where:
-	*	is used to indicate "0 or more occurrences of"
-	(|)	indicates alternatives
-	+	indicates concatenation
-	GZIP()	indicates the gzip(1) of the operand
-	ALGN(n)	means padding with null bytes to an n-byte boundary
-
-	initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
-
-	cpio_gzip_archive := GZIP(cpio_archive)
-
-	cpio_archive := cpio_file* + (<nothing> | cpio_trailer)
-
-	cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
-
-	cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4)
-
-
-In human terms, the initramfs buffer contains a collection of
-compressed and/or uncompressed cpio archives (in the "newc" or "crc"
-formats); arbitrary amounts zero bytes (for padding) can be added
-between members.
-
-The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
-not ignored; see "handling of hard links" below.
-
-The structure of the cpio_header is as follows (all fields contain
-hexadecimal ASCII numbers fully padded with '0' on the left to the
-full width of the field, for example, the integer 4780 is represented
-by the ASCII string "000012ac"):
-
-Field name    Field size	 Meaning
-c_magic	      6 bytes		 The string "070701" or "070702"
-c_ino	      8 bytes		 File inode number
-c_mode	      8 bytes		 File mode and permissions
-c_uid	      8 bytes		 File uid
-c_gid	      8 bytes		 File gid
-c_nlink	      8 bytes		 Number of links
-c_mtime	      8 bytes		 Modification time
-c_filesize    8 bytes		 Size of data field
-c_maj	      8 bytes		 Major part of file device number
-c_min	      8 bytes		 Minor part of file device number
-c_rmaj	      8 bytes		 Major part of device node reference
-c_rmin	      8 bytes		 Minor part of device node reference
-c_namesize    8 bytes		 Length of filename, including final \0
-c_chksum      8 bytes		 Checksum of data field if c_magic is 070702;
-				 otherwise zero
-
-The c_mode field matches the contents of st_mode returned by stat(2)
-on Linux, and encodes the file type and file permissions.
-
-The c_filesize should be zero for any file which is not a regular file
-or symlink.
-
-The c_chksum field contains a simple 32-bit unsigned sum of all the
-bytes in the data field.  cpio(1) refers to this as "crc", which is
-clearly incorrect (a cyclic redundancy check is a different and
-significantly stronger integrity check), however, this is the
-algorithm used.
-
-If the filename is "TRAILER!!!" this is actually an end-of-archive
-marker; the c_filesize for an end-of-archive marker must be zero.
-
-
-*** Handling of hard links
-
-When a nondirectory with c_nlink > 1 is seen, the (c_maj,c_min,c_ino)
-tuple is looked up in a tuple buffer.  If not found, it is entered in
-the tuple buffer and the entry is created as usual; if found, a hard
-link rather than a second copy of the file is created.  It is not
-necessary (but permitted) to include a second copy of the file
-contents; if the file contents is not included, the c_filesize field
-should be set to zero to indicate no data section follows.  If data is
-present, the previous instance of the file is overwritten; this allows
-the data-carrying instance of a file to occur anywhere in the sequence
-(GNU cpio is reported to attach the data to the last instance of a
-file only.)
-
-c_filesize must not be zero for a symlink.
-
-When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is
-reset.  This permits archives which are generated independently to be
-concatenated.
-
-To combine file data from different sources (without having to
-regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
-the following techniques can be used:
-
-a) Separate the different file data sources with a "TRAILER!!!"
-   end-of-archive marker, or
-
-b) Make sure c_nlink == 1 for all nondirectory entries.
diff --git a/Documentation/early-userspace/early_userspace_support.rst b/Documentation/early-userspace/early_userspace_support.rst
new file mode 100644
index 000000000000..3deefb34046b
--- /dev/null
+++ b/Documentation/early-userspace/early_userspace_support.rst
@@ -0,0 +1,154 @@
+=======================
+Early userspace support
+=======================
+
+Last update: 2004-12-20 tlh
+
+
+"Early userspace" is a set of libraries and programs that provide
+various pieces of functionality that are important enough to be
+available while a Linux kernel is coming up, but that don't need to be
+run inside the kernel itself.
+
+It consists of several major infrastructure components:
+
+- gen_init_cpio, a program that builds a cpio-format archive
+  containing a root filesystem image.  This archive is compressed, and
+  the compressed image is linked into the kernel image.
+- initramfs, a chunk of code that unpacks the compressed cpio image
+  midway through the kernel boot process.
+- klibc, a userspace C library, currently packaged separately, that is
+  optimized for correctness and small size.
+
+The cpio file format used by initramfs is the "newc" (aka "cpio -H newc")
+format, and is documented in the file "buffer-format.rst".  There are
+two ways to add an early userspace image: specify an existing cpio
+archive to be used as the image or have the kernel build process build
+the image from specifications.
+
+CPIO ARCHIVE method
+-------------------
+
+You can create a cpio archive that contains the early userspace image.
+Your cpio archive should be specified in CONFIG_INITRAMFS_SOURCE and it
+will be used directly.  Only a single cpio file may be specified in
+CONFIG_INITRAMFS_SOURCE and directory and file names are not allowed in
+combination with a cpio archive.
+
+IMAGE BUILDING method
+---------------------
+
+The kernel build process can also build an early userspace image from
+source parts rather than supplying a cpio archive.  This method provides
+a way to create images with root-owned files even though the image was
+built by an unprivileged user.
+
+The image is specified as one or more sources in
+CONFIG_INITRAMFS_SOURCE.  Sources can be either directories or files -
+cpio archives are *not* allowed when building from sources.
+
+A source directory is packaged together with all of its contents.  The
+specified directory name will be mapped to '/'.  When packaging a
+directory, limited user and group ID translation can be performed.
+INITRAMFS_ROOT_UID can be set to a user ID that needs to be mapped to
+user root (0).  INITRAMFS_ROOT_GID can be set to a group ID that needs
+to be mapped to group root (0).
+
+A source file must contain directives in the format required by the
+usr/gen_init_cpio utility (run 'usr/gen_init_cpio -h' to get the
+file format).  The directives in the file will be passed directly to
+usr/gen_init_cpio.
+
+When a combination of directories and files is specified then the
+initramfs image will be an aggregate of all of them.  In this way a user
+can create a 'root-image' directory and install all files into it.
+Because device-special files cannot be created by an unprivileged user,
+special files can be listed in a 'root-files' file.  Both 'root-image'
+and 'root-files' can be listed in CONFIG_INITRAMFS_SOURCE and a complete
+early userspace image can be built by an unprivileged user.
+
+As a technical note, when directories and files are specified, the
+entire CONFIG_INITRAMFS_SOURCE is passed to
+usr/gen_initramfs_list.sh.  This means that CONFIG_INITRAMFS_SOURCE
+can really be interpreted as any legal argument to
+gen_initramfs_list.sh.  If a directory is specified as an argument then
+the contents are scanned, uid/gid translation is performed, and
+usr/gen_init_cpio file directives are output.  If a file is
+specified as an argument to usr/gen_initramfs_list.sh then the
+contents of the file are simply copied to the output.  All of the output
+directives from directory scanning and file contents copying are
+processed by usr/gen_init_cpio.
+
+See also 'usr/gen_initramfs_list.sh -h'.
+
+Where's this all leading?
+=========================
+
+The klibc distribution contains some of the necessary software to make
+early userspace useful.  The klibc distribution is currently
+maintained separately from the kernel.
+
+You can obtain somewhat infrequent snapshots of klibc from
+https://www.kernel.org/pub/linux/libs/klibc/
+
+For active users, you are better off using the klibc git
+repository, at http://git.kernel.org/?p=libs/klibc/klibc.git
+
+The standalone klibc distribution currently provides three components,
+in addition to the klibc library:
+
+- ipconfig, a program that configures network interfaces.  It can
+  configure them statically, or use DHCP to obtain information
+  dynamically (aka "IP autoconfiguration").
+- nfsmount, a program that can mount an NFS filesystem.
+- kinit, the "glue" that uses ipconfig and nfsmount to replace the old
+  support for IP autoconfig, mount a filesystem over NFS, and continue
+  system boot using that filesystem as root.
+
+kinit is built as a single statically linked binary to save space.
+
+Eventually, several more chunks of kernel functionality will hopefully
+move to early userspace:
+
+- Almost all of init/do_mounts* (the beginning of this is already in
+  place)
+- ACPI table parsing
+- Insert unwieldy subsystem that doesn't really need to be in kernel
+  space here
+
+If kinit doesn't meet your current needs and you've got bytes to burn,
+the klibc distribution includes a small Bourne-compatible shell (ash)
+and a number of other utilities, so you can replace kinit and build
+custom initramfs images that meet your needs exactly.
+
+For questions and help, you can sign up for the early userspace
+mailing list at http://www.zytor.com/mailman/listinfo/klibc
+
+How does it work?
+=================
+
+The kernel currently has 3 ways to mount the root filesystem:
+
+a) all required device and filesystem drivers compiled into the kernel, no
+   initrd.  init/main.c:init() will call prepare_namespace() to mount the
+   final root filesystem, based on the root= option and optional init= to run
+   some other init binary than listed at the end of init/main.c:init().
+
+b) some device and filesystem drivers built as modules and stored in an
+   initrd.  The initrd must contain a binary '/linuxrc' which is supposed to
+   load these driver modules.  It is also possible to mount the final root
+   filesystem via linuxrc and use the pivot_root syscall.  The initrd is
+   mounted and executed via prepare_namespace().
+
+c) using initramfs.  The call to prepare_namespace() must be skipped.
+   This means that a binary must do all the work.  Said binary can be stored
+   into initramfs either via modifying usr/gen_init_cpio.c or via the new
+   initrd format, a cpio archive.  It must be called "/init".  This binary
+   is responsible for doing all the things prepare_namespace() would do.
+
+   To maintain backwards compatibility, the /init binary will only run if it
+   comes via an initramfs cpio archive.  If this is not the case,
+   init/main.c:init() will run prepare_namespace() to mount the final root
+   and exec one of the predefined init binaries.
+
+Bryan O'Sullivan <bos@serpentine.com>
diff --git a/Documentation/early-userspace/index.rst b/Documentation/early-userspace/index.rst
new file mode 100644
index 000000000000..2b8eb6132058
--- /dev/null
+++ b/Documentation/early-userspace/index.rst
@@ -0,0 +1,18 @@
+:orphan:
+
+===============
+Early Userspace
+===============
+
+.. toctree::
+    :maxdepth: 1
+
+    early_userspace_support
+    buffer-format
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt
index d2963123eb1c..4862d3d77e27 100644
--- a/Documentation/filesystems/nfs/nfsroot.txt
+++ b/Documentation/filesystems/nfs/nfsroot.txt
@@ -239,7 +239,7 @@ rdinit=<executable file>
   A description of the process of mounting the root file system can be
   found in:
 
-    Documentation/early-userspace/README
+    Documentation/early-userspace/early_userspace_support.rst
 
 
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
index 79637d227e85..fa985909dbca 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -105,7 +105,7 @@ All this differs from the old initrd in several ways:
   - The old initrd file was a gzipped filesystem image (in some file format,
     such as ext2, that needed a driver built into the kernel), while the new
     initramfs archive is a gzipped cpio archive (like tar only simpler,
-    see cpio(1) and Documentation/early-userspace/buffer-format.txt).  The
+    see cpio(1) and Documentation/early-userspace/buffer-format.rst).  The
     kernel's cpio extraction code is not only extremely small, it's also
     __init text and data that can be discarded during the boot process.
 
@@ -159,7 +159,7 @@ One advantage of the configuration file is that root access is not required to
 set permissions or create device nodes in the new archive.  (Note that those
 two example "file" entries expect to find files named "init.sh" and "busybox" in
 a directory called "initramfs", under the linux-2.6.* directory.  See
-Documentation/early-userspace/README for more details.)
+Documentation/early-userspace/early_userspace_support.rst for more details.)
 
 The kernel does not depend on external cpio tools.  If you specify a
 directory instead of a configuration file, the kernel's build infrastructure
diff --git a/usr/Kconfig b/usr/Kconfig
index 43658b8a975e..86e37e297278 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -18,7 +18,7 @@ config INITRAMFS_SOURCE
 	  When multiple directories and files are specified then the
 	  initramfs image will be the aggregate of all of them.
 
-	  See <file:Documentation/early-userspace/README> for more details.
+	  See <file:Documentation/early-userspace/early_userspace_support.rst> for more details.
 
 	  If you are not sure, leave it blank.
 
-- 
cgit v1.2.3-55-g7522


From dc7a12bdfccd94c31f79e294f16f7549bd411b49 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sun, 14 Apr 2019 15:51:10 -0300
Subject: docs: arm: convert docs to ReST and rename to *.rst

Convert the ARM text files to ReST, preparing them to become an
architecture book.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

In its new index.rst, let's add an :orphan: tag for as long as it is not
linked from the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Corentin Labbe <clabbe.montjoie@gmail.com> # For sun4i-ss
---
 Documentation/arm/Booting                          | 218 ---------
 Documentation/arm/IXP4xx                           | 172 -------
 Documentation/arm/Interrupts                       | 167 -------
 Documentation/arm/Marvell/README                   | 395 ---------------
 Documentation/arm/Microchip/README                 | 169 -------
 Documentation/arm/Netwinder                        |  78 ---
 Documentation/arm/OMAP/DSS                         | 362 --------------
 Documentation/arm/OMAP/README                      |  11 -
 Documentation/arm/OMAP/omap_pm                     | 154 ------
 Documentation/arm/Porting                          | 135 ------
 Documentation/arm/README                           | 204 --------
 Documentation/arm/SA1100/ADSBitsy                  |  43 --
 Documentation/arm/SA1100/Assabet                   | 300 ------------
 Documentation/arm/SA1100/Brutus                    |  66 ---
 Documentation/arm/SA1100/CERF                      |  29 --
 Documentation/arm/SA1100/FreeBird                  |  21 -
 Documentation/arm/SA1100/GraphicsClient            |  98 ----
 Documentation/arm/SA1100/GraphicsMaster            |  53 --
 Documentation/arm/SA1100/HUW_WEBPANEL              |  17 -
 Documentation/arm/SA1100/Itsy                      |  39 --
 Documentation/arm/SA1100/LART                      |  14 -
 Documentation/arm/SA1100/PLEB                      |  11 -
 Documentation/arm/SA1100/Pangolin                  |  23 -
 Documentation/arm/SA1100/Tifon                     |   7 -
 Documentation/arm/SA1100/Yopy                      |   2 -
 Documentation/arm/SA1100/empeg                     |   2 -
 Documentation/arm/SA1100/nanoEngine                |  11 -
 Documentation/arm/SA1100/serial_UART               |  47 --
 Documentation/arm/SH-Mobile/.gitignore             |   1 -
 Documentation/arm/SPEAr/overview.txt               |  63 ---
 Documentation/arm/Samsung-S3C24XX/CPUfreq.txt      |  75 ---
 Documentation/arm/Samsung-S3C24XX/EB2410ITX.txt    |  58 ---
 Documentation/arm/Samsung-S3C24XX/GPIO.txt         | 171 -------
 Documentation/arm/Samsung-S3C24XX/H1940.txt        |  40 --
 Documentation/arm/Samsung-S3C24XX/NAND.txt         |  30 --
 Documentation/arm/Samsung-S3C24XX/Overview.txt     | 318 ------------
 Documentation/arm/Samsung-S3C24XX/S3C2412.txt      | 120 -----
 Documentation/arm/Samsung-S3C24XX/S3C2413.txt      |  21 -
 Documentation/arm/Samsung-S3C24XX/SMDK2440.txt     |  56 ---
 Documentation/arm/Samsung-S3C24XX/Suspend.txt      | 137 ------
 Documentation/arm/Samsung-S3C24XX/USB-Host.txt     |  93 ----
 Documentation/arm/Samsung/Bootloader-interface.txt |  68 ---
 Documentation/arm/Samsung/GPIO.txt                 |  40 --
 Documentation/arm/Samsung/Overview.txt             |  86 ----
 .../arm/Samsung/clksrc-change-registers.awk        | 166 -------
 Documentation/arm/Setup                            | 129 -----
 Documentation/arm/VFP/release-notes.txt            |  55 ---
 Documentation/arm/arm.rst                          | 214 +++++++++
 Documentation/arm/booting.rst                      | 237 +++++++++
 Documentation/arm/cluster-pm-race-avoidance.rst    | 533 +++++++++++++++++++++
 Documentation/arm/cluster-pm-race-avoidance.txt    | 498 -------------------
 Documentation/arm/firmware.rst                     |  72 +++
 Documentation/arm/firmware.txt                     |  70 ---
 Documentation/arm/index.rst                        |  80 ++++
 Documentation/arm/interrupts.rst                   | 169 +++++++
 Documentation/arm/ixp4xx.rst                       | 173 +++++++
 Documentation/arm/kernel_mode_neon.rst             | 124 +++++
 Documentation/arm/kernel_mode_neon.txt             | 121 -----
 Documentation/arm/kernel_user_helpers.rst          | 268 +++++++++++
 Documentation/arm/kernel_user_helpers.txt          | 267 -----------
 Documentation/arm/keystone/Overview.txt            |  55 ---
 Documentation/arm/keystone/knav-qmss.rst           |  60 +++
 Documentation/arm/keystone/knav-qmss.txt           |  56 ---
 Documentation/arm/keystone/overview.rst            |  74 +++
 Documentation/arm/marvel.rst                       | 488 +++++++++++++++++++
 Documentation/arm/mem_alignment                    |  58 ---
 Documentation/arm/mem_alignment.rst                |  63 +++
 Documentation/arm/memory.rst                       |  93 ++++
 Documentation/arm/memory.txt                       |  88 ----
 Documentation/arm/microchip.rst                    | 204 ++++++++
 Documentation/arm/netwinder.rst                    |  85 ++++
 Documentation/arm/nwfpe/NOTES                      |  29 --
 Documentation/arm/nwfpe/README                     |  70 ---
 Documentation/arm/nwfpe/README.FPE                 | 156 ------
 Documentation/arm/nwfpe/TODO                       |  67 ---
 Documentation/arm/nwfpe/index.rst                  |  11 +
 Documentation/arm/nwfpe/netwinder-fpe.rst          | 162 +++++++
 Documentation/arm/nwfpe/notes.rst                  |  32 ++
 Documentation/arm/nwfpe/nwfpe.rst                  |  74 +++
 Documentation/arm/nwfpe/todo.rst                   |  72 +++
 Documentation/arm/omap/dss.rst                     | 372 ++++++++++++++
 Documentation/arm/omap/index.rst                   |  10 +
 Documentation/arm/omap/omap.rst                    |  18 +
 Documentation/arm/omap/omap_pm.rst                 | 165 +++++++
 Documentation/arm/porting.rst                      | 137 ++++++
 Documentation/arm/pxa/mfp.rst                      | 288 +++++++++++
 Documentation/arm/pxa/mfp.txt                      | 286 -----------
 Documentation/arm/sa1100/adsbitsy.rst              |  51 ++
 Documentation/arm/sa1100/assabet.rst               | 301 ++++++++++++
 Documentation/arm/sa1100/brutus.rst                |  69 +++
 Documentation/arm/sa1100/cerf.rst                  |  35 ++
 Documentation/arm/sa1100/freebird.rst              |  25 +
 Documentation/arm/sa1100/graphicsclient.rst        | 102 ++++
 Documentation/arm/sa1100/graphicsmaster.rst        |  60 +++
 Documentation/arm/sa1100/huw_webpanel.rst          |  21 +
 Documentation/arm/sa1100/index.rst                 |  23 +
 Documentation/arm/sa1100/itsy.rst                  |  47 ++
 Documentation/arm/sa1100/lart.rst                  |  15 +
 Documentation/arm/sa1100/nanoengine.rst            |  11 +
 Documentation/arm/sa1100/pangolin.rst              |  29 ++
 Documentation/arm/sa1100/pleb.rst                  |  13 +
 Documentation/arm/sa1100/serial_uart.rst           |  51 ++
 Documentation/arm/sa1100/tifon.rst                 |   7 +
 Documentation/arm/sa1100/yopy.rst                  |   5 +
 Documentation/arm/samsung-s3c24xx/cpufreq.rst      |  76 +++
 Documentation/arm/samsung-s3c24xx/eb2410itx.rst    |  59 +++
 Documentation/arm/samsung-s3c24xx/gpio.rst         | 172 +++++++
 Documentation/arm/samsung-s3c24xx/h1940.rst        |  41 ++
 Documentation/arm/samsung-s3c24xx/index.rst        |  18 +
 Documentation/arm/samsung-s3c24xx/nand.rst         |  30 ++
 Documentation/arm/samsung-s3c24xx/overview.rst     | 319 ++++++++++++
 Documentation/arm/samsung-s3c24xx/s3c2412.rst      | 121 +++++
 Documentation/arm/samsung-s3c24xx/s3c2413.rst      |  22 +
 Documentation/arm/samsung-s3c24xx/smdk2440.rst     |  57 +++
 Documentation/arm/samsung-s3c24xx/suspend.rst      | 137 ++++++
 Documentation/arm/samsung-s3c24xx/usb-host.rst     |  91 ++++
 Documentation/arm/samsung/bootloader-interface.rst |  81 ++++
 .../arm/samsung/clksrc-change-registers.awk        | 166 +++++++
 Documentation/arm/samsung/gpio.rst                 |  41 ++
 Documentation/arm/samsung/index.rst                |  10 +
 Documentation/arm/samsung/overview.rst             |  89 ++++
 Documentation/arm/setup.rst                        | 108 +++++
 Documentation/arm/sh-mobile/.gitignore             |   1 +
 Documentation/arm/spear/overview.rst               |  65 +++
 Documentation/arm/sti/overview.rst                 |  36 ++
 Documentation/arm/sti/overview.txt                 |  33 --
 Documentation/arm/sti/stih407-overview.rst         |  19 +
 Documentation/arm/sti/stih407-overview.txt         |  18 -
 Documentation/arm/sti/stih415-overview.rst         |  14 +
 Documentation/arm/sti/stih415-overview.txt         |  12 -
 Documentation/arm/sti/stih416-overview.rst         |  13 +
 Documentation/arm/sti/stih416-overview.txt         |  12 -
 Documentation/arm/sti/stih418-overview.rst         |  21 +
 Documentation/arm/sti/stih418-overview.txt         |  20 -
 Documentation/arm/stm32/overview.rst               |   2 -
 Documentation/arm/stm32/stm32f429-overview.rst     |   7 +-
 Documentation/arm/stm32/stm32f746-overview.rst     |   7 +-
 Documentation/arm/stm32/stm32f769-overview.rst     |   7 +-
 Documentation/arm/stm32/stm32h743-overview.rst     |   7 +-
 Documentation/arm/stm32/stm32mp157-overview.rst    |   3 +-
 Documentation/arm/sunxi.rst                        | 150 ++++++
 Documentation/arm/sunxi/README                     | 102 ----
 Documentation/arm/sunxi/clocks.rst                 |  57 +++
 Documentation/arm/sunxi/clocks.txt                 |  56 ---
 Documentation/arm/swp_emulation                    |  27 --
 Documentation/arm/swp_emulation.rst                |  27 ++
 Documentation/arm/tcm.rst                          | 161 +++++++
 Documentation/arm/tcm.txt                          | 155 ------
 Documentation/arm/uefi.rst                         |  67 +++
 Documentation/arm/uefi.txt                         |  60 ---
 Documentation/arm/vfp/release-notes.rst            |  57 +++
 Documentation/arm/vlocks.rst                       | 212 ++++++++
 Documentation/arm/vlocks.txt                       | 211 --------
 Documentation/devicetree/bindings/arm/xen.txt      |   2 +-
 Documentation/devicetree/booting-without-of.txt    |   4 +-
 Documentation/index.rst                            |   1 +
 Documentation/translations/zh_CN/arm/Booting       |   4 +-
 .../translations/zh_CN/arm/kernel_user_helpers.txt |   4 +-
 MAINTAINERS                                        |   4 +-
 arch/arm/Kconfig                                   |   2 +-
 arch/arm/common/mcpm_entry.c                       |   2 +-
 arch/arm/common/mcpm_head.S                        |   2 +-
 arch/arm/common/vlock.S                            |   2 +-
 arch/arm/include/asm/setup.h                       |   2 +-
 arch/arm/include/uapi/asm/setup.h                  |   2 +-
 arch/arm/kernel/entry-armv.S                       |   2 +-
 arch/arm/mach-exynos/common.h                      |   2 +-
 arch/arm/mach-ixp4xx/Kconfig                       |  14 +-
 arch/arm/mach-s3c24xx/pm.c                         |   2 +-
 arch/arm/mm/Kconfig                                |   4 +-
 arch/arm/plat-samsung/Kconfig                      |   6 +-
 arch/arm/tools/mach-types                          |   2 +-
 arch/arm64/Kconfig                                 |   2 +-
 arch/arm64/kernel/kuser32.S                        |   2 +-
 arch/mips/bmips/setup.c                            |   2 +-
 drivers/crypto/sunxi-ss/sun4i-ss-cipher.c          |   2 +-
 drivers/crypto/sunxi-ss/sun4i-ss-core.c            |   2 +-
 drivers/crypto/sunxi-ss/sun4i-ss-hash.c            |   2 +-
 drivers/crypto/sunxi-ss/sun4i-ss.h                 |   2 +-
 drivers/input/touchscreen/sun4i-ts.c               |   2 +-
 drivers/tty/serial/Kconfig                         |   2 +-
 181 files changed, 7731 insertions(+), 7166 deletions(-)
 delete mode 100644 Documentation/arm/Booting
 delete mode 100644 Documentation/arm/IXP4xx
 delete mode 100644 Documentation/arm/Interrupts
 delete mode 100644 Documentation/arm/Marvell/README
 delete mode 100644 Documentation/arm/Microchip/README
 delete mode 100644 Documentation/arm/Netwinder
 delete mode 100644 Documentation/arm/OMAP/DSS
 delete mode 100644 Documentation/arm/OMAP/README
 delete mode 100644 Documentation/arm/OMAP/omap_pm
 delete mode 100644 Documentation/arm/Porting
 delete mode 100644 Documentation/arm/README
 delete mode 100644 Documentation/arm/SA1100/ADSBitsy
 delete mode 100644 Documentation/arm/SA1100/Assabet
 delete mode 100644 Documentation/arm/SA1100/Brutus
 delete mode 100644 Documentation/arm/SA1100/CERF
 delete mode 100644 Documentation/arm/SA1100/FreeBird
 delete mode 100644 Documentation/arm/SA1100/GraphicsClient
 delete mode 100644 Documentation/arm/SA1100/GraphicsMaster
 delete mode 100644 Documentation/arm/SA1100/HUW_WEBPANEL
 delete mode 100644 Documentation/arm/SA1100/Itsy
 delete mode 100644 Documentation/arm/SA1100/LART
 delete mode 100644 Documentation/arm/SA1100/PLEB
 delete mode 100644 Documentation/arm/SA1100/Pangolin
 delete mode 100644 Documentation/arm/SA1100/Tifon
 delete mode 100644 Documentation/arm/SA1100/Yopy
 delete mode 100644 Documentation/arm/SA1100/empeg
 delete mode 100644 Documentation/arm/SA1100/nanoEngine
 delete mode 100644 Documentation/arm/SA1100/serial_UART
 delete mode 100644 Documentation/arm/SH-Mobile/.gitignore
 delete mode 100644 Documentation/arm/SPEAr/overview.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/EB2410ITX.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/GPIO.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/H1940.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/NAND.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/Overview.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/S3C2412.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/S3C2413.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/SMDK2440.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/Suspend.txt
 delete mode 100644 Documentation/arm/Samsung-S3C24XX/USB-Host.txt
 delete mode 100644 Documentation/arm/Samsung/Bootloader-interface.txt
 delete mode 100644 Documentation/arm/Samsung/GPIO.txt
 delete mode 100644 Documentation/arm/Samsung/Overview.txt
 delete mode 100755 Documentation/arm/Samsung/clksrc-change-registers.awk
 delete mode 100644 Documentation/arm/Setup
 delete mode 100644 Documentation/arm/VFP/release-notes.txt
 create mode 100644 Documentation/arm/arm.rst
 create mode 100644 Documentation/arm/booting.rst
 create mode 100644 Documentation/arm/cluster-pm-race-avoidance.rst
 delete mode 100644 Documentation/arm/cluster-pm-race-avoidance.txt
 create mode 100644 Documentation/arm/firmware.rst
 delete mode 100644 Documentation/arm/firmware.txt
 create mode 100644 Documentation/arm/index.rst
 create mode 100644 Documentation/arm/interrupts.rst
 create mode 100644 Documentation/arm/ixp4xx.rst
 create mode 100644 Documentation/arm/kernel_mode_neon.rst
 delete mode 100644 Documentation/arm/kernel_mode_neon.txt
 create mode 100644 Documentation/arm/kernel_user_helpers.rst
 delete mode 100644 Documentation/arm/kernel_user_helpers.txt
 delete mode 100644 Documentation/arm/keystone/Overview.txt
 create mode 100644 Documentation/arm/keystone/knav-qmss.rst
 delete mode 100644 Documentation/arm/keystone/knav-qmss.txt
 create mode 100644 Documentation/arm/keystone/overview.rst
 create mode 100644 Documentation/arm/marvel.rst
 delete mode 100644 Documentation/arm/mem_alignment
 create mode 100644 Documentation/arm/mem_alignment.rst
 create mode 100644 Documentation/arm/memory.rst
 delete mode 100644 Documentation/arm/memory.txt
 create mode 100644 Documentation/arm/microchip.rst
 create mode 100644 Documentation/arm/netwinder.rst
 delete mode 100644 Documentation/arm/nwfpe/NOTES
 delete mode 100644 Documentation/arm/nwfpe/README
 delete mode 100644 Documentation/arm/nwfpe/README.FPE
 delete mode 100644 Documentation/arm/nwfpe/TODO
 create mode 100644 Documentation/arm/nwfpe/index.rst
 create mode 100644 Documentation/arm/nwfpe/netwinder-fpe.rst
 create mode 100644 Documentation/arm/nwfpe/notes.rst
 create mode 100644 Documentation/arm/nwfpe/nwfpe.rst
 create mode 100644 Documentation/arm/nwfpe/todo.rst
 create mode 100644 Documentation/arm/omap/dss.rst
 create mode 100644 Documentation/arm/omap/index.rst
 create mode 100644 Documentation/arm/omap/omap.rst
 create mode 100644 Documentation/arm/omap/omap_pm.rst
 create mode 100644 Documentation/arm/porting.rst
 create mode 100644 Documentation/arm/pxa/mfp.rst
 delete mode 100644 Documentation/arm/pxa/mfp.txt
 create mode 100644 Documentation/arm/sa1100/adsbitsy.rst
 create mode 100644 Documentation/arm/sa1100/assabet.rst
 create mode 100644 Documentation/arm/sa1100/brutus.rst
 create mode 100644 Documentation/arm/sa1100/cerf.rst
 create mode 100644 Documentation/arm/sa1100/freebird.rst
 create mode 100644 Documentation/arm/sa1100/graphicsclient.rst
 create mode 100644 Documentation/arm/sa1100/graphicsmaster.rst
 create mode 100644 Documentation/arm/sa1100/huw_webpanel.rst
 create mode 100644 Documentation/arm/sa1100/index.rst
 create mode 100644 Documentation/arm/sa1100/itsy.rst
 create mode 100644 Documentation/arm/sa1100/lart.rst
 create mode 100644 Documentation/arm/sa1100/nanoengine.rst
 create mode 100644 Documentation/arm/sa1100/pangolin.rst
 create mode 100644 Documentation/arm/sa1100/pleb.rst
 create mode 100644 Documentation/arm/sa1100/serial_uart.rst
 create mode 100644 Documentation/arm/sa1100/tifon.rst
 create mode 100644 Documentation/arm/sa1100/yopy.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/cpufreq.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/eb2410itx.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/gpio.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/h1940.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/index.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/nand.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/overview.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/s3c2412.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/s3c2413.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/smdk2440.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/suspend.rst
 create mode 100644 Documentation/arm/samsung-s3c24xx/usb-host.rst
 create mode 100644 Documentation/arm/samsung/bootloader-interface.rst
 create mode 100755 Documentation/arm/samsung/clksrc-change-registers.awk
 create mode 100644 Documentation/arm/samsung/gpio.rst
 create mode 100644 Documentation/arm/samsung/index.rst
 create mode 100644 Documentation/arm/samsung/overview.rst
 create mode 100644 Documentation/arm/setup.rst
 create mode 100644 Documentation/arm/sh-mobile/.gitignore
 create mode 100644 Documentation/arm/spear/overview.rst
 create mode 100644 Documentation/arm/sti/overview.rst
 delete mode 100644 Documentation/arm/sti/overview.txt
 create mode 100644 Documentation/arm/sti/stih407-overview.rst
 delete mode 100644 Documentation/arm/sti/stih407-overview.txt
 create mode 100644 Documentation/arm/sti/stih415-overview.rst
 delete mode 100644 Documentation/arm/sti/stih415-overview.txt
 create mode 100644 Documentation/arm/sti/stih416-overview.rst
 delete mode 100644 Documentation/arm/sti/stih416-overview.txt
 create mode 100644 Documentation/arm/sti/stih418-overview.rst
 delete mode 100644 Documentation/arm/sti/stih418-overview.txt
 create mode 100644 Documentation/arm/sunxi.rst
 delete mode 100644 Documentation/arm/sunxi/README
 create mode 100644 Documentation/arm/sunxi/clocks.rst
 delete mode 100644 Documentation/arm/sunxi/clocks.txt
 delete mode 100644 Documentation/arm/swp_emulation
 create mode 100644 Documentation/arm/swp_emulation.rst
 create mode 100644 Documentation/arm/tcm.rst
 delete mode 100644 Documentation/arm/tcm.txt
 create mode 100644 Documentation/arm/uefi.rst
 delete mode 100644 Documentation/arm/uefi.txt
 create mode 100644 Documentation/arm/vfp/release-notes.rst
 create mode 100644 Documentation/arm/vlocks.rst
 delete mode 100644 Documentation/arm/vlocks.txt

diff --git a/Documentation/arm/Booting b/Documentation/arm/Booting
deleted file mode 100644
index f1f965ce93d6..000000000000
--- a/Documentation/arm/Booting
+++ /dev/null
@@ -1,218 +0,0 @@
-			Booting ARM Linux
-			=================
-
-Author:	Russell King
-Date  : 18 May 2002
-
-The following documentation is relevant to 2.4.18-rmk6 and beyond.
-
-In order to boot ARM Linux, you require a boot loader, which is a small
-program that runs before the main kernel.  The boot loader is expected
-to initialise various devices, and eventually call the Linux kernel,
-passing information to the kernel.
-
-Essentially, the boot loader should provide (as a minimum) the
-following:
-
-1. Setup and initialise the RAM.
-2. Initialise one serial port.
-3. Detect the machine type.
-4. Setup the kernel tagged list.
-5. Load initramfs.
-6. Call the kernel image.
-
-
-1. Setup and initialise RAM
----------------------------
-
-Existing boot loaders:		MANDATORY
-New boot loaders:		MANDATORY
-
-The boot loader is expected to find and initialise all RAM that the
-kernel will use for volatile data storage in the system.  It performs
-this in a machine dependent manner.  (It may use internal algorithms
-to automatically locate and size all RAM, or it may use knowledge of
-the RAM in the machine, or any other method the boot loader designer
-sees fit.)
-
-
-2. Initialise one serial port
------------------------------
-
-Existing boot loaders:		OPTIONAL, RECOMMENDED
-New boot loaders:		OPTIONAL, RECOMMENDED
-
-The boot loader should initialise and enable one serial port on the
-target.  This allows the kernel serial driver to automatically detect
-which serial port it should use for the kernel console (generally
-used for debugging purposes, or communication with the target.)
-
-As an alternative, the boot loader can pass the relevant 'console='
-option to the kernel via the tagged lists specifying the port, and
-serial format options as described in
-
-       Documentation/admin-guide/kernel-parameters.rst.
-
-
-3. Detect the machine type
---------------------------
-
-Existing boot loaders:		OPTIONAL
-New boot loaders:		MANDATORY except for DT-only platforms
-
-The boot loader should detect the machine type it's running on by some
-method.  Whether this is a hard coded value or some algorithm that
-looks at the connected hardware is beyond the scope of this document.
-The boot loader must ultimately be able to provide a MACH_TYPE_xxx
-value to the kernel. (see linux/arch/arm/tools/mach-types).  This
-should be passed to the kernel in register r1.
-
-For DT-only platforms, the machine type will be determined by device
-tree.  Set the machine type to all ones (~0).  This is not strictly
-necessary, but assures that it will not match any existing types.
-
-4. Setup boot data
-------------------
-
-Existing boot loaders:		OPTIONAL, HIGHLY RECOMMENDED
-New boot loaders:		MANDATORY
-
-The boot loader must provide either a tagged list or a dtb image for
-passing configuration data to the kernel.  The physical address of the
-boot data is passed to the kernel in register r2.
-
-4a. Setup the kernel tagged list
---------------------------------
-
-The boot loader must create and initialise the kernel tagged list.
-A valid tagged list starts with ATAG_CORE and ends with ATAG_NONE.
-The ATAG_CORE tag may or may not be empty.  An empty ATAG_CORE tag
-has the size field set to '2' (0x00000002).  The ATAG_NONE must set
-the size field to zero.
-
-Any number of tags can be placed in the list.  It is undefined
-whether a repeated tag appends to the information carried by the
-previous tag, or whether it replaces the information in its
-entirety; some tags behave as the former, others the latter.
-
-The boot loader must pass at a minimum the size and location of
-the system memory, and root filesystem location.  Therefore, the
-minimum tagged list should look like:
-
-	+-----------+
-base ->	| ATAG_CORE |  |
-	+-----------+  |
-	| ATAG_MEM  |  | increasing address
-	+-----------+  |
-	| ATAG_NONE |  |
-	+-----------+  v
-
-The tagged list should be stored in system RAM.
-
-The tagged list must be placed in a region of memory where neither
-the kernel decompressor nor initrd 'bootp' program will overwrite
-it.  The recommended placement is in the first 16KiB of RAM.
-
-4b. Setup the device tree
--------------------------
-
-The boot loader must load a device tree image (dtb) into system ram
-at a 64bit aligned address and initialize it with the boot data.  The
-dtb format is documented in Documentation/devicetree/booting-without-of.txt.
-The kernel will look for the dtb magic value of 0xd00dfeed at the dtb
-physical address to determine if a dtb has been passed instead of a
-tagged list.
-
-The boot loader must pass at a minimum the size and location of the
-system memory, and the root filesystem location.  The dtb must be
-placed in a region of memory where the kernel decompressor will not
-overwrite it, while remaining within the region which will be covered
-by the kernel's low-memory mapping.
-
-A safe location is just above the 128MiB boundary from start of RAM.
-
-5. Load initramfs.
-------------------
-
-Existing boot loaders:		OPTIONAL
-New boot loaders:		OPTIONAL
-
-If an initramfs is in use then, as with the dtb, it must be placed in
-a region of memory where the kernel decompressor will not overwrite it
-while also remaining within the region which will be covered by the kernel's
-low-memory mapping.
-
-A safe location is just above the device tree blob which itself will
-be loaded just above the 128MiB boundary from the start of RAM as
-recommended above.
-
-6. Calling the kernel image
----------------------------
-
-Existing boot loaders:		MANDATORY
-New boot loaders:		MANDATORY
-
-There are two options for calling the kernel zImage.  If the zImage
-is stored in flash, and is linked correctly to be run from flash,
-then it is legal for the boot loader to call the zImage in flash
-directly.
-
-The zImage may also be placed in system RAM and called there.  The
-kernel should be placed in the first 128MiB of RAM.  It is recommended
-that it is loaded above 32MiB in order to avoid the need to relocate
-prior to decompression, which will make the boot process slightly
-faster.
-
-When booting a raw (non-zImage) kernel the constraints are tighter.
-In this case the kernel must be loaded at an offset into system equal
-to TEXT_OFFSET - PAGE_OFFSET.
-
-In any case, the following conditions must be met:
-
-- Quiesce all DMA capable devices so that memory does not get
-  corrupted by bogus network packets or disk data. This will save
-  you many hours of debug.
-
-- CPU register settings
-  r0 = 0,
-  r1 = machine type number discovered in (3) above.
-  r2 = physical address of tagged list in system RAM, or
-       physical address of device tree block (dtb) in system RAM
-
-- CPU mode
-  All forms of interrupts must be disabled (IRQs and FIQs)
-
-  For CPUs which do not include the ARM virtualization extensions, the
-  CPU must be in SVC mode.  (A special exception exists for Angel)
-
-  CPUs which include support for the virtualization extensions can be
-  entered in HYP mode in order to enable the kernel to make full use of
-  these extensions.  This is the recommended boot method for such CPUs,
-  unless the virtualisations are already in use by a pre-installed
-  hypervisor.
-
-  If the kernel is not entered in HYP mode for any reason, it must be
-  entered in SVC mode.
-
-- Caches, MMUs
-  The MMU must be off.
-  Instruction cache may be on or off.
-  Data cache must be off.
-
-  If the kernel is entered in HYP mode, the above requirements apply to
-  the HYP mode configuration in addition to the ordinary PL1 (privileged
-  kernel modes) configuration.  In addition, all traps into the
-  hypervisor must be disabled, and PL1 access must be granted for all
-  peripherals and CPU resources for which this is architecturally
-  possible.  Except for entering in HYP mode, the system configuration
-  should be such that a kernel which does not include support for the
-  virtualization extensions can boot correctly without extra help.
-
-- The boot loader is expected to call the kernel image by jumping
-  directly to the first instruction of the kernel image.
-
-  On CPUs supporting the ARM instruction set, the entry must be
-  made in ARM state, even for a Thumb-2 kernel.
-
-  On CPUs supporting only the Thumb instruction set such as
-  Cortex-M class CPUs, the entry must be made in Thumb state.
diff --git a/Documentation/arm/IXP4xx b/Documentation/arm/IXP4xx
deleted file mode 100644
index e48b74de6ac0..000000000000
--- a/Documentation/arm/IXP4xx
+++ /dev/null
@@ -1,172 +0,0 @@
-
--------------------------------------------------------------------------
-Release Notes for Linux on Intel's IXP4xx Network Processor
-
-Maintained by Deepak Saxena <dsaxena@plexity.net>
--------------------------------------------------------------------------
-
-1. Overview
-
-Intel's IXP4xx network processor is a highly integrated SOC that
-is targeted for network applications, though it has become popular 
-in industrial control and other areas due to low cost and power
-consumption. The IXP4xx family currently consists of several processors
-that support different network offload functions such as encryption,
-routing, firewalling, etc. The IXP46x family is an updated version which
-supports faster speeds, new memory and flash configurations, and more
-integration such as an on-chip I2C controller.
-
-For more information on the various versions of the CPU, see:
-
-   http://developer.intel.com/design/network/products/npfamily/ixp4xx.htm
-
-Intel also made the IXCP1100 CPU for some time, which is an IXP4xx
-stripped of much of the network intelligence.
-
-2. Linux Support
-
-Linux currently supports the following features on the IXP4xx chips:
-
-- Dual serial ports
-- PCI interface
-- Flash access (MTD/JFFS)
-- I2C through GPIO on IXP42x
-- GPIO for input/output/interrupts 
-  See arch/arm/mach-ixp4xx/include/mach/platform.h for access functions.
-- Timers (watchdog, OS)
-
-The following components of the chips are not supported by Linux and
-require the use of Intel's proprietary CSR software:
-
-- USB device interface
-- Network interfaces (HSS, Utopia, NPEs, etc)
-- Network offload functionality
-
-If you need to use any of the above, you need to download Intel's
-software from:
-
-   http://developer.intel.com/design/network/products/npfamily/ixp425.htm    
-
-DO NOT POST QUESTIONS TO THE LINUX MAILING LISTS REGARDING THE PROPRIETARY
-SOFTWARE.
-
-There are several websites that provide directions/pointers on using
-Intel's software:
-
-   http://sourceforge.net/projects/ixp4xx-osdg/
-   Open Source Developer's Guide for using uClinux and the Intel libraries 
-
-http://gatewaymaker.sourceforge.net/ 
-   Simple one page summary of building a gateway using an IXP425 and Linux
-
-http://ixp425.sourceforge.net/
-   ATM device driver for IXP425 that relies on Intel's libraries
-
-3. Known Issues/Limitations
-
-3a. Limited inbound PCI window
-
-The IXP4xx family allows for up to 256MB of memory but the PCI interface
-can only expose 64MB of that memory to the PCI bus. This means that if
-you are running with > 64MB, all PCI buffers outside of the accessible
-range will be bounced using the routines in arch/arm/common/dmabounce.c.
-   
-3b. Limited outbound PCI window
-
-IXP4xx provides two methods of accessing PCI memory space:
-
-1) A direct mapped window from 0x48000000 to 0x4bffffff (64MB).
-   To access PCI via this space, we simply ioremap() the BAR
-   into the kernel and we can use the standard read[bwl]/write[bwl]
-   macros. This is the preferred method due to speed but it
-   limits the system to just 64MB of PCI memory. This can be
-   problematic if using video cards and other memory-heavy devices.
-          
-2) If > 64MB of memory space is required, the IXP4xx can be 
-   configured to use indirect registers to access PCI. This allows
-   for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus. 
-   The disadvantage of this is that every PCI access requires 
-   three local register accesses plus a spinlock, but in some 
-   cases the performance hit is acceptable. In addition, you cannot 
-   mmap() PCI devices in this case due to the indirect nature
-   of the PCI window.
-
-By default, the direct method is used for performance reasons. If
-you need more PCI memory, enable the IXP4XX_INDIRECT_PCI config option.
-
-3c. GPIO as Interrupts
-
-Currently the code only handles level-sensitive GPIO interrupts.
-
-4. Supported platforms
-
-ADI Engineering Coyote Gateway Reference Platform
-http://www.adiengineering.com/productsCoyote.html
-
-   The ADI Coyote platform is a reference design for those building
-   small residential/office gateways. One NPE is connected to a 10/100
-   interface, one to a 4-port 10/100 switch, and the third to an ADSL
-   interface. In addition, it also supports two POTS interfaces connected
-   via SLICs. Note that those are not supported by Linux ATM. Finally,
-   the platform has two mini-PCI slots used for 802.11[bga] cards.
-   Finally, there is an IDE port hanging off the expansion bus.
-
-Gateworks Avila Network Platform
-http://www.gateworks.com/support/overview.php
-
-   The Avila platform is basically an IXDP425 with the 4 PCI slots
-   replaced with mini-PCI slots and a CF IDE interface hanging off
-   the expansion bus.
-
-Intel IXDP425 Development Platform
-http://www.intel.com/design/network/products/npfamily/ixdpg425.htm  
-
-   This is Intel's standard reference platform for the IXDP425 and is 
-   also known as the Richfield board. It contains 4 PCI slots, 16MB
-   of flash, two 10/100 ports and one ADSL port.
-
-Intel IXDP465 Development Platform
-http://www.intel.com/design/network/products/npfamily/ixdp465.htm
-
-   This is basically an IXDP425 with an IXP465 and 32M of flash instead
-   of just 16.
-
-Intel IXDPG425 Development Platform
-
-   This is basically an ADI Coyote board with a NEC EHCI controller
-   added. One issue with this board is that the mini-PCI slots only
-   have the 3.3v line connected, so you can't use a PCI to mini-PCI
-   adapter with an E100 card. So to NFS root you need to use either
-   the CSR or a WiFi card and a ramdisk that BOOTPs and then does
-   a pivot_root to NFS.
-
-Motorola PrPMC1100 Processor Mezzanine Card
-http://www.fountainsys.com
-
-   The PrPMC1100 is based on the IXCP1100 and is meant to plug into
-   an IXP2400/2800 system to act as the system controller. It simply
-   contains a CPU and 16MB of flash on the board and needs to be
-   plugged into a carrier board to function. Currently Linux only
-   supports the Motorola PrPMC carrier board for this platform.
-
-5. TODO LIST
-
-- Add support for Coyote IDE
-- Add support for edge-based GPIO interrupts
-- Add support for CF IDE on expansion bus
-
-6. Thanks
-
-The IXP4xx work has been funded by Intel Corp. and MontaVista Software, Inc.
-
-The following people have contributed patches/comments/etc:
-
-Lennerty Buytenhek
-Lutz Jaenicke
-Justin Mayfield
-Robert E. Ranslam
-[I know I've forgotten others, please email me to be added] 
-
--------------------------------------------------------------------------
-
-Last Update: 01/04/2005
diff --git a/Documentation/arm/Interrupts b/Documentation/arm/Interrupts
deleted file mode 100644
index f09ab1b90ef1..000000000000
--- a/Documentation/arm/Interrupts
+++ /dev/null
@@ -1,167 +0,0 @@
-2.5.2-rmk5
-----------
-
-This is the first kernel that contains a major shake up of some of the
-major architecture-specific subsystems.
-
-Firstly, it contains some pretty major changes to the way we handle the
-MMU TLB.  Each MMU TLB variant is now handled completely separately -
-we have TLB v3, TLB v4 (without write buffer), TLB v4 (with write buffer),
-and finally TLB v4 (with write buffer, with I TLB invalidate entry).
-There is more assembly code inside each of these functions, mainly to
-allow more flexible TLB handling for the future.
-
-Secondly, the IRQ subsystem.
-
-The 2.5 kernels will have major changes to the way IRQs are handled.
-Unfortunately, this means that machine types that touch the irq_desc[]
-array (basically all machine types) will break, and this means every
-machine type that we currently have.
-
-Let's take an example.  On the Assabet with Neponset, we have:
-
-                  GPIO25                 IRR:2
-        SA1100 ------------> Neponset -----------> SA1111
-                                         IIR:1
-                                      -----------> USAR
-                                         IIR:0
-                                      -----------> SMC9196
-
-The way stuff currently works, all SA1111 interrupts are mutually
-exclusive of each other - if you're processing one interrupt from the
-SA1111 and another comes in, you have to wait for that interrupt to
-finish processing before you can service the new interrupt.  Eg, an
-IDE PIO-based interrupt on the SA1111 excludes all other SA1111 and
-SMC9196 interrupts until it has finished transferring its multi-sector
-data, which can be a long time.  Note also that since we loop in the
-SA1111 IRQ handler, SA1111 IRQs can hold off SMC9196 IRQs indefinitely.
-
-
-The new approach brings several new ideas...
-
-We introduce the concept of a "parent" and a "child".  For example,
-to the Neponset handler, the "parent" is GPIO25, and the "children"
-are SA1111, SMC9196 and USAR.
-
-We also bring the idea of an IRQ "chip" (mainly to reduce the size of
-the irqdesc array).  This doesn't have to be a real "IC"; indeed the
-SA11x0 IRQs are handled by two separate "chip" structures, one for
-GPIO0-10, and another for all the rest.  It is just a container for
-the various operations (maybe this'll change to a better name).
-This structure has the following operations:
-
-struct irqchip {
-        /*
-         * Acknowledge the IRQ.
-         * If this is a level-based IRQ, then it is expected to mask the IRQ
-         * as well.
-         */
-        void (*ack)(unsigned int irq);
-        /*
-         * Mask the IRQ in hardware.
-         */
-        void (*mask)(unsigned int irq);
-        /*
-         * Unmask the IRQ in hardware.
-         */
-        void (*unmask)(unsigned int irq);
-        /*
-         * Re-run the IRQ
-         */
-        void (*rerun)(unsigned int irq);
-        /*
-         * Set the type of the IRQ.
-         */
-        int (*type)(unsigned int irq, unsigned int type);
-};
-
-ack    - required.  May be the same function as mask for IRQs
-         handled by do_level_IRQ.
-mask   - required.
-unmask - required.
-rerun  - optional.  Not required if you're using do_level_IRQ for all
-         IRQs that use this 'irqchip'.  Generally expected to re-trigger
-         the hardware IRQ if possible.  If not, may call the handler
-         directly.
-type   - optional.  If you don't support changing the type of an IRQ,
-         it should be null so people can detect if they are unable to
-         set the IRQ type.
-
-For each IRQ, we keep the following information:
-
-        - "disable" depth (number of disable_irq()s without enable_irq()s)
-        - flags indicating what we can do with this IRQ (valid, probe,
-          noautounmask) as before
-        - status of the IRQ (probing, enable, etc)
-        - chip
-        - per-IRQ handler
-        - irqaction structure list
-
-The handler can be one of the 3 standard handlers - "level", "edge" and
-"simple", or your own specific handler if you need to do something special.
-
-The "level" handler is what we currently have - it's pretty simple.
-"edge" knows about the brokenness of such IRQ implementations - that you
-need to leave the hardware IRQ enabled while processing it, and queue
-further IRQ events should the IRQ happen again while processing.  The
-"simple" handler is very basic, and does not perform any hardware
-manipulation, nor state tracking.  This is useful for things like the
-SMC9196 and USAR above.
-
-So, what's changed?
-
-1. Machine implementations must not write to the irqdesc array.
-
-2. New functions to manipulate the irqdesc array.  The first 4 are expected
-   to be useful only to machine specific code.  The last is recommended to
-   only be used by machine specific code, but may be used in drivers if
-   absolutely necessary.
-
-        set_irq_chip(irq,chip)
-
-                Set the mask/unmask methods for handling this IRQ
-
-        set_irq_handler(irq,handler)
-
-                Set the handler for this IRQ (level, edge, simple)
-
-        set_irq_chained_handler(irq,handler)
-
-                Set a "chained" handler for this IRQ - automatically
-                enables this IRQ (eg, Neponset and SA1111 handlers).
-
-        set_irq_flags(irq,flags)
-
-                Set the valid/probe/noautoenable flags.
-
-        set_irq_type(irq,type)
-
-                Set the active IRQ edge(s)/level.  This replaces the
-                SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
-                function.  Type should be one of IRQ_TYPE_xxx defined in
-                <linux/irq.h>
-
-3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.
-
-4. Direct access to SA1111 INTPOL is deprecated.  Use set_irq_type instead.
-
-5. A handler is expected to perform any necessary acknowledgement of the
-   parent IRQ via the correct chip specific function.  For instance, if
-   the SA1111 is directly connected to a SA1110 GPIO, then you should
-   acknowledge the SA1110 IRQ each time you re-read the SA1111 IRQ status.
-
-6. For any child which doesn't have its own IRQ enable/disable controls
-   (eg, SMC9196), the handler must mask or acknowledge the parent IRQ
-   while the child handler is called, and the child handler should be the
-   "simple" handler (not "edge" nor "level").  After the handler completes,
-   the parent IRQ should be unmasked, and the status of all children must
-   be re-checked for pending events.  (see the Neponset IRQ handler for
-   details).
-
-7. fixup_irq() is gone, as is arch/arm/mach-*/include/mach/irq.h
-
-Please note that this will not solve all problems - some of them are
-hardware based.  Mixing level-based and edge-based IRQs on the same
-parent signal (eg neponset) is one such area where a software based
-solution can't provide the full answer to low IRQ latency.
-
diff --git a/Documentation/arm/Marvell/README b/Documentation/arm/Marvell/README
deleted file mode 100644
index 56ada27c53be..000000000000
--- a/Documentation/arm/Marvell/README
+++ /dev/null
@@ -1,395 +0,0 @@
-ARM Marvell SoCs
-================
-
-This document lists all the ARM Marvell SoCs that are currently
-supported in mainline by the Linux kernel. As the Marvell families of
-SoCs are large and complex, it is hard to understand where the support
-for a particular SoC is available in the Linux kernel. This document
-tries to help in understanding where those SoCs are supported, and to
-match them with their corresponding public datasheet, when available.
-
-Orion family
-------------
-
-  Flavors:
-        88F5082
-        88F5181
-        88F5181L
-        88F5182
-               Datasheet               : http://www.embeddedarm.com/documentation/third-party/MV88F5182-datasheet.pdf
-               Programmer's User Guide : http://www.embeddedarm.com/documentation/third-party/MV88F5182-opensource-manual.pdf
-               User Manual             : http://www.embeddedarm.com/documentation/third-party/MV88F5182-usermanual.pdf
-        88F5281
-               Datasheet               : http://www.ocmodshop.com/images/reviews/networking/qnap_ts409u/marvel_88f5281_data_sheet.pdf
-        88F6183
-  Core: Feroceon 88fr331 (88f51xx) or 88fr531-vd (88f52xx) ARMv5 compatible
-  Linux kernel mach directory: arch/arm/mach-orion5x
-  Linux kernel plat directory: arch/arm/plat-orion
-
-Kirkwood family
----------------
-
-  Flavors:
-        88F6282 a.k.a Armada 300
-                Product Brief  : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
-        88F6283 a.k.a Armada 310
-                Product Brief  : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
-        88F6190
-                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6190-003_WEB.pdf
-                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
-                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
-        88F6192
-                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6192-003_ver1.pdf
-                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
-                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
-        88F6182
-        88F6180
-                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6180-003_ver1.pdf
-                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6180_OpenSource.pdf
-                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
-        88F6281
-                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6281-004_ver1.pdf
-                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6281_OpenSource.pdf
-                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
-  Homepage: http://www.marvell.com/embedded-processors/kirkwood/
-  Core: Feroceon 88fr131 ARMv5 compatible
-  Linux kernel mach directory: arch/arm/mach-mvebu
-  Linux kernel plat directory: none
-
-Discovery family
-----------------
-
-  Flavors:
-        MV78100
-                Product Brief  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78100-003_WEB.pdf
-                Hardware Spec  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78100_OpenSource.pdf
-                Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
-        MV78200
-                Product Brief  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78200-002_WEB.pdf
-                Hardware Spec  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78200_OpenSource.pdf
-                Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
-        MV76100
-                Not supported by the Linux kernel.
-
-  Core: Feroceon 88fr571-vd ARMv5 compatible
-
-  Linux kernel mach directory: arch/arm/mach-mv78xx0
-  Linux kernel plat directory: arch/arm/plat-orion
-
-EBU Armada family
------------------
-
-  Armada 370 Flavors:
-        88F6710
-        88F6707
-        88F6W11
-    Product Brief:   http://www.marvell.com/embedded-processors/armada-300/assets/Marvell_ARMADA_370_SoC.pdf
-    Hardware Spec:   http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA370-datasheet.pdf
-    Functional Spec: http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA370-FunctionalSpec-datasheet.pdf
-    Core: Sheeva ARMv7 compatible PJ4B
-
-  Armada 375 Flavors:
-	88F6720
-    Product Brief: http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA_375_SoC-01_product_brief.pdf
-    Core: ARM Cortex-A9
-
-  Armada 38x Flavors:
-	88F6810	Armada 380
-	88F6820 Armada 385
-	88F6828 Armada 388
-    Product infos:   http://www.marvell.com/embedded-processors/armada-38x/
-    Functional Spec: https://marvellcorp.wufoo.com/forms/marvell-armada-38x-functional-specifications/
-    Core: ARM Cortex-A9
-
-  Armada 39x Flavors:
-	88F6920 Armada 390
-	88F6928 Armada 398
-    Product infos: http://www.marvell.com/embedded-processors/armada-39x/
-    Core: ARM Cortex-A9
-
-  Armada XP Flavors:
-        MV78230
-        MV78260
-        MV78460
-    NOTE: not to be confused with the non-SMP 78xx0 SoCs
-    Product Brief: http://www.marvell.com/embedded-processors/armada-xp/assets/Marvell-ArmadaXP-SoC-product%20brief.pdf
-    Functional Spec: http://www.marvell.com/embedded-processors/armada-xp/assets/ARMADA-XP-Functional-SpecDatasheet.pdf
-    Hardware Specs:
-      http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78230_OS.PDF
-      http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78260_OS.PDF
-      http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78460_OS.PDF
-    Core: Sheeva ARMv7 compatible Dual-core or Quad-core PJ4B-MP
-
-  Linux kernel mach directory: arch/arm/mach-mvebu
-  Linux kernel plat directory: none
-
-EBU Armada family ARMv8
------------------------
-
-  Armada 3710/3720 Flavors:
-	88F3710
-	88F3720
-	Core: ARM Cortex A53 (ARMv8)
-
-	Homepage: http://www.marvell.com/embedded-processors/armada-3700/
-	Product Brief: http://www.marvell.com/embedded-processors/assets/PB-88F3700-FNL.pdf
-	Device tree files: arch/arm64/boot/dts/marvell/armada-37*
-
-  Armada 7K Flavors:
-	88F7020 (AP806 Dual + one CP110)
-	88F7040 (AP806 Quad + one CP110)
-	Core: ARM Cortex A72
-
-	Homepage: http://www.marvell.com/embedded-processors/armada-70xx/
-	Product Brief: http://www.marvell.com/embedded-processors/assets/Armada7020PB-Jan2016.pdf
-		       http://www.marvell.com/embedded-processors/assets/Armada7040PB-Jan2016.pdf
-	Device tree files: arch/arm64/boot/dts/marvell/armada-70*
-
-  Armada 8K Flavors:
-	88F8020 (AP806 Dual + two CP110)
-	88F8040 (AP806 Quad + two CP110)
-	Core: ARM Cortex A72
-
-	Homepage: http://www.marvell.com/embedded-processors/armada-80xx/
-	Product Brief: http://www.marvell.com/embedded-processors/assets/Armada8020PB-Jan2016.pdf
-		       http://www.marvell.com/embedded-processors/assets/Armada8040PB-Jan2016.pdf
-	Device tree files: arch/arm64/boot/dts/marvell/armada-80*
-
-Avanta family
--------------
-
-  Flavors:
-       88F6510
-       88F6530P
-       88F6550
-       88F6560
-  Homepage     : http://www.marvell.com/broadband/
-  Product Brief: http://www.marvell.com/broadband/assets/Marvell_Avanta_88F6510_305_060-001_product_brief.pdf
-  No public datasheet available.
-
-  Core: ARMv5 compatible
-
-  Linux kernel mach directory: no code in mainline yet, planned for the future
-  Linux kernel plat directory: no code in mainline yet, planned for the future
-
-Storage family
---------------
-
-  Armada SP:
-	88RC1580
-    Product infos: http://www.marvell.com/storage/armada-sp/
-    Core: Sheeva ARMv7 compatible Quad-core PJ4C
-    (not supported in upstream Linux kernel)
-
-Dove family (application processor)
------------------------------------
-
-  Flavors:
-        88AP510 a.k.a Armada 510
-                Product Brief   : http://www.marvell.com/application-processors/armada-500/assets/Marvell_Armada510_SoC.pdf
-                Hardware Spec   : http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Hardware-Spec.pdf
-                Functional Spec : http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Functional-Spec.pdf
-  Homepage: http://www.marvell.com/application-processors/armada-500/
-  Core: ARMv7 compatible
-
-  Directory: arch/arm/mach-mvebu (DT enabled platforms)
-             arch/arm/mach-dove (non-DT enabled platforms)
-
-PXA 2xx/3xx/93x/95x family
---------------------------
-
-  Flavors:
-        PXA21x, PXA25x, PXA26x
-             Application processor only
-             Core: ARMv5 XScale1 core
-        PXA270, PXA271, PXA272
-             Product Brief         : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_pb.pdf
-             Design guide          : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_design_guide.pdf
-             Developers manual     : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_dev_man.pdf
-             Specification         : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_emts.pdf
-             Specification update  : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_spec_update.pdf
-             Application processor only
-             Core: ARMv5 XScale2 core
-        PXA300, PXA310, PXA320
-             PXA 300 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA300_PB_R4.pdf
-             PXA 310 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA310_PB_R4.pdf
-             PXA 320 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA320_PB_R4.pdf
-             Design guide          : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Design_Guide.pdf
-             Developers manual     : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Developers_Manual.zip
-             Specifications        : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_EMTS.pdf
-             Specification Update  : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Spec_Update.zip
-             Reference Manual      : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_TavorP_BootROM_Ref_Manual.pdf
-             Application processor only
-             Core: ARMv5 XScale3 core
-        PXA930, PXA935
-             Application processor with Communication processor
-             Core: ARMv5 XScale3 core
-        PXA955
-             Application processor with Communication processor
-             Core: ARMv7 compatible Sheeva PJ4 core
-
-   Comments:
-
-    * This line of SoCs originates from the XScale family developed by
-      Intel and acquired by Marvell in ~2006. The PXA21x, PXA25x,
-      PXA26x, PXA27x, PXA3xx and PXA93x were developed by Intel, while
-      the later PXA95x were developed by Marvell.
-
-    * Due to their XScale origin, these SoCs have virtually nothing in
-      common with the other (Kirkwood, Dove, etc.) families of Marvell
-      SoCs, except with the MMP/MMP2 family of SoCs.
-
-   Linux kernel mach directory: arch/arm/mach-pxa
-   Linux kernel plat directory: arch/arm/plat-pxa
-
-MMP/MMP2/MMP3 family (communication processor)
------------------------------------------
-
-   Flavors:
-        PXA168, a.k.a Armada 168
-             Homepage             : http://www.marvell.com/application-processors/armada-100/armada-168.jsp
-             Product brief        : http://www.marvell.com/application-processors/armada-100/assets/pxa_168_pb.pdf
-             Hardware manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_datasheet.pdf
-             Software manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_software_manual.pdf
-             Specification update : http://www.marvell.com/application-processors/armada-100/assets/ARMADA16x_Spec_update.pdf
-             Boot ROM manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_ref_manual.pdf
-             App note package     : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_app_note_package.pdf
-             Application processor only
-             Core: ARMv5 compatible Marvell PJ1 88sv331 (Mohawk)
-        PXA910/PXA920
-             Homepage             : http://www.marvell.com/communication-processors/pxa910/
-             Product Brief        : http://www.marvell.com/communication-processors/pxa910/assets/Marvell_PXA910_Platform-001_PB_final.pdf
-             Application processor with Communication processor
-             Core: ARMv5 compatible Marvell PJ1 88sv331 (Mohawk)
-        PXA688, a.k.a. MMP2, a.k.a Armada 610
-             Product Brief        : http://www.marvell.com/application-processors/armada-600/assets/armada610_pb.pdf
-             Application processor only
-             Core: ARMv7 compatible Sheeva PJ4 88sv581x core
-	PXA2128, a.k.a. MMP3 (OLPC XO4, Linux support not upstream)
-	     Product Brief	  : http://www.marvell.com/application-processors/armada/pxa2128/assets/Marvell-ARMADA-PXA2128-SoC-PB.pdf
-	     Application processor only
-	     Core: Dual-core ARMv7 compatible Sheeva PJ4C core
-	PXA960/PXA968/PXA978 (Linux support not upstream)
-	     Application processor with Communication Processor
-	     Core: ARMv7 compatible Sheeva PJ4 core
-	PXA986/PXA988 (Linux support not upstream)
-	     Application processor with Communication Processor
-	     Core: Dual-core ARMv7 compatible Sheeva PJ4B-MP core
-	PXA1088/PXA1920 (Linux support not upstream)
-	     Application processor with Communication Processor
-	     Core: quad-core ARMv7 Cortex-A7
-	PXA1908/PXA1928/PXA1936
-	     Application processor with Communication Processor
-	     Core: multi-core ARMv8 Cortex-A53
-
-   Comments:
-
-    * This line of SoCs originates from the XScale family developed by
-      Intel and acquired by Marvell in ~2006. All the processors of
-      this MMP/MMP2 family were developed by Marvell.
-
-    * Due to their XScale origin, these SoCs have virtually nothing in
-      common with the other (Kirkwood, Dove, etc.) families of Marvell
-      SoCs, except with the PXA family of SoCs listed above.
-
-   Linux kernel mach directory: arch/arm/mach-mmp
-   Linux kernel plat directory: arch/arm/plat-pxa
-
-Berlin family (Multimedia Solutions)
--------------------------------------
-
-  Flavors:
-	88DE3010, Armada 1000 (no Linux support)
-		Core:		Marvell PJ1 (ARMv5TE), Dual-core
-		Product Brief:	http://www.marvell.com.cn/digital-entertainment/assets/armada_1000_pb.pdf
-	88DE3005, Armada 1500 Mini
-		Design name:	BG2CD
-		Core:		ARM Cortex-A9, PL310 L2CC
-	88DE3006, Armada 1500 Mini Plus
-		Design name:	BG2CDP
-		Core:		Dual Core ARM Cortex-A7
-	88DE3100, Armada 1500
-		Design name:	BG2
-		Core:		Marvell PJ4B-MP (ARMv7), Tauros3 L2CC
-	88DE3114, Armada 1500 Pro
-		Design name:	BG2Q
-		Core:		Quad Core ARM Cortex-A9, PL310 L2CC
-	88DE3214, Armada 1500 Pro 4K
-		Design name:	BG3
-		Core:		ARM Cortex-A15, CA15 integrated L2CC
-	88DE3218, ARMADA 1500 Ultra
-		Core:		ARM Cortex-A53
-
-  Homepage: https://www.synaptics.com/products/multimedia-solutions
-  Directory: arch/arm/mach-berlin
-
-  Comments:
-
-   * This line of SoCs is based on Marvell Sheeva or ARM Cortex CPUs
-     with Synopsys DesignWare (IRQ, GPIO, Timers, ...) and PXA IP (SDHCI, USB, ETH, ...).
-
-   * The Berlin family was acquired by Synaptics from Marvell in 2017.
-
-CPU Cores
----------
-
-The XScale cores were designed by Intel, and shipped by Marvell in the older
-PXA processors. Feroceon is a Marvell-designed core that was developed in-house,
-and that evolved into Sheeva. The XScale and Feroceon cores were phased out
-over time and replaced with Sheeva cores in later products, which subsequently
-got replaced with licensed ARM Cortex-A cores.
-
-  XScale 1
-	CPUID 0x69052xxx
-	ARMv5, iWMMXt
-  XScale 2
-	CPUID 0x69054xxx
-	ARMv5, iWMMXt
-  XScale 3
-	CPUID 0x69056xxx or 0x69056xxx
-	ARMv5, iWMMXt
-  Feroceon-1850 88fr331 "Mohawk"
-	CPUID 0x5615331x or 0x41xx926x
-	ARMv5TE, single issue
-  Feroceon-2850 88fr531-vd "Jolteon"
-	CPUID 0x5605531x or 0x41xx926x
-	ARMv5TE, VFP, dual-issue
-  Feroceon 88fr571-vd "Jolteon"
-	CPUID 0x5615571x
-	ARMv5TE, VFP, dual-issue
-  Feroceon 88fr131 "Mohawk-D"
-	CPUID 0x5625131x
-	ARMv5TE, single-issue in-order
-  Sheeva PJ1 88sv331 "Mohawk"
-	CPUID 0x561584xx
-	ARMv5, single-issue iWMMXt v2
-  Sheeva PJ4 88sv581x "Flareon"
-	CPUID 0x560f581x
-	ARMv7, idivt, optional iWMMXt v2
-  Sheeva PJ4B 88sv581x
-	CPUID 0x561f581x
-	ARMv7, idivt, optional iWMMXt v2
-  Sheeva PJ4B-MP / PJ4C
-	CPUID 0x562f584x
-	ARMv7, idivt/idiva, LPAE, optional iWMMXt v2 and/or NEON
-
-Long-term plans
----------------
-
- * Unify the mach-dove/, mach-mv78xx0/, mach-orion5x/ into the
-   mach-mvebu/ to support all SoCs from the Marvell EBU (Engineering
-   Business Unit) in a single mach-<foo> directory. The plat-orion/
-   would therefore disappear.
-
- * Unify the mach-mmp/ and mach-pxa/ into the same mach-pxa
-   directory. The plat-pxa/ would therefore disappear.
-
-Credits
--------
-
- Maen Suleiman <maen@marvell.com>
- Lior Amsalem <alior@marvell.com>
- Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
- Andrew Lunn <andrew@lunn.ch>
- Nicolas Pitre <nico@fluxnic.net>
- Eric Miao <eric.y.miao@gmail.com>
diff --git a/Documentation/arm/Microchip/README b/Documentation/arm/Microchip/README
deleted file mode 100644
index a366f37d38f1..000000000000
--- a/Documentation/arm/Microchip/README
+++ /dev/null
@@ -1,169 +0,0 @@
-ARM Microchip SoCs (aka AT91)
-=============================
-
-
-Introduction
-------------
-This document gives useful information about the ARM Microchip SoCs that are
-currently supported in Linux Mainline (you know, the one on kernel.org).
-
-It is important to note that the Microchip (previously Atmel) ARM-based MPU
-product line is historically named "AT91" or "at91" throughout the Linux kernel
-development process even if this product prefix has completely disappeared from
-the official Microchip product name. Anyway, files, directories, git trees,
-git branches/tags and email subjects always contain this "at91" sub-string.
-
-
-AT91 SoCs
----------
-Documentation and detailed datasheet for each product are available on
-the Microchip website: http://www.microchip.com.
-
-  Flavors:
-    * ARM 920 based SoC
-      - at91rm9200
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-1768-32-bit-ARM920T-Embedded-Microprocessor-AT91RM9200_Datasheet.pdf
-
-    * ARM 926 based SoCs
-      - at91sam9260
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6221-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9260_Datasheet.pdf
-
-      - at91sam9xe
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6254-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9XE_Datasheet.pdf
-
-      - at91sam9261
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6062-ARM926EJ-S-Microprocessor-SAM9261_Datasheet.pdf
-
-      - at91sam9263
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6249-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9263_Datasheet.pdf
-
-      - at91sam9rl
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/doc6289.pdf
-
-      - at91sam9g20
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001516A.pdf
-
-      - at91sam9g45 family
-        - at91sam9g45
-        - at91sam9g46
-        - at91sam9m10
-        - at91sam9m11 (device superset)
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6437-32-bit-ARM926-Embedded-Microprocessor-SAM9M11_Datasheet.pdf
-
-      - at91sam9x5 family (aka "The 5 series")
-        - at91sam9g15
-        - at91sam9g25
-        - at91sam9g35
-        - at91sam9x25
-        - at91sam9x35
-        + Datasheet (can be considered as covering the whole family)
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-11055-32-bit-ARM926EJ-S-Microcontroller-SAM9X35_Datasheet.pdf
-
-      - at91sam9n12
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001517A.pdf
-
-    * ARM Cortex-A5 based SoCs
-      - sama5d3 family
-        - sama5d31
-        - sama5d33
-        - sama5d34
-        - sama5d35
-        - sama5d36 (device superset)
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-11121-32-bit-Cortex-A5-Microcontroller-SAMA5D3_Datasheet.pdf
-
-    * ARM Cortex-A5 + NEON based SoCs
-      - sama5d4 family
-        - sama5d41
-        - sama5d42
-        - sama5d43
-        - sama5d44 (device superset)
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/60001525A.pdf
-
-      - sama5d2 family
-        - sama5d21
-        - sama5d22
-        - sama5d23
-        - sama5d24
-        - sama5d26
-        - sama5d27 (device superset)
-        - sama5d28 (device superset + environmental monitors)
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001476B.pdf
-
-    * ARM Cortex-M7 MCUs
-      - sams70 family
-        - sams70j19
-        - sams70j20
-        - sams70j21
-        - sams70n19
-        - sams70n20
-        - sams70n21
-        - sams70q19
-        - sams70q20
-        - sams70q21
-
-      - samv70 family
-        - samv70j19
-        - samv70j20
-        - samv70n19
-        - samv70n20
-        - samv70q19
-        - samv70q20
-
-      - samv71 family
-        - samv71j19
-        - samv71j20
-        - samv71j21
-        - samv71n19
-        - samv71n20
-        - samv71n21
-        - samv71q19
-        - samv71q20
-        - samv71q21
-
-        + Datasheet
-          http://ww1.microchip.com/downloads/en/DeviceDoc/60001527A.pdf
-
-
-Linux kernel information
-------------------------
-Linux kernel mach directory: arch/arm/mach-at91
-MAINTAINERS entry is: "ARM/Microchip (AT91) SoC support"
-
-
-Device Tree for AT91 SoCs and boards
-------------------------------------
-All AT91 SoCs are converted to Device Tree. Since Linux 3.19, these products
-must use this method to boot the Linux kernel.
-
-Work In Progress statement:
-Device Tree files and Device Tree bindings that apply to AT91 SoCs and boards are
-considered as "Unstable". To be completely clear, any at91 binding can change at
-any time. So, be sure to use a Device Tree Binary and a Kernel Image generated from
-the same source tree.
-Please refer to the Documentation/devicetree/bindings/ABI.txt file for a
-definition of a "Stable" binding/ABI.
-This statement will be removed by AT91 MAINTAINERS when appropriate.
-
-Naming conventions and best practice:
-- SoCs Device Tree Source Include files are named after the official name of
-  the product (at91sam9g20.dtsi or sama5d33.dtsi for instance).
-- Device Tree Source Include files (.dtsi) are used to collect common nodes that can be
-  shared across SoCs or boards (sama5d3.dtsi or at91sam9x5cm.dtsi for instance).
-  When collecting nodes for a particular peripheral or topic, the identifier has to
-  be placed at the end of the file name, separated with a "_" (at91sam9x5_can.dtsi
-  or sama5d3_gmac.dtsi for example).
-- board Device Tree Source files (.dts) are prefixed by the string "at91-" so
-  that they can be identified easily. Note that some files are historical exceptions
-  to this rule (sama5d3[13456]ek.dts, usb_a9g20.dts or animeo_ip.dts for example).
diff --git a/Documentation/arm/Netwinder b/Documentation/arm/Netwinder
deleted file mode 100644
index f1b457fbd3de..000000000000
--- a/Documentation/arm/Netwinder
+++ /dev/null
@@ -1,78 +0,0 @@
-NetWinder specific documentation
-================================
-
-The NetWinder is a small low-power computer, primarily designed
-to run Linux.  It is based around the StrongARM RISC processor,
-DC21285 PCI bridge, with PC-type hardware glued around it.
-
-Port usage
-==========
-
-Min    - Max	Description
----------------------------
-0x0000 - 0x000f	DMA1
-0x0020 - 0x0021	PIC1
-0x0060 - 0x006f	Keyboard
-0x0070 - 0x007f	RTC
-0x0080 - 0x0087	DMA1
-0x0088 - 0x008f	DMA2
-0x00a0 - 0x00a3	PIC2
-0x00c0 - 0x00df	DMA2
-0x0180 - 0x0187	IRDA
-0x01f0 - 0x01f6	ide0
-0x0201		Game port
-0x0203		RWA010 configuration read
-0x0220 - ?	SoundBlaster
-0x0250 - ?	WaveArtist
-0x0279		RWA010 configuration index
-0x02f8 - 0x02ff	Serial ttyS1
-0x0300 - 0x031f	Ether10
-0x0338		GPIO1
-0x033a		GPIO2
-0x0370 - 0x0371	W83977F configuration registers
-0x0388 - ?	AdLib
-0x03c0 - 0x03df	VGA
-0x03f6		ide0
-0x03f8 - 0x03ff	Serial ttyS0
-0x0400 - 0x0408	DC21143
-0x0480 - 0x0487	DMA1
-0x0488 - 0x048f	DMA2
-0x0a79		RWA010 configuration write
-0xe800 - 0xe80f	ide0/ide1 BM DMA
-
-
-Interrupt usage
-===============
-
-IRQ	type	Description
----------------------------
- 0	ISA	100Hz timer
- 1	ISA	Keyboard
- 2	ISA	cascade
- 3	ISA	Serial ttyS1
- 4	ISA	Serial ttyS0
- 5	ISA	PS/2 mouse
- 6	ISA	IRDA
- 7	ISA	Printer
- 8	ISA	RTC alarm
- 9	ISA
-10	ISA	GP10 (Orange reset button)
-11	ISA
-12	ISA	WaveArtist
-13	ISA
-14	ISA	hda1
-15	ISA
-
-DMA usage
-=========
-
-DMA	type	Description
----------------------------
- 0	ISA	IRDA
- 1	ISA
- 2	ISA	cascade
- 3	ISA	WaveArtist
- 4	ISA
- 5	ISA
- 6	ISA
- 7	ISA	WaveArtist
diff --git a/Documentation/arm/OMAP/DSS b/Documentation/arm/OMAP/DSS
deleted file mode 100644
index 4484e021290e..000000000000
--- a/Documentation/arm/OMAP/DSS
+++ /dev/null
@@ -1,362 +0,0 @@
-OMAP2/3 Display Subsystem
--------------------------
-
-This is an almost total rewrite of the OMAP FB driver in drivers/video/omap
-(let's call it DSS1). The main differences between DSS1 and DSS2 are DSI,
-TV-out and multiple display support, but there are lots of small improvements
-also.
-
-The DSS2 driver (omapdss module) is in arch/arm/plat-omap/dss/, and the FB,
-panel and controller drivers are in drivers/video/omap2/. DSS1 and DSS2 currently
-live side by side; you can choose which one to use.
-
-Features
---------
-
-Working and tested features include:
-
-- MIPI DPI (parallel) output
-- MIPI DSI output in command mode
-- MIPI DBI (RFBI) output
-- SDI output
-- TV output
-- All pieces can be compiled as a module or inside kernel
-- Use DISPC to update any of the outputs
-- Use CPU to update RFBI or DSI output
-- OMAP DISPC planes
-- RGB16, RGB24 packed, RGB24 unpacked
-- YUV2, UYVY
-- Scaling
-- Adjusting DSS FCK to find a good pixel clock
-- Use DSI DPLL to create DSS FCK
-
-Tested boards include:
-- OMAP3 SDP board
-- Beagle board
-- N810
-
-omapdss driver
---------------
-
-The DSS driver itself has no support for the Linux framebuffer, V4L or similar
-interfaces, unlike the current drivers, but it has an internal kernel API that
-upper level drivers can use.
-
-The DSS driver models OMAP's overlays, overlay managers and displays in a
-flexible way to enable uncommon multi-display configurations. In addition to
-modelling the hardware overlays, omapdss supports virtual overlays and overlay
-managers. These can be used when updating a display with CPU or system DMA.
-
-omapdss driver support for audio
---------------------------------
-There exist several display technologies and standards that support audio as
-well. Hence, it is relevant to update the DSS device driver to provide an audio
-interface that may be used by an audio driver or any other driver interested in
-the functionality.
-
-The audio_enable function is intended to prepare the relevant
-IP for playback (e.g., enabling an audio FIFO, taking some IP in or out of
-reset, enabling companion chips, etc). It is intended to be called before
-audio_start. The audio_disable function performs the reverse operation and is
-intended to be called after audio_stop.
-
-While a given DSS device driver may support audio, it is possible that for
-certain configurations audio is not supported (e.g., an HDMI display using a
-VESA video timing). The audio_supported function is intended to query whether
-the current configuration of the display supports audio.
-
-The audio_config function is intended to configure all the relevant audio
-parameters of the display. In order to make the function independent of any
-specific DSS device driver, a struct omap_dss_audio is defined. Its purpose
-is to contain all the required parameters for audio configuration. At the
-moment, such structure contains pointers to IEC-60958 channel status word
-and CEA-861 audio infoframe structures. This should be enough to support
-HDMI and DisplayPort, as both are based on CEA-861 and IEC-60958.
-
-The audio_enable/disable, audio_config and audio_supported functions could be
-implemented as functions that may sleep. Hence, they should not be called
-while holding a spinlock or a read lock.
-
-The audio_start/audio_stop functions are intended to effectively start/stop audio
-playback after the configuration has taken place. These functions are designed
-to be used in an atomic context. Hence, audio_start should return quickly and be
-called only after all the needed resources for audio playback (audio FIFOs,
-DMA channels, companion chips, etc) have been enabled to begin data transfers.
-audio_stop is designed to only stop the audio transfers. The resources used
-for playback are released using audio_disable.
-
-The enum omap_dss_audio_state may be used to help the implementations of
-the interface to keep track of the audio state. The initial state is _DISABLED;
-then, the state transitions to _CONFIGURED, and then, when it is ready to
-play audio, to _ENABLED. The state _PLAYING is used when the audio is being
-rendered.
-
-
-Panel and controller drivers
-----------------------------
-
-The drivers implement panel or controller specific functionality and are not
-usually visible to users except through the omapfb driver.  They register
-themselves to the DSS driver.
-
-omapfb driver
--------------
-
-The omapfb driver implements an arbitrary number of standard Linux framebuffers.
-These framebuffers can be routed flexibly to any overlays, thus allowing a very
-dynamic display architecture.
-
-The driver exports some omapfb specific ioctls, which are compatible with the
-ioctls in the old driver.
-
-The rest of the non standard features are exported via sysfs. Whether the final
-implementation will use sysfs, or ioctls, is still open.
-
-V4L2 drivers
-------------
-
-V4L2 is being implemented at TI.
-
-From the omapdss point of view the V4L2 drivers should be similar to the
-framebuffer driver.
-
-Architecture
-------------
-
-Some clarification of what the different components do:
-
-    - Framebuffer is a memory area inside OMAP's SRAM/SDRAM that contains the
-      pixel data for the image. Framebuffer has width and height and color
-      depth.
-    - Overlay defines where the pixels are read from and where they go on the
-      screen. The overlay may be smaller than framebuffer, thus displaying only
-      part of the framebuffer. The position of the overlay may be changed if
-      the overlay is smaller than the display.
-    - Overlay manager combines the overlays into one image and feeds it to the
-      display.
-    - Display is the actual physical display device.
-
-A framebuffer can be connected to multiple overlays to show the same pixel data
-on all of the overlays. Note that in this case the overlay input sizes must be
-the same, but, in case of video overlays, the output size can be different. Any
-framebuffer can be connected to any overlay.
-
-An overlay can be connected to one overlay manager. Also DISPC overlays can be
-connected only to DISPC overlay managers, and virtual overlays can only be
-connected to virtual overlay managers.
-
-An overlay manager can be connected to one display. There are certain
-restrictions on which kinds of displays an overlay manager can be connected to:
-
-    - DISPC TV overlay manager can be only connected to TV display.
-    - Virtual overlay managers can only be connected to DBI or DSI displays.
-    - DISPC LCD overlay manager can be connected to all displays, except TV
-      display.
-
-Sysfs
------
-The sysfs interface is mainly used for testing. I don't think sysfs
-interface is the best for this in the final version, but I don't quite know
-what would be the best interfaces for these things.
-
-The sysfs interface is divided into two parts: DSS and FB.
-
-/sys/class/graphics/fb? directory:
-mirror		0=off, 1=on
-rotate		Rotation 0-3 for 0, 90, 180, 270 degrees
-rotate_type	0 = DMA rotation, 1 = VRFB rotation
-overlays	List of overlay numbers to which framebuffer pixels go
-phys_addr	Physical address of the framebuffer
-virt_addr	Virtual address of the framebuffer
-size		Size of the framebuffer
-
-/sys/devices/platform/omapdss/overlay? directory:
-enabled		0=off, 1=on
-input_size	width,height (ie. the framebuffer size)
-manager		Destination overlay manager name
-name
-output_size	width,height
-position	x,y
-screen_width	width
-global_alpha   	global alpha 0-255 0=transparent 255=opaque
-
-/sys/devices/platform/omapdss/manager? directory:
-display				Destination display
-name
-alpha_blending_enabled		0=off, 1=on
-trans_key_enabled		0=off, 1=on
-trans_key_type			gfx-destination, video-source
-trans_key_value			transparency color key (RGB24)
-default_color			default background color (RGB24)
-
-/sys/devices/platform/omapdss/display? directory:
-ctrl_name	Controller name
-mirror		0=off, 1=on
-update_mode	0=off, 1=auto, 2=manual
-enabled		0=off, 1=on
-name
-rotate		Rotation 0-3 for 0, 90, 180, 270 degrees
-timings		Display timings (pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw)
-		When writing, two special timings are accepted for tv-out:
-		"pal" and "ntsc"
-panel_name
-tear_elim	Tearing elimination 0=off, 1=on
-output_type	Output type (video encoder only): "composite" or "svideo"
-
-There are also some debugfs files at <debugfs>/omapdss/ which show information
-about clocks and registers.
-
-Examples
---------
-
-The following definitions have been made for the examples below:
-
-ovl0=/sys/devices/platform/omapdss/overlay0
-ovl1=/sys/devices/platform/omapdss/overlay1
-ovl2=/sys/devices/platform/omapdss/overlay2
-
-mgr0=/sys/devices/platform/omapdss/manager0
-mgr1=/sys/devices/platform/omapdss/manager1
-
-lcd=/sys/devices/platform/omapdss/display0
-dvi=/sys/devices/platform/omapdss/display1
-tv=/sys/devices/platform/omapdss/display2
-
-fb0=/sys/class/graphics/fb0
-fb1=/sys/class/graphics/fb1
-fb2=/sys/class/graphics/fb2
-
-Default setup on OMAP3 SDP
---------------------------
-
-Here's the default setup on OMAP3 SDP board. All planes go to LCD. DVI
-and TV-out are not in use. The columns from left to right are:
-framebuffers, overlays, overlay managers, displays. Framebuffers are
-handled by omapfb, and the rest by the DSS.
-
-FB0 --- GFX  -\            DVI
-FB1 --- VID1 --+- LCD ---- LCD
-FB2 --- VID2 -/   TV ----- TV
-
-Example: Switch from LCD to DVI
--------------------------------
-
-w=`cat $dvi/timings | cut -d "," -f 2 | cut -d "/" -f 1`
-h=`cat $dvi/timings | cut -d "," -f 3 | cut -d "/" -f 1`
-
-echo "0" > $lcd/enabled
-echo "" > $mgr0/display
-fbset -fb /dev/fb0 -xres $w -yres $h -vxres $w -vyres $h
-# at this point you have to switch the dvi/lcd dip-switch on the omap board
-echo "dvi" > $mgr0/display
-echo "1" > $dvi/enabled
-
-After this the configuration looks like:
-
-FB0 --- GFX  -\         -- DVI
-FB1 --- VID1 --+- LCD -/   LCD
-FB2 --- VID2 -/   TV ----- TV
-
-Example: Clone GFX overlay to LCD and TV
-----------------------------------------
-
-w=`cat $tv/timings | cut -d "," -f 2 | cut -d "/" -f 1`
-h=`cat $tv/timings | cut -d "," -f 3 | cut -d "/" -f 1`
-
-echo "0" > $ovl0/enabled
-echo "0" > $ovl1/enabled
-
-echo "" > $fb1/overlays
-echo "0,1" > $fb0/overlays
-
-echo "$w,$h" > $ovl1/output_size
-echo "tv" > $ovl1/manager
-
-echo "1" > $ovl0/enabled
-echo "1" > $ovl1/enabled
-
-echo "1" > $tv/enabled
-
-After this the configuration looks like (only relevant parts shown):
-
-FB0 +-- GFX  ---- LCD ---- LCD
-     \- VID1 ---- TV  ---- TV
-
-Misc notes
-----------
-
-OMAP FB allocates the framebuffer memory using the standard dma allocator. You
-can enable Contiguous Memory Allocator (CONFIG_CMA) to improve the dma
-allocator, and if CMA is enabled, you can use the "cma=" kernel parameter to
-increase the global memory area for CMA.
-
-Using the DSI DPLL to generate the pixel clock, it is possible to produce a pixel
-clock of 86.5MHz (max possible), and with that you get 1280x1024@57 output from DVI.
-
-Rotation and mirroring currently only support RGB565 and RGB8888 modes. VRFB
-does not support mirroring.
-
-VRFB rotation requires much more memory than non-rotated framebuffer, so you
-probably need to increase your vram setting before using VRFB rotation. Also,
-many applications may not work with VRFB if they do not pay attention to all
-framebuffer parameters.
-
-Kernel boot arguments
----------------------
-
-omapfb.mode=<display>:<mode>[,...]
-	- Default video mode for specified displays. For example,
-	  "dvi:800x400MR-24@60".  See drivers/video/modedb.c.
-	  There are also two special modes: "pal" and "ntsc" that
-	  can be used for TV out.
-
-omapfb.vram=<fbnum>:<size>[@<physaddr>][,...]
-	- VRAM allocated for a framebuffer. Normally omapfb allocates vram
-	  depending on the display size. With this you can manually allocate
-	  more or define the physical address of each framebuffer. For example,
-	  "1:4M" to allocate 4M for fb1.
-
-omapfb.debug=<y|n>
-	- Enable debug printing. You have to have OMAPFB debug support enabled
-	  in kernel config.
-
-omapfb.test=<y|n>
-	- Draw test pattern to framebuffer whenever framebuffer settings change.
-	  You need to have OMAPFB debug support enabled in kernel config.
-
-omapfb.vrfb=<y|n>
-	- Use VRFB rotation for all framebuffers.
-
-omapfb.rotate=<angle>
-	- Default rotation applied to all framebuffers.
-	  0 - 0 degree rotation
-	  1 - 90 degree rotation
-	  2 - 180 degree rotation
-	  3 - 270 degree rotation
-
-omapfb.mirror=<y|n>
-	- Default mirror for all framebuffers. Only works with DMA rotation.
-
-omapdss.def_disp=<display>
-	- Name of default display, to which all overlays will be connected.
-	  Common examples are "lcd" or "tv".
-
-omapdss.debug=<y|n>
-	- Enable debug printing. You have to have DSS debug support enabled in
-	  kernel config.
-
-TODO
-----
-
-DSS locking
-
-Error checking
-- Lots of checks are missing or implemented just as BUG()
-
-System DMA update for DSI
-- Can be used for RGB16 and RGB24P modes. Probably not for RGB24U (how
-  to skip the empty byte?)
-
-OMAP1 support
-- Not sure if needed
-
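The DSS shell examples above derive the display width and height from the sysfs
"timings" file with two cut(1) pipelines. As a rough sketch only, the same
extraction could be written in C as below. It assumes the string follows the
documented layout (pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw) with plain
decimal fields; the function name parse_timings_size is hypothetical and is not
part of omapdss or omapfb.

```c
#include <stdio.h>

/*
 * Hypothetical helper: extract xres and yres from a DSS "timings" string of
 * the form "pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw".
 * Returns 0 on success, -1 if the string does not match that layout.
 */
int parse_timings_size(const char *timings, unsigned int *xres,
		       unsigned int *yres)
{
	/* %*u parses a number and discards it; only xres and yres are stored. */
	if (sscanf(timings, "%*u,%u/%*u/%*u/%*u,%u", xres, yres) != 2)
		return -1;

	return 0;
}
```

A caller would read the contents of $dvi/timings or $tv/timings into a buffer
and pass it to this function; the vertical porch and sync fields after yres are
simply left unparsed.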
diff --git a/Documentation/arm/OMAP/README b/Documentation/arm/OMAP/README
deleted file mode 100644
index 90c6c57d61e8..000000000000
--- a/Documentation/arm/OMAP/README
+++ /dev/null
@@ -1,11 +0,0 @@
-This file contains documentation for running mainline
-kernel on omaps.
-
-KERNEL		NEW DEPENDENCIES
-v4.3+		Update is needed for custom .config files to make sure
-		CONFIG_REGULATOR_PBIAS is enabled for MMC1 to work
-		properly.
-
-v4.18+		Update is needed for custom .config files to make sure
-		CONFIG_MMC_SDHCI_OMAP is enabled for all MMC instances
-		to work in DRA7 and K2G based boards.
diff --git a/Documentation/arm/OMAP/omap_pm b/Documentation/arm/OMAP/omap_pm
deleted file mode 100644
index 4ae915a9f899..000000000000
--- a/Documentation/arm/OMAP/omap_pm
+++ /dev/null
@@ -1,154 +0,0 @@
-
-The OMAP PM interface
-=====================
-
-This document describes the temporary OMAP PM interface.  Driver
-authors use these functions to communicate minimum latency or
-throughput constraints to the kernel power management code.
-Over time, the intention is to merge features from the OMAP PM
-interface into the Linux PM QoS code.
-
-Drivers need to express PM parameters which:
-
-- support the range of power management parameters present in the TI SRF;
-
-- separate the drivers from the underlying PM parameter
-  implementation, whether it is the TI SRF or Linux PM QoS or Linux
-  latency framework or something else;
-
-- specify PM parameters in terms of fundamental units, such as
-  latency and throughput, rather than units which are specific to OMAP
-  or to particular OMAP variants;
-
-- allow drivers which are shared with other architectures (e.g.,
-  DaVinci) to add these constraints in a way which won't affect non-OMAP
-  systems;
-
-- can be implemented immediately with minimal disruption of other
-  architectures.
-
-
-This document proposes the OMAP PM interface, including the following
-five power management functions for driver code:
-
-1. Set the maximum MPU wakeup latency:
-   (*pdata->set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t)
-
-2. Set the maximum device wakeup latency:
-   (*pdata->set_max_dev_wakeup_lat)(struct device *dev, unsigned long t)
-
-3. Set the maximum system DMA transfer start latency (CORE pwrdm):
-   (*pdata->set_max_sdma_lat)(struct device *dev, long t)
-
-4. Set the minimum bus throughput needed by a device:
-   (*pdata->set_min_bus_tput)(struct device *dev, u8 agent_id, unsigned long r)
-
-5. Return the number of times the device has lost context
-   (*pdata->get_dev_context_loss_count)(struct device *dev)
-
-
-Further documentation for all OMAP PM interface functions can be
-found in arch/arm/plat-omap/include/mach/omap-pm.h.
-
-
-The OMAP PM layer is intended to be temporary
----------------------------------------------
-
-The intention is that eventually the Linux PM QoS layer should support
-the range of power management features present in OMAP3.  As this
-happens, existing drivers using the OMAP PM interface can be modified
-to use the Linux PM QoS code; and the OMAP PM interface can disappear.
-
-
-Driver usage of the OMAP PM functions
--------------------------------------
-
-As the 'pdata' in the above examples indicates, these functions are
-exposed to drivers through function pointers in driver .platform_data
-structures.  The function pointers are initialized by the board-*.c
-files to point to the corresponding OMAP PM functions:
-.set_max_dev_wakeup_lat will point to
-omap_pm_set_max_dev_wakeup_lat(), etc.  Other architectures which do
-not support these functions should leave these function pointers set
-to NULL.  Drivers should use the following idiom:
-
-        if (pdata->set_max_dev_wakeup_lat)
-            (*pdata->set_max_dev_wakeup_lat)(dev, t);
-
-The most common usage of these functions will probably be to specify
-the maximum time from when an interrupt occurs, to when the device
-becomes accessible.  To accomplish this, driver writers should use the
-set_max_mpu_wakeup_lat() function to constrain the MPU wakeup
-latency, and the set_max_dev_wakeup_lat() function to constrain the
-device wakeup latency (from clk_enable() to accessibility).  For
-example,
-
-        /* Limit MPU wakeup latency */
-        if (pdata->set_max_mpu_wakeup_lat)
-            (*pdata->set_max_mpu_wakeup_lat)(dev, tc);
-
-        /* Limit device powerdomain wakeup latency */
-        if (pdata->set_max_dev_wakeup_lat)
-            (*pdata->set_max_dev_wakeup_lat)(dev, td);
-
-        /* total wakeup latency in this example: (tc + td) */
-
-The PM parameters can be overwritten by calling the function again
-with the new value.  The settings can be removed by calling the
-function with a t argument of -1 (except in the case of
-set_min_bus_tput(), which should be called with an r argument of 0).
-
-The fifth function above, omap_pm_get_dev_context_loss_count(),
-is intended as an optimization to allow drivers to determine whether the
-device has lost its internal context.  If context has been lost, the
-driver must restore its internal context before proceeding.
-
-
-Other specialized interface functions
--------------------------------------
-
-The five functions listed above are intended to be usable by any
-device driver.  DSPBridge and CPUFreq have a few special requirements.
-DSPBridge expresses target DSP performance levels in terms of OPP IDs.
-CPUFreq expresses target MPU performance levels in terms of MPU
-frequency.  The OMAP PM interface contains functions for these
-specialized cases to convert that input information (OPPs/MPU
-frequency) into the form that the underlying power management
-implementation needs:
-
-6. (*pdata->dsp_get_opp_table)(void)
-
-7. (*pdata->dsp_set_min_opp)(u8 opp_id)
-
-8. (*pdata->dsp_get_opp)(void)
-
-9. (*pdata->cpu_get_freq_table)(void)
-
-10. (*pdata->cpu_set_freq)(unsigned long f)
-
-11. (*pdata->cpu_get_freq)(void)
-
-Customizing OPP for platform
-============================
-Defining CONFIG_PM should enable OPP layer for the silicon
-and the registration of OPP table should take place automatically.
-However, in special cases, the default OPP table may need to be
-tweaked, e.g.:
- * Enable default OPPs which are disabled by default, but which
-   could be enabled on a platform
- * Disable an unsupported OPP on the platform
- * Define and add a custom opp table entry
-In these cases, the board file needs to do additional steps as follows:
-arch/arm/mach-omapx/board-xyz.c
-	#include "pm.h"
-	....
-	static void __init omap_xyz_init_irq(void)
-	{
-		....
-		/* Initialize the default table */
-		omapx_opp_init();
-		/* Do customization to the defaults */
-		....
-	}
-NOTE: omapx_opp_init will be omap3_opp_init or as required
-based on the omap family.
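The NULL-checked function pointer idiom described in the omap_pm text above can
be illustrated with a small self-contained sketch. Everything here is a
stand-in: struct example_pdata, example_limit_wakeup() and the stub callback
are hypothetical names, and the int return type of the callbacks is an
assumption, since the document does not state one.

```c
#include <stddef.h>

struct device;	/* opaque stand-in for the kernel's struct device */

/* Hypothetical platform data carrying two of the OMAP PM function pointers. */
struct example_pdata {
	int (*set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t);
	int (*set_max_dev_wakeup_lat)(struct device *dev, unsigned long t);
};

/* Stub callback for demonstration: accumulates the requested latencies. */
unsigned long example_total_lat;

int example_stub_set_lat(struct device *dev, unsigned long t)
{
	(void)dev;
	example_total_lat += t;
	return 0;
}

/*
 * Apply both constraints, skipping any pointer left NULL by a platform that
 * does not support it. Returns the number of constraints actually applied.
 */
int example_limit_wakeup(const struct example_pdata *pdata,
			 struct device *dev,
			 unsigned long tc, unsigned long td)
{
	int applied = 0;

	/* Limit MPU wakeup latency */
	if (pdata->set_max_mpu_wakeup_lat) {
		(*pdata->set_max_mpu_wakeup_lat)(dev, tc);
		applied++;
	}

	/* Limit device powerdomain wakeup latency */
	if (pdata->set_max_dev_wakeup_lat) {
		(*pdata->set_max_dev_wakeup_lat)(dev, td);
		applied++;
	}

	return applied;
}
```

With both pointers set to the stub, a call with tc and td accumulates tc + td,
matching the total wakeup latency noted in the text; with both pointers NULL
the call is a no-op, which is the behaviour other architectures rely on.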
diff --git a/Documentation/arm/Porting b/Documentation/arm/Porting
deleted file mode 100644
index a492233931b9..000000000000
--- a/Documentation/arm/Porting
+++ /dev/null
@@ -1,135 +0,0 @@
-Taken from list archive at http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2001-July/004064.html
-
-Initial definitions
--------------------
-
-The following symbol definitions rely on you knowing the translation that
-__virt_to_phys() does for your machine.  This macro converts the passed
-virtual address to a physical address.  Normally, it is simply:
-
-		phys = virt - PAGE_OFFSET + PHYS_OFFSET
-
-
-Decompressor Symbols
---------------------
-
-ZTEXTADDR
-	Start address of decompressor.  There's no point in talking about
-	virtual or physical addresses here, since the MMU will be off at
-	the time when you call the decompressor code.  You normally call
-	the kernel at this address to start it booting.  This doesn't have
-	to be located in RAM, it can be in flash or other read-only or
-	read-write addressable medium.
-
-ZBSSADDR
-	Start address of zero-initialised work area for the decompressor.
-	This must be pointing at RAM.  The decompressor will zero initialise
-	this for you.  Again, the MMU will be off.
-
-ZRELADDR
-	This is the address where the decompressed kernel will be written,
-	and eventually executed.  The following constraint must be valid:
-
-		__virt_to_phys(TEXTADDR) == ZRELADDR
-
-	The initial part of the kernel is carefully coded to be position
-	independent.
-
-INITRD_PHYS
-	Physical address to place the initial RAM disk.  Only relevant if
-	you are using the bootpImage stuff (which only works on the old
-	struct param_struct).
-
-INITRD_VIRT
-	Virtual address of the initial RAM disk.  The following constraint
-	must be valid:
-
-		__virt_to_phys(INITRD_VIRT) == INITRD_PHYS
-
-PARAMS_PHYS
-	Physical address of the struct param_struct or tag list, giving the
-	kernel various parameters about its execution environment.
-
-
-Kernel Symbols
---------------
-
-PHYS_OFFSET
-	Physical start address of the first bank of RAM.
-
-PAGE_OFFSET
-	Virtual start address of the first bank of RAM.  During the kernel
-	boot phase, virtual address PAGE_OFFSET will be mapped to physical
-	address PHYS_OFFSET, along with any other mappings you supply.
-	This should be the same value as TASK_SIZE.
-
-TASK_SIZE
-	The maximum size of a user process in bytes.  Since user space
-	always starts at zero, this is the maximum address that a user
-	process can access+1.  The user space stack grows down from this
-	address.
-
-	Any virtual address below TASK_SIZE is deemed to be user process
-	area, and therefore managed dynamically on a process by process
-	basis by the kernel.  I'll call this the user segment.
-
-	Anything above TASK_SIZE is common to all processes.  I'll call
-	this the kernel segment.
-
-	(In other words, you can't put IO mappings below TASK_SIZE, and
-	hence PAGE_OFFSET).
-
-TEXTADDR
-	Virtual start address of kernel, normally PAGE_OFFSET + 0x8000.
-	This is where the kernel image ends up.  With the latest kernels,
-	it must be located at 32768 bytes into a 128MB region.  Previous
-	kernels placed a restriction of 256MB here.
-
-DATAADDR
-	Virtual address for the kernel data segment.  Must not be defined
-	when using the decompressor.
-
-VMALLOC_START
-VMALLOC_END
-	Virtual addresses bounding the vmalloc() area.  There must not be
-	any static mappings in this area; vmalloc will overwrite them.
-	The addresses must also be in the kernel segment (see above).
-	Normally, the vmalloc() area starts VMALLOC_OFFSET bytes above the
-	last virtual RAM address (found using variable high_memory).
-
-VMALLOC_OFFSET
-	Offset normally incorporated into VMALLOC_START to provide a hole
-	between virtual RAM and the vmalloc area.  We do this to allow
-	out of bounds memory accesses (eg, something writing off the end
-	of the mapped memory map) to be caught.  Normally set to 8MB.
-
-Architecture Specific Macros
-----------------------------
-
-BOOT_MEM(pram,pio,vio)
-	`pram' specifies the physical start address of RAM.  Must always
-	be present, and should be the same as PHYS_OFFSET.
-
-	`pio' is the physical address of an 8MB region containing IO for
-	use with the debugging macros in arch/arm/kernel/debug-armv.S.
-
-	`vio' is the virtual address of the 8MB debugging region.
-
-	It is expected that the debugging region will be re-initialised
-	by the architecture specific code later in the code (via the
-	MAPIO function).
-
-BOOT_PARAMS
-	Same as PARAMS_PHYS; see above.
-
-FIXUP(func)
-	Machine specific fixups, run before memory subsystems have been
-	initialised.
-
-MAPIO(func)
-	Machine specific function to map IO areas (including the debug
-	region above).
-
-INITIRQ(func)
-	Machine specific function to initialise interrupts.
-
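The address translation and the ZRELADDR constraint from the Porting text above
can be checked with a few lines of C. This is only a worked example under
assumed values: EX_PAGE_OFFSET and EX_PHYS_OFFSET are illustrative numbers
chosen here, not taken from any particular machine, and ex_virt_to_phys() is a
hypothetical stand-in for the usual form of __virt_to_phys().

```c
#include <stdint.h>

/* Illustrative values only; real machines define their own. */
#define EX_PAGE_OFFSET	0xC0000000u	/* virtual start of first RAM bank */
#define EX_PHYS_OFFSET	0x10000000u	/* physical start of first RAM bank */
#define EX_TEXTADDR	(EX_PAGE_OFFSET + 0x8000u)

/* phys = virt - PAGE_OFFSET + PHYS_OFFSET, as given in the text. */
uint32_t ex_virt_to_phys(uint32_t virt)
{
	return virt - EX_PAGE_OFFSET + EX_PHYS_OFFSET;
}

/* ZRELADDR must equal the physical address of TEXTADDR. */
uint32_t ex_zreladdr(void)
{
	return ex_virt_to_phys(EX_TEXTADDR);
}
```

With these assumed values the kernel text at virtual 0xC0008000 maps to
physical 0x10008000, so that is the ZRELADDR the decompressor would have to
use to satisfy the constraint.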
diff --git a/Documentation/arm/README b/Documentation/arm/README
deleted file mode 100644
index 9d1e5b2c92e6..000000000000
--- a/Documentation/arm/README
+++ /dev/null
@@ -1,204 +0,0 @@
-			   ARM Linux 2.6
-			   =============
-
-    Please check <ftp://ftp.arm.linux.org.uk/pub/armlinux> for
-    updates.
-
-Compilation of kernel
----------------------
-
-  In order to compile ARM Linux, you will need a compiler capable of
-  generating ARM ELF code with GNU extensions.  GCC 3.3 is known to be
-  a good compiler.  Fortunately, you needn't guess.  The kernel will report
-  an error if your compiler is a recognized offender.
-
-  To build ARM Linux natively, you shouldn't have to alter the ARCH = line
-  in the top level Makefile.  However, if you don't have the ARM Linux ELF
-  tools installed as default, then you should change the CROSS_COMPILE
-  line as detailed below.
-
-  If you wish to cross-compile, then alter the following lines in the top
-  level make file:
-
-    ARCH = <whatever>
-	with
-    ARCH = arm
-
-	and
-
-    CROSS_COMPILE=
-	to
-    CROSS_COMPILE=<your-path-to-your-compiler-without-gcc>
-	eg.
-    CROSS_COMPILE=arm-linux-
-
-  Do a 'make config', followed by 'make Image' to build the kernel 
-  (arch/arm/boot/Image).  A compressed image can be built by doing a 
-  'make zImage' instead of 'make Image'.
-
-
-Bug reports etc
----------------
-
-  Please send patches to the patch system.  For more information, see
-  http://www.arm.linux.org.uk/developer/patches/info.php Always include some
-  explanation as to what the patch does and why it is needed.
-
-  Bug reports should be sent to linux-arm-kernel@lists.arm.linux.org.uk,
-  or submitted through the web form at
-  http://www.arm.linux.org.uk/developer/ 
-
-  When sending bug reports, please ensure that they contain all relevant
-  information, eg. the kernel messages that were printed before/during
-  the problem, what you were doing, etc.
-
-
-Include files
--------------
-
-  Several new include directories have been created under include/asm-arm,
-  which are there to reduce the clutter in the top-level directory.  These
-  directories and their purposes are listed below:
-
-   arch-*	machine/platform specific header files
-   hardware	driver-internal ARM specific data structures/definitions
-   mach		descriptions of generic ARM to specific machine interfaces
-   proc-*	processor dependent header files (currently only two
-		categories)
-
-
-Machine/Platform support
-------------------------
-
-  The ARM tree contains support for a lot of different machine types.  To
-  continue supporting these differences, it has become necessary to split
-  machine-specific parts by directory.  For this, the machine category is
-  used to select which directories and files get included (we will use
-  $(MACHINE) to refer to the category)
-
-  To this end, we now have arch/arm/mach-$(MACHINE) directories which are
-  designed to house the non-driver files for a particular machine (eg, PCI,
-  memory management, architecture definitions etc).  For all future
-  machines, there should be a corresponding arch/arm/mach-$(MACHINE)/include/mach
-  directory.
-
-
-Modules
--------
-
-  Although modularisation is supported (and required for the FP emulator),
-  each module, when loaded on an ARM2/ARM250/ARM3 machine, will take
-  memory up to the next 32k boundary due to the size of the pages.
-  Therefore, is modularisation on these machines really worth it?
-
-  However, ARM6 and up machines allow modules to take multiples of 4k, and
-  as such Acorn RiscPCs and other architectures using these processors can
-  make good use of modularisation.
-
-
-ADFS Image files
-----------------
-
-  You can access image files on your ADFS partitions by mounting the ADFS
-  partition, and then using the loopback device driver.  You must have
-  losetup installed.
-
-  Please note that the PCEmulator DOS partitions have a partition table at
-  the start, and as such, you will have to give '-o offset' to losetup.
-
-
-Request to developers
----------------------
-
-  When writing device drivers which include a separate assembler file, please
-  include it in with the C file, and not the arch/arm/lib directory.  This
-  allows the driver to be compiled as a loadable module without requiring
-  half the code to be compiled into the kernel image.
-
-  In general, try to avoid using assembler unless it is really necessary.  It
-  makes drivers far less easy to port to other hardware.
-
-
-ST506 hard drives
------------------
-
-  The ST506 hard drive controllers seem to be working fine (if a little
-  slowly).  At the moment they will only work off the controllers on an
-  A4x0's motherboard, but for it to work off a Podule just requires
-  someone with a podule to add the addresses for the IRQ mask and the
-  HDC base to the source.
-
-  As of 31/3/96 it works with two drives (you should get the ADFS
-  *configure harddrive set to 2). I've got an internal 20MB and a great
-  big external 5.25" FH 64MB drive (who could ever want more :-) ).
-
-  I've just got 240K/s off it (a dd with bs=128k); that's about half of what
-  RiscOS gets; but it's a heck of a lot better than the 50K/s I was getting
-  last week :-)
-
-  Known bug: Drive data errors can cause a hang; including cases where
-  the controller has fixed the error using ECC. (Possibly ONLY
-  in that case...hmm).
-
-
-1772 Floppy
------------
-  This also seems to work OK, but hasn't been stressed much lately.  It
-  hasn't got any code for disc change detection in there at the moment which
-  could be a bit of a problem!  Suggestions on the correct way to do this
-  are welcome.
-
-
-CONFIG_MACH_ and CONFIG_ARCH_
------------------------------
-  A change was made in 2003 to the macro names for new machines.
-  Historically, CONFIG_ARCH_ was used for the bona fide architecture,
-  e.g. SA1100, as well as implementations of the architecture,
-  e.g. Assabet.  It was decided to change the implementation macros
-  to read CONFIG_MACH_ for clarity.  Moreover, a retroactive fixup has
-  not been made because it would complicate patching.
-
-  Previous registrations may be found online.
-
-    <http://www.arm.linux.org.uk/developer/machines/>
-
-Kernel entry (head.S)
---------------------------
-  The initial entry into the kernel is via head.S, which uses machine
-  independent code.  The machine is selected by the value of 'r1' on
-  entry, which must be kept unique.
-
-  Due to the large number of machines which the ARM port of Linux provides
-  for, we have a method to manage this which ensures that we don't end up
-  duplicating large amounts of code.
-
-  We group machine (or platform) support code into machine classes.  A
-  class is typically based around one or more system on a chip devices, and
-  acts as a natural container around the actual implementations.  These
-  classes are given directories - arch/arm/mach-<class> and
-  arch/arm/mach-<class>/include/mach - which contain the source files to
-  support the machine class.  These directories also contain any machine
-  specific supporting code.
-
-  For example, the SA1100 class is based upon the SA1100 and SA1110 SoC
-  devices, and contains the code to support the way the on-board and
-  off-board devices are used, or the device is set up, and provides that
-  machine specific "personality."
-
-  For platforms that support device tree (DT), the machine selection is
-  controlled at runtime by passing the device tree blob to the kernel.  At
-  compile-time, support for the machine type must be selected.  This allows for
-  a single multiplatform kernel build to be used for several machine types.
-
-  For platforms that do not use device tree, this machine selection is
-  controlled by the machine type ID, which acts both as a run-time and a
-  compile-time code selection method.  You can register a new machine via the
-  web site at:
-
-    <http://www.arm.linux.org.uk/developer/machines/>
-
-  Note: Please do not register a machine type for DT-only platforms.  If your
-  platform is DT-only, you do not need a registered machine type.
-
----
-Russell King (15/03/2004)
diff --git a/Documentation/arm/SA1100/ADSBitsy b/Documentation/arm/SA1100/ADSBitsy
deleted file mode 100644
index f9f62e8c0719..000000000000
--- a/Documentation/arm/SA1100/ADSBitsy
+++ /dev/null
@@ -1,43 +0,0 @@
-ADS Bitsy Single Board Computer
-(It is different from Bitsy(iPAQ) of Compaq)
-
-For more details, contact Applied Data Systems or see
-http://www.applieddata.net/products.html
-
-The Linux support for this product has been provided by
-Woojung Huh <whuh@applieddata.net>
-
-Use 'make adsbitsy_config' before any 'make config'.
-This will set up defaults for ADS Bitsy support.
-
-The kernel zImage is linked to be loaded and executed at 0xc0400000.
-
-Linux can be used with the ADS BootLoader that ships with the
-newer rev boards. See their documentation on how to load Linux.
-
-Supported peripherals:
-- SA1100 LCD frame buffer (8/16bpp...sort of)
-- SA1111 USB Master
-- SA1100 serial port
-- pcmcia, compact flash
-- touchscreen(ucb1200)
-- console on LCD screen
-- serial ports (ttyS[0-2])
-  - ttyS0 is default for serial console
-
-To do:
-- everything else!  :-)
-
-Notes:
-
-- The flash on board is divided into 3 partitions.
-  Be careful when using the on-board flash:
-  its partitioning differs from GraphicsClient Plus and GraphicsMaster.
-
-- 16bpp mode requires a different cable than what ships with the board.
-  Contact ADS or look through the manual to wire your own. Currently,
-  if you compile with 16bit mode support and switch into a lower bpp
-  mode, the timing is off so the image is corrupted.  This will be
-  fixed soon.
-
-Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
diff --git a/Documentation/arm/SA1100/Assabet b/Documentation/arm/SA1100/Assabet
deleted file mode 100644
index e08a6739e72c..000000000000
--- a/Documentation/arm/SA1100/Assabet
+++ /dev/null
@@ -1,300 +0,0 @@
-The Intel Assabet (SA-1110 evaluation) board
-============================================
-
-Please see:
-http://developer.intel.com
-
-Also some notes from John G Dorsey <jd5q@andrew.cmu.edu>:
-http://www.cs.cmu.edu/~wearable/software/assabet.html
-
-
-Building the kernel
--------------------
-
-To build the kernel with current defaults:
-
-	make assabet_config
-	make oldconfig
-	make zImage
-
-The resulting kernel image should be available in linux/arch/arm/boot/zImage.
-
-
-Installing a bootloader
------------------------
-
-A couple of bootloaders able to boot Linux on Assabet are available:
-
-BLOB (http://www.lartmaker.nl/lartware/blob/)
-
-   BLOB is a bootloader used within the LART project.  Some contributed
-   patches were merged into BLOB to add support for Assabet.
-
-Compaq's Bootldr + John Dorsey's patch for Assabet support
-(http://www.handhelds.org/Compaq/bootldr.html)
-(http://www.wearablegroup.org/software/bootldr/)
-
-   Bootldr is the bootloader developed by Compaq for the iPAQ Pocket PC.
-   John Dorsey has produced add-on patches to add support for Assabet and
-   the JFFS filesystem.
-
-RedBoot (http://sources.redhat.com/redboot/)
-
-   RedBoot is a bootloader developed by Red Hat based on the eCos RTOS
-   hardware abstraction layer.  It supports Assabet amongst many other
-   hardware platforms.
-
-RedBoot is currently the recommended choice since it's the only one to have
-networking support, and is the most actively maintained.
-
-Brief examples on how to boot Linux with RedBoot are shown below.  But first
-you need to have RedBoot installed in your flash memory.  A known-working
-precompiled RedBoot binary is available from the following locations:
-
-ftp://ftp.netwinder.org/users/n/nico/
-ftp://ftp.arm.linux.org.uk/pub/linux/arm/people/nico/
-ftp://ftp.handhelds.org/pub/linux/arm/sa-1100-patches/
-
-Look for redboot-assabet*.tgz.  Some installation information is provided in
-redboot-assabet*.txt.
-
-
-Initial RedBoot configuration
------------------------------
-
-The commands used here are explained in The RedBoot User's Guide available
-on-line at http://sources.redhat.com/ecos/docs.html.
-Please refer to it for explanations.
-
-If you have a CF network card (my Assabet kit contained a CF+ LP-E from
-Socket Communications Inc.), you should strongly consider using it for TFTP
-file transfers.  You must insert it before RedBoot runs since it can't detect
-it dynamically.
-
-To initialize the flash directory:
-
-	fis init -f
-
-To initialize the non-volatile settings, like whether you want to use BOOTP or
-a static IP address, etc, use this command:
-
-	fconfig -i
-
-
-Writing a kernel image into flash
----------------------------------
-
-First, the kernel image must be loaded into RAM.  If you have the zImage file
-available on a TFTP server:
-
-	load zImage -r -b 0x100000
-
-If you would rather use Y-Modem upload over the serial port:
-
-	load -m ymodem -r -b 0x100000
-
-To write it to flash:
-
-	fis create "Linux kernel" -b 0x100000 -l 0xc0000
-
-
-Booting the kernel
-------------------
-
-The kernel still requires a filesystem to boot.  A ramdisk image can be loaded
-as follows:
-
-	load ramdisk_image.gz -r -b 0x800000
-
-Again, Y-Modem upload can be used instead of TFTP by replacing the file name
-with '-m ymodem'.
-
-Now the kernel can be retrieved from flash like this:
-
-	fis load "Linux kernel"
-
-or loaded as described previously.  To boot the kernel:
-
-	exec -b 0x100000 -l 0xc0000
-
-The ramdisk image could be stored into flash as well, but there are better
-solutions for on-flash filesystems as mentioned below.
-
-
-Using JFFS2
------------
-
-Using JFFS2 (the Second Journalling Flash File System) is probably the most
-convenient way to store a writable filesystem into flash.  JFFS2 is used in
-conjunction with the MTD layer which is responsible for low-level flash
-management.  More information on the Linux MTD can be found on-line at:
-http://www.linux-mtd.infradead.org/.  A JFFS howto with some information about
-creating JFFS/JFFS2 images is available from the same site.
-
-For instance, a sample JFFS2 image can be retrieved from the same FTP sites
-mentioned above for the precompiled RedBoot image.
-
-To load this file:
-
-	load sample_img.jffs2 -r -b 0x100000
-
-The result should look like:
-
-RedBoot> load sample_img.jffs2 -r -b 0x100000
-Raw file loaded 0x00100000-0x00377424
-
-Now we must know the size of the unallocated flash:
-
-	fis free
-
-Result:
-
-RedBoot> fis free
-  0x500E0000 .. 0x503C0000
-
-The values above may be different depending on the size of the filesystem and
-the type of flash.  See their usage below as an example and take care of
-substituting yours appropriately.
-
-We must determine some values:
-
-size of unallocated flash:	0x503c0000 - 0x500e0000 = 0x2e0000
-size of the filesystem image:	0x00377424 - 0x00100000 = 0x277424
-
-We want to fit the filesystem image of course, but we also want to give it all
-the remaining flash space as well.  To write it:
-
-	fis unlock -f 0x500E0000 -l 0x2e0000
-	fis erase -f 0x500E0000 -l 0x2e0000
-	fis write -b 0x100000 -l 0x277424 -f 0x500E0000
-	fis create "JFFS2" -n -f 0x500E0000 -l 0x2e0000
-
-Now the filesystem is associated with an MTD "partition" once Linux has
-discovered what they are in the boot process.  From RedBoot, the 'fis list'
-command displays them:
-
-RedBoot> fis list
-Name              FLASH addr  Mem addr    Length      Entry point
-RedBoot           0x50000000  0x50000000  0x00020000  0x00000000
-RedBoot config    0x503C0000  0x503C0000  0x00020000  0x00000000
-FIS directory     0x503E0000  0x503E0000  0x00020000  0x00000000
-Linux kernel      0x50020000  0x00100000  0x000C0000  0x00000000
-JFFS2             0x500E0000  0x500E0000  0x002E0000  0x00000000
-
-However Linux should display something like:
-
-SA1100 flash: probing 32-bit flash bus
-SA1100 flash: Found 2 x16 devices at 0x0 in 32-bit mode
-Using RedBoot partition definition
-Creating 5 MTD partitions on "SA1100 flash":
-0x00000000-0x00020000 : "RedBoot"
-0x00020000-0x000e0000 : "Linux kernel"
-0x000e0000-0x003c0000 : "JFFS2"
-0x003c0000-0x003e0000 : "RedBoot config"
-0x003e0000-0x00400000 : "FIS directory"
-
-What's important here is the position of the partition we are interested in,
-which is the third one.  Within Linux, this corresponds to /dev/mtdblock2.
-Therefore to boot Linux with the kernel and its root filesystem in flash, we
-need these RedBoot commands:
-
-	fis load "Linux kernel"
-	exec -b 0x100000 -l 0xc0000 -c "root=/dev/mtdblock2"
-
-Of course other filesystems than JFFS might be used, like cramfs for example.
-You might want to boot with a root filesystem over NFS, etc.  It is also
-possible, and sometimes more convenient, to flash a filesystem directly from
-within Linux while booted from a ramdisk or NFS.  The Linux MTD repository has
-many tools to deal with flash memory as well, to erase it for example.  JFFS2
-can then be mounted directly on a freshly erased partition and files can be
-copied over directly.  Etc...
-
-
-RedBoot scripting
------------------
-
-All the commands above aren't so useful if they have to be typed in every
-time the Assabet is rebooted.  Therefore it's possible to automate the boot
-process using RedBoot's scripting capability.
-
-For example, I use this to boot Linux with both the kernel and the ramdisk
-images retrieved from a TFTP server on the network:
-
-RedBoot> fconfig
-Run script at boot: false true
-Boot script:
-Enter script, terminate with empty line
->> load zImage -r -b 0x100000
->> load ramdisk_ks.gz -r -b 0x800000
->> exec -b 0x100000 -l 0xc0000
->>
-Boot script timeout (1000ms resolution): 3
-Use BOOTP for network configuration: true
-GDB connection port: 9000
-Network debug at boot time: false
-Update RedBoot non-volatile configuration - are you sure (y/n)? y
-
-Then, rebooting the Assabet is just a matter of waiting for the login prompt.
-
-
-
-Nicolas Pitre
-nico@fluxnic.net
-June 12, 2001
-
-
-Status of peripherals in -rmk tree (updated 14/10/2001)
--------------------------------------------------------
-
-Assabet:
- Serial ports:
-  Radio:		TX, RX, CTS, DSR, DCD, RI
-   PM:			Not tested.
-  COM:			TX, RX, CTS, DSR, DCD, RTS, DTR, PM
-   PM:			Not tested.
-  I2C:			Implemented, not fully tested.
-  L3:			Fully tested, pass.
-   PM:			Not tested.
-
- Video:
-  LCD:			Fully tested.  PM
-			(LCD doesn't like being blanked with
-			 neponset connected)
-  Video out:		Not fully
-
- Audio:
-  UDA1341:
-   Playback:		Fully tested, pass.
-   Record:		Implemented, not tested.
-   PM:			Not tested.
-
-  UCB1200:
-   Audio play:		Implemented, not heavily tested.
-   Audio rec:		Implemented, not heavily tested.
-   Telco audio play:	Implemented, not heavily tested.
-   Telco audio rec:	Implemented, not heavily tested.
-   POTS control:	No
-   Touchscreen:		Yes
-   PM:			Not tested.
-
- Other:
-  PCMCIA:
-   LPE:			Fully tested, pass.
-  USB:			No
-  IRDA:
-   SIR:			Fully tested, pass.
-   FIR:			Fully tested, pass.
-   PM:			Not tested.
-
-Neponset:
- Serial ports:
-  COM1,2:	TX, RX, CTS, DSR, DCD, RTS, DTR
-   PM:			Not tested.
-  USB:			Implemented, not heavily tested.
-  PCMCIA:		Implemented, not heavily tested.
-   PM:			Not tested.
-  CF:			Implemented, not heavily tested.
-   PM:			Not tested.
-
-More stuff can be found in the -np (Nicolas Pitre's) tree.
-
diff --git a/Documentation/arm/SA1100/Brutus b/Documentation/arm/SA1100/Brutus
deleted file mode 100644
index 6a3aa95e9bfd..000000000000
--- a/Documentation/arm/SA1100/Brutus
+++ /dev/null
@@ -1,66 +0,0 @@
-Brutus is an evaluation platform for the SA1100 manufactured by Intel.  
-For more details, see:
-
-http://developer.intel.com
-
-To compile for Brutus, you must issue the following commands:
-
-	make brutus_config
-	make config
-	[accept all the defaults]
-	make zImage
-
-The resulting kernel will end up in linux/arch/arm/boot/zImage.  This file
-must be loaded at 0xc0008000 in Brutus's memory and execution started at
-0xc0008000 as well with the value of registers r0 = 0 and r1 = 16 upon
-entry.
-
-But prior to executing the kernel, a ramdisk image must also be loaded in
-memory.  Use memory address 0xd8000000 for this.  Note that the file 
-containing the (compressed) ramdisk image must not exceed 4 MB.
-
-Typically, you'll need angelboot to load the kernel.
-The following angelboot.opt file should be used:
-
------ begin angelboot.opt -----
-base 0xc0008000
-entry 0xc0008000
-r0 0x00000000
-r1 0x00000010
-device /dev/ttyS0
-options "9600 8N1"
-baud 115200
-otherfile ramdisk_img.gz
-otherbase 0xd8000000
------ end angelboot.opt -----
-
-Then load the kernel and ramdisk with:
-
-	angelboot -f angelboot.opt zImage
-
-The first Brutus serial port (assumed to be linked to /dev/ttyS0 on your
-host PC) is used by angel to load the kernel and ramdisk image. The serial
-console is provided through the second Brutus serial port. To access it,
-you may use minicom configured with /dev/ttyS1, 9600 baud, 8N1, no flow
-control.
-
-Currently supported:
-	- RS232 serial ports
-	- audio output
-	- LCD screen
-	- keyboard
-	
-The actual Brutus support may not be complete without extra patches. 
-If such patches exist, they should be found from 
-ftp.netwinder.org/users/n/nico.
-
-Full PCMCIA support is still missing, although it's possible to hack
-some drivers in order to drive already inserted cards at boot time with
-few modifications.
-
-Any contribution is welcome.
-
-Please send patches to nico@fluxnic.net
-
-Have Fun !
-
diff --git a/Documentation/arm/SA1100/CERF b/Documentation/arm/SA1100/CERF
deleted file mode 100644
index b3d845301ef1..000000000000
--- a/Documentation/arm/SA1100/CERF
+++ /dev/null
@@ -1,29 +0,0 @@
-*** The StrongARM version of the CerfBoard/Cube has been discontinued ***
-
-The Intrinsyc CerfBoard is a StrongARM 1110-based computer on a board
-that measures approximately 2" square. It includes an Ethernet
-controller, an RS232-compatible serial port, a USB function port, and
-one CompactFlash+ slot on the back. Pictures can be found at the
-Intrinsyc website, http://www.intrinsyc.com.
-
-This document describes the support in the Linux kernel for the
-Intrinsyc CerfBoard.
-
-Supported in this version:
-   - CompactFlash+ slot (select PCMCIA in General Setup and any options
-     that may be required)
-   - Onboard Crystal CS8900 Ethernet controller (Cerf CS8900A support in
-     Network Devices)
-   - Serial ports with a serial console (hardcoded to 38400 8N1)
-
-In order to get this kernel onto your Cerf, you need a server that runs
-both BOOTP and TFTP. Detailed instructions should have come with your
-evaluation kit on how to use the bootloader. This series of commands
-will suffice:
-
-   make ARCH=arm CROSS_COMPILE=arm-linux- cerfcube_defconfig
-   make ARCH=arm CROSS_COMPILE=arm-linux- zImage
-   make ARCH=arm CROSS_COMPILE=arm-linux- modules
-   cp arch/arm/boot/zImage <TFTP directory>
-
-support@intrinsyc.com
diff --git a/Documentation/arm/SA1100/FreeBird b/Documentation/arm/SA1100/FreeBird
deleted file mode 100644
index ab9193663b2b..000000000000
--- a/Documentation/arm/SA1100/FreeBird
+++ /dev/null
@@ -1,21 +0,0 @@
-Freebird-1.1 is produced by Legend(C), Inc.
-http://web.archive.org/web/*/http://www.legend.com.cn
-and software/linux maintained by Coventive(C), Inc.
-(http://www.coventive.com)
-
-Based on Nicolas's strongarm kernel tree.
-
-===============================================================
-Maintainer:
-
-Chester Kuo <chester@coventive.com>
-	    <chester@linux.org.tw>
-
-Author :
-Tim wu <timwu@coventive.com>
-CIH <cih@coventive.com>
-Eric Peng <ericpeng@coventive.com>
-Jeff Lee <jeff_lee@coventive.com>
-Allen Cheng
-Tony Liu <tonyliu@coventive.com>
-
diff --git a/Documentation/arm/SA1100/GraphicsClient b/Documentation/arm/SA1100/GraphicsClient
deleted file mode 100644
index 867bb35943af..000000000000
--- a/Documentation/arm/SA1100/GraphicsClient
+++ /dev/null
@@ -1,98 +0,0 @@
-ADS GraphicsClient Plus Single Board Computer
-
-For more details, contact Applied Data Systems or see
-http://www.applieddata.net/products.html
-
-The original Linux support for this product has been provided by 
-Nicolas Pitre <nico@fluxnic.net>. Continued development work by
-Woojung Huh <whuh@applieddata.net>
-
-It's currently possible to mount a root filesystem via NFS providing a
-complete Linux environment.  Otherwise a ramdisk image may be used.  The
-board supports MTD/JFFS, so you could also mount something on there.
-
-Use 'make graphicsclient_config' before any 'make config'.  This will set up
-defaults for GraphicsClient Plus support.
-
-The kernel zImage is linked to be loaded and executed at 0xc0200000.  
-Also the following registers should have the specified values upon entry:
-
-	r0 = 0
-	r1 = 29	(this is the GraphicsClient architecture number)
-
-Linux can be used with the ADS BootLoader that ships with the
-newer rev boards. See their documentation on how to load Linux.
-Angel is not available for the GraphicsClient Plus AFAIK.
-
-There is a board known as just the GraphicsClient that ADS used to
-produce but has since end-of-lifed. This code will not work on the older
-board with the ADS bootloader, but should still work with Angel,
-as outlined below.  In any case, if you're planning on deploying
-something en masse, you should probably get the newer board.
-
-If using Angel on the older boards, here is a typical angelboot.opt option file
-if the kernel is loaded through the Angel Debug Monitor:
-
------ begin angelboot.opt -----
-base 0xc0200000
-entry 0xc0200000
-r0 0x00000000
-r1 0x0000001d
-device /dev/ttyS1
-options "38400 8N1"
-baud 115200
-#otherfile ramdisk.gz
-#otherbase 0xc0800000
-exec minicom
------ end angelboot.opt -----
-
-Then the kernel (and ramdisk if otherfile/otherbase lines above are
-uncommented) would be loaded with:
-
-	angelboot -f angelboot.opt zImage
-
-Here it is assumed that the board is connected to ttyS1 on your PC
-and that minicom is preconfigured with /dev/ttyS1, 38400 baud, 8N1, no flow
-control by default.
-
-If any other bootloader is used, ensure it accomplishes the same, especially
-for r0/r1 register values before jumping into the kernel.
-
-
-Supported peripherals:
-- SA1100 LCD frame buffer (8/16bpp...sort of)
-- on-board SMC 92C96 ethernet NIC
-- SA1100 serial port
-- flash memory access (MTD/JFFS)
-- pcmcia
-- touchscreen(ucb1200)
-- ps/2 keyboard
-- console on LCD screen
-- serial ports (ttyS[0-2])
-  - ttyS0 is default for serial console
-- Smart I/O (ADC, keypad, digital inputs, etc)
-  See http://www.eurotech-inc.com/linux-sbc.asp for IOCTL documentation
-  and example user space code. ps/2 keybd is multiplexed through this driver
-
-To do:
-- UCB1200 audio with new ucb_generic layer
-- everything else!  :-)
-
-Notes:
-
-- The flash on board is divided into 3 partitions.  mtd0 is where
-  the ADS boot ROM and zImage are stored.  It's been marked as
-  read-only to keep you from blasting over the bootloader. :)  mtd1 is
-  for the ramdisk.gz image.  mtd2 is user flash space and can be
-  utilized for either JFFS or if you're feeling crazy, running ext2
-  on top of it. If you're not using the ADS bootloader, you're
-  welcome to blast over the mtd1 partition also.
-
-- 16bpp mode requires a different cable than what ships with the board.
-  Contact ADS or look through the manual to wire your own. Currently,
-  if you compile with 16bit mode support and switch into a lower bpp
-  mode, the timing is off so the image is corrupted.  This will be
-  fixed soon.
-
-Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
-
diff --git a/Documentation/arm/SA1100/GraphicsMaster b/Documentation/arm/SA1100/GraphicsMaster
deleted file mode 100644
index 9145088a0ba2..000000000000
--- a/Documentation/arm/SA1100/GraphicsMaster
+++ /dev/null
@@ -1,53 +0,0 @@
-ADS GraphicsMaster Single Board Computer
-
-For more details, contact Applied Data Systems or see
-http://www.applieddata.net/products.html
-
-The original Linux support for this product has been provided by
-Nicolas Pitre <nico@fluxnic.net>. Continued development work by
-Woojung Huh <whuh@applieddata.net>
-
-Use 'make graphicsmaster_config' before any 'make config'.
-This will set up defaults for GraphicsMaster support.
-
-The kernel zImage is linked to be loaded and executed at 0xc0400000.
-
-Linux can be used with the ADS BootLoader that ships with the
-newer rev boards. See their documentation on how to load Linux.
-
-Supported peripherals:
-- SA1100 LCD frame buffer (8/16bpp...sort of)
-- SA1111 USB Master
-- on-board SMC 92C96 ethernet NIC
-- SA1100 serial port
-- flash memory access (MTD/JFFS)
-- pcmcia, compact flash
-- touchscreen(ucb1200)
-- ps/2 keyboard
-- console on LCD screen
-- serial ports (ttyS[0-2])
-  - ttyS0 is default for serial console
-- Smart I/O (ADC, keypad, digital inputs, etc)
-  See http://www.eurotech-inc.com/linux-sbc.asp for IOCTL documentation
-  and example user space code. ps/2 keybd is multiplexed through this driver
-
-To do:
-- everything else!  :-)
-
-Notes:
-
-- The flash on board is divided into 3 partitions.  mtd0 is where
-  the zImage is stored.  It's been marked as read-only to keep you
-  from blasting over the bootloader. :)  mtd1 is
-  for the ramdisk.gz image.  mtd2 is user flash space and can be
-  utilized for either JFFS or if you're feeling crazy, running ext2
-  on top of it. If you're not using the ADS bootloader, you're
-  welcome to blast over the mtd1 partition also.
-
-- 16bpp mode requires a different cable than what ships with the board.
-  Contact ADS or look through the manual to wire your own. Currently,
-  if you compile with 16bit mode support and switch into a lower bpp
-  mode, the timing is off so the image is corrupted.  This will be
-  fixed soon.
-
-Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
diff --git a/Documentation/arm/SA1100/HUW_WEBPANEL b/Documentation/arm/SA1100/HUW_WEBPANEL
deleted file mode 100644
index fd56b48d4833..000000000000
--- a/Documentation/arm/SA1100/HUW_WEBPANEL
+++ /dev/null
@@ -1,17 +0,0 @@
-The HUW_WEBPANEL is a product of the German company Hoeft & Wessel AG
-
-If you want more information, please visit
-http://www.hoeft-wessel.de
-
-To build the kernel:
-	make huw_webpanel_config
-	make oldconfig
-	[accept all defaults]
-	make zImage
-
-Most of the work was done by:
-Roman Jordan         jor@hoeft-wessel.de
-Christoph Schulz    schu@hoeft-wessel.de
-
-2000/12/18/
-
diff --git a/Documentation/arm/SA1100/Itsy b/Documentation/arm/SA1100/Itsy
deleted file mode 100644
index 44b94997fa0d..000000000000
--- a/Documentation/arm/SA1100/Itsy
+++ /dev/null
@@ -1,39 +0,0 @@
-Itsy is a research project done by the Western Research Lab, and Systems
-Research Center in Palo Alto, CA. The Itsy project is one of several
-research projects at Compaq that are related to pocket computing.
-
-For more information, see:
-
-	http://www.hpl.hp.com/downloads/crl/itsy/
-
-Notes on initial 2.4 Itsy support (8/27/2000) :
-The port was done on an Itsy version 1.5 machine with a daughtercard with
-64 Meg of DRAM and 32 Meg of Flash. The initial work includes support for
-serial console (to see what you're doing).  No other devices have been
-enabled.
-
-To build, do a "make menuconfig" (or xmenuconfig) and select Itsy support.
-Disable Flash and LCD support, and then do a make zImage.
-Finally, you will need to cd to arch/arm/boot/tools and execute a make there
-to build the params-itsy program used to boot the kernel.
-
-In order to install the port of 2.4 to the itsy, you will need to set the
-configuration parameters in the monitor as follows:
-Arg 1:0x08340000, Arg2: 0xC0000000, Arg3:18 (0x12), Arg4:0
-Make sure the start-routine address is set to 0x00060000.
-
-Next, flash the params-itsy program to 0x00060000 ("p 1 0x00060000" in the
-flash menu).  Flash the kernel in arch/arm/boot/zImage into 0x08340000
-("p 1 0x00340000").  Finally flash an initial ramdisk into 0xC8000000
-("p 2 0x0").  We used ramdisk-2-30.gz from the 0.11 version directory on
-handhelds.org.
-
-The serial connection we established was at:
- 8-bit data, no parity, 1 stop bit(s), 115200.00 b/s. in the monitor, in the
-params-itsy program, and in the kernel itself.  This can be changed, but
-not easily. The monitor parameters are easily changed, the params program
-setup is assembly outl's, and the kernel is a configuration item specific to
-the itsy. (i.e. grep for CONFIG_SA1100_ITSY and you'll find where it is.)
-
-
-This should get you a properly booting 2.4 kernel on the itsy.
diff --git a/Documentation/arm/SA1100/LART b/Documentation/arm/SA1100/LART
deleted file mode 100644
index 6d412b685598..000000000000
--- a/Documentation/arm/SA1100/LART
+++ /dev/null
@@ -1,14 +0,0 @@
-Linux Advanced Radio Terminal (LART)
-------------------------------------
-
-The LART is a small (7.5 x 10cm) SA-1100 board, designed for embedded
-applications. It has 32 MB DRAM, 4MB Flash ROM, double RS232 and all
-other StrongARM-gadgets. Almost all SA signals are directly accessible
-through a number of connectors. The power supply accepts voltages
-between 3.5V and 16V and is overdimensioned to support a range of
-daughterboards. A quad Ethernet / IDE / PS2 / sound daughterboard
-is under development, with plenty of others in different stages of
-planning.
-
-The hardware designs for this board have been released under an open license;
-see the LART page at http://www.lartmaker.nl/ for more information.
diff --git a/Documentation/arm/SA1100/PLEB b/Documentation/arm/SA1100/PLEB
deleted file mode 100644
index b9c8a631a351..000000000000
--- a/Documentation/arm/SA1100/PLEB
+++ /dev/null
@@ -1,11 +0,0 @@
-The PLEB project was started as a student initiative at the School of
-Computer Science and Engineering, University of New South Wales to make a
-pocket computer capable of running the Linux Kernel.
-
-PLEB support has yet to be fully integrated.
-
-For more information, see:
-
-	http://www.cse.unsw.edu.au
-
-
diff --git a/Documentation/arm/SA1100/Pangolin b/Documentation/arm/SA1100/Pangolin
deleted file mode 100644
index 077a6120e129..000000000000
--- a/Documentation/arm/SA1100/Pangolin
+++ /dev/null
@@ -1,23 +0,0 @@
-Pangolin is a StrongARM 1110-based evaluation platform produced
-by Dialogue Technology (http://www.dialogue.com.tw/).
-It has EISA slots for ease of configuration with SDRAM/Flash
-memory card, USB/Serial/Audio card, Compact Flash card,
-PCMCIA/IDE card and TFT-LCD card.
-
-To compile for Pangolin, you must issue the following commands:
-
-	make pangolin_config
-	make oldconfig
-	make zImage
-
-Supported peripherals:
-- SA1110 serial port (UART1/UART2/UART3)
-- flash memory access
-- compact flash driver
-- UDA1341 sound driver
-- SA1100 LCD controller for 800x600 16bpp TFT-LCD
-- MQ-200 driver for 800x600 16bpp TFT-LCD
-- Penmount(touch panel) driver
-- PCMCIA driver
-- SMC91C94 LAN driver
-- IDE driver (experimental)
diff --git a/Documentation/arm/SA1100/Tifon b/Documentation/arm/SA1100/Tifon
deleted file mode 100644
index dd1934d9c851..000000000000
--- a/Documentation/arm/SA1100/Tifon
+++ /dev/null
@@ -1,7 +0,0 @@
-Tifon
------
-
-More info has to come...
-
-Contact: Peter Danielsson <peter.danielsson@era-t.ericsson.se>
-
diff --git a/Documentation/arm/SA1100/Yopy b/Documentation/arm/SA1100/Yopy
deleted file mode 100644
index e14f16d836ac..000000000000
--- a/Documentation/arm/SA1100/Yopy
+++ /dev/null
@@ -1,2 +0,0 @@
-See http://www.yopydeveloper.org for more.
-
diff --git a/Documentation/arm/SA1100/empeg b/Documentation/arm/SA1100/empeg
deleted file mode 100644
index 4ece4849a42c..000000000000
--- a/Documentation/arm/SA1100/empeg
+++ /dev/null
@@ -1,2 +0,0 @@
-See ../empeg/README
-
diff --git a/Documentation/arm/SA1100/nanoEngine b/Documentation/arm/SA1100/nanoEngine
deleted file mode 100644
index 48a7934f95f6..000000000000
--- a/Documentation/arm/SA1100/nanoEngine
+++ /dev/null
@@ -1,11 +0,0 @@
-nanoEngine
-----------
-
-"nanoEngine" is a SA1110 based single board computer from 
-Bright Star Engineering Inc.  See www.brightstareng.com/arm
-for more info.
-(Ref: Stuart Adams <sja@brightstareng.com>)
-
-Also visit Larry Doolittle's "Linux for the nanoEngine" site:
-http://www.brightstareng.com/arm/nanoeng.htm
-
diff --git a/Documentation/arm/SA1100/serial_UART b/Documentation/arm/SA1100/serial_UART
deleted file mode 100644
index a63966f1d083..000000000000
--- a/Documentation/arm/SA1100/serial_UART
+++ /dev/null
@@ -1,47 +0,0 @@
-The SA1100 serial port had its major/minor numbers officially assigned:
-
-> Date: Sun, 24 Sep 2000 21:40:27 -0700
-> From: H. Peter Anvin <hpa@transmeta.com>
-> To: Nicolas Pitre <nico@CAM.ORG>
-> Cc: Device List Maintainer <device@lanana.org>
-> Subject: Re: device
-> 
-> Okay.  Note that device numbers 204 and 205 are used for "low density
-> serial devices", so you will have a range of minors on those majors (the
-> tty device layer handles this just fine, so you don't have to worry about
-> doing anything special.)
-> 
-> So your assignments are:
-> 
-> 204 char        Low-density serial ports
->                   5 = /dev/ttySA0               SA1100 builtin serial port 0
->                   6 = /dev/ttySA1               SA1100 builtin serial port 1
->                   7 = /dev/ttySA2               SA1100 builtin serial port 2
-> 
-> 205 char        Low-density serial ports (alternate device)
->                   5 = /dev/cusa0                Callout device for ttySA0
->                   6 = /dev/cusa1                Callout device for ttySA1
->                   7 = /dev/cusa2                Callout device for ttySA2
->
-
-You must create those inodes in /dev on the root filesystem used
-by your SA1100-based device:
-
-	mknod ttySA0 c 204 5
-	mknod ttySA1 c 204 6
-	mknod ttySA2 c 204 7
-	mknod cusa0 c 205 5
-	mknod cusa1 c 205 6
-	mknod cusa2 c 205 7
-
-In addition to the creation of the appropriate device nodes above, you
-must ensure your user space applications make use of the correct device
-name. The classic example is the content of the /etc/inittab file where
-you might have a getty process started on ttyS0.  In this case:
-
-- replace occurrences of ttyS0 with ttySA0, ttyS1 with ttySA1, etc.
-
-- don't forget to add 'ttySA0', 'console', or the appropriate tty name
-  in /etc/securetty for root to be allowed to login as well.
-
-
diff --git a/Documentation/arm/SH-Mobile/.gitignore b/Documentation/arm/SH-Mobile/.gitignore
deleted file mode 100644
index c928dbf3cc88..000000000000
--- a/Documentation/arm/SH-Mobile/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-vrl4
diff --git a/Documentation/arm/SPEAr/overview.txt b/Documentation/arm/SPEAr/overview.txt
deleted file mode 100644
index 1b049be6c84f..000000000000
--- a/Documentation/arm/SPEAr/overview.txt
+++ /dev/null
@@ -1,63 +0,0 @@
-			SPEAr ARM Linux Overview
-			==========================
-
-Introduction
-------------
-
-  SPEAr (Structured Processor Enhanced Architecture).
-  weblink : http://www.st.com/spear
-
-  The ST Microelectronics SPEAr range of ARM9/CortexA9 System-on-Chip CPUs are
-  supported by the 'spear' platform of ARM Linux. Currently SPEAr1310,
-  SPEAr1340, SPEAr300, SPEAr310, SPEAr320 and SPEAr600 SOCs are supported.
-
-  Hierarchy in SPEAr is as follows:
-
-  SPEAr (Platform)
-	- SPEAr3XX (3XX SOC series, based on ARM9)
-		- SPEAr300 (SOC)
-			- SPEAr300 Evaluation Board
-		- SPEAr310 (SOC)
-			- SPEAr310 Evaluation Board
-		- SPEAr320 (SOC)
-			- SPEAr320 Evaluation Board
-	- SPEAr6XX (6XX SOC series, based on ARM9)
-		- SPEAr600 (SOC)
-			- SPEAr600 Evaluation Board
-	- SPEAr13XX (13XX SOC series, based on ARM CORTEXA9)
-		- SPEAr1310 (SOC)
-			- SPEAr1310 Evaluation Board
-		- SPEAr1340 (SOC)
-			- SPEAr1340 Evaluation Board
-
-  Configuration
-  -------------
-
-  A generic configuration is provided for each machine, and can be used as the
-  default by
-	make spear13xx_defconfig
-	make spear3xx_defconfig
-	make spear6xx_defconfig
-
-  Layout
-  ------
-
-  The common files for multiple machine families (SPEAr3xx, SPEAr6xx and
-  SPEAr13xx) are located in the platform code contained in arch/arm/plat-spear
-  with headers in plat/.
-
-  Each machine series has a directory named arch/arm/mach-spear followed by
-  the series name, such as mach-spear3xx, mach-spear6xx and mach-spear13xx.
-
-  Common file for machines of spear3xx family is mach-spear3xx/spear3xx.c, for
-  spear6xx is mach-spear6xx/spear6xx.c and for spear13xx family is
-  mach-spear13xx/spear13xx.c. mach-spear* also contain soc/machine specific
-  files, like spear1310.c, spear1340.c, spear300.c, spear310.c, spear320.c and
-  spear600.c.  mach-spear* doesn't contain board specific files as they fully
-  support Flattened Device Tree.
-
-
-  Document Author
-  ---------------
-
-  Viresh Kumar <vireshk@kernel.org>, (c) 2010-2012 ST Microelectronics
diff --git a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
deleted file mode 100644
index fa968aa99d67..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
+++ /dev/null
@@ -1,75 +0,0 @@
-		S3C24XX CPUfreq support
-		=======================
-
-Introduction
-------------
-
- The S3C24XX series support a number of power saving systems, such as
- the ability to change the core, memory and peripheral operating
- frequencies. The core control is exported via the CPUFreq driver
- which has a number of different manual or automatic controls over the
- rate the core is running at.
-
- There are two forms of the driver depending on the specific CPU and
- how the clocks are arranged. The first implementation used a single
- PLL to feed the ARM, memory and peripherals via a series of dividers
- and muxes and this is the implementation that is documented here. A
- newer version where there is a separate PLL and clock divider for the
- ARM core is available as a separate driver.
-
-
-Layout
-------
-
- The core code manages the CPU specific drivers, any data that they
- need to register and the interface to the generic drivers/cpufreq
- system. Each CPU registers a driver to control the PLL, clock dividers
- and anything else associated with it. Any board that wants to use this
- framework needs to supply at least basic details of what is required.
-
- The core registers with drivers/cpufreq at init time if all the data
- necessary has been supplied.
-
-
-CPU support
------------
-
- The support for each CPU depends on the facilities provided by the
- SoC and the driver as each device has different PLL and clock chains
- associated with it.
-
-
-Slow Mode
----------
-
- The SLOW mode where the PLL is turned off altogether and the
- system is fed by the external crystal input is currently not
- supported.
-
-
-sysfs
------
-
- The core code exports extra information via sysfs in the directory
- devices/system/cpu/cpu0/arch-freq.
-
-
-Board Support
--------------
-
- Each board that wants to use the cpufreq code must register some basic
- information with the core driver to provide information about what the
- board requires and any restrictions being placed on it.
-
- The board needs to supply information about whether it needs the IO bank
- timings changing, any maximum frequency limits and information about the
- SDRAM refresh rate.
-
-
-
-
-Document Author
----------------
-
-Ben Dooks, Copyright 2009 Simtec Electronics
-Licensed under GPLv2
diff --git a/Documentation/arm/Samsung-S3C24XX/EB2410ITX.txt b/Documentation/arm/Samsung-S3C24XX/EB2410ITX.txt
deleted file mode 100644
index b87292e05f2f..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/EB2410ITX.txt
+++ /dev/null
@@ -1,58 +0,0 @@
-		Simtec Electronics EB2410ITX (BAST)
-		===================================
-
-	http://www.simtec.co.uk/products/EB2410ITX/
-
-Introduction
-------------
-
-  The EB2410ITX is a S3C2410 based development board with a variety of
-  peripherals and expansion connectors. This board is also known by
-  the shortened name of Bast.
-
-
-Configuration
--------------
-
-  To set the default configuration, use `make bast_defconfig` which
-  supports the commonly used features of this board.
-
-
-Support
--------
-
-  Official support information can be found on the Simtec Electronics
-  website, at the product page http://www.simtec.co.uk/products/EB2410ITX/
-
-  Useful links:
-
-    - Resources Page http://www.simtec.co.uk/products/EB2410ITX/resources.html
-
-    - Board FAQ at http://www.simtec.co.uk/products/EB2410ITX/faq.html
-
-    - Bootloader info http://www.simtec.co.uk/products/SWABLE/resources.html
-      and FAQ http://www.simtec.co.uk/products/SWABLE/faq.html
-
-
-MTD
----
-
-  The NAND and NOR support has been merged from the linux-mtd project.
-  Any problems, see http://www.linux-mtd.infradead.org/ for more
-  information or up-to-date versions of linux-mtd.
-
-
-IDE
----
-
-  Both onboard IDE ports are supported; however, there is no support for
-  changing speed of devices, so PIO Mode 4 capable drives should be used.
-
-
-Maintainers
------------
-
-  This board is maintained by Simtec Electronics.
-
-
-Copyright 2004 Ben Dooks, Simtec Electronics
diff --git a/Documentation/arm/Samsung-S3C24XX/GPIO.txt b/Documentation/arm/Samsung-S3C24XX/GPIO.txt
deleted file mode 100644
index e8f918b96123..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/GPIO.txt
+++ /dev/null
@@ -1,171 +0,0 @@
-			S3C24XX GPIO Control
-			====================
-
-Introduction
-------------
-
-  The s3c2410 kernel provides an interface to configure and
-  manipulate the state of the GPIO pins, and find out other
-  information about them.
-
-  There are a number of conditions attached to the configuration
-  of the s3c2410 GPIO system, please read the Samsung provided
-  data-sheet/users manual to find out the complete list.
-
-  See Documentation/arm/Samsung/GPIO.txt for the core implementation.
-
-
-GPIOLIB
--------
-
-  With the advent of GPIOLIB in drivers/gpio, support for some
-  of the GPIO functions such as reading and writing a pin will
-  be removed in favour of this common access method.
-
-  Once all the extant drivers have been converted, the functions
-  listed below will be removed (they may be marked as __deprecated
-  in the near future).
-
-  The following functions now either have a s3c_ specific variant
-  or are merged into gpiolib. See the definitions in
-  arch/arm/plat-samsung/include/plat/gpio-cfg.h:
-
-  s3c2410_gpio_setpin()		gpio_set_value() or gpio_direction_output()
-  s3c2410_gpio_getpin()		gpio_get_value() or gpio_direction_input()
-  s3c2410_gpio_getirq()		gpio_to_irq()
-  s3c2410_gpio_cfgpin()		s3c_gpio_cfgpin()
-  s3c2410_gpio_getcfg()		s3c_gpio_getcfg()
-  s3c2410_gpio_pullup()		s3c_gpio_setpull()
-
-
-GPIOLIB conversion
-------------------
-
-If you need to convert your board or driver to use gpiolib from the phased
-out s3c2410 API, then here are some notes on the process.
-
-1) If your board is exclusively using a GPIO, say to control peripheral
-   power, then it must claim the GPIO with gpio_request() before
-   it can use it.
-
-   It is recommended to check the return value, with at least WARN_ON()
-   during initialisation.
-
-2) The s3c2410_gpio_cfgpin() can be directly replaced with s3c_gpio_cfgpin()
-   as they have the same arguments, and can either take the pin specific
-   values, or the more generic special-function-number arguments.
-
-3) s3c2410_gpio_pullup() changes have the problem that while the
-   s3c2410_gpio_pullup(x, 1) can be easily translated to the
-   s3c_gpio_setpull(x, S3C_GPIO_PULL_NONE), the s3c2410_gpio_pullup(x, 0)
-   case is not so easy.
-
-   The s3c2410_gpio_pullup(x, 0) case enables the pull-up (or in the case
-   of some of the devices, a pull-down) and as such the new API distinguishes
-   between the UP and DOWN cases. There is currently no 'just turn on' setting,
-   which may need to be added if this becomes a problem.
-
-4) s3c2410_gpio_setpin() can be replaced by gpio_set_value(), the old call
-   does not implicitly configure the relevant gpio to output. The gpio
-   direction should be changed before using gpio_set_value().
-
-5) s3c2410_gpio_getpin() is replaceable by gpio_get_value() if the pin
-   has been set to input. It is currently unknown what the behaviour is
-   when using gpio_get_value() on an output pin (s3c2410_gpio_getpin
-   would return the value the pin is supposed to be outputting).
-
-6) s3c2410_gpio_getirq() should be directly replaceable with the
-   gpio_to_irq() call.
-
-The s3c2410_gpio and gpio_ calls have always operated on the same gpio
-numberspace, so there is no problem with converting the gpio numbering
-between the calls.
-
-
-Headers
--------
-
-  See arch/arm/mach-s3c24xx/include/mach/regs-gpio.h for the list
-  of GPIO pins, and the configuration values for them. This
-  is included by using #include <mach/regs-gpio.h>
-
-
-PIN Numbers
------------
-
-  Each pin has a unique number associated with it in regs-gpio.h,
-  e.g. S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
-  the GPIO functions which pin is to be used.
-
-  With the conversion to gpiolib, there is no longer a direct conversion
-  from gpio pin number to register base address as in earlier kernels. This
-  is due to the number space required for newer SoCs where the later
-  GPIOs are not contiguous.
-
-
-Configuring a pin
------------------
-
-  The following function allows the configuration of a given pin to
-  be changed.
-
-    void s3c_gpio_cfgpin(unsigned int pin, unsigned int function);
-
-  e.g.:
-
-     s3c_gpio_cfgpin(S3C2410_GPA(0), S3C_GPIO_SFN(1));
-     s3c_gpio_cfgpin(S3C2410_GPE(8), S3C_GPIO_SFN(2));
-
-   which would turn GPA(0) into the lowest Address line A0, and set
-   GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
-
-
-Reading the current configuration
----------------------------------
-
-  The current configuration of a pin can be read by using the s3c_ specific
-  function:
-
-  s3c_gpio_getcfg(unsigned int pin);
-
-  The return value will be from the same set of values which can be
-  passed to s3c_gpio_cfgpin().
-
-
-Configuring a pull-up resistor
-------------------------------
-
-  A large proportion of the GPIO pins on the S3C2410 can have weak
-  pull-up resistors enabled. This can be configured by the following
-  function:
-
-    void s3c_gpio_setpull(unsigned int pin, unsigned int to);
-
-  Where the to value is S3C_GPIO_PULL_NONE to set the pull-up off,
-  and S3C_GPIO_PULL_UP to enable the specified pull-up. Any other
-  values are currently undefined.
-
-
-Getting and setting the state of a PIN
---------------------------------------
-
-  These calls are now implemented by the relevant gpiolib calls; convert
-  your board or driver to use gpiolib.
-
-
-Getting the IRQ number associated with a PIN
---------------------------------------------
-
-  A standard gpiolib function can map the given pin number to an IRQ
-  number to pass to the IRQ system.
-
-   int gpio_to_irq(unsigned int pin);
-
-  Note, not all pins have an IRQ.
-
-
-Author
--------
-
-Ben Dooks, 03 October 2004
-Copyright 2004 Ben Dooks, Simtec Electronics
diff --git a/Documentation/arm/Samsung-S3C24XX/H1940.txt b/Documentation/arm/Samsung-S3C24XX/H1940.txt
deleted file mode 100644
index b738859b1fc0..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/H1940.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-		HP IPAQ H1940
-		=============
-
-http://www.handhelds.org/projects/h1940.html
-
-Introduction
-------------
-
-  The HP H1940 is a S3C2410 based handheld device, with
-  bluetooth connectivity.
-
-
-Support
--------
-
-  A variety of information is available
-
-  handhelds.org project page:
-
-    http://www.handhelds.org/projects/h1940.html
-
-  handhelds.org wiki page:
-
-    http://handhelds.org/moin/moin.cgi/HpIpaqH1940
-
-  Herbert Pötzl pages:
-
-    http://vserver.13thfloor.at/H1940/
-
-
-Maintainers
------------
-
-  This project is being maintained and developed by a variety
-  of people, including Ben Dooks, Arnaud Patard, and Herbert Pötzl.
-
-  Thanks to the many others who have also provided support.
-
-
-(c) 2005 Ben Dooks
diff --git a/Documentation/arm/Samsung-S3C24XX/NAND.txt b/Documentation/arm/Samsung-S3C24XX/NAND.txt
deleted file mode 100644
index bc478a3409b8..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/NAND.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-			S3C24XX NAND Support
-			====================
-
-Introduction
-------------
-
-Small Page NAND
----------------
-
-The driver uses a 512 byte (1 page) ECC code for this setup. The
-ECC code is not directly compatible with the default kernel ECC
-code, so the driver enforces its own OOB layout and ECC parameters
-
-Large Page NAND
----------------
-
-The driver is capable of handling NAND flash with a 2KiB page
-size, with support for hardware ECC generation and correction.
-
-Unlike the 512 byte page mode, the driver generates ECC data for
-each 256 byte block in a 2KiB page. This means that more than
-one error in a page can be rectified. It also means that the
-OOB layout remains the default kernel layout for these flashes.
-
-
-Document Author
----------------
-
-Ben Dooks, Copyright 2007 Simtec Electronics
-
diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt
deleted file mode 100644
index 00d3c3141e21..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/Overview.txt
+++ /dev/null
@@ -1,318 +0,0 @@
-			S3C24XX ARM Linux Overview
-			==========================
-
-
-
-Introduction
-------------
-
-  The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported
-  by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
-  S3C2412, S3C2413, S3C2416, S3C2440, S3C2442, S3C2443 and S3C2450 devices
-  are supported.
-
-  Support for the S3C2400 and S3C24A0 series was never completed and the
-  corresponding code has since been removed.  If someone wishes to
-  revive this effort, partial support can be retrieved from earlier Linux
-  versions.
-
-  The S3C2416 and S3C2450 devices are very similar and S3C2450 support is
-  included under the arch/arm/mach-s3c2416 directory. Note, while core
-  support for these SoCs is in, work on some of the extra peripherals
-  and extra interrupts is still ongoing.
-
-
-Configuration
--------------
-
-  A generic S3C2410 configuration is provided, and can be used as the
-  default by `make s3c2410_defconfig`. This configuration has support
-  for all the machines, and the commonly used features on them.
-
-  Certain machines may have their own default configurations as well,
-  please check the machine specific documentation.
-
-
-Layout
-------
-
-  The core support files are located in the platform code contained in
-  arch/arm/plat-s3c24xx with headers in include/asm-arm/plat-s3c24xx.
-  This directory should be kept to items shared between the platform
-  code (arch/arm/plat-s3c24xx) and the arch/arm/mach-s3c24* code.
-
-  Each cpu has a directory with the support files for it, and the
-  machines that carry the device. For example S3C2410 is contained
-  in arch/arm/mach-s3c2410 and S3C2440 in arch/arm/mach-s3c2440
-
-  Register, kernel and platform data definitions are held in the
-  arch/arm/mach-s3c2410/include/mach directory.
-
-arch/arm/plat-s3c24xx:
-
-  Files in here are either common to all the s3c24xx family,
-  or are common to only some of them with names to indicate this
-  status. The files that are not common to all are generally named
-  with the initial cpu they support in the series to ensure a short
-  name without any possibility of confusion with newer devices.
-
-  As an example, initially s3c244x would cover s3c2440 and s3c2442, but
-  with the s3c2443 which does not share many of the same drivers in
-  this directory, the name becomes invalid. We stick to s3c2440-<x>
-  to indicate a driver that is s3c2440 and s3c2442 compatible.
-
-  This does mean that to find the status of any given SoC, a number
-  of directories may need to be searched.
-
-
-Machines
---------
-
-  The currently supported machines are as follows:
-
-  Simtec Electronics EB2410ITX (BAST)
-
-    A general purpose development board, see EB2410ITX.txt for further
-    details
-
-  Simtec Electronics IM2440D20 (Osiris)
-
-    CPU Module from Simtec Electronics, with a S3C2440A CPU, nand flash
-    and a PCMCIA controller.
-
-  Samsung SMDK2410
-
-    Samsung's own development board, geared for PDA work.
-
-  Samsung/Aiji SMDK2412
-
-    The S3C2412 version of the SMDK2440.
-
-  Samsung/Aiji SMDK2413
-
-    The S3C2413 version of the SMDK2440.
-
-  Samsung/Meritech SMDK2440
-
-    The S3C2440 compatible version of the SMDK2440, which has the
-    option of an S3C2440 or S3C2442 CPU module.
-
-  Thorcom VR1000
-
-    Custom embedded board
-
-  HP IPAQ H1940
-
-    Handheld (IPAQ), available in several varieties
-
-  HP iPAQ rx3715
-
-    S3C2440 based IPAQ, with a number of variations depending on
-    features shipped.
-
-  Acer N30
-
-    A S3C2410 based PDA from Acer.  There is a Wiki page at
-    http://handhelds.org/moin/moin.cgi/AcerN30Documentation .
-
-  AML M5900
-
-    American Microsystems' M5900
-
-  Nex Vision Nexcoder
-  Nex Vision Otom
-
-    Two machines by Nex Vision
-
-
-Adding New Machines
--------------------
-
-  The architecture has been designed to support as many machines as can
-  be configured for it in one kernel build, and any future additions
-  should keep this in mind before altering items outside of their own
-  machine files.
-
-  Machine definitions should be kept in linux/arch/arm/mach-s3c2410,
-  and there are a number of examples that can be looked at.
-
-  Read the kernel patch submission policies as well as the
-  Documentation/arm directory before submitting patches. The
-  ARM kernel series is managed by Russell King, and has a patch system
-  located at http://www.arm.linux.org.uk/developer/patches/
-  as well as mailing lists that can be found from the same site.
-
-  As a courtesy, please notify <ben-linux@fluff.org> of any new
-  machines or other modifications.
-
-  Any large scale modifications, or new drivers should be discussed
-  on the ARM kernel mailing list (linux-arm-kernel) before being
-  attempted. See http://www.arm.linux.org.uk/mailinglists/ for the
-  mailing list information.
-
-
-I2C
----
-
-  The hardware I2C core in the CPU is supported in single master
-  mode, and can be configured via platform data.
-
-
-RTC
----
-
-  Support for the onboard RTC unit, including alarm function.
-
-  This has recently been upgraded to use the new RTC core,
-  and the module has been renamed to rtc-s3c to fit in with
-  the new rtc naming scheme.
-
-
-Watchdog
---------
-
-  The onchip watchdog is available via the standard watchdog
-  interface.
-
-
-NAND
-----
-
-  The current kernels now have support for the s3c2410 NAND
-  controller. If there are any problems the latest linux-mtd
-  code can be found from http://www.linux-mtd.infradead.org/
-
-  For more information see Documentation/arm/Samsung-S3C24XX/NAND.txt
-
-
-SD/MMC
-------
-
-  The SD/MMC hardware pre S3C2443 is supported in the current
-  kernel, the driver is drivers/mmc/host/s3cmci.c and supports
-  1 and 4 bit SD or MMC cards.
-
-  The SDIO behaviour of this driver has not been fully tested. There is no
-  current support for hardware SDIO interrupts.
-
-
-Serial
-------
-
-  The s3c2410 serial driver provides support for the internal
-  serial ports. These devices appear as /dev/ttySAC0 through 3.
-
-  To create device nodes for these, use the following commands
-
-    mknod ttySAC0 c 204 64
-    mknod ttySAC1 c 204 65
-    mknod ttySAC2 c 204 66
-
-
-GPIO
-----
-
-  The core contains support for manipulating the GPIO, see the
-  documentation in GPIO.txt in the same directory as this file.
-
-  Newer kernels carry GPIOLIB, and support is being moved towards
-  this with some of the older support in line to be removed.
-
-  As of v2.6.34, the move towards using gpiolib support is almost
-  complete, and very few of the old calls are left.
-
-  See Documentation/arm/Samsung-S3C24XX/GPIO.txt for the S3C24XX specific
-  support and Documentation/arm/Samsung/GPIO.txt for the core Samsung
-  implementation.
-
-
-Clock Management
-----------------
-
-  The core provides the interface defined in the header file
-  include/asm-arm/hardware/clock.h, to allow control over the
-  various clock units
-
-
-Suspend to RAM
---------------
-
-  For boards that provide support for suspend to RAM, the
-  system can be placed into low power suspend.
-
-  See Suspend.txt for more information.
-
-
-SPI
----
-
-  SPI drivers are available for both the in-built hardware
-  (although there is no DMA support yet) and a generic
-  GPIO based solution.
-
-
-LEDs
-----
-
-  There is support for GPIO based LEDs via a platform driver
-  in the LED subsystem.
-
-
-Platform Data
--------------
-
-  Whenever a device has platform specific data that is specified
-  on a per-machine basis, care should be taken to ensure the
-  following:
-
-    1) that default data is not left in the device to confuse the
-       driver if a machine does not set it at startup
-
-    2) the data should (if possible) be marked as __initdata,
-       to ensure that the data is thrown away if the machine is
-       not the one currently in use.
-
-       The best way of doing this is to make a function that
-       kmalloc()s an area of memory, and copies the __initdata
-       and then sets the relevant device's platform data. Making
-       the function `__init` takes care of ensuring it is discarded
-       with the rest of the initialisation code
-
-       static __init void s3c24xx_xxx_set_platdata(struct xxx_data *pd)
-       {
-           struct s3c2410_xxx_mach_info *npd;
-
-           npd = kmalloc(sizeof(struct s3c2410_xxx_mach_info), GFP_KERNEL);
-           if (npd) {
-               memcpy(npd, pd, sizeof(struct s3c2410_xxx_mach_info));
-               s3c_device_xxx.dev.platform_data = npd;
-           } else {
-               printk(KERN_ERR "no memory for xxx platform data\n");
-           }
-       }
-
-	Note, since the code is marked as __init, it should not be
-	exported outside arch/arm/mach-s3c2410/, or exported to
-	modules via EXPORT_SYMBOL() and related functions.
-
-
-Port Contributors
------------------
-
-  Ben Dooks (BJD)
-  Vincent Sanders
-  Herbert Potzl
-  Arnaud Patard (RTP)
-  Roc Wu
-  Klaus Fetscher
-  Dimitry Andric
-  Shannon Holland
-  Guillaume Gourat (NexVision)
-  Christer Weinigel (wingel) (Acer N30)
-  Lucas Correia Villa Real (S3C2400 port)
-
-
-Document Author
----------------
-
-Ben Dooks, Copyright 2004-2006 Simtec Electronics
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2412.txt b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt
deleted file mode 100644
index dc1fd362d3c1..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/S3C2412.txt
+++ /dev/null
@@ -1,120 +0,0 @@
-		S3C2412 ARM Linux Overview
-		==========================
-
-Introduction
-------------
-
-  The S3C2412 is part of the S3C24XX range of ARM9 System-on-Chip CPUs
-  from Samsung. This part has an ARM926-EJS core, capable of running up
-  to 266MHz (see data-sheet for more information)
-
-
-Clock
------
-
-  The core clock code provides a set of clocks to the drivers, and allows
-  for source selection and a number of other features.
-
-
-Power
------
-
-  No support for suspend/resume to RAM in the current system.
-
-
-DMA
----
-
-  No current support for DMA.
-
-
-GPIO
-----
-
-  There is support for setting the GPIO to input/output/special function
-  and reading or writing to them.
-
-
-UART
-----
-
-  The UART hardware is similar to the S3C2440, and is supported by the
-  s3c2410 driver in the drivers/serial directory.
-
-
-NAND
-----
-
-  The NAND hardware is similar to the S3C2440, and is supported by the
-  s3c2410 driver in the drivers/mtd/nand/raw directory.
-
-
-USB Host
---------
-
-  The USB hardware is similar to the S3C2410, with extended clock source
-  control. The OHCI portion is supported by the ohci-s3c2410 driver, and
-  the clock control selection is supported by the core clock code.
-
-
-USB Device
-----------
-
-  No current support in the kernel
-
-
-IRQs
-----
-
-  All the standard, and external interrupt sources are supported. The
-  extra sub-sources are not yet supported.
-
-
-RTC
----
-
-  The RTC hardware is similar to the S3C2410, and is supported by the
-  s3c2410-rtc driver.
-
-
-Watchdog
---------
-
-  The watchdog hardware is the same as the S3C2410, and is supported by
-  the s3c2410_wdt driver.
-
-
-MMC/SD/SDIO
------------
-
-  No current support for the MMC/SD/SDIO block.
-
-IIC
----
-
-  The IIC hardware is the same as the S3C2410, and is supported by the
-  i2c-s3c24xx driver.
-
-
-IIS
----
-
-  No current support for the IIS interface.
-
-
-SPI
----
-
-  No current support for the SPI interfaces.
-
-
-ATA
----
-
-  No current support for the on-board ATA block.
-
-
-Document Author
----------------
-
-Ben Dooks, Copyright 2006 Simtec Electronics
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2413.txt b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt
deleted file mode 100644
index 909bdc7dd7b5..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/S3C2413.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-		S3C2413 ARM Linux Overview
-		==========================
-
-Introduction
-------------
-
-  The S3C2413 is an extended version of the S3C2412, with a camera
-  interface and mobile DDR memory support. See the S3C2412 support
-  documentation for more information.
-
-
-Camera Interface
-----------------
-
-  This block is currently not supported.
-
-
-Document Author
----------------
-
-Ben Dooks, Copyright 2006 Simtec Electronics
diff --git a/Documentation/arm/Samsung-S3C24XX/SMDK2440.txt b/Documentation/arm/Samsung-S3C24XX/SMDK2440.txt
deleted file mode 100644
index 429390bd4684..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/SMDK2440.txt
+++ /dev/null
@@ -1,56 +0,0 @@
-		Samsung/Meritech SMDK2440
-		=========================
-
-Introduction
-------------
-
-  The SMDK2440 is a two part evaluation board for the Samsung S3C2440
-  processor. It includes support for LCD, SmartMedia, Audio, SD and
-  10MBit Ethernet, and expansion headers for various signals, including
-  the camera and unused GPIO.
-
-
-Configuration
--------------
-
-  To set the default configuration, use `make smdk2440_defconfig` which
-  will configure the common features of this board, or use
-  `make s3c2410_defconfig` to include support for all s3c2410/s3c2440 machines.
-
-
-Support
--------
-
-  Ben Dooks' SMDK2440 site at http://www.fluff.org/ben/smdk2440/ which
-  includes linux based USB download tools.
-
-  Some of the h1940 patches that can be found from the H1940 project
-  site at http://www.handhelds.org/projects/h1940.html can also be
-  applied to this board.
-
-
-Peripherals
------------
-
-  There is no current support for any of the extra peripherals on the
-  base-board itself.
-
-
-MTD
----
-
-  The NAND flash should be supported by the in-kernel MTD NAND support;
-  NOR flash support will be added later.
-
-
-Maintainers
------------
-
-  This board is being maintained by Ben Dooks, for more info, see
-  http://www.fluff.org/ben/smdk2440/
-
-  Many thanks to Dimitry Andric of TomTom for the loan of the SMDK2440,
-  and to Simtec Electronics for allowing me time to work on this.
-
-
-(c) 2004 Ben Dooks
diff --git a/Documentation/arm/Samsung-S3C24XX/Suspend.txt b/Documentation/arm/Samsung-S3C24XX/Suspend.txt
deleted file mode 100644
index cb4f0c0cdf9d..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/Suspend.txt
+++ /dev/null
@@ -1,137 +0,0 @@
-			S3C24XX Suspend Support
-			=======================
-
-
-Introduction
-------------
-
-  The S3C24XX supports a low-power suspend mode, where the SDRAM is kept
-  in Self-Refresh mode, and all but the essential peripheral blocks are
-  powered down. For more information on how this works, please look
-  at the relevant CPU datasheet from Samsung.
-
-
-Requirements
-------------
-
-  1) A bootloader that can support the necessary resume operation
-
-  2) Support for at least 1 source for resume
-
-  3) CONFIG_PM enabled in the kernel
-
-  4) Any peripherals that are going to be powered down at the same
-     time require suspend/resume support.
-
-
-Resuming
---------
-
-  The S3C2410 user manual defines the process of sending the CPU to
-  sleep and how it resumes. The default behaviour of the Linux code
-  is to set the GSTATUS3 register to the physical address of the
-  code to resume Linux operation.
-
-  GSTATUS4 is currently left alone by the sleep code, and is free to
-  use for any other purposes (for example, the EB2410ITX uses this to
-  save memory configuration in).
-
-
-Machine Support
----------------
-
-  The machine specific functions must call the s3c_pm_init() function
-  to say that its bootloader is capable of resuming. This can be as
-  simple as adding the following to the machine's definition:
-
-  INITMACHINE(s3c_pm_init)
-
-  A board can do its own setup before calling s3c_pm_init, if it
-  needs to set up anything else for power management support.
-
-  There is currently no support for over-riding the default method of
-  saving the resume address, if your board requires it, then contact
-  the maintainer and discuss what is required.
-
-  Note, the original method of adding an late_initcall() is wrong,
-  and will end up initialising all compiled machines' pm init!
-
-  The following is an example of code used for testing wakeup from
-  an falling edge on IRQ_EINT0:
-
-
-static irqreturn_t button_irq(int irq, void *pw)
-{
-	return IRQ_HANDLED;
-}
-
-statuc void __init machine_init(void)
-{
-	...
-
-	request_irq(IRQ_EINT0, button_irq, IRQF_TRIGGER_FALLING,
-		   "button-irq-eint0", NULL);
-
-	enable_irq_wake(IRQ_EINT0);
-
-	s3c_pm_init();
-}
-
-
-Debugging
----------
-
-  There are several important things to remember when using PM suspend:
-
-  1) The uart drivers will disable the clocks to the UART blocks when
-     suspending, which means that use of printascii() or similar direct
-     access to the UARTs will cause the debug to stop.
-
-  2) While the pm code itself will attempt to re-enable the UART clocks,
-     care should be taken that any external clock sources that the UARTs
-     rely on are still enabled at that point.
-
-  3) If any debugging is placed in the resume path, then it must have the
-     relevant clocks and peripherals setup before use (ie, bootloader).
-
-     For example, if you transmit a character from the UART, the baud
-     rate and uart controls must be setup beforehand.
-
-
-Configuration
--------------
-
-  The S3C2410 specific configuration in `System Type` defines various
-  aspects of how the S3C2410 suspend and resume support is configured
-
-  `S3C2410 PM Suspend debug`
-
-    This option prints messages to the serial console before and after
-    the actual suspend, giving detailed information on what is
-    happening
-
-
-  `S3C2410 PM Suspend Memory CRC`
-
-    Allows the entire memory to be checksummed before and after the
-    suspend to see if there has been any corruption of the contents.
-
-    Note, the time to calculate the CRC is dependent on the CPU speed
-    and the size of memory. For an 64Mbyte RAM area on an 200MHz
-    S3C2410, this can take approximately 4 seconds to complete.
-
-    This support requires the CRC32 function to be enabled.
-
-
-  `S3C2410 PM Suspend CRC Chunksize (KiB)`
-
-    Defines the size of memory each CRC chunk covers. A smaller value
-    will mean that the CRC data block will take more memory, but will
-    identify any faults with better precision
-
-
-Document Author
----------------
-
-Ben Dooks, Copyright 2004 Simtec Electronics
-
diff --git a/Documentation/arm/Samsung-S3C24XX/USB-Host.txt b/Documentation/arm/Samsung-S3C24XX/USB-Host.txt
deleted file mode 100644
index f82b1faefad5..000000000000
--- a/Documentation/arm/Samsung-S3C24XX/USB-Host.txt
+++ /dev/null
@@ -1,93 +0,0 @@
-			S3C24XX USB Host support
-			========================
-
-
-
-Introduction
-------------
-
-  This document details the S3C2410/S3C2440 in-built OHCI USB host support.
-
-Configuration
--------------
-
-  Enable at least the following kernel options:
-
-  menuconfig:
-
-   Device Drivers  --->
-     USB support  --->
-       <*> Support for Host-side USB
-       <*>   OHCI HCD support
-
-
-  .config:
-    CONFIG_USB
-    CONFIG_USB_OHCI_HCD
-
-
-  Once these options are configured, the standard set of USB device
-  drivers can be configured and used.
-
-
-Board Support
--------------
-
-  The driver attaches to a platform device, which will need to be
-  added by the board specific support file in linux/arch/arm/mach-s3c2410,
-  such as mach-bast.c or mach-smdk2410.c
-
-  The platform device's platform_data field is only needed if the
-  board implements extra power control or over-current monitoring.
-
-  The OHCI driver does not ensure the state of the S3C2410's MISCCTRL
-  register, so if both ports are to be used for the host, then it is
-  the board support file's responsibility to ensure that the second
-  port is configured to be connected to the OHCI core.
-
-
-Platform Data
--------------
-
-  See arch/arm/mach-s3c2410/include/mach/usb-control.h for the
-  descriptions of the platform device data. An implementation
-  can be found in linux/arch/arm/mach-s3c2410/usb-simtec.c .
-
-  The `struct s3c2410_hcd_info` contains a pair of functions
-  that get called to enable over-current detection, and to
-  control the port power status.
-
-  The ports are numbered 0 and 1.
-
-  power_control:
-
-    Called to enable or disable the power on the port.
-
-  enable_oc:
-
-    Called to enable or disable the over-current monitoring.
-    This should claim or release the resources being used to
-    check the power condition on the port, such as an IRQ.
-
-  report_oc:
-
-    The OHCI driver fills this field in for the over-current code
-    to call when there is a change to the over-current state on
-    an port. The ports argument is a bitmask of 1 bit per port,
-    with bit X being 1 for an over-current on port X.
-
-    The function s3c2410_usb_report_oc() has been provided to
-    ensure this is called correctly.
-
-  port[x]:
-
-    This is struct describes each port, 0 or 1. The platform driver
-    should set the flags field of each port to S3C_HCDFLG_USED if
-    the port is enabled.
-
-
-
-Document Author
----------------
-
-Ben Dooks, Copyright 2005 Simtec Electronics
diff --git a/Documentation/arm/Samsung/Bootloader-interface.txt b/Documentation/arm/Samsung/Bootloader-interface.txt
deleted file mode 100644
index d17ed518a7ea..000000000000
--- a/Documentation/arm/Samsung/Bootloader-interface.txt
+++ /dev/null
@@ -1,68 +0,0 @@
-      Interface between kernel and boot loaders on Exynos boards
-      ==========================================================
-
-Author: Krzysztof Kozlowski
-Date  : 6 June 2015
-
-The document tries to describe currently used interface between Linux kernel
-and boot loaders on Samsung Exynos based boards. This is not a definition
-of interface but rather a description of existing state, a reference
-for information purpose only.
-
-In the document "boot loader" means any of following: U-boot, proprietary
-SBOOT or any other firmware for ARMv7 and ARMv8 initializing the board before
-executing kernel.
-
-
-1. Non-Secure mode
-
-Address:      sysram_ns_base_addr
-Offset        Value                                        Purpose
-=============================================================================
-0x08          exynos_cpu_resume_ns, mcpm_entry_point       System suspend
-0x0c          0x00000bad (Magic cookie)                    System suspend
-0x1c          exynos4_secondary_startup                    Secondary CPU boot
-0x1c + 4*cpu  exynos4_secondary_startup (Exynos4412)       Secondary CPU boot
-0x20          0xfcba0d10 (Magic cookie)                    AFTR
-0x24          exynos_cpu_resume_ns                         AFTR
-0x28 + 4*cpu  0x8 (Magic cookie, Exynos3250)               AFTR
-0x28          0x0 or last value during resume (Exynos542x) System suspend
-
-
-2. Secure mode
-
-Address:      sysram_base_addr
-Offset        Value                                        Purpose
-=============================================================================
-0x00          exynos4_secondary_startup                    Secondary CPU boot
-0x04          exynos4_secondary_startup (Exynos542x)       Secondary CPU boot
-4*cpu         exynos4_secondary_startup (Exynos4412)       Secondary CPU boot
-0x20          exynos_cpu_resume (Exynos4210 r1.0)          AFTR
-0x24          0xfcba0d10 (Magic cookie, Exynos4210 r1.0)   AFTR
-
-Address:      pmu_base_addr
-Offset        Value                                        Purpose
-=============================================================================
-0x0800        exynos_cpu_resume                            AFTR, suspend
-0x0800        mcpm_entry_point (Exynos542x with MCPM)      AFTR, suspend
-0x0804        0xfcba0d10 (Magic cookie)                    AFTR
-0x0804        0x00000bad (Magic cookie)                    System suspend
-0x0814        exynos4_secondary_startup (Exynos4210 r1.1)  Secondary CPU boot
-0x0818        0xfcba0d10 (Magic cookie, Exynos4210 r1.1)   AFTR
-0x081C        exynos_cpu_resume (Exynos4210 r1.1)          AFTR
-
-
-3. Other (regardless of secure/non-secure mode)
-
-Address:      pmu_base_addr
-Offset        Value                           Purpose
-=============================================================================
-0x0908        Non-zero                        Secondary CPU boot up indicator
-                                              on Exynos3250 and Exynos542x
-
-
-4. Glossary
-
-AFTR - ARM Off Top Running, a low power mode, Cortex cores and many other
-modules are power gated, except the TOP modules
-MCPM - Multi-Cluster Power Management
diff --git a/Documentation/arm/Samsung/GPIO.txt b/Documentation/arm/Samsung/GPIO.txt
deleted file mode 100644
index 795adfd88081..000000000000
--- a/Documentation/arm/Samsung/GPIO.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-		Samsung GPIO implementation
-		===========================
-
-Introduction
-------------
-
-This outlines the Samsung GPIO implementation and the architecture
-specific calls provided alongside the drivers/gpio core.
-
-
-S3C24XX (Legacy)
-----------------
-
-See Documentation/arm/Samsung-S3C24XX/GPIO.txt for more information
-about these devices. Their implementation has been brought into line
-with the core samsung implementation described in this document.
-
-
-GPIOLIB integration
--------------------
-
-The gpio implementation uses gpiolib as much as possible, only providing
-specific calls for the items that require Samsung specific handling, such
-as pin special-function or pull resistor control.
-
-GPIO numbering is synchronised between the Samsung and gpiolib system.
-
-
-PIN configuration
------------------
-
-Pin configuration is specific to the Samsung architecture, with each SoC
-registering the necessary information for the core gpio configuration
-implementation to configure pins as necessary.
-
-The s3c_gpio_cfgpin() and s3c_gpio_setpull() provide the means for a
-driver or machine to change gpio configuration.
-
-See arch/arm/plat-samsung/include/plat/gpio-cfg.h for more information
-on these functions.
diff --git a/Documentation/arm/Samsung/Overview.txt b/Documentation/arm/Samsung/Overview.txt
deleted file mode 100644
index 8f7309bad460..000000000000
--- a/Documentation/arm/Samsung/Overview.txt
+++ /dev/null
@@ -1,86 +0,0 @@
-		Samsung ARM Linux Overview
-		==========================
-
-Introduction
-------------
-
-  The Samsung range of ARM SoCs spans many similar devices, from the initial
-  ARM9 through to the newest ARM cores. This document shows an overview of
-  the current kernel support, how to use it and where to find the code
-  that supports this.
-
-  The currently supported SoCs are:
-
-  - S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list
-  - S3C64XX: S3C6400 and S3C6410
-  - S5PC110 / S5PV210
-
-
-S3C24XX Systems
----------------
-
-  There is still documentation in Documnetation/arm/Samsung-S3C24XX/ which
-  deals with the architecture and drivers specific to these devices.
-
-  See Documentation/arm/Samsung-S3C24XX/Overview.txt for more information
-  on the implementation details and specific support.
-
-
-Configuration
--------------
-
-  A number of configurations are supplied, as there is no current way of
-  unifying all the SoCs into one kernel.
-
-  s5pc110_defconfig - S5PC110 specific default configuration
-  s5pv210_defconfig - S5PV210 specific default configuration
-
-
-Layout
-------
-
-  The directory layout is currently being restructured, and consists of
-  several platform directories and then the machine specific directories
-  of the CPUs being built for.
-
-  plat-samsung provides the base for all the implementations, and is the
-  last in the line of include directories that are processed for the build
-  specific information. It contains the base clock, GPIO and device definitions
-  to get the system running.
-
-  plat-s3c24xx is for s3c24xx specific builds, see the S3C24XX docs.
-
-  plat-s5p is for s5p specific builds, and contains common support for the
-  S5P specific systems. Not all S5Ps use all the features in this directory
-  due to differences in the hardware.
-
-
-Layout changes
---------------
-
-  The old plat-s3c and plat-s5pc1xx directories have been removed, with
-  support moved to either plat-samsung or plat-s5p as necessary. These moves
-  where to simplify the include and dependency issues involved with having
-  so many different platform directories.
-
-
-Port Contributors
------------------
-
-  Ben Dooks (BJD)
-  Vincent Sanders
-  Herbert Potzl
-  Arnaud Patard (RTP)
-  Roc Wu
-  Klaus Fetscher
-  Dimitry Andric
-  Shannon Holland
-  Guillaume Gourat (NexVision)
-  Christer Weinigel (wingel) (Acer N30)
-  Lucas Correia Villa Real (S3C2400 port)
-
-
-Document Author
----------------
-
-Copyright 2009-2010 Ben Dooks <ben-linux@fluff.org>
diff --git a/Documentation/arm/Samsung/clksrc-change-registers.awk b/Documentation/arm/Samsung/clksrc-change-registers.awk
deleted file mode 100755
index 7be1b8aa7cd9..000000000000
--- a/Documentation/arm/Samsung/clksrc-change-registers.awk
+++ /dev/null
@@ -1,166 +0,0 @@
-#!/usr/bin/awk -f
-#
-# Copyright 2010 Ben Dooks <ben-linux@fluff.org>
-#
-# Released under GPLv2
-
-# example usage
-# ./clksrc-change-registers.awk arch/arm/plat-s5pc1xx/include/plat/regs-clock.h < src > dst
-
-function extract_value(s)
-{
-    eqat = index(s, "=")
-    comat = index(s, ",")
-    return substr(s, eqat+2, (comat-eqat)-2)
-}
-
-function remove_brackets(b)
-{
-    return substr(b, 2, length(b)-2)
-}
-
-function splitdefine(l, p)
-{
-    r = split(l, tp)
-
-    p[0] = tp[2]
-    p[1] = remove_brackets(tp[3])
-}
-
-function find_length(f)
-{
-    if (0)
-	printf "find_length " f "\n" > "/dev/stderr"
-
-    if (f ~ /0x1/)
-	return 1
-    else if (f ~ /0x3/)
-	return 2
-    else if (f ~ /0x7/)
-	return 3
-    else if (f ~ /0xf/)
-	return 4
-
-    printf "unknown length " f "\n" > "/dev/stderr"
-    exit
-}
-
-function find_shift(s)
-{
-    id = index(s, "<")
-    if (id <= 0) {
-	printf "cannot find shift " s "\n" > "/dev/stderr"
-	exit
-    }
-
-    return substr(s, id+2)
-}
-
-
-BEGIN {
-    if (ARGC < 2) {
-	print "too few arguments" > "/dev/stderr"
-	exit
-    }
-
-# read the header file and find the mask values that we will need
-# to replace and create an associative array of values
-
-    while (getline line < ARGV[1] > 0) {
-	if (line ~ /\#define.*_MASK/ &&
-	    !(line ~ /USB_SIG_MASK/)) {
-	    splitdefine(line, fields)
-	    name = fields[0]
-	    if (0)
-		printf "MASK " line "\n" > "/dev/stderr"
-	    dmask[name,0] = find_length(fields[1])
-	    dmask[name,1] = find_shift(fields[1])
-	    if (0)
-		printf "=> '" name "' LENGTH=" dmask[name,0] " SHIFT=" dmask[name,1] "\n" > "/dev/stderr"
-	} else {
-	}
-    }
-
-    delete ARGV[1]
-}
-
-/clksrc_clk.*=.*{/ {
-    shift=""
-    mask=""
-    divshift=""
-    reg_div=""
-    reg_src=""
-    indent=1
-
-    print $0
-
-    for(; indent >= 1;) {
-	if ((getline line) <= 0) {
-	    printf "unexpected end of file" > "/dev/stderr"
-	    exit 1;
-	}
-
-	if (line ~ /\.shift/) {
-	    shift = extract_value(line)
-	} else if (line ~ /\.mask/) {
-	    mask = extract_value(line)
-	} else if (line ~ /\.reg_divider/) {
-	    reg_div = extract_value(line)
-	} else if (line ~ /\.reg_source/) {
-	    reg_src = extract_value(line)
-	} else if (line ~ /\.divider_shift/) {
-	    divshift = extract_value(line)
-	} else if (line ~ /{/) {
-		indent++
-		print line
-	    } else if (line ~ /}/) {
-	    indent--
-
-	    if (indent == 0) {
-		if (0) {
-		    printf "shift '" shift   "' ='" dmask[shift,0] "'\n" > "/dev/stderr"
-		    printf "mask  '" mask    "'\n" > "/dev/stderr"
-		    printf "dshft '" divshift "'\n" > "/dev/stderr"
-		    printf "rdiv  '" reg_div "'\n" > "/dev/stderr"
-		    printf "rsrc  '" reg_src "'\n" > "/dev/stderr"
-		}
-
-		generated = mask
-		sub(reg_src, reg_div, generated)
-
-		if (0) {
-		    printf "/* rsrc " reg_src " */\n"
-		    printf "/* rdiv " reg_div " */\n"
-		    printf "/* shift " shift " */\n"
-		    printf "/* mask " mask " */\n"
-		    printf "/* generated " generated " */\n"
-		}
-
-		if (reg_div != "") {
-		    printf "\t.reg_div = { "
-		    printf ".reg = " reg_div ", "
-		    printf ".shift = " dmask[generated,1] ", "
-		    printf ".size = " dmask[generated,0] ", "
-		    printf "},\n"
-		}
-
-		printf "\t.reg_src = { "
-		printf ".reg = " reg_src ", "
-		printf ".shift = " dmask[mask,1] ", "
-		printf ".size = " dmask[mask,0] ", "
-
-		printf "},\n"
-
-	    }
-
-	    print line
-	} else {
-	    print line
-	}
-
-	if (0)
-	    printf indent ":" line "\n" > "/dev/stderr"
-    }
-}
-
-// && ! /clksrc_clk.*=.*{/ { print $0 }
diff --git a/Documentation/arm/Setup b/Documentation/arm/Setup
deleted file mode 100644
index 0cb1e64bde80..000000000000
--- a/Documentation/arm/Setup
+++ /dev/null
@@ -1,129 +0,0 @@
-Kernel initialisation parameters on ARM Linux
----------------------------------------------
-
-The following document describes the kernel initialisation parameter
-structure, otherwise known as 'struct param_struct' which is used
-for most ARM Linux architectures.
-
-This structure is used to pass initialisation parameters from the
-kernel loader to the Linux kernel proper, and may be short lived
-through the kernel initialisation process.  As a general rule, it
-should not be referenced outside of arch/arm/kernel/setup.c:setup_arch().
-
-There are a lot of parameters listed in there, and they are described
-below:
-
- page_size
-
-   This parameter must be set to the page size of the machine, and
-   will be checked by the kernel.
-
- nr_pages
-
-   This is the total number of pages of memory in the system.  If
-   the memory is banked, then this should contain the total number
-   of pages in the system.
-
-   If the system contains separate VRAM, this value should not
-   include this information.
-
- ramdisk_size
-
-   This is now obsolete, and should not be used.
-
- flags
-
-   Various kernel flags, including:
-    bit 0 - 1 = mount root read only
-    bit 1 - unused
-    bit 2 - 0 = load ramdisk
-    bit 3 - 0 = prompt for ramdisk
-
- rootdev
-
-   major/minor number pair of device to mount as the root filesystem.
-
- video_num_cols
- video_num_rows
-
-   These two together describe the character size of the dummy console,
-   or VGA console character size.  They should not be used for any other
-   purpose.
-
-   It's generally a good idea to set these to be either standard VGA, or
-   the equivalent character size of your fbcon display.  This then allows
-   all the bootup messages to be displayed correctly.
-
- video_x
- video_y
-
-   This describes the character position of cursor on VGA console, and
-   is otherwise unused. (should not be used for other console types, and
-   should not be used for other purposes).
-
- memc_control_reg
-
-   MEMC chip control register for Acorn Archimedes and Acorn A5000
-   based machines.  May be used differently by different architectures.
-
- sounddefault
-
-   Default sound setting on Acorn machines.  May be used differently by
-   different architectures.
-
- adfsdrives
-
-   Number of ADFS/MFM disks.  May be used differently by different
-   architectures.
-
- bytes_per_char_h
- bytes_per_char_v
-
-   These are now obsolete, and should not be used.
-
- pages_in_bank[4]
-
-   Number of pages in each bank of the systems memory (used for RiscPC).
-   This is intended to be used on systems where the physical memory
-   is non-contiguous from the processors point of view.
-
- pages_in_vram
-
-   Number of pages in VRAM (used on Acorn RiscPC).  This value may also
-   be used by loaders if the size of the video RAM can't be obtained
-   from the hardware.
-
- initrd_start
- initrd_size
-
-   This describes the kernel virtual start address and size of the
-   initial ramdisk.
-
- rd_start
-
-   Start address in sectors of the ramdisk image on a floppy disk.
-
- system_rev
-
-   system revision number.
-
- system_serial_low
- system_serial_high
-
-   system 64-bit serial number
-
- mem_fclk_21285
-
-   The speed of the external oscillator to the 21285 (footbridge),
-   which control's the speed of the memory bus, timer & serial port.
-   Depending upon the speed of the cpu its value can be between
-   0-66 MHz. If no params are passed or a value of zero is passed,
-   then a value of 50 Mhz is the default on 21285 architectures.
-
- paths[8][128]
-
-   These are now obsolete, and should not be used.
-
- commandline
-
-   Kernel command line parameters.  Details can be found elsewhere.
diff --git a/Documentation/arm/VFP/release-notes.txt b/Documentation/arm/VFP/release-notes.txt
deleted file mode 100644
index 28a2795705ca..000000000000
--- a/Documentation/arm/VFP/release-notes.txt
+++ /dev/null
@@ -1,55 +0,0 @@
-Release notes for Linux Kernel VFP support code
------------------------------------------------
-
-Date: 	20 May 2004
-Author:	Russell King
-
-This is the first release of the Linux Kernel VFP support code.  It
-provides support for the exceptions bounced from VFP hardware found
-on ARM926EJ-S.
-
-This release has been validated against the SoftFloat-2b library by
-John R. Hauser using the TestFloat-2a test suite.  Details of this
-library and test suite can be found at:
-
-   http://www.jhauser.us/arithmetic/SoftFloat.html
-
-The operations which have been tested with this package are:
-
- - fdiv
- - fsub
- - fadd
- - fmul
- - fcmp
- - fcmpe
- - fcvtd
- - fcvts
- - fsito
- - ftosi
- - fsqrt
-
-All the above pass softfloat tests with the following exceptions:
-
-- fadd/fsub shows some differences in the handling of +0 / -0 results
-  when input operands differ in signs.
-- the handling of underflow exceptions is slightly different.  If a
-  result underflows before rounding, but becomes a normalised number
-  after rounding, we do not signal an underflow exception.
-
-Other operations which have been tested by basic assembly-only tests
-are:
-
- - fcpy
- - fabs
- - fneg
- - ftoui
- - ftosiz
- - ftouiz
-
-The combination operations have not been tested:
-
- - fmac
- - fnmac
- - fmsc
- - fnmsc
- - fnmul
diff --git a/Documentation/arm/arm.rst b/Documentation/arm/arm.rst
new file mode 100644
index 000000000000..2edc509df92a
--- /dev/null
+++ b/Documentation/arm/arm.rst
@@ -0,0 +1,214 @@
+=======================
+ARM Linux 2.6 and upper
+=======================
+
+    Please check <ftp://ftp.arm.linux.org.uk/pub/armlinux> for
+    updates.
+
+Compilation of kernel
+---------------------
+
+  In order to compile ARM Linux, you will need a compiler capable of
+  generating ARM ELF code with GNU extensions.  GCC 3.3 is known to be
+  a good compiler.  Fortunately, you needn't guess.  The kernel will report
+  an error if your compiler is a recognized offender.
+
+  To build ARM Linux natively, you shouldn't have to alter the ARCH = line
+  in the top level Makefile.  However, if you don't have the ARM Linux ELF
+  tools installed as default, then you should change the CROSS_COMPILE
+  line as detailed below.
+
+  If you wish to cross-compile, then alter the following lines in the top
+  level make file::
+
+    ARCH = <whatever>
+
+  with::
+
+    ARCH = arm
+
+  and::
+
+    CROSS_COMPILE=
+
+  to::
+
+    CROSS_COMPILE=<your-path-to-your-compiler-without-gcc>
+
+  eg.::
+
+    CROSS_COMPILE=arm-linux-
+
+  Do a 'make config', followed by 'make Image' to build the kernel
+  (arch/arm/boot/Image).  A compressed image can be built by doing a
+  'make zImage' instead of 'make Image'.
+
+
+Bug reports etc
+---------------
+
+  Please send patches to the patch system.  For more information, see
+  http://www.arm.linux.org.uk/developer/patches/info.php Always include some
+  explanation as to what the patch does and why it is needed.
+
+  Bug reports should be sent to linux-arm-kernel@lists.arm.linux.org.uk,
+  or submitted through the web form at
+  http://www.arm.linux.org.uk/developer/
+
+  When sending bug reports, please ensure that they contain all relevant
+  information, eg. the kernel messages that were printed before/during
+  the problem, what you were doing, etc.
+
+
+Include files
+-------------
+
+  Several new include directories have been created under include/asm-arm,
+  which are there to reduce the clutter in the top-level directory.  These
+  directories and their purposes are listed below:
+
+  ============= ==========================================================
+   `arch-*`	machine/platform specific header files
+   `hardware`	driver-internal ARM specific data structures/definitions
+   `mach`	descriptions of generic ARM to specific machine interfaces
+   `proc-*`	processor dependent header files (currently only two
+		categories)
+  ============= ==========================================================
+
+
+Machine/Platform support
+------------------------
+
+  The ARM tree contains support for a lot of different machine types.  To
+  continue supporting these differences, it has become necessary to split
+  machine-specific parts by directory.  For this, the machine category is
+  used to select which directories and files get included (we will use
+  $(MACHINE) to refer to the category).
+
+  To this end, we now have arch/arm/mach-$(MACHINE) directories which are
+  designed to house the non-driver files for a particular machine (eg, PCI,
+  memory management, architecture definitions etc).  For all future
+  machines, there should be a corresponding arch/arm/mach-$(MACHINE)/include/mach
+  directory.
+
+
+Modules
+-------
+
+  Although modularisation is supported (and required for the FP emulator),
+  each module on an ARM2/ARM250/ARM3 machine will, when loaded, take
+  memory up to the next 32k boundary due to the size of the pages.
+  Therefore, is modularisation on these machines really worth it?
+
+  However, ARM6 and up machines allow modules to take multiples of 4k, and
+  as such Acorn RiscPCs and other architectures using these processors can
+  make good use of modularisation.
+
+
+ADFS Image files
+----------------
+
+  You can access image files on your ADFS partitions by mounting the ADFS
+  partition, and then using the loopback device driver.  You must have
+  losetup installed.
+
+  Please note that the PCEmulator DOS partitions have a partition table at
+  the start, and as such, you will have to give '-o offset' to losetup.
+
+
+Request to developers
+---------------------
+
+  When writing device drivers which include a separate assembler file, please
+  include it in with the C file, and not the arch/arm/lib directory.  This
+  allows the driver to be compiled as a loadable module without requiring
+  half the code to be compiled into the kernel image.
+
+  In general, try to avoid using assembler unless it is really necessary.  It
+  makes drivers far less easy to port to other hardware.
+
+
+ST506 hard drives
+-----------------
+
+  The ST506 hard drive controllers seem to be working fine (if a little
+  slowly).  At the moment they will only work off the controllers on an
+  A4x0's motherboard, but for it to work off a Podule just requires
+  someone with a podule to add the addresses for the IRQ mask and the
+  HDC base to the source.
+
+  As of 31/3/96 it works with two drives (you should get the ADFS
+  `*configure` harddrive set to 2). I've got an internal 20MB and a great
+  big external 5.25" FH 64MB drive (who could ever want more :-) ).
+
+  I've just got 240K/s off it (a dd with bs=128k); that's about half of what
+  RiscOS gets; but it's a heck of a lot better than the 50K/s I was getting
+  last week :-)
+
+  Known bug: Drive data errors can cause a hang; including cases where
+  the controller has fixed the error using ECC. (Possibly ONLY
+  in that case...hmm).
+
+
+1772 Floppy
+-----------
+  This also seems to work OK, but hasn't been stressed much lately.  It
+  hasn't got any code for disc change detection in there at the moment which
+  could be a bit of a problem!  Suggestions on the correct way to do this
+  are welcome.
+
+
+`CONFIG_MACH_` and `CONFIG_ARCH_`
+---------------------------------
+  A change was made in 2003 to the macro names for new machines.
+  Historically, `CONFIG_ARCH_` was used for the bona fide architecture,
+  e.g. SA1100, as well as implementations of the architecture,
+  e.g. Assabet.  It was decided to change the implementation macros
+  to read `CONFIG_MACH_` for clarity.  However, a retroactive fixup has
+  not been made because it would complicate patching.
+
+  Previous registrations may be found online.
+
+    <http://www.arm.linux.org.uk/developer/machines/>
+
+Kernel entry (head.S)
+---------------------
+  The initial entry into the kernel is via head.S, which uses machine
+  independent code.  The machine is selected by the value of 'r1' on
+  entry, which must be kept unique.
+
+  Due to the large number of machines which the ARM port of Linux provides
+  for, we have a method to manage this which ensures that we don't end up
+  duplicating large amounts of code.
+
+  We group machine (or platform) support code into machine classes.  A
+  class is typically based around one or more system on a chip devices, and
+  acts as a natural container around the actual implementations.  These
+  classes are given directories - arch/arm/mach-<class> and
+  arch/arm/mach-<class>/include/mach - which contain the source files to
+  support the machine class.  These directories also contain any machine
+  specific supporting code.
+
+  For example, the SA1100 class is based upon the SA1100 and SA1110 SoC
+  devices, and contains the code to support the way the on-board and off-
+  board devices are used, or the device is set up, and provides that
+  machine specific "personality."
+
+  For platforms that support device tree (DT), the machine selection is
+  controlled at runtime by passing the device tree blob to the kernel.  At
+  compile-time, support for the machine type must be selected.  This allows for
+  a single multiplatform kernel build to be used for several machine types.
+
+  For platforms that do not use device tree, this machine selection is
+  controlled by the machine type ID, which acts both as a run-time and a
+  compile-time code selection method.  You can register a new machine via the
+  web site at:
+
+    <http://www.arm.linux.org.uk/developer/machines/>
+
+  Note: Please do not register a machine type for DT-only platforms.  If your
+  platform is DT-only, you do not need a registered machine type.
+
+---
+
+Russell King (15/03/2004)
diff --git a/Documentation/arm/booting.rst b/Documentation/arm/booting.rst
new file mode 100644
index 000000000000..4babb6c6ae1e
--- /dev/null
+++ b/Documentation/arm/booting.rst
@@ -0,0 +1,237 @@
+=================
+Booting ARM Linux
+=================
+
+Author:	Russell King
+
+Date  : 18 May 2002
+
+The following documentation is relevant to 2.4.18-rmk6 and beyond.
+
+In order to boot ARM Linux, you require a boot loader, which is a small
+program that runs before the main kernel.  The boot loader is expected
+to initialise various devices, and eventually call the Linux kernel,
+passing information to the kernel.
+
+Essentially, the boot loader should provide (as a minimum) the
+following:
+
+1. Setup and initialise the RAM.
+2. Initialise one serial port.
+3. Detect the machine type.
+4. Setup the kernel tagged list.
+5. Load initramfs.
+6. Call the kernel image.
+
+
+1. Setup and initialise RAM
+---------------------------
+
+Existing boot loaders:
+	MANDATORY
+New boot loaders:
+	MANDATORY
+
+The boot loader is expected to find and initialise all RAM that the
+kernel will use for volatile data storage in the system.  It performs
+this in a machine dependent manner.  (It may use internal algorithms
+to automatically locate and size all RAM, or it may use knowledge of
+the RAM in the machine, or any other method the boot loader designer
+sees fit.)
+
+
+2. Initialise one serial port
+-----------------------------
+
+Existing boot loaders:
+	OPTIONAL, RECOMMENDED
+New boot loaders:
+	OPTIONAL, RECOMMENDED
+
+The boot loader should initialise and enable one serial port on the
+target.  This allows the kernel serial driver to automatically detect
+which serial port it should use for the kernel console (generally
+used for debugging purposes, or communication with the target).
+
+As an alternative, the boot loader can pass the relevant 'console='
+option to the kernel via the tagged lists specifying the port, and
+serial format options as described in
+
+       Documentation/admin-guide/kernel-parameters.rst.
+
+
+3. Detect the machine type
+--------------------------
+
+Existing boot loaders:
+	OPTIONAL
+New boot loaders:
+	MANDATORY except for DT-only platforms
+
+The boot loader should detect the machine type it is running on by some
+method.  Whether this is a hard coded value or some algorithm that
+looks at the connected hardware is beyond the scope of this document.
+The boot loader must ultimately be able to provide a MACH_TYPE_xxx
+value to the kernel (see linux/arch/arm/tools/mach-types).  This
+should be passed to the kernel in register r1.
+
+For DT-only platforms, the machine type will be determined by the device
+tree.  Set the machine type to all ones (~0).  This is not strictly
+necessary, but ensures that it will not match any existing types.
+
+4. Set up boot data
+-------------------
+
+Existing boot loaders:
+	OPTIONAL, HIGHLY RECOMMENDED
+New boot loaders:
+	MANDATORY
+
+The boot loader must provide either a tagged list or a dtb image for
+passing configuration data to the kernel.  The physical address of the
+boot data is passed to the kernel in register r2.
+
+4a. Set up the kernel tagged list
+---------------------------------
+
+The boot loader must create and initialise the kernel tagged list.
+A valid tagged list starts with ATAG_CORE and ends with ATAG_NONE.
+The ATAG_CORE tag may or may not be empty.  An empty ATAG_CORE tag
+has the size field set to '2' (0x00000002).  The ATAG_NONE tag must
+have its size field set to zero.
+
+Any number of tags can be placed in the list.  It is undefined
+whether a repeated tag appends to the information carried by the
+previous tag, or whether it replaces the information in its
+entirety; some tags behave as the former, others the latter.
+
+The boot loader must pass at a minimum the size and location of
+the system memory, and root filesystem location.  Therefore, the
+minimum tagged list should look like this::
+
+		+-----------+
+  base ->	| ATAG_CORE |  |
+		+-----------+  |
+		| ATAG_MEM  |  | increasing address
+		+-----------+  |
+		| ATAG_NONE |  |
+		+-----------+  v
+
+The tagged list should be stored in system RAM.
+
+The tagged list must be placed in a region of memory where neither
+the kernel decompressor nor the initrd 'bootp' program will overwrite
+it.  The recommended placement is in the first 16KiB of RAM.
+
+4b. Set up the device tree
+--------------------------
+
+The boot loader must load a device tree image (dtb) into system RAM
+at a 64-bit aligned address and initialise it with the boot data.  The
+dtb format is documented in Documentation/devicetree/booting-without-of.txt.
+The kernel will look for the dtb magic value of 0xd00dfeed at the dtb
+physical address to determine if a dtb has been passed instead of a
+tagged list.
+
+The boot loader must pass at a minimum the size and location of the
+system memory, and the root filesystem location.  The dtb must be
+placed in a region of memory where the kernel decompressor will not
+overwrite it, while remaining within the region which will be covered
+by the kernel's low-memory mapping.
+
+A safe location is just above the 128MiB boundary from the start of RAM.
+
+5. Load initramfs
+-----------------
+
+Existing boot loaders:
+	OPTIONAL
+New boot loaders:
+	OPTIONAL
+
+If an initramfs is in use then, as with the dtb, it must be placed in
+a region of memory where the kernel decompressor will not overwrite it
+while remaining within the region which will be covered by the kernel's
+low-memory mapping.
+
+A safe location is just above the device tree blob which itself will
+be loaded just above the 128MiB boundary from the start of RAM as
+recommended above.
+
+6. Calling the kernel image
+---------------------------
+
+Existing boot loaders:
+	MANDATORY
+New boot loaders:
+	MANDATORY
+
+There are two options for calling the kernel zImage.  If the zImage
+is stored in flash, and is linked correctly to be run from flash,
+then it is legal for the boot loader to call the zImage in flash
+directly.
+
+The zImage may also be placed in system RAM and called there.  The
+kernel should be placed in the first 128MiB of RAM.  It is recommended
+that it is loaded above 32MiB in order to avoid the need to relocate
+prior to decompression, which will make the boot process slightly
+faster.
+
+When booting a raw (non-zImage) kernel the constraints are tighter.
+In this case the kernel must be loaded at an offset into system RAM equal
+to TEXT_OFFSET - PAGE_OFFSET.
+
+In any case, the following conditions must be met:
+
+- Quiesce all DMA capable devices so that memory does not get
+  corrupted by bogus network packets or disk data. This will save
+  you many hours of debugging.
+
+- CPU register settings
+
+  - r0 = 0
+  - r1 = machine type number discovered in (3) above
+  - r2 = physical address of tagged list in system RAM, or
+    physical address of device tree blob (dtb) in system RAM
+
+- CPU mode
+
+  All forms of interrupts must be disabled (IRQs and FIQs).
+
+  For CPUs which do not include the ARM virtualization extensions, the
+  CPU must be in SVC mode.  (A special exception exists for Angel.)
+
+  CPUs which include support for the virtualization extensions can be
+  entered in HYP mode in order to enable the kernel to make full use of
+  these extensions.  This is the recommended boot method for such CPUs,
+  unless the virtualization extensions are already in use by a pre-installed
+  hypervisor.
+
+  If the kernel is not entered in HYP mode for any reason, it must be
+  entered in SVC mode.
+
+- Caches, MMUs
+
+  The MMU must be off.
+
+  Instruction cache may be on or off.
+
+  Data cache must be off.
+
+  If the kernel is entered in HYP mode, the above requirements apply to
+  the HYP mode configuration in addition to the ordinary PL1 (privileged
+  kernel modes) configuration.  In addition, all traps into the
+  hypervisor must be disabled, and PL1 access must be granted for all
+  peripherals and CPU resources for which this is architecturally
+  possible.  Except for entering in HYP mode, the system configuration
+  should be such that a kernel which does not include support for the
+  virtualization extensions can boot correctly without extra help.
+
+- The boot loader is expected to call the kernel image by jumping
+  directly to the first instruction of the kernel image.
+
+  On CPUs supporting the ARM instruction set, the entry must be
+  made in ARM state, even for a Thumb-2 kernel.
+
+  On CPUs supporting only the Thumb instruction set, such as
+  Cortex-M class CPUs, the entry must be made in Thumb state.
diff --git a/Documentation/arm/cluster-pm-race-avoidance.rst b/Documentation/arm/cluster-pm-race-avoidance.rst
new file mode 100644
index 000000000000..aa58603d3f28
--- /dev/null
+++ b/Documentation/arm/cluster-pm-race-avoidance.rst
@@ -0,0 +1,533 @@
+=========================================================
+Cluster-wide Power-up/power-down race avoidance algorithm
+=========================================================
+
+This file documents the algorithm which is used to coordinate CPU and
+cluster setup and teardown operations and to manage hardware coherency
+controls safely.
+
+The section "Rationale" explains what the algorithm is for and why it is
+needed.  "Basic model" explains general concepts using a simplified view
+of the system.  The other sections explain the actual details of the
+algorithm in use.
+
+
+Rationale
+---------
+
+In a system containing multiple CPUs, it is desirable to have the
+ability to turn off individual CPUs when the system is idle, reducing
+power consumption and thermal dissipation.
+
+In a system containing multiple clusters of CPUs, it is also desirable
+to have the ability to turn off entire clusters.
+
+Turning entire clusters off and on is a risky business, because it
+involves performing potentially destructive operations affecting a group
+of independently running CPUs, while the OS continues to run.  This
+means that we need some coordination in order to ensure that critical
+cluster-level operations are only performed when it is truly safe to do
+so.
+
+Simple locking may not be sufficient to solve this problem, because
+mechanisms like Linux spinlocks may rely on coherency mechanisms which
+are not immediately enabled when a cluster powers up.  Since enabling or
+disabling those mechanisms may itself be a non-atomic operation (such as
+writing some hardware registers and invalidating large caches), other
+methods of coordination are required in order to guarantee safe
+power-down and power-up at the cluster level.
+
+The mechanism presented in this document is a coherent-memory-based
+protocol for performing the needed coordination.  It aims to be as
+lightweight as possible, while providing the required safety properties.
+
+
+Basic model
+-----------
+
+Each cluster and CPU is assigned a state, as follows:
+
+	- DOWN
+	- COMING_UP
+	- UP
+	- GOING_DOWN
+
+::
+
+	    +---------> UP ----------+
+	    |                        v
+
+	COMING_UP                GOING_DOWN
+
+	    ^                        |
+	    +--------- DOWN <--------+
+
+
+DOWN:
+	The CPU or cluster is not coherent, and is either powered off or
+	suspended, or is ready to be powered off or suspended.
+
+COMING_UP:
+	The CPU or cluster has committed to moving to the UP state.
+	It may be part way through the process of initialisation and
+	enabling coherency.
+
+UP:
+	The CPU or cluster is active and coherent at the hardware
+	level.  A CPU in this state is not necessarily being used
+	actively by the kernel.
+
+GOING_DOWN:
+	The CPU or cluster has committed to moving to the DOWN
+	state.  It may be part way through the process of teardown and
+	coherency exit.
+
+
+Each CPU has one of these states assigned to it at any point in time.
+The CPU states are described in the "CPU state" section, below.
+
+Each cluster is also assigned a state, but it is necessary to split the
+state value into two parts (the "cluster" state and "inbound" state) and
+to introduce additional states in order to avoid races between different
+CPUs in the cluster simultaneously modifying the state.  The cluster-
+level states are described in the "Cluster state" section.
+
+To help distinguish the CPU states from cluster states in this
+discussion, the state names are given a `CPU_` prefix for the CPU states,
+and a `CLUSTER_` or `INBOUND_` prefix for the cluster states.
+
+
+CPU state
+---------
+
+In this algorithm, each individual core in a multi-core processor is
+referred to as a "CPU".  CPUs are assumed to be single-threaded:
+therefore, a CPU can only be doing one thing at a single point in time.
+
+This means that CPUs fit the basic model closely.
+
+The algorithm defines the following states for each CPU in the system:
+
+	- CPU_DOWN
+	- CPU_COMING_UP
+	- CPU_UP
+	- CPU_GOING_DOWN
+
+::
+
+	 cluster setup and
+	CPU setup complete          policy decision
+	      +-----------> CPU_UP ------------+
+	      |                                v
+
+	CPU_COMING_UP                   CPU_GOING_DOWN
+
+	      ^                                |
+	      +----------- CPU_DOWN <----------+
+	 policy decision           CPU teardown complete
+	or hardware event
+
+
+The definitions of the four states correspond closely to the states of
+the basic model.
+
+Transitions between states occur as follows.
+
+A trigger event (spontaneous) means that the CPU can transition to the
+next state as a result of making local progress only, with no
+requirement for any external event to happen.
+
+
+CPU_DOWN:
+	A CPU reaches the CPU_DOWN state when it is ready for
+	power-down.  On reaching this state, the CPU will typically
+	power itself down or suspend itself, via a WFI instruction or a
+	firmware call.
+
+	Next state:
+		CPU_COMING_UP
+	Conditions:
+		none
+
+	Trigger events:
+		a) an explicit hardware power-up operation, resulting
+		   from a policy decision on another CPU;
+
+		b) a hardware event, such as an interrupt.
+
+
+CPU_COMING_UP:
+	A CPU cannot start participating in hardware coherency until the
+	cluster is set up and coherent.  If the cluster is not ready,
+	then the CPU will wait in the CPU_COMING_UP state until the
+	cluster has been set up.
+
+	Next state:
+		CPU_UP
+	Conditions:
+		The CPU's parent cluster must be in CLUSTER_UP.
+	Trigger events:
+		Transition of the parent cluster to CLUSTER_UP.
+
+	Refer to the "Cluster state" section for a description of the
+	CLUSTER_UP state.
+
+
+CPU_UP:
+	When a CPU reaches the CPU_UP state, it is safe for the CPU to
+	start participating in local coherency.
+
+	This is done by jumping to the kernel's CPU resume code.
+
+	Note that the definition of this state is slightly different
+	from the basic model definition: CPU_UP does not mean that the
+	CPU is coherent yet, but it does mean that it is safe to resume
+	the kernel.  The kernel handles the rest of the resume
+	procedure, so the remaining steps are not visible as part of the
+	race avoidance algorithm.
+
+	The CPU remains in this state until an explicit policy decision
+	is made to shut down or suspend the CPU.
+
+	Next state:
+		CPU_GOING_DOWN
+	Conditions:
+		none
+	Trigger events:
+		explicit policy decision
+
+
+CPU_GOING_DOWN:
+	While in this state, the CPU exits coherency, including any
+	operations required to achieve this (such as cleaning data
+	caches).
+
+	Next state:
+		CPU_DOWN
+	Conditions:
+		local CPU teardown complete
+	Trigger events:
+		(spontaneous)
+
+
+Cluster state
+-------------
+
+A cluster is a group of connected CPUs with some common resources.
+Because a cluster contains multiple CPUs, it can be doing multiple
+things at the same time.  This has some implications.  In particular, a
+CPU can start up while another CPU is tearing the cluster down.
+
+In this discussion, the "outbound side" is the view of the cluster state
+as seen by a CPU tearing the cluster down.  The "inbound side" is the
+view of the cluster state as seen by a CPU setting the cluster up.
+
+In order to enable safe coordination in such situations, it is important
+that a CPU which is setting up the cluster can advertise its state
+independently of the CPU which is tearing down the cluster.  For this
+reason, the cluster state is split into two parts:
+
+	"cluster" state: The global state of the cluster; or the state
+	on the outbound side:
+
+		- CLUSTER_DOWN
+		- CLUSTER_UP
+		- CLUSTER_GOING_DOWN
+
+	"inbound" state: The state of the cluster on the inbound side.
+
+		- INBOUND_NOT_COMING_UP
+		- INBOUND_COMING_UP
+
+
+	The different pairings of these states result in six possible
+	states for the cluster as a whole::
+
+	                            CLUSTER_UP
+	          +==========> INBOUND_NOT_COMING_UP -------------+
+	          #                                               |
+	                                                          |
+	     CLUSTER_UP     <----+                                |
+	  INBOUND_COMING_UP      |                                v
+
+	          ^             CLUSTER_GOING_DOWN       CLUSTER_GOING_DOWN
+	          #              INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
+
+	    CLUSTER_DOWN         |                                |
+	  INBOUND_COMING_UP <----+                                |
+	                                                          |
+	          ^                                               |
+	          +===========     CLUSTER_DOWN      <------------+
+	                       INBOUND_NOT_COMING_UP
+
+	Transitions -----> can only be made by the outbound CPU, and
+	only involve changes to the "cluster" state.
+
+	Transitions ===##> can only be made by the inbound CPU, and only
+	involve changes to the "inbound" state, except where there is no
+	further transition possible on the outbound side (i.e., the
+	outbound CPU has put the cluster into the CLUSTER_DOWN state).
+
+	The race avoidance algorithm does not provide a way to determine
+	which exact CPUs within the cluster play these roles.  This must
+	be decided in advance by some other means.  Refer to the section
+	"Last man and first man selection" for more explanation.
+
+
+	CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
+	cluster can actually be powered down.
+
+	The parallelism of the inbound and outbound CPUs is observed by
+	the existence of two different paths from CLUSTER_GOING_DOWN/
+	INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
+	model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
+	COMING_UP in the basic model).  The second path avoids cluster
+	teardown completely.
+
+	CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
+	model.  The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
+	is trivial and merely resets the state machine ready for the
+	next cycle.
+
+	Details of the allowable transitions follow.
+
+	The next state in each case is notated
+
+		<cluster state>/<inbound state> (<transitioner>)
+
+	where the <transitioner> is the side on which the transition
+	can occur; either the inbound or the outbound side.
+
+
+CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
+	Next state:
+		CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
+	Conditions:
+		none
+
+	Trigger events:
+		a) an explicit hardware power-up operation, resulting
+		   from a policy decision on another CPU;
+
+		b) a hardware event, such as an interrupt.
+
+
+CLUSTER_DOWN/INBOUND_COMING_UP:
+
+	In this state, an inbound CPU sets up the cluster, including
+	enabling of hardware coherency at the cluster level and any
+	other operations (such as cache invalidation) which are required
+	in order to achieve this.
+
+	The purpose of this state is to do sufficient cluster-level
+	setup to enable other CPUs in the cluster to enter coherency
+	safely.
+
+	Next state:
+		CLUSTER_UP/INBOUND_COMING_UP (inbound)
+	Conditions:
+		cluster-level setup and hardware coherency complete
+	Trigger events:
+		(spontaneous)
+
+
+CLUSTER_UP/INBOUND_COMING_UP:
+
+	Cluster-level setup is complete and hardware coherency is
+	enabled for the cluster.  Other CPUs in the cluster can safely
+	enter coherency.
+
+	This is a transient state, leading immediately to
+	CLUSTER_UP/INBOUND_NOT_COMING_UP.  All other CPUs on the cluster
+	should treat these two states as equivalent.
+
+	Next state:
+		CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
+	Conditions:
+		none
+	Trigger events:
+		(spontaneous)
+
+
+CLUSTER_UP/INBOUND_NOT_COMING_UP:
+
+	Cluster-level setup is complete and hardware coherency is
+	enabled for the cluster.  Other CPUs in the cluster can safely
+	enter coherency.
+
+	The cluster will remain in this state until a policy decision is
+	made to power the cluster down.
+
+	Next state:
+		CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
+	Conditions:
+		none
+	Trigger events:
+		policy decision to power down the cluster
+
+
+CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
+
+	An outbound CPU is tearing the cluster down.  The selected CPU
+	must wait in this state until all CPUs in the cluster are in the
+	CPU_DOWN state.
+
+	When all CPUs are in the CPU_DOWN state, the cluster can be torn
+	down, for example by cleaning data caches and exiting
+	cluster-level coherency.
+
+	To avoid wasteful unnecessary teardown operations, the outbound
+	CPU should check the inbound cluster state for asynchronous
+	transitions to INBOUND_COMING_UP.  Alternatively, individual
+	CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
+
+
+	Next states:
+
+	CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
+		Conditions:
+			cluster torn down and ready to power off
+		Trigger events:
+			(spontaneous)
+
+	CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
+		Conditions:
+			none
+
+		Trigger events:
+			a) an explicit hardware power-up operation,
+			   resulting from a policy decision on another
+			   CPU;
+
+			b) a hardware event, such as an interrupt.
+
+
+CLUSTER_GOING_DOWN/INBOUND_COMING_UP:
+
+	The cluster is (or was) being torn down, but another CPU has
+	come online in the meantime and is trying to set up the cluster
+	again.
+
+	If the outbound CPU observes this state, it has two choices:
+
+		a) back out of teardown, restoring the cluster to the
+		   CLUSTER_UP state;
+
+		b) finish tearing the cluster down and put the cluster
+		   in the CLUSTER_DOWN state; the inbound CPU will
+		   set up the cluster again from there.
+
+	Choice (a) permits the removal of some latency by avoiding
+	unnecessary teardown and setup operations in situations where
+	the cluster is not really going to be powered down.
+
+
+	Next states:
+
+	CLUSTER_UP/INBOUND_COMING_UP (outbound)
+		Conditions:
+			cluster-level setup and hardware
+			coherency complete
+
+		Trigger events:
+			(spontaneous)
+
+	CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
+		Conditions:
+			cluster torn down and ready to power off
+
+		Trigger events:
+			(spontaneous)
+
+
+Last man and First man selection
+--------------------------------
+
+The CPU which performs cluster tear-down operations on the outbound side
+is commonly referred to as the "last man".
+
+The CPU which performs cluster setup on the inbound side is commonly
+referred to as the "first man".
+
+The race avoidance algorithm documented above does not provide a
+mechanism to choose which CPUs should play these roles.
+
+
+Last man:
+
+When shutting down the cluster, all the CPUs involved are initially
+executing Linux and hence coherent.  Therefore, ordinary spinlocks can
+be used to select a last man safely, before the CPUs become
+non-coherent.
+
+
+First man:
+
+Because CPUs may power up asynchronously in response to external wake-up
+events, a dynamic mechanism is needed to make sure that only one CPU
+attempts to play the first man role and do the cluster-level
+initialisation: any other CPUs must wait for this to complete before
+proceeding.
+
+Cluster-level initialisation may involve actions such as configuring
+coherency controls in the bus fabric.
+
+The current implementation in mcpm_head.S uses a separate mutual exclusion
+mechanism to do this arbitration.  This mechanism is documented in
+detail in vlocks.txt.
+
+
+Features and Limitations
+------------------------
+
+Implementation:
+
+	The current ARM-based implementation is split between
+	arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
+	arch/arm/common/mcpm_entry.c (everything else):
+
+	__mcpm_cpu_going_down() signals the transition of a CPU to the
+	CPU_GOING_DOWN state.
+
+	__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
+	state.
+
+	A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
+	low-level power-up code in mcpm_head.S.  This could
+	involve CPU-specific setup code, but in the current
+	implementation it does not.
+
+	__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
+	handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
+	and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
+	the case of an aborted cluster power-down).
+
+	These functions are more complex than the __mcpm_cpu_*()
+	functions due to the extra inter-CPU coordination which
+	is needed for safe transitions at the cluster level.
+
+	A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
+	the low-level power-up code in mcpm_head.S.  This
+	typically involves platform-specific setup code,
+	provided by the platform-specific power_up_setup
+	function registered via mcpm_sync_init.
+
+Deep topologies:
+
+	As currently described and implemented, the algorithm does not
+	support CPU topologies involving more than two levels (i.e.,
+	clusters of clusters are not supported).  The algorithm could be
+	extended by replicating the cluster-level states for the
+	additional topological levels, and modifying the transition
+	rules for the intermediate (non-outermost) cluster levels.
+
+
+Colophon
+--------
+
+Originally created and documented by Dave Martin for Linaro Limited, in
+collaboration with Nicolas Pitre and Achin Gupta.
+
+Copyright (C) 2012-2013  Linaro Limited
+Distributed under the terms of Version 2 of the GNU General Public
+License, as defined in linux/COPYING.
diff --git a/Documentation/arm/cluster-pm-race-avoidance.txt b/Documentation/arm/cluster-pm-race-avoidance.txt
deleted file mode 100644
index 750b6fc24af9..000000000000
--- a/Documentation/arm/cluster-pm-race-avoidance.txt
+++ /dev/null
@@ -1,498 +0,0 @@
-Cluster-wide Power-up/power-down race avoidance algorithm
-=========================================================
-
-This file documents the algorithm which is used to coordinate CPU and
-cluster setup and teardown operations and to manage hardware coherency
-controls safely.
-
-The section "Rationale" explains what the algorithm is for and why it is
-needed.  "Basic model" explains general concepts using a simplified view
-of the system.  The other sections explain the actual details of the
-algorithm in use.
-
-
-Rationale
----------
-
-In a system containing multiple CPUs, it is desirable to have the
-ability to turn off individual CPUs when the system is idle, reducing
-power consumption and thermal dissipation.
-
-In a system containing multiple clusters of CPUs, it is also desirable
-to have the ability to turn off entire clusters.
-
-Turning entire clusters off and on is a risky business, because it
-involves performing potentially destructive operations affecting a group
-of independently running CPUs, while the OS continues to run.  This
-means that we need some coordination in order to ensure that critical
-cluster-level operations are only performed when it is truly safe to do
-so.
-
-Simple locking may not be sufficient to solve this problem, because
-mechanisms like Linux spinlocks may rely on coherency mechanisms which
-are not immediately enabled when a cluster powers up.  Since enabling or
-disabling those mechanisms may itself be a non-atomic operation (such as
-writing some hardware registers and invalidating large caches), other
-methods of coordination are required in order to guarantee safe
-power-down and power-up at the cluster level.
-
-The mechanism presented in this document describes a coherent memory
-based protocol for performing the needed coordination.  It aims to be as
-lightweight as possible, while providing the required safety properties.
-
-
-Basic model
------------
-
-Each cluster and CPU is assigned a state, as follows:
-
-	DOWN
-	COMING_UP
-	UP
-	GOING_DOWN
-
-	    +---------> UP ----------+
-	    |                        v
-
-	COMING_UP                GOING_DOWN
-
-	    ^                        |
-	    +--------- DOWN <--------+
-
-
-DOWN:	The CPU or cluster is not coherent, and is either powered off or
-	suspended, or is ready to be powered off or suspended.
-
-COMING_UP: The CPU or cluster has committed to moving to the UP state.
-	It may be part way through the process of initialisation and
-	enabling coherency.
-
-UP:	The CPU or cluster is active and coherent at the hardware
-	level.  A CPU in this state is not necessarily being used
-	actively by the kernel.
-
-GOING_DOWN: The CPU or cluster has committed to moving to the DOWN
-	state.  It may be part way through the process of teardown and
-	coherency exit.
-
-
-Each CPU has one of these states assigned to it at any point in time.
-The CPU states are described in the "CPU state" section, below.
-
-Each cluster is also assigned a state, but it is necessary to split the
-state value into two parts (the "cluster" state and "inbound" state) and
-to introduce additional states in order to avoid races between different
-CPUs in the cluster simultaneously modifying the state.  The cluster-
-level states are described in the "Cluster state" section.
-
-To help distinguish the CPU states from cluster states in this
-discussion, the state names are given a CPU_ prefix for the CPU states,
-and a CLUSTER_ or INBOUND_ prefix for the cluster states.
-
-
-CPU state
----------
-
-In this algorithm, each individual core in a multi-core processor is
-referred to as a "CPU".  CPUs are assumed to be single-threaded:
-therefore, a CPU can only be doing one thing at a single point in time.
-
-This means that CPUs fit the basic model closely.
-
-The algorithm defines the following states for each CPU in the system:
-
-	CPU_DOWN
-	CPU_COMING_UP
-	CPU_UP
-	CPU_GOING_DOWN
-
-	 cluster setup and
-	CPU setup complete          policy decision
-	      +-----------> CPU_UP ------------+
-	      |                                v
-
-	CPU_COMING_UP                   CPU_GOING_DOWN
-
-	      ^                                |
-	      +----------- CPU_DOWN <----------+
-	 policy decision           CPU teardown complete
-	or hardware event
-
-
-The definitions of the four states correspond closely to the states of
-the basic model.
-
-Transitions between states occur as follows.
-
-A trigger event (spontaneous) means that the CPU can transition to the
-next state as a result of making local progress only, with no
-requirement for any external event to happen.
-
-
-CPU_DOWN:
-
-	A CPU reaches the CPU_DOWN state when it is ready for
-	power-down.  On reaching this state, the CPU will typically
-	power itself down or suspend itself, via a WFI instruction or a
-	firmware call.
-
-	Next state:	CPU_COMING_UP
-	Conditions:	none
-
-	Trigger events:
-
-		a) an explicit hardware power-up operation, resulting
-		   from a policy decision on another CPU;
-
-		b) a hardware event, such as an interrupt.
-
-
-CPU_COMING_UP:
-
-	A CPU cannot start participating in hardware coherency until the
-	cluster is set up and coherent.  If the cluster is not ready,
-	then the CPU will wait in the CPU_COMING_UP state until the
-	cluster has been set up.
-
-	Next state:	CPU_UP
-	Conditions:	The CPU's parent cluster must be in CLUSTER_UP.
-	Trigger events:	Transition of the parent cluster to CLUSTER_UP.
-
-	Refer to the "Cluster state" section for a description of the
-	CLUSTER_UP state.
-
-
-CPU_UP:
-	When a CPU reaches the CPU_UP state, it is safe for the CPU to
-	start participating in local coherency.
-
-	This is done by jumping to the kernel's CPU resume code.
-
-	Note that the definition of this state is slightly different
-	from the basic model definition: CPU_UP does not mean that the
-	CPU is coherent yet, but it does mean that it is safe to resume
-	the kernel.  The kernel handles the rest of the resume
-	procedure, so the remaining steps are not visible as part of the
-	race avoidance algorithm.
-
-	The CPU remains in this state until an explicit policy decision
-	is made to shut down or suspend the CPU.
-
-	Next state:	CPU_GOING_DOWN
-	Conditions:	none
-	Trigger events:	explicit policy decision
-
-
-CPU_GOING_DOWN:
-
-	While in this state, the CPU exits coherency, including any
-	operations required to achieve this (such as cleaning data
-	caches).
-
-	Next state:	CPU_DOWN
-	Conditions:	local CPU teardown complete
-	Trigger events:	(spontaneous)
-
-
-Cluster state
--------------
-
-A cluster is a group of connected CPUs with some common resources.
-Because a cluster contains multiple CPUs, it can be doing multiple
-things at the same time.  This has some implications.  In particular, a
-CPU can start up while another CPU is tearing the cluster down.
-
-In this discussion, the "outbound side" is the view of the cluster state
-as seen by a CPU tearing the cluster down.  The "inbound side" is the
-view of the cluster state as seen by a CPU setting the CPU up.
-
-In order to enable safe coordination in such situations, it is important
-that a CPU which is setting up the cluster can advertise its state
-independently of the CPU which is tearing down the cluster.  For this
-reason, the cluster state is split into two parts:
-
-	"cluster" state: The global state of the cluster; or the state
-		on the outbound side:
-
-		CLUSTER_DOWN
-		CLUSTER_UP
-		CLUSTER_GOING_DOWN
-
-	"inbound" state: The state of the cluster on the inbound side.
-
-		INBOUND_NOT_COMING_UP
-		INBOUND_COMING_UP
-
-
-	The different pairings of these states results in six possible
-	states for the cluster as a whole:
-
-	                            CLUSTER_UP
-	          +==========> INBOUND_NOT_COMING_UP -------------+
-	          #                                               |
-	                                                          |
-	     CLUSTER_UP     <----+                                |
-	  INBOUND_COMING_UP      |                                v
-
-	          ^             CLUSTER_GOING_DOWN       CLUSTER_GOING_DOWN
-	          #              INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
-
-	    CLUSTER_DOWN         |                                |
-	  INBOUND_COMING_UP <----+                                |
-	                                                          |
-	          ^                                               |
-	          +===========     CLUSTER_DOWN      <------------+
-	                       INBOUND_NOT_COMING_UP
-
-	Transitions -----> can only be made by the outbound CPU, and
-	only involve changes to the "cluster" state.
-
-	Transitions ===##> can only be made by the inbound CPU, and only
-	involve changes to the "inbound" state, except where there is no
-	further transition possible on the outbound side (i.e., the
-	outbound CPU has put the cluster into the CLUSTER_DOWN state).
-
-	The race avoidance algorithm does not provide a way to determine
-	which exact CPUs within the cluster play these roles.  This must
-	be decided in advance by some other means.  Refer to the section
-	"Last man and first man selection" for more explanation.
-
-
-	CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
-	cluster can actually be powered down.
-
-	The parallelism of the inbound and outbound CPUs is observed by
-	the existence of two different paths from CLUSTER_GOING_DOWN/
-	INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
-	model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
-	COMING_UP in the basic model).  The second path avoids cluster
-	teardown completely.
-
-	CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
-	model.  The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
-	is trivial and merely resets the state machine ready for the
-	next cycle.
-
-	Details of the allowable transitions follow.
-
-	The next state in each case is notated
-
-		<cluster state>/<inbound state> (<transitioner>)
-
-	where the <transitioner> is the side on which the transition
-	can occur; either the inbound or the outbound side.
-
-
-CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
-
-	Next state:	CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
-	Conditions:	none
-	Trigger events:
-
-		a) an explicit hardware power-up operation, resulting
-		   from a policy decision on another CPU;
-
-		b) a hardware event, such as an interrupt.
-
-
-CLUSTER_DOWN/INBOUND_COMING_UP:
-
-	In this state, an inbound CPU sets up the cluster, including
-	enabling of hardware coherency at the cluster level and any
-	other operations (such as cache invalidation) which are required
-	in order to achieve this.
-
-	The purpose of this state is to do sufficient cluster-level
-	setup to enable other CPUs in the cluster to enter coherency
-	safely.
-
-	Next state:	CLUSTER_UP/INBOUND_COMING_UP (inbound)
-	Conditions:	cluster-level setup and hardware coherency complete
-	Trigger events:	(spontaneous)
-
-
-CLUSTER_UP/INBOUND_COMING_UP:
-
-	Cluster-level setup is complete and hardware coherency is
-	enabled for the cluster.  Other CPUs in the cluster can safely
-	enter coherency.
-
-	This is a transient state, leading immediately to
-	CLUSTER_UP/INBOUND_NOT_COMING_UP.  All other CPUs on the cluster
-	should consider treat these two states as equivalent.
-
-	Next state:	CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
-	Conditions:	none
-	Trigger events:	(spontaneous)
-
-
-CLUSTER_UP/INBOUND_NOT_COMING_UP:
-
-	Cluster-level setup is complete and hardware coherency is
-	enabled for the cluster.  Other CPUs in the cluster can safely
-	enter coherency.
-
-	The cluster will remain in this state until a policy decision is
-	made to power the cluster down.
-
-	Next state:	CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
-	Conditions:	none
-	Trigger events:	policy decision to power down the cluster
-
-
-CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
-
-	An outbound CPU is tearing the cluster down.  The selected CPU
-	must wait in this state until all CPUs in the cluster are in the
-	CPU_DOWN state.
-
-	When all CPUs are in the CPU_DOWN state, the cluster can be torn
-	down, for example by cleaning data caches and exiting
-	cluster-level coherency.
-
-	To avoid wasteful unnecessary teardown operations, the outbound
-	should check the inbound cluster state for asynchronous
-	transitions to INBOUND_COMING_UP.  Alternatively, individual
-	CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
-
-
-	Next states:
-
-	CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
-		Conditions:	cluster torn down and ready to power off
-		Trigger events:	(spontaneous)
-
-	CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
-		Conditions:	none
-		Trigger events:
-
-			a) an explicit hardware power-up operation,
-			   resulting from a policy decision on another
-			   CPU;
-
-			b) a hardware event, such as an interrupt.
-
-
-CLUSTER_GOING_DOWN/INBOUND_COMING_UP:
-
-	The cluster is (or was) being torn down, but another CPU has
-	come online in the meantime and is trying to set up the cluster
-	again.
-
-	If the outbound CPU observes this state, it has two choices:
-
-		a) back out of teardown, restoring the cluster to the
-		   CLUSTER_UP state;
-
-		b) finish tearing the cluster down and put the cluster
-		   in the CLUSTER_DOWN state; the inbound CPU will
-		   set up the cluster again from there.
-
-	Choice (a) permits the removal of some latency by avoiding
-	unnecessary teardown and setup operations in situations where
-	the cluster is not really going to be powered down.
-
-
-	Next states:
-
-	CLUSTER_UP/INBOUND_COMING_UP (outbound)
-		Conditions:	cluster-level setup and hardware
-				coherency complete
-		Trigger events:	(spontaneous)
-
-	CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
-		Conditions:	cluster torn down and ready to power off
-		Trigger events:	(spontaneous)
-
-
-Last man and First man selection
---------------------------------
-
-The CPU which performs cluster tear-down operations on the outbound side
-is commonly referred to as the "last man".
-
-The CPU which performs cluster setup on the inbound side is commonly
-referred to as the "first man".
-
-The race avoidance algorithm documented above does not provide a
-mechanism to choose which CPUs should play these roles.
-
-
-Last man:
-
-When shutting down the cluster, all the CPUs involved are initially
-executing Linux and hence coherent.  Therefore, ordinary spinlocks can
-be used to select a last man safely, before the CPUs become
-non-coherent.
-
-
-First man:
-
-Because CPUs may power up asynchronously in response to external wake-up
-events, a dynamic mechanism is needed to make sure that only one CPU
-attempts to play the first man role and do the cluster-level
-initialisation: any other CPUs must wait for this to complete before
-proceeding.
-
-Cluster-level initialisation may involve actions such as configuring
-coherency controls in the bus fabric.
-
-The current implementation in mcpm_head.S uses a separate mutual exclusion
-mechanism to do this arbitration.  This mechanism is documented in
-detail in vlocks.txt.
-
-
-Features and Limitations
-------------------------
-
-Implementation:
-
-	The current ARM-based implementation is split between
-	arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
-	arch/arm/common/mcpm_entry.c (everything else):
-
-	__mcpm_cpu_going_down() signals the transition of a CPU to the
-		CPU_GOING_DOWN state.
-
-	__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
-		state.
-
-	A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
-		low-level power-up code in mcpm_head.S.  This could
-		involve CPU-specific setup code, but in the current
-		implementation it does not.
-
-	__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
-		handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
-		and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
-		the case of an aborted cluster power-down).
-
-		These functions are more complex than the __mcpm_cpu_*()
-		functions due to the extra inter-CPU coordination which
-		is needed for safe transitions at the cluster level.
-
-	A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
-		the low-level power-up code in mcpm_head.S.  This
-		typically involves platform-specific setup code,
-		provided by the platform-specific power_up_setup
-		function registered via mcpm_sync_init.
-
-Deep topologies:
-
-	As currently described and implemented, the algorithm does not
-	support CPU topologies involving more than two levels (i.e.,
-	clusters of clusters are not supported).  The algorithm could be
-	extended by replicating the cluster-level states for the
-	additional topological levels, and modifying the transition
-	rules for the intermediate (non-outermost) cluster levels.
-
-
-Colophon
---------
-
-Originally created and documented by Dave Martin for Linaro Limited, in
-collaboration with Nicolas Pitre and Achin Gupta.
-
-Copyright (C) 2012-2013  Linaro Limited
-Distributed under the terms of Version 2 of the GNU General Public
-License, as defined in linux/COPYING.
diff --git a/Documentation/arm/firmware.rst b/Documentation/arm/firmware.rst
new file mode 100644
index 000000000000..efd844baec1d
--- /dev/null
+++ b/Documentation/arm/firmware.rst
@@ -0,0 +1,72 @@
+==========================================================================
+Interface for registering and calling firmware-specific operations for ARM
+==========================================================================
+
+Written by Tomasz Figa <t.figa@samsung.com>
+
+Some boards run secure firmware in the TrustZone secure world, which
+changes the way some things have to be initialized. This creates a need
+for an interface through which such platforms can specify the available
+firmware operations and call them when needed.
+
+Firmware operations can be specified by filling in a struct firmware_ops
+with appropriate callbacks and then registering it with the
+register_firmware_ops() function::
+
+	void register_firmware_ops(const struct firmware_ops *ops)
+
+The ops pointer must be non-NULL. More information about struct firmware_ops
+and its members can be found in the arch/arm/include/asm/firmware.h header.
+
+There is a default, empty set of operations provided, so there is no need to
+set anything if the platform does not require firmware operations.
+
+To call a firmware operation, a helper macro is provided::
+
+	#define call_firmware_op(op, ...)				\
+		((firmware_ops->op) ? firmware_ops->op(__VA_ARGS__) : (-ENOSYS))
+
+The macro checks if the operation is provided and calls it, or otherwise returns
+-ENOSYS to signal that the given operation is not available (for example, to allow
+fallback to a legacy operation).
+
+Example of registering firmware operations::
+
+	/* board file */
+
+	static int platformX_do_idle(void)
+	{
+		/* tell platformX firmware to enter idle */
+		return 0;
+	}
+
+	static int platformX_cpu_boot(int i)
+	{
+		/* tell platformX firmware to boot CPU i */
+		return 0;
+	}
+
+	static const struct firmware_ops platformX_firmware_ops = {
+		.do_idle        = platformX_do_idle,
+		.cpu_boot       = platformX_cpu_boot,
+		/* other operations not available on platformX */
+	};
+
+	/* init_early callback of machine descriptor */
+	static void __init board_init_early(void)
+	{
+		register_firmware_ops(&platformX_firmware_ops);
+	}
+
+Example of using a firmware operation::
+
+	/* some platform code, e.g. SMP initialization */
+
+	__raw_writel(__pa_symbol(exynos4_secondary_startup),
+		CPU1_BOOT_REG);
+
+	/* Call Exynos specific smc call */
+	if (call_firmware_op(cpu_boot, cpu) == -ENOSYS)
+		cpu_boot_legacy(...); /* Try legacy way */
+
+	gic_raise_softirq(cpumask_of(cpu), 1);
diff --git a/Documentation/arm/firmware.txt b/Documentation/arm/firmware.txt
deleted file mode 100644
index 7f175dbb427e..000000000000
--- a/Documentation/arm/firmware.txt
+++ /dev/null
@@ -1,70 +0,0 @@
-Interface for registering and calling firmware-specific operations for ARM.
-----
-Written by Tomasz Figa <t.figa@samsung.com>
-
-Some boards are running with secure firmware running in TrustZone secure
-world, which changes the way some things have to be initialized. This makes
-a need to provide an interface for such platforms to specify available firmware
-operations and call them when needed.
-
-Firmware operations can be specified by filling in a struct firmware_ops
-with appropriate callbacks and then registering it with register_firmware_ops()
-function.
-
-	void register_firmware_ops(const struct firmware_ops *ops)
-
-The ops pointer must be non-NULL. More information about struct firmware_ops
-and its members can be found in arch/arm/include/asm/firmware.h header.
-
-There is a default, empty set of operations provided, so there is no need to
-set anything if platform does not require firmware operations.
-
-To call a firmware operation, a helper macro is provided
-
-	#define call_firmware_op(op, ...)				\
-		((firmware_ops->op) ? firmware_ops->op(__VA_ARGS__) : (-ENOSYS))
-
-the macro checks if the operation is provided and calls it or otherwise returns
--ENOSYS to signal that given operation is not available (for example, to allow
-fallback to legacy operation).
-
-Example of registering firmware operations:
-
-	/* board file */
-
-	static int platformX_do_idle(void)
-	{
-		/* tell platformX firmware to enter idle */
-		return 0;
-	}
-
-	static int platformX_cpu_boot(int i)
-	{
-		/* tell platformX firmware to boot CPU i */
-		return 0;
-	}
-
-	static const struct firmware_ops platformX_firmware_ops = {
-		.do_idle        = exynos_do_idle,
-		.cpu_boot       = exynos_cpu_boot,
-		/* other operations not available on platformX */
-	};
-
-	/* init_early callback of machine descriptor */
-	static void __init board_init_early(void)
-	{
-		register_firmware_ops(&platformX_firmware_ops);
-	}
-
-Example of using a firmware operation:
-
-	/* some platform code, e.g. SMP initialization */
-
-	__raw_writel(__pa_symbol(exynos4_secondary_startup),
-		CPU1_BOOT_REG);
-
-	/* Call Exynos specific smc call */
-	if (call_firmware_op(cpu_boot, cpu) == -ENOSYS)
-		cpu_boot_legacy(...); /* Try legacy way */
-
-	gic_raise_softirq(cpumask_of(cpu), 1);
diff --git a/Documentation/arm/index.rst b/Documentation/arm/index.rst
new file mode 100644
index 000000000000..bd316d1a1802
--- /dev/null
+++ b/Documentation/arm/index.rst
@@ -0,0 +1,80 @@
+:orphan:
+
+================
+ARM Architecture
+================
+
+.. toctree::
+   :maxdepth: 1
+
+   arm
+   booting
+   cluster-pm-race-avoidance
+   firmware
+   interrupts
+   kernel_mode_neon
+   kernel_user_helpers
+   memory
+   mem_alignment
+   tcm
+   setup
+   swp_emulation
+   uefi
+   vlocks
+   porting
+
+SoC-specific documents
+======================
+
+.. toctree::
+   :maxdepth: 1
+
+   ixp4xx
+
+   marvel
+   microchip
+
+   netwinder
+   nwfpe/index
+
+   keystone/overview
+   keystone/knav-qmss
+
+   omap/index
+
+   pxa/mfp
+
+
+   sa1100/index
+
+   stm32/stm32f746-overview
+   stm32/overview
+   stm32/stm32h743-overview
+   stm32/stm32f769-overview
+   stm32/stm32f429-overview
+   stm32/stm32mp157-overview
+
+   sunxi
+
+   samsung/index
+   samsung-s3c24xx/index
+
+   sunxi/clocks
+
+   spear/overview
+
+   sti/stih416-overview
+   sti/stih407-overview
+   sti/stih418-overview
+   sti/overview
+   sti/stih415-overview
+
+   vfp/release-notes
+
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/arm/interrupts.rst b/Documentation/arm/interrupts.rst
new file mode 100644
index 000000000000..2ae70e0e9732
--- /dev/null
+++ b/Documentation/arm/interrupts.rst
@@ -0,0 +1,169 @@
+==========
+Interrupts
+==========
+
+2.5.2-rmk5:
+  This is the first kernel that contains a major shake up of some of the
+  major architecture-specific subsystems.
+
+Firstly, it contains some pretty major changes to the way we handle the
+MMU TLB.  Each MMU TLB variant is now handled completely separately -
+we have TLB v3, TLB v4 (without write buffer), TLB v4 (with write buffer),
+and finally TLB v4 (with write buffer, with I TLB invalidate entry).
+There is more assembly code inside each of these functions, mainly to
+allow more flexible TLB handling for the future.
+
+Secondly, the IRQ subsystem.
+
+The 2.5 kernels will be having major changes to the way IRQs are handled.
+Unfortunately, this means that machine types that touch the irq_desc[]
+array (basically all machine types) will break, and this means every
+machine type that we currently have.
+
+Let's take an example.  On the Assabet with Neponset, we have::
+
+                  GPIO25                 IRR:2
+        SA1100 ------------> Neponset -----------> SA1111
+                                         IIR:1
+                                      -----------> USAR
+                                         IIR:0
+                                      -----------> SMC9196
+
+The way stuff currently works, all SA1111 interrupts are mutually
+exclusive of each other - if you're processing one interrupt from the
+SA1111 and another comes in, you have to wait for that interrupt to
+finish processing before you can service the new interrupt.  Eg, an
+IDE PIO-based interrupt on the SA1111 excludes all other SA1111 and
+SMC9196 interrupts until it has finished transferring its multi-sector
+data, which can be a long time.  Note also that since we loop in the
+SA1111 IRQ handler, SA1111 IRQs can hold off SMC9196 IRQs indefinitely.
+
+
+The new approach brings several new ideas...
+
+We introduce the concept of a "parent" and a "child".  For example,
+to the Neponset handler, the "parent" is GPIO25, and the "children"
+are SA1111, SMC9196 and USAR.
+
+We also bring the idea of an IRQ "chip" (mainly to reduce the size of
+the irqdesc array).  This doesn't have to be a real "IC"; indeed the
+SA11x0 IRQs are handled by two separate "chip" structures, one for
+GPIO0-10, and another for all the rest.  It is just a container for
+the various operations (maybe this'll change to a better name).
+This structure has the following operations::
+
+  struct irqchip {
+          /*
+           * Acknowledge the IRQ.
+           * If this is a level-based IRQ, then it is expected to mask the IRQ
+           * as well.
+           */
+          void (*ack)(unsigned int irq);
+          /*
+           * Mask the IRQ in hardware.
+           */
+          void (*mask)(unsigned int irq);
+          /*
+           * Unmask the IRQ in hardware.
+           */
+          void (*unmask)(unsigned int irq);
+          /*
+           * Re-run the IRQ
+           */
+          void (*rerun)(unsigned int irq);
+          /*
+           * Set the type of the IRQ.
+           */
+          int (*type)(unsigned int irq, unsigned int type);
+  };
+
+ack
+       - required.  May be the same function as mask for IRQs
+         handled by do_level_IRQ.
+mask
+       - required.
+unmask
+       - required.
+rerun
+       - optional.  Not required if you're using do_level_IRQ for all
+         IRQs that use this 'irqchip'.  Generally expected to re-trigger
+         the hardware IRQ if possible.  If not, may call the handler
+	 directly.
+type
+       - optional.  If you don't support changing the type of an IRQ,
+         it should be null so people can detect if they are unable to
+         set the IRQ type.
+
+For each IRQ, we keep the following information:
+
+        - "disable" depth (number of disable_irq()s without enable_irq()s)
+        - flags indicating what we can do with this IRQ (valid, probe,
+          noautounmask) as before
+        - status of the IRQ (probing, enable, etc)
+        - chip
+        - per-IRQ handler
+        - irqaction structure list
+
+The handler can be one of the 3 standard handlers - "level", "edge" and
+"simple", or your own specific handler if you need to do something special.
+
+The "level" handler is what we currently have - it's pretty simple.
+"edge" knows about the brokenness of such IRQ implementations - that you
+need to leave the hardware IRQ enabled while processing it, and queue
+further IRQ events should the IRQ happen again while processing.  The
+"simple" handler is very basic, and does not perform any hardware
+manipulation, nor state tracking.  This is useful for things like the
+SMC9196 and USAR above.
+
+So, what's changed?
+===================
+
+1. Machine implementations must not write to the irqdesc array.
+
+2. New functions to manipulate the irqdesc array.  The first 4 are expected
+   to be useful only to machine specific code.  The last is recommended to
+   only be used by machine specific code, but may be used in drivers if
+   absolutely necessary.
+
+        set_irq_chip(irq,chip)
+                Set the mask/unmask methods for handling this IRQ
+
+        set_irq_handler(irq,handler)
+                Set the handler for this IRQ (level, edge, simple)
+
+        set_irq_chained_handler(irq,handler)
+                Set a "chained" handler for this IRQ - automatically
+                enables this IRQ (eg, Neponset and SA1111 handlers).
+
+        set_irq_flags(irq,flags)
+                Set the valid/probe/noautoenable flags.
+
+        set_irq_type(irq,type)
+                Set the active IRQ edge(s)/level.  This replaces the
+                SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
+                function.  Type should be one of IRQ_TYPE_xxx defined in
+		<linux/irq.h>
+
+3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.
+
+4. Direct access to SA1111 INTPOL is deprecated.  Use set_irq_type instead.
+
+5. A handler is expected to perform any necessary acknowledgement of the
+   parent IRQ via the correct chip specific function.  For instance, if
+   the SA1111 is directly connected to a SA1110 GPIO, then you should
+   acknowledge the SA1110 IRQ each time you re-read the SA1111 IRQ status.
+
+6. For any child which doesn't have its own IRQ enable/disable controls
+   (eg, SMC9196), the handler must mask or acknowledge the parent IRQ
+   while the child handler is called, and the child handler should be the
+   "simple" handler (not "edge" nor "level").  After the handler completes,
+   the parent IRQ should be unmasked, and the status of all children must
+   be re-checked for pending events.  (see the Neponset IRQ handler for
+   details).
+
+7. fixup_irq() is gone, as is `arch/arm/mach-*/include/mach/irq.h`
+
+Please note that this will not solve all problems - some of them are
+hardware based.  Mixing level-based and edge-based IRQs on the same
+parent signal (eg, Neponset) is one such area where a software based
+solution can't provide the full answer to low IRQ latency.
diff --git a/Documentation/arm/ixp4xx.rst b/Documentation/arm/ixp4xx.rst
new file mode 100644
index 000000000000..a57235616294
--- /dev/null
+++ b/Documentation/arm/ixp4xx.rst
@@ -0,0 +1,173 @@
+===========================================================
+Release Notes for Linux on Intel's IXP4xx Network Processor
+===========================================================
+
+Maintained by Deepak Saxena <dsaxena@plexity.net>
+-------------------------------------------------------------------------
+
+1. Overview
+
+Intel's IXP4xx network processor is a highly integrated SOC that
+is targeted for network applications, though it has become popular
+in industrial control and other areas due to low cost and power
+consumption. The IXP4xx family currently consists of several processors
+that support different network offload functions such as encryption,
+routing, firewalling, etc. The IXP46x family is an updated version which
+supports faster speeds, new memory and flash configurations, and more
+integration such as an on-chip I2C controller.
+
+For more information on the various versions of the CPU, see:
+
+   http://developer.intel.com/design/network/products/npfamily/ixp4xx.htm
+
+Intel also made the IXCP1100 CPU for some time, which is an IXP4xx
+stripped of much of the network intelligence.
+
+2. Linux Support
+
+Linux currently supports the following features on the IXP4xx chips:
+
+- Dual serial ports
+- PCI interface
+- Flash access (MTD/JFFS)
+- I2C through GPIO on IXP42x
+- GPIO for input/output/interrupts
+  See arch/arm/mach-ixp4xx/include/mach/platform.h for access functions.
+- Timers (watchdog, OS)
+
+The following components of the chips are not supported by Linux and
+require the use of Intel's proprietary CSR software:
+
+- USB device interface
+- Network interfaces (HSS, Utopia, NPEs, etc)
+- Network offload functionality
+
+If you need to use any of the above, you need to download Intel's
+software from:
+
+   http://developer.intel.com/design/network/products/npfamily/ixp425.htm
+
+DO NOT POST QUESTIONS TO THE LINUX MAILING LISTS REGARDING THE PROPRIETARY
+SOFTWARE.
+
+There are several websites that provide directions/pointers on using
+Intel's software:
+
+   - http://sourceforge.net/projects/ixp4xx-osdg/
+     Open Source Developer's Guide for using uClinux and the Intel libraries
+
+   - http://gatewaymaker.sourceforge.net/
+     Simple one page summary of building a gateway using an IXP425 and Linux
+
+   - http://ixp425.sourceforge.net/
+     ATM device driver for IXP425 that relies on Intel's libraries
+
+3. Known Issues/Limitations
+
+3a. Limited inbound PCI window
+
+The IXP4xx family allows for up to 256MB of memory but the PCI interface
+can only expose 64MB of that memory to the PCI bus. This means that if
+you are running with > 64MB, all PCI buffers outside of the accessible
+range will be bounced using the routines in arch/arm/common/dmabounce.c.
+
+3b. Limited outbound PCI window
+
+IXP4xx provides two methods of accessing PCI memory space:
+
+1) A direct mapped window from 0x48000000 to 0x4bffffff (64MB).
+   To access PCI via this space, we simply ioremap() the BAR
+   into the kernel and we can use the standard read[bwl]/write[bwl]
+   macros. This is the preferred method due to speed but it
+   limits the system to just 64MB of PCI memory. This can be
+   problematic if using video cards and other memory-heavy devices.
+
+2) If > 64MB of memory space is required, the IXP4xx can be
+   configured to use indirect registers to access PCI. This allows
+   for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus.
+   The disadvantage of this is that every PCI access requires
+   three local register accesses plus a spinlock, but in some
+   cases the performance hit is acceptable. In addition, you cannot
+   mmap() PCI devices in this case due to the indirect nature
+   of the PCI window.
+
+By default, the direct method is used for performance reasons. If
+you need more PCI memory, enable the IXP4XX_INDIRECT_PCI config option.
+
+3c. GPIO as Interrupts
+
+Currently the code only handles level-sensitive GPIO interrupts.
+
+4. Supported platforms
+
+ADI Engineering Coyote Gateway Reference Platform
+http://www.adiengineering.com/productsCoyote.html
+
+   The ADI Coyote platform is a reference design for those building
+   small residential/office gateways. One NPE is connected to a 10/100
+   interface, one to a 4-port 10/100 switch, and the third to an ADSL
+   interface. In addition, it also supports two POTS interfaces connected
+   via SLICs. Note that those are not supported by Linux ATM. The
+   platform also has two mini-PCI slots used for 802.11[bga] cards.
+   Finally, there is an IDE port hanging off the expansion bus.
+
+Gateworks Avila Network Platform
+http://www.gateworks.com/support/overview.php
+
+   The Avila platform is basically an IXDP425 with the 4 PCI slots
+   replaced with mini-PCI slots and a CF IDE interface hanging off
+   the expansion bus.
+
+Intel IXDP425 Development Platform
+http://www.intel.com/design/network/products/npfamily/ixdpg425.htm
+
+   This is Intel's standard reference platform for the IXP425 and is
+   also known as the Richfield board. It contains 4 PCI slots, 16MB
+   of flash, two 10/100 ports and one ADSL port.
+
+Intel IXDP465 Development Platform
+http://www.intel.com/design/network/products/npfamily/ixdp465.htm
+
+   This is basically an IXDP425 with an IXP465 and 32M of flash instead
+   of just 16.
+
+Intel IXDPG425 Development Platform
+
+   This is basically an ADI Coyote board with a NEC EHCI controller
+   added. One issue with this board is that the mini-PCI slots only
+   have the 3.3v line connected, so you can't use a PCI to mini-PCI
+   adapter with an E100 card. So to NFS root you need to use either
+   the CSR or a WiFi card and a ramdisk that BOOTPs and then does
+   a pivot_root to NFS.
+
+Motorola PrPMC1100 Processor Mezzanine Card
+http://www.fountainsys.com
+
+   The PrPMC1100 is based on the IXCP1100 and is meant to plug into
+   an IXP2400/2800 system to act as the system controller. It simply
+   contains a CPU and 16MB of flash on the board and needs to be
+   plugged into a carrier board to function. Currently Linux only
+   supports the Motorola PrPMC carrier board for this platform.
+
+5. TODO LIST
+
+- Add support for Coyote IDE
+- Add support for edge-based GPIO interrupts
+- Add support for CF IDE on expansion bus
+
+6. Thanks
+
+The IXP4xx work has been funded by Intel Corp. and MontaVista Software, Inc.
+
+The following people have contributed patches/comments/etc:
+
+- Lennerty Buytenhek
+- Lutz Jaenicke
+- Justin Mayfield
+- Robert E. Ranslam
+
+[I know I've forgotten others, please email me to be added]
+
+-------------------------------------------------------------------------
+
+Last Update: 01/04/2005
diff --git a/Documentation/arm/kernel_mode_neon.rst b/Documentation/arm/kernel_mode_neon.rst
new file mode 100644
index 000000000000..9bfb71a2a9b9
--- /dev/null
+++ b/Documentation/arm/kernel_mode_neon.rst
@@ -0,0 +1,124 @@
+================
+Kernel mode NEON
+================
+
+TL;DR summary
+-------------
+* Use only NEON instructions, or VFP instructions that don't rely on support
+  code
+* Isolate your NEON code in a separate compilation unit, and compile it with
+  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
+* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
+  NEON code
+* Don't sleep in your NEON code, and be aware that it will be executed with
+  preemption disabled
+
+
+Introduction
+------------
+It is possible to use NEON instructions (and in some cases, VFP instructions) in
+code that runs in kernel mode. However, for performance reasons, the NEON/VFP
+register file is not preserved and restored at every context switch or taken
+exception like the normal register file is, so some manual intervention is
+required. Furthermore, special care is required for code that may sleep [i.e.,
+may call schedule()], as NEON or VFP instructions will be executed in a
+non-preemptible section for reasons outlined below.
+
+
+Lazy preserve and restore
+-------------------------
+The NEON/VFP register file is managed using lazy preserve (on UP systems) and
+lazy restore (on both SMP and UP systems). This means that the register file is
+kept 'live', and is only preserved and restored when multiple tasks are
+contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
+another core). Lazy restore is implemented by disabling the NEON/VFP unit after
+every context switch, resulting in a trap when subsequently a NEON/VFP
+instruction is issued, allowing the kernel to step in and perform the restore if
+necessary.
+
+Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
+it is required to do an 'eager' preserve of the NEON/VFP register file, and
+enable the NEON/VFP unit explicitly so no exceptions are generated on first
+subsequent use. This is handled by the function kernel_neon_begin(), which
+should be called before any kernel mode NEON or VFP instructions are issued.
+Likewise, the NEON/VFP unit should be disabled again after use to make sure user
+mode will hit the lazy restore trap upon next use. This is handled by the
+function kernel_neon_end().
+
+
+Interruptions in kernel mode
+----------------------------
+For reasons of performance and simplicity, there is no preserve/restore
+mechanism for the kernel mode NEON/VFP register contents. Interruptions of a
+kernel mode NEON section are therefore only allowed if they are guaranteed not
+to touch the NEON/VFP registers, so the following rules apply in the kernel:
+
+* NEON/VFP code is not allowed in interrupt context;
+* NEON/VFP code is not allowed to sleep;
+* NEON/VFP code is executed with preemption disabled.
+
+If latency is a concern, it is possible to put back-to-back calls to
+kernel_neon_end() and kernel_neon_begin() in places in your code where none of
+the NEON registers are live. (Additional calls to kernel_neon_begin() should be
+reasonably cheap if no context switch occurred in the meantime.)
+
+
+VFP and support code
+--------------------
+Earlier versions of VFP (prior to version 3) rely on software support for things
+like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
+software assistance, it signals the kernel by raising an undefined instruction
+exception. The kernel responds by inspecting the VFP control registers and the
+current instruction and arguments, and emulates the instruction in software.
+
+Such software assistance is currently not implemented for VFP instructions
+executed in kernel mode. If such a condition is encountered, the kernel will
+fail and generate an OOPS.
+
+
+Separating NEON code from ordinary code
+---------------------------------------
+The compiler is not aware of the special significance of kernel_neon_begin() and
+kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
+between calls to these respective functions. Furthermore, GCC may generate NEON
+instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
+kernel is currently compiled at -O2, future changes may result in NEON/VFP
+instructions appearing in unexpected places if no special care is taken.
+
+Therefore, the recommended and only supported way of using NEON/VFP in the
+kernel is by adhering to the following rules:
+
+* isolate the NEON code in a separate compilation unit and compile it with
+  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
+* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
+  into the unit containing the NEON code from a compilation unit which is *not*
+  built with the GCC flag '-mfpu=neon' set.
+
+As the kernel is compiled with '-msoft-float', the above will guarantee that
+both NEON and VFP instructions will only ever appear in designated compilation
+units at any optimization level.
+
+
+NEON assembler
+--------------
+NEON assembler is supported with no additional caveats as long as the rules
+above are followed.
+
+
+NEON code generated by GCC
+--------------------------
+The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
+parallelism, and generates NEON code from ordinary C source code. This is fully
+supported as long as the rules above are followed.
+
+
+NEON intrinsics
+---------------
+NEON intrinsics are also supported. However, as code using NEON intrinsics
+relies on the GCC header <arm_neon.h> (which #includes <stdint.h>), you should
+observe the following in addition to the rules above:
+
+* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
+  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
+  does not supply);
+* Include <arm_neon.h> last, or at least after <linux/types.h>
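The latency advice in "Interruptions in kernel mode" can be tried out in user space with a rough sketch like the one below. The two functions named kernel_neon_begin() and kernel_neon_end() here are stand-in stubs that only track state, not the kernel implementations, and xor_block_neon(), xor_buffer() and CHUNK_BYTES are hypothetical names chosen for illustration. In a real kernel build the inner routine would live in its own compilation unit built with the NEON flags, and the caller would not.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins for the kernel functions: they only track state. */
static int neon_active;		/* 1 while inside a NEON section */
static int neon_sections;	/* number of sections opened so far */

static void kernel_neon_begin(void)
{
	assert(!neon_active);	/* this sketch never nests sections */
	neon_active = 1;
	neon_sections++;
}

static void kernel_neon_end(void)
{
	assert(neon_active);
	neon_active = 0;
}

/* Stand-in for a routine that would live in the separate NEON unit. */
static void xor_block_neon(unsigned char *dst, const unsigned char *src,
			   size_t len)
{
	assert(neon_active);	/* must only run inside a NEON section */
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Hypothetical chunk size: work done between preemption points. */
#define CHUNK_BYTES ((size_t)4096)

static void xor_buffer(unsigned char *dst, const unsigned char *src,
		       size_t len)
{
	kernel_neon_begin();
	while (len > 0) {
		size_t n = len < CHUNK_BYTES ? len : CHUNK_BYTES;

		xor_block_neon(dst, src, n);
		dst += n;
		src += n;
		len -= n;

		if (len > 0) {
			/*
			 * No NEON registers are live between chunks, so
			 * close and reopen the section to bound latency.
			 */
			kernel_neon_end();
			kernel_neon_begin();
		}
	}
	kernel_neon_end();
}
```

Calling xor_buffer() on a 10000-byte buffer opens three sections (4096 + 4096 + 1808 bytes) and leaves the stub state balanced. The chunk size trades throughput against the longest stretch that runs with preemption disabled.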
diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt
deleted file mode 100644
index b9e060c5b61e..000000000000
--- a/Documentation/arm/kernel_mode_neon.txt
+++ /dev/null
@@ -1,121 +0,0 @@
-Kernel mode NEON
-================
-
-TL;DR summary
--------------
-* Use only NEON instructions, or VFP instructions that don't rely on support
-  code
-* Isolate your NEON code in a separate compilation unit, and compile it with
-  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp'
-* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
-  NEON code
-* Don't sleep in your NEON code, and be aware that it will be executed with
-  preemption disabled
-
-
-Introduction
-------------
-It is possible to use NEON instructions (and in some cases, VFP instructions) in
-code that runs in kernel mode. However, for performance reasons, the NEON/VFP
-register file is not preserved and restored at every context switch or taken
-exception like the normal register file is, so some manual intervention is
-required. Furthermore, special care is required for code that may sleep [i.e.,
-may call schedule()], as NEON or VFP instructions will be executed in a
-non-preemptible section for reasons outlined below.
-
-
-Lazy preserve and restore
--------------------------
-The NEON/VFP register file is managed using lazy preserve (on UP systems) and
-lazy restore (on both SMP and UP systems). This means that the register file is
-kept 'live', and is only preserved and restored when multiple tasks are
-contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
-another core). Lazy restore is implemented by disabling the NEON/VFP unit after
-every context switch, resulting in a trap when subsequently a NEON/VFP
-instruction is issued, allowing the kernel to step in and perform the restore if
-necessary.
-
-Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
-it is required to do an 'eager' preserve of the NEON/VFP register file, and
-enable the NEON/VFP unit explicitly so no exceptions are generated on first
-subsequent use. This is handled by the function kernel_neon_begin(), which
-should be called before any kernel mode NEON or VFP instructions are issued.
-Likewise, the NEON/VFP unit should be disabled again after use to make sure user
-mode will hit the lazy restore trap upon next use. This is handled by the
-function kernel_neon_end().
-
-
-Interruptions in kernel mode
-----------------------------
-For reasons of performance and simplicity, it was decided that there shall be no
-preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
-implies that interruptions of a kernel mode NEON section can only be allowed if
-they are guaranteed not to touch the NEON/VFP registers. For this reason, the
-following rules and restrictions apply in the kernel:
-* NEON/VFP code is not allowed in interrupt context;
-* NEON/VFP code is not allowed to sleep;
-* NEON/VFP code is executed with preemption disabled.
-
-If latency is a concern, it is possible to put back to back calls to
-kernel_neon_end() and kernel_neon_begin() in places in your code where none of
-the NEON registers are live. (Additional calls to kernel_neon_begin() should be
-reasonably cheap if no context switch occurred in the meantime)
-
-
-VFP and support code
---------------------
-Earlier versions of VFP (prior to version 3) rely on software support for things
-like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
-software assistance, it signals the kernel by raising an undefined instruction
-exception. The kernel responds by inspecting the VFP control registers and the
-current instruction and arguments, and emulates the instruction in software.
-
-Such software assistance is currently not implemented for VFP instructions
-executed in kernel mode. If such a condition is encountered, the kernel will
-fail and generate an OOPS.
-
-
-Separating NEON code from ordinary code
----------------------------------------
-The compiler is not aware of the special significance of kernel_neon_begin() and
-kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
-between calls to these respective functions. Furthermore, GCC may generate NEON
-instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
-kernel is currently compiled at -O2, future changes may result in NEON/VFP
-instructions appearing in unexpected places if no special care is taken.
-
-Therefore, the recommended and only supported way of using NEON/VFP in the
-kernel is by adhering to the following rules:
-* isolate the NEON code in a separate compilation unit and compile it with
-  '-march=armv7-a -mfpu=neon -mfloat-abi=softfp';
-* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
-  into the unit containing the NEON code from a compilation unit which is *not*
-  built with the GCC flag '-mfpu=neon' set.
-
-As the kernel is compiled with '-msoft-float', the above will guarantee that
-both NEON and VFP instructions will only ever appear in designated compilation
-units at any optimization level.
-
-
-NEON assembler
---------------
-NEON assembler is supported with no additional caveats as long as the rules
-above are followed.
-
-
-NEON code generated by GCC
---------------------------
-The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
-parallelism, and generates NEON code from ordinary C source code. This is fully
-supported as long as the rules above are followed.
-
-
-NEON intrinsics
----------------
-NEON intrinsics are also supported. However, as code using NEON intrinsics
-relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
-observe the following in addition to the rules above:
-* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
-  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
-  does not supply);
-* Include <arm_neon.h> last, or at least after <linux/types.h>
diff --git a/Documentation/arm/kernel_user_helpers.rst b/Documentation/arm/kernel_user_helpers.rst
new file mode 100644
index 000000000000..eb6f3d916622
--- /dev/null
+++ b/Documentation/arm/kernel_user_helpers.rst
@@ -0,0 +1,268 @@
+============================
+Kernel-provided User Helpers
+============================
+
+These are segments of kernel-provided user code reachable from user space
+at a fixed address in kernel memory.  They provide user space with some
+operations which require kernel help because of unimplemented native
+features and/or instructions in many ARM CPUs. The idea is for this code
+to be executed directly in user mode for best efficiency, while being too
+intimate with its kernel counterpart to be left to user libraries.
+In fact this code might even differ from one CPU to another depending on
+the available instruction set, or on whether it is an SMP system. In other
+words, the kernel reserves the right to change this code as needed without
+warning. Only the entry points and their results as documented here are
+guaranteed to be stable.
+
+This is different from (but doesn't preclude) a full-blown VDSO
+implementation. However, a VDSO would prevent some assembly tricks with
+constants that allow for efficient branching to those code segments. And
+since those code segments only use a few cycles before returning to user
+code, a VDSO indirect far call would add measurable overhead to such
+minimalistic operations.
+
+User space is expected to bypass those helpers and implement those things
+inline (either in the code emitted directly by the compiler, or part of
+the implementation of a library call) when optimizing for a recent enough
+processor that has the necessary native support, but only if the resulting
+binaries are already going to be incompatible with earlier ARM processors
+due to usage of similar native instructions for other things.  In other
+words, don't make binaries unable to run on earlier processors just for
+the sake of not using these kernel helpers if your compiled code is not
+going to use new instructions for other purposes.
+
+New helpers may be added over time, so an older kernel may be missing some
+helpers present in a newer kernel.  For this reason, programs must check
+the value of __kuser_helper_version (see below) before assuming that it is
+safe to call any particular helper.  This check should ideally be
+performed only once at process startup time, and execution aborted early
+if the required helpers are not provided by the kernel version that the
+process is running on.
+
+kuser_helper_version
+--------------------
+
+Location:	0xffff0ffc
+
+Reference declaration::
+
+  extern int32_t __kuser_helper_version;
+
+Definition:
+
+  This field contains the number of helpers being implemented by the
+  running kernel.  User space may read this to determine the availability
+  of a particular helper.
+
+Usage example::
+
+  #define __kuser_helper_version (*(int32_t *)0xffff0ffc)
+
+  void check_kuser_version(void)
+  {
+	if (__kuser_helper_version < 2) {
+		fprintf(stderr, "can't do atomic operations, kernel too old\n");
+		abort();
+	}
+  }
+
+Notes:
+
+  User space may assume that the value of this field never changes
+  during the lifetime of any single process.  This means that this
+  field can be read once during the initialisation of a library or
+  startup phase of a program.
+
+kuser_get_tls
+-------------
+
+Location:	0xffff0fe0
+
+Reference prototype::
+
+  void * __kuser_get_tls(void);
+
+Input:
+
+  lr = return address
+
+Output:
+
+  r0 = TLS value
+
+Clobbered registers:
+
+  none
+
+Definition:
+
+  Get the TLS value as previously set via the __ARM_NR_set_tls syscall.
+
+Usage example::
+
+  typedef void * (__kuser_get_tls_t)(void);
+  #define __kuser_get_tls (*(__kuser_get_tls_t *)0xffff0fe0)
+
+  void foo()
+  {
+	void *tls = __kuser_get_tls();
+	printf("TLS = %p\n", tls);
+  }
+
+Notes:
+
+  - Valid only if __kuser_helper_version >= 1 (from kernel version 2.6.12).
+
+kuser_cmpxchg
+-------------
+
+Location:	0xffff0fc0
+
+Reference prototype::
+
+  int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);
+
+Input:
+
+  r0 = oldval
+  r1 = newval
+  r2 = ptr
+  lr = return address
+
+Output:
+
+  r0 = success code (zero or non-zero)
+  C flag = set if r0 == 0, clear if r0 != 0
+
+Clobbered registers:
+
+  r3, ip, flags
+
+Definition:
+
+  Atomically store newval in `*ptr` only if `*ptr` is equal to oldval.
+  Return zero if `*ptr` was changed or non-zero if no exchange happened.
+  The C flag is also set if `*ptr` was changed to allow for assembly
+  optimization in the calling code.
+
+Usage example::
+
+  typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
+  #define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)
+
+  int atomic_add(volatile int *ptr, int val)
+  {
+	int old, new;
+
+	do {
+		old = *ptr;
+		new = old + val;
+	} while(__kuser_cmpxchg(old, new, ptr));
+
+	return new;
+  }
+
+Notes:
+
+  - This routine already includes memory barriers as needed.
+
+  - Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).
+
+kuser_memory_barrier
+--------------------
+
+Location:	0xffff0fa0
+
+Reference prototype::
+
+  void __kuser_memory_barrier(void);
+
+Input:
+
+  lr = return address
+
+Output:
+
+  none
+
+Clobbered registers:
+
+  none
+
+Definition:
+
+  Apply any needed memory barrier to preserve consistency with data modified
+  manually and __kuser_cmpxchg usage.
+
+Usage example::
+
+  typedef void (__kuser_dmb_t)(void);
+  #define __kuser_dmb (*(__kuser_dmb_t *)0xffff0fa0)
+
+Notes:
+
+  - Valid only if __kuser_helper_version >= 3 (from kernel version 2.6.15).
+
+kuser_cmpxchg64
+---------------
+
+Location:	0xffff0f60
+
+Reference prototype::
+
+  int __kuser_cmpxchg64(const int64_t *oldval,
+                        const int64_t *newval,
+                        volatile int64_t *ptr);
+
+Input:
+
+  r0 = pointer to oldval
+  r1 = pointer to newval
+  r2 = pointer to target value
+  lr = return address
+
+Output:
+
+  r0 = success code (zero or non-zero)
+  C flag = set if r0 == 0, clear if r0 != 0
+
+Clobbered registers:
+
+  r3, lr, flags
+
+Definition:
+
+  Atomically store the 64-bit value pointed to by `newval` in `*ptr` only if
+  `*ptr` is equal to the 64-bit value pointed to by `oldval`.  Return zero if
+  `*ptr` was changed or non-zero if no exchange happened.
+
+  The C flag is also set if `*ptr` was changed to allow for assembly
+  optimization in the calling code.
+
+Usage example::
+
+  typedef int (__kuser_cmpxchg64_t)(const int64_t *oldval,
+                                    const int64_t *newval,
+                                    volatile int64_t *ptr);
+  #define __kuser_cmpxchg64 (*(__kuser_cmpxchg64_t *)0xffff0f60)
+
+  int64_t atomic_add64(volatile int64_t *ptr, int64_t val)
+  {
+	int64_t old, new;
+
+	do {
+		old = *ptr;
+		new = old + val;
+	} while(__kuser_cmpxchg64(&old, &new, ptr));
+
+	return new;
+  }
+
+Notes:
+
+  - This routine already includes memory barriers as needed.
+
+  - Due to the length of this sequence, this spans 2 conventional kuser
+    "slots", therefore 0xffff0f80 is not used as a valid entry point.
+
+  - Valid only if __kuser_helper_version >= 5 (from kernel version 3.1).
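The per-helper version requirements from the Notes sections above can be collected into one startup-time check. The sketch below is only an illustration: enum kuser_helper, kuser_min_version and kuser_helper_available() are hypothetical names, not part of any kernel or libc interface. The version is passed in as a parameter instead of being read from 0xffff0ffc, so the logic can be exercised on any machine. On ARM Linux the caller would pass the value of __kuser_helper_version, read once at startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Helpers documented above, ordered by minimum required version. */
enum kuser_helper {
	KUSER_GET_TLS,
	KUSER_CMPXCHG,
	KUSER_MEMORY_BARRIER,
	KUSER_CMPXCHG64,
	KUSER_HELPER_COUNT
};

/* Minimum __kuser_helper_version per helper, taken from the Notes sections. */
static const int32_t kuser_min_version[KUSER_HELPER_COUNT] = {
	[KUSER_GET_TLS]        = 1,
	[KUSER_CMPXCHG]        = 2,
	[KUSER_MEMORY_BARRIER] = 3,
	[KUSER_CMPXCHG64]      = 5,
};

/*
 * Return true if a kernel reporting 'version' provides 'helper'.
 * Since the field does not change during the lifetime of a process,
 * the result can be cached after the first call.
 */
static bool kuser_helper_available(int32_t version, enum kuser_helper helper)
{
	assert((unsigned int)helper < KUSER_HELPER_COUNT);
	return version >= kuser_min_version[helper];
}
```

A program that needs 64-bit atomics would call kuser_helper_available(version, KUSER_CMPXCHG64) once during initialisation and abort early, or fall back to another implementation, if it returns false.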
diff --git a/Documentation/arm/kernel_user_helpers.txt b/Documentation/arm/kernel_user_helpers.txt
deleted file mode 100644
index 5673594717cf..000000000000
--- a/Documentation/arm/kernel_user_helpers.txt
+++ /dev/null
@@ -1,267 +0,0 @@
-Kernel-provided User Helpers
-============================
-
-These are segment of kernel provided user code reachable from user space
-at a fixed address in kernel memory.  This is used to provide user space
-with some operations which require kernel help because of unimplemented
-native feature and/or instructions in many ARM CPUs. The idea is for this
-code to be executed directly in user mode for best efficiency but which is
-too intimate with the kernel counter part to be left to user libraries.
-In fact this code might even differ from one CPU to another depending on
-the available instruction set, or whether it is a SMP systems. In other
-words, the kernel reserves the right to change this code as needed without
-warning. Only the entry points and their results as documented here are
-guaranteed to be stable.
-
-This is different from (but doesn't preclude) a full blown VDSO
-implementation, however a VDSO would prevent some assembly tricks with
-constants that allows for efficient branching to those code segments. And
-since those code segments only use a few cycles before returning to user
-code, the overhead of a VDSO indirect far call would add a measurable
-overhead to such minimalistic operations.
-
-User space is expected to bypass those helpers and implement those things
-inline (either in the code emitted directly by the compiler, or part of
-the implementation of a library call) when optimizing for a recent enough
-processor that has the necessary native support, but only if resulting
-binaries are already to be incompatible with earlier ARM processors due to
-usage of similar native instructions for other things.  In other words
-don't make binaries unable to run on earlier processors just for the sake
-of not using these kernel helpers if your compiled code is not going to
-use new instructions for other purpose.
-
-New helpers may be added over time, so an older kernel may be missing some
-helpers present in a newer kernel.  For this reason, programs must check
-the value of __kuser_helper_version (see below) before assuming that it is
-safe to call any particular helper.  This check should ideally be
-performed only once at process startup time, and execution aborted early
-if the required helpers are not provided by the kernel version that
-process is running on.
-
-kuser_helper_version
---------------------
-
-Location:	0xffff0ffc
-
-Reference declaration:
-
-  extern int32_t __kuser_helper_version;
-
-Definition:
-
-  This field contains the number of helpers being implemented by the
-  running kernel.  User space may read this to determine the availability
-  of a particular helper.
-
-Usage example:
-
-#define __kuser_helper_version (*(int32_t *)0xffff0ffc)
-
-void check_kuser_version(void)
-{
-	if (__kuser_helper_version < 2) {
-		fprintf(stderr, "can't do atomic operations, kernel too old\n");
-		abort();
-	}
-}
-
-Notes:
-
-  User space may assume that the value of this field never changes
-  during the lifetime of any single process.  This means that this
-  field can be read once during the initialisation of a library or
-  startup phase of a program.
-
-kuser_get_tls
--------------
-
-Location:	0xffff0fe0
-
-Reference prototype:
-
-  void * __kuser_get_tls(void);
-
-Input:
-
-  lr = return address
-
-Output:
-
-  r0 = TLS value
-
-Clobbered registers:
-
-  none
-
-Definition:
-
-  Get the TLS value as previously set via the __ARM_NR_set_tls syscall.
-
-Usage example:
-
-typedef void * (__kuser_get_tls_t)(void);
-#define __kuser_get_tls (*(__kuser_get_tls_t *)0xffff0fe0)
-
-void foo()
-{
-	void *tls = __kuser_get_tls();
-	printf("TLS = %p\n", tls);
-}
-
-Notes:
-
-  - Valid only if __kuser_helper_version >= 1 (from kernel version 2.6.12).
-
-kuser_cmpxchg
--------------
-
-Location:	0xffff0fc0
-
-Reference prototype:
-
-  int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);
-
-Input:
-
-  r0 = oldval
-  r1 = newval
-  r2 = ptr
-  lr = return address
-
-Output:
-
-  r0 = success code (zero or non-zero)
-  C flag = set if r0 == 0, clear if r0 != 0
-
-Clobbered registers:
-
-  r3, ip, flags
-
-Definition:
-
-  Atomically store newval in *ptr only if *ptr is equal to oldval.
-  Return zero if *ptr was changed or non-zero if no exchange happened.
-  The C flag is also set if *ptr was changed to allow for assembly
-  optimization in the calling code.
-
-Usage example:
-
-typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
-#define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)
-
-int atomic_add(volatile int *ptr, int val)
-{
-	int old, new;
-
-	do {
-		old = *ptr;
-		new = old + val;
-	} while(__kuser_cmpxchg(old, new, ptr));
-
-	return new;
-}
-
-Notes:
-
-  - This routine already includes memory barriers as needed.
-
-  - Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).
-
-kuser_memory_barrier
---------------------
-
-Location:	0xffff0fa0
-
-Reference prototype:
-
-  void __kuser_memory_barrier(void);
-
-Input:
-
-  lr = return address
-
-Output:
-
-  none
-
-Clobbered registers:
-
-  none
-
-Definition:
-
-  Apply any needed memory barrier to preserve consistency with data modified
-  manually and __kuser_cmpxchg usage.
-
-Usage example:
-
-typedef void (__kuser_dmb_t)(void);
-#define __kuser_dmb (*(__kuser_dmb_t *)0xffff0fa0)
-
-Notes:
-
-  - Valid only if __kuser_helper_version >= 3 (from kernel version 2.6.15).
-
-kuser_cmpxchg64
----------------
-
-Location:	0xffff0f60
-
-Reference prototype:
-
-  int __kuser_cmpxchg64(const int64_t *oldval,
-                        const int64_t *newval,
-                        volatile int64_t *ptr);
-
-Input:
-
-  r0 = pointer to oldval
-  r1 = pointer to newval
-  r2 = pointer to target value
-  lr = return address
-
-Output:
-
-  r0 = success code (zero or non-zero)
-  C flag = set if r0 == 0, clear if r0 != 0
-
-Clobbered registers:
-
-  r3, lr, flags
-
-Definition:
-
-  Atomically store the 64-bit value pointed by *newval in *ptr only if *ptr
-  is equal to the 64-bit value pointed by *oldval.  Return zero if *ptr was
-  changed or non-zero if no exchange happened.
-
-  The C flag is also set if *ptr was changed to allow for assembly
-  optimization in the calling code.
-
-Usage example:
-
-typedef int (__kuser_cmpxchg64_t)(const int64_t *oldval,
-                                  const int64_t *newval,
-                                  volatile int64_t *ptr);
-#define __kuser_cmpxchg64 (*(__kuser_cmpxchg64_t *)0xffff0f60)
-
-int64_t atomic_add64(volatile int64_t *ptr, int64_t val)
-{
-	int64_t old, new;
-
-	do {
-		old = *ptr;
-		new = old + val;
-	} while(__kuser_cmpxchg64(&old, &new, ptr));
-
-	return new;
-}
-
-Notes:
-
-  - This routine already includes memory barriers as needed.
-
-  - Due to the length of this sequence, this spans 2 conventional kuser
-    "slots", therefore 0xffff0f80 is not used as a valid entry point.
-
-  - Valid only if __kuser_helper_version >= 5 (from kernel version 3.1).
diff --git a/Documentation/arm/keystone/Overview.txt b/Documentation/arm/keystone/Overview.txt
deleted file mode 100644
index 400c0c270d2e..000000000000
--- a/Documentation/arm/keystone/Overview.txt
+++ /dev/null
@@ -1,55 +0,0 @@
-		TI Keystone Linux Overview
-		--------------------------
-
-Introduction
-------------
-Keystone range of SoCs are based on ARM Cortex-A15 MPCore Processors
-and c66x DSP cores. This document describes essential information required
-for users to run Linux on Keystone based EVMs from Texas Instruments.
-
-Following SoCs  & EVMs are currently supported:-
-
------------- K2HK SoC and EVM --------------------------------------------------
-
-a.k.a Keystone 2 Hawking/Kepler SoC
-TCI6636K2H & TCI6636K2K: See documentation at
-	http://www.ti.com/product/tci6638k2k
-	http://www.ti.com/product/tci6638k2h
-
-EVM:
-http://www.advantech.com/Support/TI-EVM/EVMK2HX_sd.aspx
-
------------- K2E SoC and EVM ---------------------------------------------------
-
-a.k.a Keystone 2 Edison SoC
-K2E  -  66AK2E05: See documentation at
-	http://www.ti.com/product/66AK2E05/technicaldocuments
-
-EVM:
-https://www.einfochips.com/index.php/partnerships/texas-instruments/k2e-evm.html
-
------------- K2L SoC and EVM ---------------------------------------------------
-
-a.k.a Keystone 2 Lamarr SoC
-K2L  -  TCI6630K2L: See documentation at
-	http://www.ti.com/product/TCI6630K2L/technicaldocuments
-EVM:
-https://www.einfochips.com/index.php/partnerships/texas-instruments/k2l-evm.html
-
-Configuration
--------------
-
-All of the K2 SoCs/EVMs share a common defconfig, keystone_defconfig and same
-image is used to boot on individual EVMs. The platform configuration is
-specified through DTS. Following are the DTS used:-
-	K2HK EVM : k2hk-evm.dts
-	K2E EVM  : k2e-evm.dts
-	K2L EVM  : k2l-evm.dts
-
-The device tree documentation for the keystone machines are located at
-        Documentation/devicetree/bindings/arm/keystone/keystone.txt
-
-Document Author
----------------
-Murali Karicheri <m-karicheri2@ti.com>
-Copyright 2015 Texas Instruments
diff --git a/Documentation/arm/keystone/knav-qmss.rst b/Documentation/arm/keystone/knav-qmss.rst
new file mode 100644
index 000000000000..7f7638d80b42
--- /dev/null
+++ b/Documentation/arm/keystone/knav-qmss.rst
@@ -0,0 +1,60 @@
+======================================================================
+Texas Instruments Keystone Navigator Queue Management SubSystem driver
+======================================================================
+
+Driver source code path
+  drivers/soc/ti/knav_qmss.c
+  drivers/soc/ti/knav_qmss_acc.c
+
+The QMSS (Queue Manager Sub System) found on Keystone SoCs is one of
+the main hardware subsystems that form the backbone of the Keystone
+multi-core Navigator. QMSS consists of queue managers, packed-data structure
+processors (PDSPs), linking RAM, descriptor pools and infrastructure
+Packet DMA.
+The Queue Manager is a hardware module that is responsible for accelerating
+management of the packet queues. Packets are queued/de-queued by writing or
+reading a descriptor address to or from a particular memory-mapped location.
+The PDSPs perform QMSS-related functions like accumulation, QoS, or event
+management. Linking RAM registers are used to link the descriptors which are
+stored in descriptor RAM. Descriptor RAM is configurable as internal or
+external memory. The QMSS driver manages the PDSP setups, linking RAM regions,
+queue pool management (allocation, push, pop and notify) and descriptor
+pool management.
+
+The knav qmss driver provides a set of APIs for drivers to open/close qmss
+queues, allocate descriptor pools, map the descriptors, push/pop to queues etc.
+For details of the available APIs, please refer to include/linux/soc/ti/knav_qmss.h
+
+DT documentation is available at
+Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt
+
+Accumulator QMSS queues using PDSP firmware
+============================================
+The QMSS PDSP firmware supports accumulator channels that can monitor a single
+queue or multiple contiguous queues. drivers/soc/ti/knav_qmss_acc.c is the
+driver that interfaces with the accumulator PDSP. It configures the
+accumulator channels defined in DTS (example in DT documentation) to monitor
+1 or 32 queues per channel. More description of the firmware is available in
+the CPPI/QMSS Low Level Driver document (docs/CPPI_QMSS_LLD_SDS.pdf) at
+
+	git://git.ti.com/keystone-rtos/qmss-lld.git
+
+The k2_qmss_pdsp_acc48_k2_le_1_0_0_9.bin firmware supports up to 48 accumulator
+channels. This firmware is available under the ti-keystone folder of
+firmware.git at
+
+   git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
+
+To use it, copy the firmware image to the lib/firmware folder of the initramfs
+or ubifs file system, provide a symlink to k2_qmss_pdsp_acc48_k2_le_1_0_0_9.bin
+in the file system, and boot up the kernel. The user will see
+
+ "firmware file ks2_qmss_pdsp_acc48.bin downloaded for PDSP"
+
+in the boot up log if loading of firmware to PDSP is successful.
+
+Use of accumulated queues requires the firmware image to be present in the
+file system. The driver doesn't add acc queues to the supported queue range if
+the PDSP is not running in the SoC. The API call fails if there is a queue open
+request to an acc queue and the PDSP is not running. So make sure to copy the
+firmware to the file system before using these queue types.
diff --git a/Documentation/arm/keystone/knav-qmss.txt b/Documentation/arm/keystone/knav-qmss.txt
deleted file mode 100644
index fcdb9fd5f53a..000000000000
--- a/Documentation/arm/keystone/knav-qmss.txt
+++ /dev/null
@@ -1,56 +0,0 @@
-* Texas Instruments Keystone Navigator Queue Management SubSystem driver
-
-Driver source code path
-  drivers/soc/ti/knav_qmss.c
-  drivers/soc/ti/knav_qmss_acc.c
-
-The QMSS (Queue Manager Sub System) found on Keystone SOCs is one of
-the main hardware sub system which forms the backbone of the Keystone
-multi-core Navigator. QMSS consist of queue managers, packed-data structure
-processors(PDSP), linking RAM, descriptor pools and infrastructure
-Packet DMA.
-The Queue Manager is a hardware module that is responsible for accelerating
-management of the packet queues. Packets are queued/de-queued by writing or
-reading descriptor address to a particular memory mapped location. The PDSPs
-perform QMSS related functions like accumulation, QoS, or event management.
-Linking RAM registers are used to link the descriptors which are stored in
-descriptor RAM. Descriptor RAM is configurable as internal or external memory.
-The QMSS driver manages the PDSP setups, linking RAM regions,
-queue pool management (allocation, push, pop and notify) and descriptor
-pool management.
-
-knav qmss driver provides a set of APIs to drivers to open/close qmss queues,
-allocate descriptor pools, map the descriptors, push/pop to queues etc. For
-details of the available APIs, please refers to include/linux/soc/ti/knav_qmss.h
-
-DT documentation is available at
-Documentation/devicetree/bindings/soc/ti/keystone-navigator-qmss.txt
-
-Accumulator QMSS queues using PDSP firmware
-============================================
-The QMSS PDSP firmware support accumulator channel that can monitor a single
-queue or multiple contiguous queues. drivers/soc/ti/knav_qmss_acc.c is the
-driver that interface with the accumulator PDSP. This configures
-accumulator channels defined in DTS (example in DT documentation) to monitor
-1 or 32 queues per channel. More description on the firmware is available in
-CPPI/QMSS Low Level Driver document (docs/CPPI_QMSS_LLD_SDS.pdf) at
-	git://git.ti.com/keystone-rtos/qmss-lld.git
-
-k2_qmss_pdsp_acc48_k2_le_1_0_0_9.bin firmware supports upto 48 accumulator
-channels. This firmware is available under ti-keystone folder of
-firmware.git at
-   git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
-
-To use copy the firmware image to lib/firmware folder of the initramfs or
-ubifs file system and provide a sym link to k2_qmss_pdsp_acc48_k2_le_1_0_0_9.bin
-in the file system and boot up the kernel. User would see
-
- "firmware file ks2_qmss_pdsp_acc48.bin downloaded for PDSP"
-
-in the boot up log if loading of firmware to PDSP is successful.
-
-Use of accumulated queues requires the firmware image to be present in the
-file system. The driver doesn't acc queues to the supported queue range if
-PDSP is not running in the SoC. The API call fails if there is a queue open
-request to an acc queue and PDSP is not running. So make sure to copy firmware
-to file system before using these queue types.
diff --git a/Documentation/arm/keystone/overview.rst b/Documentation/arm/keystone/overview.rst
new file mode 100644
index 000000000000..cd90298c493c
--- /dev/null
+++ b/Documentation/arm/keystone/overview.rst
@@ -0,0 +1,74 @@
+==========================
+TI Keystone Linux Overview
+==========================
+
+Introduction
+------------
+The Keystone range of SoCs is based on ARM Cortex-A15 MPCore Processors
+and c66x DSP cores. This document describes essential information required
+for users to run Linux on Keystone based EVMs from Texas Instruments.
+
+The following SoCs and EVMs are currently supported:
+
+K2HK SoC and EVM
+=================
+
+a.k.a Keystone 2 Hawking/Kepler SoC
+TCI6636K2H & TCI6636K2K: See documentation at
+
+	http://www.ti.com/product/tci6638k2k
+	http://www.ti.com/product/tci6638k2h
+
+EVM:
+  http://www.advantech.com/Support/TI-EVM/EVMK2HX_sd.aspx
+
+K2E SoC and EVM
+===============
+
+a.k.a Keystone 2 Edison SoC
+
+K2E  -  66AK2E05:
+
+See documentation at
+
+	http://www.ti.com/product/66AK2E05/technicaldocuments
+
+EVM:
+   https://www.einfochips.com/index.php/partnerships/texas-instruments/k2e-evm.html
+
+K2L SoC and EVM
+===============
+
+a.k.a Keystone 2 Lamarr SoC
+
+K2L  -  TCI6630K2L:
+
+See documentation at
+	http://www.ti.com/product/TCI6630K2L/technicaldocuments
+
+EVM:
+  https://www.einfochips.com/index.php/partnerships/texas-instruments/k2l-evm.html
+
+Configuration
+-------------
+
+All of the K2 SoCs/EVMs share a common defconfig, keystone_defconfig, and the
+same image is used to boot on individual EVMs. The platform configuration is
+specified through DTS. The following DTS files are used:
+
+	K2HK EVM:
+		k2hk-evm.dts
+	K2E EVM:
+		k2e-evm.dts
+	K2L EVM:
+		k2l-evm.dts
+
+The device tree documentation for the Keystone machines is located at
+
+        Documentation/devicetree/bindings/arm/keystone/keystone.txt
+
+Document Author
+---------------
+Murali Karicheri <m-karicheri2@ti.com>
+
+Copyright 2015 Texas Instruments
diff --git a/Documentation/arm/marvel.rst b/Documentation/arm/marvel.rst
new file mode 100644
index 000000000000..16ab2eb085b8
--- /dev/null
+++ b/Documentation/arm/marvel.rst
@@ -0,0 +1,488 @@
+================
+ARM Marvell SoCs
+================
+
+This document lists all the ARM Marvell SoCs that are currently
+supported in mainline by the Linux kernel. As the Marvell families of
+SoCs are large and complex, it is hard to understand where the support
+for a particular SoC is available in the Linux kernel. This document
+tries to help in understanding where those SoCs are supported, and to
+match them with their corresponding public datasheet, when available.
+
+Orion family
+------------
+
+  Flavors:
+        - 88F5082
+        - 88F5181
+        - 88F5181L
+        - 88F5182
+
+               - Datasheet: http://www.embeddedarm.com/documentation/third-party/MV88F5182-datasheet.pdf
+               - Programmer's User Guide: http://www.embeddedarm.com/documentation/third-party/MV88F5182-opensource-manual.pdf
+               - User Manual: http://www.embeddedarm.com/documentation/third-party/MV88F5182-usermanual.pdf
+        - 88F5281
+
+               - Datasheet: http://www.ocmodshop.com/images/reviews/networking/qnap_ts409u/marvel_88f5281_data_sheet.pdf
+        - 88F6183
+  Core:
+	Feroceon 88fr331 (88f51xx) or 88fr531-vd (88f52xx) ARMv5 compatible
+  Linux kernel mach directory:
+	arch/arm/mach-orion5x
+  Linux kernel plat directory:
+	arch/arm/plat-orion
+
+Kirkwood family
+---------------
+
+  Flavors:
+        - 88F6282 a.k.a Armada 300
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
+        - 88F6283 a.k.a Armada 310
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
+        - 88F6190
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6190-003_WEB.pdf
+                - Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
+                - Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+        - 88F6192
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6192-003_ver1.pdf
+                - Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
+                - Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+        - 88F6182
+        - 88F6180
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6180-003_ver1.pdf
+                - Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6180_OpenSource.pdf
+                - Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+        - 88F6281
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6281-004_ver1.pdf
+                - Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6281_OpenSource.pdf
+                - Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+  Homepage:
+	http://www.marvell.com/embedded-processors/kirkwood/
+  Core:
+	Feroceon 88fr131 ARMv5 compatible
+  Linux kernel mach directory:
+	arch/arm/mach-mvebu
+  Linux kernel plat directory:
+	none
+
+Discovery family
+----------------
+
+  Flavors:
+        - MV78100
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78100-003_WEB.pdf
+                - Hardware Spec  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78100_OpenSource.pdf
+                - Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
+        - MV78200
+
+                - Product Brief  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78200-002_WEB.pdf
+                - Hardware Spec  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78200_OpenSource.pdf
+                - Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
+        - MV76100
+
+                Not supported by the Linux kernel.
+
+  Core:
+	Feroceon 88fr571-vd ARMv5 compatible
+
+  Linux kernel mach directory:
+	arch/arm/mach-mv78xx0
+  Linux kernel plat directory:
+	arch/arm/plat-orion
+
+EBU Armada family
+-----------------
+
+  Armada 370 Flavors:
+        - 88F6710
+        - 88F6707
+        - 88F6W11
+
+    - Product Brief:   http://www.marvell.com/embedded-processors/armada-300/assets/Marvell_ARMADA_370_SoC.pdf
+    - Hardware Spec:   http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA370-datasheet.pdf
+    - Functional Spec: http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA370-FunctionalSpec-datasheet.pdf
+
+  Core:
+	Sheeva ARMv7 compatible PJ4B
+
+  Armada 375 Flavors:
+	- 88F6720
+
+    - Product Brief: http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA_375_SoC-01_product_brief.pdf
+
+  Core:
+	ARM Cortex-A9
+
+  Armada 38x Flavors:
+	- 88F6810 Armada 380
+	- 88F6820 Armada 385
+	- 88F6828 Armada 388
+
+    - Product infos:   http://www.marvell.com/embedded-processors/armada-38x/
+    - Functional Spec: https://marvellcorp.wufoo.com/forms/marvell-armada-38x-functional-specifications/
+
+  Core:
+	ARM Cortex-A9
+
+  Armada 39x Flavors:
+	- 88F6920 Armada 390
+	- 88F6928 Armada 398
+
+    - Product infos: http://www.marvell.com/embedded-processors/armada-39x/
+
+  Core:
+	ARM Cortex-A9
+
+  Armada XP Flavors:
+        - MV78230
+        - MV78260
+        - MV78460
+
+    NOTE:
+	not to be confused with the non-SMP 78xx0 SoCs
+
+    Product Brief:
+	http://www.marvell.com/embedded-processors/armada-xp/assets/Marvell-ArmadaXP-SoC-product%20brief.pdf
+
+    Functional Spec:
+	http://www.marvell.com/embedded-processors/armada-xp/assets/ARMADA-XP-Functional-SpecDatasheet.pdf
+
+    - Hardware Specs:
+
+        - http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78230_OS.PDF
+        - http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78260_OS.PDF
+        - http://www.marvell.com/embedded-processors/armada-xp/assets/HW_MV78460_OS.PDF
+
+  Core:
+	Sheeva ARMv7 compatible Dual-core or Quad-core PJ4B-MP
+
+  Linux kernel mach directory:
+	arch/arm/mach-mvebu
+  Linux kernel plat directory:
+	none
+
+EBU Armada family ARMv8
+-----------------------
+
+  Armada 3710/3720 Flavors:
+	- 88F3710
+	- 88F3720
+
+  Core:
+	ARM Cortex A53 (ARMv8)
+
+  Homepage:
+	http://www.marvell.com/embedded-processors/armada-3700/
+
+  Product Brief:
+	http://www.marvell.com/embedded-processors/assets/PB-88F3700-FNL.pdf
+
+  Device tree files:
+	arch/arm64/boot/dts/marvell/armada-37*
+
+  Armada 7K Flavors:
+	  - 88F7020 (AP806 Dual + one CP110)
+	  - 88F7040 (AP806 Quad + one CP110)
+
+  Core: ARM Cortex A72
+
+  Homepage:
+	http://www.marvell.com/embedded-processors/armada-70xx/
+
+  Product Brief:
+	  - http://www.marvell.com/embedded-processors/assets/Armada7020PB-Jan2016.pdf
+	  - http://www.marvell.com/embedded-processors/assets/Armada7040PB-Jan2016.pdf
+
+  Device tree files:
+	arch/arm64/boot/dts/marvell/armada-70*
+
+  Armada 8K Flavors:
+	- 88F8020 (AP806 Dual + two CP110)
+	- 88F8040 (AP806 Quad + two CP110)
+  Core:
+	ARM Cortex A72
+
+  Homepage:
+	http://www.marvell.com/embedded-processors/armada-80xx/
+
+  Product Brief:
+	  - http://www.marvell.com/embedded-processors/assets/Armada8020PB-Jan2016.pdf
+	  - http://www.marvell.com/embedded-processors/assets/Armada8040PB-Jan2016.pdf
+
+  Device tree files:
+	arch/arm64/boot/dts/marvell/armada-80*
+
+Avanta family
+-------------
+
+  Flavors:
+       - 88F6510
+       - 88F6530P
+       - 88F6550
+       - 88F6560
+
+  Homepage:
+	http://www.marvell.com/broadband/
+
+  Product Brief:
+	http://www.marvell.com/broadband/assets/Marvell_Avanta_88F6510_305_060-001_product_brief.pdf
+
+  No public datasheet available.
+
+  Core:
+	ARMv5 compatible
+
+  Linux kernel mach directory:
+	no code in mainline yet, planned for the future
+  Linux kernel plat directory:
+	no code in mainline yet, planned for the future
+
+Storage family
+--------------
+
+  Armada SP:
+	- 88RC1580
+
+  Product infos:
+	http://www.marvell.com/storage/armada-sp/
+
+  Core:
+	Sheeva ARMv7 compatible Quad-core PJ4C
+
+  (not supported in upstream Linux kernel)
+
+Dove family (application processor)
+-----------------------------------
+
+  Flavors:
+        - 88AP510 a.k.a Armada 510
+
+   Product Brief:
+	http://www.marvell.com/application-processors/armada-500/assets/Marvell_Armada510_SoC.pdf
+
+   Hardware Spec:
+	http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Hardware-Spec.pdf
+
+  Functional Spec:
+	http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Functional-Spec.pdf
+
+  Homepage:
+	http://www.marvell.com/application-processors/armada-500/
+
+  Core:
+	ARMv7 compatible
+
+  Directory:
+	- arch/arm/mach-mvebu (DT enabled platforms)
+        - arch/arm/mach-dove (non-DT enabled platforms)
+
+PXA 2xx/3xx/93x/95x family
+--------------------------
+
+  Flavors:
+        - PXA21x, PXA25x, PXA26x
+             - Application processor only
+             - Core: ARMv5 XScale1 core
+        - PXA270, PXA271, PXA272
+             - Product Brief         : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_pb.pdf
+             - Design guide          : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_design_guide.pdf
+             - Developers manual     : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_dev_man.pdf
+             - Specification         : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_emts.pdf
+             - Specification update  : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_spec_update.pdf
+             - Application processor only
+             - Core: ARMv5 XScale2 core
+        - PXA300, PXA310, PXA320
+             - PXA 300 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA300_PB_R4.pdf
+             - PXA 310 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA310_PB_R4.pdf
+             - PXA 320 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA320_PB_R4.pdf
+             - Design guide          : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Design_Guide.pdf
+             - Developers manual     : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Developers_Manual.zip
+             - Specifications        : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_EMTS.pdf
+             - Specification Update  : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Spec_Update.zip
+             - Reference Manual      : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_TavorP_BootROM_Ref_Manual.pdf
+             - Application processor only
+             - Core: ARMv5 XScale3 core
+        - PXA930, PXA935
+             - Application processor with Communication processor
+             - Core: ARMv5 XScale3 core
+        - PXA955
+             - Application processor with Communication processor
+             - Core: ARMv7 compatible Sheeva PJ4 core
+
+   Comments:
+
+    * This line of SoCs originates from the XScale family developed by
+      Intel and acquired by Marvell in ~2006. The PXA21x, PXA25x,
+      PXA26x, PXA27x, PXA3xx and PXA93x were developed by Intel, while
+      the later PXA95x were developed by Marvell.
+
+    * Due to their XScale origin, these SoCs have virtually nothing in
+      common with the other (Kirkwood, Dove, etc.) families of Marvell
+      SoCs, except with the MMP/MMP2 family of SoCs.
+
+   Linux kernel mach directory:
+	arch/arm/mach-pxa
+   Linux kernel plat directory:
+	arch/arm/plat-pxa
+
+MMP/MMP2/MMP3 family (communication processor)
+----------------------------------------------
+
+   Flavors:
+        - PXA168, a.k.a Armada 168
+             - Homepage             : http://www.marvell.com/application-processors/armada-100/armada-168.jsp
+             - Product brief        : http://www.marvell.com/application-processors/armada-100/assets/pxa_168_pb.pdf
+             - Hardware manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_datasheet.pdf
+             - Software manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_software_manual.pdf
+             - Specification update : http://www.marvell.com/application-processors/armada-100/assets/ARMADA16x_Spec_update.pdf
+             - Boot ROM manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_ref_manual.pdf
+             - App note package     : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_app_note_package.pdf
+             - Application processor only
+             - Core: ARMv5 compatible Marvell PJ1 88sv331 (Mohawk)
+        - PXA910/PXA920
+             - Homepage             : http://www.marvell.com/communication-processors/pxa910/
+             - Product Brief        : http://www.marvell.com/communication-processors/pxa910/assets/Marvell_PXA910_Platform-001_PB_final.pdf
+             - Application processor with Communication processor
+             - Core: ARMv5 compatible Marvell PJ1 88sv331 (Mohawk)
+        - PXA688, a.k.a. MMP2, a.k.a Armada 610
+             - Product Brief        : http://www.marvell.com/application-processors/armada-600/assets/armada610_pb.pdf
+             - Application processor only
+             - Core: ARMv7 compatible Sheeva PJ4 88sv581x core
+	- PXA2128, a.k.a. MMP3 (OLPC XO4, Linux support not upstream)
+	     - Product Brief	  : http://www.marvell.com/application-processors/armada/pxa2128/assets/Marvell-ARMADA-PXA2128-SoC-PB.pdf
+	     - Application processor only
+	     - Core: Dual-core ARMv7 compatible Sheeva PJ4C core
+	- PXA960/PXA968/PXA978 (Linux support not upstream)
+	     - Application processor with Communication Processor
+	     - Core: ARMv7 compatible Sheeva PJ4 core
+	- PXA986/PXA988 (Linux support not upstream)
+	     - Application processor with Communication Processor
+	     - Core: Dual-core ARMv7 compatible Sheeva PJ4B-MP core
+	- PXA1088/PXA1920 (Linux support not upstream)
+	     - Application processor with Communication Processor
+	     - Core: quad-core ARMv7 Cortex-A7
+	- PXA1908/PXA1928/PXA1936
+	     - Application processor with Communication Processor
+	     - Core: multi-core ARMv8 Cortex-A53
+
+   Comments:
+
+    * This line of SoCs originates from the XScale family developed by
+      Intel and acquired by Marvell in ~2006. All the processors of
+      this MMP/MMP2 family were developed by Marvell.
+
+    * Due to their XScale origin, these SoCs have virtually nothing in
+      common with the other (Kirkwood, Dove, etc.) families of Marvell
+      SoCs, except with the PXA family of SoCs listed above.
+
+   Linux kernel mach directory:
+	arch/arm/mach-mmp
+   Linux kernel plat directory:
+	arch/arm/plat-pxa
+
+Berlin family (Multimedia Solutions)
+-------------------------------------
+
+  - Flavors:
+	- 88DE3010, Armada 1000 (no Linux support)
+		- Core:		Marvell PJ1 (ARMv5TE), Dual-core
+		- Product Brief:	http://www.marvell.com.cn/digital-entertainment/assets/armada_1000_pb.pdf
+	- 88DE3005, Armada 1500 Mini
+		- Design name:	BG2CD
+		- Core:		ARM Cortex-A9, PL310 L2CC
+	- 88DE3006, Armada 1500 Mini Plus
+		- Design name:	BG2CDP
+		- Core:		Dual Core ARM Cortex-A7
+	- 88DE3100, Armada 1500
+		- Design name:	BG2
+		- Core:		Marvell PJ4B-MP (ARMv7), Tauros3 L2CC
+	- 88DE3114, Armada 1500 Pro
+		- Design name:	BG2Q
+		- Core:		Quad Core ARM Cortex-A9, PL310 L2CC
+	- 88DE3214, Armada 1500 Pro 4K
+		- Design name:	BG3
+		- Core:		ARM Cortex-A15, CA15 integrated L2CC
+	- 88DE3218, ARMADA 1500 Ultra
+		- Core:		ARM Cortex-A53
+
+  Homepage: https://www.synaptics.com/products/multimedia-solutions
+  Directory: arch/arm/mach-berlin
+
+  Comments:
+
+   * This line of SoCs is based on Marvell Sheeva or ARM Cortex CPUs
+     with Synopsys DesignWare (IRQ, GPIO, Timers, ...) and PXA IP (SDHCI, USB, ETH, ...).
+
+   * The Berlin family was acquired by Synaptics from Marvell in 2017.
+
+CPU Cores
+---------
+
+The XScale cores were designed by Intel, and shipped by Marvell in the older
+PXA processors. Feroceon is a Marvell-designed core that was developed in-house,
+and that evolved into Sheeva. The XScale and Feroceon cores were phased out
+over time and replaced with Sheeva cores in later products, which subsequently
+got replaced with licensed ARM Cortex-A cores.
+
+  XScale 1
+	CPUID 0x69052xxx
+	ARMv5, iWMMXt
+  XScale 2
+	CPUID 0x69054xxx
+	ARMv5, iWMMXt
+  XScale 3
+	CPUID 0x69056xxx or 0x69056xxx
+	ARMv5, iWMMXt
+  Feroceon-1850 88fr331 "Mohawk"
+	CPUID 0x5615331x or 0x41xx926x
+	ARMv5TE, single issue
+  Feroceon-2850 88fr531-vd "Jolteon"
+	CPUID 0x5605531x or 0x41xx926x
+	ARMv5TE, VFP, dual-issue
+  Feroceon 88fr571-vd "Jolteon"
+	CPUID 0x5615571x
+	ARMv5TE, VFP, dual-issue
+  Feroceon 88fr131 "Mohawk-D"
+	CPUID 0x5625131x
+	ARMv5TE, single-issue in-order
+  Sheeva PJ1 88sv331 "Mohawk"
+	CPUID 0x561584xx
+	ARMv5, single-issue iWMMXt v2
+  Sheeva PJ4 88sv581x "Flareon"
+	CPUID 0x560f581x
+	ARMv7, idivt, optional iWMMXt v2
+  Sheeva PJ4B 88sv581x
+	CPUID 0x561f581x
+	ARMv7, idivt, optional iWMMXt v2
+  Sheeva PJ4B-MP / PJ4C
+	CPUID 0x562f584x
+	ARMv7, idivt/idiva, LPAE, optional iWMMXt v2 and/or NEON
+
+Long-term plans
+---------------
+
+ * Unify the mach-dove/, mach-mv78xx0/, mach-orion5x/ into the
+   mach-mvebu/ to support all SoCs from the Marvell EBU (Engineering
+   Business Unit) in a single mach-<foo> directory. The plat-orion/
+   would therefore disappear.
+
+ * Unify the mach-mmp/ and mach-pxa/ into the same mach-pxa
+   directory. The plat-pxa/ would therefore disappear.
+
+Credits
+-------
+
+- Maen Suleiman <maen@marvell.com>
+- Lior Amsalem <alior@marvell.com>
+- Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
+- Andrew Lunn <andrew@lunn.ch>
+- Nicolas Pitre <nico@fluxnic.net>
+- Eric Miao <eric.y.miao@gmail.com>
diff --git a/Documentation/arm/mem_alignment b/Documentation/arm/mem_alignment
deleted file mode 100644
index e110e2781039..000000000000
--- a/Documentation/arm/mem_alignment
+++ /dev/null
@@ -1,58 +0,0 @@
-Too many problems popped up because of unnoticed misaligned memory access in
-kernel code lately.  Therefore the alignment fixup is now unconditionally
-configured in for SA11x0 based targets.  According to Alan Cox, this is a
-bad idea to configure it out, but Russell King has some good reasons for
-doing so on some f***ed up ARM architectures like the EBSA110.  However
-this is not the case on many design I'm aware of, like all SA11x0 based
-ones.
-
-Of course this is a bad idea to rely on the alignment trap to perform
-unaligned memory access in general.  If those access are predictable, you
-are better to use the macros provided by include/asm/unaligned.h.  The
-alignment trap can fixup misaligned access for the exception cases, but at
-a high performance cost.  It better be rare.
-
-Now for user space applications, it is possible to configure the alignment
-trap to SIGBUS any code performing unaligned access (good for debugging bad
-code), or even fixup the access by software like for kernel code.  The later
-mode isn't recommended for performance reasons (just think about the
-floating point emulation that works about the same way).  Fix your code
-instead!
-
-Please note that randomly changing the behaviour without good thought is
-real bad - it changes the behaviour of all unaligned instructions in user
-space, and might cause programs to fail unexpectedly.
-
-To change the alignment trap behavior, simply echo a number into
-/proc/cpu/alignment.  The number is made up from various bits:
-
-bit		behavior when set
----		-----------------
-
-0		A user process performing an unaligned memory access
-		will cause the kernel to print a message indicating
-		process name, pid, pc, instruction, address, and the
-		fault code.
-
-1		The kernel will attempt to fix up the user process
-		performing the unaligned access.  This is of course
-		slow (think about the floating point emulator) and
-		not recommended for production use.
-
-2		The kernel will send a SIGBUS signal to the user process
-		performing the unaligned access.
-
-Note that not all combinations are supported - only values 0 through 5.
-(6 and 7 don't make sense).
-
-For example, the following will turn on the warnings, but without
-fixing up or sending SIGBUS signals:
-
-	echo 1 > /proc/cpu/alignment
-
-You can also read the content of the same file to get statistical
-information on unaligned access occurrences plus the current mode of
-operation for user space code.
-
-
-Nicolas Pitre, Mar 13, 2001.  Modified Russell King, Nov 30, 2001.
diff --git a/Documentation/arm/mem_alignment.rst b/Documentation/arm/mem_alignment.rst
new file mode 100644
index 000000000000..aa22893b62bc
--- /dev/null
+++ b/Documentation/arm/mem_alignment.rst
@@ -0,0 +1,63 @@
+================
+Memory alignment
+================
+
+Too many problems popped up because of unnoticed misaligned memory access in
+kernel code lately.  Therefore the alignment fixup is now unconditionally
+configured in for SA11x0 based targets.  According to Alan Cox, it is a
+bad idea to configure it out, but Russell King has some good reasons for
+doing so on some f***ed up ARM architectures like the EBSA110.  However,
+this is not the case on many designs I'm aware of, like all SA11x0 based
+ones.
+
+Of course, it is a bad idea to rely on the alignment trap to perform
+unaligned memory access in general.  If those accesses are predictable, you
+are better off using the macros provided by include/asm/unaligned.h.  The
+alignment trap can fix up misaligned accesses for the exception cases, but at
+a high performance cost.  It had better be rare.
+
+Now for user space applications, it is possible to configure the alignment
+trap to SIGBUS any code performing unaligned access (good for debugging bad
+code), or even fix up the access in software, as for kernel code.  The latter
+mode isn't recommended for performance reasons (just think about the
+floating point emulation that works about the same way).  Fix your code
+instead!
+
+Please note that randomly changing the behaviour without good thought is
+really bad - it changes the behaviour of all unaligned instructions in user
+space, and might cause programs to fail unexpectedly.
+
+To change the alignment trap behavior, simply echo a number into
+/proc/cpu/alignment.  The number is made up from various bits:
+
+===		========================================================
+bit		behavior when set
+===		========================================================
+0		A user process performing an unaligned memory access
+		will cause the kernel to print a message indicating
+		process name, pid, pc, instruction, address, and the
+		fault code.
+
+1		The kernel will attempt to fix up the user process
+		performing the unaligned access.  This is of course
+		slow (think about the floating point emulator) and
+		not recommended for production use.
+
+2		The kernel will send a SIGBUS signal to the user process
+		performing the unaligned access.
+===		========================================================
+
+Note that not all combinations are supported - only values 0 through 5.
+(6 and 7 don't make sense).
+
+For example, the following will turn on the warnings, but without
+fixing up or sending SIGBUS signals::
+
+	echo 1 > /proc/cpu/alignment
+
+You can also read the content of the same file to get statistical
+information on unaligned access occurrences plus the current mode of
+operation for user space code.
+
+
+Nicolas Pitre, Mar 13, 2001.  Modified Russell King, Nov 30, 2001.
diff --git a/Documentation/arm/memory.rst b/Documentation/arm/memory.rst
new file mode 100644
index 000000000000..0521b4ce5c96
--- /dev/null
+++ b/Documentation/arm/memory.rst
@@ -0,0 +1,93 @@
+=================================
+Kernel Memory Layout on ARM Linux
+=================================
+
+		Russell King <rmk@arm.linux.org.uk>
+
+		     November 17, 2005 (2.6.15)
+
+This document describes the virtual memory layout which the Linux
+kernel uses for ARM processors.  It indicates which regions are
+free for platforms to use, and which are used by generic code.
+
+The ARM CPU is capable of addressing a maximum of 4GB virtual memory
+space, and this must be shared between user space processes, the
+kernel, and hardware devices.
+
+As the ARM architecture matures, it becomes necessary to reserve
+certain regions of VM space for use for new facilities; therefore
+this document may reserve more VM space over time.
+
+=============== =============== ===============================================
+Start		End		Use
+=============== =============== ===============================================
+ffff8000	ffffffff	copy_user_page / clear_user_page use.
+				For SA11xx and Xscale, this is used to
+				setup a minicache mapping.
+
+ffff4000	ffffffff	cache aliasing on ARMv6 and later CPUs.
+
+ffff1000	ffff7fff	Reserved.
+				Platforms must not use this address range.
+
+ffff0000	ffff0fff	CPU vector page.
+				The CPU vectors are mapped here if the
+				CPU supports vector relocation (control
+				register V bit.)
+
+fffe0000	fffeffff	XScale cache flush area.  This is used
+				in proc-xscale.S to flush the whole data
+				cache. (XScale does not have TCM.)
+
+fffe8000	fffeffff	DTCM mapping area for platforms with
+				DTCM mounted inside the CPU.
+
+fffe0000	fffe7fff	ITCM mapping area for platforms with
+				ITCM mounted inside the CPU.
+
+ffc00000	ffefffff	Fixmap mapping region.  Addresses provided
+				by fix_to_virt() will be located here.
+
+fee00000	feffffff	Mapping of PCI I/O space. This is a static
+				mapping within the vmalloc space.
+
+VMALLOC_START	VMALLOC_END-1	vmalloc() / ioremap() space.
+				Memory returned by vmalloc/ioremap will
+				be dynamically placed in this region.
+				Machine specific static mappings are also
+				located here through iotable_init().
+				VMALLOC_START is based upon the value
+				of the high_memory variable, and VMALLOC_END
+				is equal to 0xff800000.
+
+PAGE_OFFSET	high_memory-1	Kernel direct-mapped RAM region.
+				This maps the platform's RAM, and typically
+				maps all platform RAM in a 1:1 relationship.
+
+PKMAP_BASE	PAGE_OFFSET-1	Permanent kernel mappings
+				One way of mapping HIGHMEM pages into kernel
+				space.
+
+MODULES_VADDR	MODULES_END-1	Kernel module space
+				Kernel modules inserted via insmod are
+				placed here using dynamic mappings.
+
+00001000	TASK_SIZE-1	User space mappings
+				Per-thread mappings are placed here via
+				the mmap() system call.
+
+00000000	00000fff	CPU vector page / null pointer trap
+				CPUs which do not support vector remapping
+				place their vector page here.  NULL pointer
+				dereferences by both the kernel and user
+				space are also caught via this mapping.
+=============== =============== ===============================================
+
+Please note that mappings which collide with the above areas may result
+in a non-bootable kernel, or may cause the kernel to (eventually) panic
+at run time.
+
+Since future CPUs may impact the kernel mapping layout, user programs
+must not access any memory which is not mapped inside their 0x00001000
+to TASK_SIZE address range.  If they wish to access these areas, they
+must set up their own mappings using open() and mmap().
diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
deleted file mode 100644
index 546a39048eb0..000000000000
--- a/Documentation/arm/memory.txt
+++ /dev/null
@@ -1,88 +0,0 @@
-		Kernel Memory Layout on ARM Linux
-
-		Russell King <rmk@arm.linux.org.uk>
-		     November 17, 2005 (2.6.15)
-
-This document describes the virtual memory layout which the Linux
-kernel uses for ARM processors.  It indicates which regions are
-free for platforms to use, and which are used by generic code.
-
-The ARM CPU is capable of addressing a maximum of 4GB virtual memory
-space, and this must be shared between user space processes, the
-kernel, and hardware devices.
-
-As the ARM architecture matures, it becomes necessary to reserve
-certain regions of VM space for use for new facilities; therefore
-this document may reserve more VM space over time.
-
-Start		End		Use
---------------------------------------------------------------------------
-ffff8000	ffffffff	copy_user_page / clear_user_page use.
-				For SA11xx and Xscale, this is used to
-				setup a minicache mapping.
-
-ffff4000	ffffffff	cache aliasing on ARMv6 and later CPUs.
-
-ffff1000	ffff7fff	Reserved.
-				Platforms must not use this address range.
-
-ffff0000	ffff0fff	CPU vector page.
-				The CPU vectors are mapped here if the
-				CPU supports vector relocation (control
-				register V bit.)
-
-fffe0000	fffeffff	XScale cache flush area.  This is used
-				in proc-xscale.S to flush the whole data
-				cache. (XScale does not have TCM.)
-
-fffe8000	fffeffff	DTCM mapping area for platforms with
-				DTCM mounted inside the CPU.
-
-fffe0000	fffe7fff	ITCM mapping area for platforms with
-				ITCM mounted inside the CPU.
-
-ffc00000	ffefffff	Fixmap mapping region.  Addresses provided
-				by fix_to_virt() will be located here.
-
-fee00000	feffffff	Mapping of PCI I/O space. This is a static
-				mapping within the vmalloc space.
-
-VMALLOC_START	VMALLOC_END-1	vmalloc() / ioremap() space.
-				Memory returned by vmalloc/ioremap will
-				be dynamically placed in this region.
-				Machine specific static mappings are also
-				located here through iotable_init().
-				VMALLOC_START is based upon the value
-				of the high_memory variable, and VMALLOC_END
-				is equal to 0xff800000.
-
-PAGE_OFFSET	high_memory-1	Kernel direct-mapped RAM region.
-				This maps the platforms RAM, and typically
-				maps all platform RAM in a 1:1 relationship.
-
-PKMAP_BASE	PAGE_OFFSET-1	Permanent kernel mappings
-				One way of mapping HIGHMEM pages into kernel
-				space.
-
-MODULES_VADDR	MODULES_END-1	Kernel module space
-				Kernel modules inserted via insmod are
-				placed here using dynamic mappings.
-
-00001000	TASK_SIZE-1	User space mappings
-				Per-thread mappings are placed here via
-				the mmap() system call.
-
-00000000	00000fff	CPU vector page / null pointer trap
-				CPUs which do not support vector remapping
-				place their vector page here.  NULL pointer
-				dereferences by both the kernel and user
-				space are also caught via this mapping.
-
-Please note that mappings which collide with the above areas may result
-in a non-bootable kernel, or may cause the kernel to (eventually) panic
-at run time.
-
-Since future CPUs may impact the kernel mapping layout, user programs
-must not access any memory which is not mapped inside their 0x0001000
-to TASK_SIZE address range.  If they wish to access these areas, they
-must set up their own mappings using open() and mmap().
diff --git a/Documentation/arm/microchip.rst b/Documentation/arm/microchip.rst
new file mode 100644
index 000000000000..c9a44c98e868
--- /dev/null
+++ b/Documentation/arm/microchip.rst
@@ -0,0 +1,204 @@
+=============================
+ARM Microchip SoCs (aka AT91)
+=============================
+
+
+Introduction
+------------
+This document gives useful information about the ARM Microchip SoCs that are
+currently supported in Linux Mainline (you know, the one on kernel.org).
+
+It is important to note that the Microchip (previously Atmel) ARM-based MPU
+product line is historically named "AT91" or "at91" throughout the Linux kernel
+development process even if this product prefix has completely disappeared from
+the official Microchip product name. Anyway, files, directories, git trees,
+git branches/tags and email subjects always contain this "at91" sub-string.
+
+
+AT91 SoCs
+---------
+Documentation and detailed datasheets for each product are available on
+the Microchip website: http://www.microchip.com.
+
+  Flavors:
+    * ARM 920 based SoC
+      - at91rm9200
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-1768-32-bit-ARM920T-Embedded-Microprocessor-AT91RM9200_Datasheet.pdf
+
+    * ARM 926 based SoCs
+      - at91sam9260
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6221-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9260_Datasheet.pdf
+
+      - at91sam9xe
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6254-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9XE_Datasheet.pdf
+
+      - at91sam9261
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6062-ARM926EJ-S-Microprocessor-SAM9261_Datasheet.pdf
+
+      - at91sam9263
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6249-32-bit-ARM926EJ-S-Embedded-Microprocessor-SAM9263_Datasheet.pdf
+
+      - at91sam9rl
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/doc6289.pdf
+
+      - at91sam9g20
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001516A.pdf
+
+      - at91sam9g45 family
+        - at91sam9g45
+        - at91sam9g46
+        - at91sam9m10
+        - at91sam9m11 (device superset)
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-6437-32-bit-ARM926-Embedded-Microprocessor-SAM9M11_Datasheet.pdf
+
+      - at91sam9x5 family (aka "The 5 series")
+        - at91sam9g15
+        - at91sam9g25
+        - at91sam9g35
+        - at91sam9x25
+        - at91sam9x35
+
+          * Datasheet (can be considered as covering the whole family)
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-11055-32-bit-ARM926EJ-S-Microcontroller-SAM9X35_Datasheet.pdf
+
+      - at91sam9n12
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001517A.pdf
+
+    * ARM Cortex-A5 based SoCs
+      - sama5d3 family
+
+        - sama5d31
+        - sama5d33
+        - sama5d34
+        - sama5d35
+        - sama5d36 (device superset)
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-11121-32-bit-Cortex-A5-Microcontroller-SAMA5D3_Datasheet.pdf
+
+    * ARM Cortex-A5 + NEON based SoCs
+      - sama5d4 family
+
+        - sama5d41
+        - sama5d42
+        - sama5d43
+        - sama5d44 (device superset)
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/60001525A.pdf
+
+      - sama5d2 family
+
+        - sama5d21
+        - sama5d22
+        - sama5d23
+        - sama5d24
+        - sama5d26
+        - sama5d27 (device superset)
+        - sama5d28 (device superset + environmental monitors)
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/DS60001476B.pdf
+
+    * ARM Cortex-M7 MCUs
+      - sams70 family
+
+        - sams70j19
+        - sams70j20
+        - sams70j21
+        - sams70n19
+        - sams70n20
+        - sams70n21
+        - sams70q19
+        - sams70q20
+        - sams70q21
+
+      - samv70 family
+
+        - samv70j19
+        - samv70j20
+        - samv70n19
+        - samv70n20
+        - samv70q19
+        - samv70q20
+
+      - samv71 family
+
+        - samv71j19
+        - samv71j20
+        - samv71j21
+        - samv71n19
+        - samv71n20
+        - samv71n21
+        - samv71q19
+        - samv71q20
+        - samv71q21
+
+          * Datasheet
+
+          http://ww1.microchip.com/downloads/en/DeviceDoc/60001527A.pdf
+
+
+Linux kernel information
+------------------------
+Linux kernel mach directory: arch/arm/mach-at91
+MAINTAINERS entry is: "ARM/Microchip (AT91) SoC support"
+
+
+Device Tree for AT91 SoCs and boards
+------------------------------------
+All AT91 SoCs are converted to Device Tree. Since Linux 3.19, these products
+must use this method to boot the Linux kernel.
+
+Work In Progress statement:
+Device Tree files and Device Tree bindings that apply to AT91 SoCs and boards are
+considered as "Unstable". To be completely clear, any at91 binding can change at
+any time. So, be sure to use a Device Tree Binary and a Kernel Image generated from
+the same source tree.
+Please refer to the Documentation/devicetree/bindings/ABI.txt file for a
+definition of a "Stable" binding/ABI.
+This statement will be removed by AT91 MAINTAINERS when appropriate.
+
+Naming conventions and best practice:
+
+- SoCs Device Tree Source Include files are named after the official name of
+  the product (at91sam9g20.dtsi or sama5d33.dtsi for instance).
+- Device Tree Source Include files (.dtsi) are used to collect common nodes that can be
+  shared across SoCs or boards (sama5d3.dtsi or at91sam9x5cm.dtsi for instance).
+  When collecting nodes for a particular peripheral or topic, the identifier has to
+  be placed at the end of the file name, separated with a "_" (at91sam9x5_can.dtsi
+  or sama5d3_gmac.dtsi for example).
+- Board Device Tree Source files (.dts) are prefixed by the string "at91-" so
+  that they can be identified easily. Note that some files are historical exceptions
+  to this rule (sama5d3[13456]ek.dts, usb_a9g20.dts or animeo_ip.dts for example).
diff --git a/Documentation/arm/netwinder.rst b/Documentation/arm/netwinder.rst
new file mode 100644
index 000000000000..8eab66caa2ac
--- /dev/null
+++ b/Documentation/arm/netwinder.rst
@@ -0,0 +1,85 @@
+================================
+NetWinder specific documentation
+================================
+
+The NetWinder is a small low-power computer, primarily designed
+to run Linux.  It is based around the StrongARM RISC processor,
+DC21285 PCI bridge, with PC-type hardware glued around it.
+
+Port usage
+==========
+
+=======  ====== ===============================
+Min      Max	Description
+=======  ====== ===============================
+0x0000   0x000f	DMA1
+0x0020   0x0021	PIC1
+0x0060   0x006f	Keyboard
+0x0070   0x007f	RTC
+0x0080   0x0087	DMA1
+0x0088   0x008f	DMA2
+0x00a0   0x00a3	PIC2
+0x00c0   0x00df	DMA2
+0x0180   0x0187	IRDA
+0x01f0   0x01f6	ide0
+0x0201		Game port
+0x0203		RWA010 configuration read
+0x0220   ?	SoundBlaster
+0x0250   ?	WaveArtist
+0x0279		RWA010 configuration index
+0x02f8   0x02ff	Serial ttyS1
+0x0300   0x031f	Ether10
+0x0338		GPIO1
+0x033a		GPIO2
+0x0370   0x0371	W83977F configuration registers
+0x0388   ?	AdLib
+0x03c0   0x03df	VGA
+0x03f6		ide0
+0x03f8   0x03ff	Serial ttyS0
+0x0400   0x0408	DC21143
+0x0480   0x0487	DMA1
+0x0488   0x048f	DMA2
+0x0a79		RWA010 configuration write
+0xe800   0xe80f	ide0/ide1 BM DMA
+=======  ====== ===============================
+
+
+Interrupt usage
+===============
+
+======= ======= ========================
+IRQ	type	Description
+======= ======= ========================
+ 0	ISA	100Hz timer
+ 1	ISA	Keyboard
+ 2	ISA	cascade
+ 3	ISA	Serial ttyS1
+ 4	ISA	Serial ttyS0
+ 5	ISA	PS/2 mouse
+ 6	ISA	IRDA
+ 7	ISA	Printer
+ 8	ISA	RTC alarm
+ 9	ISA
+10	ISA	GP10 (Orange reset button)
+11	ISA
+12	ISA	WaveArtist
+13	ISA
+14	ISA	hda1
+15	ISA
+======= ======= ========================
+
+DMA usage
+=========
+
+======= ======= ===========
+DMA	type	Description
+======= ======= ===========
+ 0	ISA	IRDA
+ 1	ISA
+ 2	ISA	cascade
+ 3	ISA	WaveArtist
+ 4	ISA
+ 5	ISA
+ 6	ISA
+ 7	ISA	WaveArtist
+======= ======= ===========
diff --git a/Documentation/arm/nwfpe/NOTES b/Documentation/arm/nwfpe/NOTES
deleted file mode 100644
index 40577b5a49d3..000000000000
--- a/Documentation/arm/nwfpe/NOTES
+++ /dev/null
@@ -1,29 +0,0 @@
-There seems to be a problem with exp(double) and our emulator.  I haven't
-been able to track it down yet.  This does not occur with the emulator
-supplied by Russell King.
-
-I also found one oddity in the emulator.  I don't think it is serious but
-will point it out.  The ARM calling conventions require floating point
-registers f4-f7 to be preserved over a function call.  The compiler quite
-often uses an stfe instruction to save f4 on the stack upon entry to a
-function, and an ldfe instruction to restore it before returning.
-
-I was looking at some code, that calculated a double result, stored it in f4
-then made a function call. Upon return from the function call the number in
-f4 had been converted to an extended value in the emulator.
-
-This is a side effect of the stfe instruction.  The double in f4 had to be
-converted to extended, then stored.  If an lfm/sfm combination had been used,
-then no conversion would occur.  This has performance considerations.  The
-result from the function call and f4 were used in a multiplication.  If the
-emulator sees a multiply of a double and extended, it promotes the double to
-extended, then does the multiply in extended precision.
-
-This code will cause this problem:
-
-double x, y, z;
-z = log(x)/log(y);
-
-The result of log(x) (a double) will be calculated, returned in f0, then
-moved to f4 to preserve it over the log(y) call.  The division will be done
-in extended precision, due to the stfe instruction used to save f4 in log(y).
diff --git a/Documentation/arm/nwfpe/README b/Documentation/arm/nwfpe/README
deleted file mode 100644
index 771871de0c8b..000000000000
--- a/Documentation/arm/nwfpe/README
+++ /dev/null
@@ -1,70 +0,0 @@
-This directory contains the version 0.92 test release of the NetWinder 
-Floating Point Emulator.
-
-The majority of the code was written by me, Scott Bambrough It is
-written in C, with a small number of routines in inline assembler
-where required.  It was written quickly, with a goal of implementing a
-working version of all the floating point instructions the compiler
-emits as the first target.  I have attempted to be as optimal as
-possible, but there remains much room for improvement.
-
-I have attempted to make the emulator as portable as possible.  One of
-the problems is with leading underscores on kernel symbols.  Elf
-kernels have no leading underscores, a.out compiled kernels do.  I
-have attempted to use the C_SYMBOL_NAME macro wherever this may be
-important.
-
-Another choice I made was in the file structure.  I have attempted to
-contain all operating system specific code in one module (fpmodule.*).
-All the other files contain emulator specific code.  This should allow
-others to port the emulator to NetBSD for instance relatively easily.
-
-The floating point operations are based on SoftFloat Release 2, by
-John Hauser.  SoftFloat is a software implementation of floating-point
-that conforms to the IEC/IEEE Standard for Binary Floating-point
-Arithmetic.  As many as four formats are supported: single precision,
-double precision, extended double precision, and quadruple precision.
-All operations required by the standard are implemented, except for
-conversions to and from decimal.  We use only the single precision,
-double precision and extended double precision formats.  The port of
-SoftFloat to the ARM was done by Phil Blundell, based on an earlier
-port of SoftFloat version 1 by Neil Carson for NetBSD/arm32.
-
-The file README.FPE contains a description of what has been implemented
-so far in the emulator.  The file TODO contains a information on what 
-remains to be done, and other ideas for the emulator.
-
-Bug reports, comments, suggestions should be directed to me at
-<scottb@netwinder.org>.  General reports of "this program doesn't
-work correctly when your emulator is installed" are useful for
-determining that bugs still exist; but are virtually useless when
-attempting to isolate the problem.  Please report them, but don't
-expect quick action.  Bugs still exist.  The problem remains in isolating
-which instruction contains the bug.  Small programs illustrating a specific
-problem are a godsend.
-
-Legal Notices
--------------
-
-The NetWinder Floating Point Emulator is free software.  Everything Rebel.com
-has written is provided under the GNU GPL.  See the file COPYING for copying
-conditions.  Excluded from the above is the SoftFloat code.  John Hauser's 
-legal notice for SoftFloat is included below.
-
--------------------------------------------------------------------------------
-SoftFloat Legal Notice
-
-SoftFloat was written by John R. Hauser.  This work was made possible in
-part by the International Computer Science Institute, located at Suite 600,
-1947 Center Street, Berkeley, California 94704.  Funding was partially
-provided by the National Science Foundation under grant MIP-9311980.  The
-original version of this code was written as part of a project to build
-a fixed-point vector processor in collaboration with the University of
-California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.
-
-THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
-has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
-TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
-PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
-AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
--------------------------------------------------------------------------------
diff --git a/Documentation/arm/nwfpe/README.FPE b/Documentation/arm/nwfpe/README.FPE
deleted file mode 100644
index 26f5d7bb9a41..000000000000
--- a/Documentation/arm/nwfpe/README.FPE
+++ /dev/null
@@ -1,156 +0,0 @@
-The following describes the current state of the NetWinder's floating point
-emulator.
-
-In the following nomenclature is used to describe the floating point
-instructions.  It follows the conventions in the ARM manual.
-
-<S|D|E> = <single|double|extended>, no default
-{P|M|Z} = {round to +infinity,round to -infinity,round to zero},
-          default = round to nearest
-
-Note: items enclosed in {} are optional.
-
-Floating Point Coprocessor Data Transfer Instructions (CPDT)
-------------------------------------------------------------
-
-LDF/STF - load and store floating
-
-<LDF|STF>{cond}<S|D|E> Fd, Rn
-<LDF|STF>{cond}<S|D|E> Fd, [Rn, #<expression>]{!}
-<LDF|STF>{cond}<S|D|E> Fd, [Rn], #<expression>
-
-These instructions are fully implemented.
-
-LFM/SFM - load and store multiple floating
-
-Form 1 syntax:
-<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn]
-<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn, #<expression>]{!}
-<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn], #<expression>
-
-Form 2 syntax:
-<LFM|SFM>{cond}<FD,EA> Fd, <count>, [Rn]{!}
-
-These instructions are fully implemented.  They store/load three words
-for each floating point register into the memory location given in the 
-instruction.  The format in memory is unlikely to be compatible with
-other implementations, in particular the actual hardware.  Specific
-mention of this is made in the ARM manuals.  
-
-Floating Point Coprocessor Register Transfer Instructions (CPRT)
-----------------------------------------------------------------
-
-Conversions, read/write status/control register instructions
-
-FLT{cond}<S,D,E>{P,M,Z} Fn, Rd          Convert integer to floating point
-FIX{cond}{P,M,Z} Rd, Fn                 Convert floating point to integer
-WFS{cond} Rd                            Write floating point status register
-RFS{cond} Rd                            Read floating point status register
-WFC{cond} Rd                            Write floating point control register
-RFC{cond} Rd                            Read floating point control register
-
-FLT/FIX are fully implemented.
-
-RFS/WFS are fully implemented.
-
-RFC/WFC are fully implemented.  RFC/WFC are supervisor only instructions, and
-presently check the CPU mode, and do an invalid instruction trap if not called
-from supervisor mode.
-
-Compare instructions
-
-CMF{cond} Fn, Fm        Compare floating
-CMFE{cond} Fn, Fm       Compare floating with exception
-CNF{cond} Fn, Fm        Compare negated floating
-CNFE{cond} Fn, Fm       Compare negated floating with exception
-
-These are fully implemented.
-
-Floating Point Coprocessor Data Instructions (CPDT)
----------------------------------------------------
-
-Dyadic operations:
-
-ADF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - add
-SUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - subtract
-RSF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse subtract
-MUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - multiply
-DVF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - divide
-RDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse divide
-
-These are fully implemented.
-
-FML{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast multiply
-FDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast divide
-FRD{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast reverse divide
-
-These are fully implemented as well.  They use the same algorithm as the
-non-fast versions.  Hence, in this implementation their performance is
-equivalent to the MUF/DVF/RDV instructions.  This is acceptable according
-to the ARM manual.  The manual notes these are defined only for single
-operands, on the actual FPA11 hardware they do not work for double or
-extended precision operands.  The emulator currently does not check
-the requested permissions conditions, and performs the requested operation.
-
-RMF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - IEEE remainder
-
-This is fully implemented.
-
-Monadic operations:
-
-MVF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move
-MNF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move negated
-
-These are fully implemented.
-
-ABS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - absolute value
-SQT{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - square root
-RND{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - round
-
-These are fully implemented.
-
-URD{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - unnormalized round
-NRM{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - normalize
-
-These are implemented.  URD is implemented using the same code as the RND
-instruction.  Since URD cannot return a unnormalized number, NRM becomes
-a NOP.
-
-Library calls:
-
-POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power
-RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power
-POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2)
-
-LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10
-LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e 
-EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent
-SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine
-COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine
-TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent
-ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine
-ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine
-ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent
-
-These are not implemented.  They are not currently issued by the compiler,
-and are handled by routines in libc.  These are not implemented by the FPA11
-hardware, but are handled by the floating point support code.  They should 
-be implemented in future versions.
-
-Signalling:
-
-Signals are implemented.  However current ELF kernels produced by Rebel.com
-have a bug in them that prevents the module from generating a SIGFPE.  This
-is caused by a failure to alias fp_current to the kernel variable
-current_set[0] correctly.
-
-The kernel provided with this distribution (vmlinux-nwfpe-0.93) contains
-a fix for this problem and also incorporates the current version of the
-emulator directly.  It is possible to run with no floating point module
-loaded with this kernel.  It is provided as a demonstration of the 
-technology and for those who want to do floating point work that depends
-on signals.  It is not strictly necessary to use the module.
-
-A module (either the one provided by Russell King, or the one in this 
-distribution) can be loaded to replace the functionality of the emulator
-built into the kernel.
diff --git a/Documentation/arm/nwfpe/TODO b/Documentation/arm/nwfpe/TODO
deleted file mode 100644
index 8027061b60eb..000000000000
--- a/Documentation/arm/nwfpe/TODO
+++ /dev/null
@@ -1,67 +0,0 @@
-TODO LIST
----------
-
-POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power
-RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power
-POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2)
-
-LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10
-LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e 
-EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent
-SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine
-COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine
-TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent
-ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine
-ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine
-ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent
-
-These are not implemented.  They are not currently issued by the compiler,
-and are handled by routines in libc.  These are not implemented by the FPA11
-hardware, but are handled by the floating point support code.  They should 
-be implemented in future versions.
-
-There are a couple of ways to approach the implementation of these.  One
-method would be to use accurate table methods for these routines.  I have 
-a couple of papers by S. Gal from IBM's research labs in Haifa, Israel that
-seem to promise extreme accuracy (in the order of 99.8%) and reasonable speed.
-These methods are used in GLIBC for some of the transcendental functions.
-
-Another approach, which I know little about is CORDIC.  This stands for
-Coordinate Rotation Digital Computer, and is a method of computing 
-transcendental functions using mostly shifts and adds and a few
-multiplications and divisions.  The ARM excels at shifts and adds,
-so such a method could be promising, but requires more research to 
-determine if it is feasible.
-
-Rounding Methods
-
-The IEEE standard defines 4 rounding modes.  Round to nearest is the
-default, but rounding to + or - infinity or round to zero are also allowed.
-Many architectures allow the rounding mode to be specified by modifying bits
-in a control register.  Not so with the ARM FPA11 architecture.  To change
-the rounding mode one must specify it with each instruction.
-
-This has made porting some benchmarks difficult.  It is possible to
-introduce such a capability into the emulator.  The FPCR contains 
-bits describing the rounding mode.  The emulator could be altered to 
-examine a flag, which if set forced it to ignore the rounding mode in
-the instruction, and use the mode specified in the bits in the FPCR.
-
-This would require a method of getting/setting the flag, and the bits
-in the FPCR.  This requires a kernel call in ArmLinux, as WFC/RFC are
-supervisor only instructions.  If anyone has any ideas or comments I
-would like to hear them.
-
-[NOTE: pulled out from some docs on ARM floating point, specifically
- for the Acorn FPE, but not limited to it:
-
- The floating point control register (FPCR) may only be present in some
- implementations: it is there to control the hardware in an implementation-
- specific manner, for example to disable the floating point system.  The user
- mode of the ARM is not permitted to use this register (since the right is
- reserved to alter it between implementations) and the WFC and RFC
- instructions will trap if tried in user mode.
-
- Hence, the answer is yes, you could do this, but then you will run a high
- risk of becoming isolated if and when hardware FP emulation comes out
-		-- Russell].
diff --git a/Documentation/arm/nwfpe/index.rst b/Documentation/arm/nwfpe/index.rst
new file mode 100644
index 000000000000..21fa8ce192ae
--- /dev/null
+++ b/Documentation/arm/nwfpe/index.rst
@@ -0,0 +1,11 @@
+===================================
+NetWinder's floating point emulator
+===================================
+
+.. toctree::
+   :maxdepth: 1
+
+   nwfpe
+   netwinder-fpe
+   notes
+   todo
diff --git a/Documentation/arm/nwfpe/netwinder-fpe.rst b/Documentation/arm/nwfpe/netwinder-fpe.rst
new file mode 100644
index 000000000000..cbb320960fc4
--- /dev/null
+++ b/Documentation/arm/nwfpe/netwinder-fpe.rst
@@ -0,0 +1,162 @@
+=============
+Current State
+=============
+
+The following describes the current state of the NetWinder's floating point
+emulator.
+
+The following nomenclature is used to describe the floating point
+instructions.  It follows the conventions in the ARM manual.
+
+::
+
+  <S|D|E> = <single|double|extended>, no default
+  {P|M|Z} = {round to +infinity,round to -infinity,round to zero},
+            default = round to nearest
+
+Note: items enclosed in {} are optional.
+
+Floating Point Coprocessor Data Transfer Instructions (CPDT)
+------------------------------------------------------------
+
+LDF/STF - load and store floating
+
+<LDF|STF>{cond}<S|D|E> Fd, Rn
+<LDF|STF>{cond}<S|D|E> Fd, [Rn, #<expression>]{!}
+<LDF|STF>{cond}<S|D|E> Fd, [Rn], #<expression>
+
+These instructions are fully implemented.
+
+LFM/SFM - load and store multiple floating
+
+Form 1 syntax:
+<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn]
+<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn, #<expression>]{!}
+<LFM|SFM>{cond}<S|D|E> Fd, <count>, [Rn], #<expression>
+
+Form 2 syntax:
+<LFM|SFM>{cond}<FD,EA> Fd, <count>, [Rn]{!}
+
+These instructions are fully implemented.  They store/load three words
+for each floating point register into the memory location given in the
+instruction.  The format in memory is unlikely to be compatible with
+other implementations, in particular the actual hardware.  Specific
+mention of this is made in the ARM manuals.
+
+Floating Point Coprocessor Register Transfer Instructions (CPRT)
+----------------------------------------------------------------
+
+Conversions, read/write status/control register instructions
+
+FLT{cond}<S,D,E>{P,M,Z} Fn, Rd          Convert integer to floating point
+FIX{cond}{P,M,Z} Rd, Fn                 Convert floating point to integer
+WFS{cond} Rd                            Write floating point status register
+RFS{cond} Rd                            Read floating point status register
+WFC{cond} Rd                            Write floating point control register
+RFC{cond} Rd                            Read floating point control register
+
+FLT/FIX are fully implemented.
+
+RFS/WFS are fully implemented.
+
+RFC/WFC are fully implemented.  RFC/WFC are supervisor-only instructions; the
+emulator presently checks the CPU mode and raises an invalid instruction trap
+if they are not called from supervisor mode.
+
+Compare instructions
+
+CMF{cond} Fn, Fm        Compare floating
+CMFE{cond} Fn, Fm       Compare floating with exception
+CNF{cond} Fn, Fm        Compare negated floating
+CNFE{cond} Fn, Fm       Compare negated floating with exception
+
+These are fully implemented.
+
+Floating Point Coprocessor Data Instructions (CPDO)
+---------------------------------------------------
+
+Dyadic operations:
+
+ADF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - add
+SUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - subtract
+RSF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse subtract
+MUF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - multiply
+DVF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - divide
+RDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse divide
+
+These are fully implemented.
+
+FML{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast multiply
+FDV{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast divide
+FRD{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - fast reverse divide
+
+These are fully implemented as well.  They use the same algorithm as the
+non-fast versions.  Hence, in this implementation their performance is
+equivalent to the MUF/DVF/RDV instructions.  This is acceptable according
+to the ARM manual.  The manual notes these are defined only for single
+precision operands; on the actual FPA11 hardware they do not work for double
+or extended precision operands.  The emulator currently does not check
+the requested permissions conditions, and performs the requested operation.
+
+::
+
+  RMF{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - IEEE remainder
+
+This is fully implemented.
+
+Monadic operations::
+
+  MVF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move
+  MNF{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - move negated
+
+These are fully implemented.
+
+::
+
+  ABS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - absolute value
+  SQT{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - square root
+  RND{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - round
+
+These are fully implemented.
+
+::
+
+  URD{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - unnormalized round
+  NRM{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - normalize
+
+These are implemented.  URD is implemented using the same code as the RND
+instruction.  Since URD cannot return an unnormalized number, NRM becomes
+a NOP.
+
+Library calls::
+
+  POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power
+  RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power
+  POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2)
+
+  LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10
+  LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e
+  EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent
+  SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine
+  COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine
+  TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent
+  ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine
+  ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine
+  ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent
+
+These are not implemented.  They are not currently issued by the compiler,
+and are handled by routines in libc.  These are not implemented by the FPA11
+hardware, but are handled by the floating point support code.  They should
+be implemented in future versions.
+
+Signalling:
+
+Signals are implemented.  However, current ELF kernels produced by Rebel.com
+have a bug that prevents the module from generating a SIGFPE.  This
+is caused by a failure to alias fp_current to the kernel variable
+current_set[0] correctly.
+
+The kernel provided with this distribution (vmlinux-nwfpe-0.93) contains
+a fix for this problem and also incorporates the current version of the
+emulator directly.  It is possible to run with no floating point module
+loaded with this kernel.  It is provided as a demonstration of the
+technology and for those who want to do floating point work that depends
+on signals.  It is not strictly necessary to use the module.
+
+A module (either the one provided by Russell King, or the one in this
+distribution) can be loaded to replace the functionality of the emulator
+built into the kernel.
diff --git a/Documentation/arm/nwfpe/notes.rst b/Documentation/arm/nwfpe/notes.rst
new file mode 100644
index 000000000000..102e55af8439
--- /dev/null
+++ b/Documentation/arm/nwfpe/notes.rst
@@ -0,0 +1,32 @@
+Notes
+=====
+
+There seems to be a problem with exp(double) and our emulator.  I haven't
+been able to track it down yet.  This does not occur with the emulator
+supplied by Russell King.
+
+I also found one oddity in the emulator.  I don't think it is serious but
+will point it out.  The ARM calling conventions require floating point
+registers f4-f7 to be preserved over a function call.  The compiler quite
+often uses an stfe instruction to save f4 on the stack upon entry to a
+function, and an ldfe instruction to restore it before returning.
+
+I was looking at some code that calculated a double result, stored it in f4
+then made a function call. Upon return from the function call the number in
+f4 had been converted to an extended value in the emulator.
+
+This is a side effect of the stfe instruction.  The double in f4 had to be
+converted to extended, then stored.  If an lfm/sfm combination had been used,
+then no conversion would occur.  This has performance considerations.  The
+result from the function call and f4 were used in a multiplication.  If the
+emulator sees a multiply of a double and extended, it promotes the double to
+extended, then does the multiply in extended precision.
+
+This code will cause this problem::
+
+  double x, y, z;
+  z = log(x)/log(y);
+
+The result of log(x) (a double) will be calculated, returned in f0, then
+moved to f4 to preserve it over the log(y) call.  The division will be done
+in extended precision, due to the stfe instruction used to save f4 in log(y).
diff --git a/Documentation/arm/nwfpe/nwfpe.rst b/Documentation/arm/nwfpe/nwfpe.rst
new file mode 100644
index 000000000000..35cd90dacbff
--- /dev/null
+++ b/Documentation/arm/nwfpe/nwfpe.rst
@@ -0,0 +1,74 @@
+Introduction
+============
+
+This directory contains the version 0.92 test release of the NetWinder
+Floating Point Emulator.
+
+The majority of the code was written by me, Scott Bambrough.  It is
+written in C, with a small number of routines in inline assembler
+where required.  It was written quickly, with a goal of implementing a
+working version of all the floating point instructions the compiler
+emits as the first target.  I have attempted to be as optimal as
+possible, but there remains much room for improvement.
+
+I have attempted to make the emulator as portable as possible.  One of
+the problems is with leading underscores on kernel symbols.  ELF
+kernels have no leading underscores, a.out compiled kernels do.  I
+have attempted to use the C_SYMBOL_NAME macro wherever this may be
+important.
+
+Another choice I made was in the file structure.  I have attempted to
+contain all operating system specific code in one module (fpmodule.*).
+All the other files contain emulator specific code.  This should allow
+others to port the emulator to NetBSD, for instance, relatively easily.
+
+The floating point operations are based on SoftFloat Release 2, by
+John Hauser.  SoftFloat is a software implementation of floating-point
+that conforms to the IEC/IEEE Standard for Binary Floating-point
+Arithmetic.  As many as four formats are supported: single precision,
+double precision, extended double precision, and quadruple precision.
+All operations required by the standard are implemented, except for
+conversions to and from decimal.  We use only the single precision,
+double precision and extended double precision formats.  The port of
+SoftFloat to the ARM was done by Phil Blundell, based on an earlier
+port of SoftFloat version 1 by Neil Carson for NetBSD/arm32.
+
+The file README.FPE contains a description of what has been implemented
+so far in the emulator.  The file TODO contains information on what
+remains to be done, and other ideas for the emulator.
+
+Bug reports, comments and suggestions should be directed to me at
+<scottb@netwinder.org>.  General reports of "this program doesn't
+work correctly when your emulator is installed" are useful for
+determining that bugs still exist, but are virtually useless when
+attempting to isolate the problem.  Please report them, but don't
+expect quick action.  Bugs still exist.  The problem remains in isolating
+which instruction contains the bug.  Small programs illustrating a specific
+problem are a godsend.
+
+Legal Notices
+-------------
+
+The NetWinder Floating Point Emulator is free software.  Everything Rebel.com
+has written is provided under the GNU GPL.  See the file COPYING for copying
+conditions.  Excluded from the above is the SoftFloat code.  John Hauser's
+legal notice for SoftFloat is included below.
+
+-------------------------------------------------------------------------------
+
+SoftFloat Legal Notice
+
+SoftFloat was written by John R. Hauser.  This work was made possible in
+part by the International Computer Science Institute, located at Suite 600,
+1947 Center Street, Berkeley, California 94704.  Funding was partially
+provided by the National Science Foundation under grant MIP-9311980.  The
+original version of this code was written as part of a project to build
+a fixed-point vector processor in collaboration with the University of
+California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.
+
+THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
+has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
+TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
+PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
+AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
+-------------------------------------------------------------------------------
diff --git a/Documentation/arm/nwfpe/todo.rst b/Documentation/arm/nwfpe/todo.rst
new file mode 100644
index 000000000000..393f11b14540
--- /dev/null
+++ b/Documentation/arm/nwfpe/todo.rst
@@ -0,0 +1,72 @@
+TODO LIST
+=========
+
+::
+
+  POW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - power
+  RPW{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - reverse power
+  POL{cond}<S|D|E>{P,M,Z} Fd, Fn, <Fm,#value> - polar angle (arctan2)
+
+  LOG{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base 10
+  LGN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - logarithm to base e
+  EXP{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - exponent
+  SIN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - sine
+  COS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - cosine
+  TAN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - tangent
+  ASN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arcsine
+  ACS{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arccosine
+  ATN{cond}<S|D|E>{P,M,Z} Fd, <Fm,#value> - arctangent
+
+These are not implemented.  They are not currently issued by the compiler,
+and are handled by routines in libc.  These are not implemented by the FPA11
+hardware, but are handled by the floating point support code.  They should
+be implemented in future versions.
+
+There are a couple of ways to approach the implementation of these.  One
+method would be to use accurate table methods for these routines.  I have
+a couple of papers by S. Gal from IBM's research labs in Haifa, Israel that
+seem to promise extreme accuracy (in the order of 99.8%) and reasonable speed.
+These methods are used in GLIBC for some of the transcendental functions.
+
+Another approach, which I know little about, is CORDIC.  This stands for
+Coordinate Rotation Digital Computer, and is a method of computing
+transcendental functions using mostly shifts and adds and a few
+multiplications and divisions.  The ARM excels at shifts and adds,
+so such a method could be promising, but requires more research to
+determine if it is feasible.
+
+Rounding Methods
+----------------
+
+The IEEE standard defines 4 rounding modes.  Round to nearest is the
+default, but rounding to + or - infinity or round to zero are also allowed.
+Many architectures allow the rounding mode to be specified by modifying bits
+in a control register.  Not so with the ARM FPA11 architecture.  To change
+the rounding mode one must specify it with each instruction.
+
+This has made porting some benchmarks difficult.  It is possible to
+introduce such a capability into the emulator.  The FPCR contains
+bits describing the rounding mode.  The emulator could be altered to
+examine a flag, which if set forced it to ignore the rounding mode in
+the instruction, and use the mode specified in the bits in the FPCR.
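+
+A minimal sketch of that idea follows.  It assumes a hypothetical override
+flag and assumes the rounding mode occupies the low two bits of the FPCR
+value; neither detail is taken from the emulator source, and every name
+below is illustrative only.
+
```c
#include <stdbool.h>

/* Hypothetical encoding of the four IEEE rounding modes. */
enum demo_rounding {
	DEMO_ROUND_NEAREST = 0,
	DEMO_ROUND_PLUS_INF = 1,
	DEMO_ROUND_MINUS_INF = 2,
	DEMO_ROUND_ZERO = 3,
};

/* Hypothetical emulator state: an override flag plus the FPCR value. */
struct demo_fpe_state {
	bool use_fpcr_rounding;	/* when set, ignore the per-instruction mode */
	unsigned int fpcr;	/* assumed: low two bits hold the rounding mode */
};

/* Pick the rounding mode that would be applied to one instruction. */
static enum demo_rounding demo_effective_rounding(const struct demo_fpe_state *st,
						  enum demo_rounding insn_mode)
{
	if (st->use_fpcr_rounding)
		return (enum demo_rounding)(st->fpcr & 0x3u);

	return insn_mode;	/* default FPA11 behaviour: mode comes from the opcode */
}
```
+
+With the flag clear the helper returns the mode encoded in the instruction,
+so existing binaries would behave as before.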
+
+This would require a method of getting/setting the flag, and the bits
+in the FPCR.  This requires a kernel call in ArmLinux, as WFC/RFC are
+supervisor only instructions.  If anyone has any ideas or comments I
+would like to hear them.
+
+NOTE:
+ pulled out from some docs on ARM floating point, specifically
+ for the Acorn FPE, but not limited to it:
+
+ The floating point control register (FPCR) may only be present in some
+ implementations: it is there to control the hardware in an implementation-
+ specific manner, for example to disable the floating point system.  The user
+ mode of the ARM is not permitted to use this register (since the right is
+ reserved to alter it between implementations) and the WFC and RFC
+ instructions will trap if tried in user mode.
+
+ Hence, the answer is yes, you could do this, but then you will run a high
+ risk of becoming isolated if and when hardware FP emulation comes out
+
+		-- Russell.
diff --git a/Documentation/arm/omap/dss.rst b/Documentation/arm/omap/dss.rst
new file mode 100644
index 000000000000..a40c4d9c717a
--- /dev/null
+++ b/Documentation/arm/omap/dss.rst
@@ -0,0 +1,372 @@
+=========================
+OMAP2/3 Display Subsystem
+=========================
+
+This is an almost total rewrite of the OMAP FB driver in drivers/video/omap
+(let's call it DSS1). The main differences between DSS1 and DSS2 are DSI,
+TV-out and multiple display support, but there are lots of small improvements
+also.
+
+The DSS2 driver (omapdss module) is in arch/arm/plat-omap/dss/, and the FB,
+panel and controller drivers are in drivers/video/omap2/. DSS1 and DSS2
+currently live side by side; you can choose which one to use.
+
+Features
+--------
+
+Working and tested features include:
+
+- MIPI DPI (parallel) output
+- MIPI DSI output in command mode
+- MIPI DBI (RFBI) output
+- SDI output
+- TV output
+- All pieces can be compiled as a module or inside kernel
+- Use DISPC to update any of the outputs
+- Use CPU to update RFBI or DSI output
+- OMAP DISPC planes
+- RGB16, RGB24 packed, RGB24 unpacked
+- YUV2, UYVY
+- Scaling
+- Adjusting DSS FCK to find a good pixel clock
+- Use DSI DPLL to create DSS FCK
+
+Tested boards include:
+
+- OMAP3 SDP board
+- Beagle board
+- N810
+
+omapdss driver
+--------------
+
+Unlike the current drivers, the DSS driver does not itself support the Linux
+framebuffer, V4L or similar interfaces, but it has an internal kernel API that
+upper level drivers can use.
+
+The DSS driver models OMAP's overlays, overlay managers and displays in a
+flexible way to enable non-common multi-display configuration. In addition to
+modelling the hardware overlays, omapdss supports virtual overlays and overlay
+managers. These can be used when updating a display with CPU or system DMA.
+
+omapdss driver support for audio
+--------------------------------
+There exist several display technologies and standards that support audio as
+well. Hence, it is relevant to update the DSS device driver to provide an audio
+interface that may be used by an audio driver or any other driver interested in
+the functionality.
+
+The audio_enable function is intended to prepare the relevant
+IP for playback (e.g., enabling an audio FIFO, taking in/out of reset
+some IP, enabling companion chips, etc). It is intended to be called before
+audio_start. The audio_disable function performs the reverse operation and is
+intended to be called after audio_stop.
+
+While a given DSS device driver may support audio, it is possible that for
+certain configurations audio is not supported (e.g., an HDMI display using a
+VESA video timing). The audio_supported function is intended to query whether
+the current configuration of the display supports audio.
+
+The audio_config function is intended to configure all the relevant audio
+parameters of the display. In order to make the function independent of any
+specific DSS device driver, a struct omap_dss_audio is defined. Its purpose
+is to contain all the required parameters for audio configuration. At the
+moment, such structure contains pointers to IEC-60958 channel status word
+and CEA-861 audio infoframe structures. This should be enough to support
+HDMI and DisplayPort, as both are based on CEA-861 and IEC-60958.
+
+The audio_enable/disable, audio_config and audio_supported functions could be
+implemented as functions that may sleep. Hence, they should not be called
+while holding a spinlock or a readlock.
+
+The audio_start/audio_stop functions are intended to effectively start/stop audio
+playback after the configuration has taken place. These functions are designed
+to be used in an atomic context. Hence, audio_start should return quickly and be
+called only after all the needed resources for audio playback (audio FIFOs,
+DMA channels, companion chips, etc) have been enabled to begin data transfers.
+audio_stop is designed to only stop the audio transfers. The resources used
+for playback are released using audio_disable.
+
+The enum omap_dss_audio_state may be used to help the implementations of
+the interface to keep track of the audio state. The initial state is _DISABLED;
+then, the state transitions to _CONFIGURED, and then, when it is ready to
+play audio, to _ENABLED. The state _PLAYING is used when the audio is being
+rendered.
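+
+The state sequence described above can be sketched as a small transition
+function.  The enum and function names are hypothetical stand-ins (the real
+enum omap_dss_audio_state is not reproduced here), and the transition taken
+on audio_disable is an assumption, since the text only names the forward
+path.
+
```c
#include <stdbool.h>

/* Hypothetical mirror of the four states named in the text. */
enum demo_audio_state {
	DEMO_AUDIO_DISABLED,
	DEMO_AUDIO_CONFIGURED,
	DEMO_AUDIO_ENABLED,
	DEMO_AUDIO_PLAYING,
};

/* One value per interface call that changes the state. */
enum demo_audio_op {
	DEMO_OP_CONFIG,
	DEMO_OP_ENABLE,
	DEMO_OP_START,
	DEMO_OP_STOP,
	DEMO_OP_DISABLE,
};

/* Apply one call; returns false (state unchanged) if it is out of order. */
static bool demo_audio_step(enum demo_audio_state *state, enum demo_audio_op op)
{
	switch (op) {
	case DEMO_OP_CONFIG:
		if (*state != DEMO_AUDIO_DISABLED)
			return false;
		*state = DEMO_AUDIO_CONFIGURED;
		return true;
	case DEMO_OP_ENABLE:
		if (*state != DEMO_AUDIO_CONFIGURED)
			return false;
		*state = DEMO_AUDIO_ENABLED;
		return true;
	case DEMO_OP_START:
		if (*state != DEMO_AUDIO_ENABLED)
			return false;
		*state = DEMO_AUDIO_PLAYING;
		return true;
	case DEMO_OP_STOP:
		if (*state != DEMO_AUDIO_PLAYING)
			return false;
		*state = DEMO_AUDIO_ENABLED;
		return true;
	case DEMO_OP_DISABLE:
		/* Assumption: disabling returns to the initial state. */
		if (*state != DEMO_AUDIO_ENABLED)
			return false;
		*state = DEMO_AUDIO_DISABLED;
		return true;
	}
	return false;
}
```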
+
+
+Panel and controller drivers
+----------------------------
+
+The drivers implement panel or controller specific functionality and are not
+usually visible to users except through omapfb driver.  They register
+themselves to the DSS driver.
+
+omapfb driver
+-------------
+
+The omapfb driver implements an arbitrary number of standard Linux framebuffers.
+These framebuffers can be routed flexibly to any overlays, thus allowing very
+dynamic display architecture.
+
+The driver exports some omapfb specific ioctls, which are compatible with the
+ioctls in the old driver.
+
+The rest of the non standard features are exported via sysfs. Whether the final
+implementation will use sysfs, or ioctls, is still open.
+
+V4L2 drivers
+------------
+
+V4L2 is being implemented in TI.
+
+From the omapdss point of view, the V4L2 drivers should be similar to the
+framebuffer driver.
+
+Architecture
+--------------------
+
+Some clarification of what the different components do:
+
+    - Framebuffer is a memory area inside OMAP's SRAM/SDRAM that contains the
+      pixel data for the image. Framebuffer has width and height and color
+      depth.
+    - Overlay defines where the pixels are read from and where they go on the
+      screen. The overlay may be smaller than framebuffer, thus displaying only
+      part of the framebuffer. The position of the overlay may be changed if
+      the overlay is smaller than the display.
+    - Overlay manager combines the overlays into one image and feeds it to
+      the display.
+    - Display is the actual physical display device.
+
+A framebuffer can be connected to multiple overlays to show the same pixel data
+on all of the overlays. Note that in this case the overlay input sizes must be
+the same, but, in case of video overlays, the output size can be different. Any
+framebuffer can be connected to any overlay.
+
+An overlay can be connected to one overlay manager. Also DISPC overlays can be
+connected only to DISPC overlay managers, and virtual overlays can only be
+connected to virtual overlay managers.
+
+An overlay manager can be connected to one display. There are certain
+restrictions on which kinds of displays an overlay manager can be connected to:
+
+    - DISPC TV overlay manager can be only connected to TV display.
+    - Virtual overlay managers can only be connected to DBI or DSI displays.
+    - DISPC LCD overlay manager can be connected to all displays, except TV
+      display.
+
+Sysfs
+-----
+The sysfs interface is mainly used for testing. I don't think sysfs
+interface is the best for this in the final version, but I don't quite know
+what would be the best interfaces for these things.
+
+The sysfs interface is divided into two parts: DSS and FB.
+
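+/sys/class/graphics/fb? directory:
+
+=============== =============================================================
+mirror		0=off, 1=on
+rotate		Rotation 0-3 for 0, 90, 180, 270 degrees
+rotate_type	0 = DMA rotation, 1 = VRFB rotation
+overlays	List of overlay numbers to which framebuffer pixels go
+phys_addr	Physical address of the framebuffer
+virt_addr	Virtual address of the framebuffer
+size		Size of the framebuffer
+=============== =============================================================
+
+/sys/devices/platform/omapdss/overlay? directory:
+
+=============== =============================================================
+enabled		0=off, 1=on
+input_size	width,height (ie. the framebuffer size)
+manager		Destination overlay manager name
+name
+output_size	width,height
+position	x,y
+screen_width	width
+global_alpha	global alpha 0-255 0=transparent 255=opaque
+=============== =============================================================
+
+/sys/devices/platform/omapdss/manager? directory:
+
+=============================== =============================================
+display				Destination display
+name
+alpha_blending_enabled		0=off, 1=on
+trans_key_enabled		0=off, 1=on
+trans_key_type			gfx-destination, video-source
+trans_key_value			transparency color key (RGB24)
+default_color			default background color (RGB24)
+=============================== =============================================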
+
+/sys/devices/platform/omapdss/display? directory:
+
+=============== =============================================================
+ctrl_name	Controller name
+mirror		0=off, 1=on
+update_mode	0=off, 1=auto, 2=manual
+enabled		0=off, 1=on
+name
+rotate		Rotation 0-3 for 0, 90, 180, 270 degrees
+timings		Display timings (pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw)
+		When writing, two special timings are accepted for tv-out:
+		"pal" and "ntsc"
+panel_name
+tear_elim	Tearing elimination 0=off, 1=on
+output_type	Output type (video encoder only): "composite" or "svideo"
+=============== =============================================================
+
+There are also some debugfs files at <debugfs>/omapdss/ which show information
+about clocks and registers.
+
+Examples
+--------
+
+The following definitions have been made for the examples below::
+
+	ovl0=/sys/devices/platform/omapdss/overlay0
+	ovl1=/sys/devices/platform/omapdss/overlay1
+	ovl2=/sys/devices/platform/omapdss/overlay2
+
+	mgr0=/sys/devices/platform/omapdss/manager0
+	mgr1=/sys/devices/platform/omapdss/manager1
+
+	lcd=/sys/devices/platform/omapdss/display0
+	dvi=/sys/devices/platform/omapdss/display1
+	tv=/sys/devices/platform/omapdss/display2
+
+	fb0=/sys/class/graphics/fb0
+	fb1=/sys/class/graphics/fb1
+	fb2=/sys/class/graphics/fb2
+
+Default setup on OMAP3 SDP
+--------------------------
+
+Here's the default setup on OMAP3 SDP board. All planes go to LCD. DVI
+and TV-out are not in use. The columns from left to right are:
+framebuffers, overlays, overlay managers, displays. Framebuffers are
+handled by omapfb, and the rest by the DSS::
+
+	FB0 --- GFX  -\            DVI
+	FB1 --- VID1 --+- LCD ---- LCD
+	FB2 --- VID2 -/   TV ----- TV
+
+Example: Switch from LCD to DVI
+-------------------------------
+
+::
+
+	w=`cat $dvi/timings | cut -d "," -f 2 | cut -d "/" -f 1`
+	h=`cat $dvi/timings | cut -d "," -f 3 | cut -d "/" -f 1`
+
+	echo "0" > $lcd/enabled
+	echo "" > $mgr0/display
+	fbset -fb /dev/fb0 -xres $w -yres $h -vxres $w -vyres $h
+	# at this point you have to switch the dvi/lcd dip-switch on the omap board
+	echo "dvi" > $mgr0/display
+	echo "1" > $dvi/enabled
+
+After this the configuration looks like::
+
+	FB0 --- GFX  -\         -- DVI
+	FB1 --- VID1 --+- LCD -/   LCD
+	FB2 --- VID2 -/   TV ----- TV
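+
+The w= and h= lines in the script above extract xres and yres from the
+timings file, whose layout is pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw.
+As a hedged sketch, the same extraction could be done in C with sscanf();
+the helper name and any sample input are hypothetical.
+
```c
#include <stdio.h>

/*
 * Parse "pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw" and return the
 * horizontal and vertical resolution.  Returns 0 on success, -1 if the
 * string does not match the expected layout.
 */
static int demo_parse_res(const char *timings,
			  unsigned int *xres, unsigned int *yres)
{
	unsigned int pixclock;

	/* %*u skips hfp, hbp and hsw; parsing stops once yres is read. */
	if (sscanf(timings, "%u,%u/%*u/%*u/%*u,%u", &pixclock, xres, yres) != 3)
		return -1;

	return 0;
}
```
+
+The vertical porch and sync fields after yres are deliberately left
+unparsed, matching what the cut pipeline keeps.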
+
+Example: Clone GFX overlay to LCD and TV
+----------------------------------------
+
+::
+
+	w=`cat $tv/timings | cut -d "," -f 2 | cut -d "/" -f 1`
+	h=`cat $tv/timings | cut -d "," -f 3 | cut -d "/" -f 1`
+
+	echo "0" > $ovl0/enabled
+	echo "0" > $ovl1/enabled
+
+	echo "" > $fb1/overlays
+	echo "0,1" > $fb0/overlays
+
+	echo "$w,$h" > $ovl1/output_size
+	echo "tv" > $ovl1/manager
+
+	echo "1" > $ovl0/enabled
+	echo "1" > $ovl1/enabled
+
+	echo "1" > $tv/enabled
+
+After this the configuration looks like (only relevant parts shown)::
+
+	FB0 +-- GFX  ---- LCD ---- LCD
+	     \- VID1 ---- TV  ---- TV
+
+Misc notes
+----------
+
+OMAP FB allocates the framebuffer memory using the standard dma allocator. You
+can enable Contiguous Memory Allocator (CONFIG_CMA) to improve the dma
+allocator, and if CMA is enabled, you can use the "cma=" kernel parameter to
+increase the global memory area for CMA.
+
+Using the DSI DPLL to generate the pixel clock, it is possible to produce a
+pixel clock of 86.5MHz (max possible), and with that you get 1280x1024@57
+output from DVI.
+
+Rotation and mirroring currently only support RGB565 and RGB8888 modes. VRFB
+does not support mirroring.
+
+VRFB rotation requires much more memory than non-rotated framebuffer, so you
+probably need to increase your vram setting before using VRFB rotation. Also,
+many applications may not work with VRFB if they do not pay attention to all
+framebuffer parameters.
+
+Kernel boot arguments
+---------------------
+
+omapfb.mode=<display>:<mode>[,...]
+	- Default video mode for specified displays. For example,
+	  "dvi:800x400MR-24@60".  See drivers/video/modedb.c.
+	  There are also two special modes: "pal" and "ntsc" that
+	  can be used for TV out.
+
+omapfb.vram=<fbnum>:<size>[@<physaddr>][,...]
+	- VRAM allocated for a framebuffer. Normally omapfb allocates vram
+	  depending on the display size. With this you can manually allocate
+	  more or define the physical address of each framebuffer. For example,
+	  "1:4M" to allocate 4M for fb1.
+
+omapfb.debug=<y|n>
+	- Enable debug printing. You have to have OMAPFB debug support enabled
+	  in kernel config.
+
+omapfb.test=<y|n>
+	- Draw test pattern to framebuffer whenever framebuffer settings change.
+	  You need to have OMAPFB debug support enabled in kernel config.
+
+omapfb.vrfb=<y|n>
+	- Use VRFB rotation for all framebuffers.
+
+omapfb.rotate=<angle>
+	- Default rotation applied to all framebuffers.
+	  0 - 0 degree rotation
+	  1 - 90 degree rotation
+	  2 - 180 degree rotation
+	  3 - 270 degree rotation
+
+omapfb.mirror=<y|n>
+	- Default mirror for all framebuffers. Only works with DMA rotation.
+
+omapdss.def_disp=<display>
+	- Name of default display, to which all overlays will be connected.
+	  Common examples are "lcd" or "tv".
+
+omapdss.debug=<y|n>
+	- Enable debug printing. You have to have DSS debug support enabled in
+	  kernel config.
+
+TODO
+----
+
+DSS locking
+
+Error checking
+
+- Lots of checks are missing or implemented just as BUG()
+
+System DMA update for DSI
+
+- Can be used for RGB16 and RGB24P modes. Probably not for RGB24U (how
+  to skip the empty byte?)
+
+OMAP1 support
+
+- Not sure if needed
diff --git a/Documentation/arm/omap/index.rst b/Documentation/arm/omap/index.rst
new file mode 100644
index 000000000000..f1e9c11d9f9b
--- /dev/null
+++ b/Documentation/arm/omap/index.rst
@@ -0,0 +1,10 @@
+=======
+TI OMAP
+=======
+
+.. toctree::
+   :maxdepth: 1
+
+   omap
+   omap_pm
+   dss
diff --git a/Documentation/arm/omap/omap.rst b/Documentation/arm/omap/omap.rst
new file mode 100644
index 000000000000..f440c0f4613f
--- /dev/null
+++ b/Documentation/arm/omap/omap.rst
@@ -0,0 +1,18 @@
+============
+OMAP history
+============
+
+This file contains documentation for running a mainline
+kernel on OMAPs.
+
+======		======================================================
+KERNEL		NEW DEPENDENCIES
+======		======================================================
+v4.3+		Update is needed for custom .config files to make sure
+		CONFIG_REGULATOR_PBIAS is enabled for MMC1 to work
+		properly.
+
+v4.18+		Update is needed for custom .config files to make sure
+		CONFIG_MMC_SDHCI_OMAP is enabled for all MMC instances
+		to work in DRA7 and K2G based boards.
+======		======================================================
diff --git a/Documentation/arm/omap/omap_pm.rst b/Documentation/arm/omap/omap_pm.rst
new file mode 100644
index 000000000000..a335e4c8ce2c
--- /dev/null
+++ b/Documentation/arm/omap/omap_pm.rst
@@ -0,0 +1,165 @@
+=====================
+The OMAP PM interface
+=====================
+
+This document describes the temporary OMAP PM interface.  Driver
+authors use these functions to communicate minimum latency or
+throughput constraints to the kernel power management code.
+Over time, the intention is to merge features from the OMAP PM
+interface into the Linux PM QoS code.
+
+Drivers need to express PM parameters which:
+
+- support the range of power management parameters present in the TI SRF;
+
+- separate the drivers from the underlying PM parameter
+  implementation, whether it is the TI SRF or Linux PM QoS or Linux
+  latency framework or something else;
+
+- specify PM parameters in terms of fundamental units, such as
+  latency and throughput, rather than units which are specific to OMAP
+  or to particular OMAP variants;
+
+- allow drivers which are shared with other architectures (e.g.,
+  DaVinci) to add these constraints in a way which won't affect non-OMAP
+  systems;
+
+- can be implemented immediately with minimal disruption of other
+  architectures.
+
+
+This document proposes the OMAP PM interface, including the following
+five power management functions for driver code:
+
+1. Set the maximum MPU wakeup latency::
+
+   (*pdata->set_max_mpu_wakeup_lat)(struct device *dev, unsigned long t)
+
+2. Set the maximum device wakeup latency::
+
+   (*pdata->set_max_dev_wakeup_lat)(struct device *dev, unsigned long t)
+
+3. Set the maximum system DMA transfer start latency (CORE pwrdm)::
+
+   (*pdata->set_max_sdma_lat)(struct device *dev, long t)
+
+4. Set the minimum bus throughput needed by a device::
+
+   (*pdata->set_min_bus_tput)(struct device *dev, u8 agent_id, unsigned long r)
+
+5. Return the number of times the device has lost context::
+
+   (*pdata->get_dev_context_loss_count)(struct device *dev)
+
+
+Further documentation for all OMAP PM interface functions can be
+found in arch/arm/plat-omap/include/mach/omap-pm.h.
+
+
+The OMAP PM layer is intended to be temporary
+---------------------------------------------
+
+The intention is that eventually the Linux PM QoS layer should support
+the range of power management features present in OMAP3.  As this
+happens, existing drivers using the OMAP PM interface can be modified
+to use the Linux PM QoS code; and the OMAP PM interface can disappear.
+
+
+Driver usage of the OMAP PM functions
+-------------------------------------
+
+As the 'pdata' in the above examples indicates, these functions are
+exposed to drivers through function pointers in driver .platform_data
+structures.  The function pointers are initialized by the `board-*.c`
+files to point to the corresponding OMAP PM functions:
+
+- set_max_dev_wakeup_lat will point to
+  omap_pm_set_max_dev_wakeup_lat(), etc.  Other architectures which do
+  not support these functions should leave these function pointers set
+  to NULL.  Drivers should use the following idiom::
+
+        if (pdata->set_max_dev_wakeup_lat)
+            (*pdata->set_max_dev_wakeup_lat)(dev, t);
+
+The most common usage of these functions will probably be to specify
+the maximum time from when an interrupt occurs, to when the device
+becomes accessible.  To accomplish this, driver writers should use the
+set_max_mpu_wakeup_lat() function to constrain the MPU wakeup
+latency, and the set_max_dev_wakeup_lat() function to constrain the
+device wakeup latency (from clk_enable() to accessibility).  For
+example::
+
+        /* Limit MPU wakeup latency */
+        if (pdata->set_max_mpu_wakeup_lat)
+            (*pdata->set_max_mpu_wakeup_lat)(dev, tc);
+
+        /* Limit device powerdomain wakeup latency */
+        if (pdata->set_max_dev_wakeup_lat)
+            (*pdata->set_max_dev_wakeup_lat)(dev, td);
+
+        /* total wakeup latency in this example: (tc + td) */
+
+The PM parameters can be overwritten by calling the function again
+with the new value.  The settings can be removed by calling the
+function with a t argument of -1 (except in the case of
+set_min_bus_tput(), which should be called with an r argument of 0).
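+
+A minimal sketch of setting, overwriting and then removing a constraint,
+using mock types so that it compiles outside the kernel.  The struct layouts
+and demo_* names are hypothetical, and the mock callback takes a signed long
+so that -1 can be passed directly.
+
```c
/* Stand-in for the kernel's struct device; records the active constraint. */
struct device {
	long max_mpu_lat;	/* -1 means "no constraint" */
};

/* Hypothetical platform data carrying one OMAP PM function pointer. */
struct demo_pdata {
	void (*set_max_mpu_wakeup_lat)(struct device *dev, long t);
};

/* Mock backend: simply remember the most recent request. */
static void demo_set_max_mpu_wakeup_lat(struct device *dev, long t)
{
	dev->max_mpu_lat = t;
}

/* Driver-side helper using the NULL-check idiom from the text. */
static void demo_constrain(const struct demo_pdata *pdata,
			   struct device *dev, long t)
{
	if (pdata->set_max_mpu_wakeup_lat)
		(*pdata->set_max_mpu_wakeup_lat)(dev, t);
}
```
+
+Calling demo_constrain() with 100, then 50, then -1 leaves the mock device
+with no constraint; with a NULL function pointer every call is a no-op.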
+
+The fifth function above, omap_pm_get_dev_context_loss_count(),
+is intended as an optimization to allow drivers to determine whether the
+device has lost its internal context.  If context has been lost, the
+driver must restore its internal context before proceeding.
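+
+One plausible way to use the count, sketched with mock state: remember the
+value seen when the context was last known to be good and restore only when
+it has changed.  This comparison scheme and the demo_* names are assumptions
+made for illustration, not taken from the interface header.
+
```c
/* Hypothetical per-device driver state. */
struct demo_drv {
	int last_loss_count;	/* count seen when context was last known good */
	int restores;		/* how many times the context was restored */
};

/*
 * Call before touching the device.  current_count stands in for the value
 * obtained through pdata->get_dev_context_loss_count(dev).
 * Returns 1 if the context had to be restored, 0 otherwise.
 */
static int demo_check_context(struct demo_drv *drv, int current_count)
{
	if (current_count == drv->last_loss_count)
		return 0;		/* nothing lost, skip the restore */

	drv->restores++;		/* placeholder for reprogramming registers */
	drv->last_loss_count = current_count;
	return 1;
}
```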
+
+
+Other specialized interface functions
+-------------------------------------
+
+The five functions listed above are intended to be usable by any
+device driver.  DSPBridge and CPUFreq have a few special requirements.
+DSPBridge expresses target DSP performance levels in terms of OPP IDs.
+CPUFreq expresses target MPU performance levels in terms of MPU
+frequency.  The OMAP PM interface contains functions for these
+specialized cases to convert that input information (OPPs/MPU
+frequency) into the form that the underlying power management
+implementation needs:
+
+6. ``(*pdata->dsp_get_opp_table)(void)``
+
+7. ``(*pdata->dsp_set_min_opp)(u8 opp_id)``
+
+8. ``(*pdata->dsp_get_opp)(void)``
+
+9. ``(*pdata->cpu_get_freq_table)(void)``
+
+10. ``(*pdata->cpu_set_freq)(unsigned long f)``
+
+11. ``(*pdata->cpu_get_freq)(void)``
+
+Customizing OPP for platform
+============================
+Defining CONFIG_PM should enable the OPP layer for the silicon,
+and registration of the OPP table should take place automatically.
+However, in special cases, the default OPP table may need to be
+tweaked, e.g. to:
+
+ * Enable OPPs that are disabled by default but could be enabled on a platform
+ * Disable an unsupported OPP on the platform
+ * Define and add a custom OPP table entry
+
+In these cases, the board file needs to do additional steps as follows:
+
+arch/arm/mach-omapx/board-xyz.c::
+
+	#include "pm.h"
+	....
+	static void __init omap_xyz_init_irq(void)
+	{
+		....
+		/* Initialize the default table */
+		omapx_opp_init();
+		/* Do customization to the defaults */
+		....
+	}
+
+NOTE:
+  omapx_opp_init() will be omap3_opp_init() or another variant,
+  as required by the OMAP family.
diff --git a/Documentation/arm/porting.rst b/Documentation/arm/porting.rst
new file mode 100644
index 000000000000..bd21958bdb2d
--- /dev/null
+++ b/Documentation/arm/porting.rst
@@ -0,0 +1,137 @@
+=======
+Porting
+=======
+
+Taken from list archive at http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2001-July/004064.html
+
+Initial definitions
+-------------------
+
+The following symbol definitions rely on you knowing the translation that
+__virt_to_phys() does for your machine.  This macro converts the passed
+virtual address to a physical address.  Normally, it is simply::
+
+		phys = virt - PAGE_OFFSET + PHYS_OFFSET
+
+
+Decompressor Symbols
+--------------------
+
+ZTEXTADDR
+	Start address of decompressor.  There's no point in talking about
+	virtual or physical addresses here, since the MMU will be off at
+	the time when you call the decompressor code.  You normally call
+	the kernel at this address to start it booting.  This doesn't have
+	to be located in RAM; it can be in flash or any other read-only or
+	read-write addressable medium.
+
+ZBSSADDR
+	Start address of zero-initialised work area for the decompressor.
+	This must be pointing at RAM.  The decompressor will zero initialise
+	this for you.  Again, the MMU will be off.
+
+ZRELADDR
+	This is the address where the decompressed kernel will be written,
+	and eventually executed.  The following constraint must be valid::
+
+		__virt_to_phys(TEXTADDR) == ZRELADDR
+
+	The initial part of the kernel is carefully coded to be position
+	independent.
+
+INITRD_PHYS
+	Physical address to place the initial RAM disk.  Only relevant if
+	you are using the bootpImage stuff (which only works on the old
+	struct param_struct).
+
+INITRD_VIRT
+	Virtual address of the initial RAM disk.  The following constraint
+	must be valid::
+
+		__virt_to_phys(INITRD_VIRT) == INITRD_PHYS
+
+PARAMS_PHYS
+	Physical address of the struct param_struct or tag list, giving the
+	kernel various parameters about its execution environment.
+
+
+Kernel Symbols
+--------------
+
+PHYS_OFFSET
+	Physical start address of the first bank of RAM.
+
+PAGE_OFFSET
+	Virtual start address of the first bank of RAM.  During the kernel
+	boot phase, virtual address PAGE_OFFSET will be mapped to physical
+	address PHYS_OFFSET, along with any other mappings you supply.
+	This should be the same value as TASK_SIZE.
+
+TASK_SIZE
+	The maximum size of a user process in bytes.  Since user space
+	always starts at zero, this is the maximum address that a user
+	process can access+1.  The user space stack grows down from this
+	address.
+
+	Any virtual address below TASK_SIZE is deemed to be user process
+	area, and therefore managed dynamically on a process by process
+	basis by the kernel.  I'll call this the user segment.
+
+	Anything above TASK_SIZE is common to all processes.  I'll call
+	this the kernel segment.
+
+	(In other words, you can't put IO mappings below TASK_SIZE, and
+	hence PAGE_OFFSET).
+
+TEXTADDR
+	Virtual start address of kernel, normally PAGE_OFFSET + 0x8000.
+	This is where the kernel image ends up.  With the latest kernels,
+	it must be located at 32768 bytes into a 128MB region.  Previous
+	kernels placed a restriction of 256MB here.
+
+DATAADDR
+	Virtual address for the kernel data segment.  Must not be defined
+	when using the decompressor.
+
+VMALLOC_START / VMALLOC_END
+	Virtual addresses bounding the vmalloc() area.  There must not be
+	any static mappings in this area; vmalloc will overwrite them.
+	The addresses must also be in the kernel segment (see above).
+	Normally, the vmalloc() area starts VMALLOC_OFFSET bytes above the
+	last virtual RAM address (found using variable high_memory).
+
+VMALLOC_OFFSET
+	Offset normally incorporated into VMALLOC_START to provide a hole
+	between virtual RAM and the vmalloc area.  We do this to allow
+	out of bounds memory accesses (eg, something writing off the end
+	of the mapped memory map) to be caught.  Normally set to 8MB.
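+
+As a rough illustration of how the symbols above relate, the following
+user-space C sketch applies the usual translation and derives ZRELADDR
+from TEXTADDR.  The PAGE_OFFSET and PHYS_OFFSET values are assumptions
+chosen for the example, not values for any particular machine, and
+example_virt_to_phys() is a hypothetical stand-in for __virt_to_phys()::
+
+	#include <stdio.h>
+
+	#define PAGE_OFFSET	0xc0000000UL	/* assumed for this example */
+	#define PHYS_OFFSET	0x10000000UL	/* assumed for this example */
+	#define TEXTADDR	(PAGE_OFFSET + 0x8000)
+
+	/* hypothetical stand-in for the machine's __virt_to_phys() */
+	static unsigned long example_virt_to_phys(unsigned long virt)
+	{
+		return virt - PAGE_OFFSET + PHYS_OFFSET;
+	}
+
+	int main(void)
+	{
+		/* ZRELADDR must satisfy __virt_to_phys(TEXTADDR) == ZRELADDR */
+		unsigned long zreladdr = example_virt_to_phys(TEXTADDR);
+
+		printf("TEXTADDR = 0x%08lx\n", TEXTADDR);
+		printf("ZRELADDR = 0x%08lx\n", zreladdr);
+		return 0;
+	}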
+
+Architecture Specific Macros
+----------------------------
+
+BOOT_MEM(pram,pio,vio)
+	``pram`` specifies the physical start address of RAM.  Must always
+	be present, and should be the same as PHYS_OFFSET.
+
+	``pio`` is the physical address of an 8MB region containing IO for
+	use with the debugging macros in arch/arm/kernel/debug-armv.S.
+
+	``vio`` is the virtual address of the 8MB debugging region.
+
+	It is expected that the debugging region will be re-initialised
+	by the architecture specific code later in the code (via the
+	MAPIO function).
+
+BOOT_PARAMS
+	Same as PARAMS_PHYS; see above.
+
+FIXUP(func)
+	Machine specific fixups, run before memory subsystems have been
+	initialised.
+
+MAPIO(func)
+	Machine specific function to map IO areas (including the debug
+	region above).
+
+INITIRQ(func)
+	Machine specific function to initialise interrupts.
diff --git a/Documentation/arm/pxa/mfp.rst b/Documentation/arm/pxa/mfp.rst
new file mode 100644
index 000000000000..ac34e5d7ee44
--- /dev/null
+++ b/Documentation/arm/pxa/mfp.rst
@@ -0,0 +1,288 @@
+==============================================
+MFP Configuration for PXA2xx/PXA3xx Processors
+==============================================
+
+			Eric Miao <eric.miao@marvell.com>
+
+MFP stands for Multi-Function Pin, which is the pin-mux logic on PXA3xx and
+later PXA series processors.  This document describes the existing MFP API,
+and how board/platform driver authors could make use of it.
+
+Basic Concept
+=============
+
+Unlike the GPIO alternate function settings on PXA25x and PXA27x, a new MFP
+mechanism is introduced from PXA3xx to completely move the pin-mux functions
+out of the GPIO controller. In addition to pin-mux configurations, the MFP
+also controls the low power state, driving strength, pull-up/down and event
+detection of each pin.  Below is a diagram of internal connections between
+the MFP logic and the remaining SoC peripherals::
+
+ +--------+
+ |        |--(GPIO19)--+
+ |  GPIO  |            |
+ |        |--(GPIO...) |
+ +--------+            |
+                       |       +---------+
+ +--------+            +------>|         |
+ |  PWM2  |--(PWM_OUT)-------->|   MFP   |
+ +--------+            +------>|         |-------> to external PAD
+                       | +---->|         |
+ +--------+            | | +-->|         |
+ |  SSP2  |---(TXD)----+ | |   +---------+
+ +--------+              | |
+                         | |
+ +--------+              | |
+ | Keypad |--(MKOUT4)----+ |
+ +--------+                |
+                           |
+ +--------+                |
+ |  UART2 |---(TXD)--------+
+ +--------+
+
+NOTE: although the external pad is named MFP_PIN_GPIO19, that does not
+mean it is dedicated to GPIO19; the name is only a hint that internally
+this pin can be routed from GPIO19 of the GPIO controller.
+
+To better understand the change from PXA25x/PXA27x GPIO alternate function
+to this new MFP mechanism, here are several key points:
+
+  1. The GPIO controller on PXA3xx is now a dedicated controller, like other
+     internal controllers such as PWM, SSP and UART, with 128 internal
+     signals which can be routed to external pads through one or more MFPs
+     (e.g. GPIO<0> can be routed through either MFP_PIN_GPIO0 or
+     MFP_PIN_GPIO0_2, see arch/arm/mach-pxa/mfp-pxa300.h)
+
+  2. Alternate function configuration is removed from this GPIO controller;
+     the remaining functions are purely GPIO-specific, i.e.
+
+       - GPIO signal level control
+       - GPIO direction control
+       - GPIO level change detection
+
+  3. Low power state for each pin is now controlled by MFP, which means the
+     PGSRx registers used on PXA2xx serve no purpose on PXA3xx
+
+  4. Wakeup detection is now controlled by MFP; PWER no longer controls
+     the wakeup from GPIO(s).  Depending on the sleeping state, ADxER
+     (as defined in pxa3xx-regs.h) controls the wakeup from MFP
+
+NOTE: with such a clear separation of MFP and GPIO, by GPIO<xx> we normally
+mean it is a GPIO signal, and by MFP<xxx> or pin xxx, we mean a physical
+pad (or ball).
+
+MFP API Usage
+=============
+
+For board code writers, here are some guidelines:
+
+1. include ONE of the following header files in your <board>.c:
+
+   - #include "mfp-pxa25x.h"
+   - #include "mfp-pxa27x.h"
+   - #include "mfp-pxa300.h"
+   - #include "mfp-pxa320.h"
+   - #include "mfp-pxa930.h"
+
+   NOTE: include only one file in your <board>.c, depending on the processor
+   used, because pin configuration definitions may conflict in these files
+   (i.e. same name, different meaning and settings on different processors).
+   E.g. for the zylonite platform, which supports both PXA300/PXA310 and
+   PXA320, two separate files are introduced: zylonite_pxa300.c and
+   zylonite_pxa320.c (in addition to handling MFP configuration differences,
+   they also handle the other differences between the two combinations).
+
+   NOTE: PXA300 and PXA310 are almost identical in pin configurations (with
+   PXA310 supporting some additional ones), thus the difference is actually
+   covered in a single mfp-pxa300.h.
+
+2. prepare an array for the initial pin configurations, e.g.::
+
+     static unsigned long mainstone_pin_config[] __initdata = {
+	/* Chip Select */
+	GPIO15_nCS_1,
+
+	/* LCD - 16bpp Active TFT */
+	GPIOxx_TFT_LCD_16BPP,
+	GPIO16_PWM0_OUT,	/* Backlight */
+
+	/* MMC */
+	GPIO32_MMC_CLK,
+	GPIO112_MMC_CMD,
+	GPIO92_MMC_DAT_0,
+	GPIO109_MMC_DAT_1,
+	GPIO110_MMC_DAT_2,
+	GPIO111_MMC_DAT_3,
+
+	...
+
+	/* GPIO */
+	GPIO1_GPIO | WAKEUP_ON_EDGE_BOTH,
+     };
+
+   a) once the pin configurations are passed to pxa{2xx,3xx}_mfp_config(),
+   and written to the actual registers, they are no longer needed and may
+   be discarded; adding '__initdata' will help save some additional bytes.
+
+   b) when there is only one possible pin configuration for a component,
+   some simplified definitions can be used, e.g. GPIOxx_TFT_LCD_16BPP on
+   PXA25x and PXA27x processors
+
+   c) if by board design, a pin can be configured to wake up the system
+   from low power state, it can be 'OR'ed with any of:
+
+      - WAKEUP_ON_EDGE_BOTH
+      - WAKEUP_ON_EDGE_RISE
+      - WAKEUP_ON_EDGE_FALL
+      - WAKEUP_ON_LEVEL_HIGH (specifically for enabling of keypad GPIOs)
+
+   to indicate that this pin is capable of waking up the system, and on
+   which edge(s). This, however, doesn't necessarily mean the pin *will*
+   wake up the system; it does so only when set_irq_wake() is invoked with
+   the corresponding GPIO IRQ (GPIO_IRQ(xx) or gpio_to_irq()), which
+   eventually calls gpio_set_wake() for the actual register setting.
+
+   d) although PXA3xx MFP supports edge detection on each pin, the
+   internal logic will only wake up the system when those specific bits
+   in ADxER registers are set, which can be well mapped to the
+   corresponding peripheral, thus set_irq_wake() can be called with
+   the peripheral IRQ to enable the wakeup.
+
+
+MFP on PXA3xx
+=============
+
+Every external I/O pad on PXA3xx (excluding those for special purpose) has
+one associated MFP logic block, controlled by one MFP register (MFPR).
+
+The MFPR has the following bit definitions (for PXA300/PXA310/PXA320)::
+
+ 31                        16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+  +-------------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+  |         RESERVED        |PS|PU|PD|  DRIVE |SS|SD|SO|EC|EF|ER|--| AF_SEL |
+  +-------------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+
+  Bit 3:   RESERVED
+  Bit 4:   EDGE_RISE_EN - enable detection of rising edge on this pin
+  Bit 5:   EDGE_FALL_EN - enable detection of falling edge on this pin
+  Bit 6:   EDGE_CLEAR   - disable edge detection on this pin
+  Bit 7:   SLEEP_OE_N   - enable outputs during low power modes
+  Bit 8:   SLEEP_DATA   - output data on the pin during low power modes
+  Bit 9:   SLEEP_SEL    - selection control for low power modes signals
+  Bit 13:  PULLDOWN_EN  - enable the internal pull-down resistor on this pin
+  Bit 14:  PULLUP_EN    - enable the internal pull-up resistor on this pin
+  Bit 15:  PULL_SEL     - pull state controlled by selected alternate function
+                          (0) or by PULL{UP,DOWN}_EN bits (1)
+
+  Bit 0 - 2: AF_SEL - alternate function selection, 8 possibilities, from 0-7
+  Bit 10-12: DRIVE  - drive strength and slew rate
+			0b000 - fast 1mA
+			0b001 - fast 2mA
+			0b010 - fast 3mA
+			0b011 - fast 4mA
+			0b100 - slow 6mA
+			0b101 - fast 6mA
+			0b110 - slow 10mA
+			0b111 - fast 10mA
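+
+As an illustration of the bit layout above, the following user-space C
+sketch extracts a few fields from an MFPR value.  The EX_MFPR_* names and
+the sample value are made up for this example; they are not the kernel's
+own definitions::
+
+    #include <stdint.h>
+    #include <stdio.h>
+
+    /* field positions follow the layout above; names are illustrative */
+    #define EX_MFPR_AF_SEL(v)      ((v) & 0x7u)            /* bits 0-2 */
+    #define EX_MFPR_DRIVE(v)       (((v) >> 10) & 0x7u)    /* bits 10-12 */
+    #define EX_MFPR_PULLDOWN_EN    (1u << 13)
+    #define EX_MFPR_PULLUP_EN      (1u << 14)
+    #define EX_MFPR_PULL_SEL       (1u << 15)
+
+    int main(void)
+    {
+        /* sample: PULL_SEL and PULLUP_EN set, DRIVE = 0b101, AF_SEL = 1 */
+        uint32_t mfpr = EX_MFPR_PULL_SEL | EX_MFPR_PULLUP_EN |
+                        (0x5u << 10) | 0x1u;
+
+        printf("AF_SEL = %u\n", (unsigned int)EX_MFPR_AF_SEL(mfpr));
+        printf("DRIVE  = %u\n", (unsigned int)EX_MFPR_DRIVE(mfpr));
+
+        /* the PULL{UP,DOWN}_EN bits only apply when PULL_SEL is 1 */
+        if (mfpr & EX_MFPR_PULL_SEL)
+            printf("pull-up %s, pull-down %s\n",
+                   (mfpr & EX_MFPR_PULLUP_EN) ? "on" : "off",
+                   (mfpr & EX_MFPR_PULLDOWN_EN) ? "on" : "off");
+        return 0;
+    }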
+
+MFP Design for PXA2xx/PXA3xx
+============================
+
+Because pin-mux handling differs between PXA2xx and PXA3xx, a unified
+MFP API is introduced to cover both series of processors.
+
+The basic idea of this design is to introduce definitions for all possible pin
+configurations.  These definitions are processor and platform independent, and
+the actual API is invoked to convert them into register settings and make them
+effective thereafter.
+
+Files Involved
+--------------
+
+  - arch/arm/mach-pxa/include/mach/mfp.h
+
+  for
+    1. Unified pin definitions - enum constants for all configurable pins
+    2. processor-neutral bit definitions for a possible MFP configuration
+
+  - arch/arm/mach-pxa/mfp-pxa3xx.h
+
+  for PXA3xx specific MFPR register bit definitions and PXA3xx common pin
+  configurations
+
+  - arch/arm/mach-pxa/mfp-pxa2xx.h
+
+  for PXA2xx specific definitions and PXA25x/PXA27x common pin configurations
+
+  - arch/arm/mach-pxa/mfp-pxa25x.h
+    arch/arm/mach-pxa/mfp-pxa27x.h
+    arch/arm/mach-pxa/mfp-pxa300.h
+    arch/arm/mach-pxa/mfp-pxa320.h
+    arch/arm/mach-pxa/mfp-pxa930.h
+
+  for processor specific definitions
+
+  - arch/arm/mach-pxa/mfp-pxa3xx.c
+  - arch/arm/mach-pxa/mfp-pxa2xx.c
+
+  for implementation of the pin configuration to take effect for the actual
+  processor.
+
+Pin Configuration
+-----------------
+
+  The following comments are copied from mfp.h (see the actual source code
+  for the most up-to-date information)::
+
+    /*
+     * a possible MFP configuration is represented by a 32-bit integer
+     *
+     * bit  0.. 9 - MFP Pin Number (1024 Pins Maximum)
+     * bit 10..12 - Alternate Function Selection
+     * bit 13..15 - Drive Strength
+     * bit 16..18 - Low Power Mode State
+     * bit 19..20 - Low Power Mode Edge Detection
+     * bit 21..22 - Run Mode Pull State
+     *
+     * to facilitate the definition, the following macros are provided
+     *
+     * MFP_CFG_DEFAULT - default MFP configuration value, with
+     * 		  alternate function = 0,
+     * 		  drive strength = fast 3mA (MFP_DS03X)
+     * 		  low power mode = default
+     * 		  edge detection = none
+     *
+     * MFP_CFG	- default MFPR value with alternate function
+     * MFP_CFG_DRV	- default MFPR value with alternate function and
+     * 		  pin drive strength
+     * MFP_CFG_LPM	- default MFPR value with alternate function and
+     * 		  low power mode
+     * MFP_CFG_X	- default MFPR value with alternate function,
+     * 		  pin drive strength and low power mode
+     */
+
+   Examples of pin configurations are::
+
+     #define GPIO94_SSP3_RXD		MFP_CFG_X(GPIO94, AF1, DS08X, FLOAT)
+
+   which reads: GPIO94 can be configured as SSP3_RXD, with alternate function
+   selection of 1, driving strength of 0b101, and a float state in low power
+   modes.
+
+   NOTE: this is the default setting of this pin being configured as SSP3_RXD
+   which can be modified a bit in board code, though it is not recommended to
+   do so, simply because this default setting is usually carefully encoded,
+   and is supposed to work in most cases.
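+
+   As a rough sketch of the 32-bit layout quoted from mfp.h above, the
+   following user-space C program packs and unpacks the first three fields.
+   The EX_MFP_* macros are hypothetical, the pin number 94 is an assumed
+   stand-in for the real pin enum value, and the low power mode and later
+   fields are left at zero; see mfp.h for the actual MFP_CFG* macros::
+
+     #include <stdint.h>
+     #include <stdio.h>
+
+     /* illustrative encoders following the quoted layout */
+     #define EX_MFP_PIN(x)   ((uint32_t)(x) & 0x3ffu)        /* bit  0.. 9 */
+     #define EX_MFP_AF(x)    (((uint32_t)(x) & 0x7u) << 10)  /* bit 10..12 */
+     #define EX_MFP_DS(x)    (((uint32_t)(x) & 0x7u) << 13)  /* bit 13..15 */
+
+     int main(void)
+     {
+         /* pin 94, alternate function 1, drive code 5 are example inputs */
+         uint32_t cfg = EX_MFP_PIN(94) | EX_MFP_AF(1) | EX_MFP_DS(5);
+
+         printf("pin = %u\n", (unsigned int)(cfg & 0x3ffu));
+         printf("af  = %u\n", (unsigned int)((cfg >> 10) & 0x7u));
+         printf("ds  = %u\n", (unsigned int)((cfg >> 13) & 0x7u));
+         return 0;
+     }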
+
+Register Settings
+-----------------
+
+   Register settings on PXA3xx for a pin configuration are actually very
+   straightforward: most bits can be converted directly into the MFPR
+   value.  Two sets of MFPR values are calculated: the run-time ones and
+   the low power mode ones, to allow different settings.
+
+   The conversion from a generic pin configuration to the actual register
+   settings on PXA2xx is a bit complicated: many registers are involved,
+   including GAFRx, GPDRx, PGSRx, PWER, PKWR, PFER and PRER. Please see
+   mfp-pxa2xx.c for how the conversion is made.
diff --git a/Documentation/arm/pxa/mfp.txt b/Documentation/arm/pxa/mfp.txt
deleted file mode 100644
index 0b7cab978c02..000000000000
--- a/Documentation/arm/pxa/mfp.txt
+++ /dev/null
@@ -1,286 +0,0 @@
-                 MFP Configuration for PXA2xx/PXA3xx Processors
-
-			Eric Miao <eric.miao@marvell.com>
-
-MFP stands for Multi-Function Pin, which is the pin-mux logic on PXA3xx and
-later PXA series processors.  This document describes the existing MFP API,
-and how board/platform driver authors could make use of it.
-
- Basic Concept
-===============
-
-Unlike the GPIO alternate function settings on PXA25x and PXA27x, a new MFP
-mechanism is introduced from PXA3xx to completely move the pin-mux functions
-out of the GPIO controller. In addition to pin-mux configurations, the MFP
-also controls the low power state, driving strength, pull-up/down and event
-detection of each pin.  Below is a diagram of internal connections between
-the MFP logic and the remaining SoC peripherals:
-
- +--------+
- |        |--(GPIO19)--+
- |  GPIO  |            |
- |        |--(GPIO...) |
- +--------+            |
-                       |       +---------+
- +--------+            +------>|         |
- |  PWM2  |--(PWM_OUT)-------->|   MFP   |
- +--------+            +------>|         |-------> to external PAD
-                       | +---->|         |
- +--------+            | | +-->|         |
- |  SSP2  |---(TXD)----+ | |   +---------+
- +--------+              | |
-                         | |
- +--------+              | |
- | Keypad |--(MKOUT4)----+ |
- +--------+                |
-                           |
- +--------+                |
- |  UART2 |---(TXD)--------+
- +--------+
-
-NOTE: the external pad is named as MFP_PIN_GPIO19, it doesn't necessarily
-mean it's dedicated for GPIO19, only as a hint that internally this pin
-can be routed from GPIO19 of the GPIO controller.
-
-To better understand the change from PXA25x/PXA27x GPIO alternate function
-to this new MFP mechanism, here are several key points:
-
-  1. GPIO controller on PXA3xx is now a dedicated controller, same as other
-     internal controllers like PWM, SSP and UART, with 128 internal signals
-     which can be routed to external through one or more MFPs (e.g. GPIO<0>
-     can be routed through either MFP_PIN_GPIO0 as well as MFP_PIN_GPIO0_2,
-     see arch/arm/mach-pxa/mfp-pxa300.h)
-
-  2. Alternate function configuration is removed from this GPIO controller,
-     the remaining functions are pure GPIO-specific, i.e.
-
-       - GPIO signal level control
-       - GPIO direction control
-       - GPIO level change detection
-
-  3. Low power state for each pin is now controlled by MFP, this means the
-     PGSRx registers on PXA2xx are now useless on PXA3xx
-
-  4. Wakeup detection is now controlled by MFP, PWER does not control the
-     wakeup from GPIO(s) any more, depending on the sleeping state, ADxER
-     (as defined in pxa3xx-regs.h) controls the wakeup from MFP
-
-NOTE: with such a clear separation of MFP and GPIO, by GPIO<xx> we normally
-mean it is a GPIO signal, and by MFP<xxx> or pin xxx, we mean a physical
-pad (or ball).
-
- MFP API Usage
-===============
-
-For board code writers, here are some guidelines:
-
-1. include ONE of the following header files in your <board>.c:
-
-   - #include "mfp-pxa25x.h"
-   - #include "mfp-pxa27x.h"
-   - #include "mfp-pxa300.h"
-   - #include "mfp-pxa320.h"
-   - #include "mfp-pxa930.h"
-
-   NOTE: only one file in your <board>.c, depending on the processors used,
-   because pin configuration definitions may conflict in these file (i.e.
-   same name, different meaning and settings on different processors). E.g.
-   for zylonite platform, which support both PXA300/PXA310 and PXA320, two
-   separate files are introduced: zylonite_pxa300.c and zylonite_pxa320.c
-   (in addition to handle MFP configuration differences, they also handle
-   the other differences between the two combinations).
-
-   NOTE: PXA300 and PXA310 are almost identical in pin configurations (with
-   PXA310 supporting some additional ones), thus the difference is actually
-   covered in a single mfp-pxa300.h.
-
-2. prepare an array for the initial pin configurations, e.g.:
-
-   static unsigned long mainstone_pin_config[] __initdata = {
-	/* Chip Select */
-	GPIO15_nCS_1,
-
-	/* LCD - 16bpp Active TFT */
-	GPIOxx_TFT_LCD_16BPP,
-	GPIO16_PWM0_OUT,	/* Backlight */
-
-	/* MMC */
-	GPIO32_MMC_CLK,
-	GPIO112_MMC_CMD,
-	GPIO92_MMC_DAT_0,
-	GPIO109_MMC_DAT_1,
-	GPIO110_MMC_DAT_2,
-	GPIO111_MMC_DAT_3,
-
-	...
-
-	/* GPIO */
-	GPIO1_GPIO | WAKEUP_ON_EDGE_BOTH,
-   };
-
-   a) once the pin configurations are passed to pxa{2xx,3xx}_mfp_config(),
-   and written to the actual registers, they are useless and may discard,
-   adding '__initdata' will help save some additional bytes here.
-
-   b) when there is only one possible pin configurations for a component,
-   some simplified definitions can be used, e.g. GPIOxx_TFT_LCD_16BPP on
-   PXA25x and PXA27x processors
-
-   c) if by board design, a pin can be configured to wake up the system
-   from low power state, it can be 'OR'ed with any of:
-
-      WAKEUP_ON_EDGE_BOTH
-      WAKEUP_ON_EDGE_RISE
-      WAKEUP_ON_EDGE_FALL
-      WAKEUP_ON_LEVEL_HIGH - specifically for enabling of keypad GPIOs,
-
-   to indicate that this pin has the capability of wake-up the system,
-   and on which edge(s). This, however, doesn't necessarily mean the
-   pin _will_ wakeup the system, it will only when set_irq_wake() is
-   invoked with the corresponding GPIO IRQ (GPIO_IRQ(xx) or gpio_to_irq())
-   and eventually calls gpio_set_wake() for the actual register setting.
-
-   d) although PXA3xx MFP supports edge detection on each pin, the
-   internal logic will only wakeup the system when those specific bits
-   in ADxER registers are set, which can be well mapped to the
-   corresponding peripheral, thus set_irq_wake() can be called with 
-   the peripheral IRQ to enable the wakeup.
-
-
- MFP on PXA3xx
-===============
-
-Every external I/O pad on PXA3xx (excluding those for special purpose) has
-one MFP logic associated, and is controlled by one MFP register (MFPR).
-
-The MFPR has the following bit definitions (for PXA300/PXA310/PXA320):
-
- 31                        16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
-  +-------------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
-  |         RESERVED        |PS|PU|PD|  DRIVE |SS|SD|SO|EC|EF|ER|--| AF_SEL |
-  +-------------------------+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
-
-  Bit 3:   RESERVED
-  Bit 4:   EDGE_RISE_EN - enable detection of rising edge on this pin
-  Bit 5:   EDGE_FALL_EN - enable detection of falling edge on this pin
-  Bit 6:   EDGE_CLEAR   - disable edge detection on this pin
-  Bit 7:   SLEEP_OE_N   - enable outputs during low power modes
-  Bit 8:   SLEEP_DATA   - output data on the pin during low power modes
-  Bit 9:   SLEEP_SEL    - selection control for low power modes signals
-  Bit 13:  PULLDOWN_EN  - enable the internal pull-down resistor on this pin
-  Bit 14:  PULLUP_EN    - enable the internal pull-up resistor on this pin
-  Bit 15:  PULL_SEL     - pull state controlled by selected alternate function
-                          (0) or by PULL{UP,DOWN}_EN bits (1)
-
-  Bit 0 - 2: AF_SEL - alternate function selection, 8 possibilities, from 0-7
-  Bit 10-12: DRIVE  - drive strength and slew rate
-			0b000 - fast 1mA
-			0b001 - fast 2mA
-			0b002 - fast 3mA
-			0b003 - fast 4mA
-			0b004 - slow 6mA
-			0b005 - fast 6mA
-			0b006 - slow 10mA
-			0b007 - fast 10mA
-
- MFP Design for PXA2xx/PXA3xx
-==============================
-
-Due to the difference of pin-mux handling between PXA2xx and PXA3xx, a unified
-MFP API is introduced to cover both series of processors.
-
-The basic idea of this design is to introduce definitions for all possible pin
-configurations, these definitions are processor and platform independent, and
-the actual API invoked to convert these definitions into register settings and
-make them effective there-after.
-
-  Files Involved
-  --------------
-
-  - arch/arm/mach-pxa/include/mach/mfp.h
-  
-  for
-    1. Unified pin definitions - enum constants for all configurable pins
-    2. processor-neutral bit definitions for a possible MFP configuration
-
-  - arch/arm/mach-pxa/mfp-pxa3xx.h
-
-  for PXA3xx specific MFPR register bit definitions and PXA3xx common pin
-  configurations
-
-  - arch/arm/mach-pxa/mfp-pxa2xx.h
-
-  for PXA2xx specific definitions and PXA25x/PXA27x common pin configurations
-
-  - arch/arm/mach-pxa/mfp-pxa25x.h
-    arch/arm/mach-pxa/mfp-pxa27x.h
-    arch/arm/mach-pxa/mfp-pxa300.h
-    arch/arm/mach-pxa/mfp-pxa320.h
-    arch/arm/mach-pxa/mfp-pxa930.h
-
-  for processor specific definitions
-
-  - arch/arm/mach-pxa/mfp-pxa3xx.c
-  - arch/arm/mach-pxa/mfp-pxa2xx.c
-
-  for implementation of the pin configuration to take effect for the actual
-  processor.
-
-  Pin Configuration
-  -----------------
-
-  The following comments are copied from mfp.h (see the actual source code
-  for most updated info)
-  
-  /*
-   * a possible MFP configuration is represented by a 32-bit integer
-   *
-   * bit  0.. 9 - MFP Pin Number (1024 Pins Maximum)
-   * bit 10..12 - Alternate Function Selection
-   * bit 13..15 - Drive Strength
-   * bit 16..18 - Low Power Mode State
-   * bit 19..20 - Low Power Mode Edge Detection
-   * bit 21..22 - Run Mode Pull State
-   *
-   * to facilitate the definition, the following macros are provided
-   *
-   * MFP_CFG_DEFAULT - default MFP configuration value, with
-   * 		  alternate function = 0,
-   * 		  drive strength = fast 3mA (MFP_DS03X)
-   * 		  low power mode = default
-   * 		  edge detection = none
-   *
-   * MFP_CFG	- default MFPR value with alternate function
-   * MFP_CFG_DRV	- default MFPR value with alternate function and
-   * 		  pin drive strength
-   * MFP_CFG_LPM	- default MFPR value with alternate function and
-   * 		  low power mode
-   * MFP_CFG_X	- default MFPR value with alternate function,
-   * 		  pin drive strength and low power mode
-   */
-
-   Examples of pin configurations are:
-
-   #define GPIO94_SSP3_RXD		MFP_CFG_X(GPIO94, AF1, DS08X, FLOAT)
-
-   which reads GPIO94 can be configured as SSP3_RXD, with alternate function
-   selection of 1, driving strength of 0b101, and a float state in low power
-   modes.
-
-   NOTE: this is the default setting of this pin being configured as SSP3_RXD
-   which can be modified a bit in board code, though it is not recommended to
-   do so, simply because this default setting is usually carefully encoded,
-   and is supposed to work in most cases.
-
-  Register Settings
-  -----------------
-
-   Register settings on PXA3xx for a pin configuration is actually very
-   straight-forward, most bits can be converted directly into MFPR value
-   in a easier way. Two sets of MFPR values are calculated: the run-time
-   ones and the low power mode ones, to allow different settings.
-
-   The conversion from a generic pin configuration to the actual register
-   settings on PXA2xx is a bit complicated: many registers are involved,
-   including GAFRx, GPDRx, PGSRx, PWER, PKWR, PFER and PRER. Please see
-   mfp-pxa2xx.c for how the conversion is made.
diff --git a/Documentation/arm/sa1100/adsbitsy.rst b/Documentation/arm/sa1100/adsbitsy.rst
new file mode 100644
index 000000000000..c179cb26b682
--- /dev/null
+++ b/Documentation/arm/sa1100/adsbitsy.rst
@@ -0,0 +1,51 @@
+===============================
+ADS Bitsy Single Board Computer
+===============================
+
+(Not to be confused with the Compaq Bitsy (iPAQ).)
+
+For more details, contact Applied Data Systems or see
+http://www.applieddata.net/products.html
+
+The Linux support for this product has been provided by
+Woojung Huh <whuh@applieddata.net>
+
+Use 'make adsbitsy_config' before any 'make config'.
+This will set up defaults for ADS Bitsy support.
+
+The kernel zImage is linked to be loaded and executed at 0xc0400000.
+
+Linux can be used with the ADS BootLoader that ships with the
+newer rev boards. See their documentation on how to load Linux.
+
+Supported peripherals
+=====================
+
+- SA1100 LCD frame buffer (8/16bpp...sort of)
+- SA1111 USB Master
+- SA1100 serial port
+- pcmcia, compact flash
+- touchscreen(ucb1200)
+- console on LCD screen
+- serial ports (ttyS[0-2])
+  - ttyS0 is default for serial console
+
+To do
+=====
+
+- everything else!  :-)
+
+Notes
+=====
+
+- The flash on board is divided into 3 partitions.
+  Be careful when using the on-board flash: its partitioning
+  differs from that of GraphicsClient Plus and GraphicsMaster.
+
+- 16bpp mode requires a different cable than what ships with the board.
+  Contact ADS or look through the manual to wire your own. Currently,
+  if you compile with 16bit mode support and switch into a lower bpp
+  mode, the timing is off so the image is corrupted.  This will be
+  fixed soon.
+
+Any contribution can be sent to nico@fluxnic.net and will be very welcome!
diff --git a/Documentation/arm/sa1100/assabet.rst b/Documentation/arm/sa1100/assabet.rst
new file mode 100644
index 000000000000..3e704831c311
--- /dev/null
+++ b/Documentation/arm/sa1100/assabet.rst
@@ -0,0 +1,301 @@
+============================================
+The Intel Assabet (SA-1110 evaluation) board
+============================================
+
+Please see:
+http://developer.intel.com
+
+Also some notes from John G Dorsey <jd5q@andrew.cmu.edu>:
+http://www.cs.cmu.edu/~wearable/software/assabet.html
+
+
+Building the kernel
+-------------------
+
+To build the kernel with current defaults::
+
+	make assabet_config
+	make oldconfig
+	make zImage
+
+The resulting kernel image should be available in linux/arch/arm/boot/zImage.
+
+
+Installing a bootloader
+-----------------------
+
+A couple of bootloaders able to boot Linux on Assabet are available:
+
+BLOB (http://www.lartmaker.nl/lartware/blob/)
+
+   BLOB is a bootloader used within the LART project.  Some contributed
+   patches were merged into BLOB to add support for Assabet.
+
+Compaq's Bootldr + John Dorsey's patch for Assabet support
+(http://www.handhelds.org/Compaq/bootldr.html)
+(http://www.wearablegroup.org/software/bootldr/)
+
+   Bootldr is the bootloader developed by Compaq for the iPAQ Pocket PC.
+   John Dorsey has produced add-on patches to add support for Assabet and
+   the JFFS filesystem.
+
+RedBoot (http://sources.redhat.com/redboot/)
+
+   RedBoot is a bootloader developed by Red Hat based on the eCos RTOS
+   hardware abstraction layer.  It supports Assabet amongst many other
+   hardware platforms.
+
+RedBoot is currently the recommended choice since it's the only one to have
+networking support, and is the most actively maintained.
+
+Brief examples on how to boot Linux with RedBoot are shown below.  But first
+you need to have RedBoot installed in your flash memory.  A known-to-work
+precompiled RedBoot binary is available from the following locations:
+
+- ftp://ftp.netwinder.org/users/n/nico/
+- ftp://ftp.arm.linux.org.uk/pub/linux/arm/people/nico/
+- ftp://ftp.handhelds.org/pub/linux/arm/sa-1100-patches/
+
+Look for redboot-assabet*.tgz.  Some installation information is provided in
+redboot-assabet*.txt.
+
+
+Initial RedBoot configuration
+-----------------------------
+
+The commands used here are explained in The RedBoot User's Guide available
+on-line at http://sources.redhat.com/ecos/docs.html.
+Please refer to it for explanations.
+
+If you have a CF network card (my Assabet kit contained a CF+ LP-E from
+Socket Communications Inc.), you should strongly consider using it for TFTP
+file transfers.  You must insert it before RedBoot runs since it can't detect
+it dynamically.
+
+To initialize the flash directory::
+
+	fis init -f
+
+To initialize the non-volatile settings, like whether you want to use BOOTP or
+a static IP address, etc, use this command::
+
+	fconfig -i
+
+
+Writing a kernel image into flash
+---------------------------------
+
+First, the kernel image must be loaded into RAM.  If you have the zImage file
+available on a TFTP server::
+
+	load zImage -r -b 0x100000
+
+If you would rather use Y-Modem upload over the serial port::
+
+	load -m ymodem -r -b 0x100000
+
+To write it to flash::
+
+	fis create "Linux kernel" -b 0x100000 -l 0xc0000
+
+
+Booting the kernel
+------------------
+
+The kernel still requires a filesystem to boot.  A ramdisk image can be loaded
+as follows::
+
+	load ramdisk_image.gz -r -b 0x800000
+
+Again, Y-Modem upload can be used instead of TFTP by replacing the file name
+with '-m ymodem'.
+
+Now the kernel can be retrieved from flash like this::
+
+	fis load "Linux kernel"
+
+or loaded as described previously.  To boot the kernel::
+
+	exec -b 0x100000 -l 0xc0000
+
+The ramdisk image could be stored into flash as well, but there are better
+solutions for on-flash filesystems as mentioned below.
+
+
+Using JFFS2
+-----------
+
+Using JFFS2 (the Second Journalling Flash File System) is probably the most
+convenient way to store a writable filesystem into flash.  JFFS2 is used in
+conjunction with the MTD layer which is responsible for low-level flash
+management.  More information on the Linux MTD can be found on-line at:
+http://www.linux-mtd.infradead.org/.  A JFFS howto with some information about
+creating JFFS/JFFS2 images is available from the same site.
+
+For instance, a sample JFFS2 image can be retrieved from the same FTP sites
+mentioned above for the precompiled RedBoot image.
+
+To load this file::
+
+	load sample_img.jffs2 -r -b 0x100000
+
+The result should look like::
+
+	RedBoot> load sample_img.jffs2 -r -b 0x100000
+	Raw file loaded 0x00100000-0x00377424
+
+Now we must know the size of the unallocated flash::
+
+	fis free
+
+Result::
+
+	RedBoot> fis free
+	  0x500E0000 .. 0x503C0000
+
+The values above may be different depending on the size of the filesystem and
+the type of flash.  See their usage below as an example and take care of
+substituting yours appropriately.
+
+We must determine some values::
+
+	size of unallocated flash:	0x503c0000 - 0x500e0000 = 0x2e0000
+	size of the filesystem image:	0x00377424 - 0x00100000 = 0x277424
+
+We want to fit the filesystem image of course, but we also want to give it all
+the remaining flash space as well.  To write it::
+
+	fis unlock -f 0x500E0000 -l 0x2e0000
+	fis erase -f 0x500E0000 -l 0x2e0000
+	fis write -b 0x100000 -l 0x277424 -f 0x500E0000
+	fis create "JFFS2" -n -f 0x500E0000 -l 0x2e0000
+
+Now the filesystem is associated with an MTD "partition" once Linux has
+discovered what they are in the boot process.  From RedBoot, the 'fis list'
+command displays them::
+
+	RedBoot> fis list
+	Name              FLASH addr  Mem addr    Length      Entry point
+	RedBoot           0x50000000  0x50000000  0x00020000  0x00000000
+	RedBoot config    0x503C0000  0x503C0000  0x00020000  0x00000000
+	FIS directory     0x503E0000  0x503E0000  0x00020000  0x00000000
+	Linux kernel      0x50020000  0x00100000  0x000C0000  0x00000000
+	JFFS2             0x500E0000  0x500E0000  0x002E0000  0x00000000
+
+However, Linux should display something like::
+
+	SA1100 flash: probing 32-bit flash bus
+	SA1100 flash: Found 2 x16 devices at 0x0 in 32-bit mode
+	Using RedBoot partition definition
+	Creating 5 MTD partitions on "SA1100 flash":
+	0x00000000-0x00020000 : "RedBoot"
+	0x00020000-0x000e0000 : "Linux kernel"
+	0x000e0000-0x003c0000 : "JFFS2"
+	0x003c0000-0x003e0000 : "RedBoot config"
+	0x003e0000-0x00400000 : "FIS directory"
+
+What's important here is the position of the partition we are interested in,
+which is the third one.  Within Linux, this corresponds to /dev/mtdblock2.
+Therefore to boot Linux with the kernel and its root filesystem in flash, we
+need these RedBoot commands::
+
+	fis load "Linux kernel"
+	exec -b 0x100000 -l 0xc0000 -c "root=/dev/mtdblock2"
+
+Of course, filesystems other than JFFS2 might be used, like cramfs for example.
+You might want to boot with a root filesystem over NFS, etc.  It is also
+possible, and sometimes more convenient, to flash a filesystem directly from
+within Linux while booted from a ramdisk or NFS.  The Linux MTD repository has
+many tools to deal with flash memory as well, to erase it for example.  JFFS2
+can then be mounted directly on a freshly erased partition and files can be
+copied over directly.  Etc...
+
+
+RedBoot scripting
+-----------------
+
+All the commands above aren't so useful if they have to be typed in every
+time the Assabet is rebooted.  Therefore it's possible to automate the boot
+process using RedBoot's scripting capability.
+
+For example, I use this to boot Linux with both the kernel and the ramdisk
+images retrieved from a TFTP server on the network::
+
+	RedBoot> fconfig
+	Run script at boot: false true
+	Boot script:
+	Enter script, terminate with empty line
+	>> load zImage -r -b 0x100000
+	>> load ramdisk_ks.gz -r -b 0x800000
+	>> exec -b 0x100000 -l 0xc0000
+	>>
+	Boot script timeout (1000ms resolution): 3
+	Use BOOTP for network configuration: true
+	GDB connection port: 9000
+	Network debug at boot time: false
+	Update RedBoot non-volatile configuration - are you sure (y/n)? y
+
+Then, rebooting the Assabet is just a matter of waiting for the login prompt.
+
+
+
+Nicolas Pitre
+nico@fluxnic.net
+
+June 12, 2001
+
+
+Status of peripherals in -rmk tree (updated 14/10/2001)
+-------------------------------------------------------
+
+Assabet:
+ Serial ports:
+  Radio:		TX, RX, CTS, DSR, DCD, RI
+   - PM:		Not tested.
+   - COM:		TX, RX, CTS, DSR, DCD, RTS, DTR, PM
+   - PM:		Not tested.
+   - I2C:		Implemented, not fully tested.
+   - L3:		Fully tested, pass.
+   - PM:		Not tested.
+
+ Video:
+  - LCD:		Fully tested.  PM
+
+   (LCD doesn't like being blanked with neponset connected)
+
+  - Video out:		Not fully
+
+ Audio:
+  UDA1341:
+  -  Playback:		Fully tested, pass.
+  -  Record:		Implemented, not tested.
+  -  PM:			Not tested.
+
+  UCB1200:
+  -  Audio play:	Implemented, not heavily tested.
+  -  Audio rec:		Implemented, not heavily tested.
+  -  Telco audio play:	Implemented, not heavily tested.
+  -  Telco audio rec:	Implemented, not heavily tested.
+  -  POTS control:	No
+  -  Touchscreen:	Yes
+  -  PM:		Not tested.
+
+ Other:
+  - PCMCIA:
+  - LPE:		Fully tested, pass.
+  - USB:		No
+  - IRDA:
+  - SIR:		Fully tested, pass.
+  - FIR:		Fully tested, pass.
+  - PM:			Not tested.
+
+Neponset:
+ Serial ports:
+  - COM1,2:		TX, RX, CTS, DSR, DCD, RTS, DTR
+  - PM:			Not tested.
+  - USB:		Implemented, not heavily tested.
+  - PCMCIA:		Implemented, not heavily tested.
+  - CF:			Implemented, not heavily tested.
+  - PM:			Not tested.
+
+More stuff can be found in the -np (Nicolas Pitre's) tree.
diff --git a/Documentation/arm/sa1100/brutus.rst b/Documentation/arm/sa1100/brutus.rst
new file mode 100644
index 000000000000..e1a23bee6d44
--- /dev/null
+++ b/Documentation/arm/sa1100/brutus.rst
@@ -0,0 +1,69 @@
+======
+Brutus
+======
+
+Brutus is an evaluation platform for the SA1100 manufactured by Intel.
+For more details, see:
+
+http://developer.intel.com
+
+To compile for Brutus, you must issue the following commands::
+
+	make brutus_config
+	make config
+	[accept all the defaults]
+	make zImage
+
+The resulting kernel will end up in linux/arch/arm/boot/zImage.  This file
+must be loaded at 0xc0008000 in Brutus's memory and execution started at
+0xc0008000 as well with the value of registers r0 = 0 and r1 = 16 upon
+entry.
+
+But before executing the kernel, a ramdisk image must also be loaded into
+memory.  Use memory address 0xd8000000 for this.  Note that the file
+containing the (compressed) ramdisk image must not exceed 4 MB.
+
+Typically, you'll need angelboot to load the kernel.
+The following angelboot.opt file should be used::
+
+	base 0xc0008000
+	entry 0xc0008000
+	r0 0x00000000
+	r1 0x00000010
+	device /dev/ttyS0
+	options "9600 8N1"
+	baud 115200
+	otherfile ramdisk_img.gz
+	otherbase 0xd8000000
+
+Then load the kernel and ramdisk with::
+
+	angelboot -f angelboot.opt zImage
+
+The first Brutus serial port (assumed to be linked to /dev/ttyS0 on your
+host PC) is used by angel to load the kernel and ramdisk image. The serial
+console is provided through the second Brutus serial port. To access it,
+you may use minicom configured with /dev/ttyS1, 9600 baud, 8N1, no flow
+control.
+
+Currently supported
+===================
+
+	- RS232 serial ports
+	- audio output
+	- LCD screen
+	- keyboard
+
+The current Brutus support may not be complete without extra patches.
+If such patches exist, they should be found at
+ftp.netwinder.org/users/n/nico.
+
+Full PCMCIA support is still missing, although it's possible to hack
+some drivers in order to drive already inserted cards at boot time with
+few modifications.
+
+Any contribution is welcome.
+
+Please send patches to nico@fluxnic.net
+
+Have Fun !
diff --git a/Documentation/arm/sa1100/cerf.rst b/Documentation/arm/sa1100/cerf.rst
new file mode 100644
index 000000000000..7fa71b609bf9
--- /dev/null
+++ b/Documentation/arm/sa1100/cerf.rst
@@ -0,0 +1,35 @@
+==============
+CerfBoard/Cube
+==============
+
+*** The StrongARM version of the CerfBoard/Cube has been discontinued ***
+
+The Intrinsyc CerfBoard is a StrongARM 1110-based computer on a board
+that measures approximately 2" square. It includes an Ethernet
+controller, an RS232-compatible serial port, a USB function port, and
+one CompactFlash+ slot on the back. Pictures can be found at the
+Intrinsyc website, http://www.intrinsyc.com.
+
+This document describes the support in the Linux kernel for the
+Intrinsyc CerfBoard.
+
+Supported in this version
+=========================
+
+   - CompactFlash+ slot (select PCMCIA in General Setup and any options
+     that may be required)
+   - Onboard Crystal CS8900 Ethernet controller (Cerf CS8900A support in
+     Network Devices)
+   - Serial ports with a serial console (hardcoded to 38400 8N1)
+
+In order to get this kernel onto your Cerf, you need a server that runs
+both BOOTP and TFTP. Detailed instructions should have come with your
+evaluation kit on how to use the bootloader. This series of commands
+will suffice::
+
+   make ARCH=arm CROSS_COMPILE=arm-linux- cerfcube_defconfig
+   make ARCH=arm CROSS_COMPILE=arm-linux- zImage
+   make ARCH=arm CROSS_COMPILE=arm-linux- modules
+   cp arch/arm/boot/zImage <TFTP directory>
+
+support@intrinsyc.com
diff --git a/Documentation/arm/sa1100/freebird.rst b/Documentation/arm/sa1100/freebird.rst
new file mode 100644
index 000000000000..81043d0c6d64
--- /dev/null
+++ b/Documentation/arm/sa1100/freebird.rst
@@ -0,0 +1,25 @@
+========
+Freebird
+========
+
+Freebird-1.1 is produced by Legend(C), Inc.
+`http://web.archive.org/web/*/http://www.legend.com.cn`
+and software/linux maintained by Coventive(C), Inc.
+(http://www.coventive.com)
+
+Based on Nicolas's StrongARM kernel tree.
+
+Maintainer:
+
+Chester Kuo
+	- <chester@coventive.com>
+	- <chester@linux.org.tw>
+
+Authors:
+
+- Tim wu <timwu@coventive.com>
+- CIH <cih@coventive.com>
+- Eric Peng <ericpeng@coventive.com>
+- Jeff Lee <jeff_lee@coventive.com>
+- Allen Cheng
+- Tony Liu <tonyliu@coventive.com>
diff --git a/Documentation/arm/sa1100/graphicsclient.rst b/Documentation/arm/sa1100/graphicsclient.rst
new file mode 100644
index 000000000000..a73d61c3ce91
--- /dev/null
+++ b/Documentation/arm/sa1100/graphicsclient.rst
@@ -0,0 +1,102 @@
+=============================================
+ADS GraphicsClient Plus Single Board Computer
+=============================================
+
+For more details, contact Applied Data Systems or see
+http://www.applieddata.net/products.html
+
+The original Linux support for this product has been provided by
+Nicolas Pitre <nico@fluxnic.net>. Continued development work by
+Woojung Huh <whuh@applieddata.net>
+
+It's currently possible to mount a root filesystem via NFS providing a
+complete Linux environment.  Otherwise a ramdisk image may be used.  The
+board supports MTD/JFFS, so you could also mount something on there.
+
+Use 'make graphicsclient_config' before any 'make config'.  This will set up
+defaults for GraphicsClient Plus support.
+
+The kernel zImage is linked to be loaded and executed at 0xc0200000.
+Also the following registers should have the specified values upon entry::
+
+	r0 = 0
+	r1 = 29	(this is the GraphicsClient architecture number)
+
+Linux can be used with the ADS BootLoader that ships with the
+newer rev boards. See their documentation on how to load Linux.
+Angel is not available for the GraphicsClient Plus AFAIK.
+
+There is a board known as just the GraphicsClient that ADS used to
+produce but has since discontinued. This code will not work on the older
+board with the ADS bootloader, but should still work with Angel,
+as outlined below.  In any case, if you're planning on deploying
+something en masse, you should probably get the newer board.
+
+If using Angel on the older boards, here is a typical angelboot.opt option
+file if the kernel is loaded through the Angel Debug Monitor::
+
+	base 0xc0200000
+	entry 0xc0200000
+	r0 0x00000000
+	r1 0x0000001d
+	device /dev/ttyS1
+	options "38400 8N1"
+	baud 115200
+	#otherfile ramdisk.gz
+	#otherbase 0xc0800000
+	exec minicom
+
+Then the kernel (and ramdisk if otherfile/otherbase lines above are
+uncommented) would be loaded with::
+
+	angelboot -f angelboot.opt zImage
+
+Here it is assumed that the board is connected to ttyS1 on your PC
+and that minicom is preconfigured with /dev/ttyS1, 38400 baud, 8N1, no flow
+control by default.
+
+If any other bootloader is used, ensure it accomplishes the same, especially
+for r0/r1 register values before jumping into the kernel.
+
+
+Supported peripherals
+=====================
+
+- SA1100 LCD frame buffer (8/16bpp...sort of)
+- on-board SMC 92C96 ethernet NIC
+- SA1100 serial port
+- flash memory access (MTD/JFFS)
+- pcmcia
+- touchscreen(ucb1200)
+- ps/2 keyboard
+- console on LCD screen
+- serial ports (ttyS[0-2])
+  - ttyS0 is default for serial console
+- Smart I/O (ADC, keypad, digital inputs, etc)
+  See http://www.eurotech-inc.com/linux-sbc.asp for IOCTL documentation
+  and example user space code. ps/2 keybd is multiplexed through this driver
+
+To do
+=====
+
+- UCB1200 audio with new ucb_generic layer
+- everything else!  :-)
+
+Notes
+=====
+
+- The flash on board is divided into 3 partitions.  mtd0 is where
+  the ADS boot ROM and zImage are stored.  It's been marked as
+  read-only to keep you from blasting over the bootloader. :)  mtd1 is
+  for the ramdisk.gz image.  mtd2 is user flash space and can be
+  utilized for either JFFS or if you're feeling crazy, running ext2
+  on top of it. If you're not using the ADS bootloader, you're
+  welcome to blast over the mtd1 partition also.
+
+- 16bpp mode requires a different cable than what ships with the board.
+  Contact ADS or look through the manual to wire your own. Currently,
+  if you compile with 16bit mode support and switch into a lower bpp
+  mode, the timing is off so the image is corrupted.  This will be
+  fixed soon.
+
+Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
diff --git a/Documentation/arm/sa1100/graphicsmaster.rst b/Documentation/arm/sa1100/graphicsmaster.rst
new file mode 100644
index 000000000000..e39892514f0c
--- /dev/null
+++ b/Documentation/arm/sa1100/graphicsmaster.rst
@@ -0,0 +1,60 @@
+========================================
+ADS GraphicsMaster Single Board Computer
+========================================
+
+For more details, contact Applied Data Systems or see
+http://www.applieddata.net/products.html
+
+The original Linux support for this product has been provided by
+Nicolas Pitre <nico@fluxnic.net>. Continued development work by
+Woojung Huh <whuh@applieddata.net>
+
+Use 'make graphicsmaster_config' before any 'make config'.
+This will set up defaults for GraphicsMaster support.
+
+The kernel zImage is linked to be loaded and executed at 0xc0400000.
+
+Linux can be used with the ADS BootLoader that ships with the
+newer rev boards. See their documentation on how to load Linux.
+
+Supported peripherals
+=====================
+
+- SA1100 LCD frame buffer (8/16bpp...sort of)
+- SA1111 USB Master
+- on-board SMC 92C96 ethernet NIC
+- SA1100 serial port
+- flash memory access (MTD/JFFS)
+- pcmcia, compact flash
+- touchscreen(ucb1200)
+- ps/2 keyboard
+- console on LCD screen
+- serial ports (ttyS[0-2])
+  - ttyS0 is default for serial console
+- Smart I/O (ADC, keypad, digital inputs, etc)
+  See http://www.eurotech-inc.com/linux-sbc.asp for IOCTL documentation
+  and example user space code. ps/2 keybd is multiplexed through this driver
+
+To do
+=====
+
+- everything else!  :-)
+
+Notes
+=====
+
+- The flash on board is divided into 3 partitions.  mtd0 is where
+  the zImage is stored.  It's been marked as read-only to keep you
+  from blasting over the bootloader. :)  mtd1 is
+  for the ramdisk.gz image.  mtd2 is user flash space and can be
+  utilized for either JFFS or if you're feeling crazy, running ext2
+  on top of it. If you're not using the ADS bootloader, you're
+  welcome to blast over the mtd1 partition also.
+
+- 16bpp mode requires a different cable than what ships with the board.
+  Contact ADS or look through the manual to wire your own. Currently,
+  if you compile with 16bit mode support and switch into a lower bpp
+  mode, the timing is off so the image is corrupted.  This will be
+  fixed soon.
+
+Any contribution can be sent to nico@fluxnic.net and will be greatly welcome!
diff --git a/Documentation/arm/sa1100/huw_webpanel.rst b/Documentation/arm/sa1100/huw_webpanel.rst
new file mode 100644
index 000000000000..1dc7ccb165f0
--- /dev/null
+++ b/Documentation/arm/sa1100/huw_webpanel.rst
@@ -0,0 +1,21 @@
+=======================
+Hoeft & Wessel Webpanel
+=======================
+
+The HUW_WEBPANEL is a product of the German company Hoeft & Wessel AG.
+
+If you want more information, please visit
+http://www.hoeft-wessel.de
+
+To build the kernel::
+
+	make huw_webpanel_config
+	make oldconfig
+	[accept all defaults]
+	make zImage
+
+Most of the work was done by:
+Roman Jordan         jor@hoeft-wessel.de
+Christoph Schulz    schu@hoeft-wessel.de
+
+2000/12/18/
diff --git a/Documentation/arm/sa1100/index.rst b/Documentation/arm/sa1100/index.rst
new file mode 100644
index 000000000000..fb2385b3accf
--- /dev/null
+++ b/Documentation/arm/sa1100/index.rst
@@ -0,0 +1,23 @@
+====================
+Intel StrongARM 1100
+====================
+
+.. toctree::
+    :maxdepth: 1
+
+    adsbitsy
+    assabet
+    brutus
+    cerf
+    freebird
+    graphicsclient
+    graphicsmaster
+    huw_webpanel
+    itsy
+    lart
+    nanoengine
+    pangolin
+    pleb
+    serial_uart
+    tifon
+    yopy
diff --git a/Documentation/arm/sa1100/itsy.rst b/Documentation/arm/sa1100/itsy.rst
new file mode 100644
index 000000000000..f49896ba3ef1
--- /dev/null
+++ b/Documentation/arm/sa1100/itsy.rst
@@ -0,0 +1,47 @@
+====
+Itsy
+====
+
+Itsy is a research project done by the Western Research Lab and Systems
+Research Center in Palo Alto, CA. The Itsy project is one of several
+research projects at Compaq that are related to pocket computing.
+
+For more information, see:
+
+	http://www.hpl.hp.com/downloads/crl/itsy/
+
+Notes on initial 2.4 Itsy support (8/27/2000) :
+
+The port was done on an Itsy version 1.5 machine with a daughtercard with
+64 Meg of DRAM and 32 Meg of Flash. The initial work includes support for
+serial console (to see what you're doing).  No other devices have been
+enabled.
+
+To build, do a "make menuconfig" (or xmenuconfig) and select Itsy support.
+Disable Flash and LCD support, and then do a make zImage.
+Finally, you will need to cd to arch/arm/boot/tools and execute a make there
+to build the params-itsy program used to boot the kernel.
+
+In order to install the port of 2.4 to the itsy, you will need to set the
+configuration parameters in the monitor as follows::
+
+	Arg 1:0x08340000, Arg2: 0xC0000000, Arg3:18 (0x12), Arg4:0
+
+Make sure the start-routine address is set to 0x00060000.
+
+Next, flash the params-itsy program to 0x00060000 ("p 1 0x00060000" in the
+flash menu).  Flash the kernel in arch/arm/boot/zImage into 0x08340000
+("p 1 0x00340000").  Finally flash an initial ramdisk into 0xC8000000
+("p 2 0x0").  We used ramdisk-2-30.gz from the 0.11 version directory on
+handhelds.org.
+
+The serial connection we established was at:
+
+8-bit data, no parity, 1 stop bit(s), 115200.00 b/s in the monitor, in the
+params-itsy program, and in the kernel itself.  This can be changed, but
+not easily. The monitor parameters are easily changed, the params program
+setup is assembly outl's, and the kernel is a configuration item specific to
+the itsy. (i.e. grep for CONFIG_SA1100_ITSY and you'll find where it is.)
+
+
+This should get you a properly booting 2.4 kernel on the itsy.
diff --git a/Documentation/arm/sa1100/lart.rst b/Documentation/arm/sa1100/lart.rst
new file mode 100644
index 000000000000..94c0568d1095
--- /dev/null
+++ b/Documentation/arm/sa1100/lart.rst
@@ -0,0 +1,15 @@
+====================================
+Linux Advanced Radio Terminal (LART)
+====================================
+
+The LART is a small (7.5 x 10cm) SA-1100 board, designed for embedded
+applications. It has 32 MB DRAM, 4MB Flash ROM, double RS232 and all
+other StrongARM-gadgets. Almost all SA signals are directly accessible
+through a number of connectors. The power supply accepts voltages
+between 3.5V and 16V and is overdimensioned to support a range of
+daughterboards. A quad Ethernet / IDE / PS2 / sound daughterboard
+is under development, with plenty of others in different stages of
+planning.
+
+The hardware designs for this board have been released under an open license;
+see the LART page at http://www.lartmaker.nl/ for more information.
diff --git a/Documentation/arm/sa1100/nanoengine.rst b/Documentation/arm/sa1100/nanoengine.rst
new file mode 100644
index 000000000000..47f1a14cf98a
--- /dev/null
+++ b/Documentation/arm/sa1100/nanoengine.rst
@@ -0,0 +1,11 @@
+==========
+nanoEngine
+==========
+
+"nanoEngine" is a SA1110 based single board computer from
+Bright Star Engineering Inc.  See www.brightstareng.com/arm
+for more info.
+(Ref: Stuart Adams <sja@brightstareng.com>)
+
+Also visit Larry Doolittle's "Linux for the nanoEngine" site:
+http://www.brightstareng.com/arm/nanoeng.htm
diff --git a/Documentation/arm/sa1100/pangolin.rst b/Documentation/arm/sa1100/pangolin.rst
new file mode 100644
index 000000000000..f0c5c1618553
--- /dev/null
+++ b/Documentation/arm/sa1100/pangolin.rst
@@ -0,0 +1,29 @@
+========
+Pangolin
+========
+
+Pangolin is a StrongARM 1110-based evaluation platform produced
+by Dialogue Technology (http://www.dialogue.com.tw/).
+It has EISA slots for ease of configuration with SDRAM/Flash
+memory card, USB/Serial/Audio card, Compact Flash card,
+PCMCIA/IDE card and TFT-LCD card.
+
+To compile for Pangolin, you must issue the following commands::
+
+	make pangolin_config
+	make oldconfig
+	make zImage
+
+Supported peripherals
+=====================
+
+- SA1110 serial port (UART1/UART2/UART3)
+- flash memory access
+- compact flash driver
+- UDA1341 sound driver
+- SA1100 LCD controller for 800x600 16bpp TFT-LCD
+- MQ-200 driver for 800x600 16bpp TFT-LCD
+- Penmount(touch panel) driver
+- PCMCIA driver
+- SMC91C94 LAN driver
+- IDE driver (experimental)
diff --git a/Documentation/arm/sa1100/pleb.rst b/Documentation/arm/sa1100/pleb.rst
new file mode 100644
index 000000000000..d5b732967aa3
--- /dev/null
+++ b/Documentation/arm/sa1100/pleb.rst
@@ -0,0 +1,13 @@
+====
+PLEB
+====
+
+The PLEB project was started as a student initiative at the School of
+Computer Science and Engineering, University of New South Wales to make a
+pocket computer capable of running the Linux Kernel.
+
+PLEB support has yet to be fully integrated.
+
+For more information, see:
+
+	http://www.cse.unsw.edu.au
diff --git a/Documentation/arm/sa1100/serial_uart.rst b/Documentation/arm/sa1100/serial_uart.rst
new file mode 100644
index 000000000000..ea983642b9be
--- /dev/null
+++ b/Documentation/arm/sa1100/serial_uart.rst
@@ -0,0 +1,51 @@
+==================
+SA1100 serial port
+==================
+
+The SA1100 serial port had its major/minor numbers officially assigned::
+
+  > Date: Sun, 24 Sep 2000 21:40:27 -0700
+  > From: H. Peter Anvin <hpa@transmeta.com>
+  > To: Nicolas Pitre <nico@CAM.ORG>
+  > Cc: Device List Maintainer <device@lanana.org>
+  > Subject: Re: device
+  >
+  > Okay.  Note that device numbers 204 and 205 are used for "low density
+  > serial devices", so you will have a range of minors on those majors (the
+  > tty device layer handles this just fine, so you don't have to worry about
+  > doing anything special.)
+  >
+  > So your assignments are:
+  >
+  > 204 char        Low-density serial ports
+  >                   5 = /dev/ttySA0               SA1100 builtin serial port 0
+  >                   6 = /dev/ttySA1               SA1100 builtin serial port 1
+  >                   7 = /dev/ttySA2               SA1100 builtin serial port 2
+  >
+  > 205 char        Low-density serial ports (alternate device)
+  >                   5 = /dev/cusa0                Callout device for ttySA0
+  >                   6 = /dev/cusa1                Callout device for ttySA1
+  >                   7 = /dev/cusa2                Callout device for ttySA2
+  >
+
+You must create those inodes in /dev on the root filesystem used
+by your SA1100-based device::
+
+	mknod ttySA0 c 204 5
+	mknod ttySA1 c 204 6
+	mknod ttySA2 c 204 7
+	mknod cusa0 c 205 5
+	mknod cusa1 c 205 6
+	mknod cusa2 c 205 7
+
+In addition to the creation of the appropriate device nodes above, you
+must ensure your user space applications make use of the correct device
+name. The classic example is the content of the /etc/inittab file where
+you might have a getty process started on ttyS0.
+
+In this case:
+
+- replace occurrences of ttyS0 with ttySA0, ttyS1 with ttySA1, etc.
+
+- don't forget to add 'ttySA0', 'console', or the appropriate tty name
+  in /etc/securetty for root to be allowed to login as well.
diff --git a/Documentation/arm/sa1100/tifon.rst b/Documentation/arm/sa1100/tifon.rst
new file mode 100644
index 000000000000..c26e910b9ea7
--- /dev/null
+++ b/Documentation/arm/sa1100/tifon.rst
@@ -0,0 +1,7 @@
+=====
+Tifon
+=====
+
+More info has to come...
+
+Contact: Peter Danielsson <peter.danielsson@era-t.ericsson.se>
diff --git a/Documentation/arm/sa1100/yopy.rst b/Documentation/arm/sa1100/yopy.rst
new file mode 100644
index 000000000000..5b35a5f61a44
--- /dev/null
+++ b/Documentation/arm/sa1100/yopy.rst
@@ -0,0 +1,5 @@
+====
+Yopy
+====
+
+See http://www.yopydeveloper.org for more.
diff --git a/Documentation/arm/samsung-s3c24xx/cpufreq.rst b/Documentation/arm/samsung-s3c24xx/cpufreq.rst
new file mode 100644
index 000000000000..2ddc26c03b1f
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/cpufreq.rst
@@ -0,0 +1,76 @@
+=======================
+S3C24XX CPUfreq support
+=======================
+
+Introduction
+------------
+
+ The S3C24XX series supports a number of power saving systems, such as
+ the ability to change the core, memory and peripheral operating
+ frequencies. The core control is exported via the CPUFreq driver
+ which has a number of different manual or automatic controls over the
+ rate the core is running at.
+
+ There are two forms of the driver depending on the specific CPU and
+ how the clocks are arranged. The first implementation used a single
+ PLL to feed the ARM, memory and peripherals via a series of dividers
+ and muxes and this is the implementation that is documented here. A
+ newer version where there is a separate PLL and clock divider for the
+ ARM core is available as a separate driver.
+
+
+Layout
+------
+
+ The core code manages the CPU specific drivers, any data that they
+ need to register and the interface to the generic drivers/cpufreq
+ system. Each CPU registers a driver to control the PLL, clock dividers
+ and anything else associated with it. Any board that wants to use this
+ framework needs to supply at least basic details of what is required.
+
+ The core registers with drivers/cpufreq at init time if all the data
+ necessary has been supplied.
+
+
+CPU support
+-----------
+
+ The support for each CPU depends on the facilities provided by the
+ SoC and the driver as each device has different PLL and clock chains
+ associated with it.
+
+
+Slow Mode
+---------
+
+ The SLOW mode, where the PLL is turned off altogether and the
+ system is fed by the external crystal input, is currently not
+ supported.
+
+
+sysfs
+-----
+
+ The core code exports extra information via sysfs in the directory
+ devices/system/cpu/cpu0/arch-freq.
+
+
+Board Support
+-------------
+
+ Each board that wants to use the cpufreq code must register some basic
+ information with the core driver to provide information about what the
+ board requires and any restrictions being placed on it.
+
+ The board needs to supply information about whether it needs the IO bank
+ timings changing, any maximum frequency limits and information about the
+ SDRAM refresh rate.
+
+
+
+
+Document Author
+---------------
+
+Ben Dooks, Copyright 2009 Simtec Electronics
+Licensed under GPLv2
diff --git a/Documentation/arm/samsung-s3c24xx/eb2410itx.rst b/Documentation/arm/samsung-s3c24xx/eb2410itx.rst
new file mode 100644
index 000000000000..7863c93652f8
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/eb2410itx.rst
@@ -0,0 +1,59 @@
+===================================
+Simtec Electronics EB2410ITX (BAST)
+===================================
+
+	http://www.simtec.co.uk/products/EB2410ITX/
+
+Introduction
+------------
+
+  The EB2410ITX is an S3C2410-based development board with a variety of
+  peripherals and expansion connectors. This board is also known by
+  the shortened name of Bast.
+
+
+Configuration
+-------------
+
+  To set the default configuration, use `make bast_defconfig` which
+  supports the commonly used features of this board.
+
+
+Support
+-------
+
+  Official support information can be found on the Simtec Electronics
+  website, at the product page http://www.simtec.co.uk/products/EB2410ITX/
+
+  Useful links:
+
+    - Resources Page http://www.simtec.co.uk/products/EB2410ITX/resources.html
+
+    - Board FAQ at http://www.simtec.co.uk/products/EB2410ITX/faq.html
+
+    - Bootloader info http://www.simtec.co.uk/products/SWABLE/resources.html
+      and FAQ http://www.simtec.co.uk/products/SWABLE/faq.html
+
+
+MTD
+---
+
+  The NAND and NOR support has been merged from the linux-mtd project.
+  If there are any problems, see http://www.linux-mtd.infradead.org/ for
+  more information or up-to-date versions of linux-mtd.
+
+
+IDE
+---
+
+  Both onboard IDE ports are supported. However, there is no support for
+  changing the speed of devices, so PIO Mode 4 capable drives should be used.
+
+
+Maintainers
+-----------
+
+  This board is maintained by Simtec Electronics.
+
+
+Copyright 2004 Ben Dooks, Simtec Electronics
diff --git a/Documentation/arm/samsung-s3c24xx/gpio.rst b/Documentation/arm/samsung-s3c24xx/gpio.rst
new file mode 100644
index 000000000000..f7c3d7d011a2
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/gpio.rst
@@ -0,0 +1,172 @@
+====================
+S3C24XX GPIO Control
+====================
+
+Introduction
+------------
+
+  The s3c2410 kernel provides an interface to configure and
+  manipulate the state of the GPIO pins, and find out other
+  information about them.
+
+  There are a number of conditions attached to the configuration
+  of the s3c2410 GPIO system, please read the Samsung provided
+  data-sheet/users manual to find out the complete list.
+
+  See Documentation/arm/samsung/gpio.rst for the core implementation.
+
+
+GPIOLIB
+-------
+
+  With the advent of GPIOLIB in drivers/gpio, support for some
+  of the GPIO functions such as reading and writing a pin will
+  be removed in favour of this common access method.
+
+  Once all the extant drivers have been converted, the functions
+  listed below will be removed (they may be marked as __deprecated
+  in the near future).
+
+  The following functions now either have a `s3c_` specific variant
+  or are merged into gpiolib. See the definitions in
+  arch/arm/plat-samsung/include/plat/gpio-cfg.h:
+
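+  ======================== ============================================
+  Old function             Replacement
+  ======================== ============================================
+  s3c2410_gpio_setpin()    gpio_set_value() or gpio_direction_output()
+  s3c2410_gpio_getpin()    gpio_get_value() or gpio_direction_input()
+  s3c2410_gpio_getirq()    gpio_to_irq()
+  s3c2410_gpio_cfgpin()    s3c_gpio_cfgpin()
+  s3c2410_gpio_getcfg()    s3c_gpio_getcfg()
+  s3c2410_gpio_pullup()    s3c_gpio_setpull()
+  ======================== ============================================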
+
+
+GPIOLIB conversion
+------------------
+
+If you need to convert your board or driver to use gpiolib from the phased
+out s3c2410 API, then here are some notes on the process.
+
+1) If your board is exclusively using a GPIO, say to control peripheral
+   power, then it must claim the GPIO with gpio_request() before it can
+   use it.
+
+   It is recommended to check the return value, with at least WARN_ON()
+   during initialisation.
+
+2) The s3c2410_gpio_cfgpin() can be directly replaced with s3c_gpio_cfgpin()
+   as they have the same arguments, and can either take the pin specific
+   values, or the more generic special-function-number arguments.
+
+3) s3c2410_gpio_pullup() changes have the problem that while
+   s3c2410_gpio_pullup(x, 1) can be easily translated to
+   s3c_gpio_setpull(x, S3C_GPIO_PULL_NONE), the s3c2410_gpio_pullup(x, 0)
+   case is not so easy.
+
+   The s3c2410_gpio_pullup(x, 0) case enables the pull-up (or in the case
+   of some of the devices, a pull-down) and as such the new API distinguishes
+   between the UP and DOWN case. There is currently no 'just turn on' setting,
+   which may need to be added if this becomes a problem.
+
+4) s3c2410_gpio_setpin() can be replaced by gpio_set_value(). The old call
+   does not implicitly configure the relevant gpio to output, so the gpio
+   direction should be changed before using gpio_set_value().
+
+5) s3c2410_gpio_getpin() is replaceable by gpio_get_value() if the pin
+   has been set to input. It is currently unknown what the behaviour is
+   when using gpio_get_value() on an output pin (s3c2410_gpio_getpin
+   would return the value the pin is supposed to be outputting).
+
+6) s3c2410_gpio_getirq() should be directly replaceable with the
+   gpio_to_irq() call.
+
+The s3c2410_gpio and `gpio_` calls have always operated on the same gpio
+numberspace, so there is no problem with converting the gpio numbering
+between the calls.
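+
+As a rough sketch of steps 1, 3 and 4 above (not taken from any in-tree
+board file: the pin S3C2410_GPB(5), the label "periph-power" and the
+function name board_power_init() are assumptions made purely for
+illustration), a board that switches peripheral power through a GPIO
+might be converted along these lines::
+
+    #include <linux/bug.h>
+    #include <linux/gpio.h>
+    #include <linux/init.h>
+    #include <mach/regs-gpio.h>
+    #include <plat/gpio-cfg.h>
+
+    /* Hypothetical pin; use whichever pin the board actually wires up. */
+    #define BOARD_GPIO_POWER	S3C2410_GPB(5)
+
+    static void __init board_power_init(void)
+    {
+        /* Step 1: claim the pin, and at least warn if that fails. */
+        if (WARN_ON(gpio_request(BOARD_GPIO_POWER, "periph-power")))
+            return;
+
+        /* Step 3: was s3c2410_gpio_pullup(pin, 1), i.e. pull-up off. */
+        s3c_gpio_setpull(BOARD_GPIO_POWER, S3C_GPIO_PULL_NONE);
+
+        /* Step 4: set the direction (and initial level) first ... */
+        gpio_direction_output(BOARD_GPIO_POWER, 0);
+
+        /* ... after which level changes can use gpio_set_value(). */
+        gpio_set_value(BOARD_GPIO_POWER, 1);
+    }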
+
+
+Headers
+-------
+
+  See arch/arm/mach-s3c24xx/include/mach/regs-gpio.h for the list
+  of GPIO pins, and the configuration values for them. This
+  is included by using #include <mach/regs-gpio.h>
+
+
+PIN Numbers
+-----------
+
+  Each pin has a unique number associated with it in regs-gpio.h,
+  e.g. S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
+  the GPIO functions which pin is to be used.
+
+  With the conversion to gpiolib, there is no longer a direct conversion
+  from gpio pin number to register base address as in earlier kernels. This
+  is due to the number space required for newer SoCs where the later
+  GPIOs are not contiguous.
+
+
+Configuring a pin
+-----------------
+
+  The following function allows the configuration of a given pin to
+  be changed::
+
+    void s3c_gpio_cfgpin(unsigned int pin, unsigned int function);
+
+  e.g.::
+
+     s3c_gpio_cfgpin(S3C2410_GPA(0), S3C_GPIO_SFN(1));
+     s3c_gpio_cfgpin(S3C2410_GPE(8), S3C_GPIO_SFN(2));
+
+  which would turn GPA(0) into the lowest Address line A0, and set
+  GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
+
+
+Reading the current configuration
+---------------------------------
+
+  The current configuration of a pin can be read by using the
+  Samsung-specific function::
+
+    unsigned int s3c_gpio_getcfg(unsigned int pin);
+
+  The return value will be from the same set of values which can be
+  passed to s3c_gpio_cfgpin().
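+
+  As a hedged illustration (the helper name example_route_sddat1() is
+  hypothetical; the pin and function number are the ones used in the
+  example of the previous section), the value read back can be compared
+  against the value that would be passed to s3c_gpio_cfgpin()::
+
+    #include <mach/regs-gpio.h>
+    #include <plat/gpio-cfg.h>
+
+    /* Route GPE(8) to SDDAT1 only if it is not already configured so. */
+    static void example_route_sddat1(void)
+    {
+        if (s3c_gpio_getcfg(S3C2410_GPE(8)) != S3C_GPIO_SFN(2))
+            s3c_gpio_cfgpin(S3C2410_GPE(8), S3C_GPIO_SFN(2));
+    }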
+
+
+Configuring a pull-up resistor
+------------------------------
+
+  A large proportion of the GPIO pins on the S3C2410 can have weak
+  pull-up resistors enabled. This can be configured by the following
+  function::
+
+    void s3c_gpio_setpull(unsigned int pin, unsigned int to);
+
+  The ``to`` value is S3C_GPIO_PULL_NONE to turn the pull-up off,
+  and S3C_GPIO_PULL_UP to enable the specified pull-up. Any other
+  values are currently undefined.
+
+
+Getting and setting the state of a PIN
+--------------------------------------
+
+  These calls are now implemented by the relevant gpiolib calls; convert
+  your board or driver to use gpiolib.
+
+
+Getting the IRQ number associated with a PIN
+--------------------------------------------
+
+  A standard gpiolib function can map the given pin number to an IRQ
+  number to pass to the IRQ system::
+
+    int gpio_to_irq(unsigned int pin);
+
+  Note, not all pins have an IRQ.
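+
+  A minimal sketch of using the returned number follows. The pin
+  S3C2410_GPF(1), the empty handler and the names example_irq() and
+  example_setup_irq() are illustrative assumptions, not code from an
+  existing driver::
+
+    #include <linux/gpio.h>
+    #include <linux/interrupt.h>
+    #include <mach/regs-gpio.h>
+
+    static irqreturn_t example_irq(int irq, void *dev_id)
+    {
+        return IRQ_HANDLED;
+    }
+
+    static int example_setup_irq(void)
+    {
+        int irq = gpio_to_irq(S3C2410_GPF(1));
+
+        /* Pins without an IRQ yield a negative error code. */
+        if (irq < 0)
+            return irq;
+
+        return request_irq(irq, example_irq, IRQF_TRIGGER_FALLING,
+                           "example-gpio", NULL);
+    }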
+
+
+Author
+-------
+
+Ben Dooks, 03 October 2004
+Copyright 2004 Ben Dooks, Simtec Electronics
diff --git a/Documentation/arm/samsung-s3c24xx/h1940.rst b/Documentation/arm/samsung-s3c24xx/h1940.rst
new file mode 100644
index 000000000000..62a562c178e3
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/h1940.rst
@@ -0,0 +1,41 @@
+=============
+HP IPAQ H1940
+=============
+
+http://www.handhelds.org/projects/h1940.html
+
+Introduction
+------------
+
+  The HP H1940 is an S3C2410-based handheld device, with
+  Bluetooth connectivity.
+
+
+Support
+-------
+
+  A variety of information is available:
+
+  handhelds.org project page:
+
+    http://www.handhelds.org/projects/h1940.html
+
+  handhelds.org wiki page:
+
+    http://handhelds.org/moin/moin.cgi/HpIpaqH1940
+
+  Herbert Pötzl's pages:
+
+    http://vserver.13thfloor.at/H1940/
+
+
+Maintainers
+-----------
+
+  This project is being maintained and developed by a variety
+  of people, including Ben Dooks, Arnaud Patard, and Herbert Pötzl.
+
+  Thanks to the many others who have also provided support.
+
+
+(c) 2005 Ben Dooks
diff --git a/Documentation/arm/samsung-s3c24xx/index.rst b/Documentation/arm/samsung-s3c24xx/index.rst
new file mode 100644
index 000000000000..6c7b241cbf37
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/index.rst
@@ -0,0 +1,18 @@
+==========================
+Samsung S3C24XX SoC Family
+==========================
+
+.. toctree::
+   :maxdepth: 1
+
+   h1940
+   gpio
+   cpufreq
+   suspend
+   usb-host
+   s3c2412
+   eb2410itx
+   nand
+   smdk2440
+   s3c2413
+   overview
diff --git a/Documentation/arm/samsung-s3c24xx/nand.rst b/Documentation/arm/samsung-s3c24xx/nand.rst
new file mode 100644
index 000000000000..938995694ee7
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/nand.rst
@@ -0,0 +1,30 @@
+====================
+S3C24XX NAND Support
+====================
+
+Introduction
+------------
+
+Small Page NAND
+---------------
+
+The driver uses a 512 byte (1 page) ECC code for this setup. The
+ECC code is not directly compatible with the default kernel ECC
+code, so the driver enforces its own OOB layout and ECC parameters.
+
+Large Page NAND
+---------------
+
+The driver is capable of handling NAND flash with a 2KiB page
+size, with support for hardware ECC generation and correction.
+
+Unlike the 512 byte page mode, the driver generates ECC data for
+each 256 byte block in a 2KiB page. This means that more than
+one error in a page can be rectified. It also means that the
+OOB layout remains the default kernel layout for these flashes.
+
+
+Document Author
+---------------
+
+Ben Dooks, Copyright 2007 Simtec Electronics
diff --git a/Documentation/arm/samsung-s3c24xx/overview.rst b/Documentation/arm/samsung-s3c24xx/overview.rst
new file mode 100644
index 000000000000..e9a1dc7276b5
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/overview.rst
@@ -0,0 +1,319 @@
+==========================
+S3C24XX ARM Linux Overview
+==========================
+
+
+
+Introduction
+------------
+
+  The Samsung S3C24XX range of ARM9 System-on-Chip CPUs is supported
+  by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
+  S3C2412, S3C2413, S3C2416, S3C2440, S3C2442, S3C2443 and S3C2450 devices
+  are supported.
+
+  Support for the S3C2400 and S3C24A0 series was never completed and the
+  corresponding code was later removed.  If someone wishes to
+  revive this effort, partial support can be retrieved from earlier Linux
+  versions.
+
+  The S3C2416 and S3C2450 devices are very similar and S3C2450 support is
+  included under the arch/arm/mach-s3c2416 directory. Note, while core
+  support for these SoCs is in, work on some of the extra peripherals
+  and extra interrupts is still ongoing.
+
+
+Configuration
+-------------
+
+  A generic S3C2410 configuration is provided, and can be used as the
+  default by `make s3c2410_defconfig`. This configuration has support
+  for all the machines, and the commonly used features on them.
+
+  Certain machines may have their own default configurations as well,
+  please check the machine specific documentation.
+
+
+Layout
+------
+
+  The core support files are located in the platform code contained in
+  arch/arm/plat-s3c24xx with headers in include/asm-arm/plat-s3c24xx.
+  This directory should be kept to items shared between the platform
+  code (arch/arm/plat-s3c24xx) and the arch/arm/mach-s3c24* code.
+
+  Each cpu has a directory with the support files for it, and the
+  machines that carry the device. For example S3C2410 is contained
+  in arch/arm/mach-s3c2410 and S3C2440 in arch/arm/mach-s3c2440.
+
+  Register, kernel and platform data definitions are held in the
+  arch/arm/mach-s3c2410/include/mach directory.
+
+arch/arm/plat-s3c24xx:
+
+  Files in here are either common to all the s3c24xx family,
+  or are common to only some of them with names to indicate this
+  status. The files that are not common to all are generally named
+  with the initial cpu they support in the series to ensure a short
+  name without any possibility of confusion with newer devices.
+
+  As an example, initially s3c244x would cover s3c2440 and s3c2442, but
+  with the s3c2443 which does not share many of the same drivers in
+  this directory, the name becomes invalid. We stick to s3c2440-<x>
+  to indicate a driver that is s3c2440 and s3c2442 compatible.
+
+  This does mean that to find the status of any given SoC, a number
+  of directories may need to be searched.
+
+
+Machines
+--------
+
+  The currently supported machines are as follows:
+
+  Simtec Electronics EB2410ITX (BAST)
+
+    A general purpose development board, see
+    Documentation/arm/samsung-s3c24xx/eb2410itx.rst for further details.
+
+  Simtec Electronics IM2440D20 (Osiris)
+
+    CPU Module from Simtec Electronics, with a S3C2440A CPU, nand flash
+    and a PCMCIA controller.
+
+  Samsung SMDK2410
+
+    Samsung's own development board, geared for PDA work.
+
+  Samsung/Aiji SMDK2412
+
+    The S3C2412 version of the SMDK2440.
+
+  Samsung/Aiji SMDK2413
+
+    The S3C2413 version of the SMDK2440.
+
+  Samsung/Meritech SMDK2440
+
+    The S3C2440 compatible version of the SMDK2440, which has the
+    option of an S3C2440 or S3C2442 CPU module.
+
+  Thorcom VR1000
+
+    Custom embedded board
+
+  HP IPAQ H1940
+
+    Handheld (IPAQ), available in several varieties
+
+  HP iPAQ rx3715
+
+    S3C2440 based IPAQ, with a number of variations depending on
+    features shipped.
+
+  Acer N30
+
+    An S3C2410-based PDA from Acer.  There is a Wiki page at
+    http://handhelds.org/moin/moin.cgi/AcerN30Documentation .
+
+  AML M5900
+
+    American Microsystems' M5900
+
+  Nex Vision Nexcoder
+  Nex Vision Otom
+
+    Two machines by Nex Vision
+
+
+Adding New Machines
+-------------------
+
+  The architecture has been designed to support as many machines as can
+  be configured for it in one kernel build, and any future additions
+  should keep this in mind before altering items outside of their own
+  machine files.
+
+  Machine definitions should be kept in linux/arch/arm/mach-s3c2410,
+  and there are a number of examples that can be looked at.
+
+  Read the kernel patch submission policies as well as the
+  Documentation/arm directory before submitting patches. The
+  ARM kernel series is managed by Russell King, and has a patch system
+  located at http://www.arm.linux.org.uk/developer/patches/
+  as well as mailing lists that can be found from the same site.
+
+  As a courtesy, please notify <ben-linux@fluff.org> of any new
+  machines or other modifications.
+
+  Any large scale modifications, or new drivers should be discussed
+  on the ARM kernel mailing list (linux-arm-kernel) before being
+  attempted. See http://www.arm.linux.org.uk/mailinglists/ for the
+  mailing list information.
+
+
+I2C
+---
+
+  The hardware I2C core in the CPU is supported in single master
+  mode, and can be configured via platform data.
+
+
+RTC
+---
+
+  Support for the onboard RTC unit, including alarm function.
+
+  This has recently been upgraded to use the new RTC core,
+  and the module has been renamed to rtc-s3c to fit in with
+  the new rtc naming scheme.
+
+
+Watchdog
+--------
+
+  The onchip watchdog is available via the standard watchdog
+  interface.
+
+
+NAND
+----
+
+  The current kernels now have support for the s3c2410 NAND
+  controller. If there are any problems the latest linux-mtd
+  code can be found from http://www.linux-mtd.infradead.org/
+
+  For more information see Documentation/arm/samsung-s3c24xx/nand.rst
+
+
+SD/MMC
+------
+
+  The SD/MMC hardware pre S3C2443 is supported in the current
+  kernel. The driver is drivers/mmc/host/s3cmci.c and supports
+  1 and 4 bit SD or MMC cards.
+
+  The SDIO behaviour of this driver has not been fully tested. There is no
+  current support for hardware SDIO interrupts.
+
+
+Serial
+------
+
+  The s3c2410 serial driver provides support for the internal
+  serial ports. These devices appear as /dev/ttySAC0 through 3.
+
+  To create device nodes for these, use the following commands::
+
+    mknod ttySAC0 c 204 64
+    mknod ttySAC1 c 204 65
+    mknod ttySAC2 c 204 66
+
+
+GPIO
+----
+
+  The core contains support for manipulating the GPIO, see the
+  documentation in gpio.rst in the same directory as this file.
+
+  Newer kernels carry GPIOLIB, and support is being moved towards
+  this with some of the older support in line to be removed.
+
+  As of v2.6.34, the move towards using gpiolib support is almost
+  complete, and very few of the old calls are left.
+
+  See Documentation/arm/samsung-s3c24xx/gpio.rst for the S3C24XX specific
+  support and Documentation/arm/samsung/gpio.rst for the core Samsung
+  implementation.
+
+
+Clock Management
+----------------
+
+  The core provides the interface defined in the header file
+  include/asm-arm/hardware/clock.h, to allow control over the
+  various clock units.
+
+
+Suspend to RAM
+--------------
+
+  For boards that provide support for suspend to RAM, the
+  system can be placed into low power suspend.
+
+  See Documentation/arm/samsung-s3c24xx/suspend.rst for more information.
+
+
+SPI
+---
+
+  SPI drivers are available for both the in-built hardware
+  (although there is no DMA support yet) and a generic
+  GPIO based solution.
+
+
+LEDs
+----
+
+  There is support for GPIO based LEDs via a platform driver
+  in the LED subsystem.
+
+
+Platform Data
+-------------
+
+  Whenever a device has platform specific data that is specified
+  on a per-machine basis, care should be taken to ensure the
+  following:
+
+    1) that default data is not left in the device to confuse the
+       driver if a machine does not set it at startup
+
+    2) the data should (if possible) be marked as __initdata,
+       to ensure that the data is thrown away if the machine is
+       not the one currently in use.
+
+       The best way of doing this is to make a function that
+       kmalloc()s an area of memory, and copies the __initdata
+       and then sets the relevant device's platform data. Making
+       the function `__init` takes care of ensuring it is discarded
+       with the rest of the initialisation code::
+
+         static void __init s3c24xx_xxx_set_platdata(struct xxx_data *pd)
+         {
+             struct s3c2410_xxx_mach_info *npd;
+
+             /* copy the __initdata into memory that survives init */
+             npd = kmalloc(sizeof(*npd), GFP_KERNEL);
+             if (npd) {
+                 memcpy(npd, pd, sizeof(*npd));
+                 s3c_device_xxx.dev.platform_data = npd;
+             } else {
+                 printk(KERN_ERR "no memory for xxx platform data\n");
+             }
+         }
+
+       Note, since the code is marked as __init, it should not be
+       exported outside arch/arm/mach-s3c2410/, or exported to
+       modules via EXPORT_SYMBOL() and related functions.
+
+
+Port Contributors
+-----------------
+
+  Ben Dooks (BJD)
+  Vincent Sanders
+  Herbert Pötzl
+  Arnaud Patard (RTP)
+  Roc Wu
+  Klaus Fetscher
+  Dimitry Andric
+  Shannon Holland
+  Guillaume Gourat (NexVision)
+  Christer Weinigel (wingel) (Acer N30)
+  Lucas Correia Villa Real (S3C2400 port)
+
+
+Document Author
+---------------
+
+Ben Dooks, Copyright 2004-2006 Simtec Electronics
diff --git a/Documentation/arm/samsung-s3c24xx/s3c2412.rst b/Documentation/arm/samsung-s3c24xx/s3c2412.rst
new file mode 100644
index 000000000000..68b985fc6bf4
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/s3c2412.rst
@@ -0,0 +1,121 @@
+==========================
+S3C2412 ARM Linux Overview
+==========================
+
+Introduction
+------------
+
+  The S3C2412 is part of the S3C24XX range of ARM9 System-on-Chip CPUs
+  from Samsung. This part has an ARM926EJ-S core, capable of running up
+  to 266MHz (see data-sheet for more information).
+
+
+Clock
+-----
+
+  The core clock code provides a set of clocks to the drivers, and allows
+  for source selection and a number of other features.
+
+
+Power
+-----
+
+  No support for suspend/resume to RAM in the current system.
+
+
+DMA
+---
+
+  No current support for DMA.
+
+
+GPIO
+----
+
+  There is support for setting the GPIO to input/output/special function
+  and reading or writing to them.
+
+
+UART
+----
+
+  The UART hardware is similar to the S3C2440, and is supported by the
+  s3c2410 driver in the drivers/tty/serial directory.
+
+
+NAND
+----
+
+  The NAND hardware is similar to the S3C2440, and is supported by the
+  s3c2410 driver in the drivers/mtd/nand/raw directory.
+
+
+USB Host
+--------
+
+  The USB hardware is similar to the S3C2410, with extended clock source
+  control. The OHCI portion is supported by the ohci-s3c2410 driver, and
+  the clock control selection is supported by the core clock code.
+
+
+USB Device
+----------
+
+  No current support in the kernel.
+
+
+IRQs
+----
+
+  All the standard and external interrupt sources are supported. The
+  extra sub-sources are not yet supported.
+
+
+RTC
+---
+
+  The RTC hardware is similar to the S3C2410, and is supported by the
+  s3c2410-rtc driver.
+
+
+Watchdog
+--------
+
+  The watchdog hardware is the same as the S3C2410, and is supported by
+  the s3c2410_wdt driver.
+
+
+MMC/SD/SDIO
+-----------
+
+  No current support for the MMC/SD/SDIO block.
+
+IIC
+---
+
+  The IIC hardware is the same as the S3C2410, and is supported by the
+  i2c-s3c24xx driver.
+
+
+IIS
+---
+
+  No current support for the IIS interface.
+
+
+SPI
+---
+
+  No current support for the SPI interfaces.
+
+
+ATA
+---
+
+  No current support for the on-board ATA block.
+
+
+Document Author
+---------------
+
+Ben Dooks, Copyright 2006 Simtec Electronics
diff --git a/Documentation/arm/samsung-s3c24xx/s3c2413.rst b/Documentation/arm/samsung-s3c24xx/s3c2413.rst
new file mode 100644
index 000000000000..1f51e207fc46
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/s3c2413.rst
@@ -0,0 +1,22 @@
+==========================
+S3C2413 ARM Linux Overview
+==========================
+
+Introduction
+------------
+
+  The S3C2413 is an extended version of the S3C2412, with a camera
+  interface and mobile DDR memory support. See the S3C2412 support
+  documentation for more information.
+
+
+Camera Interface
+----------------
+
+  This block is currently not supported.
+
+
+Document Author
+---------------
+
+Ben Dooks, Copyright 2006 Simtec Electronics
diff --git a/Documentation/arm/samsung-s3c24xx/smdk2440.rst b/Documentation/arm/samsung-s3c24xx/smdk2440.rst
new file mode 100644
index 000000000000..524fd0b4afaf
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/smdk2440.rst
@@ -0,0 +1,57 @@
+=========================
+Samsung/Meritech SMDK2440
+=========================
+
+Introduction
+------------
+
+  The SMDK2440 is a two part evaluation board for the Samsung S3C2440
+  processor. It includes support for LCD, SmartMedia, Audio, SD and
+  10MBit Ethernet, and expansion headers for various signals, including
+  the camera and unused GPIO.
+
+
+Configuration
+-------------
+
+  To set the default configuration, use `make smdk2440_defconfig` which
+  will configure the common features of this board, or use
+  `make s3c2410_defconfig` to include support for all s3c2410/s3c2440 machines.
+
+
+Support
+-------
+
+  Ben Dooks' SMDK2440 site at http://www.fluff.org/ben/smdk2440/
+  includes Linux based USB download tools.
+
+  Some of the h1940 patches that can be found from the H1940 project
+  site at http://www.handhelds.org/projects/h1940.html can also be
+  applied to this board.
+
+
+Peripherals
+-----------
+
+  There is no current support for any of the extra peripherals on the
+  base-board itself.
+
+
+MTD
+---
+
+  The NAND flash should be supported by the in-kernel MTD NAND support.
+  NOR flash support will be added later.
+
+
+Maintainers
+-----------
+
+  This board is being maintained by Ben Dooks. For more info, see
+  http://www.fluff.org/ben/smdk2440/
+
+  Many thanks to Dimitry Andric of TomTom for the loan of the SMDK2440,
+  and to Simtec Electronics for allowing me time to work on this.
+
+
+(c) 2004 Ben Dooks
diff --git a/Documentation/arm/samsung-s3c24xx/suspend.rst b/Documentation/arm/samsung-s3c24xx/suspend.rst
new file mode 100644
index 000000000000..b4f3ae9fe76e
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/suspend.rst
@@ -0,0 +1,137 @@
+=======================
+S3C24XX Suspend Support
+=======================
+
+
+Introduction
+------------
+
+  The S3C24XX supports a low-power suspend mode, where the SDRAM is kept
+  in Self-Refresh mode, and all but the essential peripheral blocks are
+  powered down. For more information on how this works, please look
+  at the relevant CPU datasheet from Samsung.
+
+
+Requirements
+------------
+
+  1) A bootloader that can support the necessary resume operation
+
+  2) Support for at least 1 source for resume
+
+  3) CONFIG_PM enabled in the kernel
+
+  4) Any peripherals that are going to be powered down at the same
+     time require suspend/resume support.
+
+
+Resuming
+--------
+
+  The S3C2410 user manual defines the process of sending the CPU to
+  sleep and how it resumes. The default behaviour of the Linux code
+  is to set the GSTATUS3 register to the physical address of the
+  code to resume Linux operation.
+
+  GSTATUS4 is currently left alone by the sleep code, and is free to
+  use for any other purposes (for example, the EB2410ITX uses this to
+  save memory configuration in).
+
+
+Machine Support
+---------------
+
+  The machine specific functions must call the s3c_pm_init() function
+  to say that its bootloader is capable of resuming. This can be as
+  simple as adding the following to the machine's definition::
+
+    INITMACHINE(s3c_pm_init)
+
+  A board can do its own setup before calling s3c_pm_init, if it
+  needs to set up anything else for power management support.
+
+  There is currently no support for overriding the default method of
+  saving the resume address. If your board requires it, then contact
+  the maintainer and discuss what is required.
+
+  Note, the original method of adding a late_initcall() is wrong,
+  and will end up initialising all compiled machines' pm init!
+
+  The following is an example of code used for testing wakeup from
+  a falling edge on IRQ_EINT0::
+
+
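+    /* Empty handler: the interrupt only needs to wake the system. */
+    static irqreturn_t button_irq(int irq, void *pw)
+    {
+	return IRQ_HANDLED;
+    }
+
+    static void __init machine_init(void)
+    {
+	...
+
+	/* The return value should be checked in real code. */
+	request_irq(IRQ_EINT0, button_irq, IRQF_TRIGGER_FALLING,
+		   "button-irq-eint0", NULL);
+
+	/* Mark EINT0 as a wakeup source. */
+	enable_irq_wake(IRQ_EINT0);
+
+	s3c_pm_init();
+    }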
+
+
+Debugging
+---------
+
+  There are several important things to remember when using PM suspend:
+
+  1) The UART drivers will disable the clocks to the UART blocks when
+     suspending, which means that use of printascii() or similar direct
+     access to the UARTs will cause the debug output to stop.
+
+  2) While the pm code itself will attempt to re-enable the UART clocks,
+     care should be taken that any external clock sources that the UARTs
+     rely on are still enabled at that point.
+
+  3) If any debugging is placed in the resume path, then it must have the
+     relevant clocks and peripherals set up before use (i.e. by the
+     bootloader).
+
+     For example, if you transmit a character from the UART, the baud
+     rate and UART controls must be set up beforehand.
+
+
+Configuration
+-------------
+
+  The S3C2410 specific configuration in `System Type` defines various
+  aspects of how the S3C2410 suspend and resume support is configured
+
+  `S3C2410 PM Suspend debug`
+
+    This option prints messages to the serial console before and after
+    the actual suspend, giving detailed information on what is
+    happening.
+
+
+  `S3C2410 PM Suspend Memory CRC`
+
+    Allows the entire memory to be checksummed before and after the
+    suspend to see if there has been any corruption of the contents.
+
+    Note, the time to calculate the CRC is dependent on the CPU speed
+    and the size of memory. For a 64Mbyte RAM area on a 200MHz
+    S3C2410, this can take approximately 4 seconds to complete.
+
+    This support requires the CRC32 function to be enabled.
+
+
+  `S3C2410 PM Suspend CRC Chunksize (KiB)`
+
+    Defines the size of memory each CRC chunk covers. A smaller value
+    will mean that the CRC data block will take more memory, but will
+    identify any faults with better precision.
+
+
+Document Author
+---------------
+
+Ben Dooks, Copyright 2004 Simtec Electronics
diff --git a/Documentation/arm/samsung-s3c24xx/usb-host.rst b/Documentation/arm/samsung-s3c24xx/usb-host.rst
new file mode 100644
index 000000000000..c84268bd1884
--- /dev/null
+++ b/Documentation/arm/samsung-s3c24xx/usb-host.rst
@@ -0,0 +1,91 @@
+========================
+S3C24XX USB Host support
+========================
+
+
+
+Introduction
+------------
+
+  This document details the S3C2410/S3C2440 in-built OHCI USB host support.
+
+Configuration
+-------------
+
+  Enable at least the following kernel options:
+
+  menuconfig::
+
+   Device Drivers  --->
+     USB support  --->
+       <*> Support for Host-side USB
+       <*>   OHCI HCD support
+
+
+  .config:
+
+    - CONFIG_USB
+    - CONFIG_USB_OHCI_HCD
+
+
+  Once these options are configured, the standard set of USB device
+  drivers can be configured and used.
+
+
+Board Support
+-------------
+
+  The driver attaches to a platform device, which will need to be
+  added by the board specific support file in linux/arch/arm/mach-s3c2410,
+  such as mach-bast.c or mach-smdk2410.c.
+
+  The platform device's platform_data field is only needed if the
+  board implements extra power control or over-current monitoring.
+
+  The OHCI driver does not ensure the state of the S3C2410's MISCCTRL
+  register, so if both ports are to be used for the host, then it is
+  the board support file's responsibility to ensure that the second
+  port is configured to be connected to the OHCI core.
+
+
+Platform Data
+-------------
+
+  See arch/arm/mach-s3c2410/include/mach/usb-control.h for the
+  descriptions of the platform device data. An implementation
+  can be found in linux/arch/arm/mach-s3c2410/usb-simtec.c .
+
+  The `struct s3c2410_hcd_info` contains a pair of functions
+  that get called to enable over-current detection, and to
+  control the port power status.
+
+  The ports are numbered 0 and 1.
+
+  power_control:
+    Called to enable or disable the power on the port.
+
+  enable_oc:
+    Called to enable or disable the over-current monitoring.
+    This should claim or release the resources being used to
+    check the power condition on the port, such as an IRQ.
+
+  report_oc:
+    The OHCI driver fills this field in for the over-current code
+    to call when there is a change to the over-current state on
+    a port. The ports argument is a bitmask of 1 bit per port,
+    with bit X being 1 for an over-current on port X.
+
+    The function s3c2410_usb_report_oc() has been provided to
+    ensure this is called correctly.
+
+  port[x]:
+    This struct describes each port, 0 or 1. The platform driver
+    should set the flags field of each port to S3C_HCDFLG_USED if
+    the port is enabled.
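+
+  A minimal sketch of such platform data, for a board that uses both
+  ports and has no over-current monitoring, might look as follows. The
+  names board_usb_power() and board_usb_info are hypothetical, and the
+  callback signature is an assumption that should be checked against
+  the header named above::
+
+    /* Hypothetical board hook: switch the power for the given port. */
+    static void board_usb_power(int port, int to)
+    {
+        /* drive the board specific power switch here */
+    }
+
+    static struct s3c2410_hcd_info board_usb_info = {
+        /* mark both ports as in use by the host controller */
+        .port[0] = { .flags = S3C_HCDFLG_USED },
+        .port[1] = { .flags = S3C_HCDFLG_USED },
+
+        .power_control = board_usb_power,
+        /* enable_oc is left unset: no over-current monitoring */
+    };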
+
+
+
+Document Author
+---------------
+
+Ben Dooks, Copyright 2005 Simtec Electronics
diff --git a/Documentation/arm/samsung/bootloader-interface.rst b/Documentation/arm/samsung/bootloader-interface.rst
new file mode 100644
index 000000000000..a56f325dae78
--- /dev/null
+++ b/Documentation/arm/samsung/bootloader-interface.rst
@@ -0,0 +1,81 @@
+==========================================================
+Interface between kernel and boot loaders on Exynos boards
+==========================================================
+
+Author: Krzysztof Kozlowski
+
+Date  : 6 June 2015
+
+This document describes the interface currently used between the Linux kernel
+and boot loaders on Samsung Exynos based boards. This is not a definition
+of the interface but rather a description of the existing state, a reference
+for information purposes only.
+
+In this document "boot loader" means any of the following: U-Boot, proprietary
+SBOOT or any other firmware for ARMv7 and ARMv8 initializing the board before
+executing the kernel.
+
+
+1. Non-Secure mode
+
+Address:      sysram_ns_base_addr
+
+============= ============================================ ==================
+Offset        Value                                        Purpose
+============= ============================================ ==================
+0x08          exynos_cpu_resume_ns, mcpm_entry_point       System suspend
+0x0c          0x00000bad (Magic cookie)                    System suspend
+0x1c          exynos4_secondary_startup                    Secondary CPU boot
+0x1c + 4*cpu  exynos4_secondary_startup (Exynos4412)       Secondary CPU boot
+0x20          0xfcba0d10 (Magic cookie)                    AFTR
+0x24          exynos_cpu_resume_ns                         AFTR
+0x28 + 4*cpu  0x8 (Magic cookie, Exynos3250)               AFTR
+0x28          0x0 or last value during resume (Exynos542x) System suspend
+============= ============================================ ==================
+
+
+2. Secure mode
+
+Address:      sysram_base_addr
+
+============= ============================================ ==================
+Offset        Value                                        Purpose
+============= ============================================ ==================
+0x00          exynos4_secondary_startup                    Secondary CPU boot
+0x04          exynos4_secondary_startup (Exynos542x)       Secondary CPU boot
+4*cpu         exynos4_secondary_startup (Exynos4412)       Secondary CPU boot
+0x20          exynos_cpu_resume (Exynos4210 r1.0)          AFTR
+0x24          0xfcba0d10 (Magic cookie, Exynos4210 r1.0)   AFTR
+============= ============================================ ==================
+
+Address:      pmu_base_addr
+
+============= ============================================ ==================
+Offset        Value                                        Purpose
+============= ============================================ ==================
+0x0800        exynos_cpu_resume                            AFTR, suspend
+0x0800        mcpm_entry_point (Exynos542x with MCPM)      AFTR, suspend
+0x0804        0xfcba0d10 (Magic cookie)                    AFTR
+0x0804        0x00000bad (Magic cookie)                    System suspend
+0x0814        exynos4_secondary_startup (Exynos4210 r1.1)  Secondary CPU boot
+0x0818        0xfcba0d10 (Magic cookie, Exynos4210 r1.1)   AFTR
+0x081C        exynos_cpu_resume (Exynos4210 r1.1)          AFTR
+============= ============================================ ==================
+
+3. Other (regardless of secure/non-secure mode)
+
+Address:      pmu_base_addr
+
+============= =============================== ===============================
+Offset        Value                           Purpose
+============= =============================== ===============================
+0x0908        Non-zero                        Secondary CPU boot up indicator
+                                              on Exynos3250 and Exynos542x
+============= =============================== ===============================
+
+
+4. Glossary
+
+AFTR - ARM Off Top Running, a low power mode in which Cortex cores and many
+other modules are power gated, except the TOP modules
+MCPM - Multi-Cluster Power Management
diff --git a/Documentation/arm/samsung/clksrc-change-registers.awk b/Documentation/arm/samsung/clksrc-change-registers.awk
new file mode 100755
index 000000000000..7be1b8aa7cd9
--- /dev/null
+++ b/Documentation/arm/samsung/clksrc-change-registers.awk
@@ -0,0 +1,166 @@
+#!/usr/bin/awk -f
+#
+# Copyright 2010 Ben Dooks <ben-linux@fluff.org>
+#
+# Released under GPLv2
+
+# example usage
+# ./clksrc-change-registers.awk arch/arm/plat-s5pc1xx/include/plat/regs-clock.h < src > dst
+
+function extract_value(s)
+{
+    eqat = index(s, "=")
+    comat = index(s, ",")
+    return substr(s, eqat+2, (comat-eqat)-2)
+}
+
+function remove_brackets(b)
+{
+    return substr(b, 2, length(b)-2)
+}
+
+function splitdefine(l, p)
+{
+    r = split(l, tp)
+
+    p[0] = tp[2]
+    p[1] = remove_brackets(tp[3])
+}
+
+function find_length(f)
+{
+    if (0)
+	printf "find_length " f "\n" > "/dev/stderr"
+
+    if (f ~ /0x1/)
+	return 1
+    else if (f ~ /0x3/)
+	return 2
+    else if (f ~ /0x7/)
+	return 3
+    else if (f ~ /0xf/)
+	return 4
+
+    printf "unknown length " f "\n" > "/dev/stderr"
+    exit
+}
+
+function find_shift(s)
+{
+    id = index(s, "<")
+    if (id <= 0) {
+	printf "cannot find shift " s "\n" > "/dev/stderr"
+	exit
+    }
+
+    return substr(s, id+2)
+}
+
+
+BEGIN {
+    if (ARGC < 2) {
+	print "too few arguments" > "/dev/stderr"
+	exit
+    }
+
+# read the header file and find the mask values that we will need
+# to replace and create an associative array of values
+
+    while (getline line < ARGV[1] > 0) {
+	if (line ~ /\#define.*_MASK/ &&
+	    !(line ~ /USB_SIG_MASK/)) {
+	    splitdefine(line, fields)
+	    name = fields[0]
+	    if (0)
+		printf "MASK " line "\n" > "/dev/stderr"
+	    dmask[name,0] = find_length(fields[1])
+	    dmask[name,1] = find_shift(fields[1])
+	    if (0)
+		printf "=> '" name "' LENGTH=" dmask[name,0] " SHIFT=" dmask[name,1] "\n" > "/dev/stderr"
+	} else {
+	}
+    }
+
+    delete ARGV[1]
+}
+
+/clksrc_clk.*=.*{/ {
+    shift=""
+    mask=""
+    divshift=""
+    reg_div=""
+    reg_src=""
+    indent=1
+
+    print $0
+
+    for(; indent >= 1;) {
+	if ((getline line) <= 0) {
+	    printf "unexpected end of file\n" > "/dev/stderr"
+	    exit 1;
+	}
+
+	if (line ~ /\.shift/) {
+	    shift = extract_value(line)
+	} else if (line ~ /\.mask/) {
+	    mask = extract_value(line)
+	} else if (line ~ /\.reg_divider/) {
+	    reg_div = extract_value(line)
+	} else if (line ~ /\.reg_source/) {
+	    reg_src = extract_value(line)
+	} else if (line ~ /\.divider_shift/) {
+	    divshift = extract_value(line)
+	} else if (line ~ /{/) {
+	    indent++
+	    print line
+	} else if (line ~ /}/) {
+	    indent--
+
+	    if (indent == 0) {
+		if (0) {
+		    printf "shift '" shift   "' ='" dmask[shift,0] "'\n" > "/dev/stderr"
+		    printf "mask  '" mask    "'\n" > "/dev/stderr"
+		    printf "dshft '" divshift "'\n" > "/dev/stderr"
+		    printf "rdiv  '" reg_div "'\n" > "/dev/stderr"
+		    printf "rsrc  '" reg_src "'\n" > "/dev/stderr"
+		}
+
+		generated = mask
+		sub(reg_src, reg_div, generated)
+
+		if (0) {
+		    printf "/* rsrc " reg_src " */\n"
+		    printf "/* rdiv " reg_div " */\n"
+		    printf "/* shift " shift " */\n"
+		    printf "/* mask " mask " */\n"
+		    printf "/* generated " generated " */\n"
+		}
+
+		if (reg_div != "") {
+		    printf "\t.reg_div = { "
+		    printf ".reg = " reg_div ", "
+		    printf ".shift = " dmask[generated,1] ", "
+		    printf ".size = " dmask[generated,0] ", "
+		    printf "},\n"
+		}
+
+		printf "\t.reg_src = { "
+		printf ".reg = " reg_src ", "
+		printf ".shift = " dmask[mask,1] ", "
+		printf ".size = " dmask[mask,0] ", "
+
+		printf "},\n"
+
+	    }
+
+	    print line
+	} else {
+	    print line
+	}
+
+	if (0)
+	    printf indent ":" line "\n" > "/dev/stderr"
+    }
+}
+
+// && ! /clksrc_clk.*=.*{/ { print $0 }
diff --git a/Documentation/arm/samsung/gpio.rst b/Documentation/arm/samsung/gpio.rst
new file mode 100644
index 000000000000..5f7cadd7159e
--- /dev/null
+++ b/Documentation/arm/samsung/gpio.rst
@@ -0,0 +1,41 @@
+===========================
+Samsung GPIO implementation
+===========================
+
+Introduction
+------------
+
+This outlines the Samsung GPIO implementation and the architecture
+specific calls provided alongside the drivers/gpio core.
+
+
+S3C24XX (Legacy)
+----------------
+
+See Documentation/arm/samsung-s3c24xx/gpio.rst for more information
+about these devices. Their implementation has been brought into line
+with the core samsung implementation described in this document.
+
+
+GPIOLIB integration
+-------------------
+
+The gpio implementation uses gpiolib as much as possible, only providing
+specific calls for the items that require Samsung specific handling, such
+as pin special-function or pull resistor control.
+
+GPIO numbering is synchronised between the Samsung and gpiolib systems.
+
+
+PIN configuration
+-----------------
+
+Pin configuration is specific to the Samsung architecture, with each SoC
+registering the necessary information for the core gpio configuration
+implementation to configure pins as necessary.
+
+The s3c_gpio_cfgpin() and s3c_gpio_setpull() functions provide the means for a
+driver or machine to change gpio configuration.
+
+See arch/arm/plat-samsung/include/plat/gpio-cfg.h for more information
+on these functions.
diff --git a/Documentation/arm/samsung/index.rst b/Documentation/arm/samsung/index.rst
new file mode 100644
index 000000000000..f54d95734362
--- /dev/null
+++ b/Documentation/arm/samsung/index.rst
@@ -0,0 +1,10 @@
+===========
+Samsung SoC
+===========
+
+.. toctree::
+   :maxdepth: 1
+
+   gpio
+   bootloader-interface
+   overview
diff --git a/Documentation/arm/samsung/overview.rst b/Documentation/arm/samsung/overview.rst
new file mode 100644
index 000000000000..e74307897416
--- /dev/null
+++ b/Documentation/arm/samsung/overview.rst
@@ -0,0 +1,89 @@
+==========================
+Samsung ARM Linux Overview
+==========================
+
+Introduction
+------------
+
+  The Samsung range of ARM SoCs spans many similar devices, from the initial
+  ARM9 through to the newest ARM cores. This document shows an overview of
+  the current kernel support, how to use it and where to find the code
+  that supports this.
+
+  The currently supported SoCs are:
+
+  - S3C24XX: See Documentation/arm/samsung-s3c24xx/overview.rst for full list
+  - S3C64XX: S3C6400 and S3C6410
+  - S5PC110 / S5PV210
+
+
+S3C24XX Systems
+---------------
+
+  There is still documentation in Documentation/arm/samsung-s3c24xx/ which
+  deals with the architecture and drivers specific to these devices.
+
+  See Documentation/arm/samsung-s3c24xx/overview.rst for more information
+  on the implementation details and specific support.
+
+
+Configuration
+-------------
+
+  A number of configurations are supplied, as there is no current way of
+  unifying all the SoCs into one kernel.
+
+  s5pc110_defconfig
+	- S5PC110 specific default configuration
+  s5pv210_defconfig
+	- S5PV210 specific default configuration
+
+
+Layout
+------
+
+  The directory layout is currently being restructured, and consists of
+  several platform directories and then the machine specific directories
+  of the CPUs being built for.
+
+  plat-samsung provides the base for all the implementations, and is the
+  last in the line of include directories that are processed for the build
+  specific information. It contains the base clock, GPIO and device definitions
+  to get the system running.
+
+  plat-s3c24xx is for s3c24xx specific builds, see the S3C24XX docs.
+
+  plat-s5p is for s5p specific builds, and contains common support for the
+  S5P specific systems. Not all S5Ps use all the features in this directory
+  due to differences in the hardware.
+
+
+Layout changes
+--------------
+
+  The old plat-s3c and plat-s5pc1xx directories have been removed, with
+  support moved to either plat-samsung or plat-s5p as necessary. These moves
+  were made to simplify the include and dependency issues involved with having
+  so many different platform directories.
+
+
+Port Contributors
+-----------------
+
+  Ben Dooks (BJD)
+  Vincent Sanders
+  Herbert Potzl
+  Arnaud Patard (RTP)
+  Roc Wu
+  Klaus Fetscher
+  Dimitry Andric
+  Shannon Holland
+  Guillaume Gourat (NexVision)
+  Christer Weinigel (wingel) (Acer N30)
+  Lucas Correia Villa Real (S3C2400 port)
+
+
+Document Author
+---------------
+
+Copyright 2009-2010 Ben Dooks <ben-linux@fluff.org>
diff --git a/Documentation/arm/setup.rst b/Documentation/arm/setup.rst
new file mode 100644
index 000000000000..8e12ef3fb9a7
--- /dev/null
+++ b/Documentation/arm/setup.rst
@@ -0,0 +1,108 @@
+=============================================
+Kernel initialisation parameters on ARM Linux
+=============================================
+
+The following document describes the kernel initialisation parameter
+structure, otherwise known as 'struct param_struct' which is used
+for most ARM Linux architectures.
+
+This structure is used to pass initialisation parameters from the
+kernel loader to the Linux kernel proper, and may be short lived
+through the kernel initialisation process.  As a general rule, it
+should not be referenced outside of arch/arm/kernel/setup.c:setup_arch().
+
+There are a lot of parameters listed in there, and they are described
+below:
+
+ page_size
+   This parameter must be set to the page size of the machine, and
+   will be checked by the kernel.
+
+ nr_pages
+   This is the total number of pages of memory in the system.  If
+   the memory is banked, then this should contain the total number
+   of pages in the system.
+
+   If the system contains separate VRAM, this value should not
+   include this information.
+
+ ramdisk_size
+   This is now obsolete, and should not be used.
+
+ flags
+   Various kernel flags, including:
+
+    =====   ========================
+    bit 0   1 = mount root read only
+    bit 1   unused
+    bit 2   0 = load ramdisk
+    bit 3   0 = prompt for ramdisk
+    =====   ========================
+
+ rootdev
+   major/minor number pair of device to mount as the root filesystem.
+
+ video_num_cols / video_num_rows
+   These two together describe the character size of the dummy console,
+   or VGA console character size.  They should not be used for any other
+   purpose.
+
+   It's generally a good idea to set these to be either standard VGA, or
+   the equivalent character size of your fbcon display.  This then allows
+   all the bootup messages to be displayed correctly.
+
+ video_x / video_y
+   This describes the character position of the cursor on the VGA console,
+   and is otherwise unused. (It should not be used for other console types,
+   or for any other purpose.)
+
+ memc_control_reg
+   MEMC chip control register for Acorn Archimedes and Acorn A5000
+   based machines.  May be used differently by different architectures.
+
+ sounddefault
+   Default sound setting on Acorn machines.  May be used differently by
+   different architectures.
+
+ adfsdrives
+   Number of ADFS/MFM disks.  May be used differently by different
+   architectures.
+
+ bytes_per_char_h / bytes_per_char_v
+   These are now obsolete, and should not be used.
+
+ pages_in_bank[4]
+   Number of pages in each bank of the system's memory (used for RiscPC).
+   This is intended to be used on systems where the physical memory
+   is non-contiguous from the processor's point of view.
+
+ pages_in_vram
+   Number of pages in VRAM (used on Acorn RiscPC).  This value may also
+   be used by loaders if the size of the video RAM can't be obtained
+   from the hardware.
+
+ initrd_start / initrd_size
+   This describes the kernel virtual start address and size of the
+   initial ramdisk.
+
+ rd_start
+   Start address in sectors of the ramdisk image on a floppy disk.
+
+ system_rev
+   system revision number.
+
+ system_serial_low / system_serial_high
+   system 64-bit serial number
+
+ mem_fclk_21285
+   The speed of the external oscillator to the 21285 (footbridge),
+   which controls the speed of the memory bus, timer & serial port.
+   Depending upon the speed of the CPU its value can be between
+   0-66 MHz. If no params are passed or a value of zero is passed,
+   then a value of 50 MHz is the default on 21285 architectures.
+
+ paths[8][128]
+   These are now obsolete, and should not be used.
+
+ commandline
+   Kernel command line parameters.  Details can be found elsewhere.
diff --git a/Documentation/arm/sh-mobile/.gitignore b/Documentation/arm/sh-mobile/.gitignore
new file mode 100644
index 000000000000..c928dbf3cc88
--- /dev/null
+++ b/Documentation/arm/sh-mobile/.gitignore
@@ -0,0 +1 @@
+vrl4
diff --git a/Documentation/arm/spear/overview.rst b/Documentation/arm/spear/overview.rst
new file mode 100644
index 000000000000..8a1a87aca427
--- /dev/null
+++ b/Documentation/arm/spear/overview.rst
@@ -0,0 +1,65 @@
+========================
+SPEAr ARM Linux Overview
+========================
+
+Introduction
+------------
+
+  SPEAr (Structured Processor Enhanced Architecture).
+  weblink : http://www.st.com/spear
+
+  The ST Microelectronics SPEAr range of ARM9/CortexA9 System-on-Chip CPUs are
+  supported by the 'spear' platform of ARM Linux. Currently SPEAr1310,
+  SPEAr1340, SPEAr300, SPEAr310, SPEAr320 and SPEAr600 SOCs are supported.
+
+  Hierarchy in SPEAr is as follows:
+
+  SPEAr (Platform)
+	- SPEAr3XX (3XX SOC series, based on ARM9)
+		- SPEAr300 (SOC)
+			- SPEAr300 Evaluation Board
+		- SPEAr310 (SOC)
+			- SPEAr310 Evaluation Board
+		- SPEAr320 (SOC)
+			- SPEAr320 Evaluation Board
+	- SPEAr6XX (6XX SOC series, based on ARM9)
+		- SPEAr600 (SOC)
+			- SPEAr600 Evaluation Board
+	- SPEAr13XX (13XX SOC series, based on ARM CORTEXA9)
+		- SPEAr1310 (SOC)
+			- SPEAr1310 Evaluation Board
+		- SPEAr1340 (SOC)
+			- SPEAr1340 Evaluation Board
+
+Configuration
+-------------
+
+  A generic configuration is provided for each machine, and can be used as the
+  default by::
+
+	make spear13xx_defconfig
+	make spear3xx_defconfig
+	make spear6xx_defconfig
+
+Layout
+------
+
+  The common files for multiple machine families (SPEAr3xx, SPEAr6xx and
+  SPEAr13xx) are located in the platform code contained in arch/arm/plat-spear
+  with headers in plat/.
+
+  Each machine series has a directory named arch/arm/mach-spear followed by
+  the series name, for example mach-spear3xx, mach-spear6xx and mach-spear13xx.
+
+  The common file for machines of the spear3xx family is
+  mach-spear3xx/spear3xx.c, for spear6xx it is mach-spear6xx/spear6xx.c and for
+  the spear13xx family it is mach-spear13xx/spear13xx.c. mach-spear* also contain
+  SoC/machine specific files, like spear1310.c, spear1340.c, spear300.c,
+  spear310.c, spear320.c and spear600.c.  mach-spear* don't contain board
+  specific files as they fully support Flattened Device Tree.
+
+
+Document Author
+---------------
+
+  Viresh Kumar <vireshk@kernel.org>, (c) 2010-2012 ST Microelectronics
diff --git a/Documentation/arm/sti/overview.rst b/Documentation/arm/sti/overview.rst
new file mode 100644
index 000000000000..70743617a74f
--- /dev/null
+++ b/Documentation/arm/sti/overview.rst
@@ -0,0 +1,36 @@
+======================
+STi ARM Linux Overview
+======================
+
+Introduction
+------------
+
+  The ST Microelectronics Multimedia and Application Processors range of
+  CortexA9 System-on-Chip are supported by the 'STi' platform of
+  ARM Linux. Currently STiH415, STiH416 SOCs are supported with both
+  B2000 and B2020 Reference boards.
+
+
+Configuration
+-------------
+
+  A generic configuration is provided for both STiH415/416, and can be used as the
+  default by::
+
+	make stih41x_defconfig
+
+Layout
+------
+
+  All the files for multiple machine families (STiH415, STiH416, and STiG125)
+  are located in the platform code contained in arch/arm/mach-sti
+
+  There is a generic board file, board-dt.c, in the mach folder which supports
+  Flattened Device Tree, which means it works with any compatible board with
+  Device Trees.
+
+
+Document Author
+---------------
+
+  Srinivas Kandagatla <srinivas.kandagatla@st.com>, (c) 2013 ST Microelectronics
diff --git a/Documentation/arm/sti/overview.txt b/Documentation/arm/sti/overview.txt
deleted file mode 100644
index 1a4e93d6027f..000000000000
--- a/Documentation/arm/sti/overview.txt
+++ /dev/null
@@ -1,33 +0,0 @@
-			STi ARM Linux Overview
-			==========================
-
-Introduction
-------------
-
-  The ST Microelectronics Multimedia and Application Processors range of
-  CortexA9 System-on-Chip are supported by the 'STi' platform of
-  ARM Linux. Currently STiH415, STiH416 SOCs are supported with both
-  B2000 and B2020 Reference boards.
-
-
-  configuration
-  -------------
-
-  A generic configuration is provided for both STiH415/416, and can be used as the
-  default by
-	make stih41x_defconfig
-
-  Layout
-  ------
-  All the files for multiple machine families (STiH415, STiH416, and STiG125)
-  are located in the platform code contained in arch/arm/mach-sti
-
-  There is a generic board board-dt.c in the mach folder which support
-  Flattened Device Tree, which means, It works with any compatible board with
-  Device Trees.
-
-
-  Document Author
-  ---------------
-
-  Srinivas Kandagatla <srinivas.kandagatla@st.com>, (c) 2013 ST Microelectronics
diff --git a/Documentation/arm/sti/stih407-overview.rst b/Documentation/arm/sti/stih407-overview.rst
new file mode 100644
index 000000000000..027e75bc7b7c
--- /dev/null
+++ b/Documentation/arm/sti/stih407-overview.rst
@@ -0,0 +1,19 @@
+================
+STiH407 Overview
+================
+
+Introduction
+------------
+
+    The STiH407 is the new generation of SoC for Multi-HD, AVC set-top boxes
+    and server/connected client application for satellite, cable, terrestrial
+    and IP-STB markets.
+
+    Features
+    - ARM Cortex-A9 1.5 GHz dual core CPU (28nm)
+    - SATA2, USB 3.0, PCIe, Gbit Ethernet
+
+Document Author
+---------------
+
+  Maxime Coquelin <maxime.coquelin@st.com>, (c) 2014 ST Microelectronics
diff --git a/Documentation/arm/sti/stih407-overview.txt b/Documentation/arm/sti/stih407-overview.txt
deleted file mode 100644
index 3343f32f58bc..000000000000
--- a/Documentation/arm/sti/stih407-overview.txt
+++ /dev/null
@@ -1,18 +0,0 @@
-			STiH407 Overview
-			================
-
-Introduction
-------------
-
-    The STiH407 is the new generation of SoC for Multi-HD, AVC set-top boxes
-    and server/connected client application for satellite, cable, terrestrial
-    and IP-STB markets.
-
-    Features
-    - ARM Cortex-A9 1.5 GHz dual core CPU (28nm)
-    - SATA2, USB 3.0, PCIe, Gbit Ethernet
-
-  Document Author
-  ---------------
-
-  Maxime Coquelin <maxime.coquelin@st.com>, (c) 2014 ST Microelectronics
diff --git a/Documentation/arm/sti/stih415-overview.rst b/Documentation/arm/sti/stih415-overview.rst
new file mode 100644
index 000000000000..b67452d610c4
--- /dev/null
+++ b/Documentation/arm/sti/stih415-overview.rst
@@ -0,0 +1,14 @@
+================
+STiH415 Overview
+================
+
+Introduction
+------------
+
+    The STiH415 is the next generation of HD, AVC set-top box processors
+    for satellite, cable, terrestrial and IP-STB markets.
+
+    Features:
+
+    - ARM Cortex-A9 1.0 GHz, dual-core CPU
+    - SATA2x2,USB 2.0x3, PCIe, Gbit Ethernet MACx2
diff --git a/Documentation/arm/sti/stih415-overview.txt b/Documentation/arm/sti/stih415-overview.txt
deleted file mode 100644
index 1383e33f265d..000000000000
--- a/Documentation/arm/sti/stih415-overview.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-			STiH415 Overview
-			================
-
-Introduction
-------------
-
-    The STiH415 is the next generation of HD, AVC set-top box processors
-    for satellite, cable, terrestrial and IP-STB markets.
-
-    Features
-    - ARM Cortex-A9 1.0 GHz, dual-core CPU
-    - SATA2x2,USB 2.0x3, PCIe, Gbit Ethernet MACx2
diff --git a/Documentation/arm/sti/stih416-overview.rst b/Documentation/arm/sti/stih416-overview.rst
new file mode 100644
index 000000000000..93f17d74d8db
--- /dev/null
+++ b/Documentation/arm/sti/stih416-overview.rst
@@ -0,0 +1,13 @@
+================
+STiH416 Overview
+================
+
+Introduction
+------------
+
+    The STiH416 is the next generation of HD, AVC set-top box processors
+    for satellite, cable, terrestrial and IP-STB markets.
+
+    Features
+    - ARM Cortex-A9 1.2 GHz dual core CPU
+    - SATA2x2,USB 2.0x3, PCIe, Gbit Ethernet MACx2
diff --git a/Documentation/arm/sti/stih416-overview.txt b/Documentation/arm/sti/stih416-overview.txt
deleted file mode 100644
index 558444c201c6..000000000000
--- a/Documentation/arm/sti/stih416-overview.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-			STiH416 Overview
-			================
-
-Introduction
-------------
-
-    The STiH416 is the next generation of HD, AVC set-top box processors
-    for satellite, cable, terrestrial and IP-STB markets.
-
-    Features
-    - ARM Cortex-A9 1.2 GHz dual core CPU
-    - SATA2x2,USB 2.0x3, PCIe, Gbit Ethernet MACx2
diff --git a/Documentation/arm/sti/stih418-overview.rst b/Documentation/arm/sti/stih418-overview.rst
new file mode 100644
index 000000000000..b563c1f4fe5a
--- /dev/null
+++ b/Documentation/arm/sti/stih418-overview.rst
@@ -0,0 +1,21 @@
+================
+STiH418 Overview
+================
+
+Introduction
+------------
+
+    The STiH418 is the new generation of SoC for UHDp60 set-top boxes
+    and server/connected client application for satellite, cable, terrestrial
+    and IP-STB markets.
+
+    Features
+    - ARM Cortex-A9 1.5 GHz quad core CPU (28nm)
+    - SATA2, USB 3.0, PCIe, Gbit Ethernet
+    - HEVC L5.1 Main 10
+    - VP9
+
+Document Author
+---------------
+
+  Maxime Coquelin <maxime.coquelin@st.com>, (c) 2015 ST Microelectronics
diff --git a/Documentation/arm/sti/stih418-overview.txt b/Documentation/arm/sti/stih418-overview.txt
deleted file mode 100644
index 1cd8fc80646d..000000000000
--- a/Documentation/arm/sti/stih418-overview.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-			STiH418 Overview
-			================
-
-Introduction
-------------
-
-    The STiH418 is the new generation of SoC for UHDp60 set-top boxes
-    and server/connected client application for satellite, cable, terrestrial
-    and IP-STB markets.
-
-    Features
-    - ARM Cortex-A9 1.5 GHz quad core CPU (28nm)
-    - SATA2, USB 3.0, PCIe, Gbit Ethernet
-    - HEVC L5.1 Main 10
-    - VP9
-
-  Document Author
-  ---------------
-
-  Maxime Coquelin <maxime.coquelin@st.com>, (c) 2015 ST Microelectronics
diff --git a/Documentation/arm/stm32/overview.rst b/Documentation/arm/stm32/overview.rst
index f7e734153860..85cfc8410798 100644
--- a/Documentation/arm/stm32/overview.rst
+++ b/Documentation/arm/stm32/overview.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ========================
 STM32 ARM Linux Overview
 ========================
diff --git a/Documentation/arm/stm32/stm32f429-overview.rst b/Documentation/arm/stm32/stm32f429-overview.rst
index 65bbb1c3b423..a7ebe8ea6697 100644
--- a/Documentation/arm/stm32/stm32f429-overview.rst
+++ b/Documentation/arm/stm32/stm32f429-overview.rst
@@ -1,5 +1,4 @@
-:orphan:
-
+==================
 STM32F429 Overview
 ==================
 
@@ -23,6 +22,4 @@ Datasheet and reference manual are publicly available on ST website (STM32F429_)
 
 .. _STM32F429: http://www.st.com/web/en/catalog/mmc/FM141/SC1169/SS1577/LN1806?ecmp=stm32f429-439_pron_pr-ces2014_nov2013
 
-:Authors:
-
-Maxime Coquelin <mcoquelin.stm32@gmail.com>
+:Authors: Maxime Coquelin <mcoquelin.stm32@gmail.com>
diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst
index 42d593085015..78befddc7740 100644
--- a/Documentation/arm/stm32/stm32f746-overview.rst
+++ b/Documentation/arm/stm32/stm32f746-overview.rst
@@ -1,5 +1,4 @@
-:orphan:
-
+==================
 STM32F746 Overview
 ==================
 
@@ -30,6 +29,4 @@ Datasheet and reference manual are publicly available on ST website (STM32F746_)
 
 .. _STM32F746: http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32f7-series/stm32f7x6/stm32f746ng.html
 
-:Authors:
-
-Alexandre Torgue <alexandre.torgue@st.com>
+:Authors: Alexandre Torgue <alexandre.torgue@st.com>
diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst
index f6adac862b17..e482980ddf21 100644
--- a/Documentation/arm/stm32/stm32f769-overview.rst
+++ b/Documentation/arm/stm32/stm32f769-overview.rst
@@ -1,5 +1,4 @@
-:orphan:
-
+==================
 STM32F769 Overview
 ==================
 
@@ -32,6 +31,4 @@ Datasheet and reference manual are publicly available on ST website (STM32F769_)
 
 .. _STM32F769: http://www.st.com/content/st_com/en/products/microcontrollers/stm32-32-bit-arm-cortex-mcus/stm32-high-performance-mcus/stm32f7-series/stm32f7x9/stm32f769ni.html
 
-:Authors:
-
-Alexandre Torgue <alexandre.torgue@st.com>
+:Authors: Alexandre Torgue <alexandre.torgue@st.com>
diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst
index c525835e7473..4e15f1a42730 100644
--- a/Documentation/arm/stm32/stm32h743-overview.rst
+++ b/Documentation/arm/stm32/stm32h743-overview.rst
@@ -1,5 +1,4 @@
-:orphan:
-
+==================
 STM32H743 Overview
 ==================
 
@@ -31,6 +30,4 @@ Datasheet and reference manual are publicly available on ST website (STM32H743_)
 
 .. _STM32H743: http://www.st.com/en/microcontrollers/stm32h7x3.html?querycriteria=productId=LN2033
 
-:Authors:
-
-Alexandre Torgue <alexandre.torgue@st.com>
+:Authors: Alexandre Torgue <alexandre.torgue@st.com>
diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst
index 2c52cd020601..f62fdc8e7d8d 100644
--- a/Documentation/arm/stm32/stm32mp157-overview.rst
+++ b/Documentation/arm/stm32/stm32mp157-overview.rst
@@ -1,5 +1,4 @@
-:orphan:
-
+===================
 STM32MP157 Overview
 ===================
 
diff --git a/Documentation/arm/sunxi.rst b/Documentation/arm/sunxi.rst
new file mode 100644
index 000000000000..b037428aee98
--- /dev/null
+++ b/Documentation/arm/sunxi.rst
@@ -0,0 +1,150 @@
+==================
+ARM Allwinner SoCs
+==================
+
+This document lists all the ARM Allwinner SoCs that are currently
+supported in mainline by the Linux kernel. This document will also
+provide links to documentation and/or datasheet for these SoCs.
+
+SunXi family
+------------
+  Linux kernel mach directory: arch/arm/mach-sunxi
+
+  Flavors:
+
+    * ARM926 based SoCs
+      - Allwinner F20 (sun3i)
+
+        * Not Supported
+
+    * ARM Cortex-A8 based SoCs
+      - Allwinner A10 (sun4i)
+
+        * Datasheet
+
+	  http://dl.linux-sunxi.org/A10/A10%20Datasheet%20-%20v1.21%20%282012-04-06%29.pdf
+	* User Manual
+
+	  http://dl.linux-sunxi.org/A10/A10%20User%20Manual%20-%20v1.20%20%282012-04-09%2c%20DECRYPTED%29.pdf
+
+      - Allwinner A10s (sun5i)
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/A10s/A10s%20Datasheet%20-%20v1.20%20%282012-03-27%29.pdf
+
+      - Allwinner A13 / R8 (sun5i)
+
+        * Datasheet
+
+	  http://dl.linux-sunxi.org/A13/A13%20Datasheet%20-%20v1.12%20%282012-03-29%29.pdf
+        * User Manual
+
+          http://dl.linux-sunxi.org/A13/A13%20User%20Manual%20-%20v1.2%20%282013-01-08%29.pdf
+
+      - Next Thing Co GR8 (sun5i)
+
+    * Single ARM Cortex-A7 based SoCs
+      - Allwinner V3s (sun8i)
+
+        * Datasheet
+
+          http://linux-sunxi.org/File:Allwinner_V3s_Datasheet_V1.0.pdf
+
+    * Dual ARM Cortex-A7 based SoCs
+      - Allwinner A20 (sun7i)
+
+        * User Manual
+
+          http://dl.linux-sunxi.org/A20/A20%20User%20Manual%202013-03-22.pdf
+
+      - Allwinner A23 (sun8i)
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/A23/A23%20Datasheet%20V1.0%2020130830.pdf
+
+        * User Manual
+
+          http://dl.linux-sunxi.org/A23/A23%20User%20Manual%20V1.0%2020130830.pdf
+
+    * Quad ARM Cortex-A7 based SoCs
+      - Allwinner A31 (sun6i)
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/A31/A3x_release_document/A31/IC/A31%20datasheet%20V1.3%2020131106.pdf
+
+        * User Manual
+
+          http://dl.linux-sunxi.org/A31/A3x_release_document/A31/IC/A31%20user%20manual%20V1.1%2020130630.pdf
+
+      - Allwinner A31s (sun6i)
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/A31/A3x_release_document/A31s/IC/A31s%20datasheet%20V1.3%2020131106.pdf
+
+        * User Manual
+
+          http://dl.linux-sunxi.org/A31/A3x_release_document/A31s/IC/A31s%20User%20Manual%20%20V1.0%2020130322.pdf
+
+      - Allwinner A33 (sun8i)
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/A33/A33%20Datasheet%20release%201.1.pdf
+
+        * User Manual
+
+          http://dl.linux-sunxi.org/A33/A33%20user%20manual%20release%201.1.pdf
+
+      - Allwinner H2+ (sun8i)
+
+        * No document available now, but is known to be working properly with
+          H3 drivers and memory map.
+
+      - Allwinner H3 (sun8i)
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/H3/Allwinner_H3_Datasheet_V1.0.pdf
+
+      - Allwinner R40 (sun8i)
+
+        * Datasheet
+
+          https://github.com/tinalinux/docs/raw/r40-v1.y/R40_Datasheet_V1.0.pdf
+
+        * User Manual
+
+          https://github.com/tinalinux/docs/raw/r40-v1.y/Allwinner_R40_User_Manual_V1.0.pdf
+
+    * Quad ARM Cortex-A15, Quad ARM Cortex-A7 based SoCs
+      - Allwinner A80
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/A80/A80_Datasheet_Revision_1.0_0404.pdf
+
+    * Octa ARM Cortex-A7 based SoCs
+      - Allwinner A83T
+
+        * Datasheet
+
+          https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_Datasheet_v1.3_20150510.pdf
+
+        * User Manual
+
+          https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_User_Manual_v1.5.1_20150513.pdf
+
+    * Quad ARM Cortex-A53 based SoCs
+      - Allwinner A64
+
+        * Datasheet
+
+          http://dl.linux-sunxi.org/A64/A64_Datasheet_V1.1.pdf
+
+        * User Manual
+
+          http://dl.linux-sunxi.org/A64/Allwinner%20A64%20User%20Manual%20v1.0.pdf
diff --git a/Documentation/arm/sunxi/README b/Documentation/arm/sunxi/README
deleted file mode 100644
index f8efc21998bf..000000000000
--- a/Documentation/arm/sunxi/README
+++ /dev/null
@@ -1,102 +0,0 @@
-ARM Allwinner SoCs
-==================
-
-This document lists all the ARM Allwinner SoCs that are currently
-supported in mainline by the Linux kernel. This document will also
-provide links to documentation and/or datasheet for these SoCs.
-
-SunXi family
-------------
-  Linux kernel mach directory: arch/arm/mach-sunxi
-
-  Flavors:
-    * ARM926 based SoCs
-      - Allwinner F20 (sun3i)
-        + Not Supported
-
-    * ARM Cortex-A8 based SoCs
-      - Allwinner A10 (sun4i)
-        + Datasheet
-	  http://dl.linux-sunxi.org/A10/A10%20Datasheet%20-%20v1.21%20%282012-04-06%29.pdf
-	+ User Manual
-	  http://dl.linux-sunxi.org/A10/A10%20User%20Manual%20-%20v1.20%20%282012-04-09%2c%20DECRYPTED%29.pdf
-
-      - Allwinner A10s (sun5i)
-        + Datasheet
-          http://dl.linux-sunxi.org/A10s/A10s%20Datasheet%20-%20v1.20%20%282012-03-27%29.pdf
-
-      - Allwinner A13 / R8 (sun5i)
-        + Datasheet
-	  http://dl.linux-sunxi.org/A13/A13%20Datasheet%20-%20v1.12%20%282012-03-29%29.pdf
-        + User Manual
-          http://dl.linux-sunxi.org/A13/A13%20User%20Manual%20-%20v1.2%20%282013-01-08%29.pdf
-
-      - Next Thing Co GR8 (sun5i)
-
-    * Single ARM Cortex-A7 based SoCs
-      - Allwinner V3s (sun8i)
-        + Datasheet
-          http://linux-sunxi.org/File:Allwinner_V3s_Datasheet_V1.0.pdf
-
-    * Dual ARM Cortex-A7 based SoCs
-      - Allwinner A20 (sun7i)
-        + User Manual
-          http://dl.linux-sunxi.org/A20/A20%20User%20Manual%202013-03-22.pdf
-
-      - Allwinner A23 (sun8i)
-        + Datasheet
-          http://dl.linux-sunxi.org/A23/A23%20Datasheet%20V1.0%2020130830.pdf
-        + User Manual
-          http://dl.linux-sunxi.org/A23/A23%20User%20Manual%20V1.0%2020130830.pdf
-
-    * Quad ARM Cortex-A7 based SoCs
-      - Allwinner A31 (sun6i)
-        + Datasheet
-          http://dl.linux-sunxi.org/A31/A3x_release_document/A31/IC/A31%20datasheet%20V1.3%2020131106.pdf
-        + User Manual
-          http://dl.linux-sunxi.org/A31/A3x_release_document/A31/IC/A31%20user%20manual%20V1.1%2020130630.pdf
-
-      - Allwinner A31s (sun6i)
-        + Datasheet
-          http://dl.linux-sunxi.org/A31/A3x_release_document/A31s/IC/A31s%20datasheet%20V1.3%2020131106.pdf
-        + User Manual
-          http://dl.linux-sunxi.org/A31/A3x_release_document/A31s/IC/A31s%20User%20Manual%20%20V1.0%2020130322.pdf
-
-      - Allwinner A33 (sun8i)
-        + Datasheet
-          http://dl.linux-sunxi.org/A33/A33%20Datasheet%20release%201.1.pdf
-        + User Manual
-          http://dl.linux-sunxi.org/A33/A33%20user%20manual%20release%201.1.pdf
-
-      - Allwinner H2+ (sun8i)
-        + No document available now, but is known to be working properly with
-          H3 drivers and memory map.
-
-      - Allwinner H3 (sun8i)
-        + Datasheet
-          http://dl.linux-sunxi.org/H3/Allwinner_H3_Datasheet_V1.0.pdf
-
-      - Allwinner R40 (sun8i)
-        + Datasheet
-          https://github.com/tinalinux/docs/raw/r40-v1.y/R40_Datasheet_V1.0.pdf
-        + User Manual
-          https://github.com/tinalinux/docs/raw/r40-v1.y/Allwinner_R40_User_Manual_V1.0.pdf
-
-    * Quad ARM Cortex-A15, Quad ARM Cortex-A7 based SoCs
-      - Allwinner A80
-        + Datasheet
-	  http://dl.linux-sunxi.org/A80/A80_Datasheet_Revision_1.0_0404.pdf
-
-    * Octa ARM Cortex-A7 based SoCs
-      - Allwinner A83T
-        + Datasheet
-          https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_Datasheet_v1.3_20150510.pdf
-        + User Manual
-          https://github.com/allwinner-zh/documents/raw/master/A83T/A83T_User_Manual_v1.5.1_20150513.pdf
-
-    * Quad ARM Cortex-A53 based SoCs
-      - Allwinner A64
-        + Datasheet
-          http://dl.linux-sunxi.org/A64/A64_Datasheet_V1.1.pdf
-        + User Manual
-          http://dl.linux-sunxi.org/A64/Allwinner%20A64%20User%20Manual%20v1.0.pdf
diff --git a/Documentation/arm/sunxi/clocks.rst b/Documentation/arm/sunxi/clocks.rst
new file mode 100644
index 000000000000..23bd03f3e21f
--- /dev/null
+++ b/Documentation/arm/sunxi/clocks.rst
@@ -0,0 +1,57 @@
+=======================================================
+Frequently asked questions about the sunxi clock system
+=======================================================
+
+This document contains useful bits of information that people tend to ask
+about the sunxi clock system, as well as accompanying ASCII art when adequate.
+
+Q: Why is the main 24MHz oscillator gatable? Wouldn't that break the
+   system?
+
+A: The 24MHz oscillator allows gating to save power. Indeed, if gated
+   carelessly the system would stop functioning, but with the right
+   steps, one can gate it and keep the system running. Consider this
+   simplified suspend example:
+
+   While the system is operational, you would see something like::
+
+      24MHz         32kHz
+       |
+      PLL1
+       \
+        \_ CPU Mux
+             |
+           [CPU]
+
+   When you are about to suspend, you switch the CPU Mux to the 32kHz
+   oscillator::
+
+      24MHz         32kHz
+       |              |
+      PLL1            |
+                     /
+           CPU Mux _/
+             |
+           [CPU]
+
+   Finally you can gate the main oscillator::
+
+                    32kHz
+                      |
+                      |
+                     /
+           CPU Mux _/
+             |
+           [CPU]
+
+Q: Where can I learn more about the sunxi clocks?
+
+A: The linux-sunxi wiki contains a page documenting the clock registers,
+   you can find it at
+
+        http://linux-sunxi.org/A10/CCM
+
+   The authoritative source for information at this time is the ccmu driver
+   released by Allwinner, you can find it at
+
+        https://github.com/linux-sunxi/linux-sunxi/tree/sunxi-3.0/arch/arm/mach-sun4i/clock/ccmu
diff --git a/Documentation/arm/sunxi/clocks.txt b/Documentation/arm/sunxi/clocks.txt
deleted file mode 100644
index e09a88aa3136..000000000000
--- a/Documentation/arm/sunxi/clocks.txt
+++ /dev/null
@@ -1,56 +0,0 @@
-Frequently asked questions about the sunxi clock system
-=======================================================
-
-This document contains useful bits of information that people tend to ask
-about the sunxi clock system, as well as accompanying ASCII art when adequate.
-
-Q: Why is the main 24MHz oscillator gatable? Wouldn't that break the
-   system?
-
-A: The 24MHz oscillator allows gating to save power. Indeed, if gated
-   carelessly the system would stop functioning, but with the right
-   steps, one can gate it and keep the system running. Consider this
-   simplified suspend example:
-
-   While the system is operational, you would see something like
-
-      24MHz         32kHz
-       |
-      PLL1
-       \
-        \_ CPU Mux
-             |
-           [CPU]
-
-   When you are about to suspend, you switch the CPU Mux to the 32kHz
-   oscillator:
-
-      24Mhz         32kHz
-       |              |
-      PLL1            |
-                     /
-           CPU Mux _/
-             |
-           [CPU]
-
-    Finally you can gate the main oscillator
-
-                    32kHz
-                      |
-                      |
-                     /
-           CPU Mux _/
-             |
-           [CPU]
-
-Q: Were can I learn more about the sunxi clocks?
-
-A: The linux-sunxi wiki contains a page documenting the clock registers,
-   you can find it at
-
-        http://linux-sunxi.org/A10/CCM
-
-   The authoritative source for information at this time is the ccmu driver
-   released by Allwinner, you can find it at
-
-        https://github.com/linux-sunxi/linux-sunxi/tree/sunxi-3.0/arch/arm/mach-sun4i/clock/ccmu
diff --git a/Documentation/arm/swp_emulation b/Documentation/arm/swp_emulation
deleted file mode 100644
index af903d22fd93..000000000000
--- a/Documentation/arm/swp_emulation
+++ /dev/null
@@ -1,27 +0,0 @@
-Software emulation of deprecated SWP instruction (CONFIG_SWP_EMULATE)
----------------------------------------------------------------------
-
-ARMv6 architecture deprecates use of the SWP/SWPB instructions, and recommeds
-moving to the load-locked/store-conditional instructions LDREX and STREX.
-
-ARMv7 multiprocessing extensions introduce the ability to disable these
-instructions, triggering an undefined instruction exception when executed.
-Trapped instructions are emulated using an LDREX/STREX or LDREXB/STREXB
-sequence. If a memory access fault (an abort) occurs, a segmentation fault is
-signalled to the triggering process.
-
-/proc/cpu/swp_emulation holds some statistics/information, including the PID of
-the last process to trigger the emulation to be invocated. For example:
----
-Emulated SWP:		12
-Emulated SWPB:		0
-Aborted SWP{B}:		1
-Last process:		314
----
-
-NOTE: when accessing uncached shared regions, LDREX/STREX rely on an external
-transaction monitoring block called a global monitor to maintain update
-atomicity. If your system does not implement a global monitor, this option can
-cause programs that perform SWP operations to uncached memory to deadlock, as
-the STREX operation will always fail.
-
diff --git a/Documentation/arm/swp_emulation.rst b/Documentation/arm/swp_emulation.rst
new file mode 100644
index 000000000000..6a608a9c3715
--- /dev/null
+++ b/Documentation/arm/swp_emulation.rst
@@ -0,0 +1,27 @@
+Software emulation of deprecated SWP instruction (CONFIG_SWP_EMULATE)
+---------------------------------------------------------------------
+
+ARMv6 architecture deprecates use of the SWP/SWPB instructions, and recommends
+moving to the load-locked/store-conditional instructions LDREX and STREX.
+
+ARMv7 multiprocessing extensions introduce the ability to disable these
+instructions, triggering an undefined instruction exception when executed.
+Trapped instructions are emulated using an LDREX/STREX or LDREXB/STREXB
+sequence. If a memory access fault (an abort) occurs, a segmentation fault is
+signalled to the triggering process.
+
+/proc/cpu/swp_emulation holds some statistics/information, including the PID of
+the last process to trigger the emulation. For example::
+
+  Emulated SWP:		12
+  Emulated SWPB:		0
+  Aborted SWP{B}:		1
+  Last process:		314
+
+
+NOTE:
+  when accessing uncached shared regions, LDREX/STREX rely on an external
+  transaction monitoring block called a global monitor to maintain update
+  atomicity. If your system does not implement a global monitor, this option can
+  cause programs that perform SWP operations to uncached memory to deadlock, as
+  the STREX operation will always fail.
diff --git a/Documentation/arm/tcm.rst b/Documentation/arm/tcm.rst
new file mode 100644
index 000000000000..effd9c7bc968
--- /dev/null
+++ b/Documentation/arm/tcm.rst
@@ -0,0 +1,161 @@
+==================================================
+ARM TCM (Tightly-Coupled Memory) handling in Linux
+==================================================
+
+Written by Linus Walleij <linus.walleij@stericsson.com>
+
+Some ARM SoCs have a so-called TCM (Tightly-Coupled Memory).
+This is usually just a few (4-64) KiB of RAM inside the ARM
+processor.
+
+Due to being embedded inside the CPU, the TCM has a
+Harvard-architecture, so there is an ITCM (instruction TCM)
+and a DTCM (data TCM). The DTCM can not contain any
+instructions, but the ITCM can actually contain data.
+The size of DTCM or ITCM is minimum 4KiB so the typical
+minimum configuration is 4KiB ITCM and 4KiB DTCM.
+
+ARM CPUs have special registers to read out status, physical
+location and size of TCM memories. arch/arm/include/asm/cputype.h
+defines a CPUID_TCM register that you can read out from the
+system control coprocessor. Documentation from ARM can be found
+at http://infocenter.arm.com, search for "TCM Status Register"
+to see documents for all CPUs. Reading this register you can
+determine if ITCM (bits 1-0) and/or DTCM (bits 17-16) is present
+in the machine.
+
+There is further a TCM region register (search for "TCM Region
+Registers" at the ARM site) that can report and modify the location
+and size of TCM memories at runtime. This is used to read out and modify
+TCM location and size. Notice that this is not a MMU table: you
+actually move the physical location of the TCM around. At the
+place you put it, it will mask any underlying RAM from the
+CPU so it is usually wise not to overlap any physical RAM with
+the TCM.
+
+The TCM memory can then be remapped to another address again using
+the MMU, but notice that the TCM is often used in situations where
+the MMU is turned off. To avoid confusion the current Linux
+implementation will map the TCM 1 to 1 from physical to virtual
+memory in the location specified by the kernel. Currently Linux
+will map ITCM to 0xfffe0000 and on, and DTCM to 0xfffe8000 and
+on, supporting a maximum of 32KiB of ITCM and 32KiB of DTCM.
+
+Newer versions of the region registers also support dividing these
+TCMs in two separate banks, so for example an 8KiB ITCM is divided
+into two 4KiB banks with its own control registers. The idea is to
+be able to lock and hide one of the banks for use by the secure
+world (TrustZone).
+
+TCM is used for a few things:
+
+- FIQ and other interrupt handlers that need deterministic
+  timing and cannot wait for cache misses.
+
+- Idle loops where all external RAM is set to self-refresh
+  retention mode, so only on-chip RAM is accessible by
+  the CPU and then we hang inside ITCM waiting for an
+  interrupt.
+
+- Other operations which imply shutting off or reconfiguring
+  the external RAM controller.
+
+There is an interface for using TCM on the ARM architecture
+in <asm/tcm.h>. Using this interface it is possible to:
+
+- Define the physical address and size of ITCM and DTCM.
+
+- Tag functions to be compiled into ITCM.
+
+- Tag data and constants to be allocated to DTCM and ITCM.
+
+- Have the remaining TCM RAM added to a special
+  allocation pool with gen_pool_create() and gen_pool_add()
+  and provide tcm_alloc() and tcm_free() for this
+  memory. Such a heap is great for things like saving
+  device state when shutting off device power domains.
+
+A machine that has TCM memory shall select HAVE_TCM from
+arch/arm/Kconfig for itself. Code that needs to use TCM shall
+``#include <asm/tcm.h>``.
+
+Functions to go into itcm can be tagged like this:
+``int __tcmfunc foo(int bar);``
+
+Since these are marked to become long_calls and you may want
+to have functions called locally inside the TCM without
+wasting space, there is also the __tcmlocalfunc prefix that
+will make the call relative.
+
+Variables to go into dtcm can be tagged like this::
+
+  int __tcmdata foo;
+
+Constants can be tagged like this::
+
+  int __tcmconst foo;
+
+To put assembler into TCM just use::
+
+  .section ".tcm.text" or .section ".tcm.data"
+
+respectively.
+
+Example code::
+
+  #include <asm/tcm.h>
+
+  /* Uninitialized data */
+  static u32 __tcmdata tcmvar;
+  /* Initialized data */
+  static u32 __tcmdata tcmassigned = 0x2BADBABEU;
+  /* Constant */
+  static const u32 __tcmconst tcmconst = 0xCAFEBABEU;
+
+  static void __tcmlocalfunc tcm_to_tcm(void)
+  {
+	int i;
+	for (i = 0; i < 100; i++)
+		tcmvar++;
+  }
+
+  static void __tcmfunc hello_tcm(void)
+  {
+	/* Some abstract code that runs in ITCM */
+	int i;
+	for (i = 0; i < 100; i++) {
+		tcmvar++;
+	}
+	tcm_to_tcm();
+  }
+
+  static void __init test_tcm(void)
+  {
+	u32 *tcmem;
+	int i;
+
+	hello_tcm();
+	printk("Hello TCM executed from ITCM RAM\n");
+
+	printk("TCM variable from testrun: %u @ %p\n", tcmvar, &tcmvar);
+	tcmvar = 0xDEADBEEFU;
+	printk("TCM variable: 0x%x @ %p\n", tcmvar, &tcmvar);
+
+	printk("TCM assigned variable: 0x%x @ %p\n", tcmassigned, &tcmassigned);
+
+	printk("TCM constant: 0x%x @ %p\n", tcmconst, &tcmconst);
+
+	/* Allocate some TCM memory from the pool */
+	tcmem = tcm_alloc(20);
+	if (tcmem) {
+		printk("TCM Allocated 20 bytes of TCM @ %p\n", tcmem);
+		tcmem[0] = 0xDEADBEEFU;
+		tcmem[1] = 0x2BADBABEU;
+		tcmem[2] = 0xCAFEBABEU;
+		tcmem[3] = 0xDEADBEEFU;
+		tcmem[4] = 0x2BADBABEU;
+		for (i = 0; i < 5; i++)
+			printk("TCM tcmem[%d] = %08x\n", i, tcmem[i]);
+		tcm_free(tcmem, 20);
+	}
+  }
diff --git a/Documentation/arm/tcm.txt b/Documentation/arm/tcm.txt
deleted file mode 100644
index 7c15871c1885..000000000000
--- a/Documentation/arm/tcm.txt
+++ /dev/null
@@ -1,155 +0,0 @@
-ARM TCM (Tightly-Coupled Memory) handling in Linux
-----
-Written by Linus Walleij <linus.walleij@stericsson.com>
-
-Some ARM SoC:s have a so-called TCM (Tightly-Coupled Memory).
-This is usually just a few (4-64) KiB of RAM inside the ARM
-processor.
-
-Due to being embedded inside the CPU The TCM has a
-Harvard-architecture, so there is an ITCM (instruction TCM)
-and a DTCM (data TCM). The DTCM can not contain any
-instructions, but the ITCM can actually contain data.
-The size of DTCM or ITCM is minimum 4KiB so the typical
-minimum configuration is 4KiB ITCM and 4KiB DTCM.
-
-ARM CPU:s have special registers to read out status, physical
-location and size of TCM memories. arch/arm/include/asm/cputype.h
-defines a CPUID_TCM register that you can read out from the
-system control coprocessor. Documentation from ARM can be found
-at http://infocenter.arm.com, search for "TCM Status Register"
-to see documents for all CPUs. Reading this register you can
-determine if ITCM (bits 1-0) and/or DTCM (bit 17-16) is present
-in the machine.
-
-There is further a TCM region register (search for "TCM Region
-Registers" at the ARM site) that can report and modify the location
-size of TCM memories at runtime. This is used to read out and modify
-TCM location and size. Notice that this is not a MMU table: you
-actually move the physical location of the TCM around. At the
-place you put it, it will mask any underlying RAM from the
-CPU so it is usually wise not to overlap any physical RAM with
-the TCM.
-
-The TCM memory can then be remapped to another address again using
-the MMU, but notice that the TCM if often used in situations where
-the MMU is turned off. To avoid confusion the current Linux
-implementation will map the TCM 1 to 1 from physical to virtual
-memory in the location specified by the kernel. Currently Linux
-will map ITCM to 0xfffe0000 and on, and DTCM to 0xfffe8000 and
-on, supporting a maximum of 32KiB of ITCM and 32KiB of DTCM.
-
-Newer versions of the region registers also support dividing these
-TCMs in two separate banks, so for example an 8KiB ITCM is divided
-into two 4KiB banks with its own control registers. The idea is to
-be able to lock and hide one of the banks for use by the secure
-world (TrustZone).
-
-TCM is used for a few things:
-
-- FIQ and other interrupt handlers that need deterministic
-  timing and cannot wait for cache misses.
-
-- Idle loops where all external RAM is set to self-refresh
-  retention mode, so only on-chip RAM is accessible by
-  the CPU and then we hang inside ITCM waiting for an
-  interrupt.
-
-- Other operations which implies shutting off or reconfiguring
-  the external RAM controller.
-
-There is an interface for using TCM on the ARM architecture
-in <asm/tcm.h>. Using this interface it is possible to:
-
-- Define the physical address and size of ITCM and DTCM.
-
-- Tag functions to be compiled into ITCM.
-
-- Tag data and constants to be allocated to DTCM and ITCM.
-
-- Have the remaining TCM RAM added to a special
-  allocation pool with gen_pool_create() and gen_pool_add()
-  and provice tcm_alloc() and tcm_free() for this
-  memory. Such a heap is great for things like saving
-  device state when shutting off device power domains.
-
-A machine that has TCM memory shall select HAVE_TCM from
-arch/arm/Kconfig for itself. Code that needs to use TCM shall
-#include <asm/tcm.h>
-
-Functions to go into itcm can be tagged like this:
-int __tcmfunc foo(int bar);
-
-Since these are marked to become long_calls and you may want
-to have functions called locally inside the TCM without
-wasting space, there is also the __tcmlocalfunc prefix that
-will make the call relative.
-
-Variables to go into dtcm can be tagged like this:
-int __tcmdata foo;
-
-Constants can be tagged like this:
-int __tcmconst foo;
-
-To put assembler into TCM just use
-.section ".tcm.text" or .section ".tcm.data"
-respectively.
-
-Example code:
-
-#include <asm/tcm.h>
-
-/* Uninitialized data */
-static u32 __tcmdata tcmvar;
-/* Initialized data */
-static u32 __tcmdata tcmassigned = 0x2BADBABEU;
-/* Constant */
-static const u32 __tcmconst tcmconst = 0xCAFEBABEU;
-
-static void __tcmlocalfunc tcm_to_tcm(void)
-{
-	int i;
-	for (i = 0; i < 100; i++)
-		tcmvar ++;
-}
-
-static void __tcmfunc hello_tcm(void)
-{
-	/* Some abstract code that runs in ITCM */
-	int i;
-	for (i = 0; i < 100; i++) {
-		tcmvar ++;
-	}
-	tcm_to_tcm();
-}
-
-static void __init test_tcm(void)
-{
-	u32 *tcmem;
-	int i;
-
-	hello_tcm();
-	printk("Hello TCM executed from ITCM RAM\n");
-
-	printk("TCM variable from testrun: %u @ %p\n", tcmvar, &tcmvar);
-	tcmvar = 0xDEADBEEFU;
-	printk("TCM variable: 0x%x @ %p\n", tcmvar, &tcmvar);
-
-	printk("TCM assigned variable: 0x%x @ %p\n", tcmassigned, &tcmassigned);
-
-	printk("TCM constant: 0x%x @ %p\n", tcmconst, &tcmconst);
-
-	/* Allocate some TCM memory from the pool */
-	tcmem = tcm_alloc(20);
-	if (tcmem) {
-		printk("TCM Allocated 20 bytes of TCM @ %p\n", tcmem);
-		tcmem[0] = 0xDEADBEEFU;
-		tcmem[1] = 0x2BADBABEU;
-		tcmem[2] = 0xCAFEBABEU;
-		tcmem[3] = 0xDEADBEEFU;
-		tcmem[4] = 0x2BADBABEU;
-		for (i = 0; i < 5; i++)
-			printk("TCM tcmem[%d] = %08x\n", i, tcmem[i]);
-		tcm_free(tcmem, 20);
-	}
-}
diff --git a/Documentation/arm/uefi.rst b/Documentation/arm/uefi.rst
new file mode 100644
index 000000000000..f868330df6be
--- /dev/null
+++ b/Documentation/arm/uefi.rst
@@ -0,0 +1,67 @@
+================================================
+The Unified Extensible Firmware Interface (UEFI)
+================================================
+
+UEFI, the Unified Extensible Firmware Interface, is a specification
+governing the behaviours of compatible firmware interfaces. It is
+maintained by the UEFI Forum - http://www.uefi.org/.
+
+UEFI is an evolution of its predecessor 'EFI', so the terms EFI and
+UEFI are used somewhat interchangeably in this document and associated
+source code. As a rule, anything new uses 'UEFI', whereas 'EFI' refers
+to legacy code or specifications.
+
+UEFI support in Linux
+=====================
+Booting on a platform with firmware compliant with the UEFI specification
+makes it possible for the kernel to support additional features:
+
+- UEFI Runtime Services
+- Retrieving various configuration information through the standardised
+  interface of UEFI configuration tables. (ACPI, SMBIOS, ...)
+
+For actually enabling [U]EFI support, enable:
+
+- CONFIG_EFI=y
+- CONFIG_EFI_VARS=y or m
+
+The implementation depends on receiving information about the UEFI environment
+in a Flattened Device Tree (FDT) - so is only available with CONFIG_OF.
+
+UEFI stub
+=========
+The "stub" is a feature that extends the Image/zImage into a valid UEFI
+PE/COFF executable, including a loader application that makes it possible to
+load the kernel directly from the UEFI shell, boot menu, or one of the
+lightweight bootloaders like Gummiboot or rEFInd.
+
+The kernel image built with stub support remains a valid kernel image for
+booting in non-UEFI environments.
+
+UEFI kernel support on ARM
+==========================
+UEFI kernel support on the ARM architectures (arm and arm64) is only available
+when boot is performed through the stub.
+
+When booting in UEFI mode, the stub deletes any memory nodes from a provided DT.
+Instead, the kernel reads the UEFI memory map.
+
+The stub populates the FDT /chosen node with (and the kernel scans for) the
+following parameters:
+
+==========================  ======   ===========================================
+Name                        Size     Description
+==========================  ======   ===========================================
+linux,uefi-system-table     64-bit   Physical address of the UEFI System Table.
+
+linux,uefi-mmap-start       64-bit   Physical address of the UEFI memory map,
+                                     populated by the UEFI GetMemoryMap() call.
+
+linux,uefi-mmap-size        32-bit   Size in bytes of the UEFI memory map
+                                     pointed to in previous entry.
+
+linux,uefi-mmap-desc-size   32-bit   Size in bytes of each entry in the UEFI
+                                     memory map.
+
+linux,uefi-mmap-desc-ver    32-bit   Version of the mmap descriptor format.
+==========================  ======   ===========================================
diff --git a/Documentation/arm/uefi.txt b/Documentation/arm/uefi.txt
deleted file mode 100644
index 6543a0adea8a..000000000000
--- a/Documentation/arm/uefi.txt
+++ /dev/null
@@ -1,60 +0,0 @@
-UEFI, the Unified Extensible Firmware Interface, is a specification
-governing the behaviours of compatible firmware interfaces. It is
-maintained by the UEFI Forum - http://www.uefi.org/.
-
-UEFI is an evolution of its predecessor 'EFI', so the terms EFI and
-UEFI are used somewhat interchangeably in this document and associated
-source code. As a rule, anything new uses 'UEFI', whereas 'EFI' refers
-to legacy code or specifications.
-
-UEFI support in Linux
-=====================
-Booting on a platform with firmware compliant with the UEFI specification
-makes it possible for the kernel to support additional features:
-- UEFI Runtime Services
-- Retrieving various configuration information through the standardised
-  interface of UEFI configuration tables. (ACPI, SMBIOS, ...)
-
-For actually enabling [U]EFI support, enable:
-- CONFIG_EFI=y
-- CONFIG_EFI_VARS=y or m
-
-The implementation depends on receiving information about the UEFI environment
-in a Flattened Device Tree (FDT) - so is only available with CONFIG_OF.
-
-UEFI stub
-=========
-The "stub" is a feature that extends the Image/zImage into a valid UEFI
-PE/COFF executable, including a loader application that makes it possible to
-load the kernel directly from the UEFI shell, boot menu, or one of the
-lightweight bootloaders like Gummiboot or rEFInd.
-
-The kernel image built with stub support remains a valid kernel image for
-booting in non-UEFI environments.
-
-UEFI kernel support on ARM
-==========================
-UEFI kernel support on the ARM architectures (arm and arm64) is only available
-when boot is performed through the stub.
-
-When booting in UEFI mode, the stub deletes any memory nodes from a provided DT.
-Instead, the kernel reads the UEFI memory map.
-
-The stub populates the FDT /chosen node with (and the kernel scans for) the
-following parameters:
-________________________________________________________________________________
-Name                      | Size   | Description
-================================================================================
-linux,uefi-system-table   | 64-bit | Physical address of the UEFI System Table.
---------------------------------------------------------------------------------
-linux,uefi-mmap-start     | 64-bit | Physical address of the UEFI memory map,
-                          |        | populated by the UEFI GetMemoryMap() call.
---------------------------------------------------------------------------------
-linux,uefi-mmap-size      | 32-bit | Size in bytes of the UEFI memory map
-                          |        | pointed to in previous entry.
---------------------------------------------------------------------------------
-linux,uefi-mmap-desc-size | 32-bit | Size in bytes of each entry in the UEFI
-                          |        | memory map.
---------------------------------------------------------------------------------
-linux,uefi-mmap-desc-ver  | 32-bit | Version of the mmap descriptor format.
---------------------------------------------------------------------------------
diff --git a/Documentation/arm/vfp/release-notes.rst b/Documentation/arm/vfp/release-notes.rst
new file mode 100644
index 000000000000..c6b04937cee3
--- /dev/null
+++ b/Documentation/arm/vfp/release-notes.rst
@@ -0,0 +1,57 @@
+===============================================
+Release notes for Linux Kernel VFP support code
+===============================================
+
+Date: 	20 May 2004
+
+Author:	Russell King
+
+This is the first release of the Linux Kernel VFP support code.  It
+provides support for the exceptions bounced from VFP hardware found
+on ARM926EJ-S.
+
+This release has been validated against the SoftFloat-2b library by
+John R. Hauser using the TestFloat-2a test suite.  Details of this
+library and test suite can be found at:
+
+   http://www.jhauser.us/arithmetic/SoftFloat.html
+
+The operations which have been tested with this package are:
+
+ - fdiv
+ - fsub
+ - fadd
+ - fmul
+ - fcmp
+ - fcmpe
+ - fcvtd
+ - fcvts
+ - fsito
+ - ftosi
+ - fsqrt
+
+All the above pass softfloat tests with the following exceptions:
+
+- fadd/fsub shows some differences in the handling of +0 / -0 results
+  when input operands differ in signs.
+- the handling of underflow exceptions is slightly different.  If a
+  result underflows before rounding, but becomes a normalised number
+  after rounding, we do not signal an underflow exception.
+
+Other operations which have been tested by basic assembly-only tests
+are:
+
+ - fcpy
+ - fabs
+ - fneg
+ - ftoui
+ - ftosiz
+ - ftouiz
+
+The combination operations have not been tested:
+
+ - fmac
+ - fnmac
+ - fmsc
+ - fnmsc
+ - fnmul
diff --git a/Documentation/arm/vlocks.rst b/Documentation/arm/vlocks.rst
new file mode 100644
index 000000000000..a40a1742110b
--- /dev/null
+++ b/Documentation/arm/vlocks.rst
@@ -0,0 +1,212 @@
+======================================
+vlocks for Bare-Metal Mutual Exclusion
+======================================
+
+Voting Locks, or "vlocks", provide a simple low-level mutual exclusion
+mechanism, with reasonable but minimal requirements on the memory
+system.
+
+These are intended to be used to coordinate critical activity among CPUs
+which are otherwise non-coherent, in situations where the hardware
+provides no other mechanism to support this and ordinary spinlocks
+cannot be used.
+
+
+vlocks make use of the atomicity provided by the memory system for
+writes to a single memory location.  To arbitrate, every CPU "votes for
+itself", by storing a unique number to a common memory location.  The
+final value seen in that memory location when all the votes have been
+cast identifies the winner.
+
+In order to make sure that the election produces an unambiguous result
+in finite time, a CPU will only enter the election in the first place if
+no winner has been chosen and the election does not appear to have
+started yet.
+
+
+Algorithm
+---------
+
+The easiest way to explain the vlocks algorithm is with some pseudo-code::
+
+
+	int currently_voting[NR_CPUS] = { 0, };
+	int last_vote = -1; /* no votes yet */
+
+	bool vlock_trylock(int this_cpu)
+	{
+		/* signal our desire to vote */
+		currently_voting[this_cpu] = 1;
+		if (last_vote != -1) {
+			/* someone already volunteered himself */
+			currently_voting[this_cpu] = 0;
+			return false; /* not ourself */
+		}
+
+		/* let's suggest ourself */
+		last_vote = this_cpu;
+		currently_voting[this_cpu] = 0;
+
+		/* then wait until everyone else is done voting */
+		for_each_cpu(i) {
+			while (currently_voting[i] != 0)
+				/* wait */;
+		}
+
+		/* result */
+		if (last_vote == this_cpu)
+			return true; /* we won */
+		return false;
+	}
+
+	void vlock_unlock(void)
+	{
+		last_vote = -1;
+	}
+
+
+The currently_voting[] array provides a way for the CPUs to determine
+whether an election is in progress, and plays a role analogous to the
+"entering" array in Lamport's bakery algorithm [1].
+
+However, once the election has started, the underlying memory system
+atomicity is used to pick the winner.  This avoids the need for a static
+priority rule to act as a tie-breaker, or any counters which could
+overflow.
+
+As long as the last_vote variable is globally visible to all CPUs, it
+will contain only one value that won't change once every CPU has cleared
+its currently_voting flag.
+
+
+Features and limitations
+------------------------
+
+ * vlocks are not intended to be fair.  In the contended case, the
+   _last_ CPU to attempt to get the lock is the one most likely
+   to win.
+
+   vlocks are therefore best suited to situations where it is necessary
+   to pick a unique winner, but it does not matter which CPU actually
+   wins.
+
+ * Like other similar mechanisms, vlocks will not scale well to a large
+   number of CPUs.
+
+   vlocks can be cascaded in a voting hierarchy to permit better scaling
+   if necessary, as in the following hypothetical example for 4096 CPUs::
+
+	/* first level: local election */
+	my_town = towns[this_cpu >> 4];
+	I_won = vlock_trylock(my_town, this_cpu & 0xf);
+	if (I_won) {
+		/* we won the town election, let's go for the state */
+		my_state = states[this_cpu >> 8];
+		I_won = vlock_trylock(my_state, (this_cpu >> 4) & 0xf);
+		if (I_won) {
+			/* and so on */
+			I_won = vlock_trylock(the_whole_country, (this_cpu >> 8) & 0xf);
+			if (I_won) {
+				/* ... */
+			}
+			vlock_unlock(the_whole_country);
+		}
+		vlock_unlock(my_state);
+	}
+	vlock_unlock(my_town);
+
+
+ARM implementation
+------------------
+
+The current ARM implementation [2] contains some optimisations beyond
+the basic algorithm:
+
+ * By packing the members of the currently_voting array close together,
+   we can read the whole array in one transaction (providing the number
+   of CPUs potentially contending the lock is small enough).  This
+   reduces the number of round-trips required to external memory.
+
+   In the ARM implementation, this means that we can use a single load
+   and comparison::
+
+	LDR	Rt, [Rn]
+	CMP	Rt, #0
+
+   ...in place of code equivalent to::
+
+	LDRB	Rt, [Rn]
+	CMP	Rt, #0
+	LDRBEQ	Rt, [Rn, #1]
+	CMPEQ	Rt, #0
+	LDRBEQ	Rt, [Rn, #2]
+	CMPEQ	Rt, #0
+	LDRBEQ	Rt, [Rn, #3]
+	CMPEQ	Rt, #0
+
+   This cuts down on the fast-path latency, as well as potentially
+   reducing bus contention in contended cases.
+
+   The optimisation relies on the fact that the ARM memory system
+   guarantees coherency between overlapping memory accesses of
+   different sizes, similarly to many other architectures.  Note that
+   we do not care which element of currently_voting appears in which
+   bits of Rt, so there is no need to worry about endianness in this
+   optimisation.
+
+   If there are too many CPUs to read the currently_voting array in
+   one transaction then multiple transactions are still required.  The
+   implementation uses a simple loop of word-sized loads for this
+   case.  The number of transactions is still fewer than would be
+   required if bytes were loaded individually.
+
+
+   In principle, we could aggregate further by using LDRD or LDM, but
+   to keep the code simple this was not attempted in the initial
+   implementation.
+
+
+ * vlocks are currently only used to coordinate between CPUs which are
+   unable to enable their caches yet.  This means that the
+   implementation removes many of the barriers which would be required
+   when executing the algorithm in cached memory.
+
+   Packing of the currently_voting array does not work with cached
+   memory unless all CPUs contending the lock are cache-coherent, due
+   to cache writebacks from one CPU clobbering values written by other
+   CPUs.  (Though if all the CPUs are cache-coherent, you should
+   probably be using proper spinlocks instead anyway).
+
+
+ * The "no votes yet" value used for the last_vote variable is 0 (not
+   -1 as in the pseudocode).  This allows statically-allocated vlocks
+   to be implicitly initialised to an unlocked state simply by putting
+   them in .bss.
+
+   An offset is added to each CPU's ID for the purpose of setting this
+   variable, so that no CPU uses the value 0 for its ID.
+
+
+Colophon
+--------
+
+Originally created and documented by Dave Martin for Linaro Limited, for
+use in ARM-based big.LITTLE platforms, with review and input gratefully
+received from Nicolas Pitre and Achin Gupta.  Thanks to Nicolas for
+grabbing most of this text out of the relevant mail thread and writing
+up the pseudocode.
+
+Copyright (C) 2012-2013  Linaro Limited
+Distributed under the terms of Version 2 of the GNU General Public
+License, as defined in linux/COPYING.
+
+
+References
+----------
+
+[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
+    Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
+
+    https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
+
+[2] linux/arch/arm/common/vlock.S, www.kernel.org.
diff --git a/Documentation/arm/vlocks.txt b/Documentation/arm/vlocks.txt
deleted file mode 100644
index 45731672c564..000000000000
--- a/Documentation/arm/vlocks.txt
+++ /dev/null
@@ -1,211 +0,0 @@
-vlocks for Bare-Metal Mutual Exclusion
-======================================
-
-Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
-mechanism, with reasonable but minimal requirements on the memory
-system.
-
-These are intended to be used to coordinate critical activity among CPUs
-which are otherwise non-coherent, in situations where the hardware
-provides no other mechanism to support this and ordinary spinlocks
-cannot be used.
-
-
-vlocks make use of the atomicity provided by the memory system for
-writes to a single memory location.  To arbitrate, every CPU "votes for
-itself", by storing a unique number to a common memory location.  The
-final value seen in that memory location when all the votes have been
-cast identifies the winner.
-
-In order to make sure that the election produces an unambiguous result
-in finite time, a CPU will only enter the election in the first place if
-no winner has been chosen and the election does not appear to have
-started yet.
-
-
-Algorithm
----------
-
-The easiest way to explain the vlocks algorithm is with some pseudo-code:
-
-
-	int currently_voting[NR_CPUS] = { 0, };
-	int last_vote = -1; /* no votes yet */
-
-	bool vlock_trylock(int this_cpu)
-	{
-		/* signal our desire to vote */
-		currently_voting[this_cpu] = 1;
-		if (last_vote != -1) {
-			/* someone already volunteered himself */
-			currently_voting[this_cpu] = 0;
-			return false; /* not ourself */
-		}
-
-		/* let's suggest ourself */
-		last_vote = this_cpu;
-		currently_voting[this_cpu] = 0;
-
-		/* then wait until everyone else is done voting */
-		for_each_cpu(i) {
-			while (currently_voting[i] != 0)
-				/* wait */;
-		}
-
-		/* result */
-		if (last_vote == this_cpu)
-			return true; /* we won */
-		return false;
-	}
-
-	bool vlock_unlock(void)
-	{
-		last_vote = -1;
-	}
-
-
-The currently_voting[] array provides a way for the CPUs to determine
-whether an election is in progress, and plays a role analogous to the
-"entering" array in Lamport's bakery algorithm [1].
-
-However, once the election has started, the underlying memory system
-atomicity is used to pick the winner.  This avoids the need for a static
-priority rule to act as a tie-breaker, or any counters which could
-overflow.
-
-As long as the last_vote variable is globally visible to all CPUs, it
-will contain only one value that won't change once every CPU has cleared
-its currently_voting flag.
-
-
-Features and limitations
-------------------------
-
- * vlocks are not intended to be fair.  In the contended case, it is the
-   _last_ CPU which attempts to get the lock which will be most likely
-   to win.
-
-   vlocks are therefore best suited to situations where it is necessary
-   to pick a unique winner, but it does not matter which CPU actually
-   wins.
-
- * Like other similar mechanisms, vlocks will not scale well to a large
-   number of CPUs.
-
-   vlocks can be cascaded in a voting hierarchy to permit better scaling
-   if necessary, as in the following hypothetical example for 4096 CPUs:
-
-	/* first level: local election */
-	my_town = towns[(this_cpu >> 4) & 0xf];
-	I_won = vlock_trylock(my_town, this_cpu & 0xf);
-	if (I_won) {
-		/* we won the town election, let's go for the state */
-		my_state = states[(this_cpu >> 8) & 0xf];
-		I_won = vlock_lock(my_state, this_cpu & 0xf));
-		if (I_won) {
-			/* and so on */
-			I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
-			if (I_won) {
-				/* ... */
-			}
-			vlock_unlock(the_whole_country);
-		}
-		vlock_unlock(my_state);
-	}
-	vlock_unlock(my_town);
-
-
-ARM implementation
-------------------
-
-The current ARM implementation [2] contains some optimisations beyond
-the basic algorithm:
-
- * By packing the members of the currently_voting array close together,
-   we can read the whole array in one transaction (providing the number
-   of CPUs potentially contending the lock is small enough).  This
-   reduces the number of round-trips required to external memory.
-
-   In the ARM implementation, this means that we can use a single load
-   and comparison:
-
-	LDR	Rt, [Rn]
-	CMP	Rt, #0
-
-   ...in place of code equivalent to:
-
-	LDRB	Rt, [Rn]
-	CMP	Rt, #0
-	LDRBEQ	Rt, [Rn, #1]
-	CMPEQ	Rt, #0
-	LDRBEQ	Rt, [Rn, #2]
-	CMPEQ	Rt, #0
-	LDRBEQ	Rt, [Rn, #3]
-	CMPEQ	Rt, #0
-
-   This cuts down on the fast-path latency, as well as potentially
-   reducing bus contention in contended cases.
-
-   The optimisation relies on the fact that the ARM memory system
-   guarantees coherency between overlapping memory accesses of
-   different sizes, similarly to many other architectures.  Note that
-   we do not care which element of currently_voting appears in which
-   bits of Rt, so there is no need to worry about endianness in this
-   optimisation.
-
-   If there are too many CPUs to read the currently_voting array in
-   one transaction then multiple transations are still required.  The
-   implementation uses a simple loop of word-sized loads for this
-   case.  The number of transactions is still fewer than would be
-   required if bytes were loaded individually.
-
-
-   In principle, we could aggregate further by using LDRD or LDM, but
-   to keep the code simple this was not attempted in the initial
-   implementation.
-
-
- * vlocks are currently only used to coordinate between CPUs which are
-   unable to enable their caches yet.  This means that the
-   implementation removes many of the barriers which would be required
-   when executing the algorithm in cached memory.
-
-   packing of the currently_voting array does not work with cached
-   memory unless all CPUs contending the lock are cache-coherent, due
-   to cache writebacks from one CPU clobbering values written by other
-   CPUs.  (Though if all the CPUs are cache-coherent, you should be
-   probably be using proper spinlocks instead anyway).
-
-
- * The "no votes yet" value used for the last_vote variable is 0 (not
-   -1 as in the pseudocode).  This allows statically-allocated vlocks
-   to be implicitly initialised to an unlocked state simply by putting
-   them in .bss.
-
-   An offset is added to each CPU's ID for the purpose of setting this
-   variable, so that no CPU uses the value 0 for its ID.
-
-
-Colophon
---------
-
-Originally created and documented by Dave Martin for Linaro Limited, for
-use in ARM-based big.LITTLE platforms, with review and input gratefully
-received from Nicolas Pitre and Achin Gupta.  Thanks to Nicolas for
-grabbing most of this text out of the relevant mail thread and writing
-up the pseudocode.
-
-Copyright (C) 2012-2013  Linaro Limited
-Distributed under the terms of Version 2 of the GNU General Public
-License, as defined in linux/COPYING.
-
-
-References
-----------
-
-[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
-    Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
-
-    https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
-
-[2] linux/arch/arm/common/vlock.S, www.kernel.org.
diff --git a/Documentation/devicetree/bindings/arm/xen.txt b/Documentation/devicetree/bindings/arm/xen.txt
index c9b9321434ea..db5c56db30ec 100644
--- a/Documentation/devicetree/bindings/arm/xen.txt
+++ b/Documentation/devicetree/bindings/arm/xen.txt
@@ -54,7 +54,7 @@ hypervisor {
 };
 
 The format and meaning of the "xen,uefi-*" parameters are similar to those in
-Documentation/arm/uefi.txt, which are provided by the regular UEFI stub. However
+Documentation/arm/uefi.rst, which are provided by the regular UEFI stub. However
 they differ because they are provided by the Xen hypervisor, together with a set
 of UEFI runtime services implemented via hypercalls, see
 http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,platform.h.html.
diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt
index 60f8640f2b2f..4660ccee35a3 100644
--- a/Documentation/devicetree/booting-without-of.txt
+++ b/Documentation/devicetree/booting-without-of.txt
@@ -160,7 +160,7 @@ it with special cases.
    of the kernel image. That entry point supports two calling
    conventions.  A summary of the interface is described here.  A full
    description of the boot requirements is documented in
-   Documentation/arm/Booting
+   Documentation/arm/booting.rst
 
         a) ATAGS interface.  Minimal information is passed from firmware
         to the kernel with a tagged list of predefined parameters.
@@ -174,7 +174,7 @@ it with special cases.
         b) Entry with a flattened device-tree block.  Firmware loads the
         physical address of the flattened device tree block (dtb) into r2,
         r1 is not used, but it is considered good practice to use a valid
-        machine number as described in Documentation/arm/Booting.
+        machine number as described in Documentation/arm/booting.rst.
 
                 r0 : 0
 
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 216dc0e1e6f2..c6934d90363c 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -1,3 +1,4 @@
+
 .. The Linux Kernel documentation master file, created by
    sphinx-quickstart on Fri Feb 12 13:51:46 2016.
    You can adapt this file completely to your liking, but it should at least
diff --git a/Documentation/translations/zh_CN/arm/Booting b/Documentation/translations/zh_CN/arm/Booting
index 1fe866f8218f..562e9a2957e6 100644
--- a/Documentation/translations/zh_CN/arm/Booting
+++ b/Documentation/translations/zh_CN/arm/Booting
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm/Booting
+Chinese translated version of Documentation/arm/booting.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -9,7 +9,7 @@ or if there is a problem with the translation.
 Maintainer: Russell King <linux@arm.linux.org.uk>
 Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
 ---------------------------------------------------------------------
-Documentation/arm/Booting 的中文翻译
+Documentation/arm/booting.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/Documentation/translations/zh_CN/arm/kernel_user_helpers.txt b/Documentation/translations/zh_CN/arm/kernel_user_helpers.txt
index cd7fc8f34cf9..99af4363984d 100644
--- a/Documentation/translations/zh_CN/arm/kernel_user_helpers.txt
+++ b/Documentation/translations/zh_CN/arm/kernel_user_helpers.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/arm/kernel_user_helpers.txt
+Chinese translated version of Documentation/arm/kernel_user_helpers.rst
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -10,7 +10,7 @@ Maintainer: Nicolas Pitre <nicolas.pitre@linaro.org>
 		Dave Martin <dave.martin@linaro.org>
 Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
 ---------------------------------------------------------------------
-Documentation/arm/kernel_user_helpers.txt 的中文翻译
+Documentation/arm/kernel_user_helpers.rst 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/MAINTAINERS b/MAINTAINERS
index 37ba75bae7aa..96c85695b3d4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2218,7 +2218,7 @@ F:	drivers/*/*s3c64xx*
 F:	drivers/*/*s5pv210*
 F:	drivers/memory/samsung/*
 F:	drivers/soc/samsung/*
-F:	Documentation/arm/Samsung/
+F:	Documentation/arm/samsung/
 F:	Documentation/devicetree/bindings/arm/samsung/
 F:	Documentation/devicetree/bindings/sram/samsung-sram.txt
 F:	Documentation/devicetree/bindings/power/pd-samsung.txt
@@ -11571,7 +11571,7 @@ L:	linux-omap@vger.kernel.org
 L:	linux-fbdev@vger.kernel.org
 S:	Orphan
 F:	drivers/video/fbdev/omap2/
-F:	Documentation/arm/OMAP/DSS
+F:	Documentation/arm/omap/dss.rst
 
 OMAP FRAMEBUFFER SUPPORT
 L:	linux-fbdev@vger.kernel.org
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2bf1ce39a96d..6425871e9903 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2142,7 +2142,7 @@ config VFP
 	  Say Y to include VFP support code in the kernel. This is needed
 	  if your hardware includes a VFP unit.
 
-	  Please see <file:Documentation/arm/VFP/release-notes.txt> for
+	  Please see <file:Documentation/arm/vfp/release-notes.rst> for
 	  release notes and additional status information.
 
 	  Say N if your target does not have VFP hardware.
diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c
index e24ad60891b2..8a9aeeb504dd 100644
--- a/arch/arm/common/mcpm_entry.c
+++ b/arch/arm/common/mcpm_entry.c
@@ -21,7 +21,7 @@
 /*
  * The public API for this code is documented in arch/arm/include/asm/mcpm.h.
  * For a comprehensive description of the main algorithm used here, please
- * see Documentation/arm/cluster-pm-race-avoidance.txt.
+ * see Documentation/arm/cluster-pm-race-avoidance.rst.
  */
 
 struct sync_struct mcpm_sync;
diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S
index d5bd75dd576d..291d969bc719 100644
--- a/arch/arm/common/mcpm_head.S
+++ b/arch/arm/common/mcpm_head.S
@@ -5,7 +5,7 @@
  * Created by:  Nicolas Pitre, March 2012
  * Copyright:   (C) 2012-2013  Linaro Limited
  *
- * Refer to Documentation/arm/cluster-pm-race-avoidance.txt
+ * Refer to Documentation/arm/cluster-pm-race-avoidance.rst
  * for details of the synchronisation algorithms used here.
  */
 
diff --git a/arch/arm/common/vlock.S b/arch/arm/common/vlock.S
index 9675cc15d0c4..f1c7fd44f1b1 100644
--- a/arch/arm/common/vlock.S
+++ b/arch/arm/common/vlock.S
@@ -6,7 +6,7 @@
  * Copyright:	(C) 2012-2013  Linaro Limited
  *
  * This algorithm is described in more detail in
- * Documentation/arm/vlocks.txt.
+ * Documentation/arm/vlocks.rst.
  */
 
 #include <linux/linkage.h>
diff --git a/arch/arm/include/asm/setup.h b/arch/arm/include/asm/setup.h
index 77e5582c2259..67d20712cb48 100644
--- a/arch/arm/include/asm/setup.h
+++ b/arch/arm/include/asm/setup.h
@@ -5,7 +5,7 @@
  *  Copyright (C) 1997-1999 Russell King
  *
  *  Structure passed to kernel to tell it about the
- *  hardware it's running on.  See Documentation/arm/Setup
+ *  hardware it's running on.  See Documentation/arm/setup.rst
  *  for more info.
  */
 #ifndef __ASMARM_SETUP_H
diff --git a/arch/arm/include/uapi/asm/setup.h b/arch/arm/include/uapi/asm/setup.h
index 6b335a9ff8c8..25ceda63b284 100644
--- a/arch/arm/include/uapi/asm/setup.h
+++ b/arch/arm/include/uapi/asm/setup.h
@@ -9,7 +9,7 @@
  * published by the Free Software Foundation.
  *
  *  Structure passed to kernel to tell it about the
- *  hardware it's running on.  See Documentation/arm/Setup
+ *  hardware it's running on.  See Documentation/arm/setup.rst
  *  for more info.
  */
 #ifndef _UAPI__ASMARM_SETUP_H
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 0b8cfdd60b90..858d4e541532 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -826,7 +826,7 @@ ENDPROC(__switch_to)
  * existing ones.  This mechanism should be used only for things that are
  * really small and justified, and not be abused freely.
  *
- * See Documentation/arm/kernel_user_helpers.txt for formal definitions.
+ * See Documentation/arm/kernel_user_helpers.rst for formal definitions.
  */
  THUMB(	.arm	)
 
diff --git a/arch/arm/mach-exynos/common.h b/arch/arm/mach-exynos/common.h
index c93356a8d662..56411bb63d45 100644
--- a/arch/arm/mach-exynos/common.h
+++ b/arch/arm/mach-exynos/common.h
@@ -106,7 +106,7 @@ void exynos_firmware_init(void);
 #define C2_STATE	(1 << 3)
 /*
  * Magic values for bootloader indicating chosen low power mode.
- * See also Documentation/arm/Samsung/Bootloader-interface.txt
+ * See also Documentation/arm/samsung/bootloader-interface.rst
  */
 #define EXYNOS_SLEEP_MAGIC	0x00000bad
 #define EXYNOS_AFTR_MAGIC	0xfcba0d10
diff --git a/arch/arm/mach-ixp4xx/Kconfig b/arch/arm/mach-ixp4xx/Kconfig
index fc5378b00f3d..f7211b57b1e7 100644
--- a/arch/arm/mach-ixp4xx/Kconfig
+++ b/arch/arm/mach-ixp4xx/Kconfig
@@ -33,7 +33,7 @@ config MACH_AVILA
 	help
 	  Say 'Y' here if you want your kernel to support the Gateworks
 	  Avila Network Platform. For more information on this platform,
-	  see <file:Documentation/arm/IXP4xx>.
+	  see <file:Documentation/arm/ixp4xx.rst>.
 
 config MACH_LOFT
     bool "Loft"
@@ -49,7 +49,7 @@ config ARCH_ADI_COYOTE
 	help
 	  Say 'Y' here if you want your kernel to support the ADI 
 	  Engineering Coyote Gateway Reference Platform. For more
-	  information on this platform, see <file:Documentation/arm/IXP4xx>.
+	  information on this platform, see <file:Documentation/arm/ixp4xx.rst>.
 
 config MACH_GATEWAY7001
 	bool "Gateway 7001"
@@ -72,21 +72,21 @@ config ARCH_IXDP425
 	help
 	  Say 'Y' here if you want your kernel to support Intel's 
 	  IXDP425 Development Platform (Also known as Richfield).  
-	  For more information on this platform, see <file:Documentation/arm/IXP4xx>.
+	  For more information on this platform, see <file:Documentation/arm/ixp4xx.rst>.
 
 config MACH_IXDPG425
 	bool "IXDPG425"
 	help
 	  Say 'Y' here if you want your kernel to support Intel's
 	  IXDPG425 Development Platform (Also known as Montajade).
-	  For more information on this platform, see <file:Documentation/arm/IXP4xx>.
+	  For more information on this platform, see <file:Documentation/arm/ixp4xx.rst>.
 
 config MACH_IXDP465
 	bool "IXDP465"
 	help
 	  Say 'Y' here if you want your kernel to support Intel's
 	  IXDP465 Development Platform (Also known as BMP).
-	  For more information on this platform, see <file:Documentation/arm/IXP4xx>.
+	  For more information on this platform, see <file:Documentation/arm/ixp4xx.rst>.
 
 config MACH_GORAMO_MLR
 	bool "GORAMO Multi Link Router"
@@ -99,7 +99,7 @@ config MACH_KIXRP435
 	help
 	  Say 'Y' here if you want your kernel to support Intel's
 	  KIXRP435 Reference Platform.
-	  For more information on this platform, see <file:Documentation/arm/IXP4xx>.
+	  For more information on this platform, see <file:Documentation/arm/ixp4xx.rst>.
 
 #
 # IXCDP1100 is the exact same HW as IXDP425, but with a different machine 
@@ -116,7 +116,7 @@ config ARCH_PRPMC1100
 	help
 	  Say 'Y' here if you want your kernel to support the Motorola
 	  PrPCM1100 Processor Mezanine Module. For more information on
-	  this platform, see <file:Documentation/arm/IXP4xx>.
+	  this platform, see <file:Documentation/arm/ixp4xx.rst>.
 
 config MACH_NAS100D
 	bool
diff --git a/arch/arm/mach-s3c24xx/pm.c b/arch/arm/mach-s3c24xx/pm.c
index adcb90645460..c64988c609ad 100644
--- a/arch/arm/mach-s3c24xx/pm.c
+++ b/arch/arm/mach-s3c24xx/pm.c
@@ -5,7 +5,7 @@
 //
 // S3C24XX Power Manager (Suspend-To-RAM) support
 //
-// See Documentation/arm/Samsung-S3C24XX/Suspend.txt for more information
+// See Documentation/arm/samsung-s3c24xx/suspend.rst for more information
 //
 // Parts based on arch/arm/mach-pxa/pm.c
 //
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index cc798115aa9b..820b60a50125 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -709,7 +709,7 @@ config ARM_VIRT_EXT
 	  assistance.
 
 	  A compliant bootloader is required in order to make maximum
-	  use of this feature.  Refer to Documentation/arm/Booting for
+	  use of this feature.  Refer to Documentation/arm/booting.rst for
 	  details.
 
 config SWP_EMULATE
@@ -875,7 +875,7 @@ config KUSER_HELPERS
 	  the CPU type fitted to the system.  This permits binaries to be
 	  run on ARMv4 through to ARMv7 without modification.
 
-	  See Documentation/arm/kernel_user_helpers.txt for details.
+	  See Documentation/arm/kernel_user_helpers.rst for details.
 
 	  However, the fixed address nature of these helpers can be used
 	  by ROP (return orientated programming) authors when creating
diff --git a/arch/arm/plat-samsung/Kconfig b/arch/arm/plat-samsung/Kconfig
index 53da57fba39c..301e572651c0 100644
--- a/arch/arm/plat-samsung/Kconfig
+++ b/arch/arm/plat-samsung/Kconfig
@@ -243,7 +243,7 @@ config SAMSUNG_PM_DEBUG
 	depends on DEBUG_EXYNOS_UART || DEBUG_S3C24XX_UART || DEBUG_S3C2410_UART
 	help
 	  Say Y here if you want verbose debugging from the PM Suspend and
-	  Resume code. See <file:Documentation/arm/Samsung-S3C24XX/Suspend.txt>
+	  Resume code. See <file:Documentation/arm/samsung-s3c24xx/suspend.rst>
 	  for more information.
 
 config S3C_PM_DEBUG_LED_SMDK
@@ -268,7 +268,7 @@ config SAMSUNG_PM_CHECK
 	  Note, this can take several seconds depending on memory size
 	  and CPU speed.
 
-	  See <file:Documentation/arm/Samsung-S3C24XX/Suspend.txt>
+	  See <file:Documentation/arm/samsung-s3c24xx/suspend.rst>
 
 config SAMSUNG_PM_CHECK_CHUNKSIZE
 	int "S3C2410 PM Suspend CRC Chunksize (KiB)"
@@ -280,7 +280,7 @@ config SAMSUNG_PM_CHECK_CHUNKSIZE
 	  the CRC data block will take more memory, but will identify any
 	  faults with better precision.
 
-	  See <file:Documentation/arm/Samsung-S3C24XX/Suspend.txt>
+	  See <file:Documentation/arm/samsung-s3c24xx/suspend.rst>
 
 config SAMSUNG_WAKEMASK
 	bool
diff --git a/arch/arm/tools/mach-types b/arch/arm/tools/mach-types
index 4eac94c1eb6f..9e74c7ff6b04 100644
--- a/arch/arm/tools/mach-types
+++ b/arch/arm/tools/mach-types
@@ -7,7 +7,7 @@
 #   http://www.arm.linux.org.uk/developer/machines/download.php
 #
 # Please do not send patches to this file; it is automatically generated!
-# To add an entry into this database, please see Documentation/arm/README,
+# To add an entry into this database, please see Documentation/arm/arm.rst,
 # or visit:
 #
 #   http://www.arm.linux.org.uk/developer/machines/?action=new
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a36ff61321ce..a4b22bbf0590 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1142,7 +1142,7 @@ config KUSER_HELPERS
 	  the system. This permits binaries to be run on ARMv4 through
 	  to ARMv8 without modification.
 
-	  See Documentation/arm/kernel_user_helpers.txt for details.
+	  See Documentation/arm/kernel_user_helpers.rst for details.
 
 	  However, the fixed address nature of these helpers can be used
 	  by ROP (return orientated programming) authors when creating
diff --git a/arch/arm64/kernel/kuser32.S b/arch/arm64/kernel/kuser32.S
index 49825e9e421e..42bd8c0c60e0 100644
--- a/arch/arm64/kernel/kuser32.S
+++ b/arch/arm64/kernel/kuser32.S
@@ -10,7 +10,7 @@
  * aarch32_setup_additional_pages() and are provided for compatibility
  * reasons with 32 bit (aarch32) applications that need them.
  *
- * See Documentation/arm/kernel_user_helpers.txt for formal definitions.
+ * See Documentation/arm/kernel_user_helpers.rst for formal definitions.
  */
 
 #include <asm/unistd.h>
diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
index 1738a06396f9..2f81a94c71a6 100644
--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -162,7 +162,7 @@ void __init plat_mem_setup(void)
 	ioport_resource.start = 0;
 	ioport_resource.end = ~0;
 
-	/* intended to somewhat resemble ARM; see Documentation/arm/Booting */
+	/* intended to somewhat resemble ARM; see Documentation/arm/booting.rst */
 	if (fw_arg0 == 0 && fw_arg1 == 0xffffffff)
 		dtb = phys_to_virt(fw_arg2);
 	else if (fw_passed_dtb) /* UHI interface or appended dtb */
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c b/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
index 4ab14d58e85b..6f7cbf6c2b55 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-cipher.c
@@ -8,7 +8,7 @@
  * keysize in CBC and ECB mode.
  * Add support also for DES and 3DES in CBC and ECB mode.
  *
- * You could find the datasheet in Documentation/arm/sunxi/README
+ * You could find the datasheet in Documentation/arm/sunxi.rst
  */
 #include "sun4i-ss.h"
 
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-core.c b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
index cdcda7f059c8..2e8704271f45 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-core.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-core.c
@@ -6,7 +6,7 @@
  *
  * Core file which registers crypto algorithms supported by the SS.
  *
- * You could find a link for the datasheet in Documentation/arm/sunxi/README
+ * You could find a link for the datasheet in Documentation/arm/sunxi.rst
  */
 #include <linux/clk.h>
 #include <linux/crypto.h>
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss-hash.c b/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
index d2b6d89aad28..fcffba5ef927 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
+++ b/drivers/crypto/sunxi-ss/sun4i-ss-hash.c
@@ -6,7 +6,7 @@
  *
  * This file add support for MD5 and SHA1.
  *
- * You could find the datasheet in Documentation/arm/sunxi/README
+ * You could find the datasheet in Documentation/arm/sunxi.rst
  */
 #include "sun4i-ss.h"
 #include <linux/scatterlist.h>
diff --git a/drivers/crypto/sunxi-ss/sun4i-ss.h b/drivers/crypto/sunxi-ss/sun4i-ss.h
index 68b82d1a6303..8654d48aedc0 100644
--- a/drivers/crypto/sunxi-ss/sun4i-ss.h
+++ b/drivers/crypto/sunxi-ss/sun4i-ss.h
@@ -8,7 +8,7 @@
  * Support MD5 and SHA1 hash algorithms.
  * Support DES and 3DES
  *
- * You could find the datasheet in Documentation/arm/sunxi/README
+ * You could find the datasheet in Documentation/arm/sunxi.rst
  */
 
 #include <linux/clk.h>
diff --git a/drivers/input/touchscreen/sun4i-ts.c b/drivers/input/touchscreen/sun4i-ts.c
index 92f6e1ae23a2..f11ba7f2dca7 100644
--- a/drivers/input/touchscreen/sun4i-ts.c
+++ b/drivers/input/touchscreen/sun4i-ts.c
@@ -22,7 +22,7 @@
  * in the kernel). So this driver offers straight forward, reliable single
  * touch functionality only.
  *
- * s.a. A20 User Manual "1.15 TP" (Documentation/arm/sunxi/README)
+ * s.a. A20 User Manual "1.15 TP" (Documentation/arm/sunxi.rst)
  * (looks like the description in the A20 User Manual v1.3 is better
  * than the one in the A10 User Manual v.1.5)
  */
diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index b416c7b33f49..04c23951b831 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -500,7 +500,7 @@ config SERIAL_SA1100
 	help
 	  If you have a machine based on a SA1100/SA1110 StrongARM(R) CPU you
 	  can enable its onboard serial port by enabling this option.
-	  Please read <file:Documentation/arm/SA1100/serial_UART> for further
+	  Please read <file:Documentation/arm/sa1100/serial_uart.rst> for further
 	  info.
 
 config SERIAL_SA1100_CONSOLE
-- 
cgit v1.2.3-55-g7522


From 2bbbf827d339032dbeda62f0a5f20d2fde07b0f5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 15 Apr 2019 18:39:27 -0300
Subject: docs: memory-devices: convert ti-emif.txt to ReST

Prepare this file to be moved to a kernel book by converting
it to ReST format and renaming it to ti-emif.rst.

While this is not part of any book, mark it as :orphan:, in order
to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/memory-devices/ti-emif.rst | 64 ++++++++++++++++++++++++++++++++
 Documentation/memory-devices/ti-emif.txt | 57 ----------------------------
 2 files changed, 64 insertions(+), 57 deletions(-)
 create mode 100644 Documentation/memory-devices/ti-emif.rst
 delete mode 100644 Documentation/memory-devices/ti-emif.txt

diff --git a/Documentation/memory-devices/ti-emif.rst b/Documentation/memory-devices/ti-emif.rst
new file mode 100644
index 000000000000..c9242294e63c
--- /dev/null
+++ b/Documentation/memory-devices/ti-emif.rst
@@ -0,0 +1,64 @@
+:orphan:
+
+===============================
+TI EMIF SDRAM Controller Driver
+===============================
+
+Author
+======
+Aneesh V <aneesh@ti.com>
+
+Location
+========
+drivers/memory/emif.c
+
+Supported SoCs:
+===============
+TI OMAP44xx
+TI OMAP54xx
+
+Menuconfig option:
+==================
+Device Drivers
+	Memory devices
+		Texas Instruments EMIF driver
+
+Description
+===========
+This driver is for the EMIF module available in Texas Instruments
+SoCs. EMIF is an SDRAM controller that, based on its revision,
+supports one or more of DDR2, DDR3, and LPDDR2 SDRAM protocols.
+This driver presently handles only LPDDR2 memories. The
+functions of the driver include re-configuring AC timing
+parameters and other settings during frequency, voltage and
+temperature changes.
+
+Platform Data (see include/linux/platform_data/emif_plat.h)
+===========================================================
+DDR device details and other board dependent and SoC dependent
+information can be passed through platform data (struct emif_platform_data):
+
+- DDR device details: 'struct ddr_device_info'
+- Device AC timings: 'struct lpddr2_timings' and 'struct lpddr2_min_tck'
+- Custom configurations: customizable policy options through
+  'struct emif_custom_configs'
+- IP revision
+- PHY type
+
+Interface to the external world
+===============================
+The EMIF driver registers notifiers for voltage and frequency changes
+affecting EMIF and takes appropriate actions when these are invoked.
+
+- freq_pre_notify_handling()
+- freq_post_notify_handling()
+- volt_notify_handling()
+
+Debugfs
+=======
+The driver creates two debugfs entries per device.
+
+- regcache_dump : dump of register values calculated and saved for all
+  frequencies used so far.
+- mr4 : last polled value of MR4 register in the LPDDR2 device. MR4
+  indicates the current temperature level of the device.
diff --git a/Documentation/memory-devices/ti-emif.txt b/Documentation/memory-devices/ti-emif.txt
deleted file mode 100644
index f4ad9a7d0f4b..000000000000
--- a/Documentation/memory-devices/ti-emif.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-TI EMIF SDRAM Controller Driver:
-
-Author
-========
-Aneesh V <aneesh@ti.com>
-
-Location
-============
-driver/memory/emif.c
-
-Supported SoCs:
-===================
-TI OMAP44xx
-TI OMAP54xx
-
-Menuconfig option:
-==========================
-Device Drivers
-	Memory devices
-		Texas Instruments EMIF driver
-
-Description
-===========
-This driver is for the EMIF module available in Texas Instruments
-SoCs. EMIF is an SDRAM controller that, based on its revision,
-supports one or more of DDR2, DDR3, and LPDDR2 SDRAM protocols.
-This driver takes care of only LPDDR2 memories presently. The
-functions of the driver includes re-configuring AC timing
-parameters and other settings during frequency, voltage and
-temperature changes
-
-Platform Data (see include/linux/platform_data/emif_plat.h):
-=====================================================================
-DDR device details and other board dependent and SoC dependent
-information can be passed through platform data (struct emif_platform_data)
-- DDR device details: 'struct ddr_device_info'
-- Device AC timings: 'struct lpddr2_timings' and 'struct lpddr2_min_tck'
-- Custom configurations: customizable policy options through
-  'struct emif_custom_configs'
-- IP revision
-- PHY type
-
-Interface to the external world:
-================================
-EMIF driver registers notifiers for voltage and frequency changes
-affecting EMIF and takes appropriate actions when these are invoked.
-- freq_pre_notify_handling()
-- freq_post_notify_handling()
-- volt_notify_handling()
-
-Debugfs
-========
-The driver creates two debugfs entries per device.
-- regcache_dump : dump of register values calculated and saved for all
-  frequencies used so far.
-- mr4 : last polled value of MR4 register in the LPDDR2 device. MR4
-  indicates the current temperature level of the device.
-- 
cgit v1.2.3-55-g7522


From 675aaf05d8982d3d304d4652d1555714be8b4af2 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 15 Apr 2019 18:46:48 -0300
Subject: docs: xen-tpmfront.txt: convert it to .rst

In order to be able to add this file to the security book,
we first need to convert it to reST.

While this is not part of any book, mark it as :orphan:, in order
to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/security/tpm/xen-tpmfront.rst | 126 ++++++++++++++++++++++++++++
 Documentation/security/tpm/xen-tpmfront.txt | 113 -------------------------
 2 files changed, 126 insertions(+), 113 deletions(-)
 create mode 100644 Documentation/security/tpm/xen-tpmfront.rst
 delete mode 100644 Documentation/security/tpm/xen-tpmfront.txt

diff --git a/Documentation/security/tpm/xen-tpmfront.rst b/Documentation/security/tpm/xen-tpmfront.rst
new file mode 100644
index 000000000000..98a16ab87360
--- /dev/null
+++ b/Documentation/security/tpm/xen-tpmfront.rst
@@ -0,0 +1,126 @@
+:orphan:
+
+=============================
+Virtual TPM interface for Xen
+=============================
+
+Authors: Matthew Fioravante (JHUAPL), Daniel De Graaf (NSA)
+
+This document describes the virtual Trusted Platform Module (vTPM) subsystem for
+Xen. The reader is assumed to have familiarity with building and installing Xen,
+Linux, and a basic understanding of the TPM and vTPM concepts.
+
+Introduction
+------------
+
+The goal of this work is to provide TPM functionality to a virtual guest
+operating system (in Xen terms, a DomU).  This allows programs to interact with
+a TPM in a virtual system the same way they interact with a TPM on the physical
+system.  Each guest gets its own unique, emulated, software TPM.  However, each
+of the vTPM's secrets (keys, NVRAM, etc.) is managed by a vTPM Manager domain,
+which seals the secrets to the Physical TPM.  If the process of creating each of
+these domains (manager, vTPM, and guest) is trusted, the vTPM subsystem extends
+the chain of trust rooted in the hardware TPM to virtual machines in Xen. Each
+major component of vTPM is implemented as a separate domain, providing secure
+separation guaranteed by the hypervisor. The vTPM domains are implemented in
+mini-os to reduce memory and processor overhead.
+
+This mini-os vTPM subsystem was built on top of the previous vTPM work done by
+IBM and Intel corporation.
+
+
+Design Overview
+---------------
+
+The architecture of vTPM is described below::
+
+  +------------------+
+  |    Linux DomU    | ...
+  |       |  ^       |
+  |       v  |       |
+  |   xen-tpmfront   |
+  +------------------+
+          |  ^
+          v  |
+  +------------------+
+  | mini-os/tpmback  |
+  |       |  ^       |
+  |       v  |       |
+  |  vtpm-stubdom    | ...
+  |       |  ^       |
+  |       v  |       |
+  | mini-os/tpmfront |
+  +------------------+
+          |  ^
+          v  |
+  +------------------+
+  | mini-os/tpmback  |
+  |       |  ^       |
+  |       v  |       |
+  | vtpmmgr-stubdom  |
+  |       |  ^       |
+  |       v  |       |
+  | mini-os/tpm_tis  |
+  +------------------+
+          |  ^
+          v  |
+  +------------------+
+  |   Hardware TPM   |
+  +------------------+
+
+* Linux DomU:
+	       The Linux based guest that wants to use a vTPM. There may be
+	       more than one of these.
+
+* xen-tpmfront.ko:
+		    Linux kernel virtual TPM frontend driver. This driver
+                    provides vTPM access to a Linux-based DomU.
+
+* mini-os/tpmback:
+		    Mini-os TPM backend driver. The Linux frontend driver
+		    connects to this backend driver to facilitate communications
+		    between the Linux DomU and its vTPM. This driver is also
+		    used by vtpmmgr-stubdom to communicate with vtpm-stubdom.
+
+* vtpm-stubdom:
+		 A mini-os stub domain that implements a vTPM. There is a
+		 one to one mapping between running vtpm-stubdom instances and
+                 logical vtpms on the system. The vTPM Platform Configuration
+                 Registers (PCRs) are normally all initialized to zero.
+
+* mini-os/tpmfront:
+		     Mini-os TPM frontend driver. The vTPM mini-os domain
+		     vtpm-stubdom uses this driver to communicate with
+		     vtpmmgr-stubdom. This driver is also used in mini-os
+		     domains such as pv-grub that talk to the vTPM domain.
+
+* vtpmmgr-stubdom:
+		    A mini-os domain that implements the vTPM manager. There is
+		    only one vTPM manager and it should be running during the
+		    entire lifetime of the machine.  This domain regulates
+		    access to the physical TPM on the system and secures the
+		    persistent state of each vTPM.
+
+* mini-os/tpm_tis:
+		    Mini-os TPM version 1.2 TPM Interface Specification (TIS)
+                    driver. This driver is used by vtpmmgr-stubdom to talk directly to
+                    the hardware TPM. Communication is facilitated by mapping
+                    hardware memory pages into vtpmmgr-stubdom.
+
+* Hardware TPM:
+		The physical TPM that is soldered onto the motherboard.
+
+
+Integration With Xen
+--------------------
+
+Support for the vTPM driver was added in Xen using the libxl toolstack in Xen
+4.3.  See the Xen documentation (docs/misc/vtpm.txt) for details on setting up
+the vTPM and vTPM Manager stub domains.  Once the stub domains are running, a
+vTPM device is set up in the same manner as a disk or network device in the
+domain's configuration file.
+
+In order to use features such as IMA that require a TPM to be loaded prior to
+the initrd, the xen-tpmfront driver must be compiled in to the kernel.  If not
+using such features, the driver can be compiled as a module and will be loaded
+as usual.
diff --git a/Documentation/security/tpm/xen-tpmfront.txt b/Documentation/security/tpm/xen-tpmfront.txt
deleted file mode 100644
index 69346de87ff3..000000000000
--- a/Documentation/security/tpm/xen-tpmfront.txt
+++ /dev/null
@@ -1,113 +0,0 @@
-Virtual TPM interface for Xen
-
-Authors: Matthew Fioravante (JHUAPL), Daniel De Graaf (NSA)
-
-This document describes the virtual Trusted Platform Module (vTPM) subsystem for
-Xen. The reader is assumed to have familiarity with building and installing Xen,
-Linux, and a basic understanding of the TPM and vTPM concepts.
-
-INTRODUCTION
-
-The goal of this work is to provide a TPM functionality to a virtual guest
-operating system (in Xen terms, a DomU).  This allows programs to interact with
-a TPM in a virtual system the same way they interact with a TPM on the physical
-system.  Each guest gets its own unique, emulated, software TPM.  However, each
-of the vTPM's secrets (Keys, NVRAM, etc) are managed by a vTPM Manager domain,
-which seals the secrets to the Physical TPM.  If the process of creating each of
-these domains (manager, vTPM, and guest) is trusted, the vTPM subsystem extends
-the chain of trust rooted in the hardware TPM to virtual machines in Xen. Each
-major component of vTPM is implemented as a separate domain, providing secure
-separation guaranteed by the hypervisor. The vTPM domains are implemented in
-mini-os to reduce memory and processor overhead.
-
-This mini-os vTPM subsystem was built on top of the previous vTPM work done by
-IBM and Intel corporation.
-
-
-DESIGN OVERVIEW
----------------
-
-The architecture of vTPM is described below:
-
-+------------------+
-|    Linux DomU    | ...
-|       |  ^       |
-|       v  |       |
-|   xen-tpmfront   |
-+------------------+
-        |  ^
-        v  |
-+------------------+
-| mini-os/tpmback  |
-|       |  ^       |
-|       v  |       |
-|  vtpm-stubdom    | ...
-|       |  ^       |
-|       v  |       |
-| mini-os/tpmfront |
-+------------------+
-        |  ^
-        v  |
-+------------------+
-| mini-os/tpmback  |
-|       |  ^       |
-|       v  |       |
-| vtpmmgr-stubdom  |
-|       |  ^       |
-|       v  |       |
-| mini-os/tpm_tis  |
-+------------------+
-        |  ^
-        v  |
-+------------------+
-|   Hardware TPM   |
-+------------------+
-
- * Linux DomU: The Linux based guest that wants to use a vTPM. There may be
-	       more than one of these.
-
- * xen-tpmfront.ko: Linux kernel virtual TPM frontend driver. This driver
-                    provides vTPM access to a Linux-based DomU.
-
- * mini-os/tpmback: Mini-os TPM backend driver. The Linux frontend driver
-		    connects to this backend driver to facilitate communications
-		    between the Linux DomU and its vTPM. This driver is also
-		    used by vtpmmgr-stubdom to communicate with vtpm-stubdom.
-
- * vtpm-stubdom: A mini-os stub domain that implements a vTPM. There is a
-		 one to one mapping between running vtpm-stubdom instances and
-                 logical vtpms on the system. The vTPM Platform Configuration
-                 Registers (PCRs) are normally all initialized to zero.
-
- * mini-os/tpmfront: Mini-os TPM frontend driver. The vTPM mini-os domain
-		     vtpm-stubdom uses this driver to communicate with
-		     vtpmmgr-stubdom. This driver is also used in mini-os
-		     domains such as pv-grub that talk to the vTPM domain.
-
- * vtpmmgr-stubdom: A mini-os domain that implements the vTPM manager. There is
-		    only one vTPM manager and it should be running during the
-		    entire lifetime of the machine.  This domain regulates
-		    access to the physical TPM on the system and secures the
-		    persistent state of each vTPM.
-
- * mini-os/tpm_tis: Mini-os TPM version 1.2 TPM Interface Specification (TIS)
-                    driver. This driver used by vtpmmgr-stubdom to talk directly to
-                    the hardware TPM. Communication is facilitated by mapping
-                    hardware memory pages into vtpmmgr-stubdom.
-
- * Hardware TPM: The physical TPM that is soldered onto the motherboard.
-
-
-INTEGRATION WITH XEN
---------------------
-
-Support for the vTPM driver was added in Xen using the libxl toolstack in Xen
-4.3.  See the Xen documentation (docs/misc/vtpm.txt) for details on setting up
-the vTPM and vTPM Manager stub domains.  Once the stub domains are running, a
-vTPM device is set up in the same manner as a disk or network device in the
-domain's configuration file.
-
-In order to use features such as IMA that require a TPM to be loaded prior to
-the initrd, the xen-tpmfront driver must be compiled in to the kernel.  If not
-using such features, the driver can be compiled as a module and will be loaded
-as usual.
-- 
cgit v1.2.3-55-g7522


From 619ba4516771bdfb96658e7a5f57e6551232549a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 19 Apr 2019 18:49:49 -0300
Subject: docs: bus-devices: ti-gpmc.rst: convert it to ReST

In order to be able to add this file to a book, it first
needs to be converted to ReST and renamed.

While this is not part of any book, mark it as :orphan:, in order
to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/bus-devices/ti-gpmc.rst | 179 ++++++++++++++++++++++++++++++++++
 Documentation/bus-devices/ti-gpmc.txt | 122 -----------------------
 2 files changed, 179 insertions(+), 122 deletions(-)
 create mode 100644 Documentation/bus-devices/ti-gpmc.rst
 delete mode 100644 Documentation/bus-devices/ti-gpmc.txt

diff --git a/Documentation/bus-devices/ti-gpmc.rst b/Documentation/bus-devices/ti-gpmc.rst
new file mode 100644
index 000000000000..87c366e418be
--- /dev/null
+++ b/Documentation/bus-devices/ti-gpmc.rst
@@ -0,0 +1,179 @@
+:orphan:
+
+========================================
+GPMC (General Purpose Memory Controller)
+========================================
+
+GPMC is a unified memory controller dedicated to interfacing external
+memory devices like
+
+ * Asynchronous SRAM like memories and application specific integrated
+   circuit devices.
+ * Asynchronous, synchronous, and page mode burst NOR flash devices
+ * NAND flash
+ * Pseudo-SRAM devices
+
+GPMC is found on Texas Instruments SoCs (OMAP based).
+IP details: http://www.ti.com/lit/pdf/spruh73 section 7.1
+
+
+GPMC generic timing calculation:
+================================
+
+GPMC has certain timings that have to be programmed for proper
+functioning of the peripheral, while the peripheral has another set of
+timings. For the peripheral to work with gpmc, its timings have to
+be translated to a form gpmc can understand. How they have to be
+translated depends on the connected peripheral. Certain gpmc timings
+also depend on the gpmc clock frequency. Hence a generic timing
+routine was developed to meet the above requirements.
+
+The generic routine provides a generic method to calculate gpmc timings
+from gpmc peripheral timings. The fields of struct gpmc_device_timings
+have to be updated with timings from the datasheet of the peripheral
+that is connected to gpmc. A few of the peripheral timings can be given
+either in time or in cycles; provision to handle this has been made
+(refer to the struct gpmc_device_timings definition). A timing specified
+by the peripheral datasheet may not be present in the timing structure;
+in that case, try to correlate the peripheral timing to one that is
+available. If that doesn't work, add a new field as required by the
+peripheral, teach the generic timing routine to handle it, and make
+sure that it does not break any existing users. There may also be
+cases where the peripheral datasheet doesn't mention certain fields of
+struct gpmc_device_timings; zero those entries.
+
+The generic timing routine has been verified to work properly on
+multiple OneNAND and tusb6010 peripherals.
+
+A word of caution: the generic timing routine has been developed based
+on an understanding of gpmc timings, peripheral timings and available
+custom timing routines, a kind of reverse engineering without
+most of the datasheets & hardware (to be exact, none of those supported
+in mainline that have a custom timing routine), and by simulation.
+
+gpmc timing dependency on peripheral timings:
+
+[<gpmc_timing>: <peripheral timing1>, <peripheral timing2> ...]
+
+1. common
+
+cs_on:
+	t_ceasu
+adv_on:
+	t_avdasu, t_ceavd
+
+2. sync common
+
+sync_clk:
+	clk
+page_burst_access:
+	t_bacc
+clk_activation:
+	t_ces, t_avds
+
+3. read async muxed
+
+adv_rd_off:
+	t_avdp_r
+oe_on:
+	t_oeasu, t_aavdh
+access:
+	t_iaa, t_oe, t_ce, t_aa
+rd_cycle:
+	t_rd_cycle, t_cez_r, t_oez
+
+4. read async non-muxed
+
+adv_rd_off:
+	t_avdp_r
+oe_on:
+	t_oeasu
+access:
+	t_iaa, t_oe, t_ce, t_aa
+rd_cycle:
+	t_rd_cycle, t_cez_r, t_oez
+
+5. read sync muxed
+
+adv_rd_off:
+	t_avdp_r, t_avdh
+oe_on:
+	t_oeasu, t_ach, cyc_aavdh_oe
+access:
+	t_iaa, cyc_iaa, cyc_oe
+rd_cycle:
+	t_cez_r, t_oez, t_ce_rdyz
+
+6. read sync non-muxed
+
+adv_rd_off:
+	t_avdp_r
+oe_on:
+	t_oeasu
+access:
+	t_iaa, cyc_iaa, cyc_oe
+rd_cycle:
+	t_cez_r, t_oez, t_ce_rdyz
+
+7. write async muxed
+
+adv_wr_off:
+	t_avdp_w
+we_on, wr_data_mux_bus:
+	t_weasu, t_aavdh, cyc_aavdh_we
+we_off:
+	t_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_wr_cycle
+
+8. write async non-muxed
+
+adv_wr_off:
+	t_avdp_w
+we_on, wr_data_mux_bus:
+	t_weasu
+we_off:
+	t_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_wr_cycle
+
+9. write sync muxed
+
+adv_wr_off:
+	t_avdp_w, t_avdh
+we_on, wr_data_mux_bus:
+	t_weasu, t_rdyo, t_aavdh, cyc_aavdh_we
+we_off:
+	t_wpl, cyc_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_ce_rdyz
+
+10. write sync non-muxed
+
+adv_wr_off:
+	t_avdp_w
+we_on, wr_data_mux_bus:
+	t_weasu, t_rdyo
+we_off:
+	t_wpl, cyc_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_ce_rdyz
+
+
+Note:
+  Many gpmc timings depend on other gpmc timings (a few depend purely
+  on other gpmc timings, which is why some of the gpmc timings are
+  missing above). As a result, peripheral timings also indirectly
+  affect gpmc timings other than those mentioned above; refer to the
+  timing routine for more details. To know what these peripheral
+  timings correspond to, please see the explanations in the struct
+  gpmc_device_timings definition. For gpmc timings, refer to the
+  IP details (link above).
diff --git a/Documentation/bus-devices/ti-gpmc.txt b/Documentation/bus-devices/ti-gpmc.txt
deleted file mode 100644
index cc9ce57e0a26..000000000000
--- a/Documentation/bus-devices/ti-gpmc.txt
+++ /dev/null
@@ -1,122 +0,0 @@
-GPMC (General Purpose Memory Controller):
-=========================================
-
-GPMC is an unified memory controller dedicated to interfacing external
-memory devices like
- * Asynchronous SRAM like memories and application specific integrated
-   circuit devices.
- * Asynchronous, synchronous, and page mode burst NOR flash devices
-   NAND flash
- * Pseudo-SRAM devices
-
-GPMC is found on Texas Instruments SoC's (OMAP based)
-IP details: http://www.ti.com/lit/pdf/spruh73 section 7.1
-
-
-GPMC generic timing calculation:
-================================
-
-GPMC has certain timings that has to be programmed for proper
-functioning of the peripheral, while peripheral has another set of
-timings. To have peripheral work with gpmc, peripheral timings has to
-be translated to the form gpmc can understand. The way it has to be
-translated depends on the connected peripheral. Also there is a
-dependency for certain gpmc timings on gpmc clock frequency. Hence a
-generic timing routine was developed to achieve above requirements.
-
-Generic routine provides a generic method to calculate gpmc timings
-from gpmc peripheral timings. struct gpmc_device_timings fields has to
-be updated with timings from the datasheet of the peripheral that is
-connected to gpmc. A few of the peripheral timings can be fed either
-in time or in cycles, provision to handle this scenario has been
-provided (refer struct gpmc_device_timings definition). It may so
-happen that timing as specified by peripheral datasheet is not present
-in timing structure, in this scenario, try to correlate peripheral
-timing to the one available. If that doesn't work, try to add a new
-field as required by peripheral, educate generic timing routine to
-handle it, make sure that it does not break any of the existing.
-Then there may be cases where peripheral datasheet doesn't mention
-certain fields of struct gpmc_device_timings, zero those entries.
-
-Generic timing routine has been verified to work properly on
-multiple onenand's and tusb6010 peripherals.
-
-A word of caution: generic timing routine has been developed based
-on understanding of gpmc timings, peripheral timings, available
-custom timing routines, a kind of reverse engineering without
-most of the datasheets & hardware (to be exact none of those supported
-in mainline having custom timing routine) and by simulation.
-
-gpmc timing dependency on peripheral timings:
-[<gpmc_timing>: <peripheral timing1>, <peripheral timing2> ...]
-
-1. common
-cs_on: t_ceasu
-adv_on: t_avdasu, t_ceavd
-
-2. sync common
-sync_clk: clk
-page_burst_access: t_bacc
-clk_activation: t_ces, t_avds
-
-3. read async muxed
-adv_rd_off: t_avdp_r
-oe_on: t_oeasu, t_aavdh
-access: t_iaa, t_oe, t_ce, t_aa
-rd_cycle: t_rd_cycle, t_cez_r, t_oez
-
-4. read async non-muxed
-adv_rd_off: t_avdp_r
-oe_on: t_oeasu
-access: t_iaa, t_oe, t_ce, t_aa
-rd_cycle: t_rd_cycle, t_cez_r, t_oez
-
-5. read sync muxed
-adv_rd_off: t_avdp_r, t_avdh
-oe_on: t_oeasu, t_ach, cyc_aavdh_oe
-access: t_iaa, cyc_iaa, cyc_oe
-rd_cycle: t_cez_r, t_oez, t_ce_rdyz
-
-6. read sync non-muxed
-adv_rd_off: t_avdp_r
-oe_on: t_oeasu
-access: t_iaa, cyc_iaa, cyc_oe
-rd_cycle: t_cez_r, t_oez, t_ce_rdyz
-
-7. write async muxed
-adv_wr_off: t_avdp_w
-we_on, wr_data_mux_bus: t_weasu, t_aavdh, cyc_aavhd_we
-we_off: t_wpl
-cs_wr_off: t_wph
-wr_cycle: t_cez_w, t_wr_cycle
-
-8. write async non-muxed
-adv_wr_off: t_avdp_w
-we_on, wr_data_mux_bus: t_weasu
-we_off: t_wpl
-cs_wr_off: t_wph
-wr_cycle: t_cez_w, t_wr_cycle
-
-9. write sync muxed
-adv_wr_off: t_avdp_w, t_avdh
-we_on, wr_data_mux_bus: t_weasu, t_rdyo, t_aavdh, cyc_aavhd_we
-we_off: t_wpl, cyc_wpl
-cs_wr_off: t_wph
-wr_cycle: t_cez_w, t_ce_rdyz
-
-10. write sync non-muxed
-adv_wr_off: t_avdp_w
-we_on, wr_data_mux_bus: t_weasu, t_rdyo
-we_off: t_wpl, cyc_wpl
-cs_wr_off: t_wph
-wr_cycle: t_cez_w, t_ce_rdyz
-
-
-Note: Many of gpmc timings are dependent on other gpmc timings (a few
-gpmc timings purely dependent on other gpmc timings, a reason that
-some of the gpmc timings are missing above), and it will result in
-indirect dependency of peripheral timings to gpmc timings other than
-mentioned above, refer timing routine for more details. To know what
-these peripheral timings correspond to, please see explanations in
-struct gpmc_device_timings definition. And for gpmc timings refer
-IP details (link above).
-- 
cgit v1.2.3-55-g7522


From a278295ccc2ddd1dc0ac8423a12ff6dd74f0d502 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 15 Apr 2019 19:25:27 -0300
Subject: docs: nvmem: convert docs to ReST and rename to *.rst

In order to be able to add it into a doc book, we first need
to convert it to ReST.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - mark literal blocks;
  - adjust title markups.

While this is not part of any book, mark it as :orphan:, in order
to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/nvmem/nvmem.rst | 189 ++++++++++++++++++++++++++++++++++++++++++
 Documentation/nvmem/nvmem.txt | 183 ----------------------------------------
 2 files changed, 189 insertions(+), 183 deletions(-)
 create mode 100644 Documentation/nvmem/nvmem.rst
 delete mode 100644 Documentation/nvmem/nvmem.txt

diff --git a/Documentation/nvmem/nvmem.rst b/Documentation/nvmem/nvmem.rst
new file mode 100644
index 000000000000..3866b6e066d5
--- /dev/null
+++ b/Documentation/nvmem/nvmem.rst
@@ -0,0 +1,189 @@
+:orphan:
+
+===============
+NVMEM Subsystem
+===============
+
+ Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
+
+This document explains the NVMEM Framework along with the APIs provided,
+and how to use it.
+
+1. Introduction
+===============
+*NVMEM* is the abbreviation for Non Volatile Memory layer. It is used to
+retrieve configuration of SOC or Device specific data from non volatile
+memories like eeprom, efuses and so on.
+
+Before this framework existed, NVMEM drivers like eeprom were stored in
+drivers/misc, where they all had to duplicate pretty much the same code to
+register a sysfs file, allow in-kernel users to access the content of the
+devices they were driving, etc.
+
+This was also a problem as far as other in-kernel users were involved: since
+the solutions used were pretty much different from one driver to another, there
+was a rather big abstraction leak.
+
+This framework aims to solve these problems. It also introduces DT
+representation for consumer devices to go get the data they require (MAC
+Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs. This
+framework is based on regmap, so that most of the abstraction available in
+regmap can be reused, across multiple types of buses.
+
+NVMEM Providers
++++++++++++++++
+
+NVMEM provider refers to an entity that implements methods to initialize, read
+and write the non-volatile memory.
+
+2. Registering/Unregistering the NVMEM provider
+===============================================
+
+An NVMEM provider can register with the NVMEM core by supplying the relevant
+nvmem configuration to nvmem_register(). On success, the core returns a valid
+nvmem_device pointer.
+
+nvmem_unregister(nvmem) is used to unregister a previously registered provider.
+
+For example, a simple qfprom case::
+
+  static struct nvmem_config econfig = {
+	.name = "qfprom",
+	.owner = THIS_MODULE,
+  };
+
+  static int qfprom_probe(struct platform_device *pdev)
+  {
+	...
+	econfig.dev = &pdev->dev;
+	nvmem = nvmem_register(&econfig);
+	...
+  }
+
+It is mandatory that the NVMEM provider has a regmap associated with its
+struct device. Failure to do so makes nvmem_register() return an error code.
+
+Users of board files can define and register nvmem cells using the
+nvmem_cell_table struct::
+
+  static struct nvmem_cell_info foo_nvmem_cells[] = {
+	{
+		.name		= "macaddr",
+		.offset		= 0x7f00,
+		.bytes		= ETH_ALEN,
+	}
+  };
+
+  static struct nvmem_cell_table foo_nvmem_cell_table = {
+	.nvmem_name		= "i2c-eeprom",
+	.cells			= foo_nvmem_cells,
+	.ncells			= ARRAY_SIZE(foo_nvmem_cells),
+  };
+
+  nvmem_add_cell_table(&foo_nvmem_cell_table);
+
+Additionally it is possible to create nvmem cell lookup entries and register
+them with the nvmem framework from machine code as shown in the example below::
+
+  static struct nvmem_cell_lookup foo_nvmem_lookup = {
+	.nvmem_name		= "i2c-eeprom",
+	.cell_name		= "macaddr",
+	.dev_id			= "foo_mac.0",
+	.con_id			= "mac-address",
+  };
+
+  nvmem_add_cell_lookups(&foo_nvmem_lookup, 1);
+
+NVMEM Consumers
++++++++++++++++
+
+NVMEM consumers are the entities which make use of the NVMEM provider to
+read from and write to the NVMEM.
+
+3. NVMEM cell based consumer APIs
+=================================
+
+NVMEM cells are the data entries/fields in the NVMEM.
+The NVMEM framework provides 3 APIs to read/write NVMEM cells::
+
+  struct nvmem_cell *nvmem_cell_get(struct device *dev, const char *name);
+  struct nvmem_cell *devm_nvmem_cell_get(struct device *dev, const char *name);
+
+  void nvmem_cell_put(struct nvmem_cell *cell);
+  void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
+
+  void *nvmem_cell_read(struct nvmem_cell *cell, ssize_t *len);
+  int nvmem_cell_write(struct nvmem_cell *cell, void *buf, ssize_t len);
+
+`*nvmem_cell_get()` APIs get a reference to the nvmem cell for a given id,
+and nvmem_cell_read/write() can then read or write to the cell.
+Once the consumer has finished using the cell, it should call
+`*nvmem_cell_put()` to free all the memory allocated for the cell.
+
+4. Direct NVMEM device based consumer APIs
+==========================================
+
+In some instances it is necessary to directly read/write the NVMEM.
+To facilitate such consumers, the NVMEM framework provides the APIs below::
+
+  struct nvmem_device *nvmem_device_get(struct device *dev, const char *name);
+  struct nvmem_device *devm_nvmem_device_get(struct device *dev,
+					   const char *name);
+  void nvmem_device_put(struct nvmem_device *nvmem);
+  int nvmem_device_read(struct nvmem_device *nvmem, unsigned int offset,
+		      size_t bytes, void *buf);
+  int nvmem_device_write(struct nvmem_device *nvmem, unsigned int offset,
+		       size_t bytes, void *buf);
+  int nvmem_device_cell_read(struct nvmem_device *nvmem,
+			   struct nvmem_cell_info *info, void *buf);
+  int nvmem_device_cell_write(struct nvmem_device *nvmem,
+			    struct nvmem_cell_info *info, void *buf);
+
+Before a consumer can read/write the NVMEM directly, it should get hold
+of an nvmem_device from one of the `*nvmem_device_get()` APIs.
+
+The difference between these APIs and the cell based APIs is that these APIs
+always take an nvmem_device as a parameter.
+
+5. Releasing a reference to the NVMEM
+=====================================
+
+When a consumer no longer needs the NVMEM, it has to release the reference
+to the NVMEM it has obtained using the APIs mentioned in the above section.
+The NVMEM framework provides 2 APIs to release a reference to the NVMEM::
+
+  void nvmem_cell_put(struct nvmem_cell *cell);
+  void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
+  void nvmem_device_put(struct nvmem_device *nvmem);
+  void devm_nvmem_device_put(struct device *dev, struct nvmem_device *nvmem);
+
+Both these APIs are used to release a reference to the NVMEM, and
+devm_nvmem_cell_put and devm_nvmem_device_put destroy the devres associated
+with this NVMEM.
+
+Userspace
++++++++++
+
+6. Userspace binary interface
+==============================
+
+Userspace can read/write the raw NVMEM file located at::
+
+	/sys/bus/nvmem/devices/*/nvmem
+
+Example::
+
+  hexdump /sys/bus/nvmem/devices/qfprom0/nvmem
+
+  0000000 0000 0000 0000 0000 0000 0000 0000 0000
+  *
+  00000a0 db10 2240 0000 e000 0c00 0c00 0000 0c00
+  0000000 0000 0000 0000 0000 0000 0000 0000 0000
+  ...
+  *
+  0001000
+
+7. DeviceTree Binding
+=====================
+
+See Documentation/devicetree/bindings/nvmem/nvmem.txt
diff --git a/Documentation/nvmem/nvmem.txt b/Documentation/nvmem/nvmem.txt
deleted file mode 100644
index fc2fe4b18655..000000000000
--- a/Documentation/nvmem/nvmem.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-			    NVMEM SUBSYSTEM
-	  Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
-
-This document explains the NVMEM Framework along with the APIs provided,
-and how to use it.
-
-1. Introduction
-===============
-*NVMEM* is the abbreviation for Non Volatile Memory layer. It is used to
-retrieve configuration of SOC or Device specific data from non volatile
-memories like eeprom, efuses and so on.
-
-Before this framework existed, NVMEM drivers like eeprom were stored in
-drivers/misc, where they all had to duplicate pretty much the same code to
-register a sysfs file, allow in-kernel users to access the content of the
-devices they were driving, etc.
-
-This was also a problem as far as other in-kernel users were involved, since
-the solutions used were pretty much different from one driver to another, there
-was a rather big abstraction leak.
-
-This framework aims at solve these problems. It also introduces DT
-representation for consumer devices to go get the data they require (MAC
-Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs. This
-framework is based on regmap, so that most of the abstraction available in
-regmap can be reused, across multiple types of buses.
-
-NVMEM Providers
-+++++++++++++++
-
-NVMEM provider refers to an entity that implements methods to initialize, read
-and write the non-volatile memory.
-
-2. Registering/Unregistering the NVMEM provider
-===============================================
-
-A NVMEM provider can register with NVMEM core by supplying relevant
-nvmem configuration to nvmem_register(), on success core would return a valid
-nvmem_device pointer.
-
-nvmem_unregister(nvmem) is used to unregister a previously registered provider.
-
-For example, a simple qfprom case:
-
-static struct nvmem_config econfig = {
-	.name = "qfprom",
-	.owner = THIS_MODULE,
-};
-
-static int qfprom_probe(struct platform_device *pdev)
-{
-	...
-	econfig.dev = &pdev->dev;
-	nvmem = nvmem_register(&econfig);
-	...
-}
-
-It is mandatory that the NVMEM provider has a regmap associated with its
-struct device. Failure to do would return error code from nvmem_register().
-
-Users of board files can define and register nvmem cells using the
-nvmem_cell_table struct:
-
-static struct nvmem_cell_info foo_nvmem_cells[] = {
-	{
-		.name		= "macaddr",
-		.offset		= 0x7f00,
-		.bytes		= ETH_ALEN,
-	}
-};
-
-static struct nvmem_cell_table foo_nvmem_cell_table = {
-	.nvmem_name		= "i2c-eeprom",
-	.cells			= foo_nvmem_cells,
-	.ncells			= ARRAY_SIZE(foo_nvmem_cells),
-};
-
-nvmem_add_cell_table(&foo_nvmem_cell_table);
-
-Additionally it is possible to create nvmem cell lookup entries and register
-them with the nvmem framework from machine code as shown in the example below:
-
-static struct nvmem_cell_lookup foo_nvmem_lookup = {
-	.nvmem_name		= "i2c-eeprom",
-	.cell_name		= "macaddr",
-	.dev_id			= "foo_mac.0",
-	.con_id			= "mac-address",
-};
-
-nvmem_add_cell_lookups(&foo_nvmem_lookup, 1);
-
-NVMEM Consumers
-+++++++++++++++
-
-NVMEM consumers are the entities which make use of the NVMEM provider to
-read from and to NVMEM.
-
-3. NVMEM cell based consumer APIs
-=================================
-
-NVMEM cells are the data entries/fields in the NVMEM.
-The NVMEM framework provides 3 APIs to read/write NVMEM cells.
-
-struct nvmem_cell *nvmem_cell_get(struct device *dev, const char *name);
-struct nvmem_cell *devm_nvmem_cell_get(struct device *dev, const char *name);
-
-void nvmem_cell_put(struct nvmem_cell *cell);
-void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
-
-void *nvmem_cell_read(struct nvmem_cell *cell, ssize_t *len);
-int nvmem_cell_write(struct nvmem_cell *cell, void *buf, ssize_t len);
-
-*nvmem_cell_get() apis will get a reference to nvmem cell for a given id,
-and nvmem_cell_read/write() can then read or write to the cell.
-Once the usage of the cell is finished the consumer should call *nvmem_cell_put()
-to free all the allocation memory for the cell.
-
-4. Direct NVMEM device based consumer APIs
-==========================================
-
-In some instances it is necessary to directly read/write the NVMEM.
-To facilitate such consumers NVMEM framework provides below apis.
-
-struct nvmem_device *nvmem_device_get(struct device *dev, const char *name);
-struct nvmem_device *devm_nvmem_device_get(struct device *dev,
-					   const char *name);
-void nvmem_device_put(struct nvmem_device *nvmem);
-int nvmem_device_read(struct nvmem_device *nvmem, unsigned int offset,
-		      size_t bytes, void *buf);
-int nvmem_device_write(struct nvmem_device *nvmem, unsigned int offset,
-		       size_t bytes, void *buf);
-int nvmem_device_cell_read(struct nvmem_device *nvmem,
-			   struct nvmem_cell_info *info, void *buf);
-int nvmem_device_cell_write(struct nvmem_device *nvmem,
-			    struct nvmem_cell_info *info, void *buf);
-
-Before the consumers can read/write NVMEM directly, it should get hold
-of nvmem_controller from one of the *nvmem_device_get() api.
-
-The difference between these apis and cell based apis is that these apis always
-take nvmem_device as parameter.
-
-5. Releasing a reference to the NVMEM
-=====================================
-
-When a consumer no longer needs the NVMEM, it has to release the reference
-to the NVMEM it has obtained using the APIs mentioned in the above section.
-The NVMEM framework provides 2 APIs to release a reference to the NVMEM.
-
-void nvmem_cell_put(struct nvmem_cell *cell);
-void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
-void nvmem_device_put(struct nvmem_device *nvmem);
-void devm_nvmem_device_put(struct device *dev, struct nvmem_device *nvmem);
-
-Both these APIs are used to release a reference to the NVMEM and
-devm_nvmem_cell_put and devm_nvmem_device_put destroys the devres associated
-with this NVMEM.
-
-Userspace
-+++++++++
-
-6. Userspace binary interface
-==============================
-
-Userspace can read/write the raw NVMEM file located at
-/sys/bus/nvmem/devices/*/nvmem
-
-ex:
-
-hexdump /sys/bus/nvmem/devices/qfprom0/nvmem
-
-0000000 0000 0000 0000 0000 0000 0000 0000 0000
-*
-00000a0 db10 2240 0000 e000 0c00 0c00 0000 0c00
-0000000 0000 0000 0000 0000 0000 0000 0000 0000
-...
-*
-0001000
-
-7. DeviceTree Binding
-=====================
-
-See Documentation/devicetree/bindings/nvmem/nvmem.txt
-- 
cgit v1.2.3-55-g7522


From 1945a035540e2cef0362a2e7e828f8cf547e86b8 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 15 Apr 2019 19:27:55 -0300
Subject: docs: phy: convert samsung-usb2.txt to ReST format

In order to merge it into a Sphinx book, we first need to
convert it to ReST.

While this is not part of any book, mark it as :orphan:, in order
to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/phy/samsung-usb2.rst | 139 +++++++++++++++++++++++++++++++++++++
 Documentation/phy/samsung-usb2.txt | 135 -----------------------------------
 MAINTAINERS                        |   2 +-
 3 files changed, 140 insertions(+), 136 deletions(-)
 create mode 100644 Documentation/phy/samsung-usb2.rst
 delete mode 100644 Documentation/phy/samsung-usb2.txt

diff --git a/Documentation/phy/samsung-usb2.rst b/Documentation/phy/samsung-usb2.rst
new file mode 100644
index 000000000000..98b5952fcb97
--- /dev/null
+++ b/Documentation/phy/samsung-usb2.rst
@@ -0,0 +1,139 @@
+:orphan:
+
+====================================
+Samsung USB 2.0 PHY adaptation layer
+====================================
+
+1. Description
+--------------
+
+The architecture of the USB 2.0 PHY module in Samsung SoCs is similar
+among many SoCs. In spite of the similarities it proved difficult to
+create one driver that would fit all these PHY controllers. Often
+the differences were minor and were found in particular bits of the
+registers of the PHY. In some rare cases the order of register writes or
+the PHY powering up process had to be altered. This adaptation layer is
+a compromise between having separate drivers and having a single driver
+with added support for many special cases.
+
+2. Files description
+--------------------
+
+- phy-samsung-usb2.c
+   This is the main file of the adaptation layer. This file contains
+   the probe function and provides two callbacks to the Generic PHY
+   Framework. These two callbacks are used to power on and power off the
+   phy. They carry out the common work that has to be done on all versions
+   of the PHY module. Depending on which SoC was chosen they execute SoC
+   specific callbacks. The specific SoC version is selected by choosing
+   the appropriate compatible string. In addition, this file contains
+   struct of_device_id definitions for particular SoCs.
+
+- phy-samsung-usb2.h
+   This is the include file. It declares the structures used by this
+   driver. In addition it should contain extern declarations for
+   structures that describe particular SoCs.
+
+3. Supporting SoCs
+------------------
+
+To support a new SoC, a new file should be added to the drivers/phy/samsung
+directory. Each SoC's configuration is stored in an instance of the
+struct samsung_usb2_phy_config::
+
+  struct samsung_usb2_phy_config {
+	const struct samsung_usb2_common_phy *phys;
+	int (*rate_to_clk)(unsigned long, u32 *);
+	unsigned int num_phys;
+	bool has_mode_switch;
+  };
+
+The num_phys is the number of phys handled by the driver. `*phys` is an
+array that contains the configuration for each phy. The has_mode_switch
+property is a boolean flag that determines whether the SoC has USB host
+and device on a single pair of pins. If so, a special register has to
+be modified to change the internal routing of these pins between the USB
+device and host modules.
+
+For example, the configuration for Exynos 4210 is as follows::
+
+  const struct samsung_usb2_phy_config exynos4210_usb2_phy_config = {
+	.has_mode_switch        = 0,
+	.num_phys		= EXYNOS4210_NUM_PHYS,
+	.phys			= exynos4210_phys,
+	.rate_to_clk		= exynos4210_rate_to_clk,
+  };
+
+- `int (*rate_to_clk)(unsigned long, u32 *)`
+
+	The rate_to_clk callback is to convert the rate of the clock
+	used as the reference clock for the PHY module to the value
+	that should be written in the hardware register.
+
+The exynos4210_phys configuration array is as follows::
+
+  static const struct samsung_usb2_common_phy exynos4210_phys[] = {
+	{
+		.label		= "device",
+		.id		= EXYNOS4210_DEVICE,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{
+		.label		= "host",
+		.id		= EXYNOS4210_HOST,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{
+		.label		= "hsic0",
+		.id		= EXYNOS4210_HSIC0,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{
+		.label		= "hsic1",
+		.id		= EXYNOS4210_HSIC1,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{},
+  };
+
+- `int (*power_on)(struct samsung_usb2_phy_instance *);`
+  `int (*power_off)(struct samsung_usb2_phy_instance *);`
+
+	These two callbacks are used to power on and power off the phy
+	by modifying appropriate registers.
+
+The final change to the driver is adding the appropriate compatible value to
+the phy-samsung-usb2.c file. In the case of Exynos 4210, the following lines
+were added to the struct of_device_id samsung_usb2_phy_of_match[] array::
+
+  #ifdef CONFIG_PHY_EXYNOS4210_USB2
+	{
+		.compatible = "samsung,exynos4210-usb2-phy",
+		.data = &exynos4210_usb2_phy_config,
+	},
+  #endif
+
+To add further flexibility to the driver, the Kconfig file makes it possible
+to include support for selected SoCs in the compiled driver. The Kconfig
+entry for Exynos 4210 is as follows::
+
+  config PHY_EXYNOS4210_USB2
+	bool "Support for Exynos 4210"
+	depends on PHY_SAMSUNG_USB2
+	depends on CPU_EXYNOS4210
+	help
+	  Enable USB PHY support for Exynos 4210. This option requires that
+	  Samsung USB 2.0 PHY driver is enabled and means that support for this
+	  particular SoC is compiled in the driver. In case of Exynos 4210 four
+	  phys are available - device, host, HSIC0 and HSIC1.
+
+The newly created file that supports the new SoC also has to be added to the
+Makefile. In the case of Exynos 4210, the added line is as follows::
+
+  obj-$(CONFIG_PHY_EXYNOS4210_USB2)       += phy-exynos4210-usb2.o
+
+After completing these steps the support for the new SoC should be ready.
diff --git a/Documentation/phy/samsung-usb2.txt b/Documentation/phy/samsung-usb2.txt
deleted file mode 100644
index ed12d437189d..000000000000
--- a/Documentation/phy/samsung-usb2.txt
+++ /dev/null
@@ -1,135 +0,0 @@
-.------------------------------------------------------------------------------+
-|			Samsung USB 2.0 PHY adaptation layer		       |
-+-----------------------------------------------------------------------------+'
-
-| 1. Description
-+----------------
-
-The architecture of the USB 2.0 PHY module in Samsung SoCs is similar
-among many SoCs. In spite of the similarities it proved difficult to
-create a one driver that would fit all these PHY controllers. Often
-the differences were minor and were found in particular bits of the
-registers of the PHY. In some rare cases the order of register writes or
-the PHY powering up process had to be altered. This adaptation layer is
-a compromise between having separate drivers and having a single driver
-with added support for many special cases.
-
-| 2. Files description
-+----------------------
-
-- phy-samsung-usb2.c
-   This is the main file of the adaptation layer. This file contains
-   the probe function and provides two callbacks to the Generic PHY
-   Framework. This two callbacks are used to power on and power off the
-   phy. They carry out the common work that has to be done on all version
-   of the PHY module. Depending on which SoC was chosen they execute SoC
-   specific callbacks. The specific SoC version is selected by choosing
-   the appropriate compatible string. In addition, this file contains
-   struct of_device_id definitions for particular SoCs.
-
-- phy-samsung-usb2.h
-   This is the include file. It declares the structures used by this
-   driver. In addition it should contain extern declarations for
-   structures that describe particular SoCs.
-
-| 3. Supporting SoCs
-+--------------------
-
-To support a new SoC a new file should be added to the drivers/phy
-directory. Each SoC's configuration is stored in an instance of the
-struct samsung_usb2_phy_config.
-
-struct samsung_usb2_phy_config {
-	const struct samsung_usb2_common_phy *phys;
-	int (*rate_to_clk)(unsigned long, u32 *);
-	unsigned int num_phys;
-	bool has_mode_switch;
-};
-
-The num_phys is the number of phys handled by the driver. *phys is an
-array that contains the configuration for each phy. The has_mode_switch
-property is a boolean flag that determines whether the SoC has USB host
-and device on a single pair of pins. If so, a special register has to
-be modified to change the internal routing of these pins between a USB
-device or host module.
-
-For example the configuration for Exynos 4210 is following:
-
-const struct samsung_usb2_phy_config exynos4210_usb2_phy_config = {
-	.has_mode_switch        = 0,
-	.num_phys		= EXYNOS4210_NUM_PHYS,
-	.phys			= exynos4210_phys,
-	.rate_to_clk		= exynos4210_rate_to_clk,
-}
-
-- int (*rate_to_clk)(unsigned long, u32 *)
-	The rate_to_clk callback is to convert the rate of the clock
-	used as the reference clock for the PHY module to the value
-	that should be written in the hardware register.
-
-The exynos4210_phys configuration array is as follows:
-
-static const struct samsung_usb2_common_phy exynos4210_phys[] = {
-	{
-		.label		= "device",
-		.id		= EXYNOS4210_DEVICE,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{
-		.label		= "host",
-		.id		= EXYNOS4210_HOST,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{
-		.label		= "hsic0",
-		.id		= EXYNOS4210_HSIC0,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{
-		.label		= "hsic1",
-		.id		= EXYNOS4210_HSIC1,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{},
-};
-
-- int (*power_on)(struct samsung_usb2_phy_instance *);
-- int (*power_off)(struct samsung_usb2_phy_instance *);
-	These two callbacks are used to power on and power off the phy
-	by modifying appropriate registers.
-
-Final change to the driver is adding appropriate compatible value to the
-phy-samsung-usb2.c file. In case of Exynos 4210 the following lines were
-added to the struct of_device_id samsung_usb2_phy_of_match[] array:
-
-#ifdef CONFIG_PHY_EXYNOS4210_USB2
-	{
-		.compatible = "samsung,exynos4210-usb2-phy",
-		.data = &exynos4210_usb2_phy_config,
-	},
-#endif
-
-To add further flexibility to the driver the Kconfig file enables to
-include support for selected SoCs in the compiled driver. The Kconfig
-entry for Exynos 4210 is following:
-
-config PHY_EXYNOS4210_USB2
-	bool "Support for Exynos 4210"
-	depends on PHY_SAMSUNG_USB2
-	depends on CPU_EXYNOS4210
-	help
-	  Enable USB PHY support for Exynos 4210. This option requires that
-	  Samsung USB 2.0 PHY driver is enabled and means that support for this
-	  particular SoC is compiled in the driver. In case of Exynos 4210 four
-	  phys are available - device, host, HSCI0 and HSCI1.
-
-The newly created file that supports the new SoC has to be also added to the
-Makefile. In case of Exynos 4210 the added line is following:
-
-obj-$(CONFIG_PHY_EXYNOS4210_USB2)       += phy-exynos4210-usb2.o
-
-After completing these steps the support for the new SoC should be ready.
diff --git a/MAINTAINERS b/MAINTAINERS
index 96c85695b3d4..2a2d74e5d670 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14083,7 +14083,7 @@ M:	Sylwester Nawrocki <s.nawrocki@samsung.com>
 L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	Documentation/devicetree/bindings/phy/samsung-phy.txt
-F:	Documentation/phy/samsung-usb2.txt
+F:	Documentation/phy/samsung-usb2.rst
 F:	drivers/phy/samsung/phy-exynos4210-usb2.c
 F:	drivers/phy/samsung/phy-exynos4x12-usb2.c
 F:	drivers/phy/samsung/phy-exynos5250-usb2.c
-- 
cgit v1.2.3-55-g7522


From eaf5211d8c00060a3b41a031a762c906d3603098 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 15 Apr 2019 23:00:35 -0300
Subject: docs: rbtree.txt: fix Sphinx build warnings

This file is already in ReST format. Yet, some recent changes
made it produce a few warnings when building it with
Sphinx.

Those are trivially fixed by marking some literal blocks.

Fix them before adding it to the docs building system.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/rbtree.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index c42a21b99046..523d54b60087 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -204,21 +204,21 @@ potentially expensive tree iterations. This is done at negligible runtime
 overhead for maintanence; albeit larger memory footprint.
 
 Similar to the rb_root structure, cached rbtrees are initialized to be
-empty via:
+empty via::
 
   struct rb_root_cached mytree = RB_ROOT_CACHED;
 
 Cached rbtree is simply a regular rb_root with an extra pointer to cache the
 leftmost node. This allows rb_root_cached to exist wherever rb_root does,
 which permits augmented trees to be supported as well as only a few extra
-interfaces:
+interfaces::
 
   struct rb_node *rb_first_cached(struct rb_root_cached *tree);
   void rb_insert_color_cached(struct rb_node *, struct rb_root_cached *, bool);
   void rb_erase_cached(struct rb_node *node, struct rb_root_cached *);
 
 Both insert and erase calls have their respective counterpart of augmented
-trees:
+trees::
 
   void rb_insert_augmented_cached(struct rb_node *node, struct rb_root_cached *,
 				  bool, struct rb_augment_callbacks *);
-- 
cgit v1.2.3-55-g7522


From a36d053863a1b6cd6e79a632af01be014517f9ac Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 27 May 2019 15:59:13 -0300
Subject: docs: DMA-API-HOWTO.txt: fix an unmarked code block

When building with Sphinx, it would produce this warning:

    docs/Documentation/DMA-API-HOWTO.rst:222: WARNING: Definition list ends without a blank line; unexpected unindent.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/DMA-API-HOWTO.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index cb712a02f59f..358d495456d1 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -212,7 +212,7 @@ The standard 64-bit addressing device would do something like this::
 
 If the device only supports 32-bit addressing for descriptors in the
 coherent allocations, but supports full 64-bits for streaming mappings
-it would look like this:
+it would look like this::
 
 	if (dma_set_mask(dev, DMA_BIT_MASK(64))) {
 		dev_warn(dev, "mydev: No suitable DMA available\n");
-- 
cgit v1.2.3-55-g7522


From c3123552aad3ffd7a35e16d4402231225165e343 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Wed, 17 Apr 2019 05:46:08 -0300
Subject: docs: accounting: convert to ReST

Rename the accounting documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/accounting/cgroupstats.rst      |  31 ++++
 Documentation/accounting/cgroupstats.txt      |  27 ----
 Documentation/accounting/delay-accounting.rst | 126 ++++++++++++++++
 Documentation/accounting/delay-accounting.txt | 117 ---------------
 Documentation/accounting/index.rst            |  14 ++
 Documentation/accounting/psi.rst              | 182 +++++++++++++++++++++++
 Documentation/accounting/psi.txt              | 180 -----------------------
 Documentation/accounting/taskstats-struct.rst | 199 ++++++++++++++++++++++++++
 Documentation/accounting/taskstats-struct.txt | 180 -----------------------
 Documentation/accounting/taskstats.rst        | 180 +++++++++++++++++++++++
 Documentation/accounting/taskstats.txt        | 181 -----------------------
 Documentation/admin-guide/cgroup-v2.rst       |   6 +-
 init/Kconfig                                  |   2 +-
 13 files changed, 736 insertions(+), 689 deletions(-)
 create mode 100644 Documentation/accounting/cgroupstats.rst
 delete mode 100644 Documentation/accounting/cgroupstats.txt
 create mode 100644 Documentation/accounting/delay-accounting.rst
 delete mode 100644 Documentation/accounting/delay-accounting.txt
 create mode 100644 Documentation/accounting/index.rst
 create mode 100644 Documentation/accounting/psi.rst
 delete mode 100644 Documentation/accounting/psi.txt
 create mode 100644 Documentation/accounting/taskstats-struct.rst
 delete mode 100644 Documentation/accounting/taskstats-struct.txt
 create mode 100644 Documentation/accounting/taskstats.rst
 delete mode 100644 Documentation/accounting/taskstats.txt

diff --git a/Documentation/accounting/cgroupstats.rst b/Documentation/accounting/cgroupstats.rst
new file mode 100644
index 000000000000..b9afc48f4ea2
--- /dev/null
+++ b/Documentation/accounting/cgroupstats.rst
@@ -0,0 +1,31 @@
+==================
+Control Groupstats
+==================
+
+Control Groupstats is inspired by the discussion at
+http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics as
+suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.
+
+Per cgroup statistics infrastructure re-uses code from the taskstats
+interface. A new set of cgroup operations is registered with commands
+and attributes specific to cgroups. It should be very easy to
+extend per cgroup statistics, by adding members to the cgroupstats
+structure.
+
+The current model for cgroupstats is a pull model; a push model (to post
+statistics on interesting events) should be very easy to add. Currently
+user space requests statistics by passing the cgroup path.
+Statistics about the state of all the tasks in the cgroup are returned to
+user space.
+
+NOTE: We currently rely on delay accounting for extracting information
+about tasks blocked on I/O. If CONFIG_TASK_DELAY_ACCT is disabled, this
+information will not be available.
+
+To extract cgroup statistics, a utility very similar to getdelays.c
+has been developed; sample output of the utility is shown below::
+
+  ~/balbir/cgroupstats # ./getdelays  -C "/sys/fs/cgroup/a"
+  sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0
+  ~/balbir/cgroupstats # ./getdelays  -C "/sys/fs/cgroup"
+  sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2
diff --git a/Documentation/accounting/cgroupstats.txt b/Documentation/accounting/cgroupstats.txt
deleted file mode 100644
index d16a9849e60e..000000000000
--- a/Documentation/accounting/cgroupstats.txt
+++ /dev/null
@@ -1,27 +0,0 @@
-Control Groupstats is inspired by the discussion at
-http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics as
-suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263.
-
-Per cgroup statistics infrastructure re-uses code from the taskstats
-interface. A new set of cgroup operations are registered with commands
-and attributes specific to cgroups. It should be very easy to
-extend per cgroup statistics, by adding members to the cgroupstats
-structure.
-
-The current model for cgroupstats is a pull, a push model (to post
-statistics on interesting events), should be very easy to add. Currently
-user space requests for statistics by passing the cgroup path.
-Statistics about the state of all the tasks in the cgroup is returned to
-user space.
-
-NOTE: We currently rely on delay accounting for extracting information
-about tasks blocked on I/O. If CONFIG_TASK_DELAY_ACCT is disabled, this
-information will not be available.
-
-To extract cgroup statistics a utility very similar to getdelays.c
-has been developed, the sample output of the utility is shown below
-
-~/balbir/cgroupstats # ./getdelays  -C "/sys/fs/cgroup/a"
-sleeping 1, blocked 0, running 1, stopped 0, uninterruptible 0
-~/balbir/cgroupstats # ./getdelays  -C "/sys/fs/cgroup"
-sleeping 155, blocked 0, running 1, stopped 0, uninterruptible 2
diff --git a/Documentation/accounting/delay-accounting.rst b/Documentation/accounting/delay-accounting.rst
new file mode 100644
index 000000000000..7cc7f5852da0
--- /dev/null
+++ b/Documentation/accounting/delay-accounting.rst
@@ -0,0 +1,126 @@
+================
+Delay accounting
+================
+
+Tasks encounter delays in execution when they wait
+for some kernel resource to become available e.g. a
+runnable task may wait for a free CPU to run on.
+
+The per-task delay accounting functionality measures
+the delays experienced by a task while
+
+a) waiting for a CPU (while being runnable)
+b) completion of synchronous block I/O initiated by the task
+c) swapping in pages
+d) memory reclaim
+
+and makes these statistics available to userspace through
+the taskstats interface.
+
+Such delays provide feedback for setting a task's cpu priority,
+io priority and rss limit values appropriately. Long delays for
+important tasks could be a trigger for raising their corresponding priority.
+
+The functionality, through its use of the taskstats interface, also provides
+delay statistics aggregated for all tasks (or threads) belonging to a
+thread group (corresponding to a traditional Unix process). This is a commonly
+needed aggregation that is more efficiently done by the kernel.
+
+Userspace utilities, particularly resource management applications, can also
+aggregate delay statistics into arbitrary groups. To enable this, delay
+statistics of a task are available both during its lifetime as well as on its
+exit, ensuring continuous and complete monitoring can be done.
+
+
+Interface
+---------
+
+Delay accounting uses the taskstats interface which is described
+in detail in a separate document in this directory. Taskstats returns a
+generic data structure to userspace corresponding to per-pid and per-tgid
+statistics. The delay accounting functionality populates specific fields of
+this structure. See
+
+     include/linux/taskstats.h
+
+for a description of the fields pertaining to delay accounting.
+It will generally be in the form of counters returning the cumulative
+delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
+
+Taking the difference of two successive readings of a given
+counter (say cpu_delay_total) for a task will give the delay
+experienced by the task waiting for the corresponding resource
+in that interval.
+
+When a task exits, records containing the per-task statistics
+are sent to userspace without requiring a command. If it is the last exiting
+task of a thread group, the per-tgid statistics are also sent. More details
+are given in the taskstats interface description.
+
+The getdelays.c userspace utility in the tools/accounting directory allows
+simple commands to be run and the corresponding delay statistics to be
+displayed. It also serves as an example of using the taskstats interface.
+
+Usage
+-----
+
+Compile the kernel with::
+
+	CONFIG_TASK_DELAY_ACCT=y
+	CONFIG_TASKSTATS=y
+
+Delay accounting is enabled by default at boot up.
+To disable, add::
+
+   nodelayacct
+
+to the kernel boot options. The rest of the instructions
+below assume this has not been done.
+
+After the system has booted up, use a utility
+similar to getdelays.c to access the delays
+seen by a given task or a task group (tgid).
+The utility also allows a given command to be
+executed and the corresponding delays to be
+seen.
+
+General format of the getdelays command::
+
+	getdelays [-t tgid] [-p pid] [-c cmd...]
+
+
+Get delays, since system boot, for pid 10::
+
+	# ./getdelays -p 10
+	(output similar to next case)
+
+Get sum of delays, since system boot, for all pids with tgid 5::
+
+	# ./getdelays -t 5
+
+
+	CPU	count	real total	virtual total	delay total
+		7876	92005750	100000000	24001500
+	IO	count	delay total
+		0	0
+	SWAP	count	delay total
+		0	0
+	RECLAIM	count	delay total
+		0	0
+
+Get delays seen in executing a given simple command::
+
+  # ./getdelays -c ls /
+
+  bin   data1  data3  data5  dev  home  media  opt   root  srv        sys  usr
+  boot  data2  data4  data6  etc  lib   mnt    proc  sbin  subdomain  tmp  var
+
+
+  CPU	count	real total	virtual total	delay total
+	6	4000250		4000000		0
+  IO	count	delay total
+	0	0
+  SWAP	count	delay total
+	0	0
+  RECLAIM	count	delay total
+	0	0
diff --git a/Documentation/accounting/delay-accounting.txt b/Documentation/accounting/delay-accounting.txt
deleted file mode 100644
index 042ea59b5853..000000000000
--- a/Documentation/accounting/delay-accounting.txt
+++ /dev/null
@@ -1,117 +0,0 @@
-Delay accounting
-----------------
-
-Tasks encounter delays in execution when they wait
-for some kernel resource to become available e.g. a
-runnable task may wait for a free CPU to run on.
-
-The per-task delay accounting functionality measures
-the delays experienced by a task while
-
-a) waiting for a CPU (while being runnable)
-b) completion of synchronous block I/O initiated by the task
-c) swapping in pages
-d) memory reclaim
-
-and makes these statistics available to userspace through
-the taskstats interface.
-
-Such delays provide feedback for setting a task's cpu priority,
-io priority and rss limit values appropriately. Long delays for
-important tasks could be a trigger for raising its corresponding priority.
-
-The functionality, through its use of the taskstats interface, also provides
-delay statistics aggregated for all tasks (or threads) belonging to a
-thread group (corresponding to a traditional Unix process). This is a commonly
-needed aggregation that is more efficiently done by the kernel.
-
-Userspace utilities, particularly resource management applications, can also
-aggregate delay statistics into arbitrary groups. To enable this, delay
-statistics of a task are available both during its lifetime as well as on its
-exit, ensuring continuous and complete monitoring can be done.
-
-
-Interface
----------
-
-Delay accounting uses the taskstats interface which is described
-in detail in a separate document in this directory. Taskstats returns a
-generic data structure to userspace corresponding to per-pid and per-tgid
-statistics. The delay accounting functionality populates specific fields of
-this structure. See
-     include/linux/taskstats.h
-for a description of the fields pertaining to delay accounting.
-It will generally be in the form of counters returning the cumulative
-delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
-
-Taking the difference of two successive readings of a given
-counter (say cpu_delay_total) for a task will give the delay
-experienced by the task waiting for the corresponding resource
-in that interval.
-
-When a task exits, records containing the per-task statistics
-are sent to userspace without requiring a command. If it is the last exiting
-task of a thread group, the per-tgid statistics are also sent. More details
-are given in the taskstats interface description.
-
-The getdelays.c userspace utility in tools/accounting directory allows simple
-commands to be run and the corresponding delay statistics to be displayed. It
-also serves as an example of using the taskstats interface.
-
-Usage
------
-
-Compile the kernel with
-	CONFIG_TASK_DELAY_ACCT=y
-	CONFIG_TASKSTATS=y
-
-Delay accounting is enabled by default at boot up.
-To disable, add
-   nodelayacct
-to the kernel boot options. The rest of the instructions
-below assume this has not been done.
-
-After the system has booted up, use a utility
-similar to  getdelays.c to access the delays
-seen by a given task or a task group (tgid).
-The utility also allows a given command to be
-executed and the corresponding delays to be
-seen.
-
-General format of the getdelays command
-
-getdelays [-t tgid] [-p pid] [-c cmd...]
-
-
-Get delays, since system boot, for pid 10
-# ./getdelays -p 10
-(output similar to next case)
-
-Get sum of delays, since system boot, for all pids with tgid 5
-# ./getdelays -t 5
-
-
-CPU	count	real total	virtual total	delay total
-	7876	92005750	100000000	24001500
-IO	count	delay total
-	0	0
-SWAP	count	delay total
-	0	0
-RECLAIM	count	delay total
-	0	0
-
-Get delays seen in executing a given simple command
-# ./getdelays -c ls /
-
-bin   data1  data3  data5  dev  home  media  opt   root  srv        sys  usr
-boot  data2  data4  data6  etc  lib   mnt    proc  sbin  subdomain  tmp  var
-
-
-CPU	count	real total	virtual total	delay total
-	6	4000250		4000000		0
-IO	count	delay total
-	0	0
-SWAP	count	delay total
-	0	0
-RECLAIM	count	delay total
-	0	0
diff --git a/Documentation/accounting/index.rst b/Documentation/accounting/index.rst
new file mode 100644
index 000000000000..e1f6284b5ff3
--- /dev/null
+++ b/Documentation/accounting/index.rst
@@ -0,0 +1,14 @@
+:orphan:
+
+==========
+Accounting
+==========
+
+.. toctree::
+   :maxdepth: 1
+
+   cgroupstats
+   delay-accounting
+   psi
+   taskstats
+   taskstats-struct
diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst
new file mode 100644
index 000000000000..621111ce5740
--- /dev/null
+++ b/Documentation/accounting/psi.rst
@@ -0,0 +1,182 @@
+================================
+PSI - Pressure Stall Information
+================================
+
+:Date: April, 2018
+:Author: Johannes Weiner <hannes@cmpxchg.org>
+
+When CPU, memory or IO devices are contended, workloads experience
+latency spikes, throughput losses, and run the risk of OOM kills.
+
+Without an accurate measure of such contention, users are forced to
+either play it safe and under-utilize their hardware resources, or
+roll the dice and frequently suffer the disruptions resulting from
+excessive overcommit.
+
+The psi feature identifies and quantifies the disruptions caused by
+such resource crunches and the time impact they have on complex workloads
+or even entire systems.
+
+Having an accurate measure of productivity losses caused by resource
+scarcity aids users in sizing workloads to hardware--or provisioning
+hardware according to workload demand.
+
+As psi aggregates this information in realtime, systems can be managed
+dynamically using techniques such as load shedding, migrating jobs to
+other systems or data centers, or strategically pausing or killing low
+priority or restartable batch jobs.
+
+This allows maximizing hardware utilization without sacrificing
+workload health or risking major disruptions such as OOM kills.
+
+Pressure interface
+==================
+
+Pressure information for each resource is exported through the
+respective file in /proc/pressure/ -- cpu, memory, and io.
+
+The format for CPU is as such::
+
+	some avg10=0.00 avg60=0.00 avg300=0.00 total=0
+
+and for memory and IO::
+
+	some avg10=0.00 avg60=0.00 avg300=0.00 total=0
+	full avg10=0.00 avg60=0.00 avg300=0.00 total=0
+
+The "some" line indicates the share of time in which at least some
+tasks are stalled on a given resource.
+
+The "full" line indicates the share of time in which all non-idle
+tasks are stalled on a given resource simultaneously. In this state
+actual CPU cycles are going to waste, and a workload that spends
+extended time in this state is considered to be thrashing. This has
+severe impact on performance, and it's useful to distinguish this
+situation from a state where some tasks are stalled but the CPU is
+still doing productive work. As such, time spent in this subset of the
+stall state is tracked separately and exported in the "full" averages.
+
+The ratios (in %) are tracked as recent trends over ten, sixty, and
+three hundred second windows, which gives insight into short term events
+as well as medium and long term trends. The total absolute stall time
+(in us) is tracked and exported as well, to allow detection of latency
+spikes which wouldn't necessarily make a dent in the time averages,
+or to average trends over custom time frames.
+
+Monitoring for pressure thresholds
+==================================
+
+Users can register triggers and use poll() to be woken up when resource
+pressure exceeds certain thresholds.
+
+A trigger describes the maximum cumulative stall time over a specific
+time window, e.g. 100ms of total stall time within any 500ms window to
+generate a wakeup event.
+
+To register a trigger, the user has to open the psi interface file under
+/proc/pressure/ representing the resource to be monitored and write the
+desired threshold and time window. The open file descriptor should be
+used to wait for trigger events using select(), poll() or epoll().
+The following format is used::
+
+	<some|full> <stall amount in us> <time window in us>
+
+For example, writing "some 150000 1000000" into /proc/pressure/memory
+would add a 150ms threshold for partial memory stall measured within a
+1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
+would add a 50ms threshold for full io stall measured within a 1sec time window.
+
+Triggers can be set on more than one psi metric, and more than one trigger
+for the same psi metric can be specified. However, each trigger requires a
+separate file descriptor so that it can be polled separately from the others;
+therefore a separate open() syscall should be made for each trigger, even
+when opening the same psi interface file.
+
+Monitors activate only when the system enters the stall state for the
+monitored psi metric and deactivate upon exit from the stall state. While the
+system is in the stall state, psi signal growth is monitored at a rate of 10
+times per tracking window.
+
+The kernel accepts window sizes ranging from 500ms to 10s, so the minimum
+monitoring update interval is 50ms and the maximum is 1s. The minimum is set
+to prevent overly frequent polling. The maximum is chosen as a value high
+enough that monitors are most likely not needed and psi averages can be used
+instead.
+
+When activated, a psi monitor stays active for at least the duration of one
+tracking window to avoid repeated activations/deactivations when the system is
+bouncing in and out of the stall state.
+
+Notifications to userspace are rate-limited to one per tracking window.
+
+The trigger will de-register when the file descriptor used to define the
+trigger is closed.
+
+Userspace monitor usage example
+===============================
+
+::
+
+  #include <errno.h>
+  #include <fcntl.h>
+  #include <stdio.h>
+  #include <poll.h>
+  #include <string.h>
+  #include <unistd.h>
+
+  /*
+   * Monitor memory partial stall with 1s tracking window size
+   * and 150ms threshold.
+   */
+  int main() {
+	const char trig[] = "some 150000 1000000";
+	struct pollfd fds;
+	int n;
+
+	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
+	if (fds.fd < 0) {
+		printf("/proc/pressure/memory open error: %s\n",
+			strerror(errno));
+		return 1;
+	}
+	fds.events = POLLPRI;
+
+	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
+		printf("/proc/pressure/memory write error: %s\n",
+			strerror(errno));
+		return 1;
+	}
+
+	printf("waiting for events...\n");
+	while (1) {
+		n = poll(&fds, 1, -1);
+		if (n < 0) {
+			printf("poll error: %s\n", strerror(errno));
+			return 1;
+		}
+		if (fds.revents & POLLERR) {
+			printf("got POLLERR, event source is gone\n");
+			return 0;
+		}
+		if (fds.revents & POLLPRI) {
+			printf("event triggered!\n");
+		} else {
+			printf("unknown event received: 0x%x\n", fds.revents);
+			return 1;
+		}
+	}
+
+	return 0;
+  }
+
+Cgroup2 interface
+=================
+
+In a system with a CONFIG_CGROUPS=y kernel and the cgroup2 filesystem
+mounted, pressure stall information is also tracked for tasks grouped
+into cgroups. Each subdirectory in the cgroupfs mountpoint contains
+cpu.pressure, memory.pressure, and io.pressure files; the format is
+the same as the /proc/pressure/ files.
+
+Per-cgroup psi monitors can be specified and used the same way as
+system-wide ones.
diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
deleted file mode 100644
index 5cbe5659e3b7..000000000000
--- a/Documentation/accounting/psi.txt
+++ /dev/null
@@ -1,180 +0,0 @@
-================================
-PSI - Pressure Stall Information
-================================
-
-:Date: April, 2018
-:Author: Johannes Weiner <hannes@cmpxchg.org>
-
-When CPU, memory or IO devices are contended, workloads experience
-latency spikes, throughput losses, and run the risk of OOM kills.
-
-Without an accurate measure of such contention, users are forced to
-either play it safe and under-utilize their hardware resources, or
-roll the dice and frequently suffer the disruptions resulting from
-excessive overcommit.
-
-The psi feature identifies and quantifies the disruptions caused by
-such resource crunches and the time impact it has on complex workloads
-or even entire systems.
-
-Having an accurate measure of productivity losses caused by resource
-scarcity aids users in sizing workloads to hardware--or provisioning
-hardware according to workload demand.
-
-As psi aggregates this information in realtime, systems can be managed
-dynamically using techniques such as load shedding, migrating jobs to
-other systems or data centers, or strategically pausing or killing low
-priority or restartable batch jobs.
-
-This allows maximizing hardware utilization without sacrificing
-workload health or risking major disruptions such as OOM kills.
-
-Pressure interface
-==================
-
-Pressure information for each resource is exported through the
-respective file in /proc/pressure/ -- cpu, memory, and io.
-
-The format for CPU is as such:
-
-some avg10=0.00 avg60=0.00 avg300=0.00 total=0
-
-and for memory and IO:
-
-some avg10=0.00 avg60=0.00 avg300=0.00 total=0
-full avg10=0.00 avg60=0.00 avg300=0.00 total=0
-
-The "some" line indicates the share of time in which at least some
-tasks are stalled on a given resource.
-
-The "full" line indicates the share of time in which all non-idle
-tasks are stalled on a given resource simultaneously. In this state
-actual CPU cycles are going to waste, and a workload that spends
-extended time in this state is considered to be thrashing. This has
-severe impact on performance, and it's useful to distinguish this
-situation from a state where some tasks are stalled but the CPU is
-still doing productive work. As such, time spent in this subset of the
-stall state is tracked separately and exported in the "full" averages.
-
-The ratios (in %) are tracked as recent trends over ten, sixty, and
-three hundred second windows, which gives insight into short term events
-as well as medium and long term trends. The total absolute stall time
-(in us) is tracked and exported as well, to allow detection of latency
-spikes which wouldn't necessarily make a dent in the time averages,
-or to average trends over custom time frames.
-
-Monitoring for pressure thresholds
-==================================
-
-Users can register triggers and use poll() to be woken up when resource
-pressure exceeds certain thresholds.
-
-A trigger describes the maximum cumulative stall time over a specific
-time window, e.g. 100ms of total stall time within any 500ms window to
-generate a wakeup event.
-
-To register a trigger user has to open psi interface file under
-/proc/pressure/ representing the resource to be monitored and write the
-desired threshold and time window. The open file descriptor should be
-used to wait for trigger events using select(), poll() or epoll().
-The following format is used:
-
-<some|full> <stall amount in us> <time window in us>
-
-For example writing "some 150000 1000000" into /proc/pressure/memory
-would add 150ms threshold for partial memory stall measured within
-1sec time window. Writing "full 50000 1000000" into /proc/pressure/io
-would add 50ms threshold for full io stall measured within 1sec time window.
-
-Triggers can be set on more than one psi metric and more than one trigger
-for the same psi metric can be specified. However for each trigger a separate
-file descriptor is required to be able to poll it separately from others,
-therefore for each trigger a separate open() syscall should be made even
-when opening the same psi interface file.
-
-Monitors activate only when system enters stall state for the monitored
-psi metric and deactivates upon exit from the stall state. While system is
-in the stall state psi signal growth is monitored at a rate of 10 times per
-tracking window.
-
-The kernel accepts window sizes ranging from 500ms to 10s, therefore min
-monitoring update interval is 50ms and max is 1s. Min limit is set to
-prevent overly frequent polling. Max limit is chosen as a high enough number
-after which monitors are most likely not needed and psi averages can be used
-instead.
-
-When activated, psi monitor stays active for at least the duration of one
-tracking window to avoid repeated activations/deactivations when system is
-bouncing in and out of the stall state.
-
-Notifications to the userspace are rate-limited to one per tracking window.
-
-The trigger will de-register when the file descriptor used to define the
-trigger  is closed.
-
-Userspace monitor usage example
-===============================
-
-#include <errno.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <poll.h>
-#include <string.h>
-#include <unistd.h>
-
-/*
- * Monitor memory partial stall with 1s tracking window size
- * and 150ms threshold.
- */
-int main() {
-	const char trig[] = "some 150000 1000000";
-	struct pollfd fds;
-	int n;
-
-	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
-	if (fds.fd < 0) {
-		printf("/proc/pressure/memory open error: %s\n",
-			strerror(errno));
-		return 1;
-	}
-	fds.events = POLLPRI;
-
-	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
-		printf("/proc/pressure/memory write error: %s\n",
-			strerror(errno));
-		return 1;
-	}
-
-	printf("waiting for events...\n");
-	while (1) {
-		n = poll(&fds, 1, -1);
-		if (n < 0) {
-			printf("poll error: %s\n", strerror(errno));
-			return 1;
-		}
-		if (fds.revents & POLLERR) {
-			printf("got POLLERR, event source is gone\n");
-			return 0;
-		}
-		if (fds.revents & POLLPRI) {
-			printf("event triggered!\n");
-		} else {
-			printf("unknown event received: 0x%x\n", fds.revents);
-			return 1;
-		}
-	}
-
-	return 0;
-}
-
-Cgroup2 interface
-=================
-
-In a system with a CONFIG_CGROUP=y kernel and the cgroup2 filesystem
-mounted, pressure stall information is also tracked for tasks grouped
-into cgroups. Each subdirectory in the cgroupfs mountpoint contains
-cpu.pressure, memory.pressure, and io.pressure files; the format is
-the same as the /proc/pressure/ files.
-
-Per-cgroup psi monitors can be specified and used the same way as
-system-wide ones.
diff --git a/Documentation/accounting/taskstats-struct.rst b/Documentation/accounting/taskstats-struct.rst
new file mode 100644
index 000000000000..ca90fd489c9a
--- /dev/null
+++ b/Documentation/accounting/taskstats-struct.rst
@@ -0,0 +1,199 @@
+====================
+The struct taskstats
+====================
+
+This document contains an explanation of the struct taskstats fields.
+
+There are six different groups of fields in the struct taskstats:
+
+1) Common and basic accounting fields
+    If CONFIG_TASKSTATS is set, the taskstats interface is enabled and
+    the common fields and basic accounting fields are collected for
+    delivery at do_exit() of a task.
+2) Delay accounting fields
+    These fields are placed between::
+
+	/* Delay accounting fields start */
+
+    and::
+
+	/* Delay accounting fields end */
+
+    Their values are collected if CONFIG_TASK_DELAY_ACCT is set.
+3) Extended accounting fields
+    These fields are placed between::
+
+	/* Extended accounting fields start */
+
+    and::
+
+	/* Extended accounting fields end */
+
+    Their values are collected if CONFIG_TASK_XACCT is set.
+
+4) Per-task and per-thread context switch count statistics
+
+5) Time accounting for SMT machines
+
+6) Extended delay accounting fields for memory reclaim
+
+Future extensions should add fields to the end of the taskstats struct, and
+should not change the relative position of each field within the struct.
+
+::
+
+  struct taskstats {
+
+1) Common and basic accounting fields::
+
+	/* The version number of this struct. This field is always set to
+	 * TASKSTATS_VERSION, which is defined in <linux/taskstats.h>.
+	 * Each time the struct is changed, the value should be incremented.
+	 */
+	__u16	version;
+
+	/* The exit code of a task. */
+	__u32	ac_exitcode;		/* Exit status */
+
+	/* The accounting flags of a task as defined in <linux/acct.h>
+	 * Defined values are AFORK, ASU, ACOMPAT, ACORE, and AXSIG.
+	 */
+	__u8	ac_flag;		/* Record flags */
+
+	/* The value of task_nice() of a task. */
+	__u8	ac_nice;		/* task_nice */
+
+	/* The name of the command that started this task. */
+	char	ac_comm[TS_COMM_LEN];	/* Command name */
+
+	/* The scheduling discipline as set in task->policy field. */
+	__u8	ac_sched;		/* Scheduling discipline */
+
+	__u8	ac_pad[3];
+	__u32	ac_uid;			/* User ID */
+	__u32	ac_gid;			/* Group ID */
+	__u32	ac_pid;			/* Process ID */
+	__u32	ac_ppid;		/* Parent process ID */
+
+	/* The time when a task begins, in [secs] since 1970. */
+	__u32	ac_btime;		/* Begin time [sec since 1970] */
+
+	/* The elapsed time of a task, in [usec]. */
+	__u64	ac_etime;		/* Elapsed time [usec] */
+
+	/* The user CPU time of a task, in [usec]. */
+	__u64	ac_utime;		/* User CPU time [usec] */
+
+	/* The system CPU time of a task, in [usec]. */
+	__u64	ac_stime;		/* System CPU time [usec] */
+
+	/* The minor page fault count of a task, as set in task->min_flt. */
+	__u64	ac_minflt;		/* Minor Page Fault Count */
+
+	/* The major page fault count of a task, as set in task->maj_flt. */
+	__u64	ac_majflt;		/* Major Page Fault Count */
+
+
+2) Delay accounting fields::
+
+	/* Delay accounting fields start
+	 *
+	 * All values, until the comment "Delay accounting fields end" are
+	 * available only if delay accounting is enabled, even though the last
+	 * few fields are not delays
+	 *
+	 * xxx_count is the number of delay values recorded
+	 * xxx_delay_total is the corresponding cumulative delay in nanoseconds
+	 *
+	 * xxx_delay_total wraps around to zero on overflow
+	 * xxx_count incremented regardless of overflow
+	 */
+
+	/* Delay waiting for cpu, while runnable
+	 * count, delay_total NOT updated atomically
+	 */
+	__u64	cpu_count;
+	__u64	cpu_delay_total;
+
+	/* Following four fields atomically updated using task->delays->lock */
+
+	/* Delay waiting for synchronous block I/O to complete
+	 * does not account for delays in I/O submission
+	 */
+	__u64	blkio_count;
+	__u64	blkio_delay_total;
+
+	/* Delay waiting for page fault I/O (swap in only) */
+	__u64	swapin_count;
+	__u64	swapin_delay_total;
+
+	/* cpu "wall-clock" running time
+	 * On some architectures, value will adjust for cpu time stolen
+	 * from the kernel in involuntary waits due to virtualization.
+	 * Value is cumulative, in nanoseconds, without a corresponding count
+	 * and wraps around to zero silently on overflow
+	 */
+	__u64	cpu_run_real_total;
+
+	/* cpu "virtual" running time
+	 * Uses time intervals seen by the kernel i.e. no adjustment
+	 * for kernel's involuntary waits due to virtualization.
+	 * Value is cumulative, in nanoseconds, without a corresponding count
+	 * and wraps around to zero silently on overflow
+	 */
+	__u64	cpu_run_virtual_total;
+	/* Delay accounting fields end */
+	/* version 1 ends here */
+
+
+3) Extended accounting fields::
+
+	/* Extended accounting fields start */
+
+	/* Accumulated RSS usage in duration of a task, in MBytes-usecs.
+	 * The current rss usage is added to this counter every time
+	 * a tick is charged to a task's system time. So, at the end we
+	 * will have memory usage multiplied by system time. Thus an
+	 * average usage per system time unit can be calculated.
+	 */
+	__u64	coremem;		/* accumulated RSS usage in MB-usec */
+
+	/* Accumulated virtual memory usage in duration of a task.
+	 * Same as acct_rss_mem1 above except that we keep track of VM usage.
+	 */
+	__u64	virtmem;		/* accumulated VM usage in MB-usec */
+
+	/* High watermark of RSS usage in duration of a task, in KBytes. */
+	__u64	hiwater_rss;		/* High-watermark of RSS usage */
+
+	/* High watermark of VM  usage in duration of a task, in KBytes. */
+	__u64	hiwater_vm;		/* High-water virtual memory usage */
+
+	/* The following four fields are I/O statistics of a task. */
+	__u64	read_char;		/* bytes read */
+	__u64	write_char;		/* bytes written */
+	__u64	read_syscalls;		/* read syscalls */
+	__u64	write_syscalls;		/* write syscalls */
+
+	/* Extended accounting fields end */
+
+4) Per-task and per-thread statistics::
+
+	__u64	nvcsw;			/* Context voluntary switch counter */
+	__u64	nivcsw;			/* Context involuntary switch counter */
+
+5) Time accounting for SMT machines::
+
+	__u64	ac_utimescaled;		/* utime scaled on frequency etc */
+	__u64	ac_stimescaled;		/* stime scaled on frequency etc */
+	__u64	cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
+
+6) Extended delay accounting fields for memory reclaim::
+
+	/* Delay waiting for memory reclaim */
+	__u64	freepages_count;
+	__u64	freepages_delay_total;
+
+::
+
+  }
diff --git a/Documentation/accounting/taskstats-struct.txt b/Documentation/accounting/taskstats-struct.txt
deleted file mode 100644
index e7512c061c15..000000000000
--- a/Documentation/accounting/taskstats-struct.txt
+++ /dev/null
@@ -1,180 +0,0 @@
-The struct taskstats
---------------------
-
-This document contains an explanation of the struct taskstats fields.
-
-There are three different groups of fields in the struct taskstats:
-
-1) Common and basic accounting fields
-    If CONFIG_TASKSTATS is set, the taskstats interface is enabled and
-    the common fields and basic accounting fields are collected for
-    delivery at do_exit() of a task.
-2) Delay accounting fields
-    These fields are placed between
-    /* Delay accounting fields start */
-    and
-    /* Delay accounting fields end */
-    Their values are collected if CONFIG_TASK_DELAY_ACCT is set.
-3) Extended accounting fields
-    These fields are placed between
-    /* Extended accounting fields start */
-    and
-    /* Extended accounting fields end */
-    Their values are collected if CONFIG_TASK_XACCT is set.
-
-4) Per-task and per-thread context switch count statistics
-
-5) Time accounting for SMT machines
-
-6) Extended delay accounting fields for memory reclaim
-
-Future extension should add fields to the end of the taskstats struct, and
-should not change the relative position of each field within the struct.
-
-
-struct taskstats {
-
-1) Common and basic accounting fields:
-	/* The version number of this struct. This field is always set to
-	 * TAKSTATS_VERSION, which is defined in <linux/taskstats.h>.
-	 * Each time the struct is changed, the value should be incremented.
-	 */
-	__u16	version;
-
-  	/* The exit code of a task. */
-	__u32	ac_exitcode;		/* Exit status */
-
-  	/* The accounting flags of a task as defined in <linux/acct.h>
-	 * Defined values are AFORK, ASU, ACOMPAT, ACORE, and AXSIG.
-	 */
-	__u8	ac_flag;		/* Record flags */
-
-  	/* The value of task_nice() of a task. */
-	__u8	ac_nice;		/* task_nice */
-
-  	/* The name of the command that started this task. */
-	char	ac_comm[TS_COMM_LEN];	/* Command name */
-
-  	/* The scheduling discipline as set in task->policy field. */
-	__u8	ac_sched;		/* Scheduling discipline */
-
-	__u8	ac_pad[3];
-	__u32	ac_uid;			/* User ID */
-	__u32	ac_gid;			/* Group ID */
-	__u32	ac_pid;			/* Process ID */
-	__u32	ac_ppid;		/* Parent process ID */
-
-  	/* The time when a task begins, in [secs] since 1970. */
-	__u32	ac_btime;		/* Begin time [sec since 1970] */
-
-  	/* The elapsed time of a task, in [usec]. */
-	__u64	ac_etime;		/* Elapsed time [usec] */
-
-  	/* The user CPU time of a task, in [usec]. */
-	__u64	ac_utime;		/* User CPU time [usec] */
-
-  	/* The system CPU time of a task, in [usec]. */
-	__u64	ac_stime;		/* System CPU time [usec] */
-
-  	/* The minor page fault count of a task, as set in task->min_flt. */
-	__u64	ac_minflt;		/* Minor Page Fault Count */
-
-	/* The major page fault count of a task, as set in task->maj_flt. */
-	__u64	ac_majflt;		/* Major Page Fault Count */
-
-
-2) Delay accounting fields:
-	/* Delay accounting fields start
-	 *
-	 * All values, until the comment "Delay accounting fields end" are
-	 * available only if delay accounting is enabled, even though the last
-	 * few fields are not delays
-	 *
-	 * xxx_count is the number of delay values recorded
-	 * xxx_delay_total is the corresponding cumulative delay in nanoseconds
-	 *
-	 * xxx_delay_total wraps around to zero on overflow
-	 * xxx_count incremented regardless of overflow
-	 */
-
-	/* Delay waiting for cpu, while runnable
-	 * count, delay_total NOT updated atomically
-	 */
-	__u64	cpu_count;
-	__u64	cpu_delay_total;
-
-	/* Following four fields atomically updated using task->delays->lock */
-
-	/* Delay waiting for synchronous block I/O to complete
-	 * does not account for delays in I/O submission
-	 */
-	__u64	blkio_count;
-	__u64	blkio_delay_total;
-
-	/* Delay waiting for page fault I/O (swap in only) */
-	__u64	swapin_count;
-	__u64	swapin_delay_total;
-
-	/* cpu "wall-clock" running time
-	 * On some architectures, value will adjust for cpu time stolen
-	 * from the kernel in involuntary waits due to virtualization.
-	 * Value is cumulative, in nanoseconds, without a corresponding count
-	 * and wraps around to zero silently on overflow
-	 */
-	__u64	cpu_run_real_total;
-
-	/* cpu "virtual" running time
-	 * Uses time intervals seen by the kernel i.e. no adjustment
-	 * for kernel's involuntary waits due to virtualization.
-	 * Value is cumulative, in nanoseconds, without a corresponding count
-	 * and wraps around to zero silently on overflow
-	 */
-	__u64	cpu_run_virtual_total;
-	/* Delay accounting fields end */
-	/* version 1 ends here */
-
-
-3) Extended accounting fields
-	/* Extended accounting fields start */
-
-	/* Accumulated RSS usage in duration of a task, in MBytes-usecs.
-	 * The current rss usage is added to this counter every time
-	 * a tick is charged to a task's system time. So, at the end we
-	 * will have memory usage multiplied by system time. Thus an
-	 * average usage per system time unit can be calculated.
-	 */
-	__u64	coremem;		/* accumulated RSS usage in MB-usec */
-
-  	/* Accumulated virtual memory usage in duration of a task.
-	 * Same as acct_rss_mem1 above except that we keep track of VM usage.
-	 */
-	__u64	virtmem;		/* accumulated VM usage in MB-usec */
-
-  	/* High watermark of RSS usage in duration of a task, in KBytes. */
-	__u64	hiwater_rss;		/* High-watermark of RSS usage */
-
-  	/* High watermark of VM  usage in duration of a task, in KBytes. */
-	__u64	hiwater_vm;		/* High-water virtual memory usage */
-
-	/* The following four fields are I/O statistics of a task. */
-	__u64	read_char;		/* bytes read */
-	__u64	write_char;		/* bytes written */
-	__u64	read_syscalls;		/* read syscalls */
-	__u64	write_syscalls;		/* write syscalls */
-
-	/* Extended accounting fields end */
-
-4) Per-task and per-thread statistics
-	__u64	nvcsw;			/* Context voluntary switch counter */
-	__u64	nivcsw;			/* Context involuntary switch counter */
-
-5) Time accounting for SMT machines
-	__u64	ac_utimescaled;		/* utime scaled on frequency etc */
-	__u64	ac_stimescaled;		/* stime scaled on frequency etc */
-	__u64	cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
-
-6) Extended delay accounting fields for memory reclaim
-	/* Delay waiting for memory reclaim */
-	__u64	freepages_count;
-	__u64	freepages_delay_total;
-}
diff --git a/Documentation/accounting/taskstats.rst b/Documentation/accounting/taskstats.rst
new file mode 100644
index 000000000000..2a28b7f55c10
--- /dev/null
+++ b/Documentation/accounting/taskstats.rst
@@ -0,0 +1,180 @@
+=============================
+Per-task statistics interface
+=============================
+
+
+Taskstats is a netlink-based interface for sending per-task and
+per-process statistics from the kernel to userspace.
+
+Taskstats was designed for the following benefits:
+
+- efficiently provide statistics during lifetime of a task and on its exit
+- unified interface for multiple accounting subsystems
+- extensibility for use by future accounting patches
+
+Terminology
+-----------
+
+"pid", "tid" and "task" are used interchangeably and refer to the standard
+Linux task defined by struct task_struct.  Per-pid stats are the same as
+per-task stats.
+
+"tgid", "process" and "thread group" are used interchangeably and refer to the
+tasks that share an mm_struct i.e. the traditional Unix process. Despite the
+use of tgid, there is no special treatment for the task that is thread group
+leader - a process is deemed alive as long as it has any task belonging to it.
+
+Usage
+-----
+
+To get statistics during a task's lifetime, userspace opens a unicast netlink
+socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
+The response contains statistics for a task (if pid is specified) or the sum of
+statistics for all tasks of the process (if tgid is specified).
+
+To obtain statistics for tasks which are exiting, the userspace listener
+sends a register command and specifies a cpumask. Whenever a task exits on
+one of the cpus in the cpumask, its per-pid statistics are sent to the
+registered listener. Using cpumasks limits the data received by any one
+listener and assists in flow control over the netlink interface; this is
+explained in more detail below.
+
+If the exiting task is the last thread exiting its thread group,
+an additional record containing the per-tgid stats is also sent to userspace.
+The latter contains the sum of per-pid stats for all threads in the thread
+group, both past and present.
+
+getdelays.c is a simple utility demonstrating usage of the taskstats interface
+for reporting delay accounting statistics. Users can register cpumasks,
+send commands and process responses, listen for per-tid/tgid exit data,
+write the data received to a file and do basic flow control by increasing
+receive buffer sizes.
+
+Interface
+---------
+
+The user-kernel interface is encapsulated in include/linux/taskstats.h.
+
+To avoid this documentation becoming obsolete as the interface evolves, only
+an outline of the current version is given. taskstats.h always overrides the
+description here.
+
+struct taskstats is the common accounting structure for both per-pid and
+per-tgid data. It is versioned and can be extended by each accounting subsystem
+that is added to the kernel. The fields and their semantics are defined in the
+taskstats.h file.
+
+The data exchanged between user and kernel space is a netlink message belonging
+to the NETLINK_GENERIC family and using the netlink attributes interface.
+The messages are in the format::
+
+    +----------+- - -+-------------+-------------------+
+    | nlmsghdr | Pad |  genlmsghdr | taskstats payload |
+    +----------+- - -+-------------+-------------------+
+
+
+The taskstats payload is one of the following three kinds:
+
+1. Commands: Sent from user to kernel. Commands to get data on
+a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
+containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
+the task/process for which userspace wants statistics.
+
+Commands to register/deregister interest in exit data from a set of cpus
+consist of one attribute, of type
+TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
+attribute payload. The cpumask is specified as an ascii string of
+comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
+the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
+in cpus before closing the listening socket, the kernel cleans up its interest
+set over time. However, for the sake of efficiency, an explicit deregistration
+is advisable.
+
+2. Response for a command: sent from the kernel in response to a userspace
+command. The payload is a series of three attributes of type:
+
+a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute with no payload; it indicates that
+a pid/tgid followed by some stats comes next.
+
+b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
+are being returned.
+
+c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
+same structure is used for both per-pid and per-tgid stats.
+
+3. New message sent by kernel whenever a task exits. The payload consists of a
+   series of attributes of the following type:
+
+a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
+b) TASKSTATS_TYPE_PID: contains exiting task's pid
+c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
+d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
+e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
+f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
+
+
+per-tgid stats
+--------------
+
+Taskstats provides per-process stats, in addition to per-task stats, since
+resource management is often done at a process granularity and aggregating task
+stats in userspace alone is inefficient and potentially inaccurate (due to lack
+of atomicity).
+
+However, maintaining per-process, in addition to per-task stats, within the
+kernel has space and time overheads. To address this, the taskstats code
+accumulates each exiting task's statistics into a process-wide data structure.
+When the last task of a process exits, the process level data accumulated also
+gets sent to userspace (along with the per-task data).
+
+When a user queries per-tgid data, the stats of all live threads in the group
+are summed and added to the accumulated total for previously exited threads
+of the same thread group.
+
+Extending taskstats
+-------------------
+
+There are two ways to extend the taskstats interface to export more
+per-task/process stats as patches to collect them get added to the kernel
+in future:
+
+1. Adding more fields to the end of the existing struct taskstats. Backward
+   compatibility is ensured by the version number within the
+   structure. Userspace will use only the fields of the struct that correspond
+   to the version it is using.
+
+2. Defining separate statistic structs and using the netlink attributes
+   interface to return them. Since userspace processes each netlink attribute
+   independently, it can always ignore attributes whose type it does not
+   understand (because it is using an older version of the interface).
+
+
+Choosing between 1. and 2. is a matter of trading off flexibility and
+overhead. If only a few fields need to be added, then 1. is the preferable
+path since the kernel and userspace don't need to incur the overhead of
+processing new netlink attributes. But if the new fields expand the existing
+struct too much, requiring disparate userspace accounting utilities to
+unnecessarily receive large structures whose fields are of no interest, then
+extending the attributes structure would be worthwhile.
+
+Flow control for taskstats
+--------------------------
+
+When the rate of task exits becomes large, a listener may not be able to keep
+up with the kernel's rate of sending per-tid/tgid exit data, leading to data
+loss. This possibility gets compounded when the taskstats structure gets
+extended and the number of cpus grows large.
+
+To avoid losing statistics, userspace should do one or more of the following:
+
+- increase the receive buffer sizes for the netlink sockets opened by
+  listeners to receive exit data.
+
+- create more listeners and reduce the number of cpus being listened to by
+  each listener. In the extreme case, there could be one listener for each cpu.
+  Users may also consider setting the cpu affinity of the listener to the subset
+  of cpus to which it listens, especially if they are listening to just one cpu.
+
+Despite these measures, if userspace receives ENOBUFS error messages
+indicating overflow of receive buffers, it should take measures to handle the
+loss of data.
diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt
deleted file mode 100644
index ff06b738bb88..000000000000
--- a/Documentation/accounting/taskstats.txt
+++ /dev/null
@@ -1,181 +0,0 @@
-Per-task statistics interface
------------------------------
-
-
-Taskstats is a netlink-based interface for sending per-task and
-per-process statistics from the kernel to userspace.
-
-Taskstats was designed for the following benefits:
-
-- efficiently provide statistics during lifetime of a task and on its exit
-- unified interface for multiple accounting subsystems
-- extensibility for use by future accounting patches
-
-Terminology
------------
-
-"pid", "tid" and "task" are used interchangeably and refer to the standard
-Linux task defined by struct task_struct.  per-pid stats are the same as
-per-task stats.
-
-"tgid", "process" and "thread group" are used interchangeably and refer to the
-tasks that share an mm_struct i.e. the traditional Unix process. Despite the
-use of tgid, there is no special treatment for the task that is thread group
-leader - a process is deemed alive as long as it has any task belonging to it.
-
-Usage
------
-
-To get statistics during a task's lifetime, userspace opens a unicast netlink
-socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
-The response contains statistics for a task (if pid is specified) or the sum of
-statistics for all tasks of the process (if tgid is specified).
-
-To obtain statistics for tasks which are exiting, the userspace listener
-sends a register command and specifies a cpumask. Whenever a task exits on
-one of the cpus in the cpumask, its per-pid statistics are sent to the
-registered listener. Using cpumasks allows the data received by one listener
-to be limited and assists in flow control over the netlink interface and is
-explained in more detail below.
-
-If the exiting task is the last thread exiting its thread group,
-an additional record containing the per-tgid stats is also sent to userspace.
-The latter contains the sum of per-pid stats for all threads in the thread
-group, both past and present.
-
-getdelays.c is a simple utility demonstrating usage of the taskstats interface
-for reporting delay accounting statistics. Users can register cpumasks,
-send commands and process responses, listen for per-tid/tgid exit data,
-write the data received to a file and do basic flow control by increasing
-receive buffer sizes.
-
-Interface
----------
-
-The user-kernel interface is encapsulated in include/linux/taskstats.h
-
-To avoid this documentation becoming obsolete as the interface evolves, only
-an outline of the current version is given. taskstats.h always overrides the
-description here.
-
-struct taskstats is the common accounting structure for both per-pid and
-per-tgid data. It is versioned and can be extended by each accounting subsystem
-that is added to the kernel. The fields and their semantics are defined in the
-taskstats.h file.
-
-The data exchanged between user and kernel space is a netlink message belonging
-to the NETLINK_GENERIC family and using the netlink attributes interface.
-The messages are in the format
-
-    +----------+- - -+-------------+-------------------+
-    | nlmsghdr | Pad |  genlmsghdr | taskstats payload |
-    +----------+- - -+-------------+-------------------+
-
-
-The taskstats payload is one of the following three kinds:
-
-1. Commands: Sent from user to kernel. Commands to get data on
-a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
-containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
-the task/process for which userspace wants statistics.
-
-Commands to register/deregister interest in exit data from a set of cpus
-consist of one attribute, of type
-TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
-attribute payload. The cpumask is specified as an ascii string of
-comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
-the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
-in cpus before closing the listening socket, the kernel cleans up its interest
-set over time. However, for the sake of efficiency, an explicit deregistration
-is advisable.
-
-2. Response for a command: sent from the kernel in response to a userspace
-command. The payload is a series of three attributes of type:
-
-a) TASKSTATS_TYPE_AGGR_PID/TGID : attribute containing no payload but indicates
-a pid/tgid will be followed by some stats.
-
-b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
-are being returned.
-
-c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
-same structure is used for both per-pid and per-tgid stats.
-
-3. New message sent by kernel whenever a task exits. The payload consists of a
-   series of attributes of the following type:
-
-a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
-b) TASKSTATS_TYPE_PID: contains exiting task's pid
-c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
-d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
-e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
-f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
-
-
-per-tgid stats
---------------
-
-Taskstats provides per-process stats, in addition to per-task stats, since
-resource management is often done at a process granularity and aggregating task
-stats in userspace alone is inefficient and potentially inaccurate (due to lack
-of atomicity).
-
-However, maintaining per-process, in addition to per-task stats, within the
-kernel has space and time overheads. To address this, the taskstats code
-accumulates each exiting task's statistics into a process-wide data structure.
-When the last task of a process exits, the process level data accumulated also
-gets sent to userspace (along with the per-task data).
-
-When a user queries to get per-tgid data, the sum of all other live threads in
-the group is added up and added to the accumulated total for previously exited
-threads of the same thread group.
-
-Extending taskstats
--------------------
-
-There are two ways to extend the taskstats interface to export more
-per-task/process stats as patches to collect them get added to the kernel
-in future:
-
-1. Adding more fields to the end of the existing struct taskstats. Backward
-   compatibility is ensured by the version number within the
-   structure. Userspace will use only the fields of the struct that correspond
-   to the version its using.
-
-2. Defining separate statistic structs and using the netlink attributes
-   interface to return them. Since userspace processes each netlink attribute
-   independently, it can always ignore attributes whose type it does not
-   understand (because it is using an older version of the interface).
-
-
-Choosing between 1. and 2. is a matter of trading off flexibility and
-overhead. If only a few fields need to be added, then 1. is the preferable
-path since the kernel and userspace don't need to incur the overhead of
-processing new netlink attributes. But if the new fields expand the existing
-struct too much, requiring disparate userspace accounting utilities to
-unnecessarily receive large structures whose fields are of no interest, then
-extending the attributes structure would be worthwhile.
-
-Flow control for taskstats
---------------------------
-
-When the rate of task exits becomes large, a listener may not be able to keep
-up with the kernel's rate of sending per-tid/tgid exit data leading to data
-loss. This possibility gets compounded when the taskstats structure gets
-extended and the number of cpus grows large.
-
-To avoid losing statistics, userspace should do one or more of the following:
-
-- increase the receive buffer sizes for the netlink sockets opened by
-listeners to receive exit data.
-
-- create more listeners and reduce the number of cpus being listened to by
-each listener. In the extreme case, there could be one listener for each cpu.
-Users may also consider setting the cpu affinity of the listener to the subset
-of cpus to which it listens, especially if they are listening to just one cpu.
-
-Despite these measures, if the userspace receives ENOBUFS error messages
-indicated overflow of receive buffers, it should take measures to handle the
-loss of data.
-
-----
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index a9548de56ac9..080b18ce2a5d 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1014,7 +1014,7 @@ All time durations are in microseconds.
 	A read-only nested-key file which exists on non-root cgroups.
 
 	Shows pressure stall information for CPU. See
-	Documentation/accounting/psi.txt for details.
+	Documentation/accounting/psi.rst for details.
 
 
 Memory
@@ -1355,7 +1355,7 @@ PAGE_SIZE multiple when read back.
 	A read-only nested-key file which exists on non-root cgroups.
 
 	Shows pressure stall information for memory. See
-	Documentation/accounting/psi.txt for details.
+	Documentation/accounting/psi.rst for details.
 
 
 Usage Guidelines
@@ -1498,7 +1498,7 @@ IO Interface Files
 	A read-only nested-key file which exists on non-root cgroups.
 
 	Shows pressure stall information for IO. See
-	Documentation/accounting/psi.txt for details.
+	Documentation/accounting/psi.rst for details.
 
 
 Writeback
diff --git a/init/Kconfig b/init/Kconfig
index 9697c6b5303c..9eb92ee52d40 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -550,7 +550,7 @@ config PSI
 	  have cpu.pressure, memory.pressure, and io.pressure files,
 	  which aggregate pressure stalls for the grouped tasks only.
 
-	  For more details see Documentation/accounting/psi.txt.
+	  For more details see Documentation/accounting/psi.rst.
 
 	  Say N if unsure.
 
-- 
cgit v1.2.3-55-g7522


From db9a0975a20c1f21c108b9d44545792d790593e4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 10:10:33 -0300
Subject: docs: ia64: convert to ReST

Rename the ia64 documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

There are two upper case file names. Rename them to
lower case, as we're working to avoid upper case file
names at Documentation.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/ia64/IRQ-redir.txt  |   69 ---
 Documentation/ia64/README         |   43 --
 Documentation/ia64/aliasing.rst   |  246 +++++++++
 Documentation/ia64/aliasing.txt   |  221 --------
 Documentation/ia64/efirtc.rst     |  144 +++++
 Documentation/ia64/efirtc.txt     |  128 -----
 Documentation/ia64/err_inject.rst | 1067 ++++++++++++++++++++++++++++++++++++
 Documentation/ia64/err_inject.txt | 1068 -------------------------------------
 Documentation/ia64/fsys.rst       |  303 +++++++++++
 Documentation/ia64/fsys.txt       |  286 ----------
 Documentation/ia64/ia64.rst       |   49 ++
 Documentation/ia64/index.rst      |   18 +
 Documentation/ia64/irq-redir.rst  |   80 +++
 Documentation/ia64/mca.rst        |  198 +++++++
 Documentation/ia64/mca.txt        |  194 -------
 Documentation/ia64/serial.rst     |  165 ++++++
 Documentation/ia64/serial.txt     |  151 ------
 Documentation/ia64/xen.rst        |  206 +++++++
 Documentation/ia64/xen.txt        |  183 -------
 MAINTAINERS                       |    2 +-
 arch/ia64/kernel/efi.c            |    2 +-
 arch/ia64/kernel/fsys.S           |    2 +-
 arch/ia64/mm/ioremap.c            |    2 +-
 arch/ia64/pci/pci.c               |    2 +-
 24 files changed, 2481 insertions(+), 2348 deletions(-)
 delete mode 100644 Documentation/ia64/IRQ-redir.txt
 delete mode 100644 Documentation/ia64/README
 create mode 100644 Documentation/ia64/aliasing.rst
 delete mode 100644 Documentation/ia64/aliasing.txt
 create mode 100644 Documentation/ia64/efirtc.rst
 delete mode 100644 Documentation/ia64/efirtc.txt
 create mode 100644 Documentation/ia64/err_inject.rst
 delete mode 100644 Documentation/ia64/err_inject.txt
 create mode 100644 Documentation/ia64/fsys.rst
 delete mode 100644 Documentation/ia64/fsys.txt
 create mode 100644 Documentation/ia64/ia64.rst
 create mode 100644 Documentation/ia64/index.rst
 create mode 100644 Documentation/ia64/irq-redir.rst
 create mode 100644 Documentation/ia64/mca.rst
 delete mode 100644 Documentation/ia64/mca.txt
 create mode 100644 Documentation/ia64/serial.rst
 delete mode 100644 Documentation/ia64/serial.txt
 create mode 100644 Documentation/ia64/xen.rst
 delete mode 100644 Documentation/ia64/xen.txt

diff --git a/Documentation/ia64/IRQ-redir.txt b/Documentation/ia64/IRQ-redir.txt
deleted file mode 100644
index f7bd72261283..000000000000
--- a/Documentation/ia64/IRQ-redir.txt
+++ /dev/null
@@ -1,69 +0,0 @@
-IRQ affinity on IA64 platforms
-------------------------------
-                           07.01.2002, Erich Focht <efocht@ess.nec.de>
-
-
-By writing to /proc/irq/IRQ#/smp_affinity the interrupt routing can be
-controlled. The behavior on IA64 platforms is slightly different from
-that described in Documentation/IRQ-affinity.txt for i386 systems.
-
-Because of the usage of SAPIC mode and physical destination mode the
-IRQ target is one particular CPU and cannot be a mask of several
-CPUs. Only the first non-zero bit is taken into account.
-
-
-Usage examples:
-
-The target CPU has to be specified as a hexadecimal CPU mask. The
-first non-zero bit is the selected CPU. This format has been kept for
-compatibility reasons with i386.
-
-Set the delivery mode of interrupt 41 to fixed and route the
-interrupts to CPU #3 (logical CPU number) (2^3=0x08):
-     echo "8" >/proc/irq/41/smp_affinity
-
-Set the default route for IRQ number 41 to CPU 6 in lowest priority
-delivery mode (redirectable):
-     echo "r 40" >/proc/irq/41/smp_affinity
-
-The output of the command
-     cat /proc/irq/IRQ#/smp_affinity
-gives the target CPU mask for the specified interrupt vector. If the CPU
-mask is preceded by the character "r", the interrupt is redirectable
-(i.e. lowest priority mode routing is used), otherwise its route is
-fixed.
-
-
-
-Initialization and default behavior:
-
-If the platform features IRQ redirection (info provided by SAL) all
-IO-SAPIC interrupts are initialized with CPU#0 as their default target
-and the routing is the so called "lowest priority mode" (actually
-fixed SAPIC mode with hint). The XTP chipset registers are used as hints
-for the IRQ routing. Currently in Linux XTP registers can have three
-values:
-	- minimal for an idle task,
-	- normal if any other task runs,
-	- maximal if the CPU is going to be switched off.
-The IRQ is routed to the CPU with lowest XTP register value, the
-search begins at the default CPU. Therefore most of the interrupts
-will be handled by CPU #0.
-
-If the platform doesn't feature interrupt redirection IOSAPIC fixed
-routing is used. The target CPUs are distributed in a round robin
-manner. IRQs will be routed only to the selected target CPUs. Check
-with
-        cat /proc/interrupts
-
-
-
-Comments:
-
-On large (multi-node) systems it is recommended to route the IRQs to
-the node to which the corresponding device is connected.
-For systems like the NEC AzusA we get IRQ node-affinity for free. This
-is because usually the chipsets on each node redirect the interrupts
-only to their own CPUs (as they cannot see the XTP registers on the
-other nodes).
-
diff --git a/Documentation/ia64/README b/Documentation/ia64/README
deleted file mode 100644
index aa17f2154cba..000000000000
--- a/Documentation/ia64/README
+++ /dev/null
@@ -1,43 +0,0 @@
-        Linux kernel release 2.4.xx for the IA-64 Platform
-
-   These are the release notes for Linux version 2.4 for IA-64
-   platform.  This document provides information specific to IA-64
-   ONLY, to get additional information about the Linux kernel also
-   read the original Linux README provided with the kernel.
-
-INSTALLING the kernel:
-
- - IA-64 kernel installation is the same as the other platforms, see
-   original README for details.
-
-
-SOFTWARE REQUIREMENTS
-
-   Compiling and running this kernel requires an IA-64 compliant GCC
-   compiler.  And various software packages also compiled with an
-   IA-64 compliant GCC compiler.
-
-
-CONFIGURING the kernel:
-
-   Configuration is the same, see original README for details.
-
-
-COMPILING the kernel:
-
- - Compiling this kernel doesn't differ from other platform so read
-   the original README for details BUT make sure you have an IA-64
-   compliant GCC compiler.
-
-IA-64 SPECIFICS
-
- - General issues:
-
-    o Hardly any performance tuning has been done. Obvious targets
-      include the library routines (IP checksum, etc.). Less
-      obvious targets include making sure we don't flush the TLB
-      needlessly, etc.
-
-    o SMP locks cleanup/optimization
-
-    o IA32 support.  Currently experimental.  It mostly works.
diff --git a/Documentation/ia64/aliasing.rst b/Documentation/ia64/aliasing.rst
new file mode 100644
index 000000000000..a08b36aba015
--- /dev/null
+++ b/Documentation/ia64/aliasing.rst
@@ -0,0 +1,246 @@
+==================================
+Memory Attribute Aliasing on IA-64
+==================================
+
+Bjorn Helgaas <bjorn.helgaas@hp.com>
+
+May 4, 2006
+
+
+Memory Attributes
+=================
+
+    Itanium supports several attributes for virtual memory references.
+    The attribute is part of the virtual translation, i.e., it is
+    contained in the TLB entry.  The ones of most interest to the Linux
+    kernel are:
+
+	==		======================
+	WB		Write-back (cacheable)
+	UC		Uncacheable
+	WC		Write-coalescing
+	==		======================
+
+    System memory typically uses the WB attribute.  The UC attribute is
+    used for memory-mapped I/O devices.  The WC attribute is uncacheable
+    like UC is, but writes may be delayed and combined to increase
+    performance for things like frame buffers.
+
+    The Itanium architecture requires that we avoid accessing the same
+    page with both a cacheable mapping and an uncacheable mapping[1].
+
+    The design of the chipset determines which attributes are supported
+    on which regions of the address space.  For example, some chipsets
+    support either WB or UC access to main memory, while others support
+    only WB access.
+
+Memory Map
+==========
+
+    Platform firmware describes the physical memory map and the
+    supported attributes for each region.  At boot-time, the kernel uses
+    the EFI GetMemoryMap() interface.  ACPI can also describe memory
+    devices and the attributes they support, but Linux/ia64 currently
+    doesn't use this information.
+
+    The kernel uses the efi_memmap table returned from GetMemoryMap() to
+    learn the attributes supported by each region of physical address
+    space.  Unfortunately, this table does not completely describe the
+    address space because some machines omit some or all of the MMIO
+    regions from the map.
+
+    The kernel maintains another table, kern_memmap, which describes the
+    memory Linux is actually using and the attribute for each region.
+    This contains only system memory; it does not contain MMIO space.
+
+    The kern_memmap table typically contains only a subset of the system
+    memory described by the efi_memmap.  Linux/ia64 can't use all memory
+    in the system because of constraints imposed by the identity mapping
+    scheme.
+
+    The efi_memmap table is preserved unmodified because the original
+    boot-time information is required for kexec.
+
+Kernel Identity Mappings
+========================
+
+    Linux/ia64 identity mappings are done with large pages, currently
+    either 16MB or 64MB, referred to as "granules."  Cacheable mappings
+    are speculative[2], so the processor can read any location in the
+    page at any time, independent of the programmer's intentions.  This
+    means that to avoid attribute aliasing, Linux can create a cacheable
+    identity mapping only when the entire granule supports cacheable
+    access.
+
+    Therefore, kern_memmap contains only full granule-sized regions that
+    can be referenced safely by an identity mapping.
+
+    Uncacheable mappings are not speculative, so the processor will
+    generate UC accesses only to locations explicitly referenced by
+    software.  This allows UC identity mappings to cover granules that
+    are only partially populated, or populated with a combination of UC
+    and WB regions.
+
+User Mappings
+=============
+
+    User mappings are typically done with 16K or 64K pages.  The smaller
+    page size allows more flexibility because only 16K or 64K has to be
+    homogeneous with respect to memory attributes.
+
+Potential Attribute Aliasing Cases
+==================================
+
+    There are several ways the kernel creates new mappings:
+
+mmap of /dev/mem
+----------------
+
+	This uses remap_pfn_range(), which creates user mappings.  These
+	mappings may be either WB or UC.  If the region being mapped
+	happens to be in kern_memmap, meaning that it may also be mapped
+	by a kernel identity mapping, the user mapping must use the same
+	attribute as the kernel mapping.
+
+	If the region is not in kern_memmap, the user mapping should use
+	an attribute reported as being supported in the EFI memory map.
+
+	Since the EFI memory map does not describe MMIO on some
+	machines, this should use an uncacheable mapping as a fallback.
+
+mmap of /sys/class/pci_bus/.../legacy_mem
+-----------------------------------------
+
+	This is very similar to mmap of /dev/mem, except that legacy_mem
+	only allows mmap of the one megabyte "legacy MMIO" area for a
+	specific PCI bus.  Typically this is the first megabyte of
+	physical address space, but it may be different on machines with
+	several VGA devices.
+
+	"X" uses this to access VGA frame buffers.  Using legacy_mem
+	rather than /dev/mem allows multiple instances of X to talk to
+	different VGA cards.
+
+	The /dev/mem mmap constraints apply.
+
+mmap of /proc/bus/pci/.../??.?
+------------------------------
+
+	This is an MMIO mmap of PCI functions, which additionally may or
+	may not be requested as using the WC attribute.
+
+	If WC is requested, and the region in kern_memmap is either WC
+	or UC, and the EFI memory map designates the region as WC, then
+	the WC mapping is allowed.
+
+	Otherwise, the user mapping must use the same attribute as the
+	kernel mapping.
+
+read/write of /dev/mem
+----------------------
+
+	This uses copy_from_user(), which implicitly uses a kernel
+	identity mapping.  This is obviously safe for things in
+	kern_memmap.
+
+	There may be corner cases of things that are not in kern_memmap,
+	but could be accessed this way.  For example, registers in MMIO
+	space are not in kern_memmap, but could be accessed with a UC
+	mapping.  This would not cause attribute aliasing.  But
+	registers typically can be accessed only with four-byte or
+	eight-byte accesses, and the copy_from_user() path doesn't allow
+	any control over the access size, so this would be dangerous.
+
+ioremap()
+---------
+
+	This returns a mapping for use inside the kernel.
+
+	If the region is in kern_memmap, we should use the attribute
+	specified there.
+
+	If the EFI memory map reports that the entire granule supports
+	WB, we should use that (granules that are partially reserved
+	or occupied by firmware do not appear in kern_memmap).
+
+	If the granule contains non-WB memory, but we can cover the
+	region safely with kernel page table mappings, we can use
+	ioremap_page_range() as most other architectures do.
+
+	Failing all of the above, we have to fall back to a UC mapping.
+
+Past Problem Cases
+==================
+
+mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
+--------------------------------------------------------------------
+
+      The EFI memory map may not report these MMIO regions.
+
+      These must be allowed so that X will work.  This means that
+      when the EFI memory map is incomplete, every /dev/mem mmap must
+      succeed.  It may create either WB or UC user mappings, depending
+      on whether the region is in kern_memmap or the EFI memory map.
+
+mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
+----------------------------------------------------------------------
+
+      The EFI memory map reports the following attributes:
+
+        =============== ======= ==================
+        0x00000-0x9FFFF WB only
+        0xA0000-0xBFFFF UC only (VGA frame buffer)
+        0xC0000-0xFFFFF WB only
+        =============== ======= ==================
+
+      This mmap is done with user pages, not kernel identity mappings,
+      so it is safe to use WB mappings.
+
+      The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
+      which uses a granule-sized UC mapping.  This granule will cover some
+      WB-only memory, but since UC is non-speculative, the processor will
+      never generate an uncacheable reference to the WB-only areas unless
+      the driver explicitly touches them.
+
+mmap of 0x0-0xFFFFF legacy_mem by "X"
+-------------------------------------
+
+      If the EFI memory map reports that the entire range supports the
+      same attributes, we can allow the mmap (and we will prefer WB if
+      supported, as is the case with HP sx[12]000 machines with VGA
+      disabled).
+
+      If EFI reports the range as partly WB and partly UC (as on sx[12]000
+      machines with VGA enabled), we must fail the mmap because there's no
+      safe attribute to use.
+
+      If EFI reports some of the range but not all (as on Intel firmware
+      that doesn't report the VGA frame buffer at all), we should fail the
+      mmap and force the user to map just the specific region of interest.
+
+mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
+------------------------------------------------------------------------
+
+      The EFI memory map reports the following attributes::
+
+        0x00000-0xFFFFF WB only (no VGA MMIO hole)
+
+      This is a special case of the previous case, and the mmap should
+      fail for the same reason as above.
+
+read of /sys/devices/.../rom
+----------------------------
+
+      For VGA devices, this may cause an ioremap() of 0xC0000.  This
+      used to be done with a UC mapping, because the VGA frame buffer
+      at 0xA0000 prevents use of a WB granule.  The UC mapping causes
+      an MCA on HP sx[12]000 chipsets.
+
+      We should use WB page table mappings to avoid covering the VGA
+      frame buffer.
+
+Notes
+=====
+
+    [1] SDM rev 2.2, vol 2, sec 4.4.1.
+    [2] SDM rev 2.2, vol 2, sec 4.4.6.
diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt
deleted file mode 100644
index 5a4dea6abebd..000000000000
--- a/Documentation/ia64/aliasing.txt
+++ /dev/null
@@ -1,221 +0,0 @@
-	         MEMORY ATTRIBUTE ALIASING ON IA-64
-
-			   Bjorn Helgaas
-		       <bjorn.helgaas@hp.com>
-			    May 4, 2006
-
-
-MEMORY ATTRIBUTES
-
-    Itanium supports several attributes for virtual memory references.
-    The attribute is part of the virtual translation, i.e., it is
-    contained in the TLB entry.  The ones of most interest to the Linux
-    kernel are:
-
-	WB		Write-back (cacheable)
-	UC		Uncacheable
-	WC		Write-coalescing
-
-    System memory typically uses the WB attribute.  The UC attribute is
-    used for memory-mapped I/O devices.  The WC attribute is uncacheable
-    like UC is, but writes may be delayed and combined to increase
-    performance for things like frame buffers.
-
-    The Itanium architecture requires that we avoid accessing the same
-    page with both a cacheable mapping and an uncacheable mapping[1].
-
-    The design of the chipset determines which attributes are supported
-    on which regions of the address space.  For example, some chipsets
-    support either WB or UC access to main memory, while others support
-    only WB access.
-
-MEMORY MAP
-
-    Platform firmware describes the physical memory map and the
-    supported attributes for each region.  At boot-time, the kernel uses
-    the EFI GetMemoryMap() interface.  ACPI can also describe memory
-    devices and the attributes they support, but Linux/ia64 currently
-    doesn't use this information.
-
-    The kernel uses the efi_memmap table returned from GetMemoryMap() to
-    learn the attributes supported by each region of physical address
-    space.  Unfortunately, this table does not completely describe the
-    address space because some machines omit some or all of the MMIO
-    regions from the map.
-
-    The kernel maintains another table, kern_memmap, which describes the
-    memory Linux is actually using and the attribute for each region.
-    This contains only system memory; it does not contain MMIO space.
-
-    The kern_memmap table typically contains only a subset of the system
-    memory described by the efi_memmap.  Linux/ia64 can't use all memory
-    in the system because of constraints imposed by the identity mapping
-    scheme.
-
-    The efi_memmap table is preserved unmodified because the original
-    boot-time information is required for kexec.
-
-KERNEL IDENTITY MAPPINGS
-
-    Linux/ia64 identity mappings are done with large pages, currently
-    either 16MB or 64MB, referred to as "granules."  Cacheable mappings
-    are speculative[2], so the processor can read any location in the
-    page at any time, independent of the programmer's intentions.  This
-    means that to avoid attribute aliasing, Linux can create a cacheable
-    identity mapping only when the entire granule supports cacheable
-    access.
-
-    Therefore, kern_memmap contains only full granule-sized regions that
-    can referenced safely by an identity mapping.
-
-    Uncacheable mappings are not speculative, so the processor will
-    generate UC accesses only to locations explicitly referenced by
-    software.  This allows UC identity mappings to cover granules that
-    are only partially populated, or populated with a combination of UC
-    and WB regions.
-
-USER MAPPINGS
-
-    User mappings are typically done with 16K or 64K pages.  The smaller
-    page size allows more flexibility because only 16K or 64K has to be
-    homogeneous with respect to memory attributes.
-
-POTENTIAL ATTRIBUTE ALIASING CASES
-
-    There are several ways the kernel creates new mappings:
-
-    mmap of /dev/mem
-
-	This uses remap_pfn_range(), which creates user mappings.  These
-	mappings may be either WB or UC.  If the region being mapped
-	happens to be in kern_memmap, meaning that it may also be mapped
-	by a kernel identity mapping, the user mapping must use the same
-	attribute as the kernel mapping.
-
-	If the region is not in kern_memmap, the user mapping should use
-	an attribute reported as being supported in the EFI memory map.
-
-	Since the EFI memory map does not describe MMIO on some
-	machines, this should use an uncacheable mapping as a fallback.
-
-    mmap of /sys/class/pci_bus/.../legacy_mem
-
-	This is very similar to mmap of /dev/mem, except that legacy_mem
-	only allows mmap of the one megabyte "legacy MMIO" area for a
-	specific PCI bus.  Typically this is the first megabyte of
-	physical address space, but it may be different on machines with
-	several VGA devices.
-
-	"X" uses this to access VGA frame buffers.  Using legacy_mem
-	rather than /dev/mem allows multiple instances of X to talk to
-	different VGA cards.
-
-	The /dev/mem mmap constraints apply.
-
-    mmap of /proc/bus/pci/.../??.?
-
-    	This is an MMIO mmap of PCI functions, which additionally may or
-	may not be requested as using the WC attribute.
-
-	If WC is requested, and the region in kern_memmap is either WC
-	or UC, and the EFI memory map designates the region as WC, then
-	the WC mapping is allowed.
-
-	Otherwise, the user mapping must use the same attribute as the
-	kernel mapping.
-
-    read/write of /dev/mem
-
-	This uses copy_from_user(), which implicitly uses a kernel
-	identity mapping.  This is obviously safe for things in
-	kern_memmap.
-
-	There may be corner cases of things that are not in kern_memmap,
-	but could be accessed this way.  For example, registers in MMIO
-	space are not in kern_memmap, but could be accessed with a UC
-	mapping.  This would not cause attribute aliasing.  But
-	registers typically can be accessed only with four-byte or
-	eight-byte accesses, and the copy_from_user() path doesn't allow
-	any control over the access size, so this would be dangerous.
-
-    ioremap()
-
-	This returns a mapping for use inside the kernel.
-
-	If the region is in kern_memmap, we should use the attribute
-	specified there.
-
-	If the EFI memory map reports that the entire granule supports
-	WB, we should use that (granules that are partially reserved
-	or occupied by firmware do not appear in kern_memmap).
-
-	If the granule contains non-WB memory, but we can cover the
-	region safely with kernel page table mappings, we can use
-	ioremap_page_range() as most other architectures do.
-
-	Failing all of the above, we have to fall back to a UC mapping.
-
-PAST PROBLEM CASES
-
-    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
-
-      The EFI memory map may not report these MMIO regions.
-
-      These must be allowed so that X will work.  This means that
-      when the EFI memory map is incomplete, every /dev/mem mmap must
-      succeed.  It may create either WB or UC user mappings, depending
-      on whether the region is in kern_memmap or the EFI memory map.
-
-    mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
-
-      The EFI memory map reports the following attributes:
-        0x00000-0x9FFFF WB only
-        0xA0000-0xBFFFF UC only (VGA frame buffer)
-        0xC0000-0xFFFFF WB only
-
-      This mmap is done with user pages, not kernel identity mappings,
-      so it is safe to use WB mappings.
-
-      The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
-      which uses a granule-sized UC mapping.  This granule will cover some
-      WB-only memory, but since UC is non-speculative, the processor will
-      never generate an uncacheable reference to the WB-only areas unless
-      the driver explicitly touches them.
-
-    mmap of 0x0-0xFFFFF legacy_mem by "X"
-
-      If the EFI memory map reports that the entire range supports the
-      same attributes, we can allow the mmap (and we will prefer WB if
-      supported, as is the case with HP sx[12]000 machines with VGA
-      disabled).
-
-      If EFI reports the range as partly WB and partly UC (as on sx[12]000
-      machines with VGA enabled), we must fail the mmap because there's no
-      safe attribute to use.
-
-      If EFI reports some of the range but not all (as on Intel firmware
-      that doesn't report the VGA frame buffer at all), we should fail the
-      mmap and force the user to map just the specific region of interest.
-
-    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
-
-      The EFI memory map reports the following attributes:
-        0x00000-0xFFFFF WB only (no VGA MMIO hole)
-
-      This is a special case of the previous case, and the mmap should
-      fail for the same reason as above.
-
-    read of /sys/devices/.../rom
-
-      For VGA devices, this may cause an ioremap() of 0xC0000.  This
-      used to be done with a UC mapping, because the VGA frame buffer
-      at 0xA0000 prevents use of a WB granule.  The UC mapping causes
-      an MCA on HP sx[12]000 chipsets.
-
-      We should use WB page table mappings to avoid covering the VGA
-      frame buffer.
-
-NOTES
-
-    [1] SDM rev 2.2, vol 2, sec 4.4.1.
-    [2] SDM rev 2.2, vol 2, sec 4.4.6.
diff --git a/Documentation/ia64/efirtc.rst b/Documentation/ia64/efirtc.rst
new file mode 100644
index 000000000000..2f7ff5026308
--- /dev/null
+++ b/Documentation/ia64/efirtc.rst
@@ -0,0 +1,144 @@
+==========================
+EFI Real Time Clock driver
+==========================
+
+S. Eranian <eranian@hpl.hp.com>
+
+March 2000
+
+1. Introduction
+===============
+
+This document describes the efirtc.c driver provided for
+the IA-64 platform.
+
+The purpose of this driver is to supply an API for kernel and user applications
+to get access to the Time Service offered by EFI version 0.92.
+
+EFI provides 4 calls one can make once the OS is booted: GetTime(),
+SetTime(), GetWakeupTime(), and SetWakeupTime(), which are all supported by
+this driver. We describe those calls as well as the design of the driver in
+the following sections.
+
+2. Design Decisions
+===================
+
+The original idea was to provide a very simple driver to get access to,
+at first, the time of day service. This is required in order to access, in a
+portable way, the CMOS clock. A program like /sbin/hwclock uses such a clock
+to initialize the system view of the time during boot.
+
+Because we wanted to minimize the impact on existing user-level apps using
+the CMOS clock, we decided to expose an API that was very similar to the one
+used today with the legacy RTC driver (drivers/char/rtc.c). However, because
+EFI provides a simpler service, not all ioctl()s are available. Also,
+new ioctl()s have been introduced for things that EFI provides but the
+legacy RTC does not.
+
+EFI uses a slightly different way of representing the time; notably,
+the reference date is different. The year uses the full 4-digit format.
+The Epoch is January 1st 1998. For backward compatibility reasons we don't
+expose this new way of representing time. Instead we use something very
+similar to the struct tm, i.e. struct rtc_time, as used by hwclock.
+One of the reasons for doing it this way is to allow for EFI to still evolve
+without necessarily impacting any of the user applications. The decoupling
+enables flexibility and permits writing wrapper code in case things change.
+
+The driver exposes two interfaces, one via the device file and a set of
+ioctl()s. The other is read-only via the /proc filesystem.
+
+As of today we don't offer a /proc/sys interface.
+
+To allow for a uniform interface between the legacy RTC and EFI time service,
+we have created the include/linux/rtc.h header file to contain only the
+"public" API of the two drivers.  The specifics of the legacy RTC are still
+in include/linux/mc146818rtc.h.
+
+
+3. Time of day service
+======================
+
+This part of the driver gives access to the time of day service of EFI.
+It provides two ioctl()s, compatible with the legacy RTC calls:
+
+	Read the CMOS clock::
+
+		ioctl(d, RTC_RD_TIME, &rtc);
+
+	Write the CMOS clock::
+
+		ioctl(d, RTC_SET_TIME, &rtc);
+
+The rtc is a pointer to a data structure defined in rtc.h which is close
+to a struct tm::
+
+  struct rtc_time {
+          int tm_sec;
+          int tm_min;
+          int tm_hour;
+          int tm_mday;
+          int tm_mon;
+          int tm_year;
+          int tm_wday;
+          int tm_yday;
+          int tm_isdst;
+  };
+
+The driver takes care of converting back and forth between the EFI time and
+this format.
+
+Those two ioctl()s can be exercised with the hwclock command:
+
+For reading::
+
+	# /sbin/hwclock --show
+	Mon Mar  6 15:32:32 2000  -0.910248 seconds
+
+For setting::
+
+	# /sbin/hwclock --systohc
+
+Root privileges are required to be able to set the time of day.
+
+4. Wakeup Alarm service
+=======================
+
+EFI provides an API by which one can program when a machine should wake up,
+i.e. reboot. This is very different from the alarm provided by the legacy
+RTC which is some kind of interval timer alarm. For this reason we don't use
+the same ioctl()s to get access to the service. Instead we have
+introduced 2 new ioctl()s to the interface of an RTC.
+
+These 2 ioctl()s are specific to the EFI driver:
+
+	Read the current state of the alarm::
+
+		ioctl(d, RTC_WKALM_RD, &wkt)
+
+	Set the alarm or change its status::
+
+		ioctl(d, RTC_WKALM_SET, &wkt)
+
+The wkt structure encapsulates a struct rtc_time + 2 extra fields to get
+status information::
+
+  struct rtc_wkalrm {
+
+          unsigned char enabled; /* =1 if alarm is enabled */
+          unsigned char pending; /* =1 if alarm is pending  */
+
+          struct rtc_time time;
+  };
+
+As of today, none of the existing user-level apps supports this feature.
+However, writing such a program should not be hard using those two
+ioctl()s.
+
+Root privileges are required to be able to set the alarm.
+
+5. References
+=============
+
+Check out the following web site for more information on EFI:
+
+http://developer.intel.com/technology/efi/
diff --git a/Documentation/ia64/efirtc.txt b/Documentation/ia64/efirtc.txt
deleted file mode 100644
index 057e6bebda8f..000000000000
--- a/Documentation/ia64/efirtc.txt
+++ /dev/null
@@ -1,128 +0,0 @@
-EFI Real Time Clock driver
--------------------------------
-S. Eranian <eranian@hpl.hp.com>
-March 2000
-
-I/ Introduction
-
-This document describes the efirtc.c driver has provided for
-the IA-64 platform. 
-
-The purpose of this driver is to supply an API for kernel and user applications
-to get access to the Time Service offered by EFI version 0.92.
-
-EFI provides 4 calls one can make once the OS is booted: GetTime(),
-SetTime(), GetWakeupTime(), SetWakeupTime() which are all supported by this
-driver. We describe those calls as well the design of the driver in the
-following sections.
-
-II/ Design Decisions
-
-The original ideas was to provide a very simple driver to get access to, 
-at first, the time of day service. This is required in order to access, in a 
-portable way, the CMOS clock. A program like /sbin/hwclock uses such a clock 
-to initialize the system view of the time during boot.
-
-Because we wanted to minimize the impact on existing user-level apps using
-the CMOS clock, we decided to expose an API that was very similar to the one
-used today with the legacy RTC driver (driver/char/rtc.c). However, because 
-EFI provides a simpler services, not all ioctl() are available. Also
-new ioctl()s have been introduced for things that EFI provides but not the 
-legacy.
-
-EFI uses a slightly different way of representing the time, noticeably
-the reference date is different. Year is the using the full 4-digit format.
-The Epoch is January 1st 1998. For backward compatibility reasons we don't
-expose this new way of representing time. Instead we use something very 
-similar to the struct tm, i.e. struct rtc_time, as used by hwclock.
-One of the reasons for doing it this way is to allow for EFI to still evolve
-without necessarily impacting any of the user applications. The decoupling
-enables flexibility and permits writing wrapper code is ncase things change.
-
-The driver exposes two interfaces, one via the device file and a set of
-ioctl()s. The other is read-only via the /proc filesystem. 
-
-As of today we don't offer a /proc/sys interface.
-
-To allow for a uniform interface between the legacy RTC and EFI time service,
-we have created the include/linux/rtc.h header file to contain only the 
-"public" API of the two drivers.  The specifics of the legacy RTC are still 
-in include/linux/mc146818rtc.h.
-
- 
-III/ Time of day service
-
-The part of the driver gives access to the time of day service of EFI.
-Two ioctl()s, compatible with the legacy RTC calls:
-
-	Read the CMOS clock: ioctl(d, RTC_RD_TIME, &rtc);
-
-	Write the CMOS clock: ioctl(d, RTC_SET_TIME, &rtc);
-
-The rtc is a pointer to a data structure defined in rtc.h which is close
-to a struct tm:
-
-struct rtc_time {
-        int tm_sec;
-        int tm_min;
-        int tm_hour;
-        int tm_mday;
-        int tm_mon;
-        int tm_year;
-        int tm_wday;
-        int tm_yday;
-        int tm_isdst;
-};
-
-The driver takes care of converting back an forth between the EFI time and
-this format.
-
-Those two ioctl()s can be exercised with the hwclock command:
-
-For reading:
-# /sbin/hwclock --show
-Mon Mar  6 15:32:32 2000  -0.910248 seconds
-
-For setting:
-# /sbin/hwclock --systohc
-
-Root privileges are required to be able to set the time of day.
-
-IV/ Wakeup Alarm service
-
-EFI provides an API by which one can program when a machine should wakeup,
-i.e. reboot. This is very different from the alarm provided by the legacy
-RTC which is some kind of interval timer alarm. For this reason we don't use
-the same ioctl()s to get access to the service. Instead we have
-introduced 2 news ioctl()s to the interface of an RTC. 
-
-We have added 2 new ioctl()s that are specific to the EFI driver:
-
-	Read the current state of the alarm
-	ioctl(d, RTC_WKLAM_RD, &wkt)
-
-	Set the alarm or change its status
-	ioctl(d, RTC_WKALM_SET, &wkt)
-
-The wkt structure encapsulates a struct rtc_time + 2 extra fields to get 
-status information:
-	
-struct rtc_wkalrm {
-
-        unsigned char enabled; /* =1 if alarm is enabled */
-        unsigned char pending; /* =1 if alarm is pending  */
-
-        struct rtc_time time;
-} 
-
-As of today, none of the existing user-level apps supports this feature.
-However writing such a program should be hard by simply using those two 
-ioctl(). 
-
-Root privileges are required to be able to set the alarm.
-
-V/ References.
-
-Checkout the following Web site for more information on EFI:
-
-http://developer.intel.com/technology/efi/
diff --git a/Documentation/ia64/err_inject.rst b/Documentation/ia64/err_inject.rst
new file mode 100644
index 000000000000..900f71e93a29
--- /dev/null
+++ b/Documentation/ia64/err_inject.rst
@@ -0,0 +1,1067 @@
+========================================
+IPF Machine Check (MC) error inject tool
+========================================
+
+The IPF Machine Check (MC) error inject tool is used to inject MC
+errors from Linux. The tool is a test bed for the IPF MC workflow, including
+hardware correctable error handling, OS recoverable error handling, MC
+event logging, etc.
+
+The tool includes two parts: a kernel driver and a user application
+sample. The driver provides an interface to PAL to inject errors
+and query error injection capabilities. The driver code is in
+arch/ia64/kernel/err_inject.c. The application sample (shown below)
+provides a combination of various errors and calls the driver's interface
+(sysfs interface) to inject errors or query error injection capabilities.
+
+The tool can be used to test Intel IPF machine MC handling capabilities.
+It's especially useful for people who cannot access a hardware MC injection
+tool to inject errors. It's also very useful to integrate with other
+software test suites to do stress testing on IPF.
+
+Below is a sample application as part of the whole tool. The sample
+can be used as a working test tool, or it can be expanded to include
+more features. It can also be integrated into a library or other user
+application for more thorough testing.
+
+The sample application takes err.conf as error configuration input. The
+code compiles with GCC. After you install the err_inject driver, you can run
+this sample application to inject errors.
+
+Errata: Itanium 2 Processors Specification Update lists some errata against
+the pal_mc_error_inject PAL procedure. The following err.conf has been tested
+on the latest Montecito PAL.
+
+err.conf::
+
+  #This is the configuration file for err_inject_tool.
+  #The format of each line is:
+  #cpu, loop, interval, err_type_info, err_struct_info, err_data_buffer
+  #where
+  #	cpu: logical cpu number the error will be injected on.
+  #	loop: number of times the error will be injected.
+  #	interval: in seconds; one error is injected every interval.
+  #	err_type_info, err_struct_info: PAL parameters.
+  #
+  #Note: All values are hex, with or without a 0x prefix.
+
+
+  #On cpu2, inject only total 0x10 errors, interval 5 seconds
+  #corrected, data cache, hier-2, physical addr(assigned by tool code).
+  #working on Montecito latest PAL.
+  2, 10, 5, 4101, 95
+
+  #On cpu4, inject and consume total 0x10 errors, interval 5 seconds
+  #corrected, data cache, hier-2, physical addr(assigned by tool code).
+  #working on Montecito latest PAL.
+  4, 10, 5, 4109, 95
+
+  #On cpu15, inject and consume total 0x10 errors, interval 5 seconds
+  #recoverable, DTR0, hier-2.
+  #working on Montecito latest PAL.
+  0xf, 0x10, 5, 4249, 15
+
+The sample application source code:
+
+err_injection_tool.c::
+
+  /*
+   * This program is free software; you can redistribute it and/or modify
+   * it under the terms of the GNU General Public License as published by
+   * the Free Software Foundation; either version 2 of the License, or
+   * (at your option) any later version.
+   *
+   * This program is distributed in the hope that it will be useful, but
+   * WITHOUT ANY WARRANTY; without even the implied warranty of
+   * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+   * NON INFRINGEMENT.  See the GNU General Public License for more
+   * details.
+   *
+   * You should have received a copy of the GNU General Public License
+   * along with this program; if not, write to the Free Software
+   * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+   *
+   * Copyright (C) 2006 Intel Co
+   *	Fenghua Yu <fenghua.yu@intel.com>
+   *
+   */
+  #include <sys/types.h>
+  #include <sys/stat.h>
+  #include <fcntl.h>
+  #include <stdio.h>
+  #include <sched.h>
+  #include <unistd.h>
+  #include <stdlib.h>
+  #include <stdarg.h>
+  #include <string.h>
+  #include <errno.h>
+  #include <time.h>
+  #include <sys/ipc.h>
+  #include <sys/sem.h>
+  #include <sys/wait.h>
+  #include <sys/mman.h>
+  #include <sys/shm.h>
+
+  #define MAX_FN_SIZE 		256
+  #define MAX_BUF_SIZE 		256
+  #define DATA_BUF_SIZE 		256
+  #define NR_CPUS 		512
+  #define MAX_TASK_NUM		2048
+  #define MIN_INTERVAL		5	// seconds
+  #define	ERR_DATA_BUFFER_SIZE 	3	// Three 8-byte.
+  #define PARA_FIELD_NUM		5
+  #define MASK_SIZE		(NR_CPUS/64)
+  #define PATH_FORMAT "/sys/devices/system/cpu/cpu%d/err_inject/"
+
+  int sched_setaffinity(pid_t pid, unsigned int len, unsigned long *mask);
+
+  int verbose;
+  #define vbprintf if (verbose) printf
+
+  int log_info(int cpu, const char *fmt, ...)
+  {
+	FILE *log;
+	char fn[MAX_FN_SIZE];
+	char buf[MAX_BUF_SIZE];
+	va_list args;
+
+	sprintf(fn, "%d.log", cpu);
+	log=fopen(fn, "a+");
+	if (log==NULL) {
+		perror("Error open:");
+		return -1;
+	}
+
+	/* Format once: a va_list must not be reused after vprintf(). */
+	memset(buf, 0, MAX_BUF_SIZE);
+	va_start(args, fmt);
+	vsnprintf(buf, sizeof(buf), fmt, args);
+	va_end(args);
+	fputs(buf, stdout);
+	fwrite(buf, strlen(buf), 1, log);
+	fclose(log);
+
+	return 0;
+  }
+
+  typedef unsigned long u64;
+  typedef unsigned int  u32;
+
+  typedef union err_type_info_u {
+	struct {
+		u64	mode		: 3,	/* 0-2 */
+			err_inj		: 3,	/* 3-5 */
+			err_sev		: 2,	/* 6-7 */
+			err_struct	: 5,	/* 8-12 */
+			struct_hier	: 3,	/* 13-15 */
+			reserved	: 48;	/* 16-63 */
+	} err_type_info_u;
+	u64	err_type_info;
+  } err_type_info_t;
+
+  typedef union err_struct_info_u {
+	struct {
+		u64	siv		: 1,	/* 0	 */
+			c_t		: 2,	/* 1-2	 */
+			cl_p		: 3,	/* 3-5	 */
+			cl_id		: 3,	/* 6-8	 */
+			cl_dp		: 1,	/* 9	 */
+			reserved1	: 22,	/* 10-31 */
+			tiv		: 1,	/* 32	 */
+			trigger		: 4,	/* 33-36 */
+			trigger_pl 	: 3,	/* 37-39 */
+			reserved2 	: 24;	/* 40-63 */
+	} err_struct_info_cache;
+	struct {
+		u64	siv		: 1,	/* 0	 */
+			tt		: 2,	/* 1-2	 */
+			tc_tr		: 2,	/* 3-4	 */
+			tr_slot		: 8,	/* 5-12	 */
+			reserved1	: 19,	/* 13-31 */
+			tiv		: 1,	/* 32	 */
+			trigger		: 4,	/* 33-36 */
+			trigger_pl 	: 3,	/* 37-39 */
+			reserved2 	: 24;	/* 40-63 */
+	} err_struct_info_tlb;
+	struct {
+		u64	siv		: 1,	/* 0	 */
+			regfile_id	: 4,	/* 1-4	 */
+			reg_num		: 7,	/* 5-11	 */
+			reserved1	: 20,	/* 12-31 */
+			tiv		: 1,	/* 32	 */
+			trigger		: 4,	/* 33-36 */
+			trigger_pl 	: 3,	/* 37-39 */
+			reserved2 	: 24;	/* 40-63 */
+	} err_struct_info_register;
+	struct {
+		u64	reserved;
+	} err_struct_info_bus_processor_interconnect;
+	u64	err_struct_info;
+  } err_struct_info_t;
+
+  typedef union err_data_buffer_u {
+	struct {
+		u64	trigger_addr;		/* 0-63		*/
+		u64	inj_addr;		/* 64-127 	*/
+		u64	way		: 5,	/* 128-132	*/
+			index		: 20,	/* 133-152	*/
+					: 39;	/* 153-191	*/
+	} err_data_buffer_cache;
+	struct {
+		u64	trigger_addr;		/* 0-63		*/
+		u64	inj_addr;		/* 64-127 	*/
+		u64	way		: 5,	/* 128-132	*/
+			index		: 20,	/* 133-152	*/
+			reserved	: 39;	/* 153-191	*/
+	} err_data_buffer_tlb;
+	struct {
+		u64	trigger_addr;		/* 0-63		*/
+	} err_data_buffer_register;
+	struct {
+		u64	reserved;		/* 0-63		*/
+	} err_data_buffer_bus_processor_interconnect;
+	u64 err_data_buffer[ERR_DATA_BUFFER_SIZE];
+  } err_data_buffer_t;
+
+  typedef union capabilities_u {
+	struct {
+		u64	i		: 1,
+			d		: 1,
+			rv		: 1,
+			tag		: 1,
+			data		: 1,
+			mesi		: 1,
+			dp		: 1,
+			reserved1	: 3,
+			pa		: 1,
+			va		: 1,
+			wi		: 1,
+			reserved2	: 20,
+			trigger		: 1,
+			trigger_pl	: 1,
+			reserved3	: 30;
+	} capabilities_cache;
+	struct {
+		u64	d		: 1,
+			i		: 1,
+			rv		: 1,
+			tc		: 1,
+			tr		: 1,
+			reserved1	: 27,
+			trigger		: 1,
+			trigger_pl	: 1,
+			reserved2	: 30;
+	} capabilities_tlb;
+	struct {
+		u64	gr_b0		: 1,
+			gr_b1		: 1,
+			fr		: 1,
+			br		: 1,
+			pr		: 1,
+			ar		: 1,
+			cr		: 1,
+			rr		: 1,
+			pkr		: 1,
+			dbr		: 1,
+			ibr		: 1,
+			pmc		: 1,
+			pmd		: 1,
+			reserved1	: 3,
+			regnum		: 1,
+			reserved2	: 15,
+			trigger		: 1,
+			trigger_pl	: 1,
+			reserved3	: 30;
+	} capabilities_register;
+	struct {
+		u64	reserved;
+	} capabilities_bus_processor_interconnect;
+  } capabilities_t;
+
+  typedef struct resources_s {
+	u64	ibr0		: 1,
+		ibr2		: 1,
+		ibr4		: 1,
+		ibr6		: 1,
+		dbr0		: 1,
+		dbr2		: 1,
+		dbr4		: 1,
+		dbr6		: 1,
+		reserved	: 48;
+  } resources_t;
+
+
+  long get_page_size(void)
+  {
+	long page_size=sysconf(_SC_PAGESIZE);
+	return page_size;
+  }
+
+  #define PAGE_SIZE (get_page_size()==-1?0x4000:get_page_size())
+  #define SHM_SIZE (2*PAGE_SIZE*NR_CPUS)
+  #define SHM_VA 0x2000000100000000
+
+  int shmid;
+  void *shmaddr;
+
+  int create_shm(void)
+  {
+	key_t key;
+	char fn[MAX_FN_SIZE];
+
+	/* cpu0 is always existing */
+	sprintf(fn, PATH_FORMAT, 0);
+	if ((key = ftok(fn, 's')) == -1) {
+		perror("ftok");
+		return -1;
+	}
+
+	shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT);
+	if (shmid == -1) {
+		if (errno==EEXIST) {
+			shmid = shmget(key, SHM_SIZE, 0);
+			if (shmid == -1) {
+				perror("shmget");
+				return -1;
+			}
+		}
+		else {
+			perror("shmget");
+			return -1;
+		}
+	}
+	vbprintf("shmid=%d\n", shmid);
+
+	/* connect to the segment: */
+	shmaddr = shmat(shmid, (void *)SHM_VA, 0);
+	if (shmaddr == (void*)-1) {
+		perror("shmat");
+		return -1;
+	}
+
+	memset(shmaddr, 0, SHM_SIZE);
+	mlock(shmaddr, SHM_SIZE);
+
+	return 0;
+  }
+
+  int free_shm(void)
+  {
+	munlock(shmaddr, SHM_SIZE);
+	shmdt(shmaddr);
+	shmctl(shmid, IPC_RMID, NULL);	/* shmid names shared memory, not a semaphore */
+
+	return 0;
+  }
+
+  #ifdef _SEM_SEMUN_UNDEFINED
+  union semun
+  {
+	int val;
+	struct semid_ds *buf;
+	unsigned short int *array;
+	struct seminfo *__buf;
+  };
+  #endif
+
+  u32 mode=1; /* 1: physical mode; 2: virtual mode. */
+  int one_lock=1;
+  key_t key[NR_CPUS];
+  int semid[NR_CPUS];
+
+  int create_sem(int cpu)
+  {
+	union semun arg;
+	char fn[MAX_FN_SIZE];
+	int sid;
+
+	sprintf(fn, PATH_FORMAT, cpu);
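+	/* PATH_FORMAT ends with '/'; append in place instead of passing fn
+	 * to sprintf() as both source and destination.
+	 */
+	strcat(fn, "err_type_info");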
+	if ((key[cpu] = ftok(fn, 'e')) == -1) {
+		perror("ftok");
+		return -1;
+	}
+
+	if (semid[cpu]!=0)
+		return 0;
+
+	/* clear old semaphore */
+	if ((sid = semget(key[cpu], 1, 0)) != -1)
+		semctl(sid, 0, IPC_RMID);
+
+	/* get one semaphore */
+	if ((semid[cpu] = semget(key[cpu], 1, IPC_CREAT | IPC_EXCL)) == -1) {
+		perror("semget");
+		printf("Please remove semaphore with key=0x%lx, then run the tool.\n",
+			(u64)key[cpu]);
+		return -1;
+	}
+
+	vbprintf("semid[%d]=0x%lx, key[%d]=%lx\n",cpu,(u64)semid[cpu],cpu,
+		(u64)key[cpu]);
+	/* initialize the semaphore to 1: */
+	arg.val = 1;
+	if (semctl(semid[cpu], 0, SETVAL, arg) == -1) {
+		perror("semctl");
+		return -1;
+	}
+
+	return 0;
+  }
+
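+  static int lock(int cpu)
+  {
+	struct sembuf lock;
+
+	/* Each semaphore set holds a single semaphore, so its index is 0. */
+	lock.sem_num = 0;
+	lock.sem_op = -1;	/* acquire: wait for the value 1, then take it */
+	lock.sem_flg = SEM_UNDO;	/* released automatically if we exit */
+	semop(semid[cpu], &lock, 1);
+
+	return 0;
+  }
+
+  static int unlock(int cpu)
+  {
+	struct sembuf unlock;
+
+	unlock.sem_num = 0;
+	unlock.sem_op = 1;	/* release: put the value back to 1 */
+	unlock.sem_flg = SEM_UNDO;
+	semop(semid[cpu], &unlock, 1);
+
+	return 0;
+  }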
+
+  void free_sem(int cpu)
+  {
+	semctl(semid[cpu], 0, IPC_RMID);
+  }
+
+  int wr_multi(char *fn, unsigned long *data, int size)
+  {
+	int fd;
+	char buf[MAX_BUF_SIZE];
+	int ret;
+
+	if (size==1)
+		sprintf(buf, "%lx", *data);
+	else if (size==3)
+		sprintf(buf, "%lx,%lx,%lx", data[0], data[1], data[2]);
+	else {
+		fprintf(stderr,"write to file with wrong size!\n");
+		return -1;
+	}
+
+	fd=open(fn, O_RDWR);
+	if (fd<0) {
+		perror("Error:");
+		return -1;
+	}
+	/* Write only the formatted string, not the whole buffer. */
+	ret=write(fd, buf, strlen(buf));
+	close(fd);
+	return ret;
+  }
+
+  int wr(char *fn, unsigned long data)
+  {
+	return wr_multi(fn, &data, 1);
+  }
+
+  int rd(char *fn, unsigned long *data)
+  {
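+	int fd;
+	ssize_t n;
+	char buf[MAX_BUF_SIZE];
+
+	fd=open(fn, O_RDONLY);
+	if (fd<0) {
+		perror("Error:");
+		return -1;
+	}
+	/* Leave room for the terminating NUL that strtoul() relies on. */
+	n=read(fd, buf, MAX_BUF_SIZE-1);
+	if (n<0) {
+		perror("Error:");
+		close(fd);
+		return -1;
+	}
+	buf[n]='\0';
+	*data=strtoul(buf, NULL, 16);
+	close(fd);
+	return 0;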
+  }
+
+  int rd_status(char *path, int *status)
+  {
+	char fn[MAX_FN_SIZE];
+	u64 val;
+
+	sprintf(fn, "%s/status", path);
+	/* rd() stores a full u64, so read into a temporary, not the int. */
+	if (rd(fn, &val)<0) {
+		perror("status reading error.\n");
+		return -1;
+	}
+	*status=(int)val;
+
+	return 0;
+  }
+
+  int rd_capabilities(char *path, u64 *capabilities)
+  {
+	char fn[MAX_FN_SIZE];
+	sprintf(fn, "%s/capabilities", path);
+	if (rd(fn, capabilities)<0) {
+		perror("capabilities reading error.\n");
+		return -1;
+	}
+
+	return 0;
+  }
+
+  int rd_all(char *path)
+  {
+	unsigned long err_type_info, err_struct_info, err_data_buffer;
+	int status;
+	unsigned long capabilities, resources;
+	char fn[MAX_FN_SIZE];
+
+	sprintf(fn, "%s/err_type_info", path);
+	if (rd(fn, &err_type_info)<0) {
+		perror("err_type_info reading error.\n");
+		return -1;
+	}
+	printf("err_type_info=%lx\n", err_type_info);
+
+	sprintf(fn, "%s/err_struct_info", path);
+	if (rd(fn, &err_struct_info)<0) {
+		perror("err_struct_info reading error.\n");
+		return -1;
+	}
+	printf("err_struct_info=%lx\n", err_struct_info);
+
+	sprintf(fn, "%s/err_data_buffer", path);
+	if (rd(fn, &err_data_buffer)<0) {
+		perror("err_data_buffer reading error.\n");
+		return -1;
+	}
+	printf("err_data_buffer=%lx\n", err_data_buffer);
+
+	/* rd_status() builds the "<path>/status" file name itself. */
+	if (rd_status(path, &status)<0)
+		return -1;
+	printf("status=%d\n", status);
+
+	sprintf(fn, "%s/capabilities", path);
+	if (rd(fn,&capabilities)<0) {
+		perror("capabilities reading error.\n");
+		return -1;
+	}
+	printf("capabilities=%lx\n", capabilities);
+
+	sprintf(fn, "%s/resources", path);
+	if (rd(fn, &resources)<0) {
+		perror("resources reading error.\n");
+		return -1;
+	}
+	printf("resources=%lx\n", resources);
+
+	return 0;
+  }
+
+  int query_capabilities(char *path, err_type_info_t err_type_info,
+			u64 *capabilities)
+  {
+	char fn[MAX_FN_SIZE];
+	err_struct_info_t err_struct_info;
+	err_data_buffer_t err_data_buffer;
+
+	err_struct_info.err_struct_info=0;
+	memset(err_data_buffer.err_data_buffer, -1, ERR_DATA_BUFFER_SIZE*8);
+
+	sprintf(fn, "%s/err_type_info", path);
+	wr(fn, err_type_info.err_type_info);
+	sprintf(fn, "%s/err_struct_info", path);
+	wr(fn, 0x0);
+	sprintf(fn, "%s/err_data_buffer", path);
+	wr_multi(fn, err_data_buffer.err_data_buffer, ERR_DATA_BUFFER_SIZE);
+
+	// Fire pal_mc_error_inject procedure.
+	sprintf(fn, "%s/call_start", path);
+	wr(fn, mode);
+
+	if (rd_capabilities(path, capabilities)<0)
+		return -1;
+
+	return 0;
+  }
+
+  int query_all_capabilities()
+  {
+	int status;
+	err_type_info_t err_type_info;
+	int err_sev, err_struct, struct_hier;
+	int cap=0;
+	u64 capabilities;
+	char path[MAX_FN_SIZE];
+
+	err_type_info.err_type_info=0;			// Initial
+	err_type_info.err_type_info_u.mode=0;		// Query mode;
+	err_type_info.err_type_info_u.err_inj=0;
+
+	printf("All capabilities implemented in pal_mc_error_inject:\n");
+	sprintf(path, PATH_FORMAT ,0);
+	for (err_sev=0;err_sev<3;err_sev++)
+		for (err_struct=0;err_struct<5;err_struct++)
+			for (struct_hier=0;struct_hier<5;struct_hier++)
+	{
+		status=-1;
+		capabilities=0;
+		err_type_info.err_type_info_u.err_sev=err_sev;
+		err_type_info.err_type_info_u.err_struct=err_struct;
+		err_type_info.err_type_info_u.struct_hier=struct_hier;
+
+		if (query_capabilities(path, err_type_info, &capabilities)<0)
+			continue;
+
+		if (rd_status(path, &status)<0)
+			continue;
+
+		if (status==0) {
+			cap=1;
+			printf("For err_sev=%d, err_struct=%d, struct_hier=%d: ",
+				err_sev, err_struct, struct_hier);
+			printf("capabilities 0x%lx\n", capabilities);
+		}
+	}
+	if (!cap) {
+		printf("No capabilities supported.\n");
+		return 0;
+	}
+
+	return 0;
+  }
+
+  int err_inject(int cpu, char *path, err_type_info_t err_type_info,
+		err_struct_info_t err_struct_info,
+		err_data_buffer_t err_data_buffer)
+  {
+	int status;
+	char fn[MAX_FN_SIZE];
+
+	log_info(cpu, "err_type_info=%lx, err_struct_info=%lx, ",
+		err_type_info.err_type_info,
+		err_struct_info.err_struct_info);
+	log_info(cpu,"err_data_buffer=[%lx,%lx,%lx]\n",
+		err_data_buffer.err_data_buffer[0],
+		err_data_buffer.err_data_buffer[1],
+		err_data_buffer.err_data_buffer[2]);
+	sprintf(fn, "%s/err_type_info", path);
+	wr(fn, err_type_info.err_type_info);
+	sprintf(fn, "%s/err_struct_info", path);
+	wr(fn, err_struct_info.err_struct_info);
+	sprintf(fn, "%s/err_data_buffer", path);
+	wr_multi(fn, err_data_buffer.err_data_buffer, ERR_DATA_BUFFER_SIZE);
+
+	// Fire pal_mc_error_inject procedure.
+	sprintf(fn, "%s/call_start", path);
+	wr(fn,mode);
+
+	if (rd_status(path, &status)<0) {
+		vbprintf("fail: read status\n");
+		return -100;
+	}
+
+	if (status!=0) {
+		log_info(cpu, "fail: status=%d\n", status);
+		return status;
+	}
+
+	return status;
+  }
+
+  static int construct_data_buf(char *path, err_type_info_t err_type_info,
+		err_struct_info_t err_struct_info,
+		err_data_buffer_t *err_data_buffer,
+		void *va1)
+  {
+	char fn[MAX_FN_SIZE];
+	u64 virt_addr=0, phys_addr=0;
+
+	vbprintf("va1=%lx\n", (u64)va1);
+	memset(&err_data_buffer->err_data_buffer_cache, 0, ERR_DATA_BUFFER_SIZE*8);
+
+	switch (err_type_info.err_type_info_u.err_struct) {
+		case 1: // Cache
+			switch (err_struct_info.err_struct_info_cache.cl_id) {
+				case 1: //Virtual addr
+					err_data_buffer->err_data_buffer_cache.inj_addr=(u64)va1;
+					break;
+				case 2: //Phys addr
+					sprintf(fn, "%s/virtual_to_phys", path);
+					virt_addr=(u64)va1;
+					if (wr(fn,virt_addr)<0)
+						return -1;
+					rd(fn, &phys_addr);
+					err_data_buffer->err_data_buffer_cache.inj_addr=phys_addr;
+					break;
+				default:
+					printf("Not supported cl_id\n");
+					break;
+			}
+			break;
+		case 2: //  TLB
+			break;
+		case 3: //  Register file
+			break;
+		case 4: //  Bus/system interconnect
+		default:
+			printf("Not supported err_struct\n");
+			break;
+	}
+
+	return 0;
+  }
+
+  typedef struct {
+	u64 cpu;
+	u64 loop;
+	u64 interval;
+	u64 err_type_info;
+	u64 err_struct_info;
+	u64 err_data_buffer[ERR_DATA_BUFFER_SIZE];
+  } parameters_t;
+
+  parameters_t line_para;
+  int para;
+
+  static int empty_data_buffer(u64 *err_data_buffer)
+  {
+	int empty=1;
+	int i;
+
+	for (i=0;i<ERR_DATA_BUFFER_SIZE; i++)
+	   if (err_data_buffer[i]!=-1)
+		empty=0;
+
+	return empty;
+  }
+
+  int err_inj()
+  {
+	err_type_info_t err_type_info;
+	err_struct_info_t err_struct_info;
+	err_data_buffer_t err_data_buffer;
+	int count;
+	FILE *fp;
+	unsigned long cpu, loop, interval, err_type_info_conf, err_struct_info_conf;
+	u64 err_data_buffer_conf[ERR_DATA_BUFFER_SIZE];
+	int num;
+	int i;
+	char path[MAX_FN_SIZE];
+	parameters_t parameters[MAX_TASK_NUM]={};
+	pid_t child_pid[MAX_TASK_NUM];
+	time_t current_time;
+	int status;
+
+	if (!para) {
+	    fp=fopen("err.conf", "r");
+	    if (fp==NULL) {
+		perror("Error opening err.conf");
+		return -1;
+	    }
+
+	    num=0;
+	    while (!feof(fp)) {
+		char buf[256];
+		memset(buf,0,256);
+		fgets(buf, 256, fp);
+		count=sscanf(buf, "%lx, %lx, %lx, %lx, %lx, %lx, %lx, %lx\n",
+				&cpu, &loop, &interval,&err_type_info_conf,
+				&err_struct_info_conf,
+				&err_data_buffer_conf[0],
+				&err_data_buffer_conf[1],
+				&err_data_buffer_conf[2]);
+		if (count!=PARA_FIELD_NUM+3) {
+			err_data_buffer_conf[0]=-1;
+			err_data_buffer_conf[1]=-1;
+			err_data_buffer_conf[2]=-1;
+			count=sscanf(buf, "%lx, %lx, %lx, %lx, %lx\n",
+				&cpu, &loop, &interval,&err_type_info_conf,
+				&err_struct_info_conf);
+			if (count!=PARA_FIELD_NUM)
+				continue;
+		}
+
+		parameters[num].cpu=cpu;
+		parameters[num].loop=loop;
+		parameters[num].interval= interval>MIN_INTERVAL
+					  ?interval:MIN_INTERVAL;
+		parameters[num].err_type_info=err_type_info_conf;
+		parameters[num].err_struct_info=err_struct_info_conf;
+		memcpy(parameters[num++].err_data_buffer,
+			err_data_buffer_conf,ERR_DATA_BUFFER_SIZE*8) ;
+
+		if (num>=MAX_TASK_NUM)
+			break;
+	    }
+	}
+	else {
+		parameters[0].cpu=line_para.cpu;
+		parameters[0].loop=line_para.loop;
+		parameters[0].interval= line_para.interval>MIN_INTERVAL
+					  ?line_para.interval:MIN_INTERVAL;
+		parameters[0].err_type_info=line_para.err_type_info;
+		parameters[0].err_struct_info=line_para.err_struct_info;
+		memcpy(parameters[0].err_data_buffer,
+			line_para.err_data_buffer,ERR_DATA_BUFFER_SIZE*8) ;
+
+		num=1;
+	}
+
+	/* Create semaphore: If one_lock, one semaphore for all processors.
+	   Otherwise, one semaphore for each processor. */
+	if (one_lock) {
+		if (create_sem(0)) {
+			printf("Cannot create semaphore...exit\n");
+			free_sem(0);
+			return -1;
+		}
+	}
+	else {
+		for (i=0;i<num;i++) {
+		   if (create_sem(parameters[i].cpu)) {
+			printf("Cannot create semaphore for cpu%d...exit\n",
+				(int)parameters[i].cpu);
+			free_sem(parameters[i].cpu);
+			return -1;
+		   }
+		}
+	}
+
+	/* Create a shm segment which will be used to inject/consume errors on.*/
+	if (create_shm()==-1) {
+		printf("Failed to create shm...exit\n");
+		return -1;
+	}
+
+	for (i=0;i<num;i++) {
+		pid_t pid;
+
+		current_time=time(NULL);
+		log_info(parameters[i].cpu, "\nBegin at %s", ctime(&current_time));
+		log_info(parameters[i].cpu, "Configurations:\n");
+		log_info(parameters[i].cpu,"On cpu%ld: loop=%lx, interval=%lx(s)",
+			parameters[i].cpu,
+			parameters[i].loop,
+			parameters[i].interval);
+		log_info(parameters[i].cpu," err_type_info=%lx,err_struct_info=%lx\n",
+			parameters[i].err_type_info,
+			parameters[i].err_struct_info);
+
+		sprintf(path, PATH_FORMAT, (int)parameters[i].cpu);
+		err_type_info.err_type_info=parameters[i].err_type_info;
+		err_struct_info.err_struct_info=parameters[i].err_struct_info;
+		memcpy(err_data_buffer.err_data_buffer,
+			parameters[i].err_data_buffer,
+			ERR_DATA_BUFFER_SIZE*8);
+
+		pid=fork();
+		if (pid==0) {
+			unsigned long mask[MASK_SIZE];
+			int j, k;
+
+			void *va1, *va2;
+
+			/* Allocate two memory areas va1 and va2 in shm */
+			va1=shmaddr+parameters[i].cpu*PAGE_SIZE;
+			va2=shmaddr+parameters[i].cpu*PAGE_SIZE+PAGE_SIZE;
+
+			vbprintf("va1=%lx, va2=%lx\n", (u64)va1, (u64)va2);
+			memset(va1, 0x1, PAGE_SIZE);
+			memset(va2, 0x2, PAGE_SIZE);
+
+			if (empty_data_buffer(err_data_buffer.err_data_buffer))
+				/* If not specified yet, construct data buffer
+				 * with va1
+				 */
+				construct_data_buf(path, err_type_info,
+					err_struct_info, &err_data_buffer,va1);
+
+			for (j=0;j<MASK_SIZE;j++)
+				mask[j]=0;
+
+			cpu=parameters[i].cpu;
+			k = cpu%64;
+			j = cpu/64;
+			mask[j] = 1UL << k;
+
+			if (sched_setaffinity(0, MASK_SIZE*8, mask)==-1) {
+				perror("Error sched_setaffinity:");
+				return -1;
+			}
+
+			for (j=0; j<parameters[i].loop; j++) {
+				log_info(parameters[i].cpu,"Injection ");
+				log_info(parameters[i].cpu,"on cpu%ld: #%d/%ld ",
+
+					parameters[i].cpu,j+1, parameters[i].loop);
+
+				/* Hold the single lock, or this cpu's lock */
+				if (one_lock)
+					lock(0);
+				else
+					lock(parameters[i].cpu);
+
+				if ((status=err_inject(parameters[i].cpu,
+					   path, err_type_info,
+					   err_struct_info, err_data_buffer))
+					   ==0) {
+					/* consume the error for "inject only"*/
+					memcpy(va2, va1, PAGE_SIZE);
+					memcpy(va1, va2, PAGE_SIZE);
+					log_info(parameters[i].cpu,
+						"successful\n");
+				}
+				else {
+					log_info(parameters[i].cpu,"fail:");
+					log_info(parameters[i].cpu,
+						"status=%d\n", status);
+					/* Release whichever lock was taken above */
+					if (one_lock)
+						unlock(0);
+					else
+						unlock(parameters[i].cpu);
+					break;
+				}
+				/* Release the single lock, or this cpu's lock */
+				if (one_lock)
+					unlock(0);
+				else
+					unlock(parameters[i].cpu);
+
+				if (j < parameters[i].loop-1)
+					sleep(parameters[i].interval);
+			}
+			current_time=time(NULL);
+			log_info(parameters[i].cpu, "Done at %s", ctime(&current_time));
+			return 0;
+		}
+		else if (pid<0) {
+			perror("Error fork:");
+			continue;
+		}
+		child_pid[i]=pid;
+	}
+	for (i=0;i<num;i++)
+		waitpid(child_pid[i], NULL, 0);
+
+	if (one_lock)
+		free_sem(0);
+	else
+		for (i=0;i<num;i++)
+			free_sem(parameters[i].cpu);
+
+	printf("All done.\n");
+
+	return 0;
+  }
+
+  void help()
+  {
+	printf("err_inject_tool:\n");
+	printf("\t-q: query all capabilities. default: off\n");
+	printf("\t-m: procedure mode. 1: physical 2: virtual. default: 1\n");
+	printf("\t-i: inject errors. default: off\n");
+	printf("\t-l: one lock per cpu. default: one lock for all\n");
+	printf("\t-e: error parameters:\n");
+	printf("\t\tcpu,loop,interval,err_type_info,err_struct_info[,err_data_buffer[0],err_data_buffer[1],err_data_buffer[2]]\n");
+	printf("\t\t   cpu: logical cpu number the error will be injected on.\n");
+	printf("\t\t   loop: number of times the error will be injected.\n");
+	printf("\t\t   interval: seconds to wait between two injections.\n");
+	printf("\t\t   err_type_info, err_struct_info: PAL parameters.\n");
+	printf("\t\t   err_data_buffer: PAL parameter. Optional. If not present,\n");
+	printf("\t\t                    it's constructed by tool automatically. Be\n");
+	printf("\t\t                    careful to provide err_data_buffer and make\n");
+	printf("\t\t                    sure it's working with the environment.\n");
+	printf("\t    Note: no spaces between error parameters.\n");
+	printf("\t    default: Take error parameters from err.conf instead of command line.\n");
+	printf("\t-v: verbose. default: off\n");
+	printf("\t-h: help\n\n");
+	printf("The tool will take err.conf file as ");
+	printf("input to inject single or multiple errors ");
+	printf("on one or multiple cpus in parallel.\n");
+  }
+
+  int main(int argc, char **argv)
+  {
+	int c;	/* getopt() returns an int; a char cannot reliably hold EOF */
+	int do_err_inj=0;
+	int do_query_all=0;
+	int count;
+	u32 m;
+
+	/* Default one lock for all cpu's */
+	one_lock=1;
+	while ((c = getopt(argc, argv, "m:iqvhle:")) != EOF)
+		switch (c) {
+			case 'm':	/* Procedure mode. 1: phys 2: virt */
+				count=sscanf(optarg, "%x", &m);
+				if (count!=1 || (m!=1 && m!=2)) {
+					printf("Wrong mode number.\n");
+					help();
+					return -1;
+				}
+				mode=m;
+				break;
+			case 'i':	/* Inject errors */
+				do_err_inj=1;
+				break;
+			case 'q':	/* Query */
+				do_query_all=1;
+				break;
+			case 'v':	/* Verbose */
+				verbose=1;
+				break;
+			case 'l':	/* One lock per cpu */
+				one_lock=0;
+				break;
+			case 'e':	/* error arguments */
+				/* Take parameters:
+				 * #cpu, loop, interval, err_type_info, err_struct_info[, err_data_buffer]
+				 * err_data_buffer is optional. Recommend not to specify
+				 * err_data_buffer. Better to use tool to generate it.
+				 */
+				count=sscanf(optarg,
+					"%lx, %lx, %lx, %lx, %lx, %lx, %lx, %lx\n",
+					&line_para.cpu,
+					&line_para.loop,
+					&line_para.interval,
+					&line_para.err_type_info,
+					&line_para.err_struct_info,
+					&line_para.err_data_buffer[0],
+					&line_para.err_data_buffer[1],
+					&line_para.err_data_buffer[2]);
+				if (count!=PARA_FIELD_NUM+3) {
+				    line_para.err_data_buffer[0]=-1;
+				    line_para.err_data_buffer[1]=-1;
+				    line_para.err_data_buffer[2]=-1;
+				    count=sscanf(optarg, "%lx, %lx, %lx, %lx, %lx\n",
+						&line_para.cpu,
+						&line_para.loop,
+						&line_para.interval,
+						&line_para.err_type_info,
+						&line_para.err_struct_info);
+				    if (count!=PARA_FIELD_NUM) {
+					printf("Wrong error arguments.\n");
+					help();
+					return -1;
+				    }
+				}
+				para=1;
+				break;
+			case 'h':
+				help();
+				return 0;
+			default:
+				break;
+		}
+
+	if (do_query_all)
+		query_all_capabilities();
+	if (do_err_inj)
+		err_inj();
+
+	if (!do_query_all &&  !do_err_inj)
+		help();
+
+	return 0;
+  }
diff --git a/Documentation/ia64/err_inject.txt b/Documentation/ia64/err_inject.txt
deleted file mode 100644
index 9f651c181429..000000000000
--- a/Documentation/ia64/err_inject.txt
+++ /dev/null
@@ -1,1068 +0,0 @@
-
-IPF Machine Check (MC) error inject tool
-========================================
-
-IPF Machine Check (MC) error inject tool is used to inject MC
-errors from Linux. The tool is a test bed for IPF MC work flow including
-hardware correctable error handling, OS recoverable error handling, MC
-event logging, etc.
-
-The tool includes two parts: a kernel driver and a user application
-sample. The driver provides interface to PAL to inject error
-and query error injection capabilities. The driver code is in
-arch/ia64/kernel/err_inject.c. The application sample (shown below)
-provides a combination of various errors and calls the driver's interface
-(sysfs interface) to inject errors or query error injection capabilities.
-
-The tool can be used to test Intel IPF machine MC handling capabilities.
-It's especially useful for people who can not access hardware MC injection
-tool to inject error. It's also very useful to integrate with other
-software test suits to do stressful testing on IPF.
-
-Below is a sample application as part of the whole tool. The sample
-can be used as a working test tool. Or it can be expanded to include
-more features. It also can be a integrated into a library or other user
-application to have more thorough test.
-
-The sample application takes err.conf as error configuration input. GCC
-compiles the code. After you install err_inject driver, you can run
-this sample application to inject errors.
-
-Errata: Itanium 2 Processors Specification Update lists some errata against
-the pal_mc_error_inject PAL procedure. The following err.conf has been tested
-on latest Montecito PAL.
-
-err.conf:
-
-#This is configuration file for err_inject_tool.
-#The format of the each line is:
-#cpu, loop, interval, err_type_info, err_struct_info, err_data_buffer
-#where
-#	cpu: logical cpu number the error will be inject in.
-#	loop: times the error will be injected.
-#	interval: In second. every so often one error is injected.
-#	err_type_info, err_struct_info: PAL parameters.
-#
-#Note: All values are hex w/o or w/ 0x prefix.
-
-
-#On cpu2, inject only total 0x10 errors, interval 5 seconds
-#corrected, data cache, hier-2, physical addr(assigned by tool code).
-#working on Montecito latest PAL.
-2, 10, 5, 4101, 95
-
-#On cpu4, inject and consume total 0x10 errors, interval 5 seconds
-#corrected, data cache, hier-2, physical addr(assigned by tool code).
-#working on Montecito latest PAL.
-4, 10, 5, 4109, 95
-
-#On cpu15, inject and consume total 0x10 errors, interval 5 seconds
-#recoverable, DTR0, hier-2.
-#working on Montecito latest PAL.
-0xf, 0x10, 5, 4249, 15
-
-The sample application source code:
-
-err_injection_tool.c:
-
-/*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT.  See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Copyright (C) 2006 Intel Co
- *	Fenghua Yu <fenghua.yu@intel.com>
- *
- */
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <sched.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdarg.h>
-#include <string.h>
-#include <errno.h>
-#include <time.h>
-#include <sys/ipc.h>
-#include <sys/sem.h>
-#include <sys/wait.h>
-#include <sys/mman.h>
-#include <sys/shm.h>
-
-#define MAX_FN_SIZE 		256
-#define MAX_BUF_SIZE 		256
-#define DATA_BUF_SIZE 		256
-#define NR_CPUS 		512
-#define MAX_TASK_NUM		2048
-#define MIN_INTERVAL		5	// seconds
-#define	ERR_DATA_BUFFER_SIZE 	3	// Three 8-byte.
-#define PARA_FIELD_NUM		5
-#define MASK_SIZE		(NR_CPUS/64)
-#define PATH_FORMAT "/sys/devices/system/cpu/cpu%d/err_inject/"
-
-int sched_setaffinity(pid_t pid, unsigned int len, unsigned long *mask);
-
-int verbose;
-#define vbprintf if (verbose) printf
-
-int log_info(int cpu, const char *fmt, ...)
-{
-	FILE *log;
-	char fn[MAX_FN_SIZE];
-	char buf[MAX_BUF_SIZE];
-	va_list args;
-
-	sprintf(fn, "%d.log", cpu);
-	log=fopen(fn, "a+");
-	if (log==NULL) {
-		perror("Error open:");
-		return -1;
-	}
-
-	va_start(args, fmt);
-	vprintf(fmt, args);
-	memset(buf, 0, MAX_BUF_SIZE);
-	vsprintf(buf, fmt, args);
-	va_end(args);
-
-	fwrite(buf, sizeof(buf), 1, log);
-	fclose(log);
-
-	return 0;
-}
-
-typedef unsigned long u64;
-typedef unsigned int  u32;
-
-typedef union err_type_info_u {
-	struct {
-		u64	mode		: 3,	/* 0-2 */
-			err_inj		: 3,	/* 3-5 */
-			err_sev		: 2,	/* 6-7 */
-			err_struct	: 5,	/* 8-12 */
-			struct_hier	: 3,	/* 13-15 */
-			reserved	: 48;	/* 16-63 */
-	} err_type_info_u;
-	u64	err_type_info;
-} err_type_info_t;
-
-typedef union err_struct_info_u {
-	struct {
-		u64	siv		: 1,	/* 0	 */
-			c_t		: 2,	/* 1-2	 */
-			cl_p		: 3,	/* 3-5	 */
-			cl_id		: 3,	/* 6-8	 */
-			cl_dp		: 1,	/* 9	 */
-			reserved1	: 22,	/* 10-31 */
-			tiv		: 1,	/* 32	 */
-			trigger		: 4,	/* 33-36 */
-			trigger_pl 	: 3,	/* 37-39 */
-			reserved2 	: 24;	/* 40-63 */
-	} err_struct_info_cache;
-	struct {
-		u64	siv		: 1,	/* 0	 */
-			tt		: 2,	/* 1-2	 */
-			tc_tr		: 2,	/* 3-4	 */
-			tr_slot		: 8,	/* 5-12	 */
-			reserved1	: 19,	/* 13-31 */
-			tiv		: 1,	/* 32	 */
-			trigger		: 4,	/* 33-36 */
-			trigger_pl 	: 3,	/* 37-39 */
-			reserved2 	: 24;	/* 40-63 */
-	} err_struct_info_tlb;
-	struct {
-		u64	siv		: 1,	/* 0	 */
-			regfile_id	: 4,	/* 1-4	 */
-			reg_num		: 7,	/* 5-11	 */
-			reserved1	: 20,	/* 12-31 */
-			tiv		: 1,	/* 32	 */
-			trigger		: 4,	/* 33-36 */
-			trigger_pl 	: 3,	/* 37-39 */
-			reserved2 	: 24;	/* 40-63 */
-	} err_struct_info_register;
-	struct {
-		u64	reserved;
-	} err_struct_info_bus_processor_interconnect;
-	u64	err_struct_info;
-} err_struct_info_t;
-
-typedef union err_data_buffer_u {
-	struct {
-		u64	trigger_addr;		/* 0-63		*/
-		u64	inj_addr;		/* 64-127 	*/
-		u64	way		: 5,	/* 128-132	*/
-			index		: 20,	/* 133-152	*/
-					: 39;	/* 153-191	*/
-	} err_data_buffer_cache;
-	struct {
-		u64	trigger_addr;		/* 0-63		*/
-		u64	inj_addr;		/* 64-127 	*/
-		u64	way		: 5,	/* 128-132	*/
-			index		: 20,	/* 133-152	*/
-			reserved	: 39;	/* 153-191	*/
-	} err_data_buffer_tlb;
-	struct {
-		u64	trigger_addr;		/* 0-63		*/
-	} err_data_buffer_register;
-	struct {
-		u64	reserved;		/* 0-63		*/
-	} err_data_buffer_bus_processor_interconnect;
-	u64 err_data_buffer[ERR_DATA_BUFFER_SIZE];
-} err_data_buffer_t;
-
-typedef union capabilities_u {
-	struct {
-		u64	i		: 1,
-			d		: 1,
-			rv		: 1,
-			tag		: 1,
-			data		: 1,
-			mesi		: 1,
-			dp		: 1,
-			reserved1	: 3,
-			pa		: 1,
-			va		: 1,
-			wi		: 1,
-			reserved2	: 20,
-			trigger		: 1,
-			trigger_pl	: 1,
-			reserved3	: 30;
-	} capabilities_cache;
-	struct {
-		u64	d		: 1,
-			i		: 1,
-			rv		: 1,
-			tc		: 1,
-			tr		: 1,
-			reserved1	: 27,
-			trigger		: 1,
-			trigger_pl	: 1,
-			reserved2	: 30;
-	} capabilities_tlb;
-	struct {
-		u64	gr_b0		: 1,
-			gr_b1		: 1,
-			fr		: 1,
-			br		: 1,
-			pr		: 1,
-			ar		: 1,
-			cr		: 1,
-			rr		: 1,
-			pkr		: 1,
-			dbr		: 1,
-			ibr		: 1,
-			pmc		: 1,
-			pmd		: 1,
-			reserved1	: 3,
-			regnum		: 1,
-			reserved2	: 15,
-			trigger		: 1,
-			trigger_pl	: 1,
-			reserved3	: 30;
-	} capabilities_register;
-	struct {
-		u64	reserved;
-	} capabilities_bus_processor_interconnect;
-} capabilities_t;
-
-typedef struct resources_s {
-	u64	ibr0		: 1,
-		ibr2		: 1,
-		ibr4		: 1,
-		ibr6		: 1,
-		dbr0		: 1,
-		dbr2		: 1,
-		dbr4		: 1,
-		dbr6		: 1,
-		reserved	: 48;
-} resources_t;
-
-
-long get_page_size(void)
-{
-	long page_size=sysconf(_SC_PAGESIZE);
-	return page_size;
-}
-
-#define PAGE_SIZE (get_page_size()==-1?0x4000:get_page_size())
-#define SHM_SIZE (2*PAGE_SIZE*NR_CPUS)
-#define SHM_VA 0x2000000100000000
-
-int shmid;
-void *shmaddr;
-
-int create_shm(void)
-{
-	key_t key;
-	char fn[MAX_FN_SIZE];
-
-	/* cpu0 is always existing */
-	sprintf(fn, PATH_FORMAT, 0);
-	if ((key = ftok(fn, 's')) == -1) {
-		perror("ftok");
-		return -1;
-	}
-
-	shmid = shmget(key, SHM_SIZE, 0644 | IPC_CREAT);
-	if (shmid == -1) {
-		if (errno==EEXIST) {
-			shmid = shmget(key, SHM_SIZE, 0);
-			if (shmid == -1) {
-				perror("shmget");
-				return -1;
-			}
-		}
-		else {
-			perror("shmget");
-			return -1;
-		}
-	}
-	vbprintf("shmid=%d", shmid);
-
-	/* connect to the segment: */
-	shmaddr = shmat(shmid, (void *)SHM_VA, 0);
-	if (shmaddr == (void*)-1) {
-		perror("shmat");
-		return -1;
-	}
-
-	memset(shmaddr, 0, SHM_SIZE);
-	mlock(shmaddr, SHM_SIZE);
-
-	return 0;
-}
-
-int free_shm()
-{
-	munlock(shmaddr, SHM_SIZE);
-        shmdt(shmaddr);
-	semctl(shmid, 0, IPC_RMID);
-
-	return 0;
-}
-
-#ifdef _SEM_SEMUN_UNDEFINED
-union semun
-{
-	int val;
-	struct semid_ds *buf;
-	unsigned short int *array;
-	struct seminfo *__buf;
-};
-#endif
-
-u32 mode=1; /* 1: physical mode; 2: virtual mode. */
-int one_lock=1;
-key_t key[NR_CPUS];
-int semid[NR_CPUS];
-
-int create_sem(int cpu)
-{
-	union semun arg;
-	char fn[MAX_FN_SIZE];
-	int sid;
-
-	sprintf(fn, PATH_FORMAT, cpu);
-	sprintf(fn, "%s/%s", fn, "err_type_info");
-	if ((key[cpu] = ftok(fn, 'e')) == -1) {
-		perror("ftok");
-		return -1;
-	}
-
-	if (semid[cpu]!=0)
-		return 0;
-
-	/* clear old semaphore */
-	if ((sid = semget(key[cpu], 1, 0)) != -1)
-		semctl(sid, 0, IPC_RMID);
-
-	/* get one semaphore */
-	if ((semid[cpu] = semget(key[cpu], 1, IPC_CREAT | IPC_EXCL)) == -1) {
-		perror("semget");
-		printf("Please remove semaphore with key=0x%lx, then run the tool.\n",
-			(u64)key[cpu]);
-		return -1;
-	}
-
-	vbprintf("semid[%d]=0x%lx, key[%d]=%lx\n",cpu,(u64)semid[cpu],cpu,
-		(u64)key[cpu]);
-	/* initialize the semaphore to 1: */
-	arg.val = 1;
-	if (semctl(semid[cpu], 0, SETVAL, arg) == -1) {
-		perror("semctl");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int lock(int cpu)
-{
-	struct sembuf lock;
-
-	lock.sem_num = cpu;
-	lock.sem_op = 1;
-	semop(semid[cpu], &lock, 1);
-
-        return 0;
-}
-
-static int unlock(int cpu)
-{
-	struct sembuf unlock;
-
-	unlock.sem_num = cpu;
-	unlock.sem_op = -1;
-	semop(semid[cpu], &unlock, 1);
-
-        return 0;
-}
-
-void free_sem(int cpu)
-{
-	semctl(semid[cpu], 0, IPC_RMID);
-}
-
-int wr_multi(char *fn, unsigned long *data, int size)
-{
-	int fd;
-	char buf[MAX_BUF_SIZE];
-	int ret;
-
-	if (size==1)
-		sprintf(buf, "%lx", *data);
-	else if (size==3)
-		sprintf(buf, "%lx,%lx,%lx", data[0], data[1], data[2]);
-	else {
-		fprintf(stderr,"write to file with wrong size!\n");
-		return -1;
-	}
-
-	fd=open(fn, O_RDWR);
-	if (!fd) {
-		perror("Error:");
-		return -1;
-	}
-	ret=write(fd, buf, sizeof(buf));
-	close(fd);
-	return ret;
-}
-
-int wr(char *fn, unsigned long data)
-{
-	return wr_multi(fn, &data, 1);
-}
-
-int rd(char *fn, unsigned long *data)
-{
-	int fd;
-	char buf[MAX_BUF_SIZE];
-
-	fd=open(fn, O_RDONLY);
-	if (fd<0) {
-		perror("Error:");
-		return -1;
-	}
-	read(fd, buf, MAX_BUF_SIZE);
-	*data=strtoul(buf, NULL, 16);
-	close(fd);
-	return 0;
-}
-
-int rd_status(char *path, int *status)
-{
-	char fn[MAX_FN_SIZE];
-	sprintf(fn, "%s/status", path);
-	if (rd(fn, (u64*)status)<0) {
-		perror("status reading error.\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-int rd_capabilities(char *path, u64 *capabilities)
-{
-	char fn[MAX_FN_SIZE];
-	sprintf(fn, "%s/capabilities", path);
-	if (rd(fn, capabilities)<0) {
-		perror("capabilities reading error.\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-int rd_all(char *path)
-{
-	unsigned long err_type_info, err_struct_info, err_data_buffer;
-	int status;
-	unsigned long capabilities, resources;
-	char fn[MAX_FN_SIZE];
-
-	sprintf(fn, "%s/err_type_info", path);
-	if (rd(fn, &err_type_info)<0) {
-		perror("err_type_info reading error.\n");
-		return -1;
-	}
-	printf("err_type_info=%lx\n", err_type_info);
-
-	sprintf(fn, "%s/err_struct_info", path);
-	if (rd(fn, &err_struct_info)<0) {
-		perror("err_struct_info reading error.\n");
-		return -1;
-	}
-	printf("err_struct_info=%lx\n", err_struct_info);
-
-	sprintf(fn, "%s/err_data_buffer", path);
-	if (rd(fn, &err_data_buffer)<0) {
-		perror("err_data_buffer reading error.\n");
-		return -1;
-	}
-	printf("err_data_buffer=%lx\n", err_data_buffer);
-
-	sprintf(fn, "%s/status", path);
-	if (rd("status", (u64*)&status)<0) {
-		perror("status reading error.\n");
-		return -1;
-	}
-	printf("status=%d\n", status);
-
-	sprintf(fn, "%s/capabilities", path);
-	if (rd(fn,&capabilities)<0) {
-		perror("capabilities reading error.\n");
-		return -1;
-	}
-	printf("capabilities=%lx\n", capabilities);
-
-	sprintf(fn, "%s/resources", path);
-	if (rd(fn, &resources)<0) {
-		perror("resources reading error.\n");
-		return -1;
-	}
-	printf("resources=%lx\n", resources);
-
-	return 0;
-}
-
-int query_capabilities(char *path, err_type_info_t err_type_info,
-			u64 *capabilities)
-{
-	char fn[MAX_FN_SIZE];
-	err_struct_info_t err_struct_info;
-	err_data_buffer_t err_data_buffer;
-
-	err_struct_info.err_struct_info=0;
-	memset(err_data_buffer.err_data_buffer, -1, ERR_DATA_BUFFER_SIZE*8);
-
-	sprintf(fn, "%s/err_type_info", path);
-	wr(fn, err_type_info.err_type_info);
-	sprintf(fn, "%s/err_struct_info", path);
-	wr(fn, 0x0);
-	sprintf(fn, "%s/err_data_buffer", path);
-	wr_multi(fn, err_data_buffer.err_data_buffer, ERR_DATA_BUFFER_SIZE);
-
-	// Fire pal_mc_error_inject procedure.
-	sprintf(fn, "%s/call_start", path);
-	wr(fn, mode);
-
-	if (rd_capabilities(path, capabilities)<0)
-		return -1;
-
-	return 0;
-}
-
-int query_all_capabilities()
-{
-	int status;
-	err_type_info_t err_type_info;
-	int err_sev, err_struct, struct_hier;
-	int cap=0;
-	u64 capabilities;
-	char path[MAX_FN_SIZE];
-
-	err_type_info.err_type_info=0;			// Initial
-	err_type_info.err_type_info_u.mode=0;		// Query mode;
-	err_type_info.err_type_info_u.err_inj=0;
-
-	printf("All capabilities implemented in pal_mc_error_inject:\n");
-	sprintf(path, PATH_FORMAT ,0);
-	for (err_sev=0;err_sev<3;err_sev++)
-		for (err_struct=0;err_struct<5;err_struct++)
-			for (struct_hier=0;struct_hier<5;struct_hier++)
-	{
-		status=-1;
-		capabilities=0;
-		err_type_info.err_type_info_u.err_sev=err_sev;
-		err_type_info.err_type_info_u.err_struct=err_struct;
-		err_type_info.err_type_info_u.struct_hier=struct_hier;
-
-		if (query_capabilities(path, err_type_info, &capabilities)<0)
-			continue;
-
-		if (rd_status(path, &status)<0)
-			continue;
-
-		if (status==0) {
-			cap=1;
-			printf("For err_sev=%d, err_struct=%d, struct_hier=%d: ",
-				err_sev, err_struct, struct_hier);
-			printf("capabilities 0x%lx\n", capabilities);
-		}
-	}
-	if (!cap) {
-		printf("No capabilities supported.\n");
-		return 0;
-	}
-
-	return 0;
-}
-
-int err_inject(int cpu, char *path, err_type_info_t err_type_info,
-		err_struct_info_t err_struct_info,
-		err_data_buffer_t err_data_buffer)
-{
-	int status;
-	char fn[MAX_FN_SIZE];
-
-	log_info(cpu, "err_type_info=%lx, err_struct_info=%lx, ",
-		err_type_info.err_type_info,
-		err_struct_info.err_struct_info);
-	log_info(cpu,"err_data_buffer=[%lx,%lx,%lx]\n",
-		err_data_buffer.err_data_buffer[0],
-		err_data_buffer.err_data_buffer[1],
-		err_data_buffer.err_data_buffer[2]);
-	sprintf(fn, "%s/err_type_info", path);
-	wr(fn, err_type_info.err_type_info);
-	sprintf(fn, "%s/err_struct_info", path);
-	wr(fn, err_struct_info.err_struct_info);
-	sprintf(fn, "%s/err_data_buffer", path);
-	wr_multi(fn, err_data_buffer.err_data_buffer, ERR_DATA_BUFFER_SIZE);
-
-	// Fire pal_mc_error_inject procedure.
-	sprintf(fn, "%s/call_start", path);
-	wr(fn,mode);
-
-	if (rd_status(path, &status)<0) {
-		vbprintf("fail: read status\n");
-		return -100;
-	}
-
-	if (status!=0) {
-		log_info(cpu, "fail: status=%d\n", status);
-		return status;
-	}
-
-	return status;
-}
-
-static int construct_data_buf(char *path, err_type_info_t err_type_info,
-		err_struct_info_t err_struct_info,
-		err_data_buffer_t *err_data_buffer,
-		void *va1)
-{
-	char fn[MAX_FN_SIZE];
-	u64 virt_addr=0, phys_addr=0;
-
-	vbprintf("va1=%lx\n", (u64)va1);
-	memset(&err_data_buffer->err_data_buffer_cache, 0, ERR_DATA_BUFFER_SIZE*8);
-
-	switch (err_type_info.err_type_info_u.err_struct) {
-		case 1: // Cache
-			switch (err_struct_info.err_struct_info_cache.cl_id) {
-				case 1: //Virtual addr
-					err_data_buffer->err_data_buffer_cache.inj_addr=(u64)va1;
-					break;
-				case 2: //Phys addr
-					sprintf(fn, "%s/virtual_to_phys", path);
-					virt_addr=(u64)va1;
-					if (wr(fn,virt_addr)<0)
-						return -1;
-					rd(fn, &phys_addr);
-					err_data_buffer->err_data_buffer_cache.inj_addr=phys_addr;
-					break;
-				default:
-					printf("Not supported cl_id\n");
-					break;
-			}
-			break;
-		case 2: //  TLB
-			break;
-		case 3: //  Register file
-			break;
-		case 4: //  Bus/system interconnect
-		default:
-			printf("Not supported err_struct\n");
-			break;
-	}
-
-	return 0;
-}
-
-typedef struct {
-	u64 cpu;
-	u64 loop;
-	u64 interval;
-	u64 err_type_info;
-	u64 err_struct_info;
-	u64 err_data_buffer[ERR_DATA_BUFFER_SIZE];
-} parameters_t;
-
-parameters_t line_para;
-int para;
-
-static int empty_data_buffer(u64 *err_data_buffer)
-{
-	int empty=1;
-	int i;
-
-	for (i=0;i<ERR_DATA_BUFFER_SIZE; i++)
-	   if (err_data_buffer[i]!=-1)
-		empty=0;
-
-	return empty;
-}
-
-int err_inj()
-{
-	err_type_info_t err_type_info;
-	err_struct_info_t err_struct_info;
-	err_data_buffer_t err_data_buffer;
-	int count;
-	FILE *fp;
-	unsigned long cpu, loop, interval, err_type_info_conf, err_struct_info_conf;
-	u64 err_data_buffer_conf[ERR_DATA_BUFFER_SIZE];
-	int num;
-	int i;
-	char path[MAX_FN_SIZE];
-	parameters_t parameters[MAX_TASK_NUM]={};
-	pid_t child_pid[MAX_TASK_NUM];
-	time_t current_time;
-	int status;
-
-	if (!para) {
-	    fp=fopen("err.conf", "r");
-	    if (fp==NULL) {
-		perror("Error open err.conf");
-		return -1;
-	    }
-
-	    num=0;
-	    while (!feof(fp)) {
-		char buf[256];
-		memset(buf,0,256);
-		fgets(buf, 256, fp);
-		count=sscanf(buf, "%lx, %lx, %lx, %lx, %lx, %lx, %lx, %lx\n",
-				&cpu, &loop, &interval,&err_type_info_conf,
-				&err_struct_info_conf,
-				&err_data_buffer_conf[0],
-				&err_data_buffer_conf[1],
-				&err_data_buffer_conf[2]);
-		if (count!=PARA_FIELD_NUM+3) {
-			err_data_buffer_conf[0]=-1;
-			err_data_buffer_conf[1]=-1;
-			err_data_buffer_conf[2]=-1;
-			count=sscanf(buf, "%lx, %lx, %lx, %lx, %lx\n",
-				&cpu, &loop, &interval,&err_type_info_conf,
-				&err_struct_info_conf);
-			if (count!=PARA_FIELD_NUM)
-				continue;
-		}
-
-		parameters[num].cpu=cpu;
-		parameters[num].loop=loop;
-		parameters[num].interval= interval>MIN_INTERVAL
-					  ?interval:MIN_INTERVAL;
-		parameters[num].err_type_info=err_type_info_conf;
-		parameters[num].err_struct_info=err_struct_info_conf;
-		memcpy(parameters[num++].err_data_buffer,
-			err_data_buffer_conf,ERR_DATA_BUFFER_SIZE*8) ;
-
-		if (num>=MAX_TASK_NUM)
-			break;
-	    }
-	}
-	else {
-		parameters[0].cpu=line_para.cpu;
-		parameters[0].loop=line_para.loop;
-		parameters[0].interval= line_para.interval>MIN_INTERVAL
-					  ?line_para.interval:MIN_INTERVAL;
-		parameters[0].err_type_info=line_para.err_type_info;
-		parameters[0].err_struct_info=line_para.err_struct_info;
-		memcpy(parameters[0].err_data_buffer,
-			line_para.err_data_buffer,ERR_DATA_BUFFER_SIZE*8) ;
-
-		num=1;
-	}
-
-	/* Create semaphore: If one_lock, one semaphore for all processors.
-	   Otherwise, one semaphore for each processor. */
-	if (one_lock) {
-		if (create_sem(0)) {
-			printf("Can not create semaphore...exit\n");
-			free_sem(0);
-			return -1;
-		}
-	}
-	else {
-		for (i=0;i<num;i++) {
-		   if (create_sem(parameters[i].cpu)) {
-			printf("Can not create semaphore for cpu%d...exit\n",i);
-			free_sem(parameters[num].cpu);
-			return -1;
-		   }
-		}
-	}
-
-	/* Create a shm segment which will be used to inject/consume errors on.*/
-	if (create_shm()==-1) {
-		printf("Error to create shm...exit\n");
-		return -1;
-	}
-
-	for (i=0;i<num;i++) {
-		pid_t pid;
-
-		current_time=time(NULL);
-		log_info(parameters[i].cpu, "\nBegine at %s", ctime(&current_time));
-		log_info(parameters[i].cpu, "Configurations:\n");
-		log_info(parameters[i].cpu,"On cpu%ld: loop=%lx, interval=%lx(s)",
-			parameters[i].cpu,
-			parameters[i].loop,
-			parameters[i].interval);
-		log_info(parameters[i].cpu," err_type_info=%lx,err_struct_info=%lx\n",
-			parameters[i].err_type_info,
-			parameters[i].err_struct_info);
-
-		sprintf(path, PATH_FORMAT, (int)parameters[i].cpu);
-		err_type_info.err_type_info=parameters[i].err_type_info;
-		err_struct_info.err_struct_info=parameters[i].err_struct_info;
-		memcpy(err_data_buffer.err_data_buffer,
-			parameters[i].err_data_buffer,
-			ERR_DATA_BUFFER_SIZE*8);
-
-		pid=fork();
-		if (pid==0) {
-			unsigned long mask[MASK_SIZE];
-			int j, k;
-
-			void *va1, *va2;
-
-			/* Allocate two memory areas va1 and va2 in shm */
-			va1=shmaddr+parameters[i].cpu*PAGE_SIZE;
-			va2=shmaddr+parameters[i].cpu*PAGE_SIZE+PAGE_SIZE;
-
-			vbprintf("va1=%lx, va2=%lx\n", (u64)va1, (u64)va2);
-			memset(va1, 0x1, PAGE_SIZE);
-			memset(va2, 0x2, PAGE_SIZE);
-
-			if (empty_data_buffer(err_data_buffer.err_data_buffer))
-				/* If not specified yet, construct data buffer
-				 * with va1
-				 */
-				construct_data_buf(path, err_type_info,
-					err_struct_info, &err_data_buffer,va1);
-
-			for (j=0;j<MASK_SIZE;j++)
-				mask[j]=0;
-
-			cpu=parameters[i].cpu;
-			k = cpu%64;
-			j = cpu/64;
-			mask[j] = 1UL << k;
-
-			if (sched_setaffinity(0, MASK_SIZE*8, mask)==-1) {
-				perror("Error sched_setaffinity:");
-				return -1;
-			}
-
-			for (j=0; j<parameters[i].loop; j++) {
-				log_info(parameters[i].cpu,"Injection ");
-				log_info(parameters[i].cpu,"on cpu%ld: #%d/%ld ",
-
-					parameters[i].cpu,j+1, parameters[i].loop);
-
-				/* Hold the lock */
-				if (one_lock)
-					lock(0);
-				else
-				/* Hold lock on this cpu */
-					lock(parameters[i].cpu);
-
-				if ((status=err_inject(parameters[i].cpu,
-					   path, err_type_info,
-					   err_struct_info, err_data_buffer))
-					   ==0) {
-					/* consume the error for "inject only"*/
-					memcpy(va2, va1, PAGE_SIZE);
-					memcpy(va1, va2, PAGE_SIZE);
-					log_info(parameters[i].cpu,
-						"successful\n");
-				}
-				else {
-					log_info(parameters[i].cpu,"fail:");
-					log_info(parameters[i].cpu,
-						"status=%d\n", status);
-					unlock(parameters[i].cpu);
-					break;
-				}
-				if (one_lock)
-				/* Release the lock */
-					unlock(0);
-				/* Release lock on this cpu */
-				else
-					unlock(parameters[i].cpu);
-
-				if (j < parameters[i].loop-1)
-					sleep(parameters[i].interval);
-			}
-			current_time=time(NULL);
-			log_info(parameters[i].cpu, "Done at %s", ctime(&current_time));
-			return 0;
-		}
-		else if (pid<0) {
-			perror("Error fork:");
-			continue;
-		}
-		child_pid[i]=pid;
-	}
-	for (i=0;i<num;i++)
-		waitpid(child_pid[i], NULL, 0);
-
-	if (one_lock)
-		free_sem(0);
-	else
-		for (i=0;i<num;i++)
-			free_sem(parameters[i].cpu);
-
-	printf("All done.\n");
-
-	return 0;
-}
-
-void help()
-{
-	printf("err_inject_tool:\n");
-	printf("\t-q: query all capabilities. default: off\n");
-	printf("\t-m: procedure mode. 1: physical 2: virtual. default: 1\n");
-	printf("\t-i: inject errors. default: off\n");
-	printf("\t-l: one lock per cpu. default: one lock for all\n");
-	printf("\t-e: error parameters:\n");
-	printf("\t\tcpu,loop,interval,err_type_info,err_struct_info[,err_data_buffer[0],err_data_buffer[1],err_data_buffer[2]]\n");
-	printf("\t\t   cpu: logical cpu number the error will be inject in.\n");
-	printf("\t\t   loop: times the error will be injected.\n");
-	printf("\t\t   interval: In second. every so often one error is injected.\n");
-	printf("\t\t   err_type_info, err_struct_info: PAL parameters.\n");
-	printf("\t\t   err_data_buffer: PAL parameter. Optional. If not present,\n");
-	printf("\t\t                    it's constructed by tool automatically. Be\n");
-	printf("\t\t                    careful to provide err_data_buffer and make\n");
-	printf("\t\t                    sure it's working with the environment.\n");
-	printf("\t    Note:no space between error parameters.\n");
-	printf("\t    default: Take error parameters from err.conf instead of command line.\n");
-	printf("\t-v: verbose. default: off\n");
-	printf("\t-h: help\n\n");
-	printf("The tool will take err.conf file as ");
-	printf("input to inject single or multiple errors ");
-	printf("on one or multiple cpus in parallel.\n");
-}
-
-int main(int argc, char **argv)
-{
-	char c;
-	int do_err_inj=0;
-	int do_query_all=0;
-	int count;
-	u32 m;
-
-	/* Default one lock for all cpu's */
-	one_lock=1;
-	while ((c = getopt(argc, argv, "m:iqvhle:")) != EOF)
-		switch (c) {
-			case 'm':	/* Procedure mode. 1: phys 2: virt */
-				count=sscanf(optarg, "%x", &m);
-				if (count!=1 || (m!=1 && m!=2)) {
-					printf("Wrong mode number.\n");
-					help();
-					return -1;
-				}
-				mode=m;
-				break;
-			case 'i':	/* Inject errors */
-				do_err_inj=1;
-				break;
-			case 'q':	/* Query */
-				do_query_all=1;
-				break;
-			case 'v':	/* Verbose */
-				verbose=1;
-				break;
-			case 'l':	/* One lock per cpu */
-				one_lock=0;
-				break;
-			case 'e':	/* error arguments */
-				/* Take parameters:
-				 * #cpu, loop, interval, err_type_info, err_struct_info[, err_data_buffer]
-				 * err_data_buffer is optional. Recommend not to specify
-				 * err_data_buffer. Better to use tool to generate it.
-				 */
-				count=sscanf(optarg,
-					"%lx, %lx, %lx, %lx, %lx, %lx, %lx, %lx\n",
-					&line_para.cpu,
-					&line_para.loop,
-					&line_para.interval,
-					&line_para.err_type_info,
-					&line_para.err_struct_info,
-					&line_para.err_data_buffer[0],
-					&line_para.err_data_buffer[1],
-					&line_para.err_data_buffer[2]);
-				if (count!=PARA_FIELD_NUM+3) {
-				    line_para.err_data_buffer[0]=-1,
-				    line_para.err_data_buffer[1]=-1,
-			 	    line_para.err_data_buffer[2]=-1;
-				    count=sscanf(optarg, "%lx, %lx, %lx, %lx, %lx\n",
-						&line_para.cpu,
-						&line_para.loop,
-						&line_para.interval,
-						&line_para.err_type_info,
-						&line_para.err_struct_info);
-				    if (count!=PARA_FIELD_NUM) {
-					printf("Wrong error arguments.\n");
-					help();
-					return -1;
-				    }
-				}
-				para=1;
-				break;
-			continue;
-				break;
-			case 'h':
-				help();
-				return 0;
-			default:
-				break;
-		}
-
-	if (do_query_all)
-		query_all_capabilities();
-	if (do_err_inj)
-		err_inj();
-
-	if (!do_query_all &&  !do_err_inj)
-		help();
-
-	return 0;
-}
-
diff --git a/Documentation/ia64/fsys.rst b/Documentation/ia64/fsys.rst
new file mode 100644
index 000000000000..a702d2cc94b6
--- /dev/null
+++ b/Documentation/ia64/fsys.rst
@@ -0,0 +1,303 @@
+===================================
+Light-weight System Calls for IA-64
+===================================
+
+		        Started: 13-Jan-2003
+
+		    Last update: 27-Sep-2003
+
+	              David Mosberger-Tang
+		      <davidm@hpl.hp.com>
+
+Using the "epc" instruction effectively introduces a new mode of
+execution to the ia64 linux kernel.  We call this mode the
+"fsys-mode".  To recap, the normal states of execution are:
+
+  - kernel mode:
+	Both the register stack and the memory stack have been
+	switched over to kernel memory.  The user-level state is saved
+	in a pt-regs structure at the top of the kernel memory stack.
+
+  - user mode:
+	Both the register stack and the memory stack are in
+	user memory.  The user-level state is contained in the
+	CPU registers.
+
+  - bank 0 interruption-handling mode:
+	This is the non-interruptible state which all
+	interruption-handlers start execution in.  The user-level
+	state remains in the CPU registers and some kernel state may
+	be stored in bank 0 of registers r16-r31.
+
+In contrast, fsys-mode has the following special properties:
+
+  - execution is at privilege level 0 (most-privileged)
+
+  - CPU registers may contain a mixture of user-level and kernel-level
+    state (it is the responsibility of the kernel to ensure that no
+    security-sensitive kernel-level state is leaked back to
+    user-level)
+
+  - execution is interruptible and preemptible (an fsys-mode handler
+    can disable interrupts and avoid all other interruption-sources
+    to avoid preemption)
+
+  - neither the memory-stack nor the register-stack can be trusted while
+    in fsys-mode (they point to the user-level stacks, which may
+    be invalid, or completely bogus addresses)
+
+In summary, fsys-mode is much more similar to running in user-mode
+than it is to running in kernel-mode.  Of course, given that the
+privilege level is at level 0, this means that fsys-mode requires some
+care (see below).
+
+
+How to tell fsys-mode
+=====================
+
+Linux operates in fsys-mode when (a) the privilege level is 0 (most
+privileged) and (b) the stacks have NOT been switched to kernel memory
+yet.  For convenience, the header file <asm-ia64/ptrace.h> provides
+three macros::
+
+	user_mode(regs)
+	user_stack(task,regs)
+	fsys_mode(task,regs)
+
+The "regs" argument is a pointer to a pt_regs structure.  The "task"
+argument is a pointer to the task structure to which the "regs"
+pointer belongs.  user_mode() returns TRUE if the CPU state pointed
+to by "regs" was executing in user mode (privilege level 3).
+user_stack() returns TRUE if the state pointed to by "regs" was
+executing on the user-level stack(s).  Finally, fsys_mode() returns
+TRUE if the CPU state pointed to by "regs" was executing in fsys-mode.
+The fsys_mode() macro is equivalent to the expression::
+
+	!user_mode(regs) && user_stack(task,regs)
+
+How to write an fsyscall handler
+================================
+
+The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
+(fsyscall_table).  This table contains one entry for each system call.
+By default, a system call is handled by fsys_fallback_syscall().  This
+routine takes care of entering (full) kernel mode and calling the
+normal Linux system call handler.  For performance-critical system
+calls, it is possible to write a hand-tuned fsyscall handler.  For
+example, fsys.S contains fsys_getpid(), which is a hand-tuned version
+of the getpid() system call.
+
+The entry and exit states of an fsyscall handler are as follows:
+
+Machine state on entry to fsyscall handler
+------------------------------------------
+
+  ========= ===============================================================
+  r10	    0
+  r11	    saved ar.pfs (a user-level value)
+  r15	    system call number
+  r16	    "current" task pointer (in normal kernel-mode, this is in r13)
+  r32-r39   system call arguments
+  b6	    return address (a user-level value)
+  ar.pfs    previous frame-state (a user-level value)
+  PSR.be    cleared to zero (i.e., little-endian byte order is in effect)
+  -         all other registers may contain values passed in from user-mode
+  ========= ===============================================================
+
+Required machine state on exit from fsyscall handler
+----------------------------------------------------
+
+  ========= ===========================================================
+  r11	    saved ar.pfs (as passed into the fsyscall handler)
+  r15	    system call number (as passed into the fsyscall handler)
+  r32-r39   system call arguments (as passed into the fsyscall handler)
+  b6	    return address (as passed into the fsyscall handler)
+  ar.pfs    previous frame-state (as passed into the fsyscall handler)
+  ========= ===========================================================
+
+Fsyscall handlers can execute with very little overhead, but with that
+speed comes a set of restrictions:
+
+ * Fsyscall-handlers MUST check for any pending work in the flags
+   member of the thread-info structure and if any of the
+   TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
+   doing a full system call (by calling fsys_fallback_syscall).
+
+ * Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
+   r15, b6, and ar.pfs) because they will be needed in case of a
+   system call restart.  Of course, all "preserved" registers also
+   must be preserved, in accordance with the normal calling conventions.
+
+ * Fsyscall-handlers MUST check argument registers for containing a
+   NaT value before using them in any way that could trigger a
+   NaT-consumption fault.  If a system call argument is found to
+   contain a NaT value, an fsyscall-handler may return immediately
+   with r8=EINVAL, r10=-1.
+
+ * Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
+   any other operation that would trigger mandatory RSE
+   (register-stack engine) traffic.
+
+ * Fsyscall-handlers MUST NOT write to any stacked registers because
+   it is not safe to assume that user-level called a handler with the
+   proper number of arguments.
+
+ * Fsyscall-handlers need to be careful when accessing per-CPU variables:
+   unless proper safe-guards are taken (e.g., interruptions are avoided),
+   execution may be pre-empted and resumed on another CPU at any given
+   time.
+
+ * Fsyscall-handlers must be careful not to leak sensitive kernel
+   information back to user-level.  In particular, before returning to
+   user-level, care needs to be taken to clear any scratch registers
+   that could contain sensitive information (note that the current
+   task pointer is not considered sensitive: it's already exposed
+   through ar.k6).
+
+ * Fsyscall-handlers MUST NOT access user-memory without first
+   validating access-permission (this can be done typically via
+   probe.r.fault and/or probe.w.fault) and without guarding against
+   memory access exceptions (this can be done with the EX() macros
+   defined by asmmacro.h).
+
+The above restrictions may seem draconian, but remember that it's
+possible to trade off some of the restrictions by paying a slightly
+higher overhead.  For example, if an fsyscall-handler could benefit
+from the shadow register bank, it could temporarily disable PSR.i and
+PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
+needed.  In other words, following the above rules yields extremely
+fast system call execution (while fully preserving system call
+semantics), but there is also a lot of flexibility in handling more
+complicated cases.
+
+Signal handling
+===============
+
+The delivery of (asynchronous) signals must be delayed until fsys-mode
+is exited.  This is accomplished with the help of the lower-privilege
+transfer trap: arch/ia64/kernel/process.c:do_notify_resume_user()
+checks whether the interrupted task was in fsys-mode and, if so, sets
+PSR.lp and returns immediately.  When fsys-mode is exited via the
+"br.ret" instruction that lowers the privilege level, a trap will
+occur.  The trap handler clears PSR.lp again and returns immediately.
+The kernel exit path then checks for and delivers any pending signals.
+
+PSR Handling
+============
+
+The "epc" instruction doesn't change the contents of PSR at all.  This
+is in contrast to a regular interruption, which clears almost all
+bits.  Because of that, some care needs to be taken to ensure things
+work as expected.  The following discussion describes how each PSR bit
+is handled.
+
+======= =======================================================================
+PSR.be	Cleared when entering fsys-mode.  A srlz.d instruction is used
+	to ensure the CPU is in little-endian mode before the first
+	load/store instruction is executed.  PSR.be is normally NOT
+	restored upon return from an fsys-mode handler.  In other
+	words, user-level code must not rely on PSR.be being preserved
+	across a system call.
+PSR.up	Unchanged.
+PSR.ac	Unchanged.
+PSR.mfl Unchanged.  Note: fsys-mode handlers must not write floating-point registers!
+PSR.mfh	Unchanged.  Note: fsys-mode handlers must not write floating-point registers!
+PSR.ic	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
+PSR.i	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
+PSR.pk	Unchanged.
+PSR.dt	Unchanged.
+PSR.dfl	Unchanged.  Note: fsys-mode handlers must not write floating-point registers!
+PSR.dfh	Unchanged.  Note: fsys-mode handlers must not write floating-point registers!
+PSR.sp	Unchanged.
+PSR.pp	Unchanged.
+PSR.di	Unchanged.
+PSR.si	Unchanged.
+PSR.db	Unchanged.  The kernel prevents user-level from setting a hardware
+	breakpoint that triggers at any privilege level other than
+	3 (user-mode).
+PSR.lp	Unchanged.
+PSR.tb	Lazy redirect.  If a taken-branch trap occurs while in
+	fsys-mode, the trap-handler modifies the saved machine state
+	such that execution resumes in the gate page at
+	syscall_via_break(), with privilege level 3.  Note: the
+	taken branch would occur on the branch invoking the
+	fsyscall-handler, at which point, by definition, a syscall
+	restart is still safe.  If the system call number is invalid,
+	the fsys-mode handler will return directly to user-level.  This
+	return will trigger a taken-branch trap, but since the trap is
+	taken _after_ restoring the privilege level, the CPU has already
+	left fsys-mode, so no special treatment is needed.
+PSR.rt	Unchanged.
+PSR.cpl	Cleared to 0.
+PSR.is	Unchanged (guaranteed to be 0 on entry to the gate page).
+PSR.mc	Unchanged.
+PSR.it	Unchanged (guaranteed to be 1).
+PSR.id	Unchanged.  Note: the ia64 linux kernel never sets this bit.
+PSR.da	Unchanged.  Note: the ia64 linux kernel never sets this bit.
+PSR.dd	Unchanged.  Note: the ia64 linux kernel never sets this bit.
+PSR.ss	Lazy redirect.  If set, "epc" will cause a Single Step Trap to
+	be taken.  The trap handler then modifies the saved machine
+	state such that execution resumes in the gate page at
+	syscall_via_break(), with privilege level 3.
+PSR.ri	Unchanged.
+PSR.ed	Unchanged.  Note: This bit could only have an effect if an fsys-mode
+	handler performed a speculative load that gets NaTted.  If so, this
+	would be the normal & expected behavior, so no special treatment is
+	needed.
+PSR.bn	Unchanged.  Note: fsys-mode handlers may clear the bit, if needed.
+	Doing so requires clearing PSR.i and PSR.ic as well.
+PSR.ia	Unchanged.  Note: the ia64 linux kernel never sets this bit.
+======= =======================================================================
+
+Using fast system calls
+=======================
+
+To use fast system calls, userspace applications need only call
+__kernel_syscall_via_epc().  For example:
+
+-- example fgettimeofday() call --
+
+-- fgettimeofday.S --
+
+::
+
+  #include <asm/asmmacro.h>
+
+  GLOBAL_ENTRY(fgettimeofday)
+  .prologue
+  .save ar.pfs, r11
+  mov r11 = ar.pfs
+  .body
+
+  mov r2 = 0xa000000000020660;;  // gate address
+			       // found by inspection of System.map for the
+			       // __kernel_syscall_via_epc() function.  See
+			       // below for how to do this for real.
+
+  mov b7 = r2
+  mov r15 = 1087		       // gettimeofday syscall
+  ;;
+  br.call.sptk.many b6 = b7
+  ;;
+
+  .restore sp
+
+  mov ar.pfs = r11
+  br.ret.sptk.many rp;;	      // return to caller
+  END(fgettimeofday)
+
+-- end fgettimeofday.S --
+
+In reality, getting the gate address is accomplished by two extra
+values passed via the ELF auxiliary vector (include/asm-ia64/elf.h):
+
+ * AT_SYSINFO : is the address of __kernel_syscall_via_epc()
+ * AT_SYSINFO_EHDR : is the address of the kernel gate ELF DSO
+
+The ELF DSO is a pre-linked library that is mapped in by the kernel at
+the gate page.  It is a proper ELF shared object so, with a dynamic
+loader that recognises the library, you should be able to make calls to
+the exported functions within it as with any other shared library.
+AT_SYSINFO points into the kernel DSO at the
+__kernel_syscall_via_epc() function for historical reasons (it was
+used before the kernel DSO) and as a convenience.
diff --git a/Documentation/ia64/fsys.txt b/Documentation/ia64/fsys.txt
deleted file mode 100644
index 59dd689d9b86..000000000000
--- a/Documentation/ia64/fsys.txt
+++ /dev/null
@@ -1,286 +0,0 @@
--*-Mode: outline-*-
-
-		Light-weight System Calls for IA-64
-		-----------------------------------
-
-		        Started: 13-Jan-2003
-		    Last update: 27-Sep-2003
-
-	              David Mosberger-Tang
-		      <davidm@hpl.hp.com>
-
-Using the "epc" instruction effectively introduces a new mode of
-execution to the ia64 linux kernel.  We call this mode the
-"fsys-mode".  To recap, the normal states of execution are:
-
-  - kernel mode:
-	Both the register stack and the memory stack have been
-	switched over to kernel memory.  The user-level state is saved
-	in a pt-regs structure at the top of the kernel memory stack.
-
-  - user mode:
-	Both the register stack and the kernel stack are in
-	user memory.  The user-level state is contained in the
-	CPU registers.
-
-  - bank 0 interruption-handling mode:
-	This is the non-interruptible state which all
-	interruption-handlers start execution in.  The user-level
-	state remains in the CPU registers and some kernel state may
-	be stored in bank 0 of registers r16-r31.
-
-In contrast, fsys-mode has the following special properties:
-
-  - execution is at privilege level 0 (most-privileged)
-
-  - CPU registers may contain a mixture of user-level and kernel-level
-    state (it is the responsibility of the kernel to ensure that no
-    security-sensitive kernel-level state is leaked back to
-    user-level)
-
-  - execution is interruptible and preemptible (an fsys-mode handler
-    can disable interrupts and avoid all other interruption-sources
-    to avoid preemption)
-
-  - neither the memory-stack nor the register-stack can be trusted while
-    in fsys-mode (they point to the user-level stacks, which may
-    be invalid, or completely bogus addresses)
-
-In summary, fsys-mode is much more similar to running in user-mode
-than it is to running in kernel-mode.  Of course, given that the
-privilege level is at level 0, this means that fsys-mode requires some
-care (see below).
-
-
-* How to tell fsys-mode
-
-Linux operates in fsys-mode when (a) the privilege level is 0 (most
-privileged) and (b) the stacks have NOT been switched to kernel memory
-yet.  For convenience, the header file <asm-ia64/ptrace.h> provides
-three macros:
-
-	user_mode(regs)
-	user_stack(task,regs)
-	fsys_mode(task,regs)
-
-The "regs" argument is a pointer to a pt_regs structure.  The "task"
-argument is a pointer to the task structure to which the "regs"
-pointer belongs to.  user_mode() returns TRUE if the CPU state pointed
-to by "regs" was executing in user mode (privilege level 3).
-user_stack() returns TRUE if the state pointed to by "regs" was
-executing on the user-level stack(s).  Finally, fsys_mode() returns
-TRUE if the CPU state pointed to by "regs" was executing in fsys-mode.
-The fsys_mode() macro is equivalent to the expression:
-
-	!user_mode(regs) && user_stack(task,regs)
-
-* How to write an fsyscall handler
-
-The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
-(fsyscall_table).  This table contains one entry for each system call.
-By default, a system call is handled by fsys_fallback_syscall().  This
-routine takes care of entering (full) kernel mode and calling the
-normal Linux system call handler.  For performance-critical system
-calls, it is possible to write a hand-tuned fsyscall_handler.  For
-example, fsys.S contains fsys_getpid(), which is a hand-tuned version
-of the getpid() system call.
-
-The entry and exit-state of an fsyscall handler is as follows:
-
-** Machine state on entry to fsyscall handler:
-
- - r10	  = 0
- - r11	  = saved ar.pfs (a user-level value)
- - r15	  = system call number
- - r16	  = "current" task pointer (in normal kernel-mode, this is in r13)
- - r32-r39 = system call arguments
- - b6	  = return address (a user-level value)
- - ar.pfs = previous frame-state (a user-level value)
- - PSR.be = cleared to zero (i.e., little-endian byte order is in effect)
- - all other registers may contain values passed in from user-mode
-
-** Required machine state on exit to fsyscall handler:
-
- - r11	  = saved ar.pfs (as passed into the fsyscall handler)
- - r15	  = system call number (as passed into the fsyscall handler)
- - r32-r39 = system call arguments (as passed into the fsyscall handler)
- - b6	  = return address (as passed into the fsyscall handler)
- - ar.pfs = previous frame-state (as passed into the fsyscall handler)
-
-Fsyscall handlers can execute with very little overhead, but with that
-speed comes a set of restrictions:
-
- o Fsyscall-handlers MUST check for any pending work in the flags
-   member of the thread-info structure and if any of the
-   TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
-   doing a full system call (by calling fsys_fallback_syscall).
-
- o Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
-   r15, b6, and ar.pfs) because they will be needed in case of a
-   system call restart.  Of course, all "preserved" registers also
-   must be preserved, in accordance to the normal calling conventions.
-
- o Fsyscall-handlers MUST check argument registers for containing a
-   NaT value before using them in any way that could trigger a
-   NaT-consumption fault.  If a system call argument is found to
-   contain a NaT value, an fsyscall-handler may return immediately
-   with r8=EINVAL, r10=-1.
-
- o Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
-   any other operation that would trigger mandatory RSE
-   (register-stack engine) traffic.
-
- o Fsyscall-handlers MUST NOT write to any stacked registers because
-   it is not safe to assume that user-level called a handler with the
-   proper number of arguments.
-
- o Fsyscall-handlers need to be careful when accessing per-CPU variables:
-   unless proper safe-guards are taken (e.g., interruptions are avoided),
-   execution may be pre-empted and resumed on another CPU at any given
-   time.
-
- o Fsyscall-handlers must be careful not to leak sensitive kernel'
-   information back to user-level.  In particular, before returning to
-   user-level, care needs to be taken to clear any scratch registers
-   that could contain sensitive information (note that the current
-   task pointer is not considered sensitive: it's already exposed
-   through ar.k6).
-
- o Fsyscall-handlers MUST NOT access user-memory without first
-   validating access-permission (this can be done typically via
-   probe.r.fault and/or probe.w.fault) and without guarding against
-   memory access exceptions (this can be done with the EX() macros
-   defined by asmmacro.h).
-
-The above restrictions may seem draconian, but remember that it's
-possible to trade off some of the restrictions by paying a slightly
-higher overhead.  For example, if an fsyscall-handler could benefit
-from the shadow register bank, it could temporarily disable PSR.i and
-PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
-needed.  In other words, following the above rules yields extremely
-fast system call execution (while fully preserving system call
-semantics), but there is also a lot of flexibility in handling more
-complicated cases.
-
-* Signal handling
-
-The delivery of (asynchronous) signals must be delayed until fsys-mode
-is exited.  This is accomplished with the help of the lower-privilege
-transfer trap: arch/ia64/kernel/process.c:do_notify_resume_user()
-checks whether the interrupted task was in fsys-mode and, if so, sets
-PSR.lp and returns immediately.  When fsys-mode is exited via the
-"br.ret" instruction that lowers the privilege level, a trap will
-occur.  The trap handler clears PSR.lp again and returns immediately.
-The kernel exit path then checks for and delivers any pending signals.
-
-* PSR Handling
-
-The "epc" instruction doesn't change the contents of PSR at all.  This
-is in contrast to a regular interruption, which clears almost all
-bits.  Because of that, some care needs to be taken to ensure things
-work as expected.  The following discussion describes how each PSR bit
-is handled.
-
-PSR.be	Cleared when entering fsys-mode.  A srlz.d instruction is used
-	to ensure the CPU is in little-endian mode before the first
-	load/store instruction is executed.  PSR.be is normally NOT
-	restored upon return from an fsys-mode handler.  In other
-	words, user-level code must not rely on PSR.be being preserved
-	across a system call.
-PSR.up	Unchanged.
-PSR.ac	Unchanged.
-PSR.mfl Unchanged.  Note: fsys-mode handlers must not write-registers!
-PSR.mfh	Unchanged.  Note: fsys-mode handlers must not write-registers!
-PSR.ic	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
-PSR.i	Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
-PSR.pk	Unchanged.
-PSR.dt	Unchanged.
-PSR.dfl	Unchanged.  Note: fsys-mode handlers must not write-registers!
-PSR.dfh	Unchanged.  Note: fsys-mode handlers must not write-registers!
-PSR.sp	Unchanged.
-PSR.pp	Unchanged.
-PSR.di	Unchanged.
-PSR.si	Unchanged.
-PSR.db	Unchanged.  The kernel prevents user-level from setting a hardware
-	breakpoint that triggers at any privilege level other than 3 (user-mode).
-PSR.lp	Unchanged.
-PSR.tb	Lazy redirect.  If a taken-branch trap occurs while in
-	fsys-mode, the trap-handler modifies the saved machine state
-	such that execution resumes in the gate page at
-	syscall_via_break(), with privilege level 3.  Note: the
-	taken branch would occur on the branch invoking the
-	fsyscall-handler, at which point, by definition, a syscall
-	restart is still safe.  If the system call number is invalid,
-	the fsys-mode handler will return directly to user-level.  This
-	return will trigger a taken-branch trap, but since the trap is
-	taken _after_ restoring the privilege level, the CPU has already
-	left fsys-mode, so no special treatment is needed.
-PSR.rt	Unchanged.
-PSR.cpl	Cleared to 0.
-PSR.is	Unchanged (guaranteed to be 0 on entry to the gate page).
-PSR.mc	Unchanged.
-PSR.it	Unchanged (guaranteed to be 1).
-PSR.id	Unchanged.  Note: the ia64 linux kernel never sets this bit.
-PSR.da	Unchanged.  Note: the ia64 linux kernel never sets this bit.
-PSR.dd	Unchanged.  Note: the ia64 linux kernel never sets this bit.
-PSR.ss	Lazy redirect.  If set, "epc" will cause a Single Step Trap to
-	be taken.  The trap handler then modifies the saved machine
-	state such that execution resumes in the gate page at
-	syscall_via_break(), with privilege level 3.
-PSR.ri	Unchanged.
-PSR.ed	Unchanged.  Note: This bit could only have an effect if an fsys-mode
-	handler performed a speculative load that gets NaTted.  If so, this
-	would be the normal & expected behavior, so no special treatment is
-	needed.
-PSR.bn	Unchanged.  Note: fsys-mode handlers may clear the bit, if needed.
-	Doing so requires clearing PSR.i and PSR.ic as well.
-PSR.ia	Unchanged.  Note: the ia64 linux kernel never sets this bit.
-
-* Using fast system calls
-
-To use fast system calls, userspace applications need simply call
-__kernel_syscall_via_epc().  For example
-
--- example fgettimeofday() call --
--- fgettimeofday.S --
-
-#include <asm/asmmacro.h>
-
-GLOBAL_ENTRY(fgettimeofday)
-.prologue
-.save ar.pfs, r11
-mov r11 = ar.pfs
-.body 
-
-mov r2 = 0xa000000000020660;;  // gate address 
-			       // found by inspection of System.map for the 
-			       // __kernel_syscall_via_epc() function.  See
-			       // below for how to do this for real.
-
-mov b7 = r2
-mov r15 = 1087		       // gettimeofday syscall
-;;
-br.call.sptk.many b6 = b7
-;;
-
-.restore sp
-
-mov ar.pfs = r11
-br.ret.sptk.many rp;;	      // return to caller
-END(fgettimeofday)
-
--- end fgettimeofday.S --
-
-In reality, getting the gate address is accomplished by two extra
-values passed via the ELF auxiliary vector (include/asm-ia64/elf.h)
-
- o AT_SYSINFO : is the address of __kernel_syscall_via_epc()
- o AT_SYSINFO_EHDR : is the address of the kernel gate ELF DSO
-
-The ELF DSO is a pre-linked library that is mapped in by the kernel at
-the gate page.  It is a proper ELF shared object so, with a dynamic
-loader that recognises the library, you should be able to make calls to
-the exported functions within it as with any other shared library.
-AT_SYSINFO points into the kernel DSO at the
-__kernel_syscall_via_epc() function for historical reasons (it was
-used before the kernel DSO) and as a convenience.
diff --git a/Documentation/ia64/ia64.rst b/Documentation/ia64/ia64.rst
new file mode 100644
index 000000000000..b725019a9492
--- /dev/null
+++ b/Documentation/ia64/ia64.rst
@@ -0,0 +1,49 @@
+===========================================
+Linux kernel release for the IA-64 Platform
+===========================================
+
+   These are the release notes for Linux since version 2.4 for the
+   IA-64 platform.  This document provides information specific to
+   IA-64 ONLY; for additional information about the Linux kernel,
+   also read the original Linux README provided with the kernel.
+
+Installing the Kernel
+=====================
+
+ - IA-64 kernel installation is the same as on other platforms; see
+   the original README for details.
+
+
+Software Requirements
+=====================
+
+   Compiling and running this kernel requires an IA-64 compliant GCC
+   compiler, as well as various software packages that were also
+   compiled with an IA-64 compliant GCC compiler.
+
+
+Configuring the Kernel
+======================
+
+   Configuration is the same; see the original README for details.
+
+
+Compiling the Kernel
+====================
+
+ - Compiling this kernel is the same as on other platforms (see the
+   original README), BUT you need an IA-64 compliant GCC compiler.
+
+IA-64 Specifics
+===============
+
+ - General issues:
+
+    * Hardly any performance tuning has been done. Obvious targets
+      include the library routines (IP checksum, etc.). Less
+      obvious targets include making sure we don't flush the TLB
+      needlessly, etc.
+
+    * SMP locks cleanup/optimization
+
+    * IA32 support.  Currently experimental.  It mostly works.
diff --git a/Documentation/ia64/index.rst b/Documentation/ia64/index.rst
new file mode 100644
index 000000000000..a3e3052ad6e2
--- /dev/null
+++ b/Documentation/ia64/index.rst
@@ -0,0 +1,18 @@
+:orphan:
+
+==================
+IA-64 Architecture
+==================
+
+.. toctree::
+   :maxdepth: 1
+
+   ia64
+   aliasing
+   efirtc
+   err_inject
+   fsys
+   irq-redir
+   mca
+   serial
+   xen
diff --git a/Documentation/ia64/irq-redir.rst b/Documentation/ia64/irq-redir.rst
new file mode 100644
index 000000000000..39bf94484a15
--- /dev/null
+++ b/Documentation/ia64/irq-redir.rst
@@ -0,0 +1,80 @@
+==============================
+IRQ affinity on IA64 platforms
+==============================
+
+07.01.2002, Erich Focht <efocht@ess.nec.de>
+
+
+By writing to /proc/irq/IRQ#/smp_affinity the interrupt routing can be
+controlled. The behavior on IA64 platforms is slightly different from
+that described in Documentation/IRQ-affinity.txt for i386 systems.
+
+Because SAPIC mode and physical destination mode are used, the IRQ
+target is one particular CPU and cannot be a mask of several CPUs.
+Only the first non-zero bit is taken into account.
+
+
+Usage examples
+==============
+
+The target CPU has to be specified as a hexadecimal CPU mask. The
+first non-zero bit is the selected CPU. This format has been kept for
+compatibility reasons with i386.
+
+Set the delivery mode of interrupt 41 to fixed and route the
+interrupts to CPU #3 (logical CPU number) (2^3=0x08)::
+
+     echo "8" >/proc/irq/41/smp_affinity
+
+Set the default route for IRQ number 41 to CPU 6 in lowest priority
+delivery mode (redirectable)::
+
+     echo "r 40" >/proc/irq/41/smp_affinity
+
+The output of the command::
+
+     cat /proc/irq/IRQ#/smp_affinity
+
+gives the target CPU mask for the specified interrupt vector. If the CPU
+mask is preceded by the character "r", the interrupt is redirectable
+(i.e. lowest priority mode routing is used), otherwise its route is
+fixed.
+
+
+
+Initialization and default behavior
+===================================
+
+If the platform features IRQ redirection (info provided by SAL), all
+IO-SAPIC interrupts are initialized with CPU#0 as their default target
+and the routing is the so-called "lowest priority mode" (actually
+fixed SAPIC mode with hint). The XTP chipset registers are used as hints
+for the IRQ routing. Currently in Linux, XTP registers can have three
+values:
+
+	- minimal for an idle task,
+	- normal if any other task runs,
+	- maximal if the CPU is going to be switched off.
+
+The IRQ is routed to the CPU with the lowest XTP register value; the
+search begins at the default CPU. Therefore most of the interrupts
+will be handled by CPU #0.
+
+If the platform doesn't feature interrupt redirection, IO-SAPIC fixed
+routing is used. The target CPUs are distributed in a round-robin
+manner. IRQs will be routed only to the selected target CPUs. Check
+with::
+
+        cat /proc/interrupts
+
+
+
+Comments
+========
+
+On large (multi-node) systems it is recommended to route the IRQs to
+the node to which the corresponding device is connected.
+For systems like the NEC AzusA we get IRQ node-affinity for free. This
+is because usually the chipsets on each node redirect the interrupts
+only to their own CPUs (as they cannot see the XTP registers on the
+other nodes).
diff --git a/Documentation/ia64/mca.rst b/Documentation/ia64/mca.rst
new file mode 100644
index 000000000000..08270bba44a4
--- /dev/null
+++ b/Documentation/ia64/mca.rst
@@ -0,0 +1,198 @@
+=============================================================
+An ad-hoc collection of notes on IA64 MCA and INIT processing
+=============================================================
+
+Feel free to update it with notes about any area that is not clear.
+
+---
+
+MCA/INIT are completely asynchronous.  They can occur at any time, when
+the OS is in any state.  Including when one of the cpus is already
+holding a spinlock.  Trying to get any lock from MCA/INIT state is
+asking for deadlock.  Also the state of structures that are protected
+by locks is indeterminate, including linked lists.
+
+---
+
+The complicated ia64 MCA process.  All of this is mandated by Intel's
+specification for ia64 SAL, error recovery and unwind; it is not as
+if we have a choice here.
+
+* MCA occurs on one cpu, usually due to a double bit memory error.
+  This is the monarch cpu.
+
+* SAL sends an MCA rendezvous interrupt (which is a normal interrupt)
+  to all the other cpus, the slaves.
+
+* Slave cpus that receive the MCA interrupt call down into SAL, where
+  they end up spinning disabled while the MCA is being serviced.
+
+* If any slave cpu was already spinning disabled when the MCA occurred
+  then it cannot service the MCA interrupt.  SAL waits ~20 seconds then
+  sends an unmaskable INIT event to the slave cpus that have not
+  already rendezvoused.
+
+* Because MCA/INIT can be delivered at any time, including when the cpu
+  is down in PAL in physical mode, the registers at the time of the
+  event are _completely_ undefined.  In particular the MCA/INIT
+  handlers cannot rely on the thread pointer, PAL physical mode can
+  (and does) modify TP.  It is allowed to do that as long as it resets
+  TP on return.  However MCA/INIT events expose us to these PAL
+  internal TP changes.  Hence curr_task().
+
+* If an MCA/INIT event occurs while the kernel was running (not user
+  space) and the kernel has called PAL then the MCA/INIT handler cannot
+  assume that the kernel stack is in a fit state to be used.  Mainly
+  because PAL may or may not maintain the stack pointer internally.
+  Because the MCA/INIT handlers cannot trust the kernel stack, they
+  have to use their own, per-cpu stacks.  The MCA/INIT stacks are
+  preformatted with just enough task state to let the relevant handlers
+  do their job.
+
+* Unlike most other architectures, the ia64 struct task is embedded in
+  the kernel stack[1].  So switching to a new kernel stack means that
+  we switch to a new task as well.  Because various bits of the kernel
+  assume that current points into the struct task, switching to a new
+  stack also means a new value for current.
+
+* Once all slaves have rendezvoused and are spinning disabled, the
+  monarch is entered.  The monarch now tries to diagnose the problem
+  and decide if it can recover or not.
+
+* Part of the monarch's job is to look at the state of all the other
+  tasks.  The only way to do that on ia64 is to call the unwinder,
+  as mandated by Intel.
+
+* The starting point for the unwind depends on whether a task is
+  running or not.  That is, whether it is on a cpu or is blocked.  The
+  monarch has to determine whether or not a task is on a cpu before it
+  knows how to start unwinding it.  The tasks that received an MCA or
+  INIT event are no longer running, they have been converted to blocked
+  tasks.  But (and it's a big but), the cpus that received the MCA
+  rendezvous interrupt are still running on their normal kernel stacks!
+
+* To distinguish between these two cases, the monarch must know which
+  tasks are on a cpu and which are not.  Hence each slave cpu that
+  switches to an MCA/INIT stack, registers its new stack using
+  set_curr_task(), so the monarch can tell that the _original_ task is
+  no longer running on that cpu.  That gives us a decent chance of
+  getting a valid backtrace of the _original_ task.
+
+* MCA/INIT can be nested, to a depth of 2 on any cpu.  In the case of a
+  nested error, we want diagnostics on the MCA/INIT handler that
+  failed, not on the task that was originally running.  Again this
+  requires set_curr_task() so the MCA/INIT handlers can register their
+  own stack as running on that cpu.  Then a recursive error gets a
+  trace of the failing handler's "task".
+
+[1]
+    My (Keith Owens) original design called for ia64 to separate its
+    struct task and the kernel stacks.  Then the MCA/INIT data would be
+    chained stacks like i386 interrupt stacks.  But that required
+    radical surgery on the rest of ia64, plus extra hard wired TLB
+    entries with its associated performance degradation.  David
+    Mosberger vetoed that approach.  Which meant that separate kernel
+    stacks meant separate "tasks" for the MCA/INIT handlers.
+
+---
+
+INIT is less complicated than MCA.  Pressing the nmi button or using
+the equivalent command on the management console sends INIT to all
+cpus.  SAL picks one of the cpus as the monarch and the rest are
+slaves.  All the OS INIT handlers are entered at approximately the same
+time.  The OS monarch prints the state of all tasks and returns, after
+which the slaves return and the system resumes.
+
+At least that is what is supposed to happen.  Alas there are broken
+versions of SAL out there.  Some drive all the cpus as monarchs.  Some
+drive them all as slaves.  Some drive one cpu as monarch, wait for that
+cpu to return from the OS then drive the rest as slaves.  Some versions
+of SAL cannot even cope with returning from the OS, they spin inside
+SAL on resume.  The OS INIT code has workarounds for some of these
+broken SAL symptoms, but some simply cannot be fixed from the OS side.
+
+---
+
+The scheduler hooks used by ia64 (curr_task, set_curr_task) are layer
+violations.  Unfortunately MCA/INIT start off as massive layer
+violations (can occur at _any_ time) and they build from there.
+
+At least ia64 makes an attempt at recovering from hardware errors, but
+it is a difficult problem because of the asynchronous nature of these
+errors.  When processing an unmaskable interrupt we sometimes need
+special code to cope with our inability to take any locks.
+
+---
+
+How is ia64 MCA/INIT different from x86 NMI?
+
+* x86 NMI typically gets delivered to one cpu.  MCA/INIT gets sent to
+  all cpus.
+
+* x86 NMI cannot be nested.  MCA/INIT can be nested, to a depth of 2
+  per cpu.
+
+* x86 has a separate struct task which points to one of multiple kernel
+  stacks.  ia64 has the struct task embedded in the single kernel
+  stack, so switching stack means switching task.
+
+* x86 does not call the BIOS so the NMI handler does not have to worry
+  about any registers having changed.  MCA/INIT can occur while the cpu
+  is in PAL in physical mode, with undefined registers and an undefined
+  kernel stack.
+
+* i386 backtrace is not very sensitive to whether a process is running
+  or not.  ia64 unwind is very, very sensitive to whether a process is
+  running or not.
+
+---
+
+What happens when MCA/INIT is delivered while a cpu is running user
+space code?
+
+The user mode registers are stored in the RSE area of the MCA/INIT stack on
+entry to the OS and are restored from there on return to SAL, so user
+mode registers are preserved across a recoverable MCA/INIT.  Since the
+OS has no idea what unwind data is available for the user space stack,
+MCA/INIT never tries to backtrace user space.  Which means that the OS
+does not bother making the user space process look like a blocked task,
+i.e. the OS does not copy pt_regs and switch_stack to the user space
+stack.  Also the OS has no idea how big the user space RSE and memory
+stacks are, which makes it too risky to copy the saved state to a user
+mode stack.
+
+---
+
+How do we get a backtrace on the tasks that were running when MCA/INIT
+was delivered?
+
+mca.c::ia64_mca_modify_original_stack().  That identifies and
+verifies the original kernel stack, copies the dirty registers from
+the MCA/INIT stack's RSE to the original stack's RSE, copies the
+skeleton struct pt_regs and switch_stack to the original stack, fills
+in the skeleton structures from the PAL minstate area and updates the
+original stack's thread.ksp.  That makes the original stack look
+exactly like any other blocked task, i.e. it now appears to be
+sleeping.  To get a backtrace, just start with thread.ksp for the
+original task and unwind like any other sleeping task.
+
+---
+
+How do we identify the tasks that were running when MCA/INIT was
+delivered?
+
+If the previous task has been verified and converted to a blocked
+state, then sos->prev_task on the MCA/INIT stack is updated to point to
+the previous task.  You can look at that field in dumps or debuggers.
+To help distinguish between the handler and the original tasks,
+handlers have _TIF_MCA_INIT set in thread_info.flags.
+
+The sos data is always in the MCA/INIT handler stack, at offset
+MCA_SOS_OFFSET.  You can get that value from mca_asm.h or calculate it
+as KERNEL_STACK_SIZE - sizeof(struct pt_regs) - sizeof(struct
+ia64_sal_os_state), with 16 byte alignment for all structures.
+
+Also the comm field of the MCA/INIT task is modified to include the pid
+of the original task, for humans to use.  For example, a comm field of
+'MCA 12159' means that pid 12159 was running when the MCA was
+delivered.
diff --git a/Documentation/ia64/mca.txt b/Documentation/ia64/mca.txt
deleted file mode 100644
index f097c60cba1b..000000000000
--- a/Documentation/ia64/mca.txt
+++ /dev/null
@@ -1,194 +0,0 @@
-An ad-hoc collection of notes on IA64 MCA and INIT processing.  Feel
-free to update it with notes about any area that is not clear.
-
----
-
-MCA/INIT are completely asynchronous.  They can occur at any time, when
-the OS is in any state.  Including when one of the cpus is already
-holding a spinlock.  Trying to get any lock from MCA/INIT state is
-asking for deadlock.  Also the state of structures that are protected
-by locks is indeterminate, including linked lists.
-
----
-
-The complicated ia64 MCA process.  All of this is mandated by Intel's
-specification for ia64 SAL, error recovery and unwind, it is not as
-if we have a choice here.
-
-* MCA occurs on one cpu, usually due to a double bit memory error.
-  This is the monarch cpu.
-
-* SAL sends an MCA rendezvous interrupt (which is a normal interrupt)
-  to all the other cpus, the slaves.
-
-* Slave cpus that receive the MCA interrupt call down into SAL, they
-  end up spinning disabled while the MCA is being serviced.
-
-* If any slave cpu was already spinning disabled when the MCA occurred
-  then it cannot service the MCA interrupt.  SAL waits ~20 seconds then
-  sends an unmaskable INIT event to the slave cpus that have not
-  already rendezvoused.
-
-* Because MCA/INIT can be delivered at any time, including when the cpu
-  is down in PAL in physical mode, the registers at the time of the
-  event are _completely_ undefined.  In particular the MCA/INIT
-  handlers cannot rely on the thread pointer, PAL physical mode can
-  (and does) modify TP.  It is allowed to do that as long as it resets
-  TP on return.  However MCA/INIT events expose us to these PAL
-  internal TP changes.  Hence curr_task().
-
-* If an MCA/INIT event occurs while the kernel was running (not user
-  space) and the kernel has called PAL then the MCA/INIT handler cannot
-  assume that the kernel stack is in a fit state to be used.  Mainly
-  because PAL may or may not maintain the stack pointer internally.
-  Because the MCA/INIT handlers cannot trust the kernel stack, they
-  have to use their own, per-cpu stacks.  The MCA/INIT stacks are
-  preformatted with just enough task state to let the relevant handlers
-  do their job.
-
-* Unlike most other architectures, the ia64 struct task is embedded in
-  the kernel stack[1].  So switching to a new kernel stack means that
-  we switch to a new task as well.  Because various bits of the kernel
-  assume that current points into the struct task, switching to a new
-  stack also means a new value for current.
-
-* Once all slaves have rendezvoused and are spinning disabled, the
-  monarch is entered.  The monarch now tries to diagnose the problem
-  and decide if it can recover or not.
-
-* Part of the monarch's job is to look at the state of all the other
-  tasks.  The only way to do that on ia64 is to call the unwinder,
-  as mandated by Intel.
-
-* The starting point for the unwind depends on whether a task is
-  running or not.  That is, whether it is on a cpu or is blocked.  The
-  monarch has to determine whether or not a task is on a cpu before it
-  knows how to start unwinding it.  The tasks that received an MCA or
-  INIT event are no longer running, they have been converted to blocked
-  tasks.  But (and its a big but), the cpus that received the MCA
-  rendezvous interrupt are still running on their normal kernel stacks!
-
-* To distinguish between these two cases, the monarch must know which
-  tasks are on a cpu and which are not.  Hence each slave cpu that
-  switches to an MCA/INIT stack, registers its new stack using
-  set_curr_task(), so the monarch can tell that the _original_ task is
-  no longer running on that cpu.  That gives us a decent chance of
-  getting a valid backtrace of the _original_ task.
-
-* MCA/INIT can be nested, to a depth of 2 on any cpu.  In the case of a
-  nested error, we want diagnostics on the MCA/INIT handler that
-  failed, not on the task that was originally running.  Again this
-  requires set_curr_task() so the MCA/INIT handlers can register their
-  own stack as running on that cpu.  Then a recursive error gets a
-  trace of the failing handler's "task".
-
-[1] My (Keith Owens) original design called for ia64 to separate its
-    struct task and the kernel stacks.  Then the MCA/INIT data would be
-    chained stacks like i386 interrupt stacks.  But that required
-    radical surgery on the rest of ia64, plus extra hard wired TLB
-    entries with its associated performance degradation.  David
-    Mosberger vetoed that approach.  Which meant that separate kernel
-    stacks meant separate "tasks" for the MCA/INIT handlers.
-
----
-
-INIT is less complicated than MCA.  Pressing the nmi button or using
-the equivalent command on the management console sends INIT to all
-cpus.  SAL picks one of the cpus as the monarch and the rest are
-slaves.  All the OS INIT handlers are entered at approximately the same
-time.  The OS monarch prints the state of all tasks and returns, after
-which the slaves return and the system resumes.
-
-At least that is what is supposed to happen.  Alas there are broken
-versions of SAL out there.  Some drive all the cpus as monarchs.  Some
-drive them all as slaves.  Some drive one cpu as monarch, wait for that
-cpu to return from the OS then drive the rest as slaves.  Some versions
-of SAL cannot even cope with returning from the OS, they spin inside
-SAL on resume.  The OS INIT code has workarounds for some of these
-broken SAL symptoms, but some simply cannot be fixed from the OS side.
-
----
-
-The scheduler hooks used by ia64 (curr_task, set_curr_task) are layer
-violations.  Unfortunately MCA/INIT start off as massive layer
-violations (can occur at _any_ time) and they build from there.
-
-At least ia64 makes an attempt at recovering from hardware errors, but
-it is a difficult problem because of the asynchronous nature of these
-errors.  When processing an unmaskable interrupt we sometimes need
-special code to cope with our inability to take any locks.
-
----
-
-How is ia64 MCA/INIT different from x86 NMI?
-
-* x86 NMI typically gets delivered to one cpu.  MCA/INIT gets sent to
-  all cpus.
-
-* x86 NMI cannot be nested.  MCA/INIT can be nested, to a depth of 2
-  per cpu.
-
-* x86 has a separate struct task which points to one of multiple kernel
-  stacks.  ia64 has the struct task embedded in the single kernel
-  stack, so switching stack means switching task.
-
-* x86 does not call the BIOS so the NMI handler does not have to worry
-  about any registers having changed.  MCA/INIT can occur while the cpu
-  is in PAL in physical mode, with undefined registers and an undefined
-  kernel stack.
-
-* i386 backtrace is not very sensitive to whether a process is running
-  or not.  ia64 unwind is very, very sensitive to whether a process is
-  running or not.
-
----
-
-What happens when MCA/INIT is delivered what a cpu is running user
-space code?
-
-The user mode registers are stored in the RSE area of the MCA/INIT on
-entry to the OS and are restored from there on return to SAL, so user
-mode registers are preserved across a recoverable MCA/INIT.  Since the
-OS has no idea what unwind data is available for the user space stack,
-MCA/INIT never tries to backtrace user space.  Which means that the OS
-does not bother making the user space process look like a blocked task,
-i.e. the OS does not copy pt_regs and switch_stack to the user space
-stack.  Also the OS has no idea how big the user space RSE and memory
-stacks are, which makes it too risky to copy the saved state to a user
-mode stack.
-
----
-
-How do we get a backtrace on the tasks that were running when MCA/INIT
-was delivered?
-
-mca.c:::ia64_mca_modify_original_stack().  That identifies and
-verifies the original kernel stack, copies the dirty registers from
-the MCA/INIT stack's RSE to the original stack's RSE, copies the
-skeleton struct pt_regs and switch_stack to the original stack, fills
-in the skeleton structures from the PAL minstate area and updates the
-original stack's thread.ksp.  That makes the original stack look
-exactly like any other blocked task, i.e. it now appears to be
-sleeping.  To get a backtrace, just start with thread.ksp for the
-original task and unwind like any other sleeping task.
-
----
-
-How do we identify the tasks that were running when MCA/INIT was
-delivered?
-
-If the previous task has been verified and converted to a blocked
-state, then sos->prev_task on the MCA/INIT stack is updated to point to
-the previous task.  You can look at that field in dumps or debuggers.
-To help distinguish between the handler and the original tasks,
-handlers have _TIF_MCA_INIT set in thread_info.flags.
-
-The sos data is always in the MCA/INIT handler stack, at offset
-MCA_SOS_OFFSET.  You can get that value from mca_asm.h or calculate it
-as KERNEL_STACK_SIZE - sizeof(struct pt_regs) - sizeof(struct
-ia64_sal_os_state), with 16 byte alignment for all structures.
-
-Also the comm field of the MCA/INIT task is modified to include the pid
-of the original task, for humans to use.  For example, a comm field of
-'MCA 12159' means that pid 12159 was running when the MCA was
-delivered.
diff --git a/Documentation/ia64/serial.rst b/Documentation/ia64/serial.rst
new file mode 100644
index 000000000000..1de70c305a79
--- /dev/null
+++ b/Documentation/ia64/serial.rst
@@ -0,0 +1,165 @@
+==============
+Serial Devices
+==============
+
+Serial Device Naming
+====================
+
+    As of 2.6.10, serial devices on ia64 are named based on the
+    order of ACPI and PCI enumeration.  The first device in the
+    ACPI namespace (if any) becomes /dev/ttyS0, the second becomes
+    /dev/ttyS1, etc., and PCI devices are named sequentially
+    starting after the ACPI devices.
+
+    Prior to 2.6.10, there were confusing exceptions to this:
+
+	- Firmware on some machines (mostly from HP) provides an HCDP
+	  table[1] that tells the kernel about devices that can be used
+	  as a serial console.  If the user specified "console=ttyS0"
+	  or the EFI ConOut path contained only UART devices, the
+	  kernel registered the device described by the HCDP as
+	  /dev/ttyS0.
+
+	- If there was no HCDP, we assumed there were UARTs at the
+	  legacy COM port addresses (I/O ports 0x3f8 and 0x2f8), so
+	  the kernel registered those as /dev/ttyS0 and /dev/ttyS1.
+
+    Any additional ACPI or PCI devices were registered sequentially
+    after /dev/ttyS0 as they were discovered.
+
+    With an HCDP, device names changed depending on EFI configuration
+    and "console=" arguments.  Without an HCDP, device names didn't
+    change, but we registered devices that might not really exist.
+
+    For example, an HP rx1600 with a single built-in serial port
+    (described in the ACPI namespace) plus an MP[2] (a PCI device) has
+    these ports:
+
+      ==========  ==========     ============    ============   =======
+      Type        MMIO           pre-2.6.10      pre-2.6.10     2.6.10+
+                  address
+                                 (EFI console    (EFI console
+                                 on builtin)     on MP port)
+      ==========  ==========     ============    ============   =======
+      builtin     0xff5e0000        ttyS0           ttyS1         ttyS0
+      MP UPS      0xf8031000        ttyS1           ttyS2         ttyS1
+      MP Console  0xf8030000        ttyS2           ttyS0         ttyS2
+      MP 2        0xf8030010        ttyS3           ttyS3         ttyS3
+      MP 3        0xf8030038        ttyS4           ttyS4         ttyS4
+      ==========  ==========     ============    ============   =======
+
+Console Selection
+=================
+
+    EFI knows what your console devices are, but it doesn't tell the
+    kernel quite enough to actually locate them.  The DIG64 HCDP
+    table[1] does tell the kernel where potential serial console
+    devices are, but not all firmware supplies it.  Also, EFI supports
+    multiple simultaneous consoles and doesn't tell the kernel which
+    should be the "primary" one.
+
+    So how do you tell Linux which console device to use?
+
+	- If your firmware supplies the HCDP, it is simplest to
+	  configure EFI with a single device (either a UART or a VGA
+	  card) as the console.  Then you don't need to tell Linux
+	  anything; the kernel will automatically use the EFI console.
+
+	  (This works only in 2.6.6 or later; prior to that you had
+	  to specify "console=ttyS0" to get a serial console.)
+
+	- Without an HCDP, Linux defaults to a VGA console unless you
+	  specify a "console=" argument.
+
+    NOTE: Don't assume that a serial console device will be /dev/ttyS0.
+    It might be ttyS1, ttyS2, etc.  Make sure you have the appropriate
+    entries in /etc/inittab (for getty) and /etc/securetty (to allow
+    root login).
+
+Early Serial Console
+====================
+
+    The kernel can't start using a serial console until it knows where
+    the device lives.  Normally this happens when the driver enumerates
+    all the serial devices, which can happen a minute or more after the
+    kernel starts booting.
+
+    2.6.10 and later kernels have an "early uart" driver that works
+    very early in the boot process.  The kernel will automatically use
+    this if the user supplies an argument like "console=uart,io,0x3f8",
+    or if the EFI console path contains only a UART device and the
+    firmware supplies an HCDP.
+
+Troubleshooting Serial Console Problems
+=======================================
+
+    No kernel output after elilo prints "Uncompressing Linux... done":
+
+	- You specified "console=ttyS0" but Linux changed the device
+	  to which ttyS0 refers.  Configure exactly one EFI console
+	  device[3] and remove the "console=" option.
+
+	- The EFI console path contains both a VGA device and a UART.
+	  EFI and elilo use both, but Linux defaults to VGA.  Remove
+	  the VGA device from the EFI console path[3].
+
+	- Multiple UARTs selected as EFI console devices.  EFI and
+	  elilo use all selected devices, but Linux uses only one.
+	  Make sure only one UART is selected in the EFI console
+	  path[3].
+
+	- You're connected to an HP MP port[2] but have a non-MP UART
+	  selected as EFI console device.  EFI uses the MP as a
+	  console device even when it isn't explicitly selected.
+	  Either move the console cable to the non-MP UART, or change
+	  the EFI console path[3] to the MP UART.
+
+    Long pause (60+ seconds) between "Uncompressing Linux... done" and
+    start of kernel output:
+
+	- No early console because you used "console=ttyS<n>".  Remove
+	  the "console=" option if your firmware supplies an HCDP.
+
+	- If you don't have an HCDP, the kernel doesn't know where
+	  your console lives until the driver discovers serial
+	  devices.  Use "console=uart,io,0x3f8" (or appropriate
+	  address for your machine).
+
+    Kernel and init script output works fine, but no "login:" prompt:
+
+	- Add getty entry to /etc/inittab for console tty.  Look for
+	  the "Adding console on ttyS<n>" message that tells you which
+	  device is the console.
+
+    "login:" prompt, but can't log in as root:
+
+	- Add entry to /etc/securetty for console tty.
+
+    No ACPI serial devices found in 2.6.17 or later:
+
+	- Turn on CONFIG_PNP and CONFIG_PNPACPI.  Prior to 2.6.17, ACPI
+	  serial devices were discovered by 8250_acpi.  In 2.6.17,
+	  8250_acpi was replaced by the combination of 8250_pnp and
+	  CONFIG_PNPACPI.
+
+
+
+[1]
+    http://www.dig64.org/specifications/agreement
+    The table was originally defined as the "HCDP" for "Headless
+    Console/Debug Port."  The current version is the "PCDP" for
+    "Primary Console and Debug Port Devices."
+
+[2]
+    The HP MP (management processor) is a PCI device that provides
+    several UARTs.  One of the UARTs is often used as a console; the
+    EFI Boot Manager identifies it as "Acpi(HWP0002,700)/Pci(...)/Uart".
+    The external connection is usually a 25-pin connector, and a
+    special dongle converts that to three 9-pin connectors, one of
+    which is labelled "Console."
+
+[3]
+    EFI console devices are configured using the EFI Boot Manager
+    "Boot option maintenance" menu.  You may have to interrupt the
+    boot sequence to use this menu, and you will have to reset the
+    box after changing console configuration.
diff --git a/Documentation/ia64/serial.txt b/Documentation/ia64/serial.txt
deleted file mode 100644
index a63d2c54329b..000000000000
--- a/Documentation/ia64/serial.txt
+++ /dev/null
@@ -1,151 +0,0 @@
-SERIAL DEVICE NAMING
-
-    As of 2.6.10, serial devices on ia64 are named based on the
-    order of ACPI and PCI enumeration.  The first device in the
-    ACPI namespace (if any) becomes /dev/ttyS0, the second becomes
-    /dev/ttyS1, etc., and PCI devices are named sequentially
-    starting after the ACPI devices.
-
-    Prior to 2.6.10, there were confusing exceptions to this:
-
-	- Firmware on some machines (mostly from HP) provides an HCDP
-	  table[1] that tells the kernel about devices that can be used
-	  as a serial console.  If the user specified "console=ttyS0"
-	  or the EFI ConOut path contained only UART devices, the
-	  kernel registered the device described by the HCDP as
-	  /dev/ttyS0.
-
-	- If there was no HCDP, we assumed there were UARTs at the
-	  legacy COM port addresses (I/O ports 0x3f8 and 0x2f8), so
-	  the kernel registered those as /dev/ttyS0 and /dev/ttyS1.
-
-    Any additional ACPI or PCI devices were registered sequentially
-    after /dev/ttyS0 as they were discovered.
-
-    With an HCDP, device names changed depending on EFI configuration
-    and "console=" arguments.  Without an HCDP, device names didn't
-    change, but we registered devices that might not really exist.
-
-    For example, an HP rx1600 with a single built-in serial port
-    (described in the ACPI namespace) plus an MP[2] (a PCI device) has
-    these ports:
-
-                                  pre-2.6.10      pre-2.6.10
-                    MMIO         (EFI console    (EFI console
-                   address        on builtin)     on MP port)    2.6.10
-                  ==========      ==========      ==========     ======
-      builtin     0xff5e0000        ttyS0           ttyS1         ttyS0
-      MP UPS      0xf8031000        ttyS1           ttyS2         ttyS1
-      MP Console  0xf8030000        ttyS2           ttyS0         ttyS2
-      MP 2        0xf8030010        ttyS3           ttyS3         ttyS3
-      MP 3        0xf8030038        ttyS4           ttyS4         ttyS4
-
-CONSOLE SELECTION
-
-    EFI knows what your console devices are, but it doesn't tell the
-    kernel quite enough to actually locate them.  The DIG64 HCDP
-    table[1] does tell the kernel where potential serial console
-    devices are, but not all firmware supplies it.  Also, EFI supports
-    multiple simultaneous consoles and doesn't tell the kernel which
-    should be the "primary" one.
-
-    So how do you tell Linux which console device to use?
-
-	- If your firmware supplies the HCDP, it is simplest to
-	  configure EFI with a single device (either a UART or a VGA
-	  card) as the console.  Then you don't need to tell Linux
-	  anything; the kernel will automatically use the EFI console.
-
-	  (This works only in 2.6.6 or later; prior to that you had
-	  to specify "console=ttyS0" to get a serial console.)
-
-	- Without an HCDP, Linux defaults to a VGA console unless you
-	  specify a "console=" argument.
-
-    NOTE: Don't assume that a serial console device will be /dev/ttyS0.
-    It might be ttyS1, ttyS2, etc.  Make sure you have the appropriate
-    entries in /etc/inittab (for getty) and /etc/securetty (to allow
-    root login).
-
-EARLY SERIAL CONSOLE
-
-    The kernel can't start using a serial console until it knows where
-    the device lives.  Normally this happens when the driver enumerates
-    all the serial devices, which can happen a minute or more after the
-    kernel starts booting.
-
-    2.6.10 and later kernels have an "early uart" driver that works
-    very early in the boot process.  The kernel will automatically use
-    this if the user supplies an argument like "console=uart,io,0x3f8",
-    or if the EFI console path contains only a UART device and the
-    firmware supplies an HCDP.
-
-TROUBLESHOOTING SERIAL CONSOLE PROBLEMS
-
-    No kernel output after elilo prints "Uncompressing Linux... done":
-
-	- You specified "console=ttyS0" but Linux changed the device
-	  to which ttyS0 refers.  Configure exactly one EFI console
-	  device[3] and remove the "console=" option.
-
-	- The EFI console path contains both a VGA device and a UART.
-	  EFI and elilo use both, but Linux defaults to VGA.  Remove
-	  the VGA device from the EFI console path[3].
-
-	- Multiple UARTs selected as EFI console devices.  EFI and
-	  elilo use all selected devices, but Linux uses only one.
-	  Make sure only one UART is selected in the EFI console
-	  path[3].
-
-	- You're connected to an HP MP port[2] but have a non-MP UART
-	  selected as EFI console device.  EFI uses the MP as a
-	  console device even when it isn't explicitly selected.
-	  Either move the console cable to the non-MP UART, or change
-	  the EFI console path[3] to the MP UART.
-
-    Long pause (60+ seconds) between "Uncompressing Linux... done" and
-    start of kernel output:
-
-	- No early console because you used "console=ttyS<n>".  Remove
-	  the "console=" option if your firmware supplies an HCDP.
-
-	- If you don't have an HCDP, the kernel doesn't know where
-	  your console lives until the driver discovers serial
-	  devices.  Use "console=uart,io,0x3f8" (or appropriate
-	  address for your machine).
-
-    Kernel and init script output works fine, but no "login:" prompt:
-
-	- Add getty entry to /etc/inittab for console tty.  Look for
-	  the "Adding console on ttyS<n>" message that tells you which
-	  device is the console.
-
-    "login:" prompt, but can't login as root:
-
-	- Add entry to /etc/securetty for console tty.
-
-    No ACPI serial devices found in 2.6.17 or later:
-
-	- Turn on CONFIG_PNP and CONFIG_PNPACPI.  Prior to 2.6.17, ACPI
-	  serial devices were discovered by 8250_acpi.  In 2.6.17,
-	  8250_acpi was replaced by the combination of 8250_pnp and
-	  CONFIG_PNPACPI.
-
-
-
-[1] http://www.dig64.org/specifications/agreement 
-    The table was originally defined as the "HCDP" for "Headless
-    Console/Debug Port."  The current version is the "PCDP" for
-    "Primary Console and Debug Port Devices."
-
-[2] The HP MP (management processor) is a PCI device that provides
-    several UARTs.  One of the UARTs is often used as a console; the
-    EFI Boot Manager identifies it as "Acpi(HWP0002,700)/Pci(...)/Uart".
-    The external connection is usually a 25-pin connector, and a
-    special dongle converts that to three 9-pin connectors, one of
-    which is labelled "Console."
-
-[3] EFI console devices are configured using the EFI Boot Manager
-    "Boot option maintenance" menu.  You may have to interrupt the
-    boot sequence to use this menu, and you will have to reset the
-    box after changing console configuration.
diff --git a/Documentation/ia64/xen.rst b/Documentation/ia64/xen.rst
new file mode 100644
index 000000000000..831339c74441
--- /dev/null
+++ b/Documentation/ia64/xen.rst
@@ -0,0 +1,206 @@
+********************************************************
+Recipe for getting/building/running Xen/ia64 with pv_ops
+********************************************************
+This recipe describes how to get the xen-ia64 source, build it,
+and run domU with pv_ops.
+
+Requirements
+============
+
+  - python
+  - mercurial
+    Mercurial (aka "hg") is an open-source source code
+    management tool. See:
+    http://www.selenic.com/mercurial/wiki/
+  - git
+  - bridge-utils
+
+Getting and Building Xen and Dom0
+=================================
+
+  My environment is:
+
+    - Machine  : Tiger4
+    - Domain0 OS  : RHEL5
+    - DomainU OS  : RHEL5
+
+ 1. Download source::
+
+	# hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable.hg
+	# cd xen-unstable.hg
+	# hg clone http://xenbits.xensource.com/ext/ia64/linux-2.6.18-xen.hg
+
+ 2. # make world
+
+ 3. # make install-tools
+
+ 4. copy kernels and xen::
+
+	# cp xen/xen.gz /boot/efi/efi/redhat/
+	# cp build-linux-2.6.18-xen_ia64/vmlinux.gz \
+	/boot/efi/efi/redhat/vmlinuz-2.6.18.8-xen
+
+ 5. make initrd for Dom0/DomU::
+
+	# make -C linux-2.6.18-xen.hg ARCH=ia64 modules_install \
+          O=$(pwd)/build-linux-2.6.18-xen_ia64
+	# mkinitrd -f /boot/efi/efi/redhat/initrd-2.6.18.8-xen.img \
+	  2.6.18.8-xen --builtin mptspi --builtin mptbase \
+	  --builtin mptscsih --builtin uhci-hcd --builtin ohci-hcd \
+	  --builtin ehci-hcd
+
+Making a disk image for guest OS
+================================
+
+ 1. make file::
+
+      # dd if=/dev/zero of=/root/rhel5.img bs=1M seek=4096 count=0
+      # mke2fs -F -j /root/rhel5.img
+      # mount -o loop /root/rhel5.img /mnt
+      # cp -ax /{dev,var,etc,usr,bin,sbin,lib} /mnt
+      # mkdir /mnt/{root,proc,sys,home,tmp}
+
+      Note: Some device files may be missing. If so, please create them
+      with mknod. Alternatively, you can use tar instead of cp.
+
+ 2. modify DomU's fstab::
+
+      # vi /mnt/etc/fstab
+         /dev/xvda1  /            ext3    defaults        1 1
+         none        /dev/pts     devpts  gid=5,mode=620  0 0
+         none        /dev/shm     tmpfs   defaults        0 0
+         none        /proc        proc    defaults        0 0
+         none        /sys         sysfs   defaults        0 0
+
+ 3. modify inittab
+
+    set runlevel to 3 to avoid X trying to start::
+
+      # vi /mnt/etc/inittab
+         id:3:initdefault:
+
+    Start a getty on the hvc0 console::
+
+       X0:2345:respawn:/sbin/mingetty hvc0
+
+    The mingetty entries for tty1-6 can be commented out.
+
+ 4. add hvc0 into /etc/securetty::
+
+      # vi /mnt/etc/securetty (add hvc0)
+
+ 5. umount::
+
+      # umount /mnt
+
+FYI, virt-manager can also make a disk image for the guest OS.
+It is a GUI tool and makes this easy.
+
+Boot Xen & Domain0
+==================
+
+ 1. replace elilo
+    The elilo shipped with RHEL5 can boot Xen and Dom0.
+    If you use an older elilo (e.g. RHEL4), please download a newer one from
+    http://elilo.sourceforge.net/cgi-bin/blosxom
+    and copy it into /boot/efi/efi/redhat/::
+
+      # cp elilo-3.6-ia64.efi /boot/efi/efi/redhat/elilo.efi
+
+ 2. modify elilo.conf (like the below)::
+
+      # vi /boot/efi/efi/redhat/elilo.conf
+      prompt
+      timeout=20
+      default=xen
+      relocatable
+
+      image=vmlinuz-2.6.18.8-xen
+             label=xen
+             vmm=xen.gz
+             initrd=initrd-2.6.18.8-xen.img
+             read-only
+             append=" -- rhgb root=/dev/sda2"
+
+The append options before "--" are for the Xen hypervisor;
+the options after "--" are for dom0.
+
+FYI, your machine may need console options like
+"com1=19200,8n1 console=vga,com1". For example,
+append="com1=19200,8n1 console=vga,com1 -- rhgb console=tty0 \
+console=ttyS0 root=/dev/sda2"
+
+Getting and Building domU with pv_ops
+=====================================
+
+ 1. get pv_ops tree::
+
+      # git clone http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/
+
+ 2. git branch (if necessary)::
+
+      # cd linux-2.6-xen-ia64/
+      # git checkout -b your_branch origin/xen-ia64-domu-minimal-2008may19
+
+   Note:
+     The current branch is xen-ia64-domu-minimal-2008may19,
+     but you may find a newer one. Run
+     "git branch -r" to list the available branches.
+
+       http://people.valinux.co.jp/~yamahata/xen-ia64/for_eagl/linux-2.6-ia64-pv-ops.git/
+
+     is also available.
+
+     The tree is based on
+
+      git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 test
+
+ 3. copy .config for pv_ops of domU::
+
+      # cp arch/ia64/configs/xen_domu_wip_defconfig .config
+
+ 4. make kernel with pv_ops::
+
+      # make oldconfig
+      # make
+
+ 5. install the kernel and initrd::
+
+      # cp vmlinux.gz /boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU
+      # make modules_install
+      # mkinitrd -f /boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img \
+        2.6.26-rc3xen-ia64-08941-g1b12161 --builtin mptspi \
+        --builtin mptbase --builtin mptscsih --builtin uhci-hcd \
+        --builtin ohci-hcd --builtin ehci-hcd
+
+Boot DomainU with pv_ops
+========================
+
+ 1. make config of DomU::
+
+     # vi /etc/xen/rhel5
+       kernel = "/boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU"
+       ramdisk = "/boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img"
+       vcpus = 1
+       memory = 512
+       name = "rhel5"
+       disk = [ 'file:/root/rhel5.img,xvda1,w' ]
+       root = "/dev/xvda1 ro"
+       extra= "rhgb console=hvc0"
+
+ 2. After booting Xen and dom0, start xend::
+
+	# /etc/init.d/xend start
+
+   (For debugging, use `# XEND_DEBUG=1 xend trace_start`.)
+
+ 3. start domU::
+
+	# xm create -c rhel5
+
+Reference
+=========
+- Wiki of Xen/IA64 upstream merge
+  http://wiki.xensource.com/xenwiki/XenIA64/UpstreamMerge
+
+Written by Akio Takebe <takebe_akio@jp.fujitsu.com> on 28 May 2008
diff --git a/Documentation/ia64/xen.txt b/Documentation/ia64/xen.txt
deleted file mode 100644
index a12c74ce2773..000000000000
--- a/Documentation/ia64/xen.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-       Recipe for getting/building/running Xen/ia64 with pv_ops
-       --------------------------------------------------------
-
-This recipe describes how to get xen-ia64 source and build it,
-and run domU with pv_ops.
-
-============
-Requirements
-============
-
-  - python
-  - mercurial
-    it (aka "hg") is an open-source source code
-    management software. See the below.
-    http://www.selenic.com/mercurial/wiki/
-  - git
-  - bridge-utils
-
-=================================
-Getting and Building Xen and Dom0
-=================================
-
-  My environment is;
-    Machine  : Tiger4
-    Domain0 OS  : RHEL5
-    DomainU OS  : RHEL5
-
- 1. Download source
-    # hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable.hg
-    # cd xen-unstable.hg
-    # hg clone http://xenbits.xensource.com/ext/ia64/linux-2.6.18-xen.hg
-
- 2. # make world
-
- 3. # make install-tools
-
- 4. copy kernels and xen
-    # cp xen/xen.gz /boot/efi/efi/redhat/
-    # cp build-linux-2.6.18-xen_ia64/vmlinux.gz \
-      /boot/efi/efi/redhat/vmlinuz-2.6.18.8-xen
-
- 5. make initrd for Dom0/DomU
-    # make -C linux-2.6.18-xen.hg ARCH=ia64 modules_install \
-      O=$(pwd)/build-linux-2.6.18-xen_ia64
-    # mkinitrd -f /boot/efi/efi/redhat/initrd-2.6.18.8-xen.img \
-      2.6.18.8-xen --builtin mptspi --builtin mptbase \
-      --builtin mptscsih --builtin uhci-hcd --builtin ohci-hcd \
-      --builtin ehci-hcd
-
-================================
-Making a disk image for guest OS
-================================
-
- 1. make file
-    # dd if=/dev/zero of=/root/rhel5.img bs=1M seek=4096 count=0
-    # mke2fs -F -j /root/rhel5.img
-    # mount -o loop /root/rhel5.img /mnt
-    # cp -ax /{dev,var,etc,usr,bin,sbin,lib} /mnt
-    # mkdir /mnt/{root,proc,sys,home,tmp}
-
-    Note: You may miss some device files. If so, please create them
-    with mknod. Or you can use tar instead of cp.
-
- 2. modify DomU's fstab
-    # vi /mnt/etc/fstab
-       /dev/xvda1  /            ext3    defaults        1 1
-       none        /dev/pts     devpts  gid=5,mode=620  0 0
-       none        /dev/shm     tmpfs   defaults        0 0
-       none        /proc        proc    defaults        0 0
-       none        /sys         sysfs   defaults        0 0
-
- 3. modify inittab
-    set runlevel to 3 to avoid X trying to start
-    # vi /mnt/etc/inittab
-       id:3:initdefault:
-    Start a getty on the hvc0 console
-       X0:2345:respawn:/sbin/mingetty hvc0
-    tty1-6 mingetty can be commented out
-
- 4. add hvc0 into /etc/securetty
-    # vi /mnt/etc/securetty (add hvc0)
-
- 5. umount
-    # umount /mnt
-
-FYI, virt-manager can also make a disk image for guest OS.
-It's GUI tools and easy to make it.
-
-==================
-Boot Xen & Domain0
-==================
-
- 1. replace elilo
-    elilo of RHEL5 can boot Xen and Dom0.
-    If you use old elilo (e.g RHEL4), please download from the below
-    http://elilo.sourceforge.net/cgi-bin/blosxom
-    and copy into /boot/efi/efi/redhat/
-    # cp elilo-3.6-ia64.efi /boot/efi/efi/redhat/elilo.efi
-
- 2. modify elilo.conf (like the below)
-    # vi /boot/efi/efi/redhat/elilo.conf
-     prompt
-     timeout=20
-     default=xen
-     relocatable
-
-     image=vmlinuz-2.6.18.8-xen
-             label=xen
-             vmm=xen.gz
-             initrd=initrd-2.6.18.8-xen.img
-             read-only
-             append=" -- rhgb root=/dev/sda2"
-
-The append options before "--" are for xen hypervisor,
-the options after "--" are for dom0.
-
-FYI, your machine may need console options like
-"com1=19200,8n1 console=vga,com1". For example,
-append="com1=19200,8n1 console=vga,com1 -- rhgb console=tty0 \
-console=ttyS0 root=/dev/sda2"
-
-=====================================
-Getting and Building domU with pv_ops
-=====================================
-
- 1. get pv_ops tree
-    # git clone http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/
-
- 2. git branch (if necessary)
-    # cd linux-2.6-xen-ia64/
-    # git checkout -b your_branch origin/xen-ia64-domu-minimal-2008may19
-    (Note: The current branch is xen-ia64-domu-minimal-2008may19.
-    But you would find the new branch. You can see with
-    "git branch -r" to get the branch lists.
-    http://people.valinux.co.jp/~yamahata/xen-ia64/for_eagl/linux-2.6-ia64-pv-ops.git/
-    is also available. The tree is based on
-    git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 test)
-
-
- 3. copy .config for pv_ops of domU
-    # cp arch/ia64/configs/xen_domu_wip_defconfig .config
-
- 4. make kernel with pv_ops
-    # make oldconfig
-    # make
-
- 5. install the kernel and initrd
-    # cp vmlinux.gz /boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU
-    # make modules_install
-    # mkinitrd -f /boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img \
-      2.6.26-rc3xen-ia64-08941-g1b12161 --builtin mptspi \
-      --builtin mptbase --builtin mptscsih --builtin uhci-hcd \
-      --builtin ohci-hcd --builtin ehci-hcd
-
-========================
-Boot DomainU with pv_ops
-========================
-
- 1. make config of DomU
-   # vi /etc/xen/rhel5
-     kernel = "/boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU"
-     ramdisk = "/boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img"
-     vcpus = 1
-     memory = 512
-     name = "rhel5"
-     disk = [ 'file:/root/rhel5.img,xvda1,w' ]
-     root = "/dev/xvda1 ro"
-     extra= "rhgb console=hvc0"
-
- 2. After boot xen and dom0, start xend
-   # /etc/init.d/xend start
-   ( In the debugging case, # XEND_DEBUG=1 xend trace_start )
-
- 3. start domU
-   # xm create -c rhel5
-
-=========
-Reference
-=========
-- Wiki of Xen/IA64 upstream merge
-  http://wiki.xensource.com/xenwiki/XenIA64/UpstreamMerge
-
-Written by Akio Takebe <takebe_akio@jp.fujitsu.com> on 28 May 2008
diff --git a/MAINTAINERS b/MAINTAINERS
index 2a2d74e5d670..c30b52c9049a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14389,7 +14389,7 @@ SGI SN-IA64 (Altix) SERIAL CONSOLE DRIVER
 M:	Pat Gefre <pfg@sgi.com>
 L:	linux-ia64@vger.kernel.org
 S:	Supported
-F:	Documentation/ia64/serial.txt
+F:	Documentation/ia64/serial.rst
 F:	drivers/tty/serial/ioc?_serial.c
 F:	include/linux/ioc?.h
 
diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 8f106638913c..3795d18276c4 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -852,7 +852,7 @@ valid_phys_addr_range (phys_addr_t phys_addr, unsigned long size)
 	 * /dev/mem reads and writes use copy_to_user(), which implicitly
 	 * uses a granule-sized kernel identity mapping.  It's really
 	 * only safe to do this for regions in kern_memmap.  For more
-	 * details, see Documentation/ia64/aliasing.txt.
+	 * details, see Documentation/ia64/aliasing.rst.
 	 */
 	attr = kern_mem_attribute(phys_addr, size);
 	if (attr & EFI_MEMORY_WB || attr & EFI_MEMORY_UC)
diff --git a/arch/ia64/kernel/fsys.S b/arch/ia64/kernel/fsys.S
index d80c99a5f55d..0750a716adc7 100644
--- a/arch/ia64/kernel/fsys.S
+++ b/arch/ia64/kernel/fsys.S
@@ -28,7 +28,7 @@
 #include <asm/native/inst.h>
 
 /*
- * See Documentation/ia64/fsys.txt for details on fsyscalls.
+ * See Documentation/ia64/fsys.rst for details on fsyscalls.
  *
  * On entry to an fsyscall handler:
  *   r10	= 0 (i.e., defaults to "successful syscall return")
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index 5e3e7b1fdac5..0c0de2c4ec69 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -42,7 +42,7 @@ ioremap (unsigned long phys_addr, unsigned long size)
 	/*
 	 * For things in kern_memmap, we must use the same attribute
 	 * as the rest of the kernel.  For more details, see
-	 * Documentation/ia64/aliasing.txt.
+	 * Documentation/ia64/aliasing.rst.
 	 */
 	attr = kern_mem_attribute(phys_addr, size);
 	if (attr & EFI_MEMORY_WB)
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index e308196c2229..165e561dc81a 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -450,7 +450,7 @@ pci_mmap_legacy_page_range(struct pci_bus *bus, struct vm_area_struct *vma,
 		return -ENOSYS;
 
 	/*
-	 * Avoid attribute aliasing.  See Documentation/ia64/aliasing.txt
+	 * Avoid attribute aliasing.  See Documentation/ia64/aliasing.rst
 	 * for more details.
 	 */
 	if (!valid_mmap_phys_addr_range(vma->vm_pgoff, size))
-- 
cgit v1.2.3-55-g7522


From b02f1651ff7758c4db0d759ab765d39986a79f5a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 11:12:57 -0300
Subject: docs: laptops: convert to ReST

Rename the laptops documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
---
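The reference updates in this patch (each Documentation/laptops/*.txt path
becoming *.rst) can be checked mechanically. The following is a minimal
sketch under stated assumptions: the helper name `stale_txt_refs` and its
`directory` default are hypothetical and exist nowhere in the kernel tree,
and it only inspects a string handed to it rather than walking the source
tree.

```python
import re


def stale_txt_refs(text, directory="Documentation/laptops"):
    """Return every '<directory>/<name>.txt' path mentioned in text.

    Hypothetical helper for illustration only; not part of the kernel tree.
    """
    # Match the directory prefix, one file-name component, and a .txt suffix.
    pattern = re.compile(re.escape(directory) + r"/[\w.-]+\.txt\b")
    return pattern.findall(text)


if __name__ == "__main__":
    # An old-style reference is reported; the converted one is not.
    print(stale_txt_refs("See Documentation/laptops/sonypi.txt"))
    print(stale_txt_refs("See Documentation/laptops/sonypi.rst"))
```

Feeding it the contents of each file touched by the series should yield an
empty list once all references have been converted.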
 Documentation/ABI/testing/sysfs-block-device       |    2 +-
 .../ABI/testing/sysfs-platform-asus-laptop         |    2 +-
 Documentation/admin-guide/kernel-parameters.txt    |    2 +-
 Documentation/laptops/asus-laptop.rst              |  271 ++++
 Documentation/laptops/asus-laptop.txt              |  257 ----
 Documentation/laptops/disk-shock-protection.rst    |  151 ++
 Documentation/laptops/disk-shock-protection.txt    |  149 --
 Documentation/laptops/index.rst                    |   17 +
 Documentation/laptops/laptop-mode.rst              |  781 ++++++++++
 Documentation/laptops/laptop-mode.txt              |  782 ----------
 Documentation/laptops/sony-laptop.rst              |  174 +++
 Documentation/laptops/sony-laptop.txt              |  144 --
 Documentation/laptops/sonypi.rst                   |  160 ++
 Documentation/laptops/sonypi.txt                   |  152 --
 Documentation/laptops/thinkpad-acpi.rst            | 1562 ++++++++++++++++++++
 Documentation/laptops/thinkpad-acpi.txt            | 1487 -------------------
 Documentation/laptops/toshiba_haps.rst             |   87 ++
 Documentation/laptops/toshiba_haps.txt             |   76 -
 Documentation/sysctl/vm.txt                        |    4 +-
 MAINTAINERS                                        |    2 +-
 drivers/char/Kconfig                               |    2 +-
 drivers/platform/x86/Kconfig                       |    4 +-
 22 files changed, 3212 insertions(+), 3056 deletions(-)
 create mode 100644 Documentation/laptops/asus-laptop.rst
 delete mode 100644 Documentation/laptops/asus-laptop.txt
 create mode 100644 Documentation/laptops/disk-shock-protection.rst
 delete mode 100644 Documentation/laptops/disk-shock-protection.txt
 create mode 100644 Documentation/laptops/index.rst
 create mode 100644 Documentation/laptops/laptop-mode.rst
 delete mode 100644 Documentation/laptops/laptop-mode.txt
 create mode 100644 Documentation/laptops/sony-laptop.rst
 delete mode 100644 Documentation/laptops/sony-laptop.txt
 create mode 100644 Documentation/laptops/sonypi.rst
 delete mode 100644 Documentation/laptops/sonypi.txt
 create mode 100644 Documentation/laptops/thinkpad-acpi.rst
 delete mode 100644 Documentation/laptops/thinkpad-acpi.txt
 create mode 100644 Documentation/laptops/toshiba_haps.rst
 delete mode 100644 Documentation/laptops/toshiba_haps.txt

diff --git a/Documentation/ABI/testing/sysfs-block-device b/Documentation/ABI/testing/sysfs-block-device
index 82ef6eab042d..0d57bbb4fddc 100644
--- a/Documentation/ABI/testing/sysfs-block-device
+++ b/Documentation/ABI/testing/sysfs-block-device
@@ -45,7 +45,7 @@ Description:
 		- Values below -2 are rejected with -EINVAL
 
 		For more information, see
-		Documentation/laptops/disk-shock-protection.txt
+		Documentation/laptops/disk-shock-protection.rst
 
 
 What:		/sys/block/*/device/ncq_prio_enable
diff --git a/Documentation/ABI/testing/sysfs-platform-asus-laptop b/Documentation/ABI/testing/sysfs-platform-asus-laptop
index cd9d667c3da2..d67fa4bafa70 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-laptop
+++ b/Documentation/ABI/testing/sysfs-platform-asus-laptop
@@ -31,7 +31,7 @@ Description:
 		To control the LED display, use the following :
 		    echo 0x0T000DDD > /sys/devices/platform/asus_laptop/
 		where T control the 3 letters display, and DDD the 3 digits display.
-		The DDD table can be found in Documentation/laptops/asus-laptop.txt
+		The DDD table can be found in Documentation/laptops/asus-laptop.rst
 
 What:		/sys/devices/platform/asus_laptop/bluetooth
 Date:		January 2007
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ed104a44e8b2..a342dd5c95a9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4347,7 +4347,7 @@
 			Format: <integer>
 
 	sonypi.*=	[HW] Sony Programmable I/O Control Device driver
-			See Documentation/laptops/sonypi.txt
+			See Documentation/laptops/sonypi.rst
 
 	spectre_v2=	[X86] Control mitigation of Spectre variant 2
 			(indirect branch speculation) vulnerability.
diff --git a/Documentation/laptops/asus-laptop.rst b/Documentation/laptops/asus-laptop.rst
new file mode 100644
index 000000000000..95176321a25a
--- /dev/null
+++ b/Documentation/laptops/asus-laptop.rst
@@ -0,0 +1,271 @@
+==================
+Asus Laptop Extras
+==================
+
+Version 0.1
+
+August 6, 2009
+
+Corentin Chary <corentincj@iksaif.net>
+http://acpi4asus.sf.net/
+
+ This driver provides support for extra features of ACPI-compatible ASUS laptops.
+ It may also support some MEDION, JVC or VICTOR laptops (such as MEDION 9675 or
+ VICTOR XP7210 for example). It makes all the extra buttons generate input
+ events (like keyboards).
+
+ On some models it adds support for changing the display brightness and output,
+ switching the LCD backlight on and off, and most importantly, allows you to
+ blink those fancy LEDs intended for reporting mail and wireless status.
+
+This driver supersedes the old asus_acpi driver.
+
+Requirements
+------------
+
+  Kernel 2.6.X sources, configured for your computer, with ACPI support.
+  You also need CONFIG_INPUT and CONFIG_ACPI.
+
+Status
+------
+
+ The features currently supported are the following (see below for
+ detailed description):
+
+ - Fn key combinations
+ - Bluetooth enable and disable
+ - Wlan enable and disable
+ - GPS enable and disable
+ - Video output switching
+ - Ambient Light Sensor on and off
+ - LED control
+ - LED Display control
+ - LCD brightness control
+ - LCD on and off
+
+ A compatibility table by model and feature is maintained on the web
+ site, http://acpi4asus.sf.net/.
+
+Usage
+-----
+
+  Try "modprobe asus-laptop". Check your dmesg (simply type dmesg). You should
+  see some lines like this:
+
+      Asus Laptop Extras version 0.42
+        - L2D model detected.
+
+  If it is not the output you have on your laptop, send it (and the laptop's
+  DSDT) to me.
+
+  That's all, now, all the events generated by the hotkeys of your laptop
+  should be reported via netlink events. You can check with
+  "acpi_genl monitor" (part of the acpica project).
+
+  Hotkeys are also reported as input keys (like keyboards); you can check
+  which keys are supported using "xev" under X11.
+
+  You can get information on the version of your DSDT table by reading the
+  /sys/devices/platform/asus-laptop/infos entry. If you have a question or a
+  bug report to file, please include the output of this entry.
+
+LEDs
+----
+
+  You can modify LEDs by echoing values to `/sys/class/leds/asus::*/brightness`::
+
+    echo 1 >  /sys/class/leds/asus::mail/brightness
+
+  will switch the mail LED on.
+
+  You can also check whether they are on or off by reading the same files,
+  and use kernel triggers such as disk-activity or heartbeat.
+
+Backlight
+---------
+
+  You can control lcd backlight power and brightness with
+  /sys/class/backlight/asus-laptop/. Brightness values are between 0 and 15.
+
+Wireless devices
+----------------
+
+  You can turn the internal Bluetooth adapter on/off with the bluetooth entry
+  (only on models with Bluetooth). This usually controls the associated LED.
+  The same applies to the Wlan adapter.
+
+Display switching
+-----------------
+
+  Note: the display switching code is currently considered EXPERIMENTAL.
+
+  Switching works for the following models:
+
+    - L3800C
+    - A2500H
+    - L5800C
+    - M5200N
+    - W1000N (albeit with some glitches)
+    - M6700R
+    - A6JC
+    - F3J
+
+  Switching doesn't work for the following:
+
+    - M3700N
+    - L2X00D (locks the laptop under certain conditions)
+
+  To switch the displays, echo values from 0 to 15 to
+  /sys/devices/platform/asus-laptop/display. The significance of those values
+  is as follows:
+
+  +-------+-----+-----+-----+-----+-----+
+  | Bin   | Val | DVI | TV  | CRT | LCD |
+  +-------+-----+-----+-----+-----+-----+
+  | 0000  |   0 |     |     |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0001  |   1 |     |     |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 0010  |   2 |     |     |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0011  |   3 |     |     |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 0100  |   4 |     |  X  |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0101  |   5 |     |  X  |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 0110  |   6 |     |  X  |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0111  |   7 |     |  X  |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1000  |   8 |  X  |     |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1001  |   9 |  X  |     |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1010  |  10 |  X  |     |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1011  |  11 |  X  |     |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1100  |  12 |  X  |  X  |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1101  |  13 |  X  |  X  |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1110  |  14 |  X  |  X  |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1111  |  15 |  X  |  X  |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+
+  In most cases, the appropriate displays must be plugged in for the above
+  combinations to work. TV-Out may need to be initialized at boot time.
+
+  Debugging:
+
+  1) Check whether the Fn+F8 key:
+
+     a) does not lock the laptop (try a boot with noapic / nolapic if it does)
+     b) generates events (0x6n, where n is the value corresponding to the
+        configuration above)
+     c) actually works
+
+     Record the disp value at every configuration.
+  2) Echo values from 0 to 15 to /sys/devices/platform/asus-laptop/display.
+     Record its value, note any change. If nothing changes, try a broader range,
+     up to 65535.
+  3) Send ANY output (both positive and negative reports are needed, unless your
+     machine is already listed above) to the acpi4asus-user mailing list.
+
+  Note: on some machines (e.g. L3C), after the module has been loaded, only 0x6n
+  events are generated and no actual switching occurs. In such a case, a line
+  like::
+
+    echo $((10#$arg-60)) > /sys/devices/platform/asus-laptop/display
+
+  will usually do the trick ($arg is the 0000006n-like event passed to acpid).
+
+  Note: there is currently no reliable way to read display status on xxN
+  (Centrino) models.
+
+LED display
+-----------
+
+  Some models like the W1N have an LED display that can be used to display
+  several items of information.
+
+  LED display works for the following models:
+
+    - W1000N
+    - W1J
+
+  To control the LED display, use the following::
+
+    echo 0x0T000DDD > /sys/devices/platform/asus-laptop/
+
+  where T controls the 3-letter display, and DDD the 3-digit display,
+  according to the tables below::
+
+         DDD (digits)
+         000 to 999 = display digits
+         AAA        = ---
+         BBB to FFF = turn-off
+
+         T  (type)
+         0 = off
+         1 = dvd
+         2 = vcd
+         3 = mp3
+         4 = cd
+         5 = tv
+         6 = cpu
+         7 = vol
+
+  For example "echo 0x01000001 >/sys/devices/platform/asus-laptop/ledd"
+  would display "DVD001".
+
+Driver options
+--------------
+
+ Options can be passed to the asus-laptop driver using the standard
+ module argument syntax (<param>=<value> when passing the option to the
+ module or asus-laptop.<param>=<value> on the kernel boot line when
+ asus-laptop is statically linked into the kernel).
+
+	     wapf: WAPF defines the behavior of the Fn+Fx wlan key
+		   The significance of values is yet to be found, but
+		   most of the time:
+
+		   - 0x0 should do nothing
+		   - 0x1 should allow controlling the device with the Fn+Fx key.
+		   - 0x4 should send an ACPI event (0x88) while pressing the Fn+Fx key
+		   - 0x5 like 0x1 or 0x4
+
+ The default value is 0x1.
+
+Unsupported models
+------------------
+
+ These models will never be supported by this module, as they use a completely
+ different mechanism to handle LEDs and extra stuff (meaning we have no clue
+ how it works):
+
+ - ASUS A1300 (A1B), A1370D
+ - ASUS L7300G
+ - ASUS L8400
+
+Patches, Errors, Questions
+--------------------------
+
+ I appreciate any success or failure
+ reports, especially if they add to or correct the compatibility table.
+ Please include the following information in your report:
+
+ - Asus model name
+ - a copy of your ACPI tables, using the "acpidump" utility
+ - a copy of /sys/devices/platform/asus-laptop/infos
+ - which driver features work and which don't
+ - the observed behavior of non-working features
+
+ Any other comments or patches are also more than welcome.
+
+ acpi4asus-user@lists.sourceforge.net
+
+ http://sourceforge.net/projects/acpi4asus
diff --git a/Documentation/laptops/asus-laptop.txt b/Documentation/laptops/asus-laptop.txt
deleted file mode 100644
index 5f2858712aa0..000000000000
--- a/Documentation/laptops/asus-laptop.txt
+++ /dev/null
@@ -1,257 +0,0 @@
-Asus Laptop Extras
-
-Version 0.1
-August 6, 2009
-
-Corentin Chary <corentincj@iksaif.net>
-http://acpi4asus.sf.net/
-
- This driver provides support for extra features of ACPI-compatible ASUS laptops.
- It may also support some MEDION, JVC or VICTOR laptops (such as MEDION 9675 or
- VICTOR XP7210 for example). It makes all the extra buttons generate input
- events (like keyboards).
- On some models adds support for changing the display brightness and output,
- switching the LCD backlight on and off, and most importantly, allows you to
- blink those fancy LEDs intended for reporting mail and wireless status.
-
-This driver supercedes the old asus_acpi driver.
-
-Requirements
-------------
-
-  Kernel 2.6.X sources, configured for your computer, with ACPI support.
-  You also need CONFIG_INPUT and CONFIG_ACPI.
-
-Status
-------
-
- The features currently supported are the following (see below for
- detailed description):
-
- - Fn key combinations
- - Bluetooth enable and disable
- - Wlan enable and disable
- - GPS enable and disable
- - Video output switching
- - Ambient Light Sensor on and off
- - LED control
- - LED Display control
- - LCD brightness control
- - LCD on and off
-
- A compatibility table by model and feature is maintained on the web
- site, http://acpi4asus.sf.net/.
-
-Usage
------
-
-  Try "modprobe asus-laptop". Check your dmesg (simply type dmesg). You should
-  see some lines like this :
-
-      Asus Laptop Extras version 0.42
-        L2D model detected.
-
-  If it is not the output you have on your laptop, send it (and the laptop's
-  DSDT) to me.
-
-  That's all, now, all the events generated by the hotkeys of your laptop
-  should be reported via netlink events. You can check with
-  "acpi_genl monitor" (part of the acpica project).
-
-  Hotkeys are also reported as input keys (like keyboards) you can check
-  which key are supported using "xev" under X11.
-
-  You can get information on the version of your DSDT table by reading the
-  /sys/devices/platform/asus-laptop/infos entry. If you have a question or a
-  bug report to do, please include the output of this entry.
-
-LEDs
-----
-
-  You can modify LEDs be echoing values to /sys/class/leds/asus::*/brightness :
-    echo 1 >  /sys/class/leds/asus::mail/brightness
-  will switch the mail LED on.
-  You can also know if they are on/off by reading their content and use
-  kernel triggers like disk-activity or heartbeat.
-
-Backlight
----------
-
-  You can control lcd backlight power and brightness with
-  /sys/class/backlight/asus-laptop/. Brightness Values are between 0 and 15.
-
-Wireless devices
----------------
-
-  You can turn the internal Bluetooth adapter on/off with the bluetooth entry
-  (only on models with Bluetooth). This usually controls the associated LED.
-  Same for Wlan adapter.
-
-Display switching
------------------
-
-  Note: the display switching code is currently considered EXPERIMENTAL.
-
-  Switching works for the following models:
-    L3800C
-    A2500H
-    L5800C
-    M5200N
-    W1000N (albeit with some glitches)
-    M6700R
-    A6JC
-    F3J
-
-  Switching doesn't work for the following:
-    M3700N
-    L2X00D (locks the laptop under certain conditions)
-
-  To switch the displays, echo values from 0 to 15 to
-  /sys/devices/platform/asus-laptop/display. The significance of those values
-  is as follows:
-
-  +-------+-----+-----+-----+-----+-----+
-  | Bin   | Val | DVI | TV  | CRT | LCD |
-  +-------+-----+-----+-----+-----+-----+
-  + 0000  +   0 +     +     +     +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 0001  +   1 +     +     +     +  X  +
-  +-------+-----+-----+-----+-----+-----+
-  + 0010  +   2 +     +     +  X  +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 0011  +   3 +     +     +  X  +  X  +
-  +-------+-----+-----+-----+-----+-----+
-  + 0100  +   4 +     +  X  +     +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 0101  +   5 +     +  X  +     + X   +
-  +-------+-----+-----+-----+-----+-----+
-  + 0110  +   6 +     +  X  +  X  +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 0111  +   7 +     +  X  +  X  +  X  +
-  +-------+-----+-----+-----+-----+-----+
-  + 1000  +   8 +  X  +     +     +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 1001  +   9 +  X  +     +     +  X  +
-  +-------+-----+-----+-----+-----+-----+
-  + 1010  +  10 +  X  +     +  X  +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 1011  +  11 +  X  +     +  X  +  X  +
-  +-------+-----+-----+-----+-----+-----+
-  + 1100  +  12 +  X  +  X  +     +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 1101  +  13 +  X  +  X  +     +  X  +
-  +-------+-----+-----+-----+-----+-----+
-  + 1110  +  14 +  X  +  X  +  X  +     +
-  +-------+-----+-----+-----+-----+-----+
-  + 1111  +  15 +  X  +  X  +  X  +  X  +
-  +-------+-----+-----+-----+-----+-----+
-
-  In most cases, the appropriate displays must be plugged in for the above
-  combinations to work. TV-Out may need to be initialized at boot time.
-
-  Debugging:
-  1) Check whether the Fn+F8 key:
-     a) does not lock the laptop (try a boot with noapic / nolapic if it does)
-     b) generates events (0x6n, where n is the value corresponding to the
-        configuration above)
-     c) actually works
-     Record the disp value at every configuration.
-  2) Echo values from 0 to 15 to /sys/devices/platform/asus-laptop/display.
-     Record its value, note any change. If nothing changes, try a broader range,
-     up to 65535.
-  3) Send ANY output (both positive and negative reports are needed, unless your
-     machine is already listed above) to the acpi4asus-user mailing list.
-
-  Note: on some machines (e.g. L3C), after the module has been loaded, only 0x6n
-  events are generated and no actual switching occurs. In such a case, a line
-  like:
-
-    echo $((10#$arg-60)) > /sys/devices/platform/asus-laptop/display
-
-  will usually do the trick ($arg is the 0000006n-like event passed to acpid).
-
-  Note: there is currently no reliable way to read display status on xxN
-  (Centrino) models.
-
-LED display
------------
-
-  Some models like the W1N have a LED display that can be used to display
-  several items of information.
-
-  LED display works for the following models:
-    W1000N
-    W1J
-
-  To control the LED display, use the following :
-
-    echo 0x0T000DDD > /sys/devices/platform/asus-laptop/
-
-  where T control the 3 letters display, and DDD the 3 digits display,
-  according to the tables below.
-
-         DDD (digits)
-         000 to 999 = display digits
-         AAA        = ---
-         BBB to FFF = turn-off
-
-         T  (type)
-         0 = off
-         1 = dvd
-         2 = vcd
-         3 = mp3
-         4 = cd
-         5 = tv
-         6 = cpu
-         7 = vol
-
-  For example "echo 0x01000001 >/sys/devices/platform/asus-laptop/ledd"
-  would display "DVD001".
-
-Driver options:
----------------
-
- Options can be passed to the asus-laptop driver using the standard
- module argument syntax (<param>=<value> when passing the option to the
- module or asus-laptop.<param>=<value> on the kernel boot line when
- asus-laptop is statically linked into the kernel).
-
-	     wapf: WAPF defines the behavior of the Fn+Fx wlan key
-		   The significance of values is yet to be found, but
-		   most of the time:
-		   - 0x0 should do nothing
-		   - 0x1 should allow to control the device with Fn+Fx key.
-		   - 0x4 should send an ACPI event (0x88) while pressing the Fn+Fx key
-		   - 0x5 like 0x1 or 0x4
-
- The default value is 0x1.
-
-Unsupported models
-------------------
-
- These models will never be supported by this module, as they use a completely
- different mechanism to handle LEDs and extra stuff (meaning we have no clue
- how it works):
-
- - ASUS A1300 (A1B), A1370D
- - ASUS L7300G
- - ASUS L8400
-
-Patches, Errors, Questions:
---------------------------
-
- I appreciate any success or failure
- reports, especially if they add to or correct the compatibility table.
- Please include the following information in your report:
-
- - Asus model name
- - a copy of your ACPI tables, using the "acpidump" utility
- - a copy of /sys/devices/platform/asus-laptop/infos
- - which driver features work and which don't
- - the observed behavior of non-working features
-
- Any other comments or patches are also more than welcome.
-
- acpi4asus-user@lists.sourceforge.net
- http://sourceforge.net/projects/acpi4asus
-
diff --git a/Documentation/laptops/disk-shock-protection.rst b/Documentation/laptops/disk-shock-protection.rst
new file mode 100644
index 000000000000..e97c5f78d8c3
--- /dev/null
+++ b/Documentation/laptops/disk-shock-protection.rst
@@ -0,0 +1,151 @@
+==========================
+Hard disk shock protection
+==========================
+
+Author: Elias Oltmanns <eo@nebensachen.de>
+
+Last modified: 2008-10-03
+
+
+.. 0. Contents
+
+   1. Intro
+   2. The interface
+   3. References
+   4. CREDITS
+
+
+1. Intro
+--------
+
+ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature.
+Issuing this command should cause the drive to switch to idle mode and
+unload disk heads. This feature is being used in modern laptops in
+conjunction with accelerometers and appropriate software to implement
+a shock protection facility. The idea is to stop all I/O operations on
+the internal hard drive and park its heads on the ramp when critical
+situations are anticipated. The desire to have such a feature
+available on GNU/Linux systems has been the original motivation to
+implement a generic disk head parking interface in the Linux kernel.
+Please note, however, that other components have to be set up on your
+system in order to get disk shock protection working (see
+section 3. References below for pointers to more information about
+that).
+
+
+2. The interface
+----------------
+
+For each ATA device, the kernel exports the file
+`block/*/device/unload_heads` in sysfs (here assumed to be mounted under
+/sys). Access to `/sys/block/*/device/unload_heads` is denied with
+-EOPNOTSUPP if the device does not support the unload feature.
+Otherwise, writing an integer value to this file will take the heads
+of the respective drive off the platter and block all I/O operations
+for the specified number of milliseconds. When the timeout expires and
+no further disk head park request has been issued in the meantime,
+normal operation will be resumed. The maximal value accepted for a
+timeout is 30000 milliseconds. Exceeding this limit will return
+-EOVERFLOW, but heads will be parked anyway and the timeout will be
+set to 30 seconds. However, you can always change a timeout to any
+value between 0 and 30000 by issuing a subsequent head park request
+before the timeout of the previous one has expired. In particular, the
+total timeout can exceed 30 seconds and, more importantly, you can
+cancel a previously set timeout and resume normal operation
+immediately by specifying a timeout of 0. Values below -2 are rejected
+with -EINVAL (see below for the special meaning of -1 and -2). If the
+timeout specified for a recent head park request has not yet expired,
+reading from `/sys/block/*/device/unload_heads` will report the number
+of milliseconds remaining until normal operation will be resumed;
+otherwise, reading the unload_heads attribute will return 0.
+
+For example, do the following in order to park the heads of drive
+/dev/sda and stop all I/O operations for five seconds::
+
+	# echo 5000 > /sys/block/sda/device/unload_heads
+
+A simple::
+
+	# cat /sys/block/sda/device/unload_heads
+
+will show you how many milliseconds are left before normal operation
+will be resumed.
+
+A word of caution: The fact that the interface operates on a basis of
+milliseconds may raise expectations that cannot be satisfied in
+reality. In fact, the ATA specs clearly state that the time for an
+unload operation to complete is vendor specific. The hint in ATA-7
+that this will typically be within 500 milliseconds apparently has
+been dropped in ATA-8.
+
+There is a technical detail of this implementation that may cause some
+confusion and should be discussed here. When a head park request has
+been issued to a device successfully, all I/O operations on the
+controller port this device is attached to will be deferred. That is
+to say, any other device that may be connected to the same port will
+be affected too. The only exception is that a subsequent head unload
+request to that other device will be executed immediately. Further
+operations on that port will be deferred until the timeout specified
+for either device on the port has expired. As far as PATA (old style
+IDE) configurations are concerned, there can only be two devices
+attached to any single port. In the SATA world we have port multipliers
+which means that a user-issued head parking request to one device may
+actually result in stopping I/O to a whole bunch of devices. However,
+since this feature is supposed to be used on laptops and does not seem
+to be very useful in any other environment, there will be mostly one
+device per port. Even if the CD/DVD writer happens to be connected to
+the same port as the hard drive, it generally *should* recover just
+fine from the occasional buffer under-run incurred by a head park
+request to the HD. Actually, when you are using an ide driver rather
+than its libata counterpart (i.e. your disk is called /dev/hda
+instead of /dev/sda), then parking the heads of one drive (drive X)
+will generally not affect the mode of operation of another drive
+(drive Y) on the same port as described above. It is only when a port
+reset is required to recover from an exception on drive Y that further
+I/O operations on that drive (and the reset itself) will be delayed
+until drive X is no longer in the parked state.
+
+Finally, there are some hard drives that only comply with an earlier
+version of the ATA standard than ATA-7, but do support the unload
+feature nonetheless. Unfortunately, there is no safe way Linux can
+detect these devices, so you won't be able to write to the
+unload_heads attribute. If you know that your device really does
+support the unload feature (for instance, because the vendor of your
+laptop or the hard drive itself told you so), then you can tell the
+kernel to enable the usage of this feature for that drive by writing
+the special value -1 to the unload_heads attribute::
+
+	# echo -1 > /sys/block/sda/device/unload_heads
+
+will enable the feature for /dev/sda, and giving -2 instead of -1 will
+disable it again.
+
+
+3. References
+-------------
+
+There are several laptops from different vendors featuring shock
+protection capabilities. As manufacturers have refused to support open
+source development of the required software components so far, Linux
+support for shock protection varies considerably between different
+hardware implementations. Ideally, this section should contain a list
+of pointers to different projects aiming at an implementation of shock
+protection on different systems. Unfortunately, I only know of a
+single project which, although still considered experimental, is fit
+for use. Please feel free to add projects that have been the victims
+of my ignorance.
+
+- http://www.thinkwiki.org/wiki/HDAPS
+
+  See this page for information about Linux support of the hard disk
+  active protection system as implemented in IBM/Lenovo Thinkpads.
+
+
+4. CREDITS
+----------
+
+This implementation of disk head parking has been inspired by a patch
+originally published by Jon Escombe <lists@dresco.co.uk>. My efforts
+to develop an implementation of this feature that is fit to be merged
+into mainline have been aided by various kernel developers, in
+particular by Tejun Heo and Bartlomiej Zolnierkiewicz.
diff --git a/Documentation/laptops/disk-shock-protection.txt b/Documentation/laptops/disk-shock-protection.txt
deleted file mode 100644
index 0e6ba2663834..000000000000
--- a/Documentation/laptops/disk-shock-protection.txt
+++ /dev/null
@@ -1,149 +0,0 @@
-Hard disk shock protection
-==========================
-
-Author: Elias Oltmanns <eo@nebensachen.de>
-Last modified: 2008-10-03
-
-
-0. Contents
------------
-
-1. Intro
-2. The interface
-3. References
-4. CREDITS
-
-
-1. Intro
---------
-
-ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature.
-Issuing this command should cause the drive to switch to idle mode and
-unload disk heads. This feature is being used in modern laptops in
-conjunction with accelerometers and appropriate software to implement
-a shock protection facility. The idea is to stop all I/O operations on
-the internal hard drive and park its heads on the ramp when critical
-situations are anticipated. The desire to have such a feature
-available on GNU/Linux systems has been the original motivation to
-implement a generic disk head parking interface in the Linux kernel.
-Please note, however, that other components have to be set up on your
-system in order to get disk shock protection working (see
-section 3. References below for pointers to more information about
-that).
-
-
-2. The interface
-----------------
-
-For each ATA device, the kernel exports the file
-block/*/device/unload_heads in sysfs (here assumed to be mounted under
-/sys). Access to /sys/block/*/device/unload_heads is denied with
--EOPNOTSUPP if the device does not support the unload feature.
-Otherwise, writing an integer value to this file will take the heads
-of the respective drive off the platter and block all I/O operations
-for the specified number of milliseconds. When the timeout expires and
-no further disk head park request has been issued in the meantime,
-normal operation will be resumed. The maximal value accepted for a
-timeout is 30000 milliseconds. Exceeding this limit will return
--EOVERFLOW, but heads will be parked anyway and the timeout will be
-set to 30 seconds. However, you can always change a timeout to any
-value between 0 and 30000 by issuing a subsequent head park request
-before the timeout of the previous one has expired. In particular, the
-total timeout can exceed 30 seconds and, more importantly, you can
-cancel a previously set timeout and resume normal operation
-immediately by specifying a timeout of 0. Values below -2 are rejected
-with -EINVAL (see below for the special meaning of -1 and -2). If the
-timeout specified for a recent head park request has not yet expired,
-reading from /sys/block/*/device/unload_heads will report the number
-of milliseconds remaining until normal operation will be resumed;
-otherwise, reading the unload_heads attribute will return 0.
-
-For example, do the following in order to park the heads of drive
-/dev/sda and stop all I/O operations for five seconds:
-
-# echo 5000 > /sys/block/sda/device/unload_heads
-
-A simple
-
-# cat /sys/block/sda/device/unload_heads
-
-will show you how many milliseconds are left before normal operation
-will be resumed.
-
-A word of caution: The fact that the interface operates on a basis of
-milliseconds may raise expectations that cannot be satisfied in
-reality. In fact, the ATA specs clearly state that the time for an
-unload operation to complete is vendor specific. The hint in ATA-7
-that this will typically be within 500 milliseconds apparently has
-been dropped in ATA-8.
-
-There is a technical detail of this implementation that may cause some
-confusion and should be discussed here. When a head park request has
-been issued to a device successfully, all I/O operations on the
-controller port this device is attached to will be deferred. That is
-to say, any other device that may be connected to the same port will
-be affected too. The only exception is that a subsequent head unload
-request to that other device will be executed immediately. Further
-operations on that port will be deferred until the timeout specified
-for either device on the port has expired. As far as PATA (old style
-IDE) configurations are concerned, there can only be two devices
-attached to any single port. In SATA world we have port multipliers
-which means that a user-issued head parking request to one device may
-actually result in stopping I/O to a whole bunch of devices. However,
-since this feature is supposed to be used on laptops and does not seem
-to be very useful in any other environment, there will be mostly one
-device per port. Even if the CD/DVD writer happens to be connected to
-the same port as the hard drive, it generally *should* recover just
-fine from the occasional buffer under-run incurred by a head park
-request to the HD. Actually, when you are using an ide driver rather
-than its libata counterpart (i.e. your disk is called /dev/hda
-instead of /dev/sda), then parking the heads of one drive (drive X)
-will generally not affect the mode of operation of another drive
-(drive Y) on the same port as described above. It is only when a port
-reset is required to recover from an exception on drive Y that further
-I/O operations on that drive (and the reset itself) will be delayed
-until drive X is no longer in the parked state.
-
-Finally, there are some hard drives that only comply with an earlier
-version of the ATA standard than ATA-7, but do support the unload
-feature nonetheless. Unfortunately, there is no safe way Linux can
-detect these devices, so you won't be able to write to the
-unload_heads attribute. If you know that your device really does
-support the unload feature (for instance, because the vendor of your
-laptop or the hard drive itself told you so), then you can tell the
-kernel to enable the usage of this feature for that drive by writing
-the special value -1 to the unload_heads attribute:
-
-# echo -1 > /sys/block/sda/device/unload_heads
-
-will enable the feature for /dev/sda, and giving -2 instead of -1 will
-disable it again.
-
-
-3. References
--------------
-
-There are several laptops from different vendors featuring shock
-protection capabilities. As manufacturers have refused to support open
-source development of the required software components so far, Linux
-support for shock protection varies considerably between different
-hardware implementations. Ideally, this section should contain a list
-of pointers at different projects aiming at an implementation of shock
-protection on different systems. Unfortunately, I only know of a
-single project which, although still considered experimental, is fit
-for use. Please feel free to add projects that have been the victims
-of my ignorance.
-
-- http://www.thinkwiki.org/wiki/HDAPS
-  See this page for information about Linux support of the hard disk
-  active protection system as implemented in IBM/Lenovo Thinkpads.
-
-
-4. CREDITS
-----------
-
-This implementation of disk head parking has been inspired by a patch
-originally published by Jon Escombe <lists@dresco.co.uk>. My efforts
-to develop an implementation of this feature that is fit to be merged
-into mainline have been aided by various kernel developers, in
-particular by Tejun Heo and Bartlomiej Zolnierkiewicz.
diff --git a/Documentation/laptops/index.rst b/Documentation/laptops/index.rst
new file mode 100644
index 000000000000..001a30910d09
--- /dev/null
+++ b/Documentation/laptops/index.rst
@@ -0,0 +1,17 @@
+:orphan:
+
+==============
+Laptop Drivers
+==============
+
+.. toctree::
+   :maxdepth: 1
+
+   asus-laptop
+   disk-shock-protection
+   laptop-mode
+   lg-laptop
+   sony-laptop
+   sonypi
+   thinkpad-acpi
+   toshiba_haps
diff --git a/Documentation/laptops/laptop-mode.rst b/Documentation/laptops/laptop-mode.rst
new file mode 100644
index 000000000000..c984c4262f2e
--- /dev/null
+++ b/Documentation/laptops/laptop-mode.rst
@@ -0,0 +1,781 @@
+===============================================
+How to conserve battery power using laptop-mode
+===============================================
+
+Document Author: Bart Samwel (bart@samwel.tk)
+
+Date created: January 2, 2004
+
+Last modified: December 06, 2004
+
+Introduction
+------------
+
+Laptop mode is used to minimize the time that the hard disk needs to be spun up,
+to conserve battery power on laptops. It has been reported to cause significant
+power savings.
+
+.. Contents
+
+   * Introduction
+   * Installation
+   * Caveats
+   * The Details
+   * Tips & Tricks
+   * Control script
+   * ACPI integration
+   * Monitoring tool
+
+
+Installation
+------------
+
+To use laptop mode, you don't need to set any kernel configuration options
+or anything. Simply install all the files included in this document, and
+laptop mode will automatically be started when you're on battery. For
+your convenience, a tarball containing an installer can be downloaded at:
+
+	http://www.samwel.tk/laptop_mode/laptop_mode/
+
+To configure laptop mode, you need to edit the configuration file, which is
+located in /etc/default/laptop-mode on Debian-based systems, or in
+/etc/sysconfig/laptop-mode on other systems.
+
+Unfortunately, automatic enabling of laptop mode does not work for
+laptops that don't have ACPI. On those laptops, you need to start laptop
+mode manually. To start laptop mode, run "laptop_mode start", and to
+stop it, run "laptop_mode stop". (Note: The laptop mode tools package now
+has experimental support for APM; you might want to try that first.)
+
+
+Caveats
+-------
+
+* The downside of laptop mode is that you have a chance of losing up to 10
+  minutes of work. If you cannot afford this, don't use it! The supplied ACPI
+  scripts automatically turn off laptop mode when the battery almost runs out,
+  so that you won't lose any data at the end of your battery life.
+
+* Most desktop hard drives have a very limited lifetime measured in spindown
+  cycles, typically about 50,000 times (it's usually listed on the spec sheet).
+  Check your drive's rating, and don't wear down your drive's lifetime if you
+  don't need to.
+
+* If you mount some of your ext3/reiserfs filesystems with the -n option, then
+  the control script will not be able to remount them correctly. You must set
+  DO_REMOUNTS=0 in the control script, otherwise it will remount them with the
+  wrong options -- or it will fail because it cannot write to /etc/mtab.
+
+* If you have your filesystems listed as type "auto" in fstab, like I did, then
+  the control script will not recognize them as filesystems that need remounting.
+  You must list the filesystems with their true type instead.
+
+* It has been reported that some versions of the mutt mail client use file access
+  times to determine whether a folder contains new mail. If you use mutt and
+  experience this, you must disable the noatime remounting by setting the option
+  DO_REMOUNT_NOATIME to 0 in the configuration file.
+
+
+The Details
+-----------
+
+Laptop mode is controlled by the knob /proc/sys/vm/laptop_mode. This knob is
+present for all kernels that have the laptop mode patch, regardless of any
+configuration options. When the knob is set, any physical disk I/O (that might
+have caused the hard disk to spin up) causes Linux to flush all dirty blocks. The
+result of this is that after a disk has spun down, it will not be spun up
+anymore to write dirty blocks, because those blocks had already been written
+immediately after the most recent read operation. The value of the laptop_mode
+knob determines the time between the occurrence of disk I/O and when the flush
+is triggered. A sensible value for the knob is 5 seconds. Setting the knob to
+0 disables laptop mode.
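+
+As a minimal sketch, the knob can be inspected and changed by hand from a root
+shell. The value 5 is simply the one suggested above, not a tuned setting::
+
+  # Show the current value; 0 means laptop mode is disabled.
+  cat /proc/sys/vm/laptop_mode
+
+  # Enable laptop mode, using the 5 second value suggested above.
+  echo 5 > /proc/sys/vm/laptop_mode
+
+  # Disable laptop mode again.
+  echo 0 > /proc/sys/vm/laptop_mode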
+
+To increase the effectiveness of the laptop_mode strategy, the laptop_mode
+control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
+/proc/sys/vm to about 10 minutes (by default), which means that pages that are
+dirtied are not forced to be written to disk as often. The control script also
+changes the dirty background ratio, so that background writeback of dirty pages
+is not done anymore. Combined with a higher commit value (also 10 minutes) for
+ext3 or ReiserFS filesystems (also done automatically by the control script),
+this results in concentration of disk activity in a small time interval which
+occurs only once every 10 minutes, or whenever the disk is forced to spin up by
+a cache miss. The disk can then be spun down in the periods of inactivity.
+
+If you want to find out which process caused the disk to spin up, you can
+gather information by setting the flag /proc/sys/vm/block_dump. When this flag
+is set, Linux reports all disk read and write operations that take place, and
+all block dirtyings done to files. This makes it possible to debug why a disk
+needs to spin up, and to increase battery life even more. The output of
+block_dump is written to the kernel output, and it can be retrieved using
+"dmesg". When you use block_dump and your kernel logging level also includes
+kernel debugging messages, you probably want to turn off klogd, otherwise
+the output of block_dump will be logged, causing disk activity that is not
+normally there.
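+
+A debugging session along these lines might look as follows. This is only a
+sketch: it assumes a root shell and a kernel that provides the block_dump
+flag, and the number of log lines shown is an arbitrary choice::
+
+  # Start reporting disk reads, writes and block dirtyings.
+  echo 1 > /proc/sys/vm/block_dump
+
+  # ... wait for the disk to spin up unexpectedly, then inspect the log ...
+  dmesg | tail -n 50
+
+  # Turn the reporting off again when done.
+  echo 0 > /proc/sys/vm/block_dump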
+
+
+Configuration
+-------------
+
+The laptop mode configuration file is located in /etc/default/laptop-mode on
+Debian-based systems, or in /etc/sysconfig/laptop-mode on other systems. It
+contains the following options:
+
+MAX_AGE:
+
+Maximum time, in seconds, of hard drive spindown time that you are
+comfortable with. Worst case, it's possible that you could lose this
+amount of work if your battery fails while you're in laptop mode.
+
+MINIMUM_BATTERY_MINUTES:
+
+Automatically disable laptop mode if the remaining number of minutes of
+battery power is less than this value. Default is 10 minutes.
+
+AC_HD/BATT_HD:
+
+The idle timeout that should be set on your hard drive when laptop mode
+is active (BATT_HD) and when it is not active (AC_HD). The defaults are
+20 seconds (value 4) for BATT_HD  and 2 hours (value 244) for AC_HD. The
+possible values are those listed in the manual page for "hdparm" for the
+"-S" option.
+
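+The two default values can be decoded with a small helper. This is a
+hypothetical sketch, not part of the control script, and it assumes the
+encoding described in the hdparm manual page: values 1 to 240 are multiples
+of 5 seconds, and values 241 to 251 are multiples of 30 minutes::
+
+  # Hypothetical helper: convert an "hdparm -S" value into seconds.
+  # Only the two ranges named above are handled.
+  hdparm_s_to_seconds () {
+      if [ "$1" -ge 1 ] && [ "$1" -le 240 ] ; then
+          echo $(($1 * 5))
+      elif [ "$1" -ge 241 ] && [ "$1" -le 251 ] ; then
+          echo $((($1 - 240) * 30 * 60))
+      else
+          echo "value $1 is outside the ranges handled by this sketch" >&2
+          return 1
+      fi
+  }
+
+  hdparm_s_to_seconds 4      # BATT_HD default: prints 20
+  hdparm_s_to_seconds 244    # AC_HD default: prints 7200
+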
+HD:
+
+The devices for which the spindown timeout should be adjusted by laptop mode.
+Default is /dev/hda. If you specify multiple devices, separate them by a space.
+
+READAHEAD:
+
+Disk readahead, in 512-byte sectors, while laptop mode is active. A large
+readahead can prevent disk accesses for things like executable pages (which are
+loaded on demand while the application executes) and sequentially accessed data
+(MP3s).
+
+DO_REMOUNTS:
+
+The control script automatically remounts any mounted journaled filesystems
+with appropriate commit interval options. When this option is set to 0, this
+feature is disabled.
+
+DO_REMOUNT_NOATIME:
+
+When remounting, should the filesystems be remounted with the noatime option?
+Normally, this is set to "1" (enabled), but there may be programs that require
+access time recording.
+
+DIRTY_RATIO:
+
+The percentage of memory that is allowed to contain "dirty" or unsaved data
+before a writeback is forced, while laptop mode is active. Corresponds to
+the /proc/sys/vm/dirty_ratio sysctl.
+
+DIRTY_BACKGROUND_RATIO:
+
+The percentage of memory that is allowed to contain "dirty" or unsaved data
+after a forced writeback is done due to an exceeding of DIRTY_RATIO. Set
+this nice and low. This corresponds to the /proc/sys/vm/dirty_background_ratio
+sysctl.
+
+Note that the behaviour of dirty_background_ratio is quite different
+when laptop mode is active and when it isn't. When laptop mode is inactive,
+dirty_background_ratio is the threshold percentage at which background writeouts
+start taking place. When laptop mode is active, however, background writeouts
+are disabled, and the dirty_background_ratio only determines how much writeback
+is done when dirty_ratio is reached.
+
+DO_CPU:
+
+Enable CPU frequency scaling when in laptop mode. (Requires CPUFreq to be set up.
+See Documentation/admin-guide/pm/cpufreq.rst for more info. Disabled by default.)
+
+CPU_MAXFREQ:
+
+When on battery, what is the maximum CPU speed that the system should use? Legal
+values are "slowest" for the slowest speed that your CPU is able to operate at,
+or a value listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies.
+
+
+Tips & Tricks
+-------------
+
+* Bartek Kania reports getting up to 50 minutes of extra battery life (on top
+  of his regular 3 to 3.5 hours) using a spindown time of 5 seconds (BATT_HD=1).
+
+* You can spin down the disk while playing MP3, by setting disk readahead
+  to 8MB (READAHEAD=16384). Effectively, the disk will read a complete MP3 at
+  once, and will then spin down while the MP3 is playing. (Thanks to Bartek
+  Kania.)
+
+* Drew Scott Daniels observed: "I don't know why, but when I decrease the number
+  of colours that my display uses it consumes less battery power. I've seen
+  this on powerbooks too. I hope that this is a piece of information that
+  might be useful to the Laptop Mode patch or its users."
+
+* In syslog.conf, you can prefix entries with a dash `-` to omit syncing the
+  file after every logging. When you're using laptop-mode and your disk doesn't
+  spin down, this is a likely culprit.
+
+* Richard Atterer observed that laptop mode does not work well with noflushd
+  (http://noflushd.sourceforge.net/); it seems that noflushd prevents laptop-mode
+  from doing its thing.
+
+* If you're worried about your data, you might want to consider using a USB
+  memory stick or something like that as a "working area". (Be aware though
+  that flash memory can only handle a limited number of writes, and overuse
+  may wear out your memory stick pretty quickly. Do _not_ use journalling
+  filesystems on flash memory sticks.)
+
+
+Configuration file for control and ACPI battery scripts
+-------------------------------------------------------
+
+This allows the tunables to be changed for the scripts via an external
+configuration file.
+
+It should be installed as /etc/default/laptop-mode on Debian, and as
+/etc/sysconfig/laptop-mode on Red Hat, SUSE, Mandrake, and other work-alikes.
+
+Config file::
+
+  # Maximum time, in seconds, of hard drive spindown time that you are
+  # comfortable with. Worst case, it's possible that you could lose this
+  # amount of work if your battery fails you while in laptop mode.
+  #MAX_AGE=600
+
+  # Automatically disable laptop mode when the number of minutes of battery
+  # that you have left goes below this threshold.
+  MINIMUM_BATTERY_MINUTES=10
+
+  # Read-ahead, in 512-byte sectors. You can spin down the disk while playing MP3/OGG
+  # by setting the disk readahead to 8MB (READAHEAD=16384). Effectively, the disk
+  # will read a complete MP3 at once, and will then spin down while the MP3/OGG is
+  # playing.
+  #READAHEAD=4096
+
+  # Shall we remount journaled fs. with appropriate commit interval? (1=yes)
+  #DO_REMOUNTS=1
+
+  # And shall we add the "noatime" option to that as well? (1=yes)
+  #DO_REMOUNT_NOATIME=1
+
+  # Dirty synchronous ratio.
+  # At this percentage of dirty pages the process which calls write() does
+  # its own writeback.
+  #DIRTY_RATIO=40
+
+  #
+  # Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
+  # exceeded, the kernel will wake flusher threads which will then reduce the
+  # amount of dirty memory to dirty_background_ratio.  Set this nice and low,
+  # so once some writeout has commenced, we do a lot of it.
+  #
+  #DIRTY_BACKGROUND_RATIO=5
+
+  # kernel default dirty buffer age
+  #DEF_AGE=30
+  #DEF_UPDATE=5
+  #DEF_DIRTY_BACKGROUND_RATIO=10
+  #DEF_DIRTY_RATIO=40
+  #DEF_XFS_AGE_BUFFER=15
+  #DEF_XFS_SYNC_INTERVAL=30
+  #DEF_XFS_BUFD_INTERVAL=1
+
+  # This must be adjusted manually to the value of HZ in the running kernel
+  # on 2.4, until the XFS people change their 2.4 external interfaces to work in
+  # centisecs. This can be automated, but it's a work in progress that still
+  # needs some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for
+  # external interfaces, and that is currently always set to 100. So you don't
+  # need to change this on 2.6.
+  #XFS_HZ=100
+
+  # Should the maximum CPU frequency be adjusted down while on battery?
+  # Requires CPUFreq to be set up.
+  # See Documentation/admin-guide/pm/cpufreq.rst for more info
+  #DO_CPU=0
+
+  # When on battery what is the maximum CPU speed that the system should
+  # use? Legal values are "slowest" for the slowest speed that your
+  # CPU is able to operate at, or a value listed in:
+  # /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
+  # Only applicable if DO_CPU=1.
+  #CPU_MAXFREQ=slowest
+
+  # Idle timeout for your hard drive (man hdparm for valid values, -S option)
+  # Default is 2 hours on AC (AC_HD=244) and 20 seconds for battery (BATT_HD=4).
+  #AC_HD=244
+  #BATT_HD=4
+
+  # The drives for which to adjust the idle timeout. Separate them by a space,
+  # e.g. HD="/dev/hda /dev/hdb".
+  #HD="/dev/hda"
+
+  # Set the spindown timeout on a hard drive?
+  #DO_HD=1
+
+
+Control script
+--------------
+
+Please note that this control script works for the Linux 2.4 and 2.6 series (thanks
+to Kiko Piris).
+
+Control script::
+
+  #!/bin/bash
+
+  # start or stop laptop_mode, best run by a power management daemon when
+  # ac gets connected/disconnected from a laptop
+  #
+  # install as /sbin/laptop_mode
+  #
+  # Contributors to this script:   Kiko Piris
+  #				 Bart Samwel
+  #				 Micha Feigin
+  #				 Andrew Morton
+  #				 Herve Eychenne
+  #				 Dax Kelson
+  #
+  # Original Linux 2.4 version by: Jens Axboe
+
+  #############################################################################
+
+  # Source config
+  if [ -f /etc/default/laptop-mode ] ; then
+	# Debian
+	. /etc/default/laptop-mode
+  elif [ -f /etc/sysconfig/laptop-mode ] ; then
+	# Others
+          . /etc/sysconfig/laptop-mode
+  fi
+
+  # Don't raise an error if the config file is incomplete
+  # set defaults instead:
+
+  # Maximum time, in seconds, of hard drive spindown time that you are
+  # comfortable with. Worst case, it's possible that you could lose this
+  # amount of work if your battery fails you while in laptop mode.
+  MAX_AGE=${MAX_AGE:-'600'}
+
+  # Read-ahead, in kilobytes
+  READAHEAD=${READAHEAD:-'4096'}
+
+  # Shall we remount journaled fs. with appropriate commit interval? (1=yes)
+  DO_REMOUNTS=${DO_REMOUNTS:-'1'}
+
+  # And shall we add the "noatime" option to that as well? (1=yes)
+  DO_REMOUNT_NOATIME=${DO_REMOUNT_NOATIME:-'1'}
+
+  # Shall we adjust the idle timeout on a hard drive?
+  DO_HD=${DO_HD:-'1'}
+
+  # Adjust idle timeout on which hard drive?
+  HD="${HD:-/dev/hda}"
+
+  # spindown time for HD (hdparm -S values)
+  AC_HD=${AC_HD:-'244'}
+  BATT_HD=${BATT_HD:-'4'}
+
+  # Dirty synchronous ratio.  At this percentage of dirty pages the process which
+  # calls write() does its own writeback
+  DIRTY_RATIO=${DIRTY_RATIO:-'40'}
+
+  # cpu frequency scaling
+  # See Documentation/admin-guide/pm/cpufreq.rst for more info
+  DO_CPU=${DO_CPU:-'0'}
+  CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'}
+
+  #
+  # Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
+  # exceeded, the kernel will wake flusher threads which will then reduce the
+  # amount of dirty memory to dirty_background_ratio.  Set this nice and low,
+  # so once some writeout has commenced, we do a lot of it.
+  #
+  DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'}
+
+  # kernel default dirty buffer age
+  DEF_AGE=${DEF_AGE:-'30'}
+  DEF_UPDATE=${DEF_UPDATE:-'5'}
+  DEF_DIRTY_BACKGROUND_RATIO=${DEF_DIRTY_BACKGROUND_RATIO:-'10'}
+  DEF_DIRTY_RATIO=${DEF_DIRTY_RATIO:-'40'}
+  DEF_XFS_AGE_BUFFER=${DEF_XFS_AGE_BUFFER:-'15'}
+  DEF_XFS_SYNC_INTERVAL=${DEF_XFS_SYNC_INTERVAL:-'30'}
+  DEF_XFS_BUFD_INTERVAL=${DEF_XFS_BUFD_INTERVAL:-'1'}
+
+  # This must be adjusted manually to the value of HZ in the running kernel
+  # on 2.4, until the XFS people change their 2.4 external interfaces to work in
+  # centisecs. This can be automated, but it's a work in progress that still needs
+  # some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for external
+  # interfaces, and that is currently always set to 100. So you don't need to
+  # change this on 2.6.
+  XFS_HZ=${XFS_HZ:-'100'}
+
+  #############################################################################
+
+  KLEVEL="$(uname -r |
+               {
+	       IFS='.' read a b c
+	       echo $a.$b
+	     }
+  )"
+  case "$KLEVEL" in
+	"2.4"|"2.6")
+		;;
+	*)
+		echo "Unhandled kernel version: $KLEVEL ('uname -r' = '$(uname -r)')" >&2
+		exit 1
+		;;
+  esac
+
+  if [ ! -e /proc/sys/vm/laptop_mode ] ; then
+	echo "Kernel is not patched with laptop_mode patch." >&2
+	exit 1
+  fi
+
+  if [ ! -w /proc/sys/vm/laptop_mode ] ; then
+	echo "You do not have enough privileges to enable laptop_mode." >&2
+	exit 1
+  fi
+
+  # Remove an option (the first parameter) of the form option=<number> from
+  # a mount options string (the rest of the parameters).
+  parse_mount_opts () {
+	OPT="$1"
+	shift
+	echo ",$*," | sed		\
+	 -e 's/,'"$OPT"'=[0-9]*,/,/g'	\
+	 -e 's/,,*/,/g'			\
+	 -e 's/^,//'			\
+	 -e 's/,$//'
+  }
+
+  # Remove an option (the first parameter) without any arguments from
+  # a mount option string (the rest of the parameters).
+  parse_nonumber_mount_opts () {
+	OPT="$1"
+	shift
+	echo ",$*," | sed		\
+	 -e 's/,'"$OPT"',/,/g'		\
+	 -e 's/,,*/,/g'			\
+	 -e 's/^,//'			\
+	 -e 's/,$//'
+  }
+
+  # Find out the state of a yes/no option (e.g. "atime"/"noatime") in
+  # fstab for a given filesystem, and use this state to replace the
+  # value of the option in another mount options string. The device
+  # is the first argument, the option name the second, and the default
+  # value the third. The remainder is the mount options string.
+  #
+  # Example:
+  # parse_yesno_opts_wfstab /dev/hda1 atime atime defaults,noatime
+  #
+  # If fstab contains, say, "rw" for this filesystem, then the result
+  # will be "defaults,atime".
+  parse_yesno_opts_wfstab () {
+	L_DEV="$1"
+	OPT="$2"
+	DEF_OPT="$3"
+	shift 3
+	L_OPTS="$*"
+	PARSEDOPTS1="$(parse_nonumber_mount_opts $OPT $L_OPTS)"
+	PARSEDOPTS1="$(parse_nonumber_mount_opts no$OPT $PARSEDOPTS1)"
+	# Watch for a default atime in fstab
+	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
+	if echo "$FSTAB_OPTS" | grep "$OPT" > /dev/null ; then
+		# option specified in fstab: extract the value and use it
+		if echo "$FSTAB_OPTS" | grep "no$OPT" > /dev/null ; then
+			echo "$PARSEDOPTS1,no$OPT"
+		else
+			# no$OPT not found -- so we must have $OPT.
+			echo "$PARSEDOPTS1,$OPT"
+		fi
+	else
+		# option not specified in fstab -- choose the default.
+		echo "$PARSEDOPTS1,$DEF_OPT"
+	fi
+  }
+
+  # Find out the state of a numbered option (e.g. "commit=NNN") in
+  # fstab for a given filesystem, and use this state to replace the
+  # value of the option in another mount options string. The device
+  # is the first argument, and the option name the second. The
+  # remainder is the mount options string in which the replacement
+  # must be done.
+  #
+  # Example:
+  # parse_mount_opts_wfstab /dev/hda1 commit defaults,commit=7
+  #
+  # If fstab contains, say, "commit=3,rw" for this filesystem, then the
+  # result will be "rw,commit=3".
+  parse_mount_opts_wfstab () {
+	L_DEV="$1"
+	OPT="$2"
+	shift 2
+	L_OPTS="$*"
+	PARSEDOPTS1="$(parse_mount_opts $OPT $L_OPTS)"
+	# Watch for a default commit in fstab
+	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
+	if echo "$FSTAB_OPTS" | grep "$OPT=" > /dev/null ; then
+		# option specified in fstab: extract the value, and use it
+		echo -n "$PARSEDOPTS1,$OPT="
+		echo ",$FSTAB_OPTS," | sed \
+		 -e 's/.*,'"$OPT"'=//'	\
+		 -e 's/,.*//'
+	else
+		# option not specified in fstab: set it to 0
+		echo "$PARSEDOPTS1,$OPT=0"
+	fi
+  }
+
+  deduce_fstype () {
+	MP="$1"
+	# My root filesystem unfortunately has
+	# type "unknown" in /etc/mtab. If we encounter
+	# "unknown", we try to get the type from fstab.
+	cat /etc/fstab |
+	grep -v '^#' |
+	while read FSTAB_DEV FSTAB_MP FSTAB_FST FSTAB_OPTS FSTAB_DUMP FSTAB_PASS ; do
+		if [ "$FSTAB_MP" = "$MP" ]; then
+			echo $FSTAB_FST
+			exit 0
+		fi
+	done
+  }
+
+  if [ $DO_REMOUNT_NOATIME -eq 1 ] ; then
+	NOATIME_OPT=",noatime"
+  fi
+
+  case "$1" in
+	start)
+		AGE=$((100*$MAX_AGE))
+		XFS_AGE=$(($XFS_HZ*$MAX_AGE))
+		echo -n "Starting laptop_mode"
+
+		if [ -d /proc/sys/vm/pagebuf ] ; then
+			# (For 2.4 and early 2.6.)
+			# This only needs to be set, not reset -- it is only used when
+			# laptop mode is enabled.
+			echo $XFS_AGE > /proc/sys/vm/pagebuf/lm_flush_age
+			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
+		elif [ -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
+			# (A couple of early 2.6 laptop mode patches had these.)
+			# The same goes for these.
+			echo $XFS_AGE > /proc/sys/fs/xfs/lm_age_buffer
+			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
+		elif [ -f /proc/sys/fs/xfs/age_buffer ] ; then
+			# (2.6.6)
+			# But not for these -- they are also used in normal
+			# operation.
+			echo $XFS_AGE > /proc/sys/fs/xfs/age_buffer
+			echo $XFS_AGE > /proc/sys/fs/xfs/sync_interval
+		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
+			# (2.6.7 upwards)
+			# And not for these either. These are in centisecs,
+			# not USER_HZ, so we have to use $AGE, not $XFS_AGE.
+			echo $AGE > /proc/sys/fs/xfs/age_buffer_centisecs
+			echo $AGE > /proc/sys/fs/xfs/xfssyncd_centisecs
+			echo 3000 > /proc/sys/fs/xfs/xfsbufd_centisecs
+		fi
+
+		case "$KLEVEL" in
+			"2.4")
+				echo 1					> /proc/sys/vm/laptop_mode
+				echo "30 500 0 0 $AGE $AGE 60 20 0"	> /proc/sys/vm/bdflush
+				;;
+			"2.6")
+				echo 5					> /proc/sys/vm/laptop_mode
+				echo "$AGE"				> /proc/sys/vm/dirty_writeback_centisecs
+				echo "$AGE"				> /proc/sys/vm/dirty_expire_centisecs
+				echo "$DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
+				echo "$DIRTY_BACKGROUND_RATIO"		> /proc/sys/vm/dirty_background_ratio
+				;;
+		esac
+		if [ $DO_REMOUNTS -eq 1 ]; then
+			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
+				PARSEDOPTS="$(parse_mount_opts "$OPTS")"
+				if [ "$FST" = 'unknown' ]; then
+					FST=$(deduce_fstype $MP)
+				fi
+				case "$FST" in
+					"ext3"|"reiserfs")
+						PARSEDOPTS="$(parse_mount_opts commit "$OPTS")"
+						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT
+						;;
+					"xfs")
+						mount $DEV -t $FST $MP -o remount,$OPTS$NOATIME_OPT
+						;;
+				esac
+				if [ -b $DEV ] ; then
+					blockdev --setra $(($READAHEAD * 2)) $DEV
+				fi
+			done
+		fi
+		if [ $DO_HD -eq 1 ] ; then
+			for THISHD in $HD ; do
+				/sbin/hdparm -S $BATT_HD $THISHD > /dev/null 2>&1
+				/sbin/hdparm -B 1 $THISHD > /dev/null 2>&1
+			done
+		fi
+		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
+			if [ $CPU_MAXFREQ = 'slowest' ]; then
+				CPU_MAXFREQ=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq`
+			fi
+			echo $CPU_MAXFREQ > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
+		fi
+		echo "."
+		;;
+	stop)
+		U_AGE=$((100*$DEF_UPDATE))
+		B_AGE=$((100*$DEF_AGE))
+		echo -n "Stopping laptop_mode"
+		echo 0 > /proc/sys/vm/laptop_mode
+		if [ -f /proc/sys/fs/xfs/age_buffer -a ! -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
+			# These need to be restored, if there are no lm_*.
+			echo $(($XFS_HZ*$DEF_XFS_AGE_BUFFER))	 	> /proc/sys/fs/xfs/age_buffer
+			echo $(($XFS_HZ*$DEF_XFS_SYNC_INTERVAL)) 	> /proc/sys/fs/xfs/sync_interval
+		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
+			# These need to be restored as well.
+			echo $((100*$DEF_XFS_AGE_BUFFER))	> /proc/sys/fs/xfs/age_buffer_centisecs
+			echo $((100*$DEF_XFS_SYNC_INTERVAL))	> /proc/sys/fs/xfs/xfssyncd_centisecs
+			echo $((100*$DEF_XFS_BUFD_INTERVAL))	> /proc/sys/fs/xfs/xfsbufd_centisecs
+		fi
+		case "$KLEVEL" in
+			"2.4")
+				echo "30 500 0 0 $U_AGE $B_AGE 60 20 0"	> /proc/sys/vm/bdflush
+				;;
+			"2.6")
+				echo "$U_AGE"				> /proc/sys/vm/dirty_writeback_centisecs
+				echo "$B_AGE"				> /proc/sys/vm/dirty_expire_centisecs
+				echo "$DEF_DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
+				echo "$DEF_DIRTY_BACKGROUND_RATIO"	> /proc/sys/vm/dirty_background_ratio
+				;;
+		esac
+		if [ $DO_REMOUNTS -eq 1 ] ; then
+			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
+				# Reset commit and atime options to defaults.
+				if [ "$FST" = 'unknown' ]; then
+					FST=$(deduce_fstype $MP)
+				fi
+				case "$FST" in
+					"ext3"|"reiserfs")
+						PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)"
+						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)"
+						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
+						;;
+					"xfs")
+						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $OPTS)"
+						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
+						;;
+				esac
+				if [ -b $DEV ] ; then
+					blockdev --setra 256 $DEV
+				fi
+			done
+		fi
+		if [ $DO_HD -eq 1 ] ; then
+			for THISHD in $HD ; do
+				/sbin/hdparm -S $AC_HD $THISHD > /dev/null 2>&1
+				/sbin/hdparm -B 255 $THISHD > /dev/null 2>&1
+			done
+		fi
+		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
+			echo `cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
+		fi
+		echo "."
+		;;
+	*)
+		echo "Usage: $0 {start|stop}" >&2
+		exit 1
+		;;
+
+  esac
+
+  exit 0
+
+
+ACPI integration
+----------------
+
+Dax Kelson submitted this so that the ACPI acpid daemon will
+kick off the laptop_mode script and run hdparm. The part that
+automatically disables laptop mode when the battery is low was
+written by Jan Topinski.
+
+/etc/acpi/events/ac_adapter::
+
+	event=ac_adapter
+	action=/etc/acpi/actions/ac.sh %e
+
+/etc/acpi/events/battery::
+
+	event=battery.*
+	action=/etc/acpi/actions/battery.sh %e
+
+/etc/acpi/actions/ac.sh::
+
+  #!/bin/bash
+
+  # ac on/offline event handler
+
+  status=`awk '/^state: / { print $2 }' /proc/acpi/ac_adapter/$2/state`
+
+  case $status in
+          "on-line")
+                  /sbin/laptop_mode stop
+                  exit 0
+          ;;
+          "off-line")
+                  /sbin/laptop_mode start
+                  exit 0
+          ;;
+  esac
+
+
+/etc/acpi/actions/battery.sh::
+
+  #! /bin/bash
+
+  # Automatically disable laptop mode when the battery almost runs out.
+
+  BATT_INFO=/proc/acpi/battery/$2/state
+
+  if [[ -f /proc/sys/vm/laptop_mode ]]
+  then
+     LM=`cat /proc/sys/vm/laptop_mode`
+     if [[ $LM -gt 0 ]]
+     then
+       if [[ -f $BATT_INFO ]]
+       then
+          # Source the config file only now that we know we need it
+          if [ -f /etc/default/laptop-mode ] ; then
+                  # Debian
+                  . /etc/default/laptop-mode
+          elif [ -f /etc/sysconfig/laptop-mode ] ; then
+                  # Others
+                  . /etc/sysconfig/laptop-mode
+          fi
+          MINIMUM_BATTERY_MINUTES=${MINIMUM_BATTERY_MINUTES:-'10'}
+
+          ACTION="`cat $BATT_INFO | grep charging | cut -c 26-`"
+          if [[ "$ACTION" = "discharging" ]]
+          then
+             PRESENT_RATE=`cat $BATT_INFO | grep "present rate:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
+             REMAINING=`cat $BATT_INFO | grep "remaining capacity:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
+             # Only compare when a non-zero rate was read (avoids dividing by zero).
+             if ((${PRESENT_RATE:-0} > 0 && ${REMAINING:-0} * 60 / PRESENT_RATE < MINIMUM_BATTERY_MINUTES)) ; then
+                /sbin/laptop_mode stop
+             fi
+          fi
+       else
+         logger -p daemon.warning "You are using laptop mode and your battery interface $BATT_INFO is missing. This may lead to loss of data when the battery runs out. Check kernel ACPI support and /proc/acpi/battery folder, and edit /etc/acpi/battery.sh to set BATT_INFO to the correct path."
+       fi
+     fi
+  fi
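+
+The threshold test in battery.sh is plain integer arithmetic: remaining
+capacity divided by the present discharge rate gives hours, and multiplying
+by 60 first gives whole minutes. A stand-alone sketch with made-up numbers
+(the two readings below are illustrative, not taken from real hardware)::
+
+  REMAINING=1200               # hypothetical reading, e.g. in mAh
+  PRESENT_RATE=1500            # hypothetical reading, e.g. in mA
+  MINIMUM_BATTERY_MINUTES=10
+
+  # 1200 * 60 / 1500 = 48
+  MINUTES_LEFT=$((REMAINING * 60 / PRESENT_RATE))
+  echo "$MINUTES_LEFT minutes left"
+
+  if ((MINUTES_LEFT < MINIMUM_BATTERY_MINUTES)) ; then
+      echo "below threshold: laptop mode would be stopped"
+  fi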
+
+
+Monitoring tool
+---------------
+
+Bartek Kania submitted this; it can be used to measure how much time your disk
+spends spun up/down.  See tools/laptop/dslm/dslm.c
diff --git a/Documentation/laptops/laptop-mode.txt b/Documentation/laptops/laptop-mode.txt
deleted file mode 100644
index 1c707fc9b141..000000000000
--- a/Documentation/laptops/laptop-mode.txt
+++ /dev/null
@@ -1,782 +0,0 @@
-How to conserve battery power using laptop-mode
------------------------------------------------
-
-Document Author: Bart Samwel (bart@samwel.tk)
-Date created: January 2, 2004
-Last modified: December 06, 2004
-
-Introduction
-------------
-
-Laptop mode is used to minimize the time that the hard disk needs to be spun up,
-to conserve battery power on laptops. It has been reported to cause significant
-power savings.
-
-Contents
---------
-
-* Introduction
-* Installation
-* Caveats
-* The Details
-* Tips & Tricks
-* Control script
-* ACPI integration
-* Monitoring tool
-
-
-Installation
-------------
-
-To use laptop mode, you don't need to set any kernel configuration options
-or anything. Simply install all the files included in this document, and
-laptop mode will automatically be started when you're on battery. For
-your convenience, a tarball containing an installer can be downloaded at:
-
-http://www.samwel.tk/laptop_mode/laptop_mode/
-
-To configure laptop mode, you need to edit the configuration file, which is
-located in /etc/default/laptop-mode on Debian-based systems, or in
-/etc/sysconfig/laptop-mode on other systems.
-
-Unfortunately, automatic enabling of laptop mode does not work for
-laptops that don't have ACPI. On those laptops, you need to start laptop
-mode manually. To start laptop mode, run "laptop_mode start", and to
-stop it, run "laptop_mode stop". (Note: The laptop mode tools package now
-has experimental support for APM, you might want to try that first.)
-
-
-Caveats
--------
-
-* The downside of laptop mode is that you have a chance of losing up to 10
-  minutes of work. If you cannot afford this, don't use it! The supplied ACPI
-  scripts automatically turn off laptop mode when the battery almost runs out,
-  so that you won't lose any data at the end of your battery life.
-
-* Most desktop hard drives have a very limited lifetime measured in spindown
-  cycles, typically about 50.000 times (it's usually listed on the spec sheet).
-  Check your drive's rating, and don't wear down your drive's lifetime if you
-  don't need to.
-
-* If you mount some of your ext3/reiserfs filesystems with the -n option, then
-  the control script will not be able to remount them correctly. You must set
-  DO_REMOUNTS=0 in the control script, otherwise it will remount them with the
-  wrong options -- or it will fail because it cannot write to /etc/mtab.
-
-* If you have your filesystems listed as type "auto" in fstab, like I did, then
-  the control script will not recognize them as filesystems that need remounting.
-  You must list the filesystems with their true type instead.
-
-* It has been reported that some versions of the mutt mail client use file access
-  times to determine whether a folder contains new mail. If you use mutt and
-  experience this, you must disable the noatime remounting by setting the option
-  DO_REMOUNT_NOATIME to 0 in the configuration file.
-
-
-The Details
------------
-
-Laptop mode is controlled by the knob /proc/sys/vm/laptop_mode. This knob is
-present for all kernels that have the laptop mode patch, regardless of any
-configuration options. When the knob is set, any physical disk I/O (that might
-have caused the hard disk to spin up) causes Linux to flush all dirty blocks. The
-result of this is that after a disk has spun down, it will not be spun up
-anymore to write dirty blocks, because those blocks had already been written
-immediately after the most recent read operation. The value of the laptop_mode
-knob determines the time between the occurrence of disk I/O and when the flush
-is triggered. A sensible value for the knob is 5 seconds. Setting the knob to
-0 disables laptop mode.
-
-To increase the effectiveness of the laptop_mode strategy, the laptop_mode
-control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
-/proc/sys/vm to about 10 minutes (by default), which means that pages that are
-dirtied are not forced to be written to disk as often. The control script also
-changes the dirty background ratio, so that background writeback of dirty pages
-is not done anymore. Combined with a higher commit value (also 10 minutes) for
-ext3 or ReiserFS filesystems (also done automatically by the control script),
-this results in concentration of disk activity in a small time interval which
-occurs only once every 10 minutes, or whenever the disk is forced to spin up by
-a cache miss. The disk can then be spun down in the periods of inactivity.
-
-If you want to find out which process caused the disk to spin up, you can
-gather information by setting the flag /proc/sys/vm/block_dump. When this flag
-is set, Linux reports all disk read and write operations that take place, and
-all block dirtyings done to files. This makes it possible to debug why a disk
-needs to spin up, and to increase battery life even more. The output of
-block_dump is written to the kernel output, and it can be retrieved using
-"dmesg". When you use block_dump and your kernel logging level also includes
-kernel debugging messages, you probably want to turn off klogd, otherwise
-the output of block_dump will be logged, causing disk activity that is not
-normally there.
-
-
-Configuration
--------------
-
-The laptop mode configuration file is located in /etc/default/laptop-mode on
-Debian-based systems, or in /etc/sysconfig/laptop-mode on other systems. It
-contains the following options:
-
-MAX_AGE:
-
-Maximum time, in seconds, of hard drive spindown time that you are
-comfortable with. Worst case, it's possible that you could lose this
-amount of work if your battery fails while you're in laptop mode.
-
-MINIMUM_BATTERY_MINUTES:
-
-Automatically disable laptop mode if the remaining number of minutes of
-battery power is less than this value. Default is 10 minutes.
-
-AC_HD/BATT_HD:
-
-The idle timeout that should be set on your hard drive when laptop mode
-is active (BATT_HD) and when it is not active (AC_HD). The defaults are
-20 seconds (value 4) for BATT_HD  and 2 hours (value 244) for AC_HD. The
-possible values are those listed in the manual page for "hdparm" for the
-"-S" option.
-
-HD:
-
-The devices for which the spindown timeout should be adjusted by laptop mode.
-Default is /dev/hda. If you specify multiple devices, separate them by a space.
-
-READAHEAD:
-
-Disk readahead, in 512-byte sectors, while laptop mode is active. A large
-readahead can prevent disk accesses for things like executable pages (which are
-loaded on demand while the application executes) and sequentially accessed data
-(MP3s).
-
-DO_REMOUNTS:
-
-The control script automatically remounts any mounted journaled filesystems
-with appropriate commit interval options. When this option is set to 0, this
-feature is disabled.
-
-DO_REMOUNT_NOATIME:
-
-When remounting, should the filesystems be remounted with the noatime option?
-Normally, this is set to "1" (enabled), but there may be programs that require
-access time recording.
-
-DIRTY_RATIO:
-
-The percentage of memory that is allowed to contain "dirty" or unsaved data
-before a writeback is forced, while laptop mode is active. Corresponds to
-the /proc/sys/vm/dirty_ratio sysctl.
-
-DIRTY_BACKGROUND_RATIO:
-
-The percentage of memory that is allowed to contain "dirty" or unsaved data
-after a forced writeback is done due to an exceeding of DIRTY_RATIO. Set
-this nice and low. This corresponds to the /proc/sys/vm/dirty_background_ratio
-sysctl.
-
-Note that the behaviour of dirty_background_ratio is quite different
-when laptop mode is active and when it isn't. When laptop mode is inactive,
-dirty_background_ratio is the threshold percentage at which background writeouts
-start taking place. When laptop mode is active, however, background writeouts
-are disabled, and the dirty_background_ratio only determines how much writeback
-is done when dirty_ratio is reached.
-
-DO_CPU:
-
-Enable CPU frequency scaling when in laptop mode. (Requires CPUFreq to be setup.
-See Documentation/admin-guide/pm/cpufreq.rst for more info. Disabled by default.)
-
-CPU_MAXFREQ:
-
-When on battery, what is the maximum CPU speed that the system should use? Legal
-values are "slowest" for the slowest speed that your CPU is able to operate at,
-or a value listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies.
-
-
-Tips & Tricks
--------------
-
-* Bartek Kania reports getting up to 50 minutes of extra battery life (on top
-  of his regular 3 to 3.5 hours) using a spindown time of 5 seconds (BATT_HD=1).
-
-* You can spin down the disk while playing MP3, by setting disk readahead
-  to 8MB (READAHEAD=16384). Effectively, the disk will read a complete MP3 at
-  once, and will then spin down while the MP3 is playing. (Thanks to Bartek
-  Kania.)
-
-* Drew Scott Daniels observed: "I don't know why, but when I decrease the number
-  of colours that my display uses it consumes less battery power. I've seen
-  this on powerbooks too. I hope that this is a piece of information that
-  might be useful to the Laptop Mode patch or its users."
-
-* In syslog.conf, you can prefix entries with a dash ``-'' to omit syncing the
-  file after every logging. When you're using laptop-mode and your disk doesn't
-  spin down, this is a likely culprit.
-
-* Richard Atterer observed that laptop mode does not work well with noflushd
-  (http://noflushd.sourceforge.net/), it seems that noflushd prevents laptop-mode
-  from doing its thing.
-
-* If you're worried about your data, you might want to consider using a USB
-  memory stick or something like that as a "working area". (Be aware though
-  that flash memory can only handle a limited number of writes, and overuse
-  may wear out your memory stick pretty quickly. Do _not_ use journalling
-  filesystems on flash memory sticks.)
-
-
-Configuration file for control and ACPI battery scripts
--------------------------------------------------------
-
-This allows the tunables to be changed for the scripts via an external
-configuration file
-
-It should be installed as /etc/default/laptop-mode on Debian, and as
-/etc/sysconfig/laptop-mode on Red Hat, SUSE, Mandrake, and other work-alikes.
-
---------------------CONFIG FILE BEGIN-------------------------------------------
-# Maximum time, in seconds, of hard drive spindown time that you are
-# comfortable with. Worst case, it's possible that you could lose this
-# amount of work if your battery fails you while in laptop mode.
-#MAX_AGE=600
-
-# Automatically disable laptop mode when the number of minutes of battery
-# that you have left goes below this threshold.
-MINIMUM_BATTERY_MINUTES=10
-
-# Read-ahead, in 512-byte sectors. You can spin down the disk while playing MP3/OGG
-# by setting the disk readahead to 8MB (READAHEAD=16384). Effectively, the disk
-# will read a complete MP3 at once, and will then spin down while the MP3/OGG is
-# playing.
-#READAHEAD=4096
-
-# Shall we remount journaled fs. with appropriate commit interval? (1=yes)
-#DO_REMOUNTS=1
-
-# And shall we add the "noatime" option to that as well? (1=yes)
-#DO_REMOUNT_NOATIME=1
-
-# Dirty synchronous ratio.  At this percentage of dirty pages the process
-# which
-# calls write() does its own writeback
-#DIRTY_RATIO=40
-
-#
-# Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
-# exceeded, the kernel will wake flusher threads which will then reduce the
-# amount of dirty memory to dirty_background_ratio.  Set this nice and low,
-# so once some writeout has commenced, we do a lot of it.
-#
-#DIRTY_BACKGROUND_RATIO=5
-
-# kernel default dirty buffer age
-#DEF_AGE=30
-#DEF_UPDATE=5
-#DEF_DIRTY_BACKGROUND_RATIO=10
-#DEF_DIRTY_RATIO=40
-#DEF_XFS_AGE_BUFFER=15
-#DEF_XFS_SYNC_INTERVAL=30
-#DEF_XFS_BUFD_INTERVAL=1
-
-# This must be adjusted manually to the value of HZ in the running kernel
-# on 2.4, until the XFS people change their 2.4 external interfaces to work in
-# centisecs. This can be automated, but it's a work in progress that still
-# needs# some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for
-# external interfaces, and that is currently always set to 100. So you don't
-# need to change this on 2.6.
-#XFS_HZ=100
-
-# Should the maximum CPU frequency be adjusted down while on battery?
-# Requires CPUFreq to be setup.
-# See Documentation/admin-guide/pm/cpufreq.rst for more info
-#DO_CPU=0
-
-# When on battery what is the maximum CPU speed that the system should
-# use? Legal values are "slowest" for the slowest speed that your
-# CPU is able to operate at, or a value listed in:
-# /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
-# Only applicable if DO_CPU=1.
-#CPU_MAXFREQ=slowest
-
-# Idle timeout for your hard drive (man hdparm for valid values, -S option)
-# Default is 2 hours on AC (AC_HD=244) and 20 seconds for battery (BATT_HD=4).
-#AC_HD=244
-#BATT_HD=4
-
-# The drives for which to adjust the idle timeout. Separate them by a space,
-# e.g. HD="/dev/hda /dev/hdb".
-#HD="/dev/hda"
-
-# Set the spindown timeout on a hard drive?
-#DO_HD=1
-
---------------------CONFIG FILE END---------------------------------------------
-
-
-Control script
---------------
-
-Please note that this control script works for the Linux 2.4 and 2.6 series (thanks
-to Kiko Piris).
-
---------------------CONTROL SCRIPT BEGIN----------------------------------------
-#!/bin/bash
-
-# start or stop laptop_mode, best run by a power management daemon when
-# ac gets connected/disconnected from a laptop
-#
-# install as /sbin/laptop_mode
-#
-# Contributors to this script:   Kiko Piris
-#				 Bart Samwel
-#				 Micha Feigin
-#				 Andrew Morton
-#				 Herve Eychenne
-#				 Dax Kelson
-#
-# Original Linux 2.4 version by: Jens Axboe
-
-#############################################################################
-
-# Source config
-if [ -f /etc/default/laptop-mode ] ; then
-	# Debian
-	. /etc/default/laptop-mode
-elif [ -f /etc/sysconfig/laptop-mode ] ; then
-	# Others
-        . /etc/sysconfig/laptop-mode
-fi
-
-# Don't raise an error if the config file is incomplete
-# set defaults instead:
-
-# Maximum time, in seconds, of hard drive spindown time that you are
-# comfortable with. Worst case, it's possible that you could lose this
-# amount of work if your battery fails you while in laptop mode.
-MAX_AGE=${MAX_AGE:-'600'}
-
-# Read-ahead, in kilobytes
-READAHEAD=${READAHEAD:-'4096'}
-
-# Shall we remount journaled fs. with appropriate commit interval? (1=yes)
-DO_REMOUNTS=${DO_REMOUNTS:-'1'}
-
-# And shall we add the "noatime" option to that as well? (1=yes)
-DO_REMOUNT_NOATIME=${DO_REMOUNT_NOATIME:-'1'}
-
-# Shall we adjust the idle timeout on a hard drive?
-DO_HD=${DO_HD:-'1'}
-
-# Adjust idle timeout on which hard drive?
-HD="${HD:-'/dev/hda'}"
-
-# spindown time for HD (hdparm -S values)
-AC_HD=${AC_HD:-'244'}
-BATT_HD=${BATT_HD:-'4'}
-
-# Dirty synchronous ratio.  At this percentage of dirty pages the process which
-# calls write() does its own writeback
-DIRTY_RATIO=${DIRTY_RATIO:-'40'}
-
-# cpu frequency scaling
-# See Documentation/admin-guide/pm/cpufreq.rst for more info
-DO_CPU=${CPU_MANAGE:-'0'}
-CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'}
-
-#
-# Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
-# exceeded, the kernel will wake flusher threads which will then reduce the
-# amount of dirty memory to dirty_background_ratio.  Set this nice and low,
-# so once some writeout has commenced, we do a lot of it.
-#
-DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'}
-
-# kernel default dirty buffer age
-DEF_AGE=${DEF_AGE:-'30'}
-DEF_UPDATE=${DEF_UPDATE:-'5'}
-DEF_DIRTY_BACKGROUND_RATIO=${DEF_DIRTY_BACKGROUND_RATIO:-'10'}
-DEF_DIRTY_RATIO=${DEF_DIRTY_RATIO:-'40'}
-DEF_XFS_AGE_BUFFER=${DEF_XFS_AGE_BUFFER:-'15'}
-DEF_XFS_SYNC_INTERVAL=${DEF_XFS_SYNC_INTERVAL:-'30'}
-DEF_XFS_BUFD_INTERVAL=${DEF_XFS_BUFD_INTERVAL:-'1'}
-
-# This must be adjusted manually to the value of HZ in the running kernel
-# on 2.4, until the XFS people change their 2.4 external interfaces to work in
-# centisecs. This can be automated, but it's a work in progress that still needs
-# some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for external
-# interfaces, and that is currently always set to 100. So you don't need to
-# change this on 2.6.
-XFS_HZ=${XFS_HZ:-'100'}
-
-#############################################################################
-
-KLEVEL="$(uname -r |
-             {
-	       IFS='.' read a b c
-	       echo $a.$b
-	     }
-)"
-case "$KLEVEL" in
-	"2.4"|"2.6")
-		;;
-	*)
-		echo "Unhandled kernel version: $KLEVEL ('uname -r' = '$(uname -r)')" >&2
-		exit 1
-		;;
-esac
-
-if [ ! -e /proc/sys/vm/laptop_mode ] ; then
-	echo "Kernel is not patched with laptop_mode patch." >&2
-	exit 1
-fi
-
-if [ ! -w /proc/sys/vm/laptop_mode ] ; then
-	echo "You do not have enough privileges to enable laptop_mode." >&2
-	exit 1
-fi
-
-# Remove an option (the first parameter) of the form option=<number> from
-# a mount options string (the rest of the parameters).
-parse_mount_opts () {
-	OPT="$1"
-	shift
-	echo ",$*," | sed		\
-	 -e 's/,'"$OPT"'=[0-9]*,/,/g'	\
-	 -e 's/,,*/,/g'			\
-	 -e 's/^,//'			\
-	 -e 's/,$//'
-}
-
-# Remove an option (the first parameter) without any arguments from
-# a mount option string (the rest of the parameters).
-parse_nonumber_mount_opts () {
-	OPT="$1"
-	shift
-	echo ",$*," | sed		\
-	 -e 's/,'"$OPT"',/,/g'		\
-	 -e 's/,,*/,/g'			\
-	 -e 's/^,//'			\
-	 -e 's/,$//'
-}
-
-# Find out the state of a yes/no option (e.g. "atime"/"noatime") in
-# fstab for a given filesystem, and use this state to replace the
-# value of the option in another mount options string. The device
-# is the first argument, the option name the second, and the default
-# value the third. The remainder is the mount options string.
-#
-# Example:
-# parse_yesno_opts_wfstab /dev/hda1 atime atime defaults,noatime
-#
-# If fstab contains, say, "rw" for this filesystem, then the result
-# will be "defaults,atime".
-parse_yesno_opts_wfstab () {
-	L_DEV="$1"
-	OPT="$2"
-	DEF_OPT="$3"
-	shift 3
-	L_OPTS="$*"
-	PARSEDOPTS1="$(parse_nonumber_mount_opts $OPT $L_OPTS)"
-	PARSEDOPTS1="$(parse_nonumber_mount_opts no$OPT $PARSEDOPTS1)"
-	# Watch for a default atime in fstab
-	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
-	if echo "$FSTAB_OPTS" | grep "$OPT" > /dev/null ; then
-		# option specified in fstab: extract the value and use it
-		if echo "$FSTAB_OPTS" | grep "no$OPT" > /dev/null ; then
-			echo "$PARSEDOPTS1,no$OPT"
-		else
-			# no$OPT not found -- so we must have $OPT.
-			echo "$PARSEDOPTS1,$OPT"
-		fi
-	else
-		# option not specified in fstab -- choose the default.
-		echo "$PARSEDOPTS1,$DEF_OPT"
-	fi
-}
-
-# Find out the state of a numbered option (e.g. "commit=NNN") in
-# fstab for a given filesystem, and use this state to replace the
-# value of the option in another mount options string. The device
-# is the first argument, and the option name the second. The
-# remainder is the mount options string in which the replacement
-# must be done.
-#
-# Example:
-# parse_mount_opts_wfstab /dev/hda1 commit defaults,commit=7
-#
-# If fstab contains, say, "commit=3,rw" for this filesystem, then the
-# result will be "rw,commit=3".
-parse_mount_opts_wfstab () {
-	L_DEV="$1"
-	OPT="$2"
-	shift 2
-	L_OPTS="$*"
-	PARSEDOPTS1="$(parse_mount_opts $OPT $L_OPTS)"
-	# Watch for a default commit in fstab
-	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
-	if echo "$FSTAB_OPTS" | grep "$OPT=" > /dev/null ; then
-		# option specified in fstab: extract the value, and use it
-		echo -n "$PARSEDOPTS1,$OPT="
-		echo ",$FSTAB_OPTS," | sed \
-		 -e 's/.*,'"$OPT"'=//'	\
-		 -e 's/,.*//'
-	else
-		# option not specified in fstab: set it to 0
-		echo "$PARSEDOPTS1,$OPT=0"
-	fi
-}
-
-deduce_fstype () {
-	MP="$1"
-	# My root filesystem unfortunately has
-	# type "unknown" in /etc/mtab. If we encounter
-	# "unknown", we try to get the type from fstab.
-	cat /etc/fstab |
-	grep -v '^#' |
-	while read FSTAB_DEV FSTAB_MP FSTAB_FST FSTAB_OPTS FSTAB_DUMP FSTAB_DUMP ; do
-		if [ "$FSTAB_MP" = "$MP" ]; then
-			echo $FSTAB_FST
-			exit 0
-		fi
-	done
-}
-
-if [ $DO_REMOUNT_NOATIME -eq 1 ] ; then
-	NOATIME_OPT=",noatime"
-fi
-
-case "$1" in
-	start)
-		AGE=$((100*$MAX_AGE))
-		XFS_AGE=$(($XFS_HZ*$MAX_AGE))
-		echo -n "Starting laptop_mode"
-
-		if [ -d /proc/sys/vm/pagebuf ] ; then
-			# (For 2.4 and early 2.6.)
-			# This only needs to be set, not reset -- it is only used when
-			# laptop mode is enabled.
-			echo $XFS_AGE > /proc/sys/vm/pagebuf/lm_flush_age
-			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
-		elif [ -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
-			# (A couple of early 2.6 laptop mode patches had these.)
-			# The same goes for these.
-			echo $XFS_AGE > /proc/sys/fs/xfs/lm_age_buffer
-			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
-		elif [ -f /proc/sys/fs/xfs/age_buffer ] ; then
-			# (2.6.6)
-			# But not for these -- they are also used in normal
-			# operation.
-			echo $XFS_AGE > /proc/sys/fs/xfs/age_buffer
-			echo $XFS_AGE > /proc/sys/fs/xfs/sync_interval
-		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
-			# (2.6.7 upwards)
-			# And not for these either. These are in centisecs,
-			# not USER_HZ, so we have to use $AGE, not $XFS_AGE.
-			echo $AGE > /proc/sys/fs/xfs/age_buffer_centisecs
-			echo $AGE > /proc/sys/fs/xfs/xfssyncd_centisecs
-			echo 3000 > /proc/sys/fs/xfs/xfsbufd_centisecs
-		fi
-
-		case "$KLEVEL" in
-			"2.4")
-				echo 1					> /proc/sys/vm/laptop_mode
-				echo "30 500 0 0 $AGE $AGE 60 20 0"	> /proc/sys/vm/bdflush
-				;;
-			"2.6")
-				echo 5					> /proc/sys/vm/laptop_mode
-				echo "$AGE"				> /proc/sys/vm/dirty_writeback_centisecs
-				echo "$AGE"				> /proc/sys/vm/dirty_expire_centisecs
-				echo "$DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
-				echo "$DIRTY_BACKGROUND_RATIO"		> /proc/sys/vm/dirty_background_ratio
-				;;
-		esac
-		if [ $DO_REMOUNTS -eq 1 ]; then
-			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
-				PARSEDOPTS="$(parse_mount_opts "$OPTS")"
-				if [ "$FST" = 'unknown' ]; then
-					FST=$(deduce_fstype $MP)
-				fi
-				case "$FST" in
-					"ext3"|"reiserfs")
-						PARSEDOPTS="$(parse_mount_opts commit "$OPTS")"
-						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT
-						;;
-					"xfs")
-						mount $DEV -t $FST $MP -o remount,$OPTS$NOATIME_OPT
-						;;
-				esac
-				if [ -b $DEV ] ; then
-					blockdev --setra $(($READAHEAD * 2)) $DEV
-				fi
-			done
-		fi
-		if [ $DO_HD -eq 1 ] ; then
-			for THISHD in $HD ; do
-				/sbin/hdparm -S $BATT_HD $THISHD > /dev/null 2>&1
-				/sbin/hdparm -B 1 $THISHD > /dev/null 2>&1
-			done
-		fi
-		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
-			if [ $CPU_MAXFREQ = 'slowest' ]; then
-				CPU_MAXFREQ=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq`
-			fi
-			echo $CPU_MAXFREQ > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-		fi
-		echo "."
-		;;
-	stop)
-		U_AGE=$((100*$DEF_UPDATE))
-		B_AGE=$((100*$DEF_AGE))
-		echo -n "Stopping laptop_mode"
-		echo 0 > /proc/sys/vm/laptop_mode
-		if [ -f /proc/sys/fs/xfs/age_buffer -a ! -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
-			# These need to be restored, if there are no lm_*.
-			echo $(($XFS_HZ*$DEF_XFS_AGE_BUFFER))	 	> /proc/sys/fs/xfs/age_buffer
-			echo $(($XFS_HZ*$DEF_XFS_SYNC_INTERVAL)) 	> /proc/sys/fs/xfs/sync_interval
-		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
-			# These need to be restored as well.
-			echo $((100*$DEF_XFS_AGE_BUFFER))	> /proc/sys/fs/xfs/age_buffer_centisecs
-			echo $((100*$DEF_XFS_SYNC_INTERVAL))	> /proc/sys/fs/xfs/xfssyncd_centisecs
-			echo $((100*$DEF_XFS_BUFD_INTERVAL))	> /proc/sys/fs/xfs/xfsbufd_centisecs
-		fi
-		case "$KLEVEL" in
-			"2.4")
-				echo "30 500 0 0 $U_AGE $B_AGE 60 20 0"	> /proc/sys/vm/bdflush
-				;;
-			"2.6")
-				echo "$U_AGE"				> /proc/sys/vm/dirty_writeback_centisecs
-				echo "$B_AGE"				> /proc/sys/vm/dirty_expire_centisecs
-				echo "$DEF_DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
-				echo "$DEF_DIRTY_BACKGROUND_RATIO"	> /proc/sys/vm/dirty_background_ratio
-				;;
-		esac
-		if [ $DO_REMOUNTS -eq 1 ] ; then
-			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
-				# Reset commit and atime options to defaults.
-				if [ "$FST" = 'unknown' ]; then
-					FST=$(deduce_fstype $MP)
-				fi
-				case "$FST" in
-					"ext3"|"reiserfs")
-						PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)"
-						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)"
-						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
-						;;
-					"xfs")
-						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $OPTS)"
-						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
-						;;
-				esac
-				if [ -b $DEV ] ; then
-					blockdev --setra 256 $DEV
-				fi
-			done
-		fi
-		if [ $DO_HD -eq 1 ] ; then
-			for THISHD in $HD ; do
-				/sbin/hdparm -S $AC_HD $THISHD > /dev/null 2>&1
-				/sbin/hdparm -B 255 $THISHD > /dev/null 2>&1
-			done
-		fi
-		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
-			echo `cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-		fi
-		echo "."
-		;;
-	*)
-		echo "Usage: $0 {start|stop}" 2>&1
-		exit 1
-		;;
-
-esac
-
-exit 0
---------------------CONTROL SCRIPT END------------------------------------------
-
-
-ACPI integration
-----------------
-
-Dax Kelson submitted this so that the ACPI acpid daemon will
-kick off the laptop_mode script and run hdparm. The part that
-automatically disables laptop mode when the battery is low was
-written by Jan Topinski.
-
------------------/etc/acpi/events/ac_adapter BEGIN------------------------------
-event=ac_adapter
-action=/etc/acpi/actions/ac.sh %e
-----------------/etc/acpi/events/ac_adapter END---------------------------------
-
-
------------------/etc/acpi/events/battery BEGIN---------------------------------
-event=battery.*
-action=/etc/acpi/actions/battery.sh %e
-----------------/etc/acpi/events/battery END------------------------------------
-
-
-----------------/etc/acpi/actions/ac.sh BEGIN-----------------------------------
-#!/bin/bash
-
-# ac on/offline event handler
-
-status=`awk '/^state: / { print $2 }' /proc/acpi/ac_adapter/$2/state`
-
-case $status in
-        "on-line")
-                /sbin/laptop_mode stop
-                exit 0
-        ;;
-        "off-line")
-                /sbin/laptop_mode start
-                exit 0
-        ;;
-esac
----------------------------/etc/acpi/actions/ac.sh END--------------------------
-
-
----------------------------/etc/acpi/actions/battery.sh BEGIN-------------------
-#! /bin/bash
-
-# Automatically disable laptop mode when the battery almost runs out.
-
-BATT_INFO=/proc/acpi/battery/$2/state
-
-if [[ -f /proc/sys/vm/laptop_mode ]]
-then
-   LM=`cat /proc/sys/vm/laptop_mode`
-   if [[ $LM -gt 0 ]]
-   then
-     if [[ -f $BATT_INFO ]]
-     then
-        # Source the config file only now that we know we need
-        if [ -f /etc/default/laptop-mode ] ; then
-                # Debian
-                . /etc/default/laptop-mode
-        elif [ -f /etc/sysconfig/laptop-mode ] ; then
-                # Others
-                . /etc/sysconfig/laptop-mode
-        fi
-        MINIMUM_BATTERY_MINUTES=${MINIMUM_BATTERY_MINUTES:-'10'}
-
-        ACTION="`cat $BATT_INFO | grep charging | cut -c 26-`"
-        if [[ ACTION -eq "discharging" ]]
-        then
-           PRESENT_RATE=`cat $BATT_INFO | grep "present rate:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
-           REMAINING=`cat $BATT_INFO | grep "remaining capacity:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
-        fi
-        if (($REMAINING * 60 / $PRESENT_RATE < $MINIMUM_BATTERY_MINUTES))
-        then
-           /sbin/laptop_mode stop
-        fi
-     else
-       logger -p daemon.warning "You are using laptop mode and your battery interface $BATT_INFO is missing. This may lead to loss of data when the battery runs out. Check kernel ACPI support and /proc/acpi/battery folder, and edit /etc/acpi/battery.sh to set BATT_INFO to the correct path."
-     fi
-   fi
-fi
----------------------------/etc/acpi/actions/battery.sh END--------------------
-
-
-Monitoring tool
----------------
-
-Bartek Kania submitted this, it can be used to measure how much time your disk
-spends spun up/down.  See tools/laptop/dslm/dslm.c
diff --git a/Documentation/laptops/sony-laptop.rst b/Documentation/laptops/sony-laptop.rst
new file mode 100644
index 000000000000..9edcc7f6612f
--- /dev/null
+++ b/Documentation/laptops/sony-laptop.rst
@@ -0,0 +1,174 @@
+=========================================
+Sony Notebook Control Driver (SNC) Readme
+=========================================
+
+	- Copyright (C) 2004- 2005 Stelian Pop <stelian@popies.net>
+	- Copyright (C) 2007 Mattia Dongili <malattia@linux.it>
+
+This mini-driver drives the SNC and SPIC devices present in the ACPI BIOS of
+Sony Vaio laptops. This driver mixes both devices' functions under the same
+(hopefully consistent) interface. This also means that the sonypi driver is
+obsoleted by sony-laptop now.
+
+Fn keys (hotkeys):
+------------------
+
+Some models report hotkeys through the SNC or SPIC devices. Such events are
+reported both through the ACPI subsystem as ACPI events and through the INPUT
+subsystem. See the logs of /proc/bus/input/devices to find out what those
+events are and which input devices are created by the driver.
+Additionally, loading the driver with the debug option will report all events
+in the kernel log.
+
+The "scancodes" passed to the input system (that can be remapped with udev)
+are indexes to the table "sony_laptop_input_keycode_map" in the sony-laptop.c
+module.  For example the "FN/E" key combination (EJECTCD on some models)
+generates the scancode 20 (0x14).
+
+Backlight control:
+------------------
+If your laptop model supports it, you will find sysfs files in the
+/sys/class/backlight/sony/
+directory. You will be able to query and set the current screen
+brightness:
+
+	======================	=========================================
+	brightness		get/set screen brightness (an integer
+				between 0 and 7)
+	actual_brightness	reading from this file will query the HW
+				to get real brightness value
+	max_brightness		the maximum brightness value
+	======================	=========================================
+
+
+Platform specific:
+------------------
+Loading the sony-laptop module will create a
+/sys/devices/platform/sony-laptop/
+directory populated with some files.
+
+You then read/write integer values from/to those files by using
+standard UNIX tools.
+
+The files are:
+
+	======================	==========================================
+	brightness_default	screen brightness which will be set
+				when the laptop is rebooted
+	cdpower			power on/off the internal CD drive
+	audiopower		power on/off the internal sound card
+	lanpower		power on/off the internal ethernet card
+				(only in debug mode)
+	bluetoothpower		power on/off the internal bluetooth device
+	fanspeed		get/set the fan speed
+	======================	==========================================
+
+Note that some files may be missing if they are not supported
+by your particular laptop model.
+
+Example usage::
+
+	# echo "1" > /sys/devices/platform/sony-laptop/brightness_default
+
+sets the lowest screen brightness for the next and later reboots
+
+::
+
+	# echo "8" > /sys/devices/platform/sony-laptop/brightness_default
+
+sets the highest screen brightness for the next and later reboots
+
+::
+
+	# cat /sys/devices/platform/sony-laptop/brightness_default
+
+retrieves the value
+
+::
+
+	# echo "0" > /sys/devices/platform/sony-laptop/audiopower
+
+powers off the sound card
+
+::
+
+	# echo "1" > /sys/devices/platform/sony-laptop/audiopower
+
+powers on the sound card.
+
+
+RFkill control:
+---------------
+More recent Vaio models expose a consistent set of ACPI methods to
+control radio frequency emitting devices. If you are a lucky owner of
+such a laptop you will find the necessary rfkill devices under
+/sys/class/rfkill. Check those starting with sony-* in::
+
+	# grep . /sys/class/rfkill/*/{state,name}
+
+
+Development:
+------------
+
+If you want to help with the development of this driver (and
+you are not afraid of any side effects doing strange things with
+your ACPI BIOS could have on your laptop), load the driver and
+pass the option 'debug=1'.
+
+REPEAT:
+	**DON'T DO THIS IF YOU DON'T LIKE RISKY BUSINESS.**
+
+In your kernel logs you will find the list of all ACPI methods
+the SNC device has on your laptop.
+
+* For new models you will see a long list of meaningless method names;
+  reading the DSDT table source should reveal that:
+
+(1) the SNC device uses an internal capability lookup table
+(2) SN00 is used to find values in the lookup table
+(3) SN06 and SN07 are used to call into the real methods based on
+    offsets you can obtain iterating the table using SN00
+(4) SN02 is used to enable events.
+
+Some values in the capability lookup table are more or less known (see
+the code for all sony_call_snc_handle calls); others are more obscure.
+
+* For old models you can see the GCDP/SCDP methods used to power on/off
+  the CD drive, but there are others and they are usually different from
+  model to model.
+
+**I HAVE NO IDEA WHAT THOSE METHODS DO.**
+
+The sony-laptop driver creates, for some of those methods (the most
+current ones found on several Vaio models), an entry under
+/sys/devices/platform/sony-laptop, just like the 'cdpower' one.
+You can create other entries corresponding to your own laptop methods by
+further editing the source (see the 'sony_nc_values' table, and add a new
+entry to this table with your get/set method names using the
+SNC_HANDLE_NAMES macro).
+
+Your mission, should you accept it, is to try finding out what
+those entries are for, by reading/writing random values from/to those
+files and finding out what the impact on your laptop is.
+
+Should you find anything interesting, please report it back to me,
+I will not disavow all knowledge of your actions :)
+
+See also http://www.linux.it/~malattia/wiki/index.php/Sony_drivers for other
+useful info.
+
+Bugs/Limitations:
+-----------------
+
+* This driver is not based on official documentation from Sony
+  (because there is none), so there is no guarantee this driver
+  will work at all, or do the right thing. Although this hasn't
+  happened to me, this driver could do very bad things to your
+  laptop, including permanent damage.
+
+* The sony-laptop and sonypi drivers do not interact at all. In the
+  future, sonypi will be removed and replaced by sony-laptop.
+
+* spicctrl, which is the userspace tool used to communicate with the
+  sonypi driver (through /dev/sonypi) is deprecated as well since all
+  its features are now available under the sysfs tree via sony-laptop.
diff --git a/Documentation/laptops/sony-laptop.txt b/Documentation/laptops/sony-laptop.txt
deleted file mode 100644
index 978b1e615155..000000000000
--- a/Documentation/laptops/sony-laptop.txt
+++ /dev/null
@@ -1,144 +0,0 @@
-Sony Notebook Control Driver (SNC) Readme
------------------------------------------
-	Copyright (C) 2004- 2005 Stelian Pop <stelian@popies.net>
-	Copyright (C) 2007 Mattia Dongili <malattia@linux.it>
-
-This mini-driver drives the SNC and SPIC device present in the ACPI BIOS of the
-Sony Vaio laptops. This driver mixes both devices functions under the same
-(hopefully consistent) interface. This also means that the sonypi driver is
-obsoleted by sony-laptop now.
-
-Fn keys (hotkeys):
-------------------
-Some models report hotkeys through the SNC or SPIC devices, such events are
-reported both through the ACPI subsystem as acpi events and through the INPUT
-subsystem. See the logs of /proc/bus/input/devices to find out what those
-events are and which input devices are created by the driver.
-Additionally, loading the driver with the debug option will report all events
-in the kernel log.
-
-The "scancodes" passed to the input system (that can be remapped with udev)
-are indexes to the table "sony_laptop_input_keycode_map" in the sony-laptop.c
-module.  For example the "FN/E" key combination (EJECTCD on some models)
-generates the scancode 20 (0x14).
-
-Backlight control:
-------------------
-If your laptop model supports it, you will find sysfs files in the
-/sys/class/backlight/sony/
-directory. You will be able to query and set the current screen
-brightness:
-	brightness		get/set screen brightness (an integer
-				between 0 and 7)
-	actual_brightness	reading from this file will query the HW
-				to get real brightness value
-	max_brightness		the maximum brightness value
-
-
-Platform specific:
-------------------
-Loading the sony-laptop module will create a
-/sys/devices/platform/sony-laptop/
-directory populated with some files.
-
-You then read/write integer values from/to those files by using
-standard UNIX tools.
-
-The files are:
-	brightness_default	screen brightness which will be set
-				when the laptop will be rebooted
-	cdpower			power on/off the internal CD drive
-	audiopower		power on/off the internal sound card
-	lanpower		power on/off the internal ethernet card
-				(only in debug mode)
-	bluetoothpower		power on/off the internal bluetooth device
-	fanspeed		get/set the fan speed
-
-Note that some files may be missing if they are not supported
-by your particular laptop model.
-
-Example usage:
-	# echo "1" > /sys/devices/platform/sony-laptop/brightness_default
-sets the lowest screen brightness for the next and later reboots,
-	# echo "8" > /sys/devices/platform/sony-laptop/brightness_default
-sets the highest screen brightness for the next and later reboots,
-	# cat /sys/devices/platform/sony-laptop/brightness_default
-retrieves the value.
-
-	# echo "0" > /sys/devices/platform/sony-laptop/audiopower
-powers off the sound card,
-	# echo "1" > /sys/devices/platform/sony-laptop/audiopower
-powers on the sound card.
-
-
-RFkill control:
----------------
-More recent Vaio models expose a consistent set of ACPI methods to
-control radio frequency emitting devices. If you are a lucky owner of
-such a laptop you will find the necessary rfkill devices under
-/sys/class/rfkill. Check those starting with sony-* in
-	# grep . /sys/class/rfkill/*/{state,name}
-
-
-Development:
-------------
-
-If you want to help with the development of this driver (and
-you are not afraid of any side effects doing strange things with
-your ACPI BIOS could have on your laptop), load the driver and
-pass the option 'debug=1'.
-
-REPEAT: DON'T DO THIS IF YOU DON'T LIKE RISKY BUSINESS.
-
-In your kernel logs you will find the list of all ACPI methods
-the SNC device has on your laptop.
-
-* For new models you will see a long list of meaningless method names,
-reading the DSDT table source should reveal that:
-(1) the SNC device uses an internal capability lookup table
-(2) SN00 is used to find values in the lookup table
-(3) SN06 and SN07 are used to call into the real methods based on
-    offsets you can obtain iterating the table using SN00
-(4) SN02 used to enable events.
-Some values in the capability lookup table are more or less known, see
-the code for all sony_call_snc_handle calls, others are more obscure.
-
-* For old models you can see the GCDP/GCDP methods used to pwer on/off
-the CD drive, but there are others and they are usually different from
-model to model.
-
-I HAVE NO IDEA WHAT THOSE METHODS DO.
-
-The sony-laptop driver creates, for some of those methods (the most
-current ones found on several Vaio models), an entry under
-/sys/devices/platform/sony-laptop, just like the 'cdpower' one.
-You can create other entries corresponding to your own laptop methods by
-further editing the source (see the 'sony_nc_values' table, and add a new
-entry to this table with your get/set method names using the
-SNC_HANDLE_NAMES macro).
-
-Your mission, should you accept it, is to try finding out what
-those entries are for, by reading/writing random values from/to those
-files and find out what is the impact on your laptop.
-
-Should you find anything interesting, please report it back to me,
-I will not disavow all knowledge of your actions :)
-
-See also http://www.linux.it/~malattia/wiki/index.php/Sony_drivers for other
-useful info.
-
-Bugs/Limitations:
------------------
-
-* This driver is not based on official documentation from Sony
-  (because there is none), so there is no guarantee this driver
-  will work at all, or do the right thing. Although this hasn't
-  happened to me, this driver could do very bad things to your
-  laptop, including permanent damage.
-
-* The sony-laptop and sonypi drivers do not interact at all. In the
-  future, sonypi will be removed and replaced by sony-laptop.
-
-* spicctrl, which is the userspace tool used to communicate with the
-  sonypi driver (through /dev/sonypi) is deprecated as well since all
-  its features are now available under the sysfs tree via sony-laptop.
diff --git a/Documentation/laptops/sonypi.rst b/Documentation/laptops/sonypi.rst
new file mode 100644
index 000000000000..2a1975ed7ee4
--- /dev/null
+++ b/Documentation/laptops/sonypi.rst
@@ -0,0 +1,160 @@
+==================================================
+Sony Programmable I/O Control Device Driver Readme
+==================================================
+
+	- Copyright (C) 2001-2004 Stelian Pop <stelian@popies.net>
+	- Copyright (C) 2001-2002 Alcôve <www.alcove.com>
+	- Copyright (C) 2001 Michael Ashley <m.ashley@unsw.edu.au>
+	- Copyright (C) 2001 Junichi Morita <jun1m@mars.dti.ne.jp>
+	- Copyright (C) 2000 Takaya Kinjo <t-kinjo@tc4.so-net.ne.jp>
+	- Copyright (C) 2000 Andrew Tridgell <tridge@samba.org>
+
+This driver enables access to the Sony Programmable I/O Control Device which
+can be found in many Sony Vaio laptops. Some newer Sony laptops (seems to be
+limited to new FX series laptops, at least the FX501 and the FX702) lack a
+sonypi device and are not supported at all by this driver.
+
+It will give access (through a user space utility) to some events those laptops
+generate, like:
+
+	- jogdial events (the small wheel on the side of Vaios)
+	- capture button events (only on Vaio Picturebook series)
+	- Fn keys
+	- bluetooth button (only on C1VR model)
+	- programmable keys, back, help, zoom, thumbphrase buttons, etc.
+	  (when available)
+
+Those events (see linux/sonypi.h) can be polled using the character device node
+/dev/sonypi (major 10, minor auto-allocated or specified as an option).
+A simple daemon which translates the jogdial movements into mouse wheel events
+can be downloaded at: <http://popies.net/sonypi/>
+
+Another option to intercept the events is to get them directly through the
+input layer.
+
+This driver also supports some ioctl commands for setting the LCD screen
+brightness and querying the battery charge information (some more
+commands may be added in the future).
+
+This driver can also be used to set the camera controls on Picturebook series
+(brightness, contrast etc), and is used by the video4linux driver for the
+Motion Eye camera.
+
+Please note that this driver was created by reverse engineering the Windows
+driver and the ACPI BIOS, because Sony doesn't agree to release any programming
+specs for its laptops. If someone convinces them to do so, drop me a note.
+
+Driver options:
+---------------
+
+Several options can be passed to the sonypi driver using the standard
+module argument syntax (<param>=<value> when passing the option to the
+module or sonypi.<param>=<value> on the kernel boot line when sonypi is
+statically linked into the kernel). Those options are:
+
+	=============== =======================================================
+	minor: 		minor number of the misc device /dev/sonypi,
+			default is -1 (automatic allocation, see /proc/misc
+			or kernel logs)
+
+	camera:		if you have a PictureBook series Vaio (with the
+			integrated MotionEye camera), set this parameter to 1
+			in order to let the driver access the camera
+
+	fnkeyinit:	on some Vaios (C1VE, C1VR etc), the Fn key events don't
+			get enabled unless you set this parameter to 1.
+			Do not use this option unless it's actually necessary,
+			some Vaio models don't deal well with this option.
+			This option is available only if the kernel is
+			compiled without ACPI support (since it conflicts
+			with it and it shouldn't be required anyway if
+			ACPI is already enabled).
+
+	verbose:	set to 1 to print unknown events received from the
+			sonypi device.
+			set to 2 to print all events received from the
+			sonypi device.
+
+	compat:		uses some compatibility code for enabling the sonypi
+			events. If the driver worked for you in the past
+			(prior to version 1.5) and does not work anymore,
+			add this option and report to the author.
+
+	mask:		event mask telling the driver what events will be
+			reported to the user. This parameter is required for
+			some Vaio models where the hardware reuses values
+			used in other Vaio models (like the FX series, which does
+			not have a jogdial but reuses the jogdial events for
+			programmable keys events). The default event mask is
+			set to 0xffffffff, meaning that all possible events
+			will be tried. You can use the following bits to
+			construct your own event mask (from
+			drivers/char/sonypi.h):
+
+				========================	======
+				SONYPI_JOGGER_MASK 		0x0001
+				SONYPI_CAPTURE_MASK 		0x0002
+				SONYPI_FNKEY_MASK 		0x0004
+				SONYPI_BLUETOOTH_MASK 		0x0008
+				SONYPI_PKEY_MASK 		0x0010
+				SONYPI_BACK_MASK 		0x0020
+				SONYPI_HELP_MASK 		0x0040
+				SONYPI_LID_MASK 		0x0080
+				SONYPI_ZOOM_MASK 		0x0100
+				SONYPI_THUMBPHRASE_MASK 	0x0200
+				SONYPI_MEYE_MASK		0x0400
+				SONYPI_MEMORYSTICK_MASK		0x0800
+				SONYPI_BATTERY_MASK		0x1000
+				SONYPI_WIRELESS_MASK		0x2000
+				========================	======
+
+	useinput:	if set (which is the default) two input devices are
+			created, one which interprets the jogdial events as
+			mouse events, the other one which acts like a
+			keyboard reporting the pressing of the special keys.
+	=============== =======================================================
+
+Module use:
+-----------
+
+In order to automatically load the sonypi module on use, you can put these
+lines in a configuration file in /etc/modprobe.d/::
+
+	alias char-major-10-250 sonypi
+	options sonypi minor=250
+
+This assumes the use of minor 250 for the sonypi device::
+
+	# mknod /dev/sonypi c 10 250
+
+Bugs:
+-----
+
+	- several users reported that this driver disables the BIOS-managed
+	  Fn-keys which put the laptop in sleeping state, or switch the
+	  external monitor on/off. There is no workaround yet, since this
+	  driver disables all APM management for those keys, by enabling the
+	  ACPI management (and the ACPI core stuff is not complete yet). If
+	  you have one of those laptops with working Fn keys and want to
+	  continue to use them, don't use this driver.
+
+	- some users reported that the laptop speed is lower (dhrystone
+	  tested) when using the driver with the fnkeyinit parameter. I cannot
+	  reproduce it on my laptop and not all users have this problem.
+	  This happens because the fnkeyinit parameter enables the ACPI
+	  mode (but without additional ACPI control, like processor
+	  speed handling etc). Use ACPI instead of APM if it works on your
+	  laptop.
+
+	- sonypi lacks the ability to distinguish between certain key
+	  events on some models.
+
+	- some models with the nvidia card (geforce go 6200 tc) use a
+	  different way to adjust the backlighting of the screen. There
+	  is a userspace utility to adjust the brightness on those models,
+	  which can be downloaded from
+	  http://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
+
+	- since all development was done by reverse engineering, there is
+	  *absolutely no guarantee* that this driver will not crash your
+	  laptop. Permanently.
diff --git a/Documentation/laptops/sonypi.txt b/Documentation/laptops/sonypi.txt
deleted file mode 100644
index 606bdb9ce036..000000000000
--- a/Documentation/laptops/sonypi.txt
+++ /dev/null
@@ -1,152 +0,0 @@
-Sony Programmable I/O Control Device Driver Readme
---------------------------------------------------
-	Copyright (C) 2001-2004 Stelian Pop <stelian@popies.net>
-	Copyright (C) 2001-2002 Alcôve <www.alcove.com>
-	Copyright (C) 2001 Michael Ashley <m.ashley@unsw.edu.au>
-	Copyright (C) 2001 Junichi Morita <jun1m@mars.dti.ne.jp>
-	Copyright (C) 2000 Takaya Kinjo <t-kinjo@tc4.so-net.ne.jp>
-	Copyright (C) 2000 Andrew Tridgell <tridge@samba.org>
-
-This driver enables access to the Sony Programmable I/O Control Device which
-can be found in many Sony Vaio laptops. Some newer Sony laptops (seems to be
-limited to new FX series laptops, at least the FX501 and the FX702) lack a
-sonypi device and are not supported at all by this driver.
-
-It will give access (through a user space utility) to some events those laptops
-generate, like:
-	- jogdial events (the small wheel on the side of Vaios)
-	- capture button events (only on Vaio Picturebook series)
-	- Fn keys
-	- bluetooth button (only on C1VR model)
-	- programmable keys, back, help, zoom, thumbphrase buttons, etc.
-	  (when available)
-
-Those events (see linux/sonypi.h) can be polled using the character device node
-/dev/sonypi (major 10, minor auto allocated or specified as a option).
-A simple daemon which translates the jogdial movements into mouse wheel events
-can be downloaded at: <http://popies.net/sonypi/>
-
-Another option to intercept the events is to get them directly through the
-input layer.
-
-This driver supports also some ioctl commands for setting the LCD screen
-brightness and querying the batteries charge information (some more
-commands may be added in the future).
-
-This driver can also be used to set the camera controls on Picturebook series
-(brightness, contrast etc), and is used by the video4linux driver for the
-Motion Eye camera.
-
-Please note that this driver was created by reverse engineering the Windows
-driver and the ACPI BIOS, because Sony doesn't agree to release any programming
-specs for its laptops. If someone convinces them to do so, drop me a note.
-
-Driver options:
----------------
-
-Several options can be passed to the sonypi driver using the standard
-module argument syntax (<param>=<value> when passing the option to the
-module or sonypi.<param>=<value> on the kernel boot line when sonypi is
-statically linked into the kernel). Those options are:
-
-	minor: 		minor number of the misc device /dev/sonypi,
-			default is -1 (automatic allocation, see /proc/misc
-			or kernel logs)
-
-	camera:		if you have a PictureBook series Vaio (with the
-			integrated MotionEye camera), set this parameter to 1
-			in order to let the driver access to the camera
-
-	fnkeyinit:	on some Vaios (C1VE, C1VR etc), the Fn key events don't
-			get enabled unless you set this parameter to 1.
-			Do not use this option unless it's actually necessary,
-			some Vaio models don't deal well with this option.
-			This option is available only if the kernel is
-			compiled without ACPI support (since it conflicts
-			with it and it shouldn't be required anyway if
-			ACPI is already enabled).
-
-	verbose:	set to 1 to print unknown events received from the
-			sonypi device.
-			set to 2 to print all events received from the
-			sonypi device.
-
-	compat:		uses some compatibility code for enabling the sonypi
-			events. If the driver worked for you in the past
-			(prior to version 1.5) and does not work anymore,
-			add this option and report to the author.
-
-	mask:		event mask telling the driver what events will be
-			reported to the user. This parameter is required for
-			some Vaio models where the hardware reuses values
-			used in other Vaio models (like the FX series who does
-			not have a jogdial but reuses the jogdial events for
-			programmable keys events). The default event mask is
-			set to 0xffffffff, meaning that all possible events
-			will be tried. You can use the following bits to
-			construct your own event mask (from
-			drivers/char/sonypi.h):
-				SONYPI_JOGGER_MASK 		0x0001
-				SONYPI_CAPTURE_MASK 		0x0002
-				SONYPI_FNKEY_MASK 		0x0004
-				SONYPI_BLUETOOTH_MASK 		0x0008
-				SONYPI_PKEY_MASK 		0x0010
-				SONYPI_BACK_MASK 		0x0020
-				SONYPI_HELP_MASK 		0x0040
-				SONYPI_LID_MASK 		0x0080
-				SONYPI_ZOOM_MASK 		0x0100
-				SONYPI_THUMBPHRASE_MASK 	0x0200
-				SONYPI_MEYE_MASK		0x0400
-				SONYPI_MEMORYSTICK_MASK		0x0800
-				SONYPI_BATTERY_MASK		0x1000
-				SONYPI_WIRELESS_MASK		0x2000
-
-	useinput:	if set (which is the default) two input devices are
-			created, one which interprets the jogdial events as
-			mouse events, the other one which acts like a
-			keyboard reporting the pressing of the special keys.
-
-Module use:
------------
-
-In order to automatically load the sonypi module on use, you can put those
-lines a configuration file in /etc/modprobe.d/:
-
-	alias char-major-10-250 sonypi
-	options sonypi minor=250
-
-This supposes the use of minor 250 for the sonypi device:
-
-	# mknod /dev/sonypi c 10 250
-
-Bugs:
------
-
-	- several users reported that this driver disables the BIOS-managed
-	  Fn-keys which put the laptop in sleeping state, or switch the
-	  external monitor on/off. There is no workaround yet, since this
-	  driver disables all APM management for those keys, by enabling the
-	  ACPI management (and the ACPI core stuff is not complete yet). If
-	  you have one of those laptops with working Fn keys and want to
-	  continue to use them, don't use this driver.
-
-	- some users reported that the laptop speed is lower (dhrystone
-	  tested) when using the driver with the fnkeyinit parameter. I cannot
-	  reproduce it on my laptop and not all users have this problem.
-	  This happens because the fnkeyinit parameter enables the ACPI
-	  mode (but without additional ACPI control, like processor
-	  speed handling etc). Use ACPI instead of APM if it works on your
-	  laptop.
-
-	- sonypi lacks the ability to distinguish between certain key
-	  events on some models.
-
-	- some models with the nvidia card (geforce go 6200 tc) uses a
-	  different way to adjust the backlighting of the screen. There
-	  is a userspace utility to adjust the brightness on those models,
-	  which can be downloaded from
-	  http://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
-
-	- since all development was done by reverse engineering, there is
-	  _absolutely no guarantee_ that this driver will not crash your
-	  laptop. Permanently.
diff --git a/Documentation/laptops/thinkpad-acpi.rst b/Documentation/laptops/thinkpad-acpi.rst
new file mode 100644
index 000000000000..19d52fc3c5e9
--- /dev/null
+++ b/Documentation/laptops/thinkpad-acpi.rst
@@ -0,0 +1,1562 @@
+===========================
+ThinkPad ACPI Extras Driver
+===========================
+
+Version 0.25
+
+October 16th, 2013
+
+- Borislav Deianov <borislav@users.sf.net>
+- Henrique de Moraes Holschuh <hmh@hmh.eng.br>
+
+http://ibm-acpi.sf.net/
+
+This is a Linux driver for the IBM and Lenovo ThinkPad laptops. It
+supports various features of these laptops which are accessible
+through the ACPI and ACPI EC framework, but not otherwise fully
+supported by the generic Linux ACPI drivers.
+
+This driver used to be named ibm-acpi until kernel 2.6.21 and release
+0.13-20070314.  It used to be in the drivers/acpi tree, but it was
+moved to the drivers/misc tree and renamed to thinkpad-acpi for kernel
+2.6.22, and release 0.14.  It was moved to drivers/platform/x86 for
+kernel 2.6.29 and release 0.22.
+
+The driver is named "thinkpad-acpi".  In some places, like module
+names and log messages, "thinkpad_acpi" is used because of userspace
+issues.
+
+"tpacpi" is used as a shorthand where "thinkpad-acpi" would be too
+long due to length limitations on some Linux kernel versions.
+
+Status
+------
+
+The features currently supported are the following (see below for
+detailed description):
+
+	- Fn key combinations
+	- Bluetooth enable and disable
+	- video output switching, expansion control
+	- ThinkLight on and off
+	- CMOS/UCMS control
+	- LED control
+	- ACPI sounds
+	- temperature sensors
+	- Experimental: embedded controller register dump
+	- LCD brightness control
+	- Volume control
+	- Fan control and monitoring: fan speed, fan enable/disable
+	- WAN enable and disable
+	- UWB enable and disable
+
+A compatibility table by model and feature is maintained on the web
+site, http://ibm-acpi.sf.net/. I appreciate any success or failure
+reports, especially if they add to or correct the compatibility table.
+Please include the following information in your report:
+
+	- ThinkPad model name
+	- a copy of your ACPI tables, using the "acpidump" utility
+	- a copy of the output of dmidecode, with serial numbers
+	  and UUIDs masked off
+	- which driver features work and which don't
+	- the observed behavior of non-working features
+
+Any other comments or patches are also more than welcome.
+
+
+Installation
+------------
+
+If you are compiling this driver as included in the Linux kernel
+sources, look for the CONFIG_THINKPAD_ACPI Kconfig option.
+It is located on the menu path: "Device Drivers" -> "X86 Platform
+Specific Device Drivers" -> "ThinkPad ACPI Laptop Extras".
+
+
+Features
+--------
+
+The driver exports two different interfaces to userspace, which can be
+used to access the features it provides.  One is a legacy procfs-based
+interface, which will be removed at some time in the future.  The other
+is a new sysfs-based interface which is not complete yet.
+
+The procfs interface creates the /proc/acpi/ibm directory.  There is a
+file under that directory for each feature it supports.  The procfs
+interface is mostly frozen, and will change very little if at all: it
+will not be extended to add any new functionality in the driver, instead
+all new functionality will be implemented on the sysfs interface.
+
+The sysfs interface tries to blend in the generic Linux sysfs subsystems
+and classes as much as possible.  Since some of these subsystems are not
+yet ready or stabilized, it is expected that this interface will change,
+and any and all userspace programs must deal with it.
+
+
+Notes about the sysfs interface
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Unlike what was done with the procfs interface, correctness when talking
+to the sysfs interfaces will be enforced, as will correctness in the
+thinkpad-acpi's implementation of sysfs interfaces.
+
+Also, any bugs in the thinkpad-acpi sysfs driver code or in the
+thinkpad-acpi's implementation of the sysfs interfaces will be fixed for
+maximum correctness, even if that means changing an interface in
+non-compatible ways.  As these interfaces mature both in the kernel and
+in thinkpad-acpi, such changes should become quite rare.
+
+Applications interfacing to the thinkpad-acpi sysfs interfaces must
+follow all sysfs guidelines and correctly process all errors (the sysfs
+interface makes extensive use of errors).  File descriptors and open /
+close operations to the sysfs inodes must also be properly implemented.
+
+The version of thinkpad-acpi's sysfs interface is exported by the driver
+as a driver attribute (see below).
+
+Sysfs driver attributes are on the driver's sysfs attribute space,
+for 2.6.23+ this is /sys/bus/platform/drivers/thinkpad_acpi/ and
+/sys/bus/platform/drivers/thinkpad_hwmon/
+
+Sysfs device attributes are on the thinkpad_acpi device sysfs attribute
+space, for 2.6.23+ this is /sys/devices/platform/thinkpad_acpi/.
+
+Sysfs device attributes for the sensors and fan are on the
+thinkpad_hwmon device's sysfs attribute space, but you should locate it
+looking for a hwmon device with the name attribute of "thinkpad", or
+better yet, through libsensors. For 4.14+ sysfs attributes were moved to the
+hwmon device (/sys/bus/platform/devices/thinkpad_hwmon/hwmon/hwmon? or
+/sys/class/hwmon/hwmon?).
+
+Driver version
+--------------
+
+procfs: /proc/acpi/ibm/driver
+
+sysfs driver attribute: version
+
+The driver name and version. No commands can be written to this file.
+
+
+Sysfs interface version
+-----------------------
+
+sysfs driver attribute: interface_version
+
+Version of the thinkpad-acpi sysfs interface, as an unsigned long
+(output in hex format: 0xAAAABBCC), where:
+
+	AAAA
+	  - major revision
+	BB
+	  - minor revision
+	CC
+	  - bugfix revision
+
+The sysfs interface version changelog for the driver can be found at the
+end of this document.  Changes to the sysfs interface done by the kernel
+subsystems are not documented here, nor are they tracked by this
+attribute.
+
+Changes to the thinkpad-acpi sysfs interface are only considered
+non-experimental when they are submitted to Linux mainline, at which
+point the changes in this interface are documented and interface_version
+may be updated.  If you are using any thinkpad-acpi features not yet
+sent to mainline for merging, you do so at your own risk: these features
+may disappear, or be implemented in a different and incompatible way by
+the time they are merged in Linux mainline.
+
+Changes that are backwards-compatible by nature (e.g. the addition of
+attributes that do not change the way the other attributes work) do not
+always warrant an update of interface_version.  Therefore, one must
+expect that an attribute might not be there, and deal with it properly
+(an attribute not being there *is* a valid way to make it clear that a
+feature is not available in sysfs).
+
+
+Hot keys
+--------
+
+procfs: /proc/acpi/ibm/hotkey
+
+sysfs device attribute: hotkey_*
+
+In a ThinkPad, the ACPI HKEY handler is responsible for communicating
+some important events and also keyboard hot key presses to the operating
+system.  Enabling the hotkey functionality of thinkpad-acpi signals the
+firmware that such a driver is present, and modifies how the ThinkPad
+firmware will behave in many situations.
+
+The driver enables the HKEY ("hot key") event reporting automatically
+when loaded, and disables it when it is removed.
+
+The driver will report HKEY events in the following format::
+
+	ibm/hotkey HKEY 00000080 0000xxxx
+
+Some of these events refer to hot key presses, but not all of them.
+
+The driver will generate events over the input layer for hot keys and
+radio switches, and over the ACPI netlink layer for other events.  The
+input layer support accepts the standard IOCTLs to remap the keycodes
+assigned to each hot key.
+
+The hot key bit mask allows some control over which hot keys generate
+events.  If a key is "masked" (bit set to 0 in the mask), the firmware
+will handle it.  If it is "unmasked", it signals the firmware that
+thinkpad-acpi would prefer to handle it, if the firmware would be so
+kind to allow it (and it often doesn't!).
+
+Not all bits in the mask can be modified.  Not all bits that can be
+modified do anything.  Not all hot keys can be individually controlled
+by the mask.  Some models do not support the mask at all.  The behaviour
+of the mask is, therefore, highly dependent on the ThinkPad model.
+
+The driver will filter out any masked hotkeys, so even if the firmware
+doesn't allow disabling a specific hotkey, the driver will not report
+events for masked hotkeys.
+
+Note that unmasking some keys prevents their default behavior.  For
+example, if Fn+F5 is unmasked, that key will no longer enable/disable
+Bluetooth by itself in firmware.
+
+Note also that not all Fn key combinations are supported through ACPI
+depending on the ThinkPad model and firmware version.  On those
+ThinkPads, it is still possible to support some extra hotkeys by
+polling the "CMOS NVRAM" at least 10 times per second.  The driver
+attempts to enable this functionality automatically when required.
+
+procfs notes
+^^^^^^^^^^^^
+
+The following commands can be written to the /proc/acpi/ibm/hotkey file::
+
+	echo 0xffffffff > /proc/acpi/ibm/hotkey -- enable all hot keys
+	echo 0 > /proc/acpi/ibm/hotkey -- disable all possible hot keys
+	... any other 8-hex-digit mask ...
+	echo reset > /proc/acpi/ibm/hotkey -- restore the recommended mask
+
+The following commands have been deprecated and will cause the kernel
+to log a warning::
+
+	echo enable > /proc/acpi/ibm/hotkey -- does nothing
+	echo disable > /proc/acpi/ibm/hotkey -- returns an error
+
+The procfs interface does not support NVRAM polling control.  So as to
+maintain maximum bug-to-bug compatibility, it does not report any masks,
+nor does it allow one to manipulate the hot key mask when the firmware
+does not support masks at all, even if NVRAM polling is in use.
+
+sysfs notes
+^^^^^^^^^^^
+
+	hotkey_bios_enabled:
+		DEPRECATED, WILL BE REMOVED SOON.
+
+		Returns 0.
+
+	hotkey_bios_mask:
+		DEPRECATED, DON'T USE, WILL BE REMOVED IN THE FUTURE.
+
+		Returns the hot keys mask when thinkpad-acpi was loaded.
+		Upon module unload, the hot keys mask will be restored
+		to this value.   This is always 0x80c, because those are
+		the hotkeys that were supported by ancient firmware
+		without mask support.
+
+	hotkey_enable:
+		DEPRECATED, WILL BE REMOVED SOON.
+
+		0: returns -EPERM
+		1: does nothing
+
+	hotkey_mask:
+		bit mask to enable reporting (and depending on
+		the firmware, ACPI event generation) for each hot key
+		(see above).  Returns the current status of the hot keys
+		mask, and allows one to modify it.
+
+	hotkey_all_mask:
+		bit mask that should enable event reporting for all
+		supported hot keys, when echoed to hotkey_mask above.
+		Unless you know which events need to be handled
+		passively (because the firmware *will* handle them
+		anyway), do *not* use hotkey_all_mask.  Use
+		hotkey_recommended_mask, instead. You have been warned.
+
+	hotkey_recommended_mask:
+		bit mask that should enable event reporting for all
+		supported hot keys, except those which are always
+		handled by the firmware anyway.  Echo it to
+		hotkey_mask above, to use.  This is the default mask
+		used by the driver.
+
+	hotkey_source_mask:
+		bit mask that selects which hot keys the driver will
+		poll the NVRAM for.  This is auto-detected by the driver
+		based on the capabilities reported by the ACPI firmware,
+		but it can be overridden at runtime.
+
+		Hot keys whose bits are set in hotkey_source_mask are
+		polled for in NVRAM, and reported as hotkey events if
+		enabled in hotkey_mask.  Only a few hot keys are
+		available through CMOS NVRAM polling.
+
+		Warning: when in NVRAM mode, the volume up/down/mute
+		keys are synthesized according to changes in the mixer,
+		which uses a single volume up or volume down hotkey
+		press to unmute, as per the ThinkPad volume mixer user
+		interface.  When in ACPI event mode, volume up/down/mute
+		events are reported by the firmware and can behave
+		differently (and that behaviour changes with firmware
+		version -- not just with firmware models -- as well as
+		OSI(Linux) state).
+
+	hotkey_poll_freq:
+		frequency in Hz for hot key polling. It must be between
+		0 and 25 Hz.  Polling is only carried out when strictly
+		needed.
+
+		Setting hotkey_poll_freq to zero disables polling, and
+		will cause hot key presses that require NVRAM polling
+		to never be reported.
+
+		Setting hotkey_poll_freq too low may cause repeated
+		pressings of the same hot key to be misreported as a
+		single key press, or to not even be detected at all.
+		The recommended polling frequency is 10Hz.
+
+	hotkey_radio_sw:
+		If the ThinkPad has a hardware radio switch, this
+		attribute will read 0 if the switch is in the "radios
+		disabled" position, and 1 if the switch is in the
+		"radios enabled" position.
+
+		This attribute has poll()/select() support.
+
+	hotkey_tablet_mode:
+		If the ThinkPad has tablet capabilities, this attribute
+		will read 0 if the ThinkPad is in normal mode, and
+		1 if the ThinkPad is in tablet mode.
+
+		This attribute has poll()/select() support.
+
+	wakeup_reason:
+		Set to 1 if the system is waking up because the user
+		requested a bay ejection.  Set to 2 if the system is
+		waking up because the user requested the system to
+		undock.  Set to zero for normal wake-ups or wake-ups
+		due to unknown reasons.
+
+		This attribute has poll()/select() support.
+
+	wakeup_hotunplug_complete:
+		Set to 1 if the system was woken up because of an
+		undock or bay ejection request, and that request
+		was successfully completed.  At this point, it might
+		be useful to send the system back to sleep, at the
+		user's choice.  Refer to HKEY events 0x4003 and
+		0x3003, below.
+
+		This attribute has poll()/select() support.
+
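+As a rough illustration of the ``wakeup_reason`` values listed above, the
+shell sketch below turns a value into a description.  The function name is
+hypothetical, and the sample value is hard-coded instead of being read from
+the attribute of the thinkpad-acpi device:
+
```shell
# Hypothetical helper: describe a wakeup_reason value as documented above.
wakeup_reason_text() {
    case "$1" in
        0) echo "normal or unknown wake-up" ;;
        1) echo "bay ejection requested" ;;
        2) echo "undock requested" ;;
        *) echo "unrecognized value: $1"; return 1 ;;
    esac
}

# Sample value only; a real script would read the wakeup_reason attribute.
wakeup_reason_text 2
```
+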
+input layer notes
+^^^^^^^^^^^^^^^^^
+
+A Hot key is mapped to a single input layer EV_KEY event, possibly
+followed by an EV_MSC MSC_SCAN event that shall contain that key's scan
+code.  An EV_SYN event will always be generated to mark the end of the
+event block.
+
+Do not use the EV_MSC MSC_SCAN events to process keys.  They are to be
+used as a helper to remap keys, only.  They are particularly useful when
+remapping KEY_UNKNOWN keys.
+
+The events are available in an input device, with the following id:
+
+	==============  ==============================
+	Bus		BUS_HOST
+	vendor		0x1014 (PCI_VENDOR_ID_IBM)  or
+			0x17aa (PCI_VENDOR_ID_LENOVO)
+	product		0x5054 ("TP")
+	version		0x4101
+	==============  ==============================
+
+The version will have its LSB incremented if the keymap changes in a
+backwards-compatible way.  The MSB shall always be 0x41 for this input
+device.  If the MSB is not 0x41, do not use the device as described in
+this section, as it is either something else (e.g. another input device
+exported by a thinkpad driver, such as HDAPS) or its functionality has
+been changed in a non-backwards compatible way.
+
+Adding other event types for other functionalities shall be considered a
+backwards-compatible change for this input device.
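+
+The version rule above can be sketched as a small shell check.  This is only
+an illustration: the function name is hypothetical, and it assumes the
+version is passed as a hexadecimal number with a ``0x`` prefix:
+
```shell
# Hypothetical helper: succeed only when the most significant byte of the
# input device version is 0x41, as required for the hot key input device.
tpacpi_version_ok() {
    version=$(( $1 ))
    [ $(( (version >> 8) & 0xff )) -eq $(( 0x41 )) ]
}

tpacpi_version_ok 0x4101 && echo "0x4101: hot key device, usable"
tpacpi_version_ok 0x4205 || echo "0x4205: do not use as described here"
```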
+
+Thinkpad-acpi Hot Key event map (version 0x4101):
+
+=======	=======	==============	==============================================
+ACPI	Scan
+event	code	Key		Notes
+=======	=======	==============	==============================================
+0x1001	0x00	FN+F1		-
+
+0x1002	0x01	FN+F2		IBM: battery (rare)
+				Lenovo: Screen lock
+
+0x1003	0x02	FN+F3		Many IBM models always report
+				this hot key, even with hot keys
+				disabled or with Fn+F3 masked
+				off
+				IBM: screen lock, often turns
+				off the ThinkLight as side-effect
+				Lenovo: battery
+
+0x1004	0x03	FN+F4		Sleep button (ACPI sleep button
+				semantics, i.e. sleep-to-RAM).
+				It always generates some kind
+				of event, either the hot key
+				event or an ACPI sleep button
+				event. The firmware may
+				refuse to generate further FN+F4
+				key presses until a S3 or S4 ACPI
+				sleep cycle is performed or some
+				time passes.
+
+0x1005	0x04	FN+F5		Radio.  Enables/disables
+				the internal Bluetooth hardware
+				and W-WAN card if left in control
+				of the firmware.  Does not affect
+				the WLAN card.
+				Should be used to turn on/off all
+				radios (Bluetooth+W-WAN+WLAN),
+				really.
+
+0x1006	0x05	FN+F6		-
+
+0x1007	0x06	FN+F7		Video output cycle.
+				Do you feel lucky today?
+
+0x1008	0x07	FN+F8		IBM: toggle screen expand
+				Lenovo: configure UltraNav,
+				or toggle screen expand
+
+0x1009	0x08	FN+F9		-
+
+...	...	...		...
+
+0x100B	0x0A	FN+F11		-
+
+0x100C	0x0B	FN+F12		Sleep to disk.  You are always
+				supposed to handle it yourself,
+				either through the ACPI event,
+				or through a hotkey event.
+				The firmware may refuse to
+				generate further FN+F12 key
+				press events until a S3 or S4
+				ACPI sleep cycle is performed,
+				or some time passes.
+
+0x100D	0x0C	FN+BACKSPACE	-
+0x100E	0x0D	FN+INSERT	-
+0x100F	0x0E	FN+DELETE	-
+
+0x1010	0x0F	FN+HOME		Brightness up.  This key is
+				always handled by the firmware
+				in IBM ThinkPads, even when
+				unmasked.  Just leave it alone.
+				For Lenovo ThinkPads with a new
+				BIOS, it has to be handled either
+				by the ACPI OSI, or by userspace.
+				The driver does the right thing,
+				never mess with this.
+0x1011	0x10	FN+END		Brightness down.  See brightness
+				up for details.
+
+0x1012	0x11	FN+PGUP		ThinkLight toggle.  This key is
+				always handled by the firmware,
+				even when unmasked.
+
+0x1013	0x12	FN+PGDOWN	-
+
+0x1014	0x13	FN+SPACE	Zoom key
+
+0x1015	0x14	VOLUME UP	Internal mixer volume up. This
+				key is always handled by the
+				firmware, even when unmasked.
+				NOTE: Lenovo seems to be changing
+				this.
+0x1016	0x15	VOLUME DOWN	Internal mixer volume down. This
+				key is always handled by the
+				firmware, even when unmasked.
+				NOTE: Lenovo seems to be changing
+				this.
+0x1017	0x16	MUTE		Mute internal mixer. This
+				key is always handled by the
+				firmware, even when unmasked.
+
+0x1018	0x17	THINKPAD	ThinkPad/Access IBM/Lenovo key
+
+0x1019	0x18	unknown
+
+...	...	...
+
+0x1020	0x1F	unknown
+=======	=======	==============	==============================================
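+
+In the table above, each scan code is the ACPI event number minus 0x1001.
+A minimal shell sketch of that relationship follows; the function name is
+hypothetical and the range check simply mirrors the rows of the table:
+
```shell
# Hypothetical helper: print the scan code for a hot key HKEY event number,
# following the table above (0x1001 -> 0x00 ... 0x1020 -> 0x1F).
hkey_scancode() {
    event=$(( $1 ))
    if [ "$event" -lt $(( 0x1001 )) ] || [ "$event" -gt $(( 0x1020 )) ]; then
        echo "not a hot key event: $1" >&2
        return 1
    fi
    printf '0x%02X\n' $(( event - 0x1001 ))
}

hkey_scancode 0x1004    # FN+F4 row of the table
hkey_scancode 0x1017    # MUTE row of the table
```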
+
+The ThinkPad firmware does not allow one to differentiate when most hot
+keys are pressed or released (either that, or we don't know how to, yet).
+For these keys, the driver generates a set of events for a key press and
+immediately issues the same set of events for a key release.  It is
+unknown by the driver if the ThinkPad firmware triggered these events on
+hot key press or release, but the firmware will do it for either one, not
+both.
+
+If a key is mapped to KEY_RESERVED, it generates no input events at all.
+If a key is mapped to KEY_UNKNOWN, it generates an input event that
+includes a scan code.  If a key is mapped to anything else, it will
+generate input device EV_KEY events.
+
+In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
+events for switches:
+
+==============	==============================================
+SW_RFKILL_ALL	T60 and later hardware rfkill rocker switch
+SW_TABLET_MODE	Tablet ThinkPads HKEY events 0x5009 and 0x500A
+==============	==============================================
+
+Non hotkey ACPI HKEY event map
+------------------------------
+
+Events that are never propagated by the driver:
+
+======		==================================================
+0x2304		System is waking up from suspend to undock
+0x2305		System is waking up from suspend to eject bay
+0x2404		System is waking up from hibernation to undock
+0x2405		System is waking up from hibernation to eject bay
+0x5001		Lid closed
+0x5002		Lid opened
+0x5009		Tablet swivel: switched to tablet mode
+0x500A		Tablet swivel: switched to normal mode
+0x5010		Brightness level changed/control event
+0x6000		KEYBOARD: Numlock key pressed
+0x6005		KEYBOARD: Fn key pressed (TO BE VERIFIED)
+0x7000		Radio Switch may have changed state
+======		==================================================
+
+
+Events that are propagated by the driver to userspace:
+
+======		=====================================================
+0x2313		ALARM: System is waking up from suspend because
+		the battery is nearly empty
+0x2413		ALARM: System is waking up from hibernation because
+		the battery is nearly empty
+0x3003		Bay ejection (see 0x2x05) complete, can sleep again
+0x3006		Bay hotplug request (hint to power up SATA link when
+		the optical drive tray is ejected)
+0x4003		Undocked (see 0x2x04), can sleep again
+0x4010		Docked into hotplug port replicator (non-ACPI dock)
+0x4011		Undocked from hotplug port replicator (non-ACPI dock)
+0x500B		Tablet pen inserted into its storage bay
+0x500C		Tablet pen removed from its storage bay
+0x6011		ALARM: battery is too hot
+0x6012		ALARM: battery is extremely hot
+0x6021		ALARM: a sensor is too hot
+0x6022		ALARM: a sensor is extremely hot
+0x6030		System thermal table changed
+0x6032		Thermal Control command set completion  (DYTC, Windows)
+0x6040		Nvidia Optimus/AC adapter related (TO BE VERIFIED)
+0x60C0		X1 Yoga 2016, Tablet mode status changed
+0x60F0		Thermal Transformation changed (GMTS, Windows)
+======		=====================================================
+
+Battery nearly empty alarms are a last resort attempt to get the
+operating system to hibernate or shut down cleanly (0x2313), or shut down
+cleanly (0x2413) before power is lost.  They must be acted upon, as the
+wake up caused by the firmware will have negated most safety nets...
+
+When any of the "too hot" alarms happen, according to Lenovo the user
+should suspend or hibernate the laptop (and in the case of battery
+alarms, unplug the AC adapter) to let it cool down.  These alarms
+signal that something is wrong; they should never happen under normal
+operating conditions.
+
+The "extremely hot" alarms are emergencies.  According to Lenovo, the
+operating system is to force either an immediate suspend or hibernate
+cycle, or a system shutdown.  Obviously, something is very wrong if this
+happens.
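+
+The alarm handling described above could be summarized in a script along
+these lines.  It is a sketch only: the function name is hypothetical, and it
+assumes the HKEY event number has already been extracted as a lowercase
+hexadecimal string with a ``0x`` prefix:
+
```shell
# Hypothetical helper: print the reaction suggested by the text above for
# the battery and thermal alarm HKEY events.
hkey_alarm_action() {
    case "$1" in
        0x2313) echo "hibernate or shut down cleanly" ;;
        0x2413) echo "shut down cleanly" ;;
        0x6011) echo "unplug AC adapter, then suspend or hibernate" ;;
        0x6021) echo "suspend or hibernate" ;;
        0x6012|0x6022) echo "emergency: suspend, hibernate or shut down now" ;;
        *) echo "no alarm action described for $1" ;;
    esac
}

hkey_alarm_action 0x6022
```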
+
+
+Brightness hotkey notes
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Don't mess with the brightness hotkeys in a Thinkpad.  If you want
+notifications for OSD, use the sysfs backlight class event support.
+
+The driver will issue KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN events
+automatically for the cases where userspace has to do something to
+implement brightness changes.  When you override these events, you will
+either fail to handle properly the ThinkPads that require explicit
+action to change backlight brightness, or the ThinkPads that require
+that no action be taken to work properly.
+
+
+Bluetooth
+---------
+
+procfs: /proc/acpi/ibm/bluetooth
+
+sysfs device attribute: bluetooth_enable (deprecated)
+
+sysfs rfkill class: switch "tpacpi_bluetooth_sw"
+
+This feature shows the presence and current state of a ThinkPad
+Bluetooth device in the internal ThinkPad CDC slot.
+
+If the ThinkPad supports it, the Bluetooth state is stored in NVRAM,
+so it is kept across reboots and power-off.
+
+Procfs notes
+^^^^^^^^^^^^
+
+If Bluetooth is installed, the following commands can be used::
+
+	echo enable > /proc/acpi/ibm/bluetooth
+	echo disable > /proc/acpi/ibm/bluetooth
+
+Sysfs notes
+^^^^^^^^^^^
+
+	If the Bluetooth CDC card is installed, it can be enabled /
+	disabled through the "bluetooth_enable" thinkpad-acpi device
+	attribute, and its current status can also be queried.
+
+	enable:
+
+		- 0: disables Bluetooth / Bluetooth is disabled
+		- 1: enables Bluetooth / Bluetooth is enabled.
+
+	Note: this interface has been superseded by the generic rfkill
+	class.  It has been deprecated, and it will be removed in year
+	2010.
+
+	rfkill controller switch "tpacpi_bluetooth_sw": refer to
+	Documentation/rfkill.txt for details.
+
+
+Video output control -- /proc/acpi/ibm/video
+--------------------------------------------
+
+This feature allows control over the devices used for video output -
+LCD, CRT or DVI (if available). The following commands are available::
+
+	echo lcd_enable > /proc/acpi/ibm/video
+	echo lcd_disable > /proc/acpi/ibm/video
+	echo crt_enable > /proc/acpi/ibm/video
+	echo crt_disable > /proc/acpi/ibm/video
+	echo dvi_enable > /proc/acpi/ibm/video
+	echo dvi_disable > /proc/acpi/ibm/video
+	echo auto_enable > /proc/acpi/ibm/video
+	echo auto_disable > /proc/acpi/ibm/video
+	echo expand_toggle > /proc/acpi/ibm/video
+	echo video_switch > /proc/acpi/ibm/video
+
+NOTE:
+  Access to this feature is restricted to processes owning the
+  CAP_SYS_ADMIN capability for safety reasons, as it can interact badly
+  enough with some versions of X.org to crash it.
+
+Each video output device can be enabled or disabled individually.
+Reading /proc/acpi/ibm/video shows the status of each device.
+
+Automatic video switching can be enabled or disabled.  When automatic
+video switching is enabled, certain events (e.g. opening the lid,
+docking or undocking) cause the video output device to change
+automatically. While this can be useful, it also causes flickering
+and, on the X40, video corruption. By disabling automatic switching,
+the flickering or video corruption can be avoided.
+
+The video_switch command cycles through the available video outputs
+(it simulates the behavior of Fn-F7).
+
+Video expansion can be toggled through this feature. This controls
+whether the display is expanded to fill the entire LCD screen when a
+mode with less than full resolution is used. Note that the current
+video expansion status cannot be determined through this feature.
+
+Note that on many models (particularly those using Radeon graphics
+chips) the X driver configures the video card in a way which prevents
+Fn-F7 from working. This also disables the video output switching
+features of this driver, as it uses the same ACPI methods as
+Fn-F7. Video switching on the console should still work.
+
+UPDATE: refer to https://bugs.freedesktop.org/show_bug.cgi?id=2000
+
+
+ThinkLight control
+------------------
+
+procfs: /proc/acpi/ibm/light
+
+sysfs attributes: as per LED class, for the "tpacpi::thinklight" LED
+
+procfs notes
+^^^^^^^^^^^^
+
+The ThinkLight status can be read and set through the procfs interface.  A
+few models which do not make the status available will show the ThinkLight
+status as "unknown". The available commands are::
+
+	echo on  > /proc/acpi/ibm/light
+	echo off > /proc/acpi/ibm/light
+
+sysfs notes
+^^^^^^^^^^^
+
+The ThinkLight sysfs interface is documented by the LED class
+documentation, in Documentation/leds/leds-class.rst.  The ThinkLight LED name
+is "tpacpi::thinklight".
+
+Due to limitations in the sysfs LED class, if the status of the ThinkLight
+cannot be read or if it is unknown, thinkpad-acpi will report it as "off".
+It is impossible to know if the status returned through sysfs is valid.
+
+
+CMOS/UCMS control
+-----------------
+
+procfs: /proc/acpi/ibm/cmos
+
+sysfs device attribute: cmos_command
+
+This feature is mostly used internally by the ACPI firmware to keep the legacy
+CMOS NVRAM bits in sync with the current machine state, and to record this
+state so that the ThinkPad will retain such settings across reboots.
+
+Some of these commands actually perform actions in some ThinkPad models, but
+this is expected to disappear more and more in newer models.  As an example, in
+a T43 and in a X40, commands 12 and 13 still control the ThinkLight state for
+real, but commands 0 to 2 don't control the mixer anymore (they have been
+phased out) and just update the NVRAM.
+
+The range of valid cmos command numbers is 0 to 21, but not all have an
+effect and the behavior varies from model to model.  Here is the behavior
+on the X40 (tpb is the ThinkPad Buttons utility):
+
+	- 0 - Related to "Volume down" key press
+	- 1 - Related to "Volume up" key press
+	- 2 - Related to "Mute on" key press
+	- 3 - Related to "Access IBM" key press
+	- 4 - Related to "LCD brightness up" key press
+	- 5 - Related to "LCD brightness down" key press
+	- 11 - Related to "toggle screen expansion" key press/function
+	- 12 - Related to "ThinkLight on"
+	- 13 - Related to "ThinkLight off"
+	- 14 - Related to "ThinkLight" key press (toggle ThinkLight)
+
+The cmos command interface is prone to firmware split-brain problems, as
+in newer ThinkPads it is just a compatibility layer.  Do not use it; it is
+exported only as a debugging tool.
+
+
+LED control
+-----------
+
+procfs: /proc/acpi/ibm/led
+
+sysfs attributes: as per LED class, see below for names
+
+Some of the LED indicators can be controlled through this feature.  On
+some older ThinkPad models, it is possible to query the status of the
+LED indicators as well.  Newer ThinkPads cannot query the real status
+of the LED indicators.
+
+Because misuse of the LEDs could induce an unaware user to perform
+dangerous actions (like undocking or ejecting a bay device while the
+buses are still active), or mask an important alarm (such as a nearly
+empty battery, or a broken battery), access to most LEDs is
+restricted.
+
+Unrestricted access to all LEDs requires that thinkpad-acpi be
+compiled with the CONFIG_THINKPAD_ACPI_UNSAFE_LEDS option enabled.
+Distributions must never enable this option.  Individual users that
+are aware of the consequences are welcome to enable it.
+
+Audio mute and microphone mute LEDs are supported, but currently not
+visible to userspace. They are used by the snd-hda-intel audio driver.
+
+procfs notes
+^^^^^^^^^^^^
+
+The available commands are::
+
+	echo '<LED number> on' >/proc/acpi/ibm/led
+	echo '<LED number> off' >/proc/acpi/ibm/led
+	echo '<LED number> blink' >/proc/acpi/ibm/led
+
+The <LED number> range is 0 to 15. The set of LEDs that can be
+controlled varies from model to model. Here is the common ThinkPad
+mapping:
+
+	- 0 - power
+	- 1 - battery (orange)
+	- 2 - battery (green)
+	- 3 - UltraBase/dock
+	- 4 - UltraBay
+	- 5 - UltraBase battery slot
+	- 6 - (unknown)
+	- 7 - standby
+	- 8 - dock status 1
+	- 9 - dock status 2
+	- 10, 11 - (unknown)
+	- 12 - thinkvantage
+	- 13, 14, 15 - (unknown)
+
+All of the above can be turned on and off and can be made to blink.
+
+sysfs notes
+^^^^^^^^^^^
+
+The ThinkPad LED sysfs interface is described in detail by the LED class
+documentation, in Documentation/leds/leds-class.rst.
+
+The LEDs are named (in LED ID order, from 0 to 12):
+"tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
+"tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
+"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
+"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
+"tpacpi::thinkvantage".
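+
+To relate the procfs LED numbers to the sysfs names listed above, a lookup
+such as the following can be used.  The function name is hypothetical; the
+``/sys/class/leds`` path is the usual LED class location and is printed
+here, not accessed:
+
```shell
# Hypothetical helper: map a procfs LED number (0 to 12) to the sysfs LED
# name given in the list above.
tpacpi_led_name() {
    case "$1" in
        0) echo "tpacpi::power" ;;
        1) echo "tpacpi:orange:batt" ;;
        2) echo "tpacpi:green:batt" ;;
        3) echo "tpacpi::dock_active" ;;
        4) echo "tpacpi::bay_active" ;;
        5) echo "tpacpi::dock_batt" ;;
        6) echo "tpacpi::unknown_led" ;;
        7) echo "tpacpi::standby" ;;
        8) echo "tpacpi::dock_status1" ;;
        9) echo "tpacpi::dock_status2" ;;
        10) echo "tpacpi::unknown_led2" ;;
        11) echo "tpacpi::unknown_led3" ;;
        12) echo "tpacpi::thinkvantage" ;;
        *) return 1 ;;
    esac
}

led=$(tpacpi_led_name 7) && echo "/sys/class/leds/$led/brightness"
```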
+
+Due to limitations in the sysfs LED class, if the status of the LED
+indicators cannot be read due to an error, thinkpad-acpi will report it as
+a brightness of zero (same as LED off).
+
+If the thinkpad firmware doesn't support reading the current status,
+trying to read the current LED brightness will just return whatever
+brightness was last written to that attribute.
+
+These LEDs can blink using hardware acceleration.  To request that a
+ThinkPad indicator LED should blink in hardware accelerated mode, use the
+"timer" trigger, and leave the delay_on and delay_off parameters set to
+zero (to request hardware acceleration autodetection).
+
+LEDs that are known not to exist in a given ThinkPad model are not
+made available through the sysfs interface.  If you have a dock and you
+notice there are LEDs listed for your ThinkPad that do not exist (and
+are not in the dock), or if you notice that there are missing LEDs,
+a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
+
+
+ACPI sounds -- /proc/acpi/ibm/beep
+----------------------------------
+
+The BEEP method is used internally by the ACPI firmware to provide
+audible alerts in various situations. This feature allows the same
+sounds to be triggered manually.
+
+The commands are non-negative integer numbers::
+
+	echo <number> >/proc/acpi/ibm/beep
+
+The valid <number> range is 0 to 17. Not all numbers trigger sounds
+and the sounds vary from model to model. Here is the behavior on the
+X40:
+
+	- 0 - stop a sound in progress (but use 17 to stop 16)
+	- 2 - two beeps, pause, third beep ("low battery")
+	- 3 - single beep
+	- 4 - high, followed by low-pitched beep ("unable")
+	- 5 - single beep
+	- 6 - very high, followed by high-pitched beep ("AC/DC")
+	- 7 - high-pitched beep
+	- 9 - three short beeps
+	- 10 - very long beep
+	- 12 - low-pitched beep
+	- 15 - three high-pitched beeps repeating constantly, stop with 0
+	- 16 - one medium-pitched beep repeating constantly, stop with 17
+	- 17 - stop 16
+
+
+Temperature sensors
+-------------------
+
+procfs: /proc/acpi/ibm/thermal
+
+sysfs device attributes: (hwmon "thinkpad") temp*_input
+
+Most ThinkPads include six or more separate temperature sensors but only
+expose the CPU temperature through the standard ACPI methods.  This
+feature shows readings from up to eight different sensors on older
+ThinkPads, and up to sixteen different sensors on newer ThinkPads.
+
+For example, on the X40, a typical output may be::
+
+	temperatures:
+		42 42 45 41 36 -128 33 -128
+
+On the T43/p, a typical output may be::
+
+	temperatures:
+		48 48 36 52 38 -128 31 -128 48 52 48 -128 -128 -128 -128 -128
+
+The mapping of thermal sensors to physical locations varies depending on
+system-board model (and thus, on ThinkPad model).
+
+http://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
+tries to track down these locations for various models.
+
+Most (newer?) models seem to follow this pattern:
+
+- 1:  CPU
+- 2:  (depends on model)
+- 3:  (depends on model)
+- 4:  GPU
+- 5:  Main battery: main sensor
+- 6:  Bay battery: main sensor
+- 7:  Main battery: secondary sensor
+- 8:  Bay battery: secondary sensor
+- 9-15: (depends on model)
+
+For the R51 (source: Thomas Gruber):
+
+- 2:  Mini-PCI
+- 3:  Internal HDD
+
+For the T43, T43/p (source: Shmidoax/Thinkwiki.org)
+http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
+
+- 2:  System board, left side (near PCMCIA slot), reported as HDAPS temp
+- 3:  PCMCIA slot
+- 9:  MCH (northbridge) to DRAM Bus
+- 10: Clock-generator, mini-pci card and ICH (southbridge), under Mini-PCI
+      card, under touchpad
+- 11: Power regulator, underside of system board, below F2 key
+
+The A31 has a very atypical layout for the thermal sensors
+(source: Milos Popovic, http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
+
+- 1:  CPU
+- 2:  Main Battery: main sensor
+- 3:  Power Converter
+- 4:  Bay Battery: main sensor
+- 5:  MCH (northbridge)
+- 6:  PCMCIA/ambient
+- 7:  Main Battery: secondary sensor
+- 8:  Bay Battery: secondary sensor
+
+
+Procfs notes
+^^^^^^^^^^^^
+
+	Readings from sensors that are not available return -128.
+	No commands can be written to this file.
+
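+A script that consumes the procfs output has to skip the -128 placeholders.
+The sketch below does that for text supplied on standard input; the function
+name is hypothetical, and the input layout is assumed to match the X40
+sample shown earlier in this section:
+
```shell
# Hypothetical helper: list the sensors that report a reading, numbering
# them from 1 in the order they appear.  Reads the procfs text from stdin.
thermal_available() {
    index=1
    for token in $(cat); do
        case "$token" in
            temperatures:) continue ;;      # label, not a reading
            -128) ;;                        # sensor not available
            *) echo "sensor $index: $token" ;;
        esac
        index=$(( index + 1 ))
    done
}

# Sample data copied from the X40 example; a real script would feed
# /proc/acpi/ibm/thermal to the function instead.
printf 'temperatures:\n\t42 42 45 41 36 -128 33 -128\n' | thermal_available
```
+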
+Sysfs notes
+^^^^^^^^^^^
+
+	Sensors that are not available return the ENXIO error.  This
+	status may change at runtime, as there are hotplug thermal
+	sensors, like those inside the batteries and docks.
+
+	thinkpad-acpi thermal sensors are reported through the hwmon
+	subsystem, and follow all of the hwmon guidelines at
+	Documentation/hwmon.
+
+EXPERIMENTAL: Embedded controller register dump
+-----------------------------------------------
+
+This feature is not included in the thinkpad driver anymore.
+Instead the EC can be accessed through /sys/kernel/debug/ec with
+a userspace tool which can be found here:
+ftp://ftp.suse.com/pub/people/trenn/sources/ec
+
+Use it to determine the register holding the fan
+speed on some models. To do that, do the following:
+
+	- make sure the battery is fully charged
+	- make sure the fan is running
+	- use above mentioned tool to read out the EC
+
+Often fan and temperature values vary between
+readings. Since temperatures don't change very fast, you can take
+several quick dumps to eliminate them.
+
+You can use a similar method to figure out the meaning of other
+embedded controller registers - e.g. make sure nothing else changes
+except the charging or discharging battery to determine which
+registers contain the current battery capacity, etc. If you experiment
+with this, do send me your results (including some complete dumps with
+a description of the conditions when they were taken.)
+
+
+LCD brightness control
+----------------------
+
+procfs: /proc/acpi/ibm/brightness
+
+sysfs backlight device "thinkpad_screen"
+
+This feature allows software control of the LCD brightness on ThinkPad
+models which don't have a hardware brightness slider.
+
+It has some limitations: the LCD backlight cannot be actually turned
+on or off by this interface, it just controls the backlight brightness
+level.
+
+On IBM (and some of the earlier Lenovo) ThinkPads, the backlight control
+has eight brightness levels, ranging from 0 to 7.  Some of the levels
+may not be distinct.  Later Lenovo models that implement the ACPI
+display backlight brightness control methods have 16 levels, ranging
+from 0 to 15.
+
+For IBM ThinkPads, there are two interfaces to the firmware for direct
+brightness control, EC and UCMS (or CMOS).  To select which one should be
+used, use the brightness_mode module parameter: brightness_mode=1 selects
+EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
+mode with NVRAM backing (so that brightness changes are remembered across
+shutdown/reboot).
+
+The driver tries to select which interface to use from a table of
+defaults for each ThinkPad model.  If it makes a wrong choice, please
+report this as a bug, so that we can fix it.
+
+Lenovo ThinkPads only support brightness_mode=2 (UCMS).
+
+When display backlight brightness controls are available through the
+standard ACPI interface, it is best to use it instead of this direct
+ThinkPad-specific interface.  The driver will disable its native
+backlight brightness control interface if it detects that the standard
+ACPI interface is available in the ThinkPad.
+
+If you want to use the thinkpad-acpi backlight brightness control
+instead of the generic ACPI video backlight brightness control for some
+reason, you should use the acpi_backlight=vendor kernel parameter.
+
+The brightness_enable module parameter can be used to control whether
+the LCD brightness control feature will be enabled when available.
+brightness_enable=0 forces it to be disabled.  brightness_enable=1
+forces it to be enabled when available, even if the standard ACPI
+interface is also available.
+
+Procfs notes
+^^^^^^^^^^^^
+
+The available commands are::
+
+	echo up   >/proc/acpi/ibm/brightness
+	echo down >/proc/acpi/ibm/brightness
+	echo 'level <level>' >/proc/acpi/ibm/brightness
+
+Sysfs notes
+^^^^^^^^^^^
+
+The interface is implemented through the backlight sysfs class, which is
+poorly documented at this time.
+
+Locate the thinkpad_screen device under /sys/class/backlight, and inside
+it there will be the following attributes:
+
+	max_brightness:
+		Reads the maximum brightness the hardware can be set to.
+		The minimum is always zero.
+
+	actual_brightness:
+		Reads what brightness the screen is set to at this instant.
+
+	brightness:
+		Writes request the driver to change brightness to the
+		given value.  Reads will tell you what brightness the
+		driver is trying to set the display to when "power" is set
+		to zero and the display has not been dimmed by a kernel
+		power management event.
+
+	power:
+		power management mode, where 0 is "display on", and 1 to 3
+		will dim the display backlight to brightness level 0
+		because thinkpad-acpi cannot really turn the backlight
+		off.  Kernel power management events can temporarily
+		increase the current power management level, i.e. they can
+		dim the display.
+
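+As an illustration of how the attributes above fit together, the following
+sketch reports the requested brightness as a percentage of
+``max_brightness``.  The function name is hypothetical, and the device
+directory is passed as an argument so that nothing is assumed about which
+backlight devices exist on a given machine:
+
```shell
# Hypothetical helper: print the requested brightness as a percentage of
# max_brightness for the backlight device directory given as $1.
backlight_percent() {
    max=$(cat "$1/max_brightness") || return 1
    cur=$(cat "$1/brightness") || return 1
    [ "$max" -gt 0 ] || return 1
    echo $(( cur * 100 / max ))
}

dir=/sys/class/backlight/thinkpad_screen
if [ -r "$dir/brightness" ]; then
    echo "brightness: $(backlight_percent "$dir")%"
fi
```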
+
+WARNING:
+
+    Whatever you do, do NOT ever call the thinkpad-acpi backlight-level
+    change interface and the ACPI-based backlight level change interface
+    (available on newer BIOSes, and driven by the Linux ACPI video driver)
+    at the same time.  The two will interact in bad ways, do funny things,
+    and maybe reduce the life of the backlight lamps by needlessly kicking
+    the backlight level up and down at every change.
+
+
+Volume control (Console Audio control)
+--------------------------------------
+
+procfs: /proc/acpi/ibm/volume
+
+ALSA: "ThinkPad Console Audio Control", default ID: "ThinkPadEC"
+
+NOTE: by default, the volume control interface operates in read-only
+mode, as it is supposed to be used for on-screen-display purposes.
+The read/write mode can be enabled through the use of the
+"volume_control=1" module parameter.
+
+NOTE: distros are urged to not enable volume_control by default, this
+should be done by the local admin only.  The ThinkPad UI is for the
+console audio control to be done through the volume keys only, and for
+the desktop environment to just provide on-screen-display feedback.
+Software volume control should be done only in the main AC97/HDA
+mixer.
+
+
+About the ThinkPad Console Audio control
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ThinkPads have a built-in amplifier and muting circuit that drives the
+console headphone and speakers.  This circuit is after the main AC97
+or HDA mixer in the audio path, and under exclusive control of the
+firmware.
+
+ThinkPads have three special hotkeys to interact with the console
+audio control: volume up, volume down and mute.
+
+It is worth noting that the normal way the mute function works (on
+ThinkPads that do not have a "mute LED") is:
+
+1. Press mute to mute.  It will *always* mute, you can press it as
+   many times as you want, and the sound will remain mute.
+
+2. Press either volume key to unmute the ThinkPad (it will _not_
+   change the volume, it will just unmute).
+
+This is a far superior design when compared to the cheap software-only
+mute-toggle solution found on normal consumer laptops:  you can be
+absolutely sure the ThinkPad will not make noise if you press the mute
+button, no matter the previous state.
+
+The IBM ThinkPads and the earlier Lenovo ThinkPads have variable-gain
+amplifiers driving the speakers and headphone output, and the firmware
+also handles volume control for the headphone and speakers on these
+ThinkPads without any help from the operating system (this volume
+control stage exists after the main AC97 or HDA mixer in the audio
+path).
+
+The newer Lenovo models only have firmware mute control, and depend on
+the main HDA mixer to do volume control (which is done by the operating
+system).  In this case, the volume keys are filtered out for unmute
+key press (there are some firmware bugs in this area) and delivered as
+normal key presses to the operating system (thinkpad-acpi is not
+involved).
+
+
+The ThinkPad-ACPI volume control
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The preferred way to interact with the Console Audio control is the
+ALSA interface.
+
+The legacy procfs interface allows one to read the current state,
+and if volume control is enabled, accepts the following commands::
+
+	echo up   >/proc/acpi/ibm/volume
+	echo down >/proc/acpi/ibm/volume
+	echo mute >/proc/acpi/ibm/volume
+	echo unmute >/proc/acpi/ibm/volume
+	echo 'level <level>' >/proc/acpi/ibm/volume
+
+The <level> number range is 0 to 14 although not all of them may be
+distinct. To unmute the volume after the mute command, use either the
+up or down command (the level command will not unmute the volume), or
+the unmute command.
+
+You can use the volume_capabilities parameter to tell the driver
+whether your thinkpad has volume control or mute-only control:
+volume_capabilities=1 for mixers with mute and volume control,
+volume_capabilities=2 for mixers with only mute control.
+
+If the driver misdetects the capabilities for your ThinkPad model,
+please report this to ibm-acpi-devel@lists.sourceforge.net, so that we
+can update the driver.
+
+There are two strategies for volume control.  To select which one
+should be used, use the volume_mode module parameter: volume_mode=1
+selects EC mode, and volume_mode=3 selects EC mode with NVRAM backing
+(so that volume/mute changes are remembered across shutdown/reboot).
+
+The driver will operate in volume_mode=3 by default. If that does not
+work well on your ThinkPad model, please report this to
+ibm-acpi-devel@lists.sourceforge.net.
+
+The driver supports the standard ALSA module parameters.  If the ALSA
+mixer is disabled, the driver will disable all volume functionality.
+
+
+Fan control and monitoring: fan speed, fan enable/disable
+---------------------------------------------------------
+
+procfs: /proc/acpi/ibm/fan
+
+sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1, pwm1_enable, fan2_input
+
+sysfs hwmon driver attributes: fan_watchdog
+
+NOTE NOTE NOTE:
+   fan control operations are disabled by default for
+   safety reasons.  To enable them, the module parameter "fan_control=1"
+   must be given to thinkpad-acpi.
+
+This feature attempts to show the current fan speed, control mode and
+other fan data that might be available.  The speed is read directly
+from the hardware registers of the embedded controller.  This is known
+to work on later R, T, X and Z series ThinkPads but may show a bogus
+value on other models.
+
+Some Lenovo ThinkPads support a secondary fan.  This fan cannot be
+controlled separately; it shares the main fan control.
+
+Fan levels
+^^^^^^^^^^
+
+Most ThinkPad fans work in "levels" at the firmware interface.  Level 0
+stops the fan.  The higher the level, the higher the fan speed, although
+adjacent levels often map to the same fan speed.  7 is the highest
+level, where the fan reaches the maximum recommended speed.
+
+Level "auto" means the EC changes the fan level according to some
+internal algorithm, usually based on readings from the thermal sensors.
+
+There is also a "full-speed" level, also known as "disengaged" level.
+In this level, the EC disables the speed-locked closed-loop fan control,
+and drives the fan as fast as it can go, which might exceed hardware
+limits, so use this level with caution.
+
+The fan usually ramps up or down slowly from one speed to another, and
+it is normal for the EC to take several seconds to react to fan
+commands.  The full-speed level may take up to two minutes to ramp up to
+maximum speed, and in some ThinkPads, the tachometer readings go stale
+while the EC is transitioning to the full-speed level.
+
+WARNING WARNING WARNING: do not leave the fan disabled unless you are
+monitoring all of the temperature sensor readings and you are ready to
+enable it if necessary to avoid overheating.
+
+An enabled fan in level "auto" may stop spinning if the EC decides the
+ThinkPad is cool enough and doesn't need the extra airflow.  This is
+normal, and the EC will spin the fan up if the various thermal readings
+rise too much.
+
+On the X40, this seems to depend on the CPU and HDD temperatures.
+Specifically, the fan is turned on when either the CPU temperature
+climbs to 56 degrees or the HDD temperature climbs to 46 degrees.  The
+fan is turned off when the CPU temperature drops to 49 degrees and the
+HDD temperature drops to 41 degrees.  These thresholds cannot
+currently be controlled.
+
+The ThinkPad's ACPI DSDT code will reprogram the fan on its own when
+certain conditions are met.  It will override any fan programming done
+through thinkpad-acpi.
+
+The thinkpad-acpi kernel driver can be programmed to revert the fan
+level to a safe setting if userspace does not issue one of the procfs
+fan commands: "enable", "disable", "level" or "watchdog", or if there
+are no writes to pwm1_enable (or to pwm1 *if and only if* pwm1_enable is
+set to 1, manual mode) within a configurable amount of time of up to
+120 seconds.  This functionality is called fan safety watchdog.
+
+Note that the watchdog timer stops after it enables the fan.  It will be
+rearmed again automatically (using the same interval) when one of the
+above-mentioned fan commands is received.  The fan watchdog is,
+therefore, not suitable to protect against fan mode changes made through
+means other than the "enable", "disable", and "level" procfs fan
+commands, or the hwmon fan control sysfs interface.
+
+Procfs notes
+^^^^^^^^^^^^
+
+The fan may be enabled or disabled with the following commands::
+
+	echo enable  >/proc/acpi/ibm/fan
+	echo disable >/proc/acpi/ibm/fan
+
+Placing a fan on level 0 is the same as disabling it.  Enabling a fan
+will try to place it in a safe level if it is too slow or disabled.
+
+The fan level can be controlled with the command::
+
+	echo 'level <level>' > /proc/acpi/ibm/fan
+
+Where <level> is an integer from 0 to 7, or one of the words "auto" or
+"full-speed" (without the quotes).  Not all ThinkPads support the "auto"
+and "full-speed" levels.  The driver accepts "disengaged" as an alias for
+"full-speed", and reports it as "disengaged" for backwards
+compatibility.
+
+On the X31 and X40 (and ONLY on those models), the fan speed can be
+controlled to a certain degree.  Once the fan is running, it can be
+forced to run faster or slower with the following command::
+
+	echo 'speed <speed>' > /proc/acpi/ibm/fan
+
+The sustainable range of fan speeds on the X40 appears to be from about
+3700 to about 7350. Values outside this range either do not have any
+effect or the fan speed eventually settles somewhere in that range.  The
+fan cannot be stopped or started with this command.  This functionality
+is incomplete, and not available through the sysfs interface.
+
+To program the safety watchdog, use the "watchdog" command::
+
+	echo 'watchdog <interval in seconds>' > /proc/acpi/ibm/fan
+
+If you want to disable the watchdog, use 0 as the interval.
+
+Sysfs notes
+^^^^^^^^^^^
+
+The sysfs interface follows the hwmon subsystem guidelines for the most
+part; the exception is the fan safety watchdog.
+
+Writes to any of the sysfs attributes may return the EINVAL error if
+that operation is not supported in a given ThinkPad or if the parameter
+is out-of-bounds, and EPERM if it is forbidden.  They may also return
+EINTR (interrupted system call), and EIO (I/O error while trying to talk
+to the firmware).
+
+Features not yet implemented by the driver return ENOSYS.
+
+hwmon device attribute pwm1_enable:
+	- 0: PWM offline (fan is set to full-speed mode)
+	- 1: Manual PWM control (use pwm1 to set fan level)
+	- 2: Hardware PWM control (EC "auto" mode)
+	- 3: reserved (Software PWM control, not implemented yet)
+
+	Modes 0 and 2 are not supported by all ThinkPads, and the
+	driver is not always able to detect this.  If it does know a
+	mode is unsupported, it will return -EINVAL.
+
+hwmon device attribute pwm1:
+	Fan level, scaled from the firmware values of 0-7 to the hwmon
+	scale of 0-255.  0 means fan stopped, 255 means highest normal
+	speed (level 7).
+
+	This attribute only commands the fan if pwm1_enable is set to 1
+	(manual PWM control).
+
+hwmon device attribute fan1_input:
+	Fan tachometer reading, in RPM.  May go stale on certain
+	ThinkPads while the EC transitions the PWM to offline mode,
+	which can take up to two minutes.  May return rubbish on older
+	ThinkPads.
+
+hwmon device attribute fan2_input:
+	Fan tachometer reading, in RPM, for the secondary fan.
+	Available only on some ThinkPads.  If the secondary fan is
+	not installed, will always read 0.
+
+hwmon driver attribute fan_watchdog:
+	Fan safety watchdog timer interval, in seconds.  Minimum is
+	1 second, maximum is 120 seconds.  0 disables the watchdog.
+
+To stop the fan: set pwm1 to zero, and pwm1_enable to 1.
+
+To start the fan in a safe mode: set pwm1_enable to 2.  If that fails
+with EINVAL, try to set pwm1_enable to 1 and pwm1 to at least 128 (255
+would be the safest choice, though).
+
+
+WAN
+---
+
+procfs: /proc/acpi/ibm/wan
+
+sysfs device attribute: wwan_enable (deprecated)
+
+sysfs rfkill class: switch "tpacpi_wwan_sw"
+
+This feature shows the presence and current state of the built-in
+Wireless WAN device.
+
+If the ThinkPad supports it, the WWAN state is stored in NVRAM,
+so it is kept across reboots and power-off.
+
+It was tested on a Lenovo ThinkPad X60. It should probably work on other
+ThinkPad models which come with this module installed.
+
+Procfs notes
+^^^^^^^^^^^^
+
+If the W-WAN card is installed, the following commands can be used::
+
+	echo enable > /proc/acpi/ibm/wan
+	echo disable > /proc/acpi/ibm/wan
+
+Sysfs notes
+^^^^^^^^^^^
+
+	If the W-WAN card is installed, it can be enabled /
+	disabled through the "wwan_enable" thinkpad-acpi device
+	attribute, and its current status can also be queried.
+
+	enable:
+		- 0: disables WWAN card / WWAN card is disabled
+		- 1: enables WWAN card / WWAN card is enabled
+
+	Note: this interface has been superseded by the generic rfkill
+	class.  It has been deprecated, and it will be removed in year
+	2010.
+
+	rfkill controller switch "tpacpi_wwan_sw": refer to
+	Documentation/rfkill.txt for details.
+
+
+EXPERIMENTAL: UWB
+-----------------
+
+This feature is considered EXPERIMENTAL because it has not been extensively
+tested and validated in various ThinkPad models yet.  The feature may not
+work as expected. USE WITH CAUTION! To use this feature, you need to supply
+the experimental=1 parameter when loading the module.
+
+sysfs rfkill class: switch "tpacpi_uwb_sw"
+
+This feature exports an rfkill controller for the UWB device, if one is
+present and enabled in the BIOS.
+
+Sysfs notes
+^^^^^^^^^^^
+
+	rfkill controller switch "tpacpi_uwb_sw": refer to
+	Documentation/rfkill.txt for details.
+
+Adaptive keyboard
+-----------------
+
+sysfs device attribute: adaptive_kbd_mode
+
+This sysfs attribute controls the keyboard "face" that will be shown on the
+adaptive keyboard of the Lenovo X1 Carbon 2nd gen (2014). The value can be
+read and set.
+
+- 1 = Home mode
+- 2 = Web-browser mode
+- 3 = Web-conference mode
+- 4 = Function mode
+- 5 = Layflat mode
+
+For more details about which buttons will appear depending on the mode, please
+review the laptop's user guide:
+http://www.lenovo.com/shop/americas/content/user_guides/x1carbon_2_ug_en.pdf
+
+Multiple Commands, Module Parameters
+------------------------------------
+
+Multiple commands can be written to the proc files in one shot by
+separating them with commas, for example::
+
+	echo enable,0xffff > /proc/acpi/ibm/hotkey
+	echo lcd_disable,crt_enable > /proc/acpi/ibm/video
+
+Commands can also be specified when loading the thinkpad-acpi module,
+for example::
+
+	modprobe thinkpad_acpi hotkey=enable,0xffff video=auto_disable
+
+
+Enabling debugging output
+-------------------------
+
+The module takes a debug parameter which can be used to selectively
+enable various classes of debugging output, for example::
+
+	modprobe thinkpad_acpi debug=0xffff
+
+will enable all debugging output classes.  It takes a bitmask, so
+to enable more than one output class, just add their values.
+
+	=============		======================================
+	Debug bitmask		Description
+	=============		======================================
+	0x8000			Disclose PID of userspace programs
+				accessing some functions of the driver
+	0x0001			Initialization and probing
+	0x0002			Removal
+	0x0004			RF Transmitter control (RFKILL)
+				(bluetooth, WWAN, UWB...)
+	0x0008			HKEY event interface, hotkeys
+	0x0010			Fan control
+	0x0020			Backlight brightness
+	0x0040			Audio mixer/volume control
+	=============		======================================
+
+There is also a kernel build option to enable more debugging
+information, which may be necessary to debug driver problems.
+
+The level of debugging information output by the driver can be changed
+at runtime through sysfs, using the driver attribute debug_level.  The
+attribute takes the same bitmask as the debug module parameter above.
+
+
+Force loading of module
+-----------------------
+
+If thinkpad-acpi refuses to detect your ThinkPad, you can try to specify
+the module parameter force_load=1.  Regardless of whether this works or
+not, please contact ibm-acpi-devel@lists.sourceforge.net with a report.
+
+
+Sysfs interface changelog
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+=========	===============================================================
+0x000100:	Initial sysfs support, as a single platform driver and
+		device.
+0x000200:	Hot key support for 32 hot keys, and radio slider switch
+		support.
+0x010000:	Hot keys are now handled by default over the input
+		layer, the radio switch generates input event EV_RADIO,
+		and the driver enables hot key handling by default in
+		the firmware.
+
+0x020000:	ABI fix: added a separate hwmon platform device and
+		driver, which must be located by name (thinkpad)
+		and the hwmon class for libsensors4 (lm-sensors 3)
+		compatibility.  Moved all hwmon attributes to this
+		new platform device.
+
+0x020100:	Marker for thinkpad-acpi with hot key NVRAM polling
+		support.  If you must, use it to know you should not
+		start a userspace NVRAM poller (this allows detecting
+		when NVRAM is compiled out by the user because it is
+		unneeded/undesired in the first place).
+0x020101:	Marker for thinkpad-acpi with hot key NVRAM polling
+		and proper hotkey_mask semantics (version 8 of the
+		NVRAM polling patch).  Some development snapshots of
+		0.18 had an earlier version that did strange things
+		to hotkey_mask.
+
+0x020200:	Add poll()/select() support to the following attributes:
+		hotkey_radio_sw, wakeup_hotunplug_complete, wakeup_reason
+
+0x020300:	hotkey enable/disable support removed, attributes
+		hotkey_bios_enabled and hotkey_enable deprecated and
+		marked for removal.
+
+0x020400:	Marker for 16 LEDs support.  Also, LEDs that are known
+		to not exist in a given model are not registered with
+		the LED sysfs class anymore.
+
+0x020500:	Updated hotkey driver, hotkey_mask is always available
+		and it is always able to disable hot keys.  Very old
+		ThinkPads are properly supported.  hotkey_bios_mask
+		is deprecated and marked for removal.
+
+0x020600:	Marker for backlight change event support.
+
+0x020700:	Support for mute-only mixers.
+		Volume control in read-only mode by default.
+		Marker for ALSA mixer support.
+
+0x030000:	Thermal and fan sysfs attributes were moved to the hwmon
+		device instead of being attached to the backing platform
+		device.
+=========	===============================================================
diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt
deleted file mode 100644
index 75ef063622d2..000000000000
--- a/Documentation/laptops/thinkpad-acpi.txt
+++ /dev/null
@@ -1,1487 +0,0 @@
-		     ThinkPad ACPI Extras Driver
-
-                            Version 0.25
-                        October 16th,  2013
-
-               Borislav Deianov <borislav@users.sf.net>
-             Henrique de Moraes Holschuh <hmh@hmh.eng.br>
-                      http://ibm-acpi.sf.net/
-
-
-This is a Linux driver for the IBM and Lenovo ThinkPad laptops. It
-supports various features of these laptops which are accessible
-through the ACPI and ACPI EC framework, but not otherwise fully
-supported by the generic Linux ACPI drivers.
-
-This driver used to be named ibm-acpi until kernel 2.6.21 and release
-0.13-20070314.  It used to be in the drivers/acpi tree, but it was
-moved to the drivers/misc tree and renamed to thinkpad-acpi for kernel
-2.6.22, and release 0.14.  It was moved to drivers/platform/x86 for
-kernel 2.6.29 and release 0.22.
-
-The driver is named "thinkpad-acpi".  In some places, like module
-names and log messages, "thinkpad_acpi" is used because of userspace
-issues.
-
-"tpacpi" is used as a shorthand where "thinkpad-acpi" would be too
-long due to length limitations on some Linux kernel versions.
-
-Status
-------
-
-The features currently supported are the following (see below for
-detailed description):
-
-	- Fn key combinations
-	- Bluetooth enable and disable
-	- video output switching, expansion control
-	- ThinkLight on and off
-	- CMOS/UCMS control
-	- LED control
-	- ACPI sounds
-	- temperature sensors
-	- Experimental: embedded controller register dump
-	- LCD brightness control
-	- Volume control
-	- Fan control and monitoring: fan speed, fan enable/disable
-	- WAN enable and disable
-	- UWB enable and disable
-
-A compatibility table by model and feature is maintained on the web
-site, http://ibm-acpi.sf.net/. I appreciate any success or failure
-reports, especially if they add to or correct the compatibility table.
-Please include the following information in your report:
-
-	- ThinkPad model name
-	- a copy of your ACPI tables, using the "acpidump" utility
-	- a copy of the output of dmidecode, with serial numbers
-	  and UUIDs masked off
-	- which driver features work and which don't
-	- the observed behavior of non-working features
-
-Any other comments or patches are also more than welcome.
-
-
-Installation
-------------
-
-If you are compiling this driver as included in the Linux kernel
-sources, look for the CONFIG_THINKPAD_ACPI Kconfig option.
-It is located on the menu path: "Device Drivers" -> "X86 Platform
-Specific Device Drivers" -> "ThinkPad ACPI Laptop Extras".
-
-
-Features
---------
-
-The driver exports two different interfaces to userspace, which can be
-used to access the features it provides.  One is a legacy procfs-based
-interface, which will be removed at some time in the future.  The other
-is a new sysfs-based interface which is not complete yet.
-
-The procfs interface creates the /proc/acpi/ibm directory.  There is a
-file under that directory for each feature it supports.  The procfs
-interface is mostly frozen, and will change very little if at all: it
-will not be extended to add any new functionality in the driver, instead
-all new functionality will be implemented on the sysfs interface.
-
-The sysfs interface tries to blend in the generic Linux sysfs subsystems
-and classes as much as possible.  Since some of these subsystems are not
-yet ready or stabilized, it is expected that this interface will change,
-and any and all userspace programs must deal with it.
-
-
-Notes about the sysfs interface:
-
-Unlike what was done with the procfs interface, correctness when talking
-to the sysfs interfaces will be enforced, as will correctness in the
-thinkpad-acpi's implementation of sysfs interfaces.
-
-Also, any bugs in the thinkpad-acpi sysfs driver code or in the
-thinkpad-acpi's implementation of the sysfs interfaces will be fixed for
-maximum correctness, even if that means changing an interface in
-non-compatible ways.  As these interfaces mature both in the kernel and
-in thinkpad-acpi, such changes should become quite rare.
-
-Applications interfacing to the thinkpad-acpi sysfs interfaces must
-follow all sysfs guidelines and correctly process all errors (the sysfs
-interface makes extensive use of errors).  File descriptors and open /
-close operations to the sysfs inodes must also be properly implemented.
-
-The version of thinkpad-acpi's sysfs interface is exported by the driver
-as a driver attribute (see below).
-
-Sysfs driver attributes are on the driver's sysfs attribute space,
-for 2.6.23+ this is /sys/bus/platform/drivers/thinkpad_acpi/ and
-/sys/bus/platform/drivers/thinkpad_hwmon/
-
-Sysfs device attributes are on the thinkpad_acpi device sysfs attribute
-space, for 2.6.23+ this is /sys/devices/platform/thinkpad_acpi/.
-
-Sysfs device attributes for the sensors and fan are on the
-thinkpad_hwmon device's sysfs attribute space, but you should locate it
-looking for a hwmon device with the name attribute of "thinkpad", or
-better yet, through libsensors. For 4.14+ sysfs attributes were moved to the
-hwmon device (/sys/bus/platform/devices/thinkpad_hwmon/hwmon/hwmon? or
-/sys/class/hwmon/hwmon?).
-
-Driver version
---------------
-
-procfs: /proc/acpi/ibm/driver
-sysfs driver attribute: version
-
-The driver name and version. No commands can be written to this file.
-
-
-Sysfs interface version
------------------------
-
-sysfs driver attribute: interface_version
-
-Version of the thinkpad-acpi sysfs interface, as an unsigned long
-(output in hex format: 0xAAAABBCC), where:
-	AAAA - major revision
-	BB - minor revision
-	CC - bugfix revision
-
-The sysfs interface version changelog for the driver can be found at the
-end of this document.  Changes to the sysfs interface done by the kernel
-subsystems are not documented here, nor are they tracked by this
-attribute.
-
-Changes to the thinkpad-acpi sysfs interface are only considered
-non-experimental when they are submitted to Linux mainline, at which
-point the changes in this interface are documented and interface_version
-may be updated.  If you are using any thinkpad-acpi features not yet
-sent to mainline for merging, you do so on your own risk: these features
-may disappear, or be implemented in a different and incompatible way by
-the time they are merged in Linux mainline.
-
-Changes that are backwards-compatible by nature (e.g. the addition of
-attributes that do not change the way the other attributes work) do not
-always warrant an update of interface_version.  Therefore, one must
-expect that an attribute might not be there, and deal with it properly
-(an attribute not being there *is* a valid way to make it clear that a
-feature is not available in sysfs).
-
-
-Hot keys
---------
-
-procfs: /proc/acpi/ibm/hotkey
-sysfs device attribute: hotkey_*
-
-In a ThinkPad, the ACPI HKEY handler is responsible for communicating
-some important events and also keyboard hot key presses to the operating
-system.  Enabling the hotkey functionality of thinkpad-acpi signals the
-firmware that such a driver is present, and modifies how the ThinkPad
-firmware will behave in many situations.
-
-The driver enables the HKEY ("hot key") event reporting automatically
-when loaded, and disables it when it is removed.
-
-The driver will report HKEY events in the following format:
-
-	ibm/hotkey HKEY 00000080 0000xxxx
-
-Some of these events refer to hot key presses, but not all of them.
-
-The driver will generate events over the input layer for hot keys and
-radio switches, and over the ACPI netlink layer for other events.  The
-input layer support accepts the standard IOCTLs to remap the keycodes
-assigned to each hot key.
-
-The hot key bit mask allows some control over which hot keys generate
-events.  If a key is "masked" (bit set to 0 in the mask), the firmware
-will handle it.  If it is "unmasked", it signals the firmware that
-thinkpad-acpi would prefer to handle it, if the firmware would be so
-kind to allow it (and it often doesn't!).
-
-Not all bits in the mask can be modified.  Not all bits that can be
-modified do anything.  Not all hot keys can be individually controlled
-by the mask.  Some models do not support the mask at all.  The behaviour
-of the mask is, therefore, highly dependent on the ThinkPad model.
-
-The driver will filter out any unmasked hotkeys, so even if the firmware
-doesn't allow disabling an specific hotkey, the driver will not report
-events for unmasked hotkeys.
-
-Note that unmasking some keys prevents their default behavior.  For
-example, if Fn+F5 is unmasked, that key will no longer enable/disable
-Bluetooth by itself in firmware.
-
-Note also that not all Fn key combinations are supported through ACPI
-depending on the ThinkPad model and firmware version.  On those
-ThinkPads, it is still possible to support some extra hotkeys by
-polling the "CMOS NVRAM" at least 10 times per second.  The driver
-attempts to enables this functionality automatically when required.
-
-procfs notes:
-
-The following commands can be written to the /proc/acpi/ibm/hotkey file:
-
-	echo 0xffffffff > /proc/acpi/ibm/hotkey -- enable all hot keys
-	echo 0 > /proc/acpi/ibm/hotkey -- disable all possible hot keys
-	... any other 8-hex-digit mask ...
-	echo reset > /proc/acpi/ibm/hotkey -- restore the recommended mask
-
-The following commands have been deprecated and will cause the kernel
-to log a warning:
-
-	echo enable > /proc/acpi/ibm/hotkey -- does nothing
-	echo disable > /proc/acpi/ibm/hotkey -- returns an error
-
-The procfs interface does not support NVRAM polling control.  So as to
-maintain maximum bug-to-bug compatibility, it does not report any masks,
-nor does it allow one to manipulate the hot key mask when the firmware
-does not support masks at all, even if NVRAM polling is in use.
-
-sysfs notes:
-
-	hotkey_bios_enabled:
-		DEPRECATED, WILL BE REMOVED SOON.
-
-		Returns 0.
-
-	hotkey_bios_mask:
-		DEPRECATED, DON'T USE, WILL BE REMOVED IN THE FUTURE.
-
-		Returns the hot keys mask when thinkpad-acpi was loaded.
-		Upon module unload, the hot keys mask will be restored
-		to this value.   This is always 0x80c, because those are
-		the hotkeys that were supported by ancient firmware
-		without mask support.
-
-	hotkey_enable:
-		DEPRECATED, WILL BE REMOVED SOON.
-
-		0: returns -EPERM
-		1: does nothing
-
-	hotkey_mask:
-		bit mask to enable reporting (and depending on
-		the firmware, ACPI event generation) for each hot key
-		(see above).  Returns the current status of the hot keys
-		mask, and allows one to modify it.
-
-	hotkey_all_mask:
-		bit mask that should enable event reporting for all
-		supported hot keys, when echoed to hotkey_mask above.
-		Unless you know which events need to be handled
-		passively (because the firmware *will* handle them
-		anyway), do *not* use hotkey_all_mask.  Use
-		hotkey_recommended_mask, instead. You have been warned.
-
-	hotkey_recommended_mask:
-		bit mask that should enable event reporting for all
-		supported hot keys, except those which are always
-		handled by the firmware anyway.  Echo it to
-		hotkey_mask above, to use.  This is the default mask
-		used by the driver.
-
-	hotkey_source_mask:
-		bit mask that selects which hot keys will the driver
-		poll the NVRAM for.  This is auto-detected by the driver
-		based on the capabilities reported by the ACPI firmware,
-		but it can be overridden at runtime.
-
-		Hot keys whose bits are set in hotkey_source_mask are
-		polled for in NVRAM, and reported as hotkey events if
-		enabled in hotkey_mask.  Only a few hot keys are
-		available through CMOS NVRAM polling.
-
-		Warning: when in NVRAM mode, the volume up/down/mute
-		keys are synthesized according to changes in the mixer,
-		which uses a single volume up or volume down hotkey
-		press to unmute, as per the ThinkPad volume mixer user
-		interface.  When in ACPI event mode, volume up/down/mute
-		events are reported by the firmware and can behave
-		differently (and that behaviour changes with firmware
-		version -- not just with firmware models -- as well as
-		OSI(Linux) state).
-
-	hotkey_poll_freq:
-		frequency in Hz for hot key polling. It must be between
-		0 and 25 Hz.  Polling is only carried out when strictly
-		needed.
-
-		Setting hotkey_poll_freq to zero disables polling, and
-		will cause hot key presses that require NVRAM polling
-		to never be reported.
-
-		Setting hotkey_poll_freq too low may cause repeated
-		pressings of the same hot key to be misreported as a
-		single key press, or to not even be detected at all.
-		The recommended polling frequency is 10Hz.
-
-	hotkey_radio_sw:
-		If the ThinkPad has a hardware radio switch, this
-		attribute will read 0 if the switch is in the "radios
-		disabled" position, and 1 if the switch is in the
-		"radios enabled" position.
-
-		This attribute has poll()/select() support.
-
-	hotkey_tablet_mode:
-		If the ThinkPad has tablet capabilities, this attribute
-		will read 0 if the ThinkPad is in normal mode, and
-		1 if the ThinkPad is in tablet mode.
-
-		This attribute has poll()/select() support.
-
-	wakeup_reason:
-		Set to 1 if the system is waking up because the user
-		requested a bay ejection.  Set to 2 if the system is
-		waking up because the user requested the system to
-		undock.  Set to zero for normal wake-ups or wake-ups
-		due to unknown reasons.
-
-		This attribute has poll()/select() support.
-
-	wakeup_hotunplug_complete:
-		Set to 1 if the system was waken up because of an
-		undock or bay ejection request, and that request
-		was successfully completed.  At this point, it might
-		be useful to send the system back to sleep, at the
-		user's choice.  Refer to HKEY events 0x4003 and
-		0x3003, below.
-
-		This attribute has poll()/select() support.
-
-input layer notes:
-
-A Hot key is mapped to a single input layer EV_KEY event, possibly
-followed by an EV_MSC MSC_SCAN event that shall contain that key's scan
-code.  An EV_SYN event will always be generated to mark the end of the
-event block.
-
-Do not use the EV_MSC MSC_SCAN events to process keys.  They are to be
-used as a helper to remap keys, only.  They are particularly useful when
-remapping KEY_UNKNOWN keys.
-
-The events are available in an input device, with the following id:
-
-	Bus:		BUS_HOST
-	vendor:		0x1014 (PCI_VENDOR_ID_IBM)  or
-			0x17aa (PCI_VENDOR_ID_LENOVO)
-	product:	0x5054 ("TP")
-	version:	0x4101
-
-The version will have its LSB incremented if the keymap changes in a
-backwards-compatible way.  The MSB shall always be 0x41 for this input
-device.  If the MSB is not 0x41, do not use the device as described in
-this section, as it is either something else (e.g. another input device
-exported by a thinkpad driver, such as HDAPS) or its functionality has
-been changed in a non-backwards compatible way.
-
-Adding other event types for other functionalities shall be considered a
-backwards-compatible change for this input device.
-
-Thinkpad-acpi Hot Key event map (version 0x4101):
-
-ACPI	Scan
-event	code	Key		Notes
-
-0x1001	0x00	FN+F1		-
-
-0x1002	0x01	FN+F2		IBM: battery (rare)
-				Lenovo: Screen lock
-
-0x1003	0x02	FN+F3		Many IBM models always report
-				this hot key, even with hot keys
-				disabled or with Fn+F3 masked
-				off
-				IBM: screen lock, often turns
-				off the ThinkLight as side-effect
-				Lenovo: battery
-
-0x1004	0x03	FN+F4		Sleep button (ACPI sleep button
-				semantics, i.e. sleep-to-RAM).
-				It always generates some kind
-				of event, either the hot key
-				event or an ACPI sleep button
-				event. The firmware may
-				refuse to generate further FN+F4
-				key presses until a S3 or S4 ACPI
-				sleep cycle is performed or some
-				time passes.
-
-0x1005	0x04	FN+F5		Radio.  Enables/disables
-				the internal Bluetooth hardware
-				and W-WAN card if left in control
-				of the firmware.  Does not affect
-				the WLAN card.
-				Should be used to turn on/off all
-				radios (Bluetooth+W-WAN+WLAN),
-				really.
-
-0x1006	0x05	FN+F6		-
-
-0x1007	0x06	FN+F7		Video output cycle.
-				Do you feel lucky today?
-
-0x1008	0x07	FN+F8		IBM: toggle screen expand
-				Lenovo: configure UltraNav,
-				or toggle screen expand
-
-0x1009	0x08	FN+F9		-
-	..	..		..
-0x100B	0x0A	FN+F11		-
-
-0x100C	0x0B	FN+F12		Sleep to disk.  You are always
-				supposed to handle it yourself,
-				either through the ACPI event,
-				or through a hotkey event.
-				The firmware may refuse to
-				generate further FN+F12 key
-				press events until a S3 or S4
-				ACPI sleep cycle is performed,
-				or some time passes.
-
-0x100D	0x0C	FN+BACKSPACE	-
-0x100E	0x0D	FN+INSERT	-
-0x100F	0x0E	FN+DELETE	-
-
-0x1010	0x0F	FN+HOME		Brightness up.  This key is
-				always handled by the firmware
-				in IBM ThinkPads, even when
-				unmasked.  Just leave it alone.
-				For Lenovo ThinkPads with a new
-				BIOS, it has to be handled either
-				by the ACPI OSI, or by userspace.
-				The driver does the right thing,
-				never mess with this.
-0x1011	0x10	FN+END		Brightness down.  See brightness
-				up for details.
-
-0x1012	0x11	FN+PGUP		ThinkLight toggle.  This key is
-				always handled by the firmware,
-				even when unmasked.
-
-0x1013	0x12	FN+PGDOWN	-
-
-0x1014	0x13	FN+SPACE	Zoom key
-
-0x1015	0x14	VOLUME UP	Internal mixer volume up. This
-				key is always handled by the
-				firmware, even when unmasked.
-				NOTE: Lenovo seems to be changing
-				this.
-0x1016	0x15	VOLUME DOWN	Internal mixer volume up. This
-				key is always handled by the
-				firmware, even when unmasked.
-				NOTE: Lenovo seems to be changing
-				this.
-0x1017	0x16	MUTE		Mute internal mixer. This
-				key is always handled by the
-				firmware, even when unmasked.
-
-0x1018	0x17	THINKPAD	ThinkPad/Access IBM/Lenovo key
-
-0x1019	0x18	unknown
-..	..	..
-0x1020	0x1F	unknown
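The rows of the table above follow a fixed offset between the hot key HKEY event number and the scan code (for example, event 0x1005 is scan code 0x04, and event 0x1020 is scan code 0x1F). A minimal Python sketch of that relationship is shown here. The function and constant names are hypothetical and are not part of the driver, and the lower bound assumes the range starts at scan code 0x00.

```python
# Hypothetical helper names; only the offset itself is taken from the table.
HKEY_HOTKEY_BASE = 0x1001  # event number that corresponds to scan code 0x00
HKEY_HOTKEY_LAST = 0x1020  # last event listed in the table (scan code 0x1F)


def hkey_to_scancode(event):
    """Return the scan code for a hot key HKEY event, or None if out of range."""
    if HKEY_HOTKEY_BASE <= event <= HKEY_HOTKEY_LAST:
        return event - HKEY_HOTKEY_BASE
    return None
```

Events outside the hot key range, such as the lid and thermal events listed further down, do not map to a scan code, so the sketch returns None for them.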
-
-The ThinkPad firmware does not allow one to differentiate when most hot
-keys are pressed or released (either that, or we don't know how to, yet).
-For these keys, the driver generates a set of events for a key press and
-immediately issues the same set of events for a key release.  It is
-unknown by the driver if the ThinkPad firmware triggered these events on
-hot key press or release, but the firmware will do it for either one, not
-both.
-
-If a key is mapped to KEY_RESERVED, it generates no input events at all.
-If a key is mapped to KEY_UNKNOWN, it generates an input event that
-includes a scan code.  If a key is mapped to anything else, it will
-generate input device EV_KEY events.
-
-In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
-events for switches:
-
-SW_RFKILL_ALL	T60 and later hardware rfkill rocker switch
-SW_TABLET_MODE	Tablet ThinkPads HKEY events 0x5009 and 0x500A
-
-Non hotkey ACPI HKEY event map:
--------------------------------
-
-Events that are never propagated by the driver:
-
-0x2304		System is waking up from suspend to undock
-0x2305		System is waking up from suspend to eject bay
-0x2404		System is waking up from hibernation to undock
-0x2405		System is waking up from hibernation to eject bay
-0x5001		Lid closed
-0x5002		Lid opened
-0x5009		Tablet swivel: switched to tablet mode
-0x500A		Tablet swivel: switched to normal mode
-0x5010		Brightness level changed/control event
-0x6000		KEYBOARD: Numlock key pressed
-0x6005		KEYBOARD: Fn key pressed (TO BE VERIFIED)
-0x7000		Radio Switch may have changed state
-
-
-Events that are propagated by the driver to userspace:
-
-0x2313		ALARM: System is waking up from suspend because
-		the battery is nearly empty
-0x2413		ALARM: System is waking up from hibernation because
-		the battery is nearly empty
-0x3003		Bay ejection (see 0x2x05) complete, can sleep again
-0x3006		Bay hotplug request (hint to power up SATA link when
-		the optical drive tray is ejected)
-0x4003		Undocked (see 0x2x04), can sleep again
-0x4010		Docked into hotplug port replicator (non-ACPI dock)
-0x4011		Undocked from hotplug port replicator (non-ACPI dock)
-0x500B		Tablet pen inserted into its storage bay
-0x500C		Tablet pen removed from its storage bay
-0x6011		ALARM: battery is too hot
-0x6012		ALARM: battery is extremely hot
-0x6021		ALARM: a sensor is too hot
-0x6022		ALARM: a sensor is extremely hot
-0x6030		System thermal table changed
-0x6032		Thermal Control command set completion  (DYTC, Windows)
-0x6040		Nvidia Optimus/AC adapter related (TO BE VERIFIED)
-0x60C0		X1 Yoga 2016, Tablet mode status changed
-0x60F0		Thermal Transformation changed (GMTS, Windows)
-
-Battery nearly empty alarms are a last resort attempt to get the
-operating system to hibernate or shut down cleanly (0x2313), or shut down
-cleanly (0x2413) before power is lost.  They must be acted upon, as the
-wake up caused by the firmware will have negated most safety nets...
-
-When any of the "too hot" alarms happen, according to Lenovo the user
-should suspend or hibernate the laptop (and in the case of battery
-alarms, unplug the AC adapter) to let it cool down.  These alarms do
-signal that something is wrong; they should never happen under normal
-operating conditions.
-
-The "extremely hot" alarms are emergencies.  According to Lenovo, the
-operating system is to force either an immediate suspend or hibernate
-cycle, or a system shutdown.  Obviously, something is very wrong if this
-happens.
-
-
-Brightness hotkey notes:
-
-Don't mess with the brightness hotkeys in a ThinkPad.  If you want
-notifications for OSD, use the sysfs backlight class event support.
-
-The driver will issue KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN events
-automatically for the cases where userspace has to do something to
-implement brightness changes.  When you override these events, you will
-either fail to handle properly the ThinkPads that require explicit
-action to change backlight brightness, or the ThinkPads that require
-that no action be taken to work properly.
-
-
-Bluetooth
----------
-
-procfs: /proc/acpi/ibm/bluetooth
-sysfs device attribute: bluetooth_enable (deprecated)
-sysfs rfkill class: switch "tpacpi_bluetooth_sw"
-
-This feature shows the presence and current state of a ThinkPad
-Bluetooth device in the internal ThinkPad CDC slot.
-
-If the ThinkPad supports it, the Bluetooth state is stored in NVRAM,
-so it is kept across reboots and power-off.
-
-Procfs notes:
-
-If Bluetooth is installed, the following commands can be used:
-
-	echo enable > /proc/acpi/ibm/bluetooth
-	echo disable > /proc/acpi/ibm/bluetooth
-
-Sysfs notes:
-
-	If the Bluetooth CDC card is installed, it can be enabled /
-	disabled through the "bluetooth_enable" thinkpad-acpi device
-	attribute, and its current status can also be queried.
-
-	enable:
-		0: disables Bluetooth / Bluetooth is disabled
-		1: enables Bluetooth / Bluetooth is enabled.
-
-	Note: this interface has been superseded by the generic rfkill
-	class.  It has been deprecated, and it will be removed in year
-	2010.
-
-	rfkill controller switch "tpacpi_bluetooth_sw": refer to
-	Documentation/rfkill.txt for details.
-
-
-Video output control -- /proc/acpi/ibm/video
---------------------------------------------
-
-This feature allows control over the devices used for video output -
-LCD, CRT or DVI (if available). The following commands are available:
-
-	echo lcd_enable > /proc/acpi/ibm/video
-	echo lcd_disable > /proc/acpi/ibm/video
-	echo crt_enable > /proc/acpi/ibm/video
-	echo crt_disable > /proc/acpi/ibm/video
-	echo dvi_enable > /proc/acpi/ibm/video
-	echo dvi_disable > /proc/acpi/ibm/video
-	echo auto_enable > /proc/acpi/ibm/video
-	echo auto_disable > /proc/acpi/ibm/video
-	echo expand_toggle > /proc/acpi/ibm/video
-	echo video_switch > /proc/acpi/ibm/video
-
-NOTE: Access to this feature is restricted to processes owning the
-CAP_SYS_ADMIN capability for safety reasons, as it can interact badly
-enough with some versions of X.org to crash it.
-
-Each video output device can be enabled or disabled individually.
-Reading /proc/acpi/ibm/video shows the status of each device.
-
-Automatic video switching can be enabled or disabled.  When automatic
-video switching is enabled, certain events (e.g. opening the lid,
-docking or undocking) cause the video output device to change
-automatically. While this can be useful, it also causes flickering
-and, on the X40, video corruption. By disabling automatic switching,
-the flickering or video corruption can be avoided.
-
-The video_switch command cycles through the available video outputs
-(it simulates the behavior of Fn-F7).
-
-Video expansion can be toggled through this feature. This controls
-whether the display is expanded to fill the entire LCD screen when a
-mode with less than full resolution is used. Note that the current
-video expansion status cannot be determined through this feature.
-
-Note that on many models (particularly those using Radeon graphics
-chips) the X driver configures the video card in a way which prevents
-Fn-F7 from working. This also disables the video output switching
-features of this driver, as it uses the same ACPI methods as
-Fn-F7. Video switching on the console should still work.
-
-UPDATE: refer to https://bugs.freedesktop.org/show_bug.cgi?id=2000
-
-
-ThinkLight control
-------------------
-
-procfs: /proc/acpi/ibm/light
-sysfs attributes: as per LED class, for the "tpacpi::thinklight" LED
-
-procfs notes:
-
-The ThinkLight status can be read and set through the procfs interface.  A
-few models which do not make the status available will show the ThinkLight
-status as "unknown". The available commands are:
-
-	echo on  > /proc/acpi/ibm/light
-	echo off > /proc/acpi/ibm/light
-
-sysfs notes:
-
-The ThinkLight sysfs interface is documented by the LED class
-documentation, in Documentation/leds/leds-class.rst.  The ThinkLight LED name
-is "tpacpi::thinklight".
-
-Due to limitations in the sysfs LED class, if the status of the ThinkLight
-cannot be read or if it is unknown, thinkpad-acpi will report it as "off".
-It is impossible to know if the status returned through sysfs is valid.
-
-
-CMOS/UCMS control
------------------
-
-procfs: /proc/acpi/ibm/cmos
-sysfs device attribute: cmos_command
-
-This feature is mostly used internally by the ACPI firmware to keep the legacy
-CMOS NVRAM bits in sync with the current machine state, and to record this
-state so that the ThinkPad will retain such settings across reboots.
-
-Some of these commands actually perform actions in some ThinkPad models, but
-this is expected to disappear more and more in newer models.  As an example, in
-a T43 and in a X40, commands 12 and 13 still control the ThinkLight state for
-real, but commands 0 to 2 don't control the mixer anymore (they have been
-phased out) and just update the NVRAM.
-
-The range of valid cmos command numbers is 0 to 21, but not all have an
-effect and the behavior varies from model to model.  Here is the behavior
-on the X40 (tpb is the ThinkPad Buttons utility):
-
-	0 - Related to "Volume down" key press
-	1 - Related to "Volume up" key press
-	2 - Related to "Mute on" key press
-	3 - Related to "Access IBM" key press
-	4 - Related to "LCD brightness up" key press
-	5 - Related to "LCD brightness down" key press
-	11 - Related to "toggle screen expansion" key press/function
-	12 - Related to "ThinkLight on"
-	13 - Related to "ThinkLight off"
-	14 - Related to "ThinkLight" key press (toggle ThinkLight)
-
-The cmos command interface is prone to firmware split-brain problems, as
-in newer ThinkPads it is just a compatibility layer.  Do not use it; it is
-exported only as a debug tool.
-
-
-LED control
------------
-
-procfs: /proc/acpi/ibm/led
-sysfs attributes: as per LED class, see below for names
-
-Some of the LED indicators can be controlled through this feature.  On
-some older ThinkPad models, it is possible to query the status of the
-LED indicators as well.  Newer ThinkPads cannot query the real status
-of the LED indicators.
-
-Because misuse of the LEDs could induce an unaware user to perform
-dangerous actions (like undocking or ejecting a bay device while the
-buses are still active), or mask an important alarm (such as a nearly
-empty battery, or a broken battery), access to most LEDs is
-restricted.
-
-Unrestricted access to all LEDs requires that thinkpad-acpi be
-compiled with the CONFIG_THINKPAD_ACPI_UNSAFE_LEDS option enabled.
-Distributions must never enable this option.  Individual users that
-are aware of the consequences are welcome to enable it.
-
-Audio mute and microphone mute LEDs are supported, but currently not
-visible to userspace. They are used by the snd-hda-intel audio driver.
-
-procfs notes:
-
-The available commands are:
-
-	echo '<LED number> on' >/proc/acpi/ibm/led
-	echo '<LED number> off' >/proc/acpi/ibm/led
-	echo '<LED number> blink' >/proc/acpi/ibm/led
-
-The <LED number> range is 0 to 15. The set of LEDs that can be
-controlled varies from model to model. Here is the common ThinkPad
-mapping:
-
-	0 - power
-	1 - battery (orange)
-	2 - battery (green)
-	3 - UltraBase/dock
-	4 - UltraBay
-	5 - UltraBase battery slot
-	6 - (unknown)
-	7 - standby
-	8 - dock status 1
-	9 - dock status 2
-	10, 11 - (unknown)
-	12 - thinkvantage
-	13, 14, 15 - (unknown)
-
-All of the above can be turned on and off and can be made to blink.
-
-sysfs notes:
-
-The ThinkPad LED sysfs interface is described in detail by the LED class
-documentation, in Documentation/leds/leds-class.rst.
-
-The LEDs are named (in LED ID order, from 0 to 12):
-"tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
-"tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
-"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
-"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
-"tpacpi::thinkvantage".
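As an illustration of the naming above, the following Python sketch maps a LED ID to the "brightness" attribute of the corresponding LED class device. It assumes the usual /sys/class/leds location for LED class devices; the helper name is hypothetical. Because LEDs known not to exist on a given model are not exported, the returned path may be absent on a particular ThinkPad.

```python
# LED names in LED ID order (0 to 12), as listed above.
LED_NAMES = [
    "tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
    "tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
    "tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
    "tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
    "tpacpi::thinkvantage",
]


def led_brightness_path(led_id):
    """Return the sysfs brightness path for a LED ID (hypothetical helper)."""
    if not 0 <= led_id < len(LED_NAMES):
        raise ValueError("LED ID must be between 0 and 12")
    # Assumption: LED class devices appear under /sys/class/leds.
    return "/sys/class/leds/%s/brightness" % LED_NAMES[led_id]
```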
-
-Due to limitations in the sysfs LED class, if the status of the LED
-indicators cannot be read due to an error, thinkpad-acpi will report it as
-a brightness of zero (same as LED off).
-
-If the ThinkPad firmware doesn't support reading the current status,
-trying to read the current LED brightness will just return whatever
-brightness was last written to that attribute.
-
-These LEDs can blink using hardware acceleration.  To request that a
-ThinkPad indicator LED should blink in hardware accelerated mode, use the
-"timer" trigger, and leave the delay_on and delay_off parameters set to
-zero (to request hardware acceleration autodetection).
-
-LEDs that are known not to exist in a given ThinkPad model are not
-made available through the sysfs interface.  If you have a dock and you
-notice there are LEDs listed for your ThinkPad that do not exist (and
-are not in the dock), or if you notice that there are missing LEDs,
-a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
-
-
-ACPI sounds -- /proc/acpi/ibm/beep
-----------------------------------
-
-The BEEP method is used internally by the ACPI firmware to provide
-audible alerts in various situations. This feature allows the same
-sounds to be triggered manually.
-
-The commands are non-negative integer numbers:
-
-	echo <number> >/proc/acpi/ibm/beep
-
-The valid <number> range is 0 to 17. Not all numbers trigger sounds
-and the sounds vary from model to model. Here is the behavior on the
-X40:
-
-	0 - stop a sound in progress (but use 17 to stop 16)
-	2 - two beeps, pause, third beep ("low battery")
-	3 - single beep
-	4 - high, followed by low-pitched beep ("unable")
-	5 - single beep
-	6 - very high, followed by high-pitched beep ("AC/DC")
-	7 - high-pitched beep
-	9 - three short beeps
-	10 - very long beep
-	12 - low-pitched beep
-	15 - three high-pitched beeps repeating constantly, stop with 0
-	16 - one medium-pitched beep repeating constantly, stop with 17
-	17 - stop 16
-
-
-Temperature sensors
--------------------
-
-procfs: /proc/acpi/ibm/thermal
-sysfs device attributes: (hwmon "thinkpad") temp*_input
-
-Most ThinkPads include six or more separate temperature sensors but only
-expose the CPU temperature through the standard ACPI methods.  This
-feature shows readings from up to eight different sensors on older
-ThinkPads, and up to sixteen different sensors on newer ThinkPads.
-
-For example, on the X40, a typical output may be:
-temperatures:   42 42 45 41 36 -128 33 -128
-
-On the T43/p, a typical output may be:
-temperatures:   48 48 36 52 38 -128 31 -128 48 52 48 -128 -128 -128 -128 -128
-
-The mapping of thermal sensors to physical locations varies depending on
-system-board model (and thus, on ThinkPad model).
-
-http://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
-tries to track down these locations for various models.
-
-Most (newer?) models seem to follow this pattern:
-
-1:  CPU
-2:  (depends on model)
-3:  (depends on model)
-4:  GPU
-5:  Main battery: main sensor
-6:  Bay battery: main sensor
-7:  Main battery: secondary sensor
-8:  Bay battery: secondary sensor
-9-15: (depends on model)
-
-For the R51 (source: Thomas Gruber):
-2:  Mini-PCI
-3:  Internal HDD
-
-For the T43, T43/p (source: Shmidoax/Thinkwiki.org)
-http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
-2:  System board, left side (near PCMCIA slot), reported as HDAPS temp
-3:  PCMCIA slot
-9:  MCH (northbridge) to DRAM Bus
-10: Clock-generator, mini-pci card and ICH (southbridge), under Mini-PCI
-    card, under touchpad
-11: Power regulator, underside of system board, below F2 key
-
-The A31 has a very atypical layout for the thermal sensors
-(source: Milos Popovic, http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
-1:  CPU
-2:  Main Battery: main sensor
-3:  Power Converter
-4:  Bay Battery: main sensor
-5:  MCH (northbridge)
-6:  PCMCIA/ambient
-7:  Main Battery: secondary sensor
-8:  Bay Battery: secondary sensor
-
-
-Procfs notes:
-	Readings from sensors that are not available return -128.
-	No commands can be written to this file.
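A minimal Python sketch of how the procfs output could be consumed, assuming the "temperatures:" line format shown in the examples above. The function name is hypothetical. Readings of -128 are turned into None, since they mark sensors that are not available.

```python
def parse_thermal(text):
    """Parse the 'temperatures:' line of /proc/acpi/ibm/thermal.

    Returns one entry per sensor; unavailable sensors (-128) become None.
    """
    for line in text.splitlines():
        if line.startswith("temperatures:"):
            values = [int(v) for v in line.split(":", 1)[1].split()]
            return [None if v == -128 else v for v in values]
    return []
```

On a real system the input would come from reading /proc/acpi/ibm/thermal; the sketch takes the text as an argument so that it can be tried without ThinkPad hardware.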
-
-Sysfs notes:
-	Sensors that are not available return the ENXIO error.  This
-	status may change at runtime, as there are hotplug thermal
-	sensors, like those inside the batteries and docks.
-
-	thinkpad-acpi thermal sensors are reported through the hwmon
-	subsystem, and follow all of the hwmon guidelines at
-	Documentation/hwmon.
-
-EXPERIMENTAL: Embedded controller register dump
------------------------------------------------
-
-This feature is not included in the thinkpad driver anymore.
-Instead the EC can be accessed through /sys/kernel/debug/ec with
-a userspace tool which can be found here:
-ftp://ftp.suse.com/pub/people/trenn/sources/ec
-
-Use it to determine the register holding the fan
-speed on some models. To do that, do the following:
-	- make sure the battery is fully charged
-	- make sure the fan is running
-	- use above mentioned tool to read out the EC
-
-Often fan and temperature values vary between
-readings. Since temperatures don't change very fast, you can take
-several quick dumps to eliminate them.
-
-You can use a similar method to figure out the meaning of other
-embedded controller registers - e.g. make sure nothing else changes
-except the charging or discharging battery to determine which
-registers contain the current battery capacity, etc. If you experiment
-with this, do send me your results (including some complete dumps with
-a description of the conditions when they were taken.)
-
-
-LCD brightness control
-----------------------
-
-procfs: /proc/acpi/ibm/brightness
-sysfs backlight device "thinkpad_screen"
-
-This feature allows software control of the LCD brightness on ThinkPad
-models which don't have a hardware brightness slider.
-
-It has some limitations: the LCD backlight cannot be actually turned
-on or off by this interface, it just controls the backlight brightness
-level.
-
-On IBM (and some of the earlier Lenovo) ThinkPads, the backlight control
-has eight brightness levels, ranging from 0 to 7.  Some of the levels
-may not be distinct.  Later Lenovo models that implement the ACPI
-display backlight brightness control methods have 16 levels, ranging
-from 0 to 15.
-
-For IBM ThinkPads, there are two interfaces to the firmware for direct
-brightness control, EC and UCMS (or CMOS).  To select which one should be
-used, use the brightness_mode module parameter: brightness_mode=1 selects
-EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
-mode with NVRAM backing (so that brightness changes are remembered across
-shutdown/reboot).
-
-The driver tries to select which interface to use from a table of
-defaults for each ThinkPad model.  If it makes a wrong choice, please
-report this as a bug, so that we can fix it.
-
-Lenovo ThinkPads only support brightness_mode=2 (UCMS).
-
-When display backlight brightness controls are available through the
-standard ACPI interface, it is best to use it instead of this direct
-ThinkPad-specific interface.  The driver will disable its native
-backlight brightness control interface if it detects that the standard
-ACPI interface is available in the ThinkPad.
-
-If you want to use the thinkpad-acpi backlight brightness control
-instead of the generic ACPI video backlight brightness control for some
-reason, you should use the acpi_backlight=vendor kernel parameter.
-
-The brightness_enable module parameter can be used to control whether
-the LCD brightness control feature will be enabled when available.
-brightness_enable=0 forces it to be disabled.  brightness_enable=1
-forces it to be enabled when available, even if the standard ACPI
-interface is also available.
-
-Procfs notes:
-
-	The available commands are:
-
-	echo up   >/proc/acpi/ibm/brightness
-	echo down >/proc/acpi/ibm/brightness
-	echo 'level <level>' >/proc/acpi/ibm/brightness
-
-Sysfs notes:
-
-The interface is implemented through the backlight sysfs class, which is
-poorly documented at this time.
-
-Locate the thinkpad_screen device under /sys/class/backlight, and inside
-it there will be the following attributes:
-
-	max_brightness:
-		Reads the maximum brightness the hardware can be set to.
-		The minimum is always zero.
-
-	actual_brightness:
-		Reads what brightness the screen is set to at this instant.
-
-	brightness:
-		Writes request the driver to change brightness to the
-		given value.  Reads will tell you what brightness the
-		driver is trying to set the display to when "power" is set
-		to zero and the display has not been dimmed by a kernel
-		power management event.
-
-	power:
-		power management mode, where 0 is "display on", and 1 to 3
-		will dim the display backlight to brightness level 0
-		because thinkpad-acpi cannot really turn the backlight
-		off.  Kernel power management events can temporarily
-		increase the current power management level, i.e. they can
-		dim the display.
-
-
-WARNING:
-
-    Whatever you do, do NOT ever call thinkpad-acpi backlight-level change
-    interface and the ACPI-based backlight level change interface
-    (available on newer BIOSes, and driven by the Linux ACPI video driver)
-    at the same time.  The two will interact in bad ways, do funny things,
-    and maybe reduce the life of the backlight lamps by needlessly kicking
-    its level up and down at every change.
-
-
-Volume control (Console Audio control)
---------------------------------------
-
-procfs: /proc/acpi/ibm/volume
-ALSA: "ThinkPad Console Audio Control", default ID: "ThinkPadEC"
-
-NOTE: by default, the volume control interface operates in read-only
-mode, as it is supposed to be used for on-screen-display purposes.
-The read/write mode can be enabled through the use of the
-"volume_control=1" module parameter.
-
-NOTE: distros are urged to not enable volume_control by default; this
-should be done by the local admin only.  The ThinkPad UI is for the
-console audio control to be done through the volume keys only, and for
-the desktop environment to just provide on-screen-display feedback.
-Software volume control should be done only in the main AC97/HDA
-mixer.
-
-
-About the ThinkPad Console Audio control:
-
-ThinkPads have a built-in amplifier and muting circuit that drives the
-console headphone and speakers.  This circuit is after the main AC97
-or HDA mixer in the audio path, and under exclusive control of the
-firmware.
-
-ThinkPads have three special hotkeys to interact with the console
-audio control: volume up, volume down and mute.
-
-It is worth noting that the normal way the mute function works (on
-ThinkPads that do not have a "mute LED") is:
-
-1. Press mute to mute.  It will *always* mute, you can press it as
-   many times as you want, and the sound will remain mute.
-
-2. Press either volume key to unmute the ThinkPad (it will _not_
-   change the volume, it will just unmute).
-
-This is a far superior design when compared to the cheap software-only
-mute-toggle solution found on normal consumer laptops:  you can be
-absolutely sure the ThinkPad will not make noise if you press the mute
-button, no matter the previous state.
-
-The IBM ThinkPads, and the earlier Lenovo ThinkPads have variable-gain
-amplifiers driving the speakers and headphone output, and the firmware
-also handles volume control for the headphone and speakers on these
-ThinkPads without any help from the operating system (this volume
-control stage exists after the main AC97 or HDA mixer in the audio
-path).
-
-The newer Lenovo models only have firmware mute control, and depend on
-the main HDA mixer to do volume control (which is done by the operating
-system).  In this case, the volume keys are filtered out for unmute
-key press (there are some firmware bugs in this area) and delivered as
-normal key presses to the operating system (thinkpad-acpi is not
-involved).
-
-
-The ThinkPad-ACPI volume control:
-
-The preferred way to interact with the Console Audio control is the
-ALSA interface.
-
-The legacy procfs interface allows one to read the current state,
-and if volume control is enabled, accepts the following commands:
-
-	echo up   >/proc/acpi/ibm/volume
-	echo down >/proc/acpi/ibm/volume
-	echo mute >/proc/acpi/ibm/volume
-	echo unmute >/proc/acpi/ibm/volume
-	echo 'level <level>' >/proc/acpi/ibm/volume
-
-The <level> number range is 0 to 14 although not all of them may be
-distinct. To unmute the volume after the mute command, use either the
-up or down command (the level command will not unmute the volume), or
-the unmute command.
-
-You can use the volume_capabilities parameter to tell the driver
-whether your ThinkPad has volume control or mute-only control:
-volume_capabilities=1 for mixers with mute and volume control,
-volume_capabilities=2 for mixers with only mute control.
-
-If the driver misdetects the capabilities for your ThinkPad model,
-please report this to ibm-acpi-devel@lists.sourceforge.net, so that we
-can update the driver.
-
-There are two strategies for volume control.  To select which one
-should be used, use the volume_mode module parameter: volume_mode=1
-selects EC mode, and volume_mode=3 selects EC mode with NVRAM backing
-(so that volume/mute changes are remembered across shutdown/reboot).
-
-The driver will operate in volume_mode=3 by default. If that does not
-work well on your ThinkPad model, please report this to
-ibm-acpi-devel@lists.sourceforge.net.
-
-The driver supports the standard ALSA module parameters.  If the ALSA
-mixer is disabled, the driver will disable all volume functionality.
-
-
-Fan control and monitoring: fan speed, fan enable/disable
----------------------------------------------------------
-
-procfs: /proc/acpi/ibm/fan
-sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1,
-			  pwm1_enable, fan2_input
-sysfs hwmon driver attributes: fan_watchdog
-
-NOTE NOTE NOTE: fan control operations are disabled by default for
-safety reasons.  To enable them, the module parameter "fan_control=1"
-must be given to thinkpad-acpi.
-
-This feature attempts to show the current fan speed, control mode and
-other fan data that might be available.  The speed is read directly
-from the hardware registers of the embedded controller.  This is known
-to work on later R, T, X and Z series ThinkPads but may show a bogus
-value on other models.
-
-Some Lenovo ThinkPads support a secondary fan.  This fan cannot be
-controlled separately, it shares the main fan control.
-
-Fan levels:
-
-Most ThinkPad fans work in "levels" at the firmware interface.  Level 0
-stops the fan.  The higher the level, the higher the fan speed, although
-adjacent levels often map to the same fan speed.  7 is the highest
-level, where the fan reaches the maximum recommended speed.
-
-Level "auto" means the EC changes the fan level according to some
-internal algorithm, usually based on readings from the thermal sensors.
-
-There is also a "full-speed" level, also known as "disengaged" level.
-In this level, the EC disables the speed-locked closed-loop fan control,
-and drives the fan as fast as it can go, which might exceed hardware
-limits, so use this level with caution.
-
-The fan usually ramps up or down slowly from one speed to another, and
-it is normal for the EC to take several seconds to react to fan
-commands.  The full-speed level may take up to two minutes to ramp up to
-maximum speed, and in some ThinkPads, the tachometer readings go stale
-while the EC is transitioning to the full-speed level.
-
-WARNING WARNING WARNING: do not leave the fan disabled unless you are
-monitoring all of the temperature sensor readings and you are ready to
-enable it if necessary to avoid overheating.
-
-An enabled fan in level "auto" may stop spinning if the EC decides the
-ThinkPad is cool enough and doesn't need the extra airflow.  This is
-normal, and the EC will spin the fan up if the various thermal readings
-rise too much.
-
-On the X40, this seems to depend on the CPU and HDD temperatures.
-Specifically, the fan is turned on when either the CPU temperature
-climbs to 56 degrees or the HDD temperature climbs to 46 degrees.  The
-fan is turned off when the CPU temperature drops to 49 degrees and the
-HDD temperature drops to 41 degrees.  These thresholds cannot
-currently be controlled.
-
-The ThinkPad's ACPI DSDT code will reprogram the fan on its own when
-certain conditions are met.  It will override any fan programming done
-through thinkpad-acpi.
-
-The thinkpad-acpi kernel driver can be programmed to revert the fan
-level to a safe setting if userspace does not issue one of the procfs
-fan commands: "enable", "disable", "level" or "watchdog", or if there
-are no writes to pwm1_enable (or to pwm1 *if and only if* pwm1_enable is
-set to 1, manual mode) within a configurable amount of time of up to
-120 seconds.  This functionality is called fan safety watchdog.
-
-Note that the watchdog timer stops after it enables the fan.  It will be
-rearmed automatically (using the same interval) when one of the
-above mentioned fan commands is received.  The fan watchdog is,
-therefore, not suitable to protect against fan mode changes made through
-means other than the "enable", "disable", and "level" procfs fan
-commands, or the hwmon fan control sysfs interface.
-
-Procfs notes:
-
-The fan may be enabled or disabled with the following commands:
-
-	echo enable  >/proc/acpi/ibm/fan
-	echo disable >/proc/acpi/ibm/fan
-
-Placing a fan on level 0 is the same as disabling it.  Enabling a fan
-will try to place it in a safe level if it is too slow or disabled.
-
-The fan level can be controlled with the command:
-
-	echo 'level <level>' > /proc/acpi/ibm/fan
-
-Where <level> is an integer from 0 to 7, or one of the words "auto" or
-"full-speed" (without the quotes).  Not all ThinkPads support the "auto"
-and "full-speed" levels.  The driver accepts "disengaged" as an alias for
-"full-speed", and reports it as "disengaged" for backwards
-compatibility.
-
-On the X31 and X40 (and ONLY on those models), the fan speed can be
-controlled to a certain degree.  Once the fan is running, it can be
-forced to run faster or slower with the following command:
-
-	echo 'speed <speed>' > /proc/acpi/ibm/fan
-
-The sustainable range of fan speeds on the X40 appears to be from about
-3700 to about 7350. Values outside this range either do not have any
-effect or the fan speed eventually settles somewhere in that range.  The
-fan cannot be stopped or started with this command.  This functionality
-is incomplete, and not available through the sysfs interface.
-
-To program the safety watchdog, use the "watchdog" command.
-
-	echo 'watchdog <interval in seconds>' > /proc/acpi/ibm/fan
-
-If you want to disable the watchdog, use 0 as the interval.
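The procfs fan commands described above can be assembled and validated before they are written. The Python sketch below is only an illustration: the helper names are hypothetical, the 1 to 120 second bound is taken from the fan_watchdog description, and writing to the file requires fan_control=1 and sufficient privileges.

```python
FAN_PROCFS = "/proc/acpi/ibm/fan"


def level_command(level):
    """Build a 'level' command: an integer 0 to 7, or a named level."""
    if level in ("auto", "full-speed", "disengaged"):
        return "level %s" % level
    if isinstance(level, int) and 0 <= level <= 7:
        return "level %d" % level
    raise ValueError("invalid fan level: %r" % (level,))


def watchdog_command(seconds):
    """Build a 'watchdog' command; 0 disables the watchdog."""
    if seconds != 0 and not 1 <= seconds <= 120:
        raise ValueError("watchdog interval must be 0 or 1 to 120 seconds")
    return "watchdog %d" % seconds


def send_fan_command(command, path=FAN_PROCFS):
    """Write one command to the procfs fan file (needs fan_control=1)."""
    with open(path, "w") as f:
        f.write(command)
```

For example, send_fan_command(watchdog_command(30)) would arm the watchdog with a 30 second interval on a system where fan control is enabled.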
-
-Sysfs notes:
-
-The sysfs interface follows the hwmon subsystem guidelines for the most
-part, and the exception is the fan safety watchdog.
-
-Writes to any of the sysfs attributes may return the EINVAL error if
-that operation is not supported in a given ThinkPad or if the parameter
-is out-of-bounds, and EPERM if it is forbidden.  They may also return
-EINTR (interrupted system call), and EIO (I/O error while trying to talk
-to the firmware).
-
-Features not yet implemented by the driver return ENOSYS.
-
-hwmon device attribute pwm1_enable:
-	0: PWM offline (fan is set to full-speed mode)
-	1: Manual PWM control (use pwm1 to set fan level)
-	2: Hardware PWM control (EC "auto" mode)
-	3: reserved (Software PWM control, not implemented yet)
-
-	Modes 0 and 2 are not supported by all ThinkPads, and the
-	driver is not always able to detect this.  If it does know a
-	mode is unsupported, it will return -EINVAL.
-
-hwmon device attribute pwm1:
-	Fan level, scaled from the firmware values of 0-7 to the hwmon
-	scale of 0-255.  0 means fan stopped, 255 means highest normal
-	speed (level 7).
-
-	This attribute only commands the fan if pmw1_enable is set to 1
-	(manual PWM control).
-
-hwmon device attribute fan1_input:
-	Fan tachometer reading, in RPM.  May go stale on certain
-	ThinkPads while the EC transitions the PWM to offline mode,
-	which can take up to two minutes.  May return rubbish on older
-	ThinkPads.
-
-hwmon device attribute fan2_input:
-	Fan tachometer reading, in RPM, for the secondary fan.
-	Available only on some ThinkPads.  If the secondary fan is
-	not installed, will always read 0.
-
-hwmon driver attribute fan_watchdog:
-	Fan safety watchdog timer interval, in seconds.  Minimum is
-	1 second, maximum is 120 seconds.  0 disables the watchdog.
-
-To stop the fan: set pwm1 to zero, and pwm1_enable to 1.
-
-To start the fan in a safe mode: set pwm1_enable to 2.  If that fails
-with EINVAL, try to set pwm1_enable to 1 and pwm1 to at least 128 (255
-would be the safest choice, though).
-
-
-WAN
----
-
-procfs: /proc/acpi/ibm/wan
-sysfs device attribute: wwan_enable (deprecated)
-sysfs rfkill class: switch "tpacpi_wwan_sw"
-
-This feature shows the presence and current state of the built-in
-Wireless WAN device.
-
-If the ThinkPad supports it, the WWAN state is stored in NVRAM,
-so it is kept across reboots and power-off.
-
-It was tested on a Lenovo ThinkPad X60. It should probably work on other
-ThinkPad models which come with this module installed.
-
-Procfs notes:
-
-If the W-WAN card is installed, the following commands can be used:
-
-	echo enable > /proc/acpi/ibm/wan
-	echo disable > /proc/acpi/ibm/wan
-
-Sysfs notes:
-
-	If the W-WAN card is installed, it can be enabled /
-	disabled through the "wwan_enable" thinkpad-acpi device
-	attribute, and its current status can also be queried.
-
-	enable:
-		0: disables WWAN card / WWAN card is disabled
-		1: enables WWAN card / WWAN card is enabled.
-
-	Note: this interface has been superseded by the	generic rfkill
-	class.  It has been deprecated, and it will be removed in year
-	2010.
-
-	rfkill controller switch "tpacpi_wwan_sw": refer to
-	Documentation/rfkill.txt for details.
-
-
-EXPERIMENTAL: UWB
------------------
-
-This feature is considered EXPERIMENTAL because it has not been extensively
-tested and validated in various ThinkPad models yet.  The feature may not
-work as expected. USE WITH CAUTION! To use this feature, you need to supply
-the experimental=1 parameter when loading the module.
-
-sysfs rfkill class: switch "tpacpi_uwb_sw"
-
-This feature exports an rfkill controller for the UWB device, if one is
-present and enabled in the BIOS.
-
-Sysfs notes:
-
-	rfkill controller switch "tpacpi_uwb_sw": refer to
-	Documentation/rfkill.txt for details.
-
-Adaptive keyboard
------------------
-
-sysfs device attribute: adaptive_kbd_mode
-
-This sysfs attribute controls the keyboard "face" that will be shown on the
-Lenovo X1 Carbon 2nd gen (2014)'s adaptive keyboard. The value can be read
-and set.
-
-1 = Home mode
-2 = Web-browser mode
-3 = Web-conference mode
-4 = Function mode
-5 = Layflat mode
-
-For more details about which buttons will appear depending on the mode, please
-review the laptop's user guide:
-http://www.lenovo.com/shop/americas/content/user_guides/x1carbon_2_ug_en.pdf
-
-Multiple Commands, Module Parameters
-------------------------------------
-
-Multiple commands can be written to the proc files in one shot by
-separating them with commas, for example:
-
-	echo enable,0xffff > /proc/acpi/ibm/hotkey
-	echo lcd_disable,crt_enable > /proc/acpi/ibm/video
-
-Commands can also be specified when loading the thinkpad-acpi module,
-for example:
-
-	modprobe thinkpad_acpi hotkey=enable,0xffff video=auto_disable
-
-
-Enabling debugging output
--------------------------
-
-The module takes a debug parameter which can be used to selectively
-enable various classes of debugging output, for example:
-
-	 modprobe thinkpad_acpi debug=0xffff
-
-will enable all debugging output classes.  It takes a bitmask, so
-to enable more than one output class, just add their values.
-
-	Debug bitmask		Description
-	0x8000			Disclose PID of userspace programs
-				accessing some functions of the driver
-	0x0001			Initialization and probing
-	0x0002			Removal
-	0x0004			RF Transmitter control (RFKILL)
-				(bluetooth, WWAN, UWB...)
-	0x0008			HKEY event interface, hotkeys
-	0x0010			Fan control
-	0x0020			Backlight brightness
-	0x0040			Audio mixer/volume control
-
-There is also a kernel build option to enable more debugging
-information, which may be necessary to debug driver problems.
-
-The level of debugging information output by the driver can be changed
-at runtime through sysfs, using the driver attribute debug_level.  The
-attribute takes the same bitmask as the debug module parameter above.
-
-
-Force loading of module
------------------------
-
-If thinkpad-acpi refuses to detect your ThinkPad, you can try to specify
-the module parameter force_load=1.  Regardless of whether this works or
-not, please contact ibm-acpi-devel@lists.sourceforge.net with a report.
-
-
-Sysfs interface changelog:
-
-0x000100:	Initial sysfs support, as a single platform driver and
-		device.
-0x000200:	Hot key support for 32 hot keys, and radio slider switch
-		support.
-0x010000:	Hot keys are now handled by default over the input
-		layer, the radio switch generates input event EV_RADIO,
-		and the driver enables hot key handling by default in
-		the firmware.
-
-0x020000:	ABI fix: added a separate hwmon platform device and
-		driver, which must be located by name (thinkpad)
-		and the hwmon class for libsensors4 (lm-sensors 3)
-		compatibility.  Moved all hwmon attributes to this
-		new platform device.
-
-0x020100:	Marker for thinkpad-acpi with hot key NVRAM polling
-		support.  If you must, use it to know you should not
-		start a userspace NVRAM poller (allows to detect when
-		NVRAM is compiled out by the user because it is
-		unneeded/undesired in the first place).
-0x020101:	Marker for thinkpad-acpi with hot key NVRAM polling
-		and proper hotkey_mask semantics (version 8 of the
-		NVRAM polling patch).  Some development snapshots of
-		0.18 had an earlier version that did strange things
-		to hotkey_mask.
-
-0x020200:	Add poll()/select() support to the following attributes:
-		hotkey_radio_sw, wakeup_hotunplug_complete, wakeup_reason
-
-0x020300:	hotkey enable/disable support removed, attributes
-		hotkey_bios_enabled and hotkey_enable deprecated and
-		marked for removal.
-
-0x020400:	Marker for 16 LEDs support.  Also, LEDs that are known
-		to not exist in a given model are not registered with
-		the LED sysfs class anymore.
-
-0x020500:	Updated hotkey driver, hotkey_mask is always available
-		and it is always able to disable hot keys.  Very old
-		thinkpads are properly supported.  hotkey_bios_mask
-		is deprecated and marked for removal.
-
-0x020600:	Marker for backlight change event support.
-
-0x020700:	Support for mute-only mixers.
-		Volume control in read-only mode by default.
-		Marker for ALSA mixer support.
-
-0x030000:	Thermal and fan sysfs attributes were moved to the hwmon
-		device instead of being attached to the backing platform
-		device.
diff --git a/Documentation/laptops/toshiba_haps.rst b/Documentation/laptops/toshiba_haps.rst
new file mode 100644
index 000000000000..11dfc428c080
--- /dev/null
+++ b/Documentation/laptops/toshiba_haps.rst
@@ -0,0 +1,87 @@
+====================================
+Toshiba HDD Active Protection Sensor
+====================================
+
+Kernel driver: toshiba_haps
+
+Author: Azael Avalos <coproscefalo@gmail.com>
+
+
+.. 0. Contents
+
+   1. Description
+   2. Interface
+   3. Accelerometer axes
+   4. Supported devices
+   5. Usage
+
+
+1. Description
+--------------
+
+This driver provides support for the accelerometer found in various Toshiba
+laptops, officially called "Toshiba HDD Protection - Shock Sensor", and
+automatically detects laptops that have this device.
+On Windows, Toshiba-provided software monitors this device and provides
+automatic HDD protection (head unload) on sudden moves or harsh vibrations.
+This driver, however, only provides a notification via a sysfs file to let
+userspace tools or daemons act accordingly, as well as a sysfs file to set
+the desired protection level or sensor sensitivity.
+
+
+2. Interface
+------------
+
+This device comes with 3 methods:
+
+====	=====================================================================
+_STA    Checks existence of the device, returning Zero if the device does not
+	exist or is not supported.
+PTLV    Sets the desired protection level.
+RSSS    Shuts down the HDD protection interface for a few seconds,
+	then restores normal operation.
+====	=====================================================================
+
+Note:
+  The presence of Solid State Drives (SSD) can cause this driver to fail to
+  load, since such drives have no moving parts and thus require no
+  "protection"; the evaluation of the _STA method found under this device
+  can also fail.
+
+
+3. Accelerometer axes
+---------------------
+
+This device does not report any axes. To query the sensor position, a couple
+of HCI (Hardware Configuration Interface) calls (0x6D and 0xA6) are provided
+instead; they are handled by the kernel module toshiba_acpi since kernel
+version 3.15.
+
+
+4. Supported devices
+--------------------
+
+This driver binds itself to the ACPI device TOS620A. Any Toshiba laptop with
+this device is supported, provided that it has a conventional HDD (alone or
+combined with an SSD) and not only an SSD.
+
+
+5. Usage
+--------
+
+The sysfs files under /sys/devices/LNXSYSTM:00/LNXSYBUS:00/TOS620A:00/ are:
+
+================   ============================================================
+protection_level   The protection_level is readable and writeable, and
+		   provides a way to let userspace query the current protection
+		   level, as well as set the desired protection level. The
+		   available protection levels are:
+
+		   ============   =======   ==========   ========
+		   0 - Disabled   1 - Low   2 - Medium   3 - High
+		   ============   =======   ==========   ========
+
+reset_protection   The reset_protection entry is write-only, and "1" is
+		   the only value it accepts. It is used to trigger
+		   a reset of the protection interface.
+================   ============================================================
diff --git a/Documentation/laptops/toshiba_haps.txt b/Documentation/laptops/toshiba_haps.txt
deleted file mode 100644
index 0c1d88dedbde..000000000000
--- a/Documentation/laptops/toshiba_haps.txt
+++ /dev/null
@@ -1,76 +0,0 @@
-Kernel driver toshiba_haps
-Toshiba HDD Active Protection Sensor
-====================================
-
-Author: Azael Avalos <coproscefalo@gmail.com>
-
-
-0. Contents
------------
-
-1. Description
-2. Interface
-3. Accelerometer axes
-4. Supported devices
-5. Usage
-
-
-1. Description
---------------
-
-This driver provides support for the accelerometer found in various Toshiba
-laptops, being called "Toshiba HDD Protection - Shock Sensor" officially,
-and detects laptops automatically with this device.
-On Windows, Toshiba provided software monitors this device and provides
-automatic HDD protection (head unload) on sudden moves or harsh vibrations,
-however, this driver only provides a notification via a sysfs file to let
-userspace tools or daemons act accordingly, as well as providing a sysfs
-file to set the desired protection level or sensor sensibility.
-
-
-2. Interface
-------------
-
-This device comes with 3 methods:
-_STA -  Checks existence of the device, returning Zero if the device does not
-	exists or is not supported.
-PTLV -  Sets the desired protection level.
-RSSS -  Shuts down the HDD protection interface for a few seconds,
-	then restores normal operation.
-
-Note:
-The presence of Solid State Drives (SSD) can make this driver to fail loading,
-given the fact that such drives have no movable parts, and thus, not requiring
-any "protection" as well as failing during the evaluation of the _STA method
-found under this device.
-
-
-3. Accelerometer axes
----------------------
-
-This device does not report any axes, however, to query the sensor position
-a couple HCI (Hardware Configuration Interface) calls (0x6D and 0xA6) are
-provided to query such information, handled by the kernel module toshiba_acpi
-since kernel version 3.15.
-
-
-4. Supported devices
---------------------
-
-This driver binds itself to the ACPI device TOS620A, and any Toshiba laptop
-with this device is supported, given the fact that they have the presence of
-conventional HDD and not only SSD, or a combination of both HDD and SSD.
-
-
-5. Usage
---------
-
-The sysfs files under /sys/devices/LNXSYSTM:00/LNXSYBUS:00/TOS620A:00/ are:
-protection_level - The protection_level is readable and writeable, and
-		   provides a way to let userspace query the current protection
-		   level, as well as set the desired protection level, the
-		   available protection levels are:
-		   0 - Disabled | 1 - Low | 2 - Medium | 3 - High
-reset_protection - The reset_protection entry is writeable only, being "1"
-		   the only parameter it accepts, it is used to trigger
-		   a reset of the protection interface.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 749322060f10..c5f0d44433a2 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -102,7 +102,7 @@ Changing this takes effect whenever an application requests memory.
 block_dump
 
 block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/laptops/laptop-mode.txt.
+information on block I/O debugging is in Documentation/laptops/laptop-mode.rst.
 
 ==============================================================
 
@@ -286,7 +286,7 @@ shared memory segment using hugetlb page.
 laptop_mode
 
 laptop_mode is a knob that controls "laptop mode". All the things that are
-controlled by this knob are discussed in Documentation/laptops/laptop-mode.txt.
+controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst.
 
 ==============================================================
 
diff --git a/MAINTAINERS b/MAINTAINERS
index c30b52c9049a..3ee73751f56c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14888,7 +14888,7 @@ M:	Mattia Dongili <malattia@linux.it>
 L:	platform-driver-x86@vger.kernel.org
 W:	http://www.linux.it/~malattia/wiki/index.php/Sony_drivers
 S:	Maintained
-F:	Documentation/laptops/sony-laptop.txt
+F:	Documentation/laptops/sony-laptop.rst
 F:	drivers/char/sonypi.c
 F:	drivers/platform/x86/sony-laptop.c
 F:	include/linux/sony-laptop.h
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 466ebd84ad17..bb734066075f 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -382,7 +382,7 @@ config SONYPI
 	  Device which can be found in many (all ?) Sony Vaio laptops.
 
 	  If you have one of those laptops, read
-	  <file:Documentation/laptops/sonypi.txt>, and say Y or M here.
+	  <file:Documentation/laptops/sonypi.rst>, and say Y or M here.
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called sonypi.
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index cc29fe79c283..8f91d9ef8a7b 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -448,7 +448,7 @@ config SONY_LAPTOP
 	  screen brightness control, Fn keys and allows powering on/off some
 	  devices.
 
-	  Read <file:Documentation/laptops/sony-laptop.txt> for more information.
+	  Read <file:Documentation/laptops/sony-laptop.rst> for more information.
 
 config SONYPI_COMPAT
 	bool "Sonypi compatibility"
@@ -500,7 +500,7 @@ config THINKPAD_ACPI
 	  support for Fn-Fx key combinations, Bluetooth control, video
 	  output switching, ThinkLight control, UltraBay eject and more.
 	  For more information about this driver see
-	  <file:Documentation/laptops/thinkpad-acpi.txt> and
+	  <file:Documentation/laptops/thinkpad-acpi.rst> and
 	  <http://ibm-acpi.sf.net/> .
 
 	  This driver was formerly known as ibm-acpi.
-- 
cgit v1.2.3-55-g7522


From 20a78ae9ed297f217537211e3304f525326ee517 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 12:43:16 -0300
Subject: docs: namespaces: convert to ReST

Rename the namespaces documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

There are two upper case file names. Rename them to
lower case, as we're working to avoid upper case file
names at Documentation.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/namespaces/compatibility-list.rst | 43 +++++++++++++++++++++++++
 Documentation/namespaces/compatibility-list.txt | 39 ----------------------
 Documentation/namespaces/index.rst              | 11 +++++++
 Documentation/namespaces/resource-control.rst   | 18 +++++++++++
 Documentation/namespaces/resource-control.txt   | 14 --------
 5 files changed, 72 insertions(+), 53 deletions(-)
 create mode 100644 Documentation/namespaces/compatibility-list.rst
 delete mode 100644 Documentation/namespaces/compatibility-list.txt
 create mode 100644 Documentation/namespaces/index.rst
 create mode 100644 Documentation/namespaces/resource-control.rst
 delete mode 100644 Documentation/namespaces/resource-control.txt

diff --git a/Documentation/namespaces/compatibility-list.rst b/Documentation/namespaces/compatibility-list.rst
new file mode 100644
index 000000000000..318800b2a943
--- /dev/null
+++ b/Documentation/namespaces/compatibility-list.rst
@@ -0,0 +1,43 @@
+=============================
+Namespaces compatibility list
+=============================
+
+This document contains information about the problems users
+may have when creating tasks living in different namespaces.
+
+Here's the summary. This matrix shows the known problems that
+occur when tasks share some namespace (the columns) while living
+in different other namespaces (the rows):
+
+====	===	===	===	===	====	===
+-	UTS	IPC	VFS	PID	User	Net
+====	===	===	===	===	====	===
+UTS	 X
+IPC		 X	 1
+VFS			 X
+PID		 1	 1	 X
+User		 2	 2		 X
+Net						 X
+====	===	===	===	===	====	===
+
+1. Both the IPC and the PID namespaces provide IDs to address
+   objects inside the kernel, e.g. a semaphore with an IPCID or
+   a process group with a pid.
+
+   In both cases, tasks shouldn't try exposing this ID to some
+   other task living in a different namespace via a shared filesystem
+   or IPC shmem/message. The fact is that this ID is only valid
+   within the namespace it was obtained in and may refer to some
+   other object in another namespace.
+
+2. By design, two equal user IDs in different user namespaces
+   should not be equal from the VFS point of view. In other
+   words, user 10 in one user namespace shouldn't have the same
+   access permissions to files belonging to user 10 in another
+   namespace.
+
+   The same is true for the IPC namespaces being shared - two users
+   from different user namespaces should not access the same IPC objects
+   even if they have equal UIDs.
+
+   But currently this is not so.
diff --git a/Documentation/namespaces/compatibility-list.txt b/Documentation/namespaces/compatibility-list.txt
deleted file mode 100644
index defc5589bfcd..000000000000
--- a/Documentation/namespaces/compatibility-list.txt
+++ /dev/null
@@ -1,39 +0,0 @@
-	Namespaces compatibility list
-
-This document contains the information about the problems user
-may have when creating tasks living in different namespaces.
-
-Here's the summary. This matrix shows the known problems, that
-occur when tasks share some namespace (the columns) while living
-in different other namespaces (the rows):
-
-	UTS	IPC	VFS	PID	User	Net
-UTS	 X
-IPC		 X	 1
-VFS			 X
-PID		 1	 1	 X
-User		 2	 2		 X
-Net						 X
-
-1. Both the IPC and the PID namespaces provide IDs to address
-   object inside the kernel. E.g. semaphore with IPCID or
-   process group with pid.
-
-   In both cases, tasks shouldn't try exposing this ID to some
-   other task living in a different namespace via a shared filesystem
-   or IPC shmem/message. The fact is that this ID is only valid
-   within the namespace it was obtained in and may refer to some
-   other object in another namespace.
-
-2. Intentionally, two equal user IDs in different user namespaces
-   should not be equal from the VFS point of view. In other
-   words, user 10 in one user namespace shouldn't have the same
-   access permissions to files, belonging to user 10 in another
-   namespace.
-
-   The same is true for the IPC namespaces being shared - two users
-   from different user namespaces should not access the same IPC objects
-   even having equal UIDs.
-
-   But currently this is not so.
-
diff --git a/Documentation/namespaces/index.rst b/Documentation/namespaces/index.rst
new file mode 100644
index 000000000000..bf40625dd11a
--- /dev/null
+++ b/Documentation/namespaces/index.rst
@@ -0,0 +1,11 @@
+:orphan:
+
+==========
+Namespaces
+==========
+
+.. toctree::
+   :maxdepth: 1
+
+   compatibility-list
+   resource-control
diff --git a/Documentation/namespaces/resource-control.rst b/Documentation/namespaces/resource-control.rst
new file mode 100644
index 000000000000..369556e00f0c
--- /dev/null
+++ b/Documentation/namespaces/resource-control.rst
@@ -0,0 +1,18 @@
+===========================
+Namespaces resource control
+===========================
+
+There are a lot of kinds of objects in the kernel that don't have
+individual limits or that have limits that are ineffective when a set
+of processes is allowed to switch user ids.  With user namespaces
+enabled in a kernel for people who don't trust their users or their
+users' programs to play nice, this problem becomes more acute.
+
+Therefore it is recommended that memory control groups be enabled in
+kernels that enable user namespaces, and it is further recommended
+that userspace configure memory control groups to limit how much
+memory users they don't trust to play nice can use.
+
+Memory control groups can be configured by installing the libcgroup
+package present on most distros, editing /etc/cgrules.conf and
+/etc/cgconfig.conf, and setting up libpam-cgroup.
diff --git a/Documentation/namespaces/resource-control.txt b/Documentation/namespaces/resource-control.txt
deleted file mode 100644
index abc13c394738..000000000000
--- a/Documentation/namespaces/resource-control.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-There are a lot of kinds of objects in the kernel that don't have
-individual limits or that have limits that are ineffective when a set
-of processes is allowed to switch user ids.  With user namespaces
-enabled in a kernel for people who don't trust their users or their
-users programs to play nice this problems becomes more acute.
-
-Therefore it is recommended that memory control groups be enabled in
-kernels that enable user namespaces, and it is further recommended
-that userspace configure memory control groups to limit how much
-memory user's they don't trust to play nice can use.
-
-Memory control groups can be configured by installing the libcgroup
-package present on most distros editing /etc/cgrules.conf,
-/etc/cgconfig.conf and setting up libpam-cgroup.
-- 
cgit v1.2.3-55-g7522


From 9e678dd886c11fad6511ffad4d400e3abde81d64 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 13:02:23 -0300
Subject: docs: nfc: convert to ReST

Rename the nfc documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/nfc/index.rst     |  11 ++
 Documentation/nfc/nfc-hci.rst   | 311 ++++++++++++++++++++++++++++++++++++++++
 Documentation/nfc/nfc-hci.txt   | 290 -------------------------------------
 Documentation/nfc/nfc-pn544.rst |  34 +++++
 Documentation/nfc/nfc-pn544.txt |  32 -----
 5 files changed, 356 insertions(+), 322 deletions(-)
 create mode 100644 Documentation/nfc/index.rst
 create mode 100644 Documentation/nfc/nfc-hci.rst
 delete mode 100644 Documentation/nfc/nfc-hci.txt
 create mode 100644 Documentation/nfc/nfc-pn544.rst
 delete mode 100644 Documentation/nfc/nfc-pn544.txt

diff --git a/Documentation/nfc/index.rst b/Documentation/nfc/index.rst
new file mode 100644
index 000000000000..4f4947fce80d
--- /dev/null
+++ b/Documentation/nfc/index.rst
@@ -0,0 +1,11 @@
+:orphan:
+
+========================
+Near Field Communication
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   nfc-hci
+   nfc-pn544
diff --git a/Documentation/nfc/nfc-hci.rst b/Documentation/nfc/nfc-hci.rst
new file mode 100644
index 000000000000..eb8a1a14e919
--- /dev/null
+++ b/Documentation/nfc/nfc-hci.rst
@@ -0,0 +1,311 @@
+========================
+HCI backend for NFC Core
+========================
+
+- Author: Eric Lapuyade, Samuel Ortiz
+- Contact: eric.lapuyade@intel.com, samuel.ortiz@intel.com
+
+General
+-------
+
+The HCI layer implements much of the ETSI TS 102 622 V10.2.0 specification. It
+enables easy writing of HCI-based NFC drivers. The HCI layer runs as an NFC Core
+backend, implementing an abstract nfc device and translating NFC Core API
+to HCI commands and events.
+
+HCI
+---
+
+HCI registers as an nfc device with NFC Core. Requests coming from userspace are
+routed through netlink sockets to NFC Core and then to HCI. From this point,
+they are translated into a sequence of HCI commands sent to the HCI layer in the
+host controller (the chip). Commands can be executed synchronously (the sending
+context blocks waiting for response) or asynchronously (the response is returned
+from HCI Rx context).
+HCI events can also be received from the host controller. They will be handled
+and a translation will be forwarded to NFC Core as needed. There are hooks to
+let the HCI driver handle proprietary events or override standard behavior.
+HCI uses 2 execution contexts:
+
+- one for executing commands: nfc_hci_msg_tx_work(). Only one command
+  can be executing at any given moment.
+- one for dispatching received events and commands: nfc_hci_msg_rx_work().
+
+HCI Session initialization
+--------------------------
+
+The Session initialization is an HCI standard which must unfortunately
+support proprietary gates. This is the reason why the driver will pass a list
+of proprietary gates that must be part of the session. HCI will ensure all
+those gates have pipes connected when the hci device is set up.
+In case the chip supports pre-opened gates and pseudo-static pipes, the driver
+can pass that information to HCI core.
+
+HCI Gates and Pipes
+-------------------
+
+A gate defines the 'port' where some service can be found. In order to access
+a service, one must create a pipe to that gate and open it. In this
+implementation, pipes are totally hidden. The public API only knows gates.
+This is consistent with the driver's need to send commands to proprietary gates
+without knowing the pipes connected to them.
+
+Driver interface
+----------------
+
+A driver is generally written in two parts: the physical link management and
+the HCI management. This makes it easier to maintain a driver for a chip that
+can be connected using various phys (i2c, spi, ...).
+
+HCI Management
+--------------
+
+A driver would normally register itself with HCI and provide the following
+entry points::
+
+  struct nfc_hci_ops {
+	int (*open)(struct nfc_hci_dev *hdev);
+	void (*close)(struct nfc_hci_dev *hdev);
+	int (*hci_ready) (struct nfc_hci_dev *hdev);
+	int (*xmit) (struct nfc_hci_dev *hdev, struct sk_buff *skb);
+	int (*start_poll) (struct nfc_hci_dev *hdev,
+			   u32 im_protocols, u32 tm_protocols);
+	int (*dep_link_up)(struct nfc_hci_dev *hdev, struct nfc_target *target,
+			   u8 comm_mode, u8 *gb, size_t gb_len);
+	int (*dep_link_down)(struct nfc_hci_dev *hdev);
+	int (*target_from_gate) (struct nfc_hci_dev *hdev, u8 gate,
+				 struct nfc_target *target);
+	int (*complete_target_discovered) (struct nfc_hci_dev *hdev, u8 gate,
+					   struct nfc_target *target);
+	int (*im_transceive) (struct nfc_hci_dev *hdev,
+			      struct nfc_target *target, struct sk_buff *skb,
+			      data_exchange_cb_t cb, void *cb_context);
+	int (*tm_send)(struct nfc_hci_dev *hdev, struct sk_buff *skb);
+	int (*check_presence)(struct nfc_hci_dev *hdev,
+			      struct nfc_target *target);
+	int (*event_received)(struct nfc_hci_dev *hdev, u8 gate, u8 event,
+			      struct sk_buff *skb);
+  };
+
+- open() and close() shall turn the hardware on and off.
+- hci_ready() is an optional entry point that is called right after the hci
+  session has been set up. The driver can use it to do additional initialization
+  that must be performed using HCI commands.
+- xmit() shall simply write a frame to the physical link.
+- start_poll() is an optional entrypoint that shall set the hardware in polling
+  mode. This must be implemented only if the hardware uses proprietary gates or a
+  mechanism slightly different from the HCI standard.
+- dep_link_up() is called after a p2p target has been detected, to finish
+  the p2p connection setup with hardware parameters that need to be passed back
+  to nfc core.
+- dep_link_down() is called to bring the p2p link down.
+- target_from_gate() is an optional entrypoint to return the nfc protocols
+  corresponding to a proprietary gate.
+- complete_target_discovered() is an optional entry point to let the driver
+  perform additional proprietary processing necessary to auto activate the
+  discovered target.
+- im_transceive() must be implemented by the driver if proprietary HCI commands
+  are required to send data to the tag. Some tag types will require custom
+  commands, others can be written to using the standard HCI commands. The driver
+  can check the tag type and either do proprietary processing, or return 1 to ask
+  for standard processing. The data exchange command itself must be sent
+  asynchronously.
+- tm_send() is called to send data in the case of a p2p connection.
+- check_presence() is an optional entry point that will be called regularly
+  by the core to check that an activated tag is still in the field. If this is
+  not implemented, the core will not be able to push tag_lost events to
+  userspace.
+- event_received() is called to handle an event coming from the chip. The
+  driver can handle the event or return 1 to let HCI attempt standard processing.
+
+On the rx path, the driver is responsible for pushing incoming HCP frames to HCI
+using nfc_hci_recv_frame(). HCI will take care of re-aggregation and handling.
+This must be done from a context that can sleep.
+
+PHY Management
+--------------
+
+The physical link (i2c, ...) management is defined by the following structure::
+
+  struct nfc_phy_ops {
+	int (*write)(void *dev_id, struct sk_buff *skb);
+	int (*enable)(void *dev_id);
+	void (*disable)(void *dev_id);
+  };
+
+enable():
+	turn the phy on (power on), make it ready to transfer data
+disable():
+	turn the phy off
+write():
+	Send a data frame to the chip. Note that to enable higher
+	layers such as an llc to store the frame for re-emission, this
+	function must not alter the skb. It must also not return a positive
+	result (return 0 for success, negative for failure).
+
+Data coming from the chip shall be sent directly to nfc_hci_recv_frame().
+
+LLC
+---
+
+Communication between the CPU and the chip often requires some link layer
+protocol. Those are isolated as modules managed by the HCI layer. There are
+currently two modules: nop (raw transfer) and shdlc.
+A new llc must implement the following functions::
+
+  struct nfc_llc_ops {
+	void *(*init) (struct nfc_hci_dev *hdev, xmit_to_drv_t xmit_to_drv,
+		       rcv_to_hci_t rcv_to_hci, int tx_headroom,
+		       int tx_tailroom, int *rx_headroom, int *rx_tailroom,
+		       llc_failure_t llc_failure);
+	void (*deinit) (struct nfc_llc *llc);
+	int (*start) (struct nfc_llc *llc);
+	int (*stop) (struct nfc_llc *llc);
+	void (*rcv_from_drv) (struct nfc_llc *llc, struct sk_buff *skb);
+	int (*xmit_from_hci) (struct nfc_llc *llc, struct sk_buff *skb);
+  };
+
+init():
+	allocate and init your private storage
+deinit():
+	cleanup
+start():
+	establish the logical connection
+stop():
+	terminate the logical connection
+rcv_from_drv():
+	handle data coming from the chip, going to HCI
+xmit_from_hci():
+	handle data sent by HCI, going to the chip
+
+The llc must be registered with nfc before it can be used. Do that by
+calling::
+
+	nfc_llc_register(const char *name, struct nfc_llc_ops *ops);
+
+Again, note that the llc does not handle the physical link. It is thus very
+easy to mix any physical link with any llc for a given chip driver.
+
+Included Drivers
+----------------
+
+An HCI based driver for an NXP PN544, connected through I2C bus, and using
+shdlc is included.
+
+Execution Contexts
+------------------
+
+The execution contexts are the following:
+
+- IRQ handler (IRQH): fast, cannot sleep. Sends incoming frames to HCI, which
+  passes them to the current llc. With shdlc, they are queued in its rx queue.
+
+- SHDLC State Machine worker (SMW)
+
+  Only when llc_shdlc is used: handles shdlc rx & tx queues.
+
+  Dispatches HCI cmd responses.
+
+- HCI Tx Cmd worker (MSGTXWQ)
+
+  Serializes execution of HCI commands.
+
+  Completes execution in case of response timeout.
+
+- HCI Rx worker (MSGRXWQ)
+
+  Dispatches incoming HCI commands or events.
+
+- Syscall context from a userspace call (SYSCALL)
+
+  Any entrypoint in HCI called from NFC Core
+
+Workflow executing an HCI command (using shdlc)
+-----------------------------------------------
+
+Executing an HCI command can easily be performed synchronously using the
+following API::
+
+  int nfc_hci_send_cmd (struct nfc_hci_dev *hdev, u8 gate, u8 cmd,
+			const u8 *param, size_t param_len, struct sk_buff **skb)
+
+The API must be invoked from a context that can sleep. Most of the time, this
+will be the syscall context. skb will return the result that was received in
+the response.
+
+Internally, execution is asynchronous. All this API does is enqueue the HCI
+command, set up a local wait queue on the stack, and wait_event() for
+completion. The wait is not interruptible because the command is guaranteed
+to complete after a short timeout anyway.
+
+MSGTXWQ context will then be scheduled and invoke nfc_hci_msg_tx_work().
+This function will dequeue the next pending command and send its HCP fragments
+to the lower layer which happens to be shdlc. It will then start a timer to be
+able to complete the command with a timeout error if no response arrives.
+
+SMW context gets scheduled and invokes nfc_shdlc_sm_work(). This function
+handles shdlc framing in and out. It uses the driver xmit to send frames and
+receives incoming frames in an skb queue filled from the driver IRQ handler.
+SHDLC I(nformation) frame payloads are HCP fragments. They are aggregated to
+form complete HCI frames, which can be a response, command, or event.
+
+HCI Responses are dispatched immediately from this context to unblock
+waiting command execution. Response processing involves invoking the completion
+callback that was provided by nfc_hci_msg_tx_work() when it sent the command.
+The completion callback will then wake the syscall context.
+
+It is also possible to execute the command asynchronously using this API::
+
+  static int nfc_hci_execute_cmd_async(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
+				       const u8 *param, size_t param_len,
+				       data_exchange_cb_t cb, void *cb_context)
+
+The workflow is the same, except that the API call returns immediately, and
+the callback will be called with the result from the SMW context.
+
+Workflow receiving an HCI event or command
+------------------------------------------
+
+HCI commands or events are not dispatched from SMW context. Instead, they are
+queued to HCI rx_queue and will be dispatched from HCI rx worker
+context (MSGRXWQ). This allows a cmd or event handler
+to also execute other commands (for example, handling the
+NFC_HCI_EVT_TARGET_DISCOVERED event from PN544 requires issuing an
+ANY_GET_PARAMETER to the reader A gate to get information on the target
+that was discovered).
+
+Typically, such an event will be propagated to NFC Core from MSGRXWQ context.
+
+Error management
+----------------
+
+Errors that occur synchronously with the execution of an NFC Core request are
+simply returned as the execution result of the request. These are easy.
+
+Errors that occur asynchronously (e.g. in a background protocol handling thread)
+must be reported such that upper layers don't stay ignorant that something
+went wrong below and know that expected events will probably never happen.
+Handling of these errors is done as follows:
+
+- driver (pn544) fails to deliver an incoming frame: it stores the error such
+  that any subsequent call to the driver will result in this error. Then it
+  calls the standard nfc_shdlc_recv_frame() with a NULL argument to report the
+  problem to the layer above. shdlc stores an EREMOTEIO sticky status, which
+  triggers SMW to report upward in turn.
+
+- SMW is basically a background thread to handle incoming and outgoing shdlc
+  frames. This thread will also check the shdlc sticky status and report to HCI
+  when it discovers it is not able to run anymore because of an unrecoverable
+  error that happened within shdlc or below. If the problem occurs during shdlc
+  connection, the error is reported through the connect completion.
+
+- HCI: if an internal HCI error happens (frame is lost), or a lower layer
+  reports an error, HCI will either complete the currently executing
+  command with that error, or notify NFC Core directly if no command is
+  executing.
+
+- NFC Core: when NFC Core is notified of an error from below and polling is
+  active, it will send a tag discovered event with an empty tag list to user
+  space to let it know that the poll operation will never be able to detect a
+  tag. If polling is not active and the error was sticky, lower levels will
+  return it at the next invocation.
diff --git a/Documentation/nfc/nfc-hci.txt b/Documentation/nfc/nfc-hci.txt
deleted file mode 100644
index 0dc078cab972..000000000000
--- a/Documentation/nfc/nfc-hci.txt
+++ /dev/null
@@ -1,290 +0,0 @@
-HCI backend for NFC Core
-
-Author: Eric Lapuyade, Samuel Ortiz
-Contact: eric.lapuyade@intel.com, samuel.ortiz@intel.com
-
-General
--------
-
-The HCI layer implements much of the ETSI TS 102 622 V10.2.0 specification. It
-enables easy writing of HCI-based NFC drivers. The HCI layer runs as an NFC Core
-backend, implementing an abstract nfc device and translating NFC Core API
-to HCI commands and events.
-
-HCI
----
-
-HCI registers as an nfc device with NFC Core. Requests coming from userspace are
-routed through netlink sockets to NFC Core and then to HCI. From this point,
-they are translated in a sequence of HCI commands sent to the HCI layer in the
-host controller (the chip). Commands can be executed synchronously (the sending
-context blocks waiting for response) or asynchronously (the response is returned
-from HCI Rx context).
-HCI events can also be received from the host controller. They will be handled
-and a translation will be forwarded to NFC Core as needed. There are hooks to
-let the HCI driver handle proprietary events or override standard behavior.
-HCI uses 2 execution contexts:
-- one for executing commands : nfc_hci_msg_tx_work(). Only one command
-can be executing at any given moment.
-- one for dispatching received events and commands : nfc_hci_msg_rx_work().
-
-HCI Session initialization:
----------------------------
-
-The Session initialization is an HCI standard which must unfortunately
-support proprietary gates. This is the reason why the driver will pass a list
-of proprietary gates that must be part of the session. HCI will ensure all
-those gates have pipes connected when the hci device is set up.
-In case the chip supports pre-opened gates and pseudo-static pipes, the driver
-can pass that information to HCI core.
-
-HCI Gates and Pipes
--------------------
-
-A gate defines the 'port' where some service can be found. In order to access
-a service, one must create a pipe to that gate and open it. In this
-implementation, pipes are totally hidden. The public API only knows gates.
-This is consistent with the driver need to send commands to proprietary gates
-without knowing the pipe connected to it.
-
-Driver interface
-----------------
-
-A driver is generally written in two parts : the physical link management and
-the HCI management. This makes it easier to maintain a driver for a chip that
-can be connected using various phy (i2c, spi, ...)
-
-HCI Management
---------------
-
-A driver would normally register itself with HCI and provide the following
-entry points:
-
-struct nfc_hci_ops {
-	int (*open)(struct nfc_hci_dev *hdev);
-	void (*close)(struct nfc_hci_dev *hdev);
-	int (*hci_ready) (struct nfc_hci_dev *hdev);
-	int (*xmit) (struct nfc_hci_dev *hdev, struct sk_buff *skb);
-	int (*start_poll) (struct nfc_hci_dev *hdev,
-			   u32 im_protocols, u32 tm_protocols);
-	int (*dep_link_up)(struct nfc_hci_dev *hdev, struct nfc_target *target,
-			   u8 comm_mode, u8 *gb, size_t gb_len);
-	int (*dep_link_down)(struct nfc_hci_dev *hdev);
-	int (*target_from_gate) (struct nfc_hci_dev *hdev, u8 gate,
-				 struct nfc_target *target);
-	int (*complete_target_discovered) (struct nfc_hci_dev *hdev, u8 gate,
-					   struct nfc_target *target);
-	int (*im_transceive) (struct nfc_hci_dev *hdev,
-			      struct nfc_target *target, struct sk_buff *skb,
-			      data_exchange_cb_t cb, void *cb_context);
-	int (*tm_send)(struct nfc_hci_dev *hdev, struct sk_buff *skb);
-	int (*check_presence)(struct nfc_hci_dev *hdev,
-			      struct nfc_target *target);
-	int (*event_received)(struct nfc_hci_dev *hdev, u8 gate, u8 event,
-			      struct sk_buff *skb);
-};
-
-- open() and close() shall turn the hardware on and off.
-- hci_ready() is an optional entry point that is called right after the hci
-session has been set up. The driver can use it to do additional initialization
-that must be performed using HCI commands.
-- xmit() shall simply write a frame to the physical link.
-- start_poll() is an optional entrypoint that shall set the hardware in polling
-mode. This must be implemented only if the hardware uses proprietary gates or a
-mechanism slightly different from the HCI standard.
-- dep_link_up() is called after a p2p target has been detected, to finish
-the p2p connection setup with hardware parameters that need to be passed back
-to nfc core.
-- dep_link_down() is called to bring the p2p link down.
-- target_from_gate() is an optional entrypoint to return the nfc protocols
-corresponding to a proprietary gate.
-- complete_target_discovered() is an optional entry point to let the driver
-perform additional proprietary processing necessary to auto activate the
-discovered target.
-- im_transceive() must be implemented by the driver if proprietary HCI commands
-are required to send data to the tag. Some tag types will require custom
-commands, others can be written to using the standard HCI commands. The driver
-can check the tag type and either do proprietary processing, or return 1 to ask
-for standard processing. The data exchange command itself must be sent
-asynchronously.
-- tm_send() is called to send data in the case of a p2p connection
-- check_presence() is an optional entry point that will be called regularly
-by the core to check that an activated tag is still in the field. If this is
-not implemented, the core will not be able to push tag_lost events to the user
-space
-- event_received() is called to handle an event coming from the chip. Driver
-can handle the event or return 1 to let HCI attempt standard processing.
-
-On the rx path, the driver is responsible to push incoming HCP frames to HCI
-using nfc_hci_recv_frame(). HCI will take care of re-aggregation and handling
-This must be done from a context that can sleep.
-
-PHY Management
---------------
-
-The physical link (i2c, ...) management is defined by the following structure:
-
-struct nfc_phy_ops {
-	int (*write)(void *dev_id, struct sk_buff *skb);
-	int (*enable)(void *dev_id);
-	void (*disable)(void *dev_id);
-};
-
-enable(): turn the phy on (power on), make it ready to transfer data
-disable(): turn the phy off
-write(): Send a data frame to the chip. Note that to enable higher
-layers such as an llc to store the frame for re-emission, this function must
-not alter the skb. It must also not return a positive result (return 0 for
-success, negative for failure).
-
-Data coming from the chip shall be sent directly to nfc_hci_recv_frame().
-
-LLC
----
-
-Communication between the CPU and the chip often requires some link layer
-protocol. Those are isolated as modules managed by the HCI layer. There are
-currently two modules : nop (raw transfert) and shdlc.
-A new llc must implement the following functions:
-
-struct nfc_llc_ops {
-	void *(*init) (struct nfc_hci_dev *hdev, xmit_to_drv_t xmit_to_drv,
-		       rcv_to_hci_t rcv_to_hci, int tx_headroom,
-		       int tx_tailroom, int *rx_headroom, int *rx_tailroom,
-		       llc_failure_t llc_failure);
-	void (*deinit) (struct nfc_llc *llc);
-	int (*start) (struct nfc_llc *llc);
-	int (*stop) (struct nfc_llc *llc);
-	void (*rcv_from_drv) (struct nfc_llc *llc, struct sk_buff *skb);
-	int (*xmit_from_hci) (struct nfc_llc *llc, struct sk_buff *skb);
-};
-
-- init() : allocate and init your private storage
-- deinit() : cleanup
-- start() : establish the logical connection
-- stop () : terminate the logical connection
-- rcv_from_drv() : handle data coming from the chip, going to HCI
-- xmit_from_hci() : handle data sent by HCI, going to the chip
-
-The llc must be registered with nfc before it can be used. Do that by
-calling nfc_llc_register(const char *name, struct nfc_llc_ops *ops);
-
-Again, note that the llc does not handle the physical link. It is thus very
-easy to mix any physical link with any llc for a given chip driver.
-
-Included Drivers
-----------------
-
-An HCI based driver for an NXP PN544, connected through I2C bus, and using
-shdlc is included.
-
-Execution Contexts
-------------------
-
-The execution contexts are the following:
-- IRQ handler (IRQH):
-fast, cannot sleep. sends incoming frames to HCI where they are passed to
-the current llc. In case of shdlc, the frame is queued in shdlc rx queue.
-
-- SHDLC State Machine worker (SMW)
-Only when llc_shdlc is used: handles shdlc rx & tx queues.
-Dispatches HCI cmd responses.
-
-- HCI Tx Cmd worker (MSGTXWQ)
-Serializes execution of HCI commands. Completes execution in case of response
-timeout.
-
-- HCI Rx worker (MSGRXWQ)
-Dispatches incoming HCI commands or events.
-
-- Syscall context from a userspace call (SYSCALL)
-Any entrypoint in HCI called from NFC Core
-
-Workflow executing an HCI command (using shdlc)
------------------------------------------------
-
-Executing an HCI command can easily be performed synchronously using the
-following API:
-
-int nfc_hci_send_cmd (struct nfc_hci_dev *hdev, u8 gate, u8 cmd,
-			const u8 *param, size_t param_len, struct sk_buff **skb)
-
-The API must be invoked from a context that can sleep. Most of the time, this
-will be the syscall context. skb will return the result that was received in
-the response.
-
-Internally, execution is asynchronous. So all this API does is to enqueue the
-HCI command, setup a local wait queue on stack, and wait_event() for completion.
-The wait is not interruptible because it is guaranteed that the command will
-complete after some short timeout anyway.
-
-MSGTXWQ context will then be scheduled and invoke nfc_hci_msg_tx_work().
-This function will dequeue the next pending command and send its HCP fragments
-to the lower layer which happens to be shdlc. It will then start a timer to be
-able to complete the command with a timeout error if no response arrive.
-
-SMW context gets scheduled and invokes nfc_shdlc_sm_work(). This function
-handles shdlc framing in and out. It uses the driver xmit to send frames and
-receives incoming frames in an skb queue filled from the driver IRQ handler.
-SHDLC I(nformation) frames payload are HCP fragments. They are aggregated to
-form complete HCI frames, which can be a response, command, or event.
-
-HCI Responses are dispatched immediately from this context to unblock
-waiting command execution. Response processing involves invoking the completion
-callback that was provided by nfc_hci_msg_tx_work() when it sent the command.
-The completion callback will then wake the syscall context.
-
-It is also possible to execute the command asynchronously using this API:
-
-static int nfc_hci_execute_cmd_async(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
-			       const u8 *param, size_t param_len,
-			       data_exchange_cb_t cb, void *cb_context)
-
-The workflow is the same, except that the API call returns immediately, and
-the callback will be called with the result from the SMW context.
-
-Workflow receiving an HCI event or command
-------------------------------------------
-
-HCI commands or events are not dispatched from SMW context. Instead, they are
-queued to HCI rx_queue and will be dispatched from HCI rx worker
-context (MSGRXWQ). This is done this way to allow a cmd or event handler
-to also execute other commands (for example, handling the
-NFC_HCI_EVT_TARGET_DISCOVERED event from PN544 requires to issue an
-ANY_GET_PARAMETER to the reader A gate to get information on the target
-that was discovered).
-
-Typically, such an event will be propagated to NFC Core from MSGRXWQ context.
-
-Error management
-----------------
-
-Errors that occur synchronously with the execution of an NFC Core request are
-simply returned as the execution result of the request. These are easy.
-
-Errors that occur asynchronously (e.g. in a background protocol handling thread)
-must be reported such that upper layers don't stay ignorant that something
-went wrong below and know that expected events will probably never happen.
-Handling of these errors is done as follows:
-
-- driver (pn544) fails to deliver an incoming frame: it stores the error such
-that any subsequent call to the driver will result in this error. Then it calls
-the standard nfc_shdlc_recv_frame() with a NULL argument to report the problem
-above. shdlc stores a EREMOTEIO sticky status, which will trigger SMW to
-report above in turn.
-
-- SMW is basically a background thread to handle incoming and outgoing shdlc
-frames. This thread will also check the shdlc sticky status and report to HCI
-when it discovers it is not able to run anymore because of an unrecoverable
-error that happened within shdlc or below. If the problem occurs during shdlc
-connection, the error is reported through the connect completion.
-
-- HCI: if an internal HCI error happens (frame is lost), or HCI is reported an
-error from a lower layer, HCI will either complete the currently executing
-command with that error, or notify NFC Core directly if no command is executing.
-
-- NFC Core: when NFC Core is notified of an error from below and polling is
-active, it will send a tag discovered event with an empty tag list to the user
-space to let it know that the poll operation will never be able to detect a tag.
-If polling is not active and the error was sticky, lower levels will return it
-at next invocation.
diff --git a/Documentation/nfc/nfc-pn544.rst b/Documentation/nfc/nfc-pn544.rst
new file mode 100644
index 000000000000..6b2d8aae0c4e
--- /dev/null
+++ b/Documentation/nfc/nfc-pn544.rst
@@ -0,0 +1,34 @@
+============================================================================
+Kernel driver for the NXP Semiconductors PN544 Near Field Communication chip
+============================================================================
+
+
+General
+-------
+
+The PN544 is an integrated transmission module for contactless
+communication. The driver goes under drivers/nfc/ and is compiled as a
+module named "pn544".
+
+Host interfaces: I2C, SPI and HSU. This driver currently supports only I2C.
+
+Protocols
+---------
+
+In the normal (HCI) mode and in the firmware update mode read and
+write functions behave a bit differently because the message formats
+or the protocols are different.
+
+In the normal (HCI) mode the protocol used is derived from the ETSI
+HCI specification. The firmware is updated using a specific protocol,
+which is different from HCI.
+
+HCI messages consist of an eight-bit header and the message body. The
+header contains the message length. The maximum size for an HCI message
+is 33. In HCI mode, sent messages are tested for a correct
+checksum. Firmware update messages have the length in the second (MSB)
+and third (LSB) bytes of the message. The maximum FW message length is
+1024 bytes.
+
+For the ETSI HCI specification see
+http://www.etsi.org/WebSite/Technologies/ProtocolSpecification.aspx
diff --git a/Documentation/nfc/nfc-pn544.txt b/Documentation/nfc/nfc-pn544.txt
deleted file mode 100644
index b36ca14ca2d6..000000000000
--- a/Documentation/nfc/nfc-pn544.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-Kernel driver for the NXP Semiconductors PN544 Near Field
-Communication chip
-
-General
--------
-
-The PN544 is an integrated transmission module for contactless
-communication. The driver goes under drives/nfc/ and is compiled as a
-module named "pn544".
-
-Host Interfaces: I2C, SPI and HSU, this driver supports currently only I2C.
-
-Protocols
----------
-
-In the normal (HCI) mode and in the firmware update mode read and
-write functions behave a bit differently because the message formats
-or the protocols are different.
-
-In the normal (HCI) mode the protocol used is derived from the ETSI
-HCI specification. The firmware is updated using a specific protocol,
-which is different from HCI.
-
-HCI messages consist of an eight bit header and the message body. The
-header contains the message length. Maximum size for an HCI message is
-33. In HCI mode sent messages are tested for a correct
-checksum. Firmware update messages have the length in the second (MSB)
-and third (LSB) bytes of the message. The maximum FW message length is
-1024 bytes.
-
-For the ETSI HCI specification see
-http://www.etsi.org/WebSite/Technologies/ProtocolSpecification.aspx
-- 
cgit v1.2.3-55-g7522


From 7ed44d59f1959942b8d882e6eeea51616b72e2ec Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 13:31:33 -0300
Subject: docs: md: convert to ReST

Rename the md documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/md/index.rst       |  12 ++
 Documentation/md/md-cluster.rst  | 385 +++++++++++++++++++++++++++++++++++++++
 Documentation/md/md-cluster.txt  | 325 ---------------------------------
 Documentation/md/raid5-cache.rst | 111 +++++++++++
 Documentation/md/raid5-cache.txt | 109 -----------
 Documentation/md/raid5-ppl.rst   |  47 +++++
 Documentation/md/raid5-ppl.txt   |  45 -----
 7 files changed, 555 insertions(+), 479 deletions(-)
 create mode 100644 Documentation/md/index.rst
 create mode 100644 Documentation/md/md-cluster.rst
 delete mode 100644 Documentation/md/md-cluster.txt
 create mode 100644 Documentation/md/raid5-cache.rst
 delete mode 100644 Documentation/md/raid5-cache.txt
 create mode 100644 Documentation/md/raid5-ppl.rst
 delete mode 100644 Documentation/md/raid5-ppl.txt

diff --git a/Documentation/md/index.rst b/Documentation/md/index.rst
new file mode 100644
index 000000000000..c4db34ed327d
--- /dev/null
+++ b/Documentation/md/index.rst
@@ -0,0 +1,12 @@
+:orphan:
+
+====
+RAID
+====
+
+.. toctree::
+   :maxdepth: 1
+
+   md-cluster
+   raid5-cache
+   raid5-ppl
diff --git a/Documentation/md/md-cluster.rst b/Documentation/md/md-cluster.rst
new file mode 100644
index 000000000000..96eb52cec7eb
--- /dev/null
+++ b/Documentation/md/md-cluster.rst
@@ -0,0 +1,385 @@
+==========
+MD Cluster
+==========
+
+The cluster MD is a shared-device RAID for a cluster. It supports
+two levels: raid1 and raid10 (limited support).
+
+
+1. On-disk format
+=================
+
+Separate write-intent-bitmaps are used for each cluster node.
+The bitmaps record all writes that may have been started on that node,
+and may not yet have finished. The on-disk layout is::
+
+  0                    4k                     8k                    12k
+  -------------------------------------------------------------------
+  | idle                | md super            | bm super [0] + bits |
+  | bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
+  | bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
+  | bm bits [3, contd]  |                     |                     |
+
+During "normal" functioning we assume the filesystem ensures that only
+one node writes to any given block at a time, so a write request will
+
+ - set the appropriate bit (if not already set)
+ - commit the write to all mirrors
+ - schedule the bit to be cleared after a timeout.
+
+Reads are just handled normally. It is up to the filesystem to ensure
+one node doesn't read from a location where another node (or the same
+node) is writing.
+
+
+2. DLM Locks for management
+===========================
+
+There are three groups of locks for managing the device:
+
+2.1 Bitmap lock resource (bm_lockres)
+-------------------------------------
+
+ The bm_lockres protects individual node bitmaps. They are named in
+ the form bitmap000 for node 1, bitmap001 for node 2 and so on. When a
+ node joins the cluster, it acquires the lock in PW mode and holds it
+ for as long as the node is part of the cluster. The lock
+ resource number is based on the slot number returned by the DLM
+ subsystem. Since DLM starts node count from one and bitmap slots
+ start from zero, one is subtracted from the DLM slot number to arrive
+ at the bitmap slot number.
+
+ The LVB of the bitmap lock for a particular node records the range
+ of sectors that are being re-synced by that node.  No other
+ node may write to those sectors.  This is used when a new node
+ joins the cluster.
+
+2.2 Message passing locks
+-------------------------
+
+ Each node has to communicate with other nodes when starting or ending
+ resync, and for metadata superblock updates.  This communication is
+ managed through three locks: "token", "message", and "ack", together
+ with the Lock Value Block (LVB) of the "message" lock.
+
+2.3 new-device management
+-------------------------
+
+ A single lock: "no-new-dev" is used to co-ordinate the addition of
+ new devices - this must be synchronized across the array.
+ Normally all nodes hold a concurrent-read lock on this device.
+
+3. Communication
+================
+
+ Messages can be broadcast to all nodes, and the sender waits for all
+ other nodes to acknowledge the message before proceeding.  Only one
+ message can be processed at a time.
+
+3.1 Message Types
+-----------------
+
+ There are six types of messages which are passed:
+
+3.1.1 METADATA_UPDATED
+^^^^^^^^^^^^^^^^^^^^^^
+
+   informs other nodes that the metadata has
+   been updated, and the node must re-read the md superblock. This is
+   performed synchronously. It is primarily used to signal device
+   failure.
+
+3.1.2 RESYNCING
+^^^^^^^^^^^^^^^
+   informs other nodes that a resync is initiated or
+   ended so that each node may suspend or resume the region.  Each
+   RESYNCING message identifies a range of the devices that the
+   sending node is about to resync. This overrides any previous
+   notification from that node: only one range can be resynced at a
+   time per node.
+
+3.1.3 NEWDISK
+^^^^^^^^^^^^^
+
+   informs other nodes that a device is being added to
+   the array. Message contains an identifier for that device.  See
+   below for further details.
+
+3.1.4 REMOVE
+^^^^^^^^^^^^
+
+   A failed or spare device is being removed from the
+   array. The slot-number of the device is included in the message.
+
+3.1.5 RE_ADD
+^^^^^^^^^^^^
+   A failed device is being re-activated - the assumption
+   is that it has been determined to be working again.
+
+ 3.1.6 BITMAP_NEEDS_SYNC:
+
+   If a node is stopped locally but the bitmap
+   isn't clean, then another node is informed to take the ownership of
+   resync.
+
+3.2 Communication mechanism
+---------------------------
+
+ The DLM LVB is used to communicate between nodes of the cluster. There
+ are three resources used for the purpose:
+
+3.2.1 token
+^^^^^^^^^^^
+   The resource which protects the entire communication
+   system. The node having the token resource is allowed to
+   communicate.
+
+3.2.2 message
+^^^^^^^^^^^^^
+   The lock resource which carries the data to communicate.
+
+3.2.3 ack
+^^^^^^^^^
+
+   The resource whose acquisition means the message has been
+   acknowledged by all nodes in the cluster. The BAST of the resource
+   is used to inform the receiving node that a node wants to
+   communicate.
+
+The algorithm is:
+
+ 1. receive status - all nodes have concurrent-reader lock on "ack"::
+
+	sender                         receiver                 receiver
+	"ack":CR                       "ack":CR                 "ack":CR
+
+ 2. sender get EX on "token",
+    sender get EX on "message"::
+
+	sender                        receiver                 receiver
+	"token":EX                    "ack":CR                 "ack":CR
+	"message":EX
+	"ack":CR
+
+    Sender checks that it still needs to send a message. Messages
+    received or other events that happened while waiting for the
+    "token" may have made this message inappropriate or redundant.
+
+ 3. sender writes LVB
+
+    sender down-convert "message" from EX to CW
+
+    sender try to get EX of "ack"
+
+    ::
+
+      [ wait until all receivers have *processed* the "message" ]
+
+                                       [ triggered by bast of "ack" ]
+                                       receiver get CR on "message"
+                                       receiver read LVB
+                                       receiver processes the message
+                                       [ wait finish ]
+                                       receiver releases "ack"
+                                       receiver tries to get PR on "message"
+
+     sender                         receiver                  receiver
+     "token":EX                     "message":CR              "message":CR
+     "message":CW
+     "ack":EX
+
+ 4. triggered by grant of EX on "ack" (indicating all receivers
+    have processed message)
+
+    sender down-converts "ack" from EX to CR
+
+    sender releases "message"
+
+    sender releases "token"
+
+    ::
+
+                                 receiver upconvert to PR on "message"
+                                 receiver get CR of "ack"
+                                 receiver release "message"
+
+     sender                      receiver                   receiver
+     "ack":CR                    "ack":CR                   "ack":CR
+
+
+4. Handling Failures
+====================
+
+4.1 Node Failure
+----------------
+
+ When a node fails, the DLM informs the cluster with the slot
+ number. The node starts a cluster recovery thread. The cluster
+ recovery thread:
+
+	- acquires the bitmap<number> lock of the failed node
+	- opens the bitmap
+	- reads the bitmap of the failed node
+	- copies the set bitmap to local node
+	- cleans the bitmap of the failed node
+	- releases bitmap<number> lock of the failed node
+	- initiates resync of the bitmap on the current node
+	  md_check_recovery is invoked within recover_bitmaps,
+	  then md_check_recovery -> metadata_update_start/finish,
+	  it will lock the communication by lock_comm.
+	  Which means when one node is resyncing it blocks all
+	  other nodes from writing anywhere on the array.
+
+ The resync process is the regular md resync. However, in a clustered
+ environment when a resync is performed, it needs to tell other nodes
+ of the areas which are suspended. Before a resync starts, the node
+ sends out RESYNCING with the (lo,hi) range of the area which needs to
+ be suspended. Each node maintains a suspend_list, which contains the
+ list of ranges which are currently suspended. On receiving RESYNCING,
+ the node adds the range to the suspend_list. Similarly, when the node
+ performing resync finishes, it sends RESYNCING with an empty range to
+ other nodes and other nodes remove the corresponding entry from the
+ suspend_list.
+
+ A helper function, ->area_resyncing(), can be used to check if a
+ particular I/O range should be suspended or not.
+
+4.2 Device Failure
+------------------
+
+ Device failures are handled and communicated with the metadata update
+ routine.  When a node detects a device failure it does not allow
+ any further writes to that device until the failure has been
+ acknowledged by all other nodes.
+
+5. Adding a new Device
+======================
+
+ For adding a new device, it is necessary that all nodes "see" the new
+ device to be added. For this, the following algorithm is used:
+
+   1.  Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
+       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
+   2.  Node 1 sends a NEWDISK message with uuid and slot number
+   3.  Other nodes issue kobject_uevent_env with uuid and slot number
+       (Steps 4,5 could be a udev rule)
+   4.  In userspace, the node searches for the disk, perhaps
+       using blkid -t SUB_UUID=""
+   5.  Other nodes issue either of the following depending on whether
+       the disk was found:
+       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
+       disc.number set to slot number)
+       ioctl(CLUSTERED_DISK_NACK)
+   6.  Other nodes drop lock on "no-new-dev" (CR) if device is found
+   7.  Node 1 attempts EX lock on "no-new-dev"
+   8.  If node 1 gets the lock, it sends METADATA_UPDATED after
+       unmarking the disk as SpareLocal
+   9.  If node 1 does not get the "no-new-dev" lock, it fails the
+       operation and sends METADATA_UPDATED.
+   10. Other nodes get the information whether a disk is added or not
+       by the following METADATA_UPDATED.
+
+6. Module interface
+===================
+
+ There are 17 call-backs which the md core can make to the cluster
+ module.  Understanding these can give a good overview of the whole
+ process.
+
+6.1 join(nodes) and leave()
+---------------------------
+
+ These are called when an array is started with a clustered bitmap,
+ and when the array is stopped.  join() ensures the cluster is
+ available and initializes the various resources.
+ Only the first 'nodes' nodes in the cluster can use the array.
+
+6.2 slot_number()
+-----------------
+
+ Reports the slot number advised by the cluster infrastructure.
+ Range is from 0 to nodes-1.
+
+6.3 resync_info_update()
+------------------------
+
+ This updates the resync range that is stored in the bitmap lock.
+ The starting point is updated as the resync progresses.  The
+ end point is always the end of the array.
+ It does *not* send a RESYNCING message.
+
+6.4 resync_start(), resync_finish()
+-----------------------------------
+
+ These are called when resync/recovery/reshape starts or stops.
+ They update the resyncing range in the bitmap lock and also
+ send a RESYNCING message.  resync_start reports the whole
+ array as resyncing, resync_finish reports none of it.
+
+ resync_finish() also sends a BITMAP_NEEDS_SYNC message which
+ allows some other node to take over.
+
+6.5 metadata_update_start(), metadata_update_finish(), metadata_update_cancel()
+-------------------------------------------------------------------------------
+
+ metadata_update_start is used to get exclusive access to
+ the metadata.  If a change is still needed once that access is
+ gained, metadata_update_finish() will send a METADATA_UPDATED
+ message to all other nodes, otherwise metadata_update_cancel()
+ can be used to release the lock.
+
+6.6 area_resyncing()
+--------------------
+
+ This combines two elements of functionality.
+
+ Firstly, it will check if any node is currently resyncing
+ anything in a given range of sectors.  If any resync is found,
+ then the caller will avoid writing or read-balancing in that
+ range.
+
+ Secondly, while node recovery is happening it reports that
+ all areas are resyncing for READ requests.  This avoids races
+ between the cluster-filesystem and the cluster-RAID handling
+ a node failure.
+
+6.7 add_new_disk_start(), add_new_disk_finish(), new_disk_ack()
+---------------------------------------------------------------
+
+ These are used to manage the new-disk protocol described above.
+ When a new device is added, add_new_disk_start() is called before
+ it is bound to the array and, if that succeeds, add_new_disk_finish()
+ is called once the device is fully added.
+
+ When a device is added in acknowledgement to a previous
+ request, or when the device is declared "unavailable",
+ new_disk_ack() is called.
+
+6.8 remove_disk()
+-----------------
+
+ This is called when a spare or failed device is removed from
+ the array.  It causes a REMOVE message to be sent to other nodes.
+
+6.9 gather_bitmaps()
+--------------------
+
+ This sends a RE_ADD message to all other nodes and then
+ gathers bitmap information from all bitmaps.  This combined
+ bitmap is then used to recover the re-added device.
+
+6.10 lock_all_bitmaps() and unlock_all_bitmaps()
+------------------------------------------------
+
+ These are called when the bitmap is changed to none. If a node plans
+ to clear the cluster raid's bitmap, it needs to make sure no other
+ nodes are using the raid, which is achieved by locking all bitmap
+ locks within the cluster; those locks are unlocked
+ accordingly.
+
+7. Unsupported features
+=======================
+
+There are some things which are not supported by cluster MD yet.
+
+- change array_sectors.
diff --git a/Documentation/md/md-cluster.txt b/Documentation/md/md-cluster.txt
deleted file mode 100644
index e1055f105cf5..000000000000
--- a/Documentation/md/md-cluster.txt
+++ /dev/null
@@ -1,325 +0,0 @@
-The cluster MD is a shared-device RAID for a cluster, it supports
-two levels: raid1 and raid10 (limited support).
-
-
-1. On-disk format
-
-Separate write-intent-bitmaps are used for each cluster node.
-The bitmaps record all writes that may have been started on that node,
-and may not yet have finished. The on-disk layout is:
-
-0                    4k                     8k                    12k
--------------------------------------------------------------------
-| idle                | md super            | bm super [0] + bits |
-| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
-| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
-| bm bits [3, contd]  |                     |                     |
-
-During "normal" functioning we assume the filesystem ensures that only
-one node writes to any given block at a time, so a write request will
-
- - set the appropriate bit (if not already set)
- - commit the write to all mirrors
- - schedule the bit to be cleared after a timeout.
-
-Reads are just handled normally. It is up to the filesystem to ensure
-one node doesn't read from a location where another node (or the same
-node) is writing.
-
-
-2. DLM Locks for management
-
-There are three groups of locks for managing the device:
-
-2.1 Bitmap lock resource (bm_lockres)
-
- The bm_lockres protects individual node bitmaps. They are named in
- the form bitmap000 for node 1, bitmap001 for node 2 and so on. When a
- node joins the cluster, it acquires the lock in PW mode and it stays
- so during the lifetime the node is part of the cluster. The lock
- resource number is based on the slot number returned by the DLM
- subsystem. Since DLM starts node count from one and bitmap slots
- start from zero, one is subtracted from the DLM slot number to arrive
- at the bitmap slot number.
-
- The LVB of the bitmap lock for a particular node records the range
- of sectors that are being re-synced by that node.  No other
- node may write to those sectors.  This is used when a new nodes
- joins the cluster.
-
-2.2 Message passing locks
-
- Each node has to communicate with other nodes when starting or ending
- resync, and for metadata superblock updates.  This communication is
- managed through three locks: "token", "message", and "ack", together
- with the Lock Value Block (LVB) of one of the "message" lock.
-
-2.3 new-device management
-
- A single lock: "no-new-dev" is used to co-ordinate the addition of
- new devices - this must be synchronized across the array.
- Normally all nodes hold a concurrent-read lock on this device.
-
-3. Communication
-
- Messages can be broadcast to all nodes, and the sender waits for all
- other nodes to acknowledge the message before proceeding.  Only one
- message can be processed at a time.
-
-3.1 Message Types
-
- There are six types of messages which are passed:
-
- 3.1.1 METADATA_UPDATED: informs other nodes that the metadata has
-   been updated, and the node must re-read the md superblock. This is
-   performed synchronously. It is primarily used to signal device
-   failure.
-
- 3.1.2 RESYNCING: informs other nodes that a resync is initiated or
-   ended so that each node may suspend or resume the region.  Each
-   RESYNCING message identifies a range of the devices that the
-   sending node is about to resync. This overrides any previous
-   notification from that node: only one ranged can be resynced at a
-   time per-node.
-
- 3.1.3 NEWDISK: informs other nodes that a device is being added to
-   the array. Message contains an identifier for that device.  See
-   below for further details.
-
- 3.1.4 REMOVE: A failed or spare device is being removed from the
-   array. The slot-number of the device is included in the message.
-
- 3.1.5 RE_ADD: A failed device is being re-activated - the assumption
-   is that it has been determined to be working again.
-
- 3.1.6 BITMAP_NEEDS_SYNC: if a node is stopped locally but the bitmap
-   isn't clean, then another node is informed to take the ownership of
-   resync.
-
-3.2 Communication mechanism
-
- The DLM LVB is used to communicate within nodes of the cluster. There
- are three resources used for the purpose:
-
-  3.2.1 token: The resource which protects the entire communication
-   system. The node having the token resource is allowed to
-   communicate.
-
-  3.2.2 message: The lock resource which carries the data to
-   communicate.
-
-  3.2.3 ack: The resource, acquiring which means the message has been
-   acknowledged by all nodes in the cluster. The BAST of the resource
-   is used to inform the receiving node that a node wants to
-   communicate.
-
-The algorithm is:
-
- 1. receive status - all nodes have concurrent-reader lock on "ack".
-
-   sender                         receiver                 receiver
-   "ack":CR                       "ack":CR                 "ack":CR
-
- 2. sender get EX on "token"
-    sender get EX on "message"
-    sender                        receiver                 receiver
-    "token":EX                    "ack":CR                 "ack":CR
-    "message":EX
-    "ack":CR
-
-    Sender checks that it still needs to send a message. Messages
-    received or other events that happened while waiting for the
-    "token" may have made this message inappropriate or redundant.
-
- 3. sender writes LVB.
-    sender down-convert "message" from EX to CW
-    sender try to get EX of "ack"
-    [ wait until all receivers have *processed* the "message" ]
-
-                                     [ triggered by bast of "ack" ]
-                                     receiver get CR on "message"
-                                     receiver read LVB
-                                     receiver processes the message
-                                     [ wait finish ]
-                                     receiver releases "ack"
-                                     receiver tries to get PR on "message"
-
-   sender                         receiver                  receiver
-   "token":EX                     "message":CR              "message":CR
-   "message":CW
-   "ack":EX
-
- 4. triggered by grant of EX on "ack" (indicating all receivers
-    have processed message)
-    sender down-converts "ack" from EX to CR
-    sender releases "message"
-    sender releases "token"
-                               receiver upconvert to PR on "message"
-                               receiver get CR of "ack"
-                               receiver release "message"
-
-   sender                      receiver                   receiver
-   "ack":CR                    "ack":CR                   "ack":CR
-
-
-4. Handling Failures
-
-4.1 Node Failure
-
- When a node fails, the DLM informs the cluster with the slot
- number. The node starts a cluster recovery thread. The cluster
- recovery thread:
-
-	- acquires the bitmap<number> lock of the failed node
-	- opens the bitmap
-	- reads the bitmap of the failed node
-	- copies the set bitmap to local node
-	- cleans the bitmap of the failed node
-	- releases bitmap<number> lock of the failed node
-	- initiates resync of the bitmap on the current node
-		md_check_recovery is invoked within recover_bitmaps,
-		then md_check_recovery -> metadata_update_start/finish,
-		it will lock the communication by lock_comm.
-		Which means when one node is resyncing it blocks all
-		other nodes from writing anywhere on the array.
-
- The resync process is the regular md resync. However, in a clustered
- environment when a resync is performed, it needs to tell other nodes
- of the areas which are suspended. Before a resync starts, the node
- send out RESYNCING with the (lo,hi) range of the area which needs to
- be suspended. Each node maintains a suspend_list, which contains the
- list of ranges which are currently suspended. On receiving RESYNCING,
- the node adds the range to the suspend_list. Similarly, when the node
- performing resync finishes, it sends RESYNCING with an empty range to
- other nodes and other nodes remove the corresponding entry from the
- suspend_list.
-
- A helper function, ->area_resyncing() can be used to check if a
- particular I/O range should be suspended or not.
-
-4.2 Device Failure
-
- Device failures are handled and communicated with the metadata update
- routine.  When a node detects a device failure it does not allow
- any further writes to that device until the failure has been
- acknowledged by all other nodes.
-
-5. Adding a new Device
-
- For adding a new device, it is necessary that all nodes "see" the new
- device to be added. For this, the following algorithm is used:
-
-    1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
-       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
-    2. Node 1 sends a NEWDISK message with uuid and slot number
-    3. Other nodes issue kobject_uevent_env with uuid and slot number
-       (Steps 4,5 could be a udev rule)
-    4. In userspace, the node searches for the disk, perhaps
-       using blkid -t SUB_UUID=""
-    5. Other nodes issue either of the following depending on whether
-       the disk was found:
-       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
-             disc.number set to slot number)
-       ioctl(CLUSTERED_DISK_NACK)
-    6. Other nodes drop lock on "no-new-devs" (CR) if device is found
-    7. Node 1 attempts EX lock on "no-new-dev"
-    8. If node 1 gets the lock, it sends METADATA_UPDATED after
-       unmarking the disk as SpareLocal
-    9. If not (get "no-new-dev" lock), it fails the operation and sends
-       METADATA_UPDATED.
-   10. Other nodes get the information whether a disk is added or not
-       by the following METADATA_UPDATED.
-
-6. Module interface.
-
- There are 17 call-backs which the md core can make to the cluster
- module.  Understanding these can give a good overview of the whole
- process.
-
-6.1 join(nodes) and leave()
-
- These are called when an array is started with a clustered bitmap,
- and when the array is stopped.  join() ensures the cluster is
- available and initializes the various resources.
- Only the first 'nodes' nodes in the cluster can use the array.
-
-6.2 slot_number()
-
- Reports the slot number advised by the cluster infrastructure.
- Range is from 0 to nodes-1.
-
-6.3 resync_info_update()
-
- This updates the resync range that is stored in the bitmap lock.
- The starting point is updated as the resync progresses.  The
- end point is always the end of the array.
- It does *not* send a RESYNCING message.
-
-6.4 resync_start(), resync_finish()
-
- These are called when resync/recovery/reshape starts or stops.
- They update the resyncing range in the bitmap lock and also
- send a RESYNCING message.  resync_start reports the whole
- array as resyncing, resync_finish reports none of it.
-
- resync_finish() also sends a BITMAP_NEEDS_SYNC message which
- allows some other node to take over.
-
-6.5 metadata_update_start(), metadata_update_finish(),
-    metadata_update_cancel().
-
- metadata_update_start is used to get exclusive access to
- the metadata.  If a change is still needed once that access is
- gained, metadata_update_finish() will send a METADATA_UPDATE
- message to all other nodes, otherwise metadata_update_cancel()
- can be used to release the lock.
-
-6.6 area_resyncing()
-
- This combines two elements of functionality.
-
- Firstly, it will check if any node is currently resyncing
- anything in a given range of sectors.  If any resync is found,
- then the caller will avoid writing or read-balancing in that
- range.
-
- Secondly, while node recovery is happening it reports that
- all areas are resyncing for READ requests.  This avoids races
- between the cluster-filesystem and the cluster-RAID handling
- a node failure.
-
-6.7 add_new_disk_start(), add_new_disk_finish(), new_disk_ack()
-
- These are used to manage the new-disk protocol described above.
- When a new device is added, add_new_disk_start() is called before
- it is bound to the array and, if that succeeds, add_new_disk_finish()
- is called the device is fully added.
-
- When a device is added in acknowledgement to a previous
- request, or when the device is declared "unavailable",
- new_disk_ack() is called.
-
-6.8 remove_disk()
-
- This is called when a spare or failed device is removed from
- the array.  It causes a REMOVE message to be send to other nodes.
-
-6.9 gather_bitmaps()
-
- This sends a RE_ADD message to all other nodes and then
- gathers bitmap information from all bitmaps.  This combined
- bitmap is then used to recovery the re-added device.
-
-6.10 lock_all_bitmaps() and unlock_all_bitmaps()
-
- These are called when change bitmap to none. If a node plans
- to clear the cluster raid's bitmap, it need to make sure no other
- nodes are using the raid which is achieved by lock all bitmap
- locks within the cluster, and also those locks are unlocked
- accordingly.
-
-7. Unsupported features
-
-There are somethings which are not supported by cluster MD yet.
-
-- change array_sectors.
diff --git a/Documentation/md/raid5-cache.rst b/Documentation/md/raid5-cache.rst
new file mode 100644
index 000000000000..d7a15f44a7c3
--- /dev/null
+++ b/Documentation/md/raid5-cache.rst
@@ -0,0 +1,111 @@
+================
+RAID 4/5/6 cache
+================
+
+RAID 4/5/6 can include an extra disk for data cache besides the normal RAID
+disks. The role of RAID disks isn't changed by the cache disk. The cache disk
+caches data for the RAID disks. The cache can be in write-through (supported
+since 4.4) or write-back mode (supported since 4.10). mdadm (supported since
+3.4) has an option '--write-journal' to create an array with a cache. Please
+refer to the mdadm manual for details. By default (when the array starts), the
+cache is in write-through mode. A user can switch it to write-back mode by::
+
+	echo "write-back" > /sys/block/md0/md/journal_mode
+
+And switch it back to write-through mode by::
+
+	echo "write-through" > /sys/block/md0/md/journal_mode
+
+In both modes, all writes to the array will hit cache disk first. This means
+the cache disk must be fast and sustainable.
+
+write-through mode
+==================
+
+This mode mainly fixes the 'write hole' issue. For a RAID 4/5/6 array, an
+unclean shutdown can leave data in some stripes in an inconsistent state, e.g.
+data and parity don't match. The reason is that a stripe write involves several
+RAID disks and it's possible that the writes haven't hit all RAID disks before
+the unclean shutdown. We call an array degraded if it has inconsistent data. MD
+tries to resync the array to bring it back to normal state. But before the
+resync completes, any system crash exposes the array to real data
+corruption. This problem is called 'write hole'.
+
+The write-through cache will cache all data on cache disk first. After the data
+is safe on the cache disk, the data will be flushed onto RAID disks. The
+two-step write guarantees that MD can recover correct data after an unclean
+shutdown even if the array is degraded. Thus the cache closes the 'write hole'.
+
+In write-through mode, MD reports IO completion to upper layer (usually
+filesystems) after the data is safe on RAID disks, so cache disk failure
+doesn't cause data loss. Of course cache disk failure means the array is
+exposed to 'write hole' again.
+
+In write-through mode, the cache disk isn't required to be big. Several
+hundred megabytes are enough.
+
+write-back mode
+===============
+
+write-back mode fixes the 'write hole' issue too, since all write data is
+cached on cache disk. But the main goal of 'write-back' cache is to speed up
+write. If a write crosses all RAID disks of a stripe, we call it full-stripe
+write. For non-full-stripe writes, MD must read old data before the new parity
+can be calculated. These synchronous reads hurt write throughput. Some writes
+which are sequential but not dispatched at the same time will suffer from this
+overhead too. Write-back cache will aggregate the data and flush the data to
+RAID disks only after the data becomes a full stripe write. This will
+completely avoid the overhead, so it's very helpful for some workloads. A
+typical workload which does sequential write followed by fsync is an example.
+
+In write-back mode, MD reports IO completion to upper layer (usually
+filesystems) right after the data hits cache disk. The data is flushed to raid
+disks later after specific conditions are met. So cache disk failure will cause
+data loss.
+
+In write-back mode, MD also caches data in memory. The memory cache includes
+the same data stored on cache disk, so a power loss doesn't cause data loss.
+The memory cache size has a performance impact on the array. A large size is
+recommended. A user can configure the size by::
+
+	echo "2048" > /sys/block/md0/md/stripe_cache_size
+
+A cache disk that is too small makes the write aggregation less efficient in
+this mode, depending on the workload. It's recommended to use a cache disk of
+at least several gigabytes in write-back mode.
+
+The implementation
+==================
+
+The write-through and write-back cache use the same disk format. The cache disk
+is organized as a simple write log. The log consists of 'meta data' and 'data'
+pairs. The meta data describes the data. It also includes checksum and sequence
+ID for recovery identification. Data can be IO data and parity data. Data is
+checksummed too. The checksum is stored in the meta data ahead of the data. The
+checksum is an optimization because MD can write meta and data freely without
+worrying about the order. The MD superblock has a field pointing to the valid
+meta data of the log head.
+
+The log implementation is pretty straightforward. The difficult part is the
+order in which MD writes data to cache disk and RAID disks. Specifically, in
+write-through mode, MD calculates parity for IO data, writes both IO data and
+parity to the log, writes the data and parity to RAID disks after the data and
+parity are settled in the log, and finally the IO is finished. Reads just read
+from the RAID disks as usual.
+
+In write-back mode, MD writes IO data to the log and reports IO completion. The
+data is also fully cached in memory at that time, which means read must query
+memory cache. If some conditions are met, MD will flush the data to RAID disks.
+MD will calculate parity for the data and write parity into the log. After this
+is finished, MD will write both data and parity into RAID disks, then MD can
+release the memory cache. The flush conditions can be: the stripe becomes a
+full stripe write, free cache disk space is low, or free in-kernel memory cache
+space is low.
+
+After an unclean shutdown, MD does recovery. MD reads all meta data and data
+from the log. The sequence ID and checksum will help us detect corrupted meta
+data and data. If MD finds a stripe with data and valid parities (1 parity for
+raid4/5 and 2 for raid6), MD will write the data and parities to RAID disks. If
+parities are incomplete, they are discarded. If part of the data is corrupted,
+it is discarded too. MD then loads the valid data and writes it to RAID disks
+in the normal way.
diff --git a/Documentation/md/raid5-cache.txt b/Documentation/md/raid5-cache.txt
deleted file mode 100644
index 2b210f295786..000000000000
--- a/Documentation/md/raid5-cache.txt
+++ /dev/null
@@ -1,109 +0,0 @@
-RAID5 cache
-
-Raid 4/5/6 could include an extra disk for data cache besides normal RAID
-disks. The role of RAID disks isn't changed with the cache disk. The cache disk
-caches data to the RAID disks. The cache can be in write-through (supported
-since 4.4) or write-back mode (supported since 4.10). mdadm (supported since
-3.4) has a new option '--write-journal' to create array with cache. Please
-refer to mdadm manual for details. By default (RAID array starts), the cache is
-in write-through mode. A user can switch it to write-back mode by:
-
-echo "write-back" > /sys/block/md0/md/journal_mode
-
-And switch it back to write-through mode by:
-
-echo "write-through" > /sys/block/md0/md/journal_mode
-
-In both modes, all writes to the array will hit cache disk first. This means
-the cache disk must be fast and sustainable.
-
--------------------------------------
-write-through mode:
-
-This mode mainly fixes the 'write hole' issue. For RAID 4/5/6 array, an unclean
-shutdown can cause data in some stripes to not be in consistent state, eg, data
-and parity don't match. The reason is that a stripe write involves several RAID
-disks and it's possible the writes don't hit all RAID disks yet before the
-unclean shutdown. We call an array degraded if it has inconsistent data. MD
-tries to resync the array to bring it back to normal state. But before the
-resync completes, any system crash will expose the chance of real data
-corruption in the RAID array. This problem is called 'write hole'.
-
-The write-through cache will cache all data on cache disk first. After the data
-is safe on the cache disk, the data will be flushed onto RAID disks. The
-two-step write will guarantee MD can recover correct data after unclean
-shutdown even the array is degraded. Thus the cache can close the 'write hole'.
-
-In write-through mode, MD reports IO completion to upper layer (usually
-filesystems) after the data is safe on RAID disks, so cache disk failure
-doesn't cause data loss. Of course cache disk failure means the array is
-exposed to 'write hole' again.
-
-In write-through mode, the cache disk isn't required to be big. Several
-hundreds megabytes are enough.
-
---------------------------------------
-write-back mode:
-
-write-back mode fixes the 'write hole' issue too, since all write data is
-cached on cache disk. But the main goal of 'write-back' cache is to speed up
-write. If a write crosses all RAID disks of a stripe, we call it full-stripe
-write. For non-full-stripe writes, MD must read old data before the new parity
-can be calculated. These synchronous reads hurt write throughput. Some writes
-which are sequential but not dispatched in the same time will suffer from this
-overhead too. Write-back cache will aggregate the data and flush the data to
-RAID disks only after the data becomes a full stripe write. This will
-completely avoid the overhead, so it's very helpful for some workloads. A
-typical workload which does sequential write followed by fsync is an example.
-
-In write-back mode, MD reports IO completion to upper layer (usually
-filesystems) right after the data hits cache disk. The data is flushed to raid
-disks later after specific conditions met. So cache disk failure will cause
-data loss.
-
-In write-back mode, MD also caches data in memory. The memory cache includes
-the same data stored on cache disk, so a power loss doesn't cause data loss.
-The memory cache size has performance impact for the array. It's recommended
-the size is big. A user can configure the size by:
-
-echo "2048" > /sys/block/md0/md/stripe_cache_size
-
-Too small cache disk will make the write aggregation less efficient in this
-mode depending on the workloads. It's recommended to use a cache disk with at
-least several gigabytes size in write-back mode.
-
---------------------------------------
-The implementation:
-
-The write-through and write-back cache use the same disk format. The cache disk
-is organized as a simple write log. The log consists of 'meta data' and 'data'
-pairs. The meta data describes the data. It also includes checksum and sequence
-ID for recovery identification. Data can be IO data and parity data. Data is
-checksumed too. The checksum is stored in the meta data ahead of the data. The
-checksum is an optimization because MD can write meta and data freely without
-worry about the order. MD superblock has a field pointed to the valid meta data
-of log head.
-
-The log implementation is pretty straightforward. The difficult part is the
-order in which MD writes data to cache disk and RAID disks. Specifically, in
-write-through mode, MD calculates parity for IO data, writes both IO data and
-parity to the log, writes the data and parity to RAID disks after the data and
-parity is settled down in log and finally the IO is finished. Read just reads
-from raid disks as usual.
-
-In write-back mode, MD writes IO data to the log and reports IO completion. The
-data is also fully cached in memory at that time, which means read must query
-memory cache. If some conditions are met, MD will flush the data to RAID disks.
-MD will calculate parity for the data and write parity into the log. After this
-is finished, MD will write both data and parity into RAID disks, then MD can
-release the memory cache. The flush conditions could be stripe becomes a full
-stripe write, free cache disk space is low or free in-kernel memory cache space
-is low.
-
-After an unclean shutdown, MD does recovery. MD reads all meta data and data
-from the log. The sequence ID and checksum will help us detect corrupted meta
-data and data. If MD finds a stripe with data and valid parities (1 parity for
-raid4/5 and 2 for raid6), MD will write the data and parities to RAID disks. If
-parities are incompleted, they are discarded. If part of data is corrupted,
-they are discarded too. MD then loads valid data and writes them to RAID disks
-in normal way.
diff --git a/Documentation/md/raid5-ppl.rst b/Documentation/md/raid5-ppl.rst
new file mode 100644
index 000000000000..357e5515bc55
--- /dev/null
+++ b/Documentation/md/raid5-ppl.rst
@@ -0,0 +1,47 @@
+==================
+Partial Parity Log
+==================
+
+Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
+addressed by PPL is that after a dirty shutdown, parity of a particular stripe
+may become inconsistent with data on other member disks. If the array is also
+in degraded state, there is no way to recalculate parity, because one of the
+disks is missing. This can lead to silent data corruption when rebuilding the
+array or using it as degraded - data calculated from parity for array blocks
+that have not been touched by a write request during the unclean shutdown can
+be incorrect. Such a condition is known as the RAID5 Write Hole. Because of
+this, md by default does not allow starting a dirty degraded array.
+
+Partial parity for a write operation is the XOR of stripe data chunks not
+modified by this write. It is just enough data needed for recovering from the
+write hole. XORing partial parity with the modified chunks produces parity for
+the stripe, consistent with its state before the write operation, regardless of
+which chunk writes have completed. If one of the unmodified data disks of
+this stripe is missing, this updated parity can be used to recover its
+contents. PPL recovery is also performed when starting an array after an
+unclean shutdown and all disks are available, eliminating the need to resync
+the array. Because of this, using a write-intent bitmap and PPL together is
+not supported.
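
A minimal sketch of the XOR arithmetic described above, not md's actual implementation. The chunk size `CHUNK` and the helper names `xor_into`, `partial_parity` and `recover_parity` are hypothetical and chosen only for illustration. It models a stripe as an array of small data chunks, logs the XOR of the chunks a write does not touch, and later combines that log with whatever the written chunks hold to obtain a usable parity.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 8  /* hypothetical chunk size in bytes, for illustration only */

/* dst ^= src, byte by byte */
static void xor_into(unsigned char *dst, const unsigned char *src)
{
    for (size_t i = 0; i < CHUNK; i++)
        dst[i] ^= src[i];
}

/*
 * Partial parity for a write: XOR of the data chunks the write does
 * NOT modify. modified[i] is nonzero for chunks touched by the write.
 */
static void partial_parity(unsigned char data[][CHUNK], const int *modified,
                           size_t nchunks, unsigned char *pp)
{
    memset(pp, 0, CHUNK);
    for (size_t i = 0; i < nchunks; i++)
        if (!modified[i])
            xor_into(pp, data[i]);
}

/*
 * Recovery: XOR the logged partial parity with whatever the modified
 * chunks currently hold, giving a parity consistent with those chunks.
 */
static void recover_parity(unsigned char data[][CHUNK], const int *modified,
                           size_t nchunks, const unsigned char *pp,
                           unsigned char *parity)
{
    memcpy(parity, pp, CHUNK);
    for (size_t i = 0; i < nchunks; i++)
        if (modified[i])
            xor_into(parity, data[i]);
}
```

With the recovered parity, a missing unmodified chunk can be rebuilt by XORing the parity with all remaining data chunks, whether or not the interrupted write reached the disk.
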
+
+When handling a write request, PPL writes partial parity before new data and
+parity are dispatched to disks. PPL is a distributed log - it is stored on
+array member drives in the metadata area, on the parity drive of a particular
+stripe.  It does not require a dedicated journaling drive. Write performance is
+reduced by up to 30%-40% but it scales with the number of drives in the array
+and the journaling drive does not become a bottleneck or a single point of
+failure.
+
+Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
+not a true journal. It does not protect from losing in-flight data, only from
+silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
+performed for this stripe (parity is not updated). So it is possible to have
+arbitrary data in the written part of a stripe if that disk is lost. In such
+a case the behavior is the same as in plain raid5.
+
+PPL is available for md version-1 metadata and external (specifically IMSM)
+metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.
+
+PPL supports a maximum of 64 disks in the array. This limit keeps the data
+structures and implementation simple. RAID5 arrays with that many disks are
+unlikely because of the high risk of multiple disk failures, so the
+restriction should not be a limitation in practice.
diff --git a/Documentation/md/raid5-ppl.txt b/Documentation/md/raid5-ppl.txt
deleted file mode 100644
index bfa092589e00..000000000000
--- a/Documentation/md/raid5-ppl.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-Partial Parity Log
-
-Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
-addressed by PPL is that after a dirty shutdown, parity of a particular stripe
-may become inconsistent with data on other member disks. If the array is also
-in degraded state, there is no way to recalculate parity, because one of the
-disks is missing. This can lead to silent data corruption when rebuilding the
-array or using it is as degraded - data calculated from parity for array blocks
-that have not been touched by a write request during the unclean shutdown can
-be incorrect. Such condition is known as the RAID5 Write Hole. Because of
-this, md by default does not allow starting a dirty degraded array.
-
-Partial parity for a write operation is the XOR of stripe data chunks not
-modified by this write. It is just enough data needed for recovering from the
-write hole. XORing partial parity with the modified chunks produces parity for
-the stripe, consistent with its state before the write operation, regardless of
-which chunk writes have completed. If one of the not modified data disks of
-this stripe is missing, this updated parity can be used to recover its
-contents. PPL recovery is also performed when starting an array after an
-unclean shutdown and all disks are available, eliminating the need to resync
-the array. Because of this, using write-intent bitmap and PPL together is not
-supported.
-
-When handling a write request PPL writes partial parity before new data and
-parity are dispatched to disks. PPL is a distributed log - it is stored on
-array member drives in the metadata area, on the parity drive of a particular
-stripe.  It does not require a dedicated journaling drive. Write performance is
-reduced by up to 30%-40% but it scales with the number of drives in the array
-and the journaling drive does not become a bottleneck or a single point of
-failure.
-
-Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
-not a true journal. It does not protect from losing in-flight data, only from
-silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
-performed for this stripe (parity is not updated). So it is possible to have
-arbitrary data in the written part of a stripe if that disk is lost. In such
-case the behavior is the same as in plain raid5.
-
-PPL is available for md version-1 metadata and external (specifically IMSM)
-metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.
-
-There is a limitation of maximum 64 disks in the array for PPL. It allows to
-keep data structures and implementation simple. RAID5 arrays with so many disks
-are not likely due to high risk of multiple disks failure. Such restriction
-should not be a real life limitation.
-- 
cgit v1.2.3-55-g7522


From 6e58e2d81367308ffd891bd0b34d47e9104e7ae4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 13:44:38 -0300
Subject: docs: mtd: convert to ReST

Rename the mtd documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

It should be noted that Sphinx doesn't handle URLs with dots in the
middle very well. Thankfully, internally, the '.' char is translated to
%2E, so we can just use %2E instead of dots, and this will work fine on
both text and processed files.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/mtd/index.rst     |  12 +
 Documentation/mtd/intel-spi.rst |  90 +++++
 Documentation/mtd/intel-spi.txt |  88 -----
 Documentation/mtd/nand_ecc.rst  | 763 ++++++++++++++++++++++++++++++++++++++++
 Documentation/mtd/nand_ecc.txt  | 714 -------------------------------------
 Documentation/mtd/spi-nor.rst   |  66 ++++
 Documentation/mtd/spi-nor.txt   |  65 ----
 drivers/mtd/nand/raw/nand_ecc.c |   2 +-
 8 files changed, 932 insertions(+), 868 deletions(-)
 create mode 100644 Documentation/mtd/index.rst
 create mode 100644 Documentation/mtd/intel-spi.rst
 delete mode 100644 Documentation/mtd/intel-spi.txt
 create mode 100644 Documentation/mtd/nand_ecc.rst
 delete mode 100644 Documentation/mtd/nand_ecc.txt
 create mode 100644 Documentation/mtd/spi-nor.rst
 delete mode 100644 Documentation/mtd/spi-nor.txt

diff --git a/Documentation/mtd/index.rst b/Documentation/mtd/index.rst
new file mode 100644
index 000000000000..4fdae418ac97
--- /dev/null
+++ b/Documentation/mtd/index.rst
@@ -0,0 +1,12 @@
+:orphan:
+
+==============================
+Memory Technology Device (MTD)
+==============================
+
+.. toctree::
+   :maxdepth: 1
+
+   intel-spi
+   nand_ecc
+   spi-nor
diff --git a/Documentation/mtd/intel-spi.rst b/Documentation/mtd/intel-spi.rst
new file mode 100644
index 000000000000..0e6d9cd5388d
--- /dev/null
+++ b/Documentation/mtd/intel-spi.rst
@@ -0,0 +1,90 @@
+==============================
+Upgrading BIOS using intel-spi
+==============================
+
+Many Intel CPUs like Baytrail and Braswell include an SPI serial flash host
+controller for the flash that holds the BIOS and other platform-specific
+data. Since the contents of the SPI serial flash are crucial for the machine
+to function, it is typically protected by different hardware protection
+mechanisms to avoid accidental (or on purpose) overwrite of the content.
+
+Not all manufacturers protect the SPI serial flash, mainly because it
+allows upgrading the BIOS image directly from an OS.
+
+The intel-spi driver makes it possible to read and write the SPI serial
+flash, if certain protection bits are not set and locked. If it finds
+any of them set, the whole MTD device is made read-only to prevent
+partial overwrites. By default the driver exposes SPI serial flash
+contents as read-only but it can be changed from kernel command line,
+passing "intel-spi.writeable=1".
+
+Please keep in mind that overwriting the BIOS image on SPI serial flash
+might render the machine unbootable and requires special equipment like
+Dediprog to revive. You have been warned!
+
+Below are the steps to upgrade the MinnowBoard MAX BIOS directly from
+Linux.
+
+ 1) Download and extract the latest MinnowBoard MAX BIOS SPI image
+    [1]. At the time of writing, the latest image is v92.
+
+ 2) Install mtd-utils package [2]. We need this in order to erase the SPI
+    serial flash. Distros like Debian and Fedora have this prepackaged under
+    the name "mtd-utils".
+
+ 3) Add "intel-spi.writeable=1" to the kernel command line and reboot
+    the board (you can also reload the driver passing "writeable=1" as
+    module parameter to modprobe).
+
+ 4) Once the board is up and running again, find the right MTD partition
+    (it is named "BIOS")::
+
+	# cat /proc/mtd
+	dev:    size   erasesize  name
+	mtd0: 00800000 00001000 "BIOS"
+
+    So here it will be /dev/mtd0 but it may vary.
+
+ 5) Make a backup of the existing image first::
+
+	# dd if=/dev/mtd0ro of=bios.bak
+	16384+0 records in
+	16384+0 records out
+	8388608 bytes (8.4 MB) copied, 10.0269 s, 837 kB/s
+
+ 6) Verify the backup::
+
+	# sha1sum /dev/mtd0ro bios.bak
+	fdbb011920572ca6c991377c4b418a0502668b73  /dev/mtd0ro
+	fdbb011920572ca6c991377c4b418a0502668b73  bios.bak
+
+    The SHA1 sums must match. Otherwise do not continue any further!
+
+ 7) Erase the SPI serial flash. After this step, do not reboot the
+    board! Otherwise it will not start anymore::
+
+	# flash_erase /dev/mtd0 0 0
+	Erasing 4 Kibyte @ 7ff000 -- 100 % complete
+
+ 8) Once completed without errors you can write the new BIOS image::
+
+	# dd if=MNW2MAX1.X64.0092.R01.1605221712.bin of=/dev/mtd0
+
+ 9) Verify that the new content of the SPI serial flash matches the new
+    BIOS image::
+
+	# sha1sum /dev/mtd0ro MNW2MAX1.X64.0092.R01.1605221712.bin
+	9b4df9e4be2057fceec3a5529ec3d950836c87a2  /dev/mtd0ro
+	9b4df9e4be2057fceec3a5529ec3d950836c87a2 MNW2MAX1.X64.0092.R01.1605221712.bin
+
+    The SHA1 sums should match.
+
+ 10) Now you can reboot your board and observe the new BIOS starting up
+     properly.
+
+References
+----------
+
+[1] https://firmware.intel.com/sites/default/files/MinnowBoard%2EMAX_%2EX64%2E92%2ER01%2Ezip
+
+[2] http://www.linux-mtd.infradead.org/
diff --git a/Documentation/mtd/intel-spi.txt b/Documentation/mtd/intel-spi.txt
deleted file mode 100644
index bc357729c2cb..000000000000
--- a/Documentation/mtd/intel-spi.txt
+++ /dev/null
@@ -1,88 +0,0 @@
-Upgrading BIOS using intel-spi
-------------------------------
-
-Many Intel CPUs like Baytrail and Braswell include SPI serial flash host
-controller which is used to hold BIOS and other platform specific data.
-Since contents of the SPI serial flash is crucial for machine to function,
-it is typically protected by different hardware protection mechanisms to
-avoid accidental (or on purpose) overwrite of the content.
-
-Not all manufacturers protect the SPI serial flash, mainly because it
-allows upgrading the BIOS image directly from an OS.
-
-The intel-spi driver makes it possible to read and write the SPI serial
-flash, if certain protection bits are not set and locked. If it finds
-any of them set, the whole MTD device is made read-only to prevent
-partial overwrites. By default the driver exposes SPI serial flash
-contents as read-only but it can be changed from kernel command line,
-passing "intel-spi.writeable=1".
-
-Please keep in mind that overwriting the BIOS image on SPI serial flash
-might render the machine unbootable and requires special equipment like
-Dediprog to revive. You have been warned!
-
-Below are the steps how to upgrade MinnowBoard MAX BIOS directly from
-Linux.
-
- 1) Download and extract the latest Minnowboard MAX BIOS SPI image
-    [1]. At the time writing this the latest image is v92.
-
- 2) Install mtd-utils package [2]. We need this in order to erase the SPI
-    serial flash. Distros like Debian and Fedora have this prepackaged with
-    name "mtd-utils".
-
- 3) Add "intel-spi.writeable=1" to the kernel command line and reboot
-    the board (you can also reload the driver passing "writeable=1" as
-    module parameter to modprobe).
-
- 4) Once the board is up and running again, find the right MTD partition
-    (it is named as "BIOS"):
-
-    # cat /proc/mtd
-    dev:    size   erasesize  name
-    mtd0: 00800000 00001000 "BIOS"
-
-    So here it will be /dev/mtd0 but it may vary.
-
- 5) Make backup of the existing image first:
-
-    # dd if=/dev/mtd0ro of=bios.bak
-    16384+0 records in
-    16384+0 records out
-    8388608 bytes (8.4 MB) copied, 10.0269 s, 837 kB/s
-
- 6) Verify the backup
-
-    # sha1sum /dev/mtd0ro bios.bak
-    fdbb011920572ca6c991377c4b418a0502668b73  /dev/mtd0ro
-    fdbb011920572ca6c991377c4b418a0502668b73  bios.bak
-
-    The SHA1 sums must match. Otherwise do not continue any further!
-
- 7) Erase the SPI serial flash. After this step, do not reboot the
-    board! Otherwise it will not start anymore.
-
-    # flash_erase /dev/mtd0 0 0
-    Erasing 4 Kibyte @ 7ff000 -- 100 % complete
-
- 8) Once completed without errors you can write the new BIOS image:
-
-    # dd if=MNW2MAX1.X64.0092.R01.1605221712.bin of=/dev/mtd0
-
- 9) Verify that the new content of the SPI serial flash matches the new
-    BIOS image:
-
-    # sha1sum /dev/mtd0ro MNW2MAX1.X64.0092.R01.1605221712.bin
-    9b4df9e4be2057fceec3a5529ec3d950836c87a2  /dev/mtd0ro
-    9b4df9e4be2057fceec3a5529ec3d950836c87a2 MNW2MAX1.X64.0092.R01.1605221712.bin
-
-    The SHA1 sums should match.
-
- 10) Now you can reboot your board and observe the new BIOS starting up
-     properly.
-
-References
-----------
-
-[1] https://firmware.intel.com/sites/default/files/MinnowBoard.MAX_.X64.92.R01.zip
-[2] http://www.linux-mtd.infradead.org/
diff --git a/Documentation/mtd/nand_ecc.rst b/Documentation/mtd/nand_ecc.rst
new file mode 100644
index 000000000000..e8d3c53a5056
--- /dev/null
+++ b/Documentation/mtd/nand_ecc.rst
@@ -0,0 +1,763 @@
+==========================
+NAND Error-correction Code
+==========================
+
+Introduction
+============
+
+Having looked at the linux mtd/nand driver and more specifically at nand_ecc.c
+I felt there was room for optimisation. I bashed the code for a few hours
+performing tricks like table lookup, removing superfluous code, etc.
+After that the speed was increased by 35-40%.
+Still I was not too happy as I felt there was additional room for improvement.
+
+Bad! I was hooked.
+I decided to annotate my steps in this file. Perhaps it is useful to someone
+or someone learns something from it.
+
+
+The problem
+===========
+
+NAND flash (at least SLC one) typically has sectors of 256 bytes.
+However NAND flash is not extremely reliable so some error detection
+(and sometimes correction) is needed.
+
+This is done by means of a Hamming code. I'll try to explain it in
+layman's terms (and apologies to all the pros in the field in case I do
+not use the right terminology, my coding theory class was almost 30
+years ago, and I must admit it was not one of my favourites).
+
+As I said before the ecc calculation is performed on sectors of 256
+bytes. This is done by calculating several parity bits over the rows and
+columns. The parity used is even parity, which means that the parity bit = 1
+if the data over which the parity is calculated contains an odd number of 1
+bits and the parity bit = 0 if it contains an even number of 1 bits. So the
+total number of 1 bits in the data over which the parity is calculated + the
+parity bit is even. (see wikipedia if you can't follow this).
+Parity is often calculated by means of an exclusive or operation,
+sometimes also referred to as xor. In C the operator for xor is ^
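
A minimal sketch of the even-parity rule above for a single byte, using only the ^ operator and shifts. The function name `byte_parity` is hypothetical and not part of the driver; the result is the parity bit that makes the total number of 1 bits even.

```c
/* Returns 1 if b contains an odd number of 1 bits, otherwise 0. */
static unsigned char byte_parity(unsigned char b)
{
    b ^= b >> 4;   /* fold the upper nibble onto the lower nibble */
    b ^= b >> 2;   /* fold 4 bits down to 2 */
    b ^= b >> 1;   /* fold 2 bits down to 1 */
    return b & 1;  /* bit 0 now holds the xor of all 8 original bits */
}
```

For example, 0x07 has three 1 bits, so its parity bit is 1, while 0xff has eight 1 bits and its parity bit is 0.
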
+
+Back to ecc.
+Let's give a small figure:
+
+=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
+byte   0:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp4 ... rp14
+byte   1:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp2 rp4 ... rp14
+byte   2:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp4 ... rp14
+byte   3:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp4 ... rp14
+byte   4:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp5 ... rp14
+...
+byte 254:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp5 ... rp15
+byte 255:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp5 ... rp15
+           cp1  cp0  cp1  cp0  cp1  cp0  cp1  cp0
+           cp3  cp3  cp2  cp2  cp3  cp3  cp2  cp2
+           cp5  cp5  cp5  cp5  cp4  cp4  cp4  cp4
+=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
+
+This figure represents a sector of 256 bytes.
+cp is my abbreviation for column parity, rp for row parity.
+
+Let's start to explain column parity.
+
+- cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
+
+  so the sum of all bit0, bit2, bit4 and bit6 values + cp0 itself is even.
+
+- cp1 is the parity over all bit1, bit3, bit5 and bit7.
+
+- cp2 is the parity over bit0, bit1, bit4 and bit5
+- cp3 is the parity over bit2, bit3, bit6 and bit7.
+- cp4 is the parity over bit0, bit1, bit2 and bit3.
+- cp5 is the parity over bit4, bit5, bit6 and bit7.
+
+Note that each of cp0 .. cp5 is exactly one bit.
+
+Row parity actually works almost the same.
+
+- rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254)
+- rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255)
+- rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ...
+  (so handle two bytes, then skip 2 bytes).
+- rp3 covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...)
+- for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc.
+
+  so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...
+- and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, ...
+
+The story now becomes quite boring. I guess you get the idea.
+
+- rp6 covers 8 bytes then skips 8 etc
+- rp7 skips 8 bytes then covers 8 etc
+- rp8 covers 16 bytes then skips 16 etc
+- rp9 skips 16 bytes then covers 16 etc
+- rp10 covers 32 bytes then skips 32 etc
+- rp11 skips 32 bytes then covers 32 etc
+- rp12 covers 64 bytes then skips 64 etc
+- rp13 skips 64 bytes then covers 64 etc
+- rp14 covers 128 bytes then skips 128
+- rp15 skips 128 bytes then covers 128
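
The row parity rules above follow one pattern: bit k of the byte index decides whether a byte belongs to rp(2k+1) (bit set) or rp(2k) (bit clear). A minimal sketch of that pattern follows, written for clarity rather than speed. The function name `row_parity` and the `rp[16]` output layout are hypothetical.

```c
/*
 * Compute rp0..rp15 for a 256-byte sector.
 * For every byte index i and every index bit k (0..7), the byte is
 * xor-ed into rp[2k+1] when bit k of i is set and into rp[2k] otherwise.
 * On return each rp[n] is a single parity bit (0 or 1).
 */
static void row_parity(const unsigned char *sector, unsigned char rp[16])
{
    unsigned char acc[16] = {0};

    for (int i = 0; i < 256; i++)
        for (int k = 0; k < 8; k++)
            acc[2 * k + ((i >> k) & 1)] ^= sector[i];

    for (int n = 0; n < 16; n++) {
        unsigned char b = acc[n];
        b ^= b >> 4;          /* reduce the accumulated byte ... */
        b ^= b >> 2;
        b ^= b >> 1;
        rp[n] = b & 1;        /* ... to one parity bit */
    }
}
```

A single 1 bit in byte 5 (binary 00000101) would flip rp1, rp2, rp5, rp6, rp8, rp10, rp12 and rp14, which matches the coverage lists above.
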
+
+In the end the parity bits are grouped together in three bytes as
+follows:
+
+=====  ===== ===== ===== ===== ===== ===== ===== =====
+ECC    Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
+=====  ===== ===== ===== ===== ===== ===== ===== =====
+ECC 0   rp07  rp06  rp05  rp04  rp03  rp02  rp01  rp00
+ECC 1   rp15  rp14  rp13  rp12  rp11  rp10  rp09  rp08
+ECC 2   cp5   cp4   cp3   cp2   cp1   cp0      1     1
+=====  ===== ===== ===== ===== ===== ===== ===== =====
+
+I discovered after writing this that ST application note AN1823
+(http://www.st.com/stonline/) gives a much nicer picture (but they use
+line parity as the term where I use row parity).
+Oh well, I'm graphically challenged, so suffer with me for a moment :-)
+
+And I could not reuse the ST picture anyway for copyright reasons.
+
+
+Attempt 0
+=========
+
+Implementing the parity calculation is pretty simple.
+In C pseudocode::
+
+  for (i = 0; i < 256; i++)
+  {
+    if (i & 0x01)
+      rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
+    else
+      rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;
+    if (i & 0x02)
+      rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;
+    else
+      rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2;
+    if (i & 0x04)
+      rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5;
+    else
+      rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4;
+    if (i & 0x08)
+      rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7;
+    else
+      rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6;
+    if (i & 0x10)
+      rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9;
+    else
+      rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8;
+    if (i & 0x20)
+      rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;
+    else
+      rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;
+    if (i & 0x40)
+      rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;
+    else
+      rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12;
+    if (i & 0x80)
+      rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15;
+    else
+      rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14;
+    cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0;
+    cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1;
+    cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2;
+    cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3;
+    cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4;
+    cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5;
+  }
+
+
+Analysis 0
+==========
+
+C does have bitwise operators but not really operators to do the above
+efficiently (and most hardware has no such instructions either).
+Therefore without implementing this it was clear that the code above was
+not going to bring me a Nobel prize :-)
+
+Fortunately the exclusive or operation is commutative and associative, so we
+can combine the values in any order. So instead of calculating all the bits
+individually, let us try to rearrange things.
+For the column parity this is easy. We can just xor the bytes and in the
+end filter out the relevant bits. This is pretty nice as it will bring
+all cp calculation out of the for loop.
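
A minimal sketch of that column parity idea: xor all 256 bytes into one byte, then filter the relevant bits with a mask per cp and reduce each to one bit. The function name `column_parity` and the packing of cpN into bit N of the return value are hypothetical choices for this illustration. The masks follow the cp0 .. cp5 definitions given earlier.

```c
/*
 * Returns a value whose bit n holds cpn (n = 0..5) for a 256-byte sector.
 */
static unsigned char column_parity(const unsigned char *sector)
{
    /* bit positions covered by cp0, cp1, cp2, cp3, cp4, cp5 */
    static const unsigned char mask[6] = {0x55, 0xaa, 0x33, 0xcc, 0x0f, 0xf0};
    unsigned char par = 0;
    unsigned char cp = 0;

    for (int i = 0; i < 256; i++)
        par ^= sector[i];             /* the only work left in the loop */

    for (int n = 0; n < 6; n++) {
        unsigned char b = par & mask[n];
        b ^= b >> 4;                  /* parity of the selected bits */
        b ^= b >> 2;
        b ^= b >> 1;
        cp |= (unsigned char)((b & 1) << n);
    }
    return cp;
}
```
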
+
+Similarly we can first xor the bytes for the various rows.
+This leads to:
+
+
+Attempt 1
+=========
+
+::
+
+  const char parity[256] = {
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
+  };
+
+  void ecc1(const unsigned char *buf, unsigned char *code)
+  {
+      int i;
+      const unsigned char *bp = buf;
+      unsigned char cur;
+      unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
+      unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
+      unsigned char par;
+
+      par = 0;
+      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
+      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
+      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
+      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
+
+      for (i = 0; i < 256; i++)
+      {
+          cur = *bp++;
+          par ^= cur;
+          if (i & 0x01) rp1 ^= cur; else rp0 ^= cur;
+          if (i & 0x02) rp3 ^= cur; else rp2 ^= cur;
+          if (i & 0x04) rp5 ^= cur; else rp4 ^= cur;
+          if (i & 0x08) rp7 ^= cur; else rp6 ^= cur;
+          if (i & 0x10) rp9 ^= cur; else rp8 ^= cur;
+          if (i & 0x20) rp11 ^= cur; else rp10 ^= cur;
+          if (i & 0x40) rp13 ^= cur; else rp12 ^= cur;
+          if (i & 0x80) rp15 ^= cur; else rp14 ^= cur;
+      }
+      code[0] =
+          (parity[rp7] << 7) |
+          (parity[rp6] << 6) |
+          (parity[rp5] << 5) |
+          (parity[rp4] << 4) |
+          (parity[rp3] << 3) |
+          (parity[rp2] << 2) |
+          (parity[rp1] << 1) |
+          (parity[rp0]);
+      code[1] =
+          (parity[rp15] << 7) |
+          (parity[rp14] << 6) |
+          (parity[rp13] << 5) |
+          (parity[rp12] << 4) |
+          (parity[rp11] << 3) |
+          (parity[rp10] << 2) |
+          (parity[rp9]  << 1) |
+          (parity[rp8]);
+      code[2] =
+          (parity[par & 0xf0] << 7) |
+          (parity[par & 0x0f] << 6) |
+          (parity[par & 0xcc] << 5) |
+          (parity[par & 0x33] << 4) |
+          (parity[par & 0xaa] << 3) |
+          (parity[par & 0x55] << 2);
+      code[0] = ~code[0];
+      code[1] = ~code[1];
+      code[2] = ~code[2];
+  }
+
+Still pretty straightforward. The last three invert statements are there to
+give a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash
+all data is 0xff, so the checksum then matches.
+
+I also introduced the parity lookup. I expected this to be the fastest
+way to calculate the parity, but I will investigate alternatives later
+on.
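
The 256-entry lookup table does not have to be typed in by hand. A minimal sketch of generating it follows; the function name `build_parity_table` is hypothetical. It relies on the fact that the parity of v equals the parity of v >> 1 xor-ed with the lowest bit of v.

```c
/* Fill table[v] with 1 when v has an odd number of 1 bits, else 0. */
static void build_parity_table(char table[256])
{
    table[0] = 0;
    for (int v = 1; v < 256; v++)
        table[v] = (char)(table[v >> 1] ^ (v & 1));
}
```

The first entries this produces are 0, 1, 1, 0, 1, 0, 0, 1, which agrees with the first row of the table in the listing above.
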
+
+
+Analysis 1
+==========
+
+The code works, but is not terribly efficient. On my system it took
+almost 4 times as much time as the linux driver code. But hey, if it was
+*that* easy this would have been done long before.
+No pain, no gain.
+
+Fortunately there is plenty of room for improvement.
+
+In step 1 we moved from bit-wise calculation to byte-wise calculation.
+However in C we can also use the unsigned long data type and virtually
+every modern microprocessor supports 32 bit operations, so why not try
+to write our code in such a way that we process data in 32 bit chunks?
+
+Of course this means some modification as the row parity is byte by
+byte. A quick analysis:
+For the column parity we use the par variable. When extending to 32 bits
+we can in the end easily calculate rp0 and rp1 from it
+(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0
+respectively, from MSB to LSB).
+Also rp2 and rp3 can be easily retrieved from par as rp3 covers the
+two most significant bytes and rp2 covers the two least significant bytes.
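
A minimal sketch of that extraction, assuming the sector was loaded as little-endian 32-bit words, so that the least significant byte of par collects bytes 0, 4, 8, ... of the sector. It uses `uint32_t` instead of unsigned long because the width of unsigned long differs between platforms. The function name `fold_par` is hypothetical. The outputs are byte-wide xor values that still need to be reduced to a single parity bit each.

```c
#include <stdint.h>

/*
 * Split the 32-bit xor of all words into the byte-wide accumulators
 * for rp0..rp3 (little-endian word loads assumed).
 */
static void fold_par(uint32_t par, uint8_t *rp0, uint8_t *rp1,
                     uint8_t *rp2, uint8_t *rp3)
{
    uint8_t b0 = par & 0xff;          /* sector bytes 0, 4, 8, ...  */
    uint8_t b1 = (par >> 8) & 0xff;   /* sector bytes 1, 5, 9, ...  */
    uint8_t b2 = (par >> 16) & 0xff;  /* sector bytes 2, 6, 10, ... */
    uint8_t b3 = (par >> 24) & 0xff;  /* sector bytes 3, 7, 11, ... */

    *rp0 = b0 ^ b2;   /* even bytes */
    *rp1 = b1 ^ b3;   /* odd bytes */
    *rp2 = b0 ^ b1;   /* two least significant bytes */
    *rp3 = b2 ^ b3;   /* two most significant bytes */
}
```

On a big-endian machine the byte positions would be mirrored, which is the byte ordering concern raised in the text.
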
+
+Note that of course now the loop is executed only 64 times (256/4).
+And note that care must be taken with respect to byte ordering. The way
+bytes are ordered in a long is machine dependent, and might affect us.
+Anyway, if there is an issue: this code is developed on x86 (to be
+precise: a DELL PC with a D920 Intel CPU).
+
+And of course the performance might depend on alignment, but I expect
+that the I/O buffers in the nand driver are aligned properly (and
+otherwise that should be fixed to get maximum performance).
+
+Let's give it a try...
+
+
+Attempt 2
+=========
+
+::
+
+  extern const char parity[256];
+
+  void ecc2(const unsigned char *buf, unsigned char *code)
+  {
+      int i;
+      const unsigned long *bp = (unsigned long *)buf;
+      unsigned long cur;
+      unsigned long rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
+      unsigned long rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
+      unsigned long par;
+
+      par = 0;
+      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
+      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
+      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
+      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
+
+      for (i = 0; i < 64; i++)
+      {
+          cur = *bp++;
+          par ^= cur;
+          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
+          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
+          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
+          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
+          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
+          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
+      }
+      /*
+         we need to adapt the code generation for the fact that rp vars are now
+         long; also the column parity calculation needs to be changed.
+         we'll bring rp4 to 15 back to single byte entities by shifting and
+         xoring
+      */
+      rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff;
+      rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff;
+      rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff;
+      rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff;
+      rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff;
+      rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff;
+      rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff;
+      rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff;
+      rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff;
+      rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff;
+      rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff;
+      rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff;
+      rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff;
+      rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff;
+      par ^= (par >> 16);
+      rp1 = (par >> 8); rp1 &= 0xff;
+      rp0 = (par & 0xff);
+      par ^= (par >> 8); par &= 0xff;
+
+      code[0] =
+          (parity[rp7] << 7) |
+          (parity[rp6] << 6) |
+          (parity[rp5] << 5) |
+          (parity[rp4] << 4) |
+          (parity[rp3] << 3) |
+          (parity[rp2] << 2) |
+          (parity[rp1] << 1) |
+          (parity[rp0]);
+      code[1] =
+          (parity[rp15] << 7) |
+          (parity[rp14] << 6) |
+          (parity[rp13] << 5) |
+          (parity[rp12] << 4) |
+          (parity[rp11] << 3) |
+          (parity[rp10] << 2) |
+          (parity[rp9]  << 1) |
+          (parity[rp8]);
+      code[2] =
+          (parity[par & 0xf0] << 7) |
+          (parity[par & 0x0f] << 6) |
+          (parity[par & 0xcc] << 5) |
+          (parity[par & 0x33] << 4) |
+          (parity[par & 0xaa] << 3) |
+          (parity[par & 0x55] << 2);
+      code[0] = ~code[0];
+      code[1] = ~code[1];
+      code[2] = ~code[2];
+  }
+
+The parity array is not shown any more. Note also that for these
+examples I kinda deviated from my regular programming style by allowing
+multiple statements on a line, not using { } in then and else blocks
+with only a single statement and by using operators like ^=
+
+
+Analysis 2
+==========
+
+The code (of course) works, and hurray: we are a little bit faster than
+the linux driver code (about 15%). But wait, don't cheer too quickly.
+There is more to be gained.
+If we look at e.g. rp14 and rp15 we see that we either xor our data with
+rp14 or with rp15. However we also have par which goes over all data.
+This means there is no need to calculate rp14 as it can be calculated from
+rp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15;
+(or if desired we can avoid calculating rp15 and calculate it from
+rp14).  That is why some places refer to inverse parity.
+Of course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.
+Effectively this means we can eliminate the else clause from the if
+statements. Also we can optimise the calculation in the end a little bit
+by going from long to byte first. Actually we can even avoid the table
+lookups.
+
+Attempt 3
+=========
+
+Replaced::
+
+          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
+          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
+          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
+          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
+          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
+          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
+
+with::
+
+          if (i & 0x01) rp5 ^= cur;
+          if (i & 0x02) rp7 ^= cur;
+          if (i & 0x04) rp9 ^= cur;
+          if (i & 0x08) rp11 ^= cur;
+          if (i & 0x10) rp13 ^= cur;
+          if (i & 0x20) rp15 ^= cur;
+
+and outside the loop added::
+
+          rp4  = par ^ rp5;
+          rp6  = par ^ rp7;
+          rp8  = par ^ rp9;
+          rp10  = par ^ rp11;
+          rp12  = par ^ rp13;
+          rp14  = par ^ rp15;
+
+And after that the code takes about 30% more time, although the number of
+statements is reduced. This is also reflected in the assembly code.
+
+
+Analysis 3
+==========
+
+Very weird. Guess it has to do with caching or instruction parallelism
+or so. I also tried on an eeePC (Celeron, clocked at 900 MHz). Interesting
+observation was that this one is only 30% slower (according to time)
+executing the code than my 3 GHz D920 processor.
+
+Well, it was expected not to be easy so maybe instead move to a
+different track: let's move back to the code from attempt 2 and do some
+loop unrolling. This will eliminate a few if statements. I'll try
+different amounts of unrolling to see what works best.
+
+
+Attempt 4
+=========
+
+Unrolled the loop 1, 2, 3 and 4 times.
+For 4 the code starts with::
+
+    for (i = 0; i < 4; i++)
+    {
+        cur = *bp++;
+        par ^= cur;
+        rp4 ^= cur;
+        rp6 ^= cur;
+        rp8 ^= cur;
+        rp10 ^= cur;
+        if (i & 0x1) rp13 ^= cur; else rp12 ^= cur;
+        if (i & 0x2) rp15 ^= cur; else rp14 ^= cur;
+        cur = *bp++;
+        par ^= cur;
+        rp5 ^= cur;
+        rp6 ^= cur;
+        ...
+
+
+Analysis 4
+==========
+
+Unrolling once gains about 15%.
+
+Unrolling twice keeps the gain at about 15%.
+
+Unrolling three times gives a gain of 30% compared to attempt 2.
+
+Unrolling four times gives a marginal improvement compared to unrolling
+three times.
+
+I decided to proceed with a four times unrolled loop anyway. It was my gut
+feeling that in the next steps I would obtain additional gain from it.
+
+The next step was triggered by the fact that par contains the xor of all
+bytes and rp4 and rp5 each contain the xor of half of the bytes.
+So in effect par = rp4 ^ rp5. But as xor is its own inverse we can also say
+that rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can
+eliminate rp5 (or rp4, but I already foresaw another optimisation).
+The same holds for rp6/7, rp8/9, rp10/11, rp12/13 and rp14/15.
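+
+The relation can be checked with a small stand-alone sketch. It is not
+part of the driver; the function name check_pairs and the use of uint32_t
+words are assumptions made for illustration only::
+
+    #include <stdint.h>
+
+    /* Returns 1 if par == rp4 ^ rp5 holds for the 64 words in buf. */
+    int check_pairs(const uint32_t *buf)
+    {
+        uint32_t par = 0, rp4 = 0, rp5 = 0;
+        int i;
+
+        for (i = 0; i < 64; i++) {
+            par ^= buf[i];
+            /* every word goes into exactly one of rp4 and rp5 */
+            if (i & 0x01) rp5 ^= buf[i]; else rp4 ^= buf[i];
+        }
+        /* hence rp5 can be rebuilt from par and rp4 after the loop */
+        return par == (rp4 ^ rp5) && rp5 == (par ^ rp4);
+    }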
+
+
+Attempt 5
+=========
+
+So effectively all odd-numbered rp assignments in the loop were removed.
+This included the else clause of the if statements.
+Of course after the loop we need to correct things by adding code like::
+
+    rp5 = par ^ rp4;
+
+Also the initial assignments (rp5 = 0; etc) could be removed.
+Along the way I also removed the initialisation of rp0/1/2/3.
+
+
+Analysis 5
+==========
+
+Measurements showed this was a good move. The run-time roughly halved
+compared with the four times unrolled attempt 4, and we only require 1/3rd
+of the processor time compared to the current code in the Linux kernel.
+
+However, still I thought there was more. I didn't like all the if
+statements. Why not keep a running parity and only keep the last if
+statement? Time for yet another version!
+
+
+Attempt 6
+=========
+
+The code within the for loop was changed to::
+
+    for (i = 0; i < 4; i++)
+    {
+        cur = *bp++; tmppar  = cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
+
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
+
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp8 ^= cur;
+
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur;
+
+        par ^= tmppar;
+        if ((i & 0x1) == 0) rp12 ^= tmppar;
+        if ((i & 0x2) == 0) rp14 ^= tmppar;
+    }
+
+As you can see tmppar is used to accumulate the parity within a for
+iteration. In the last 3 statements it is added to par and, if needed,
+to rp12 and rp14.
+
+While making the changes I also found that I could exploit that tmppar
+contains the running parity for this iteration. So instead of having:
+rp4 ^= cur; rp6 ^= cur;
+I removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on the next
+statement. A similar change was done for rp8 and rp10.
+
+
+Analysis 6
+==========
+
+Measuring this code again showed a big gain. When executing the original
+Linux code 1 million times, this took about 1 second on my system
+(using time to measure the performance). After this iteration I was back
+to 0.075 sec. Actually I had to decide to start measuring over 10
+million iterations in order not to lose too much accuracy. This one
+definitely seemed to be the jackpot!
+
+There is a little bit more room for improvement though. There are three
+places with statements::
+
+	rp4 ^= cur; rp6 ^= cur;
+
+It seems more efficient to also maintain a variable rp4_6 in the for
+loop. This eliminates 3 statements per loop. Of course after the loop we
+need to correct by adding::
+
+	rp4 ^= rp4_6;
+	rp6 ^= rp4_6;
+
+Furthermore there are 4 sequential assignments to rp8. This can be
+encoded slightly more efficiently by saving tmppar before those 4 lines
+and later do rp8 = rp8 ^ tmppar ^ notrp8;
+(where notrp8 is the value of tmppar before those 4 lines).
+Again this works because xor-ing the same value twice cancels out.
+Time for a new test!
+
+
+Attempt 7
+=========
+
+The new code now looks like::
+
+    for (i = 0; i < 4; i++)
+    {
+        cur = *bp++; tmppar  = cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
+
+        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
+
+        notrp8 = tmppar;
+        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur;
+        rp8 = rp8 ^ tmppar ^ notrp8;
+
+        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur;
+
+        par ^= tmppar;
+        if ((i & 0x1) == 0) rp12 ^= tmppar;
+        if ((i & 0x2) == 0) rp14 ^= tmppar;
+    }
+    rp4 ^= rp4_6;
+    rp6 ^= rp4_6;
+
+
+Not a big change, but every penny counts :-)
+
+
+Analysis 7
+==========
+
+Actually this made things worse. Not very much, but I don't want to move
+in the wrong direction. Maybe something to investigate later. Could
+have to do with caching again.
+
+Guess that is what there is to win within the loop. Maybe unrolling one
+more time will help. I'll keep the optimisations from 7 for now.
+
+
+Attempt 8
+=========
+
+Unrolled the loop one more time.
+
+
+Analysis 8
+==========
+
+This makes things worse. Let's stick with attempt 6 and continue from there.
+Although it seems that the code within the loop cannot be optimised
+further there is still room to optimise the generation of the ecc codes.
+We can simply calculate the total parity. If this is 0 then rp4 = rp5
+etc. If the parity is 1, then rp4 = !rp5;
+
+But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
+in the result byte and then do something like::
+
+    code[0] |= (code[0] << 1);
+
+Let's test this.
+
+
+Attempt 9
+=========
+
+Changed the code but again this slightly degrades performance. Tried all
+kinds of other things, like having dedicated parity arrays to avoid the
+shift after parity[rp7] << 7; No gain.
+Also tried replacing the parity array lookup by shift operators, e.g.
+replacing parity[rp7] << 7 with::
+
+	rp7 ^= (rp7 << 4);
+	rp7 ^= (rp7 << 2);
+	rp7 ^= (rp7 << 1);
+	rp7 &= 0x80;
+
+No gain.
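+
+For reference, the two variants can be compared in a stand-alone sketch.
+The helper names init_parity, parity_lookup_bit7 and parity_shift_bit7 are
+made up for this illustration, and the table is built at run time here
+instead of using the constant array from attempt 1::
+
+    static unsigned char parity[256];
+
+    /* Fill the table: parity[v] is 1 if v has an odd number of 1 bits. */
+    void init_parity(void)
+    {
+        int v, b;
+
+        for (v = 0; v < 256; v++) {
+            parity[v] = 0;
+            for (b = 0; b < 8; b++)
+                parity[v] ^= (v >> b) & 1;
+        }
+    }
+
+    /* Table lookup followed by a shift into bit 7. */
+    unsigned char parity_lookup_bit7(unsigned char rp7)
+    {
+        return (unsigned char)(parity[rp7] << 7);
+    }
+
+    /* Shift/xor folding: bit 7 ends up as the xor of all eight bits. */
+    unsigned char parity_shift_bit7(unsigned char rp7)
+    {
+        rp7 ^= (rp7 << 4);
+        rp7 ^= (rp7 << 2);
+        rp7 ^= (rp7 << 1);
+        return rp7 & 0x80;
+    }
+
+After init_parity() has run, both helpers return the same value for every
+input byte.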
+
+The only marginal change was inverting the parity bits, so we can remove
+the last three invert statements.
+
+Ah well, pity this does not deliver more. Then again 10 million
+iterations using the Linux driver code takes between 13 and 13.5
+seconds, whereas my code now takes about 0.73 seconds for those 10
+million iterations. So basically I've improved the performance by a
+factor of 18 on my system. Not that bad. Of course on different hardware
+you will get different results. No warranties!
+
+But of course there is no such thing as a free lunch. The code size almost
+tripled (from 562 bytes to 1434 bytes). Then again, it is not that much.
+
+
+Correcting errors
+=================
+
+For correcting errors I again used the ST application note as a starter,
+but I also peeked at the existing code.
+
+The algorithm itself is pretty straightforward. Just xor the given and
+the calculated ecc. If all bytes are 0 there is no problem. If 11 bits
+are 1 we have one correctable bit error. If exactly 1 bit is 1, we have an
+error in the given ecc code.
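+
+As a rough sketch of this classification (the function names and return
+codes below are hypothetical, and treating every other bit count as
+uncorrectable is an assumption that is not spelled out above)::
+
+    /* Count the 1 bits in one byte. */
+    static int count_bits(unsigned char v)
+    {
+        int n = 0;
+
+        while (v) {
+            n += v & 1;
+            v >>= 1;
+        }
+        return n;
+    }
+
+    /*
+     * Compare the ecc read from flash with the calculated one.
+     * Returns 0: no error, 1: single correctable bit error in the data,
+     * 2: error in the stored ecc itself, -1: anything else (assumed
+     * to be uncorrectable).
+     */
+    int classify_ecc(const unsigned char *read_ecc,
+                     const unsigned char *calc_ecc)
+    {
+        int bits = count_bits(read_ecc[0] ^ calc_ecc[0]) +
+                   count_bits(read_ecc[1] ^ calc_ecc[1]) +
+                   count_bits(read_ecc[2] ^ calc_ecc[2]);
+
+        if (bits == 0)
+            return 0;
+        if (bits == 11)
+            return 1;
+        if (bits == 1)
+            return 2;
+        return -1;
+    }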
+
+It proved to be fastest to do some table lookups. Performance gain
+introduced by this is about a factor 2 on my system when a repair had to
+be done, and 1% or so if no repair had to be done.
+
+Code size increased from 330 bytes to 686 bytes for this function.
+(gcc 4.2, -O3)
+
+
+Conclusion
+==========
+
+The gain when calculating the ecc is tremendous. On my development hardware
+a speedup of a factor of 18 for ecc calculation was achieved. On a test on an
+embedded system with a MIPS core a factor of 7 was obtained.
+
+On a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
+of 5 (big endian mode, gcc 4.1.2, -O3).
+
+For correction not much gain could be obtained (as bitflips are rare). Then
+again there are also far fewer cycles spent there.
+
+It seems there is not much more gain possible in this, at least when
+programmed in C. Of course it might be possible to squeeze something more
+out of it with an assembler program, but due to pipeline behaviour etc.
+this is very tricky (at least for Intel hardware).
+
+Author: Frans Meulenbroeks
+
+Copyright (C) 2008 Koninklijke Philips Electronics NV.
diff --git a/Documentation/mtd/nand_ecc.txt b/Documentation/mtd/nand_ecc.txt
deleted file mode 100644
index f8c3284bf6a7..000000000000
--- a/Documentation/mtd/nand_ecc.txt
+++ /dev/null
@@ -1,714 +0,0 @@
-Introduction
-============
-
-Having looked at the linux mtd/nand driver and more specific at nand_ecc.c
-I felt there was room for optimisation. I bashed the code for a few hours
-performing tricks like table lookup removing superfluous code etc.
-After that the speed was increased by 35-40%.
-Still I was not too happy as I felt there was additional room for improvement.
-
-Bad! I was hooked.
-I decided to annotate my steps in this file. Perhaps it is useful to someone
-or someone learns something from it.
-
-
-The problem
-===========
-
-NAND flash (at least SLC one) typically has sectors of 256 bytes.
-However NAND flash is not extremely reliable so some error detection
-(and sometimes correction) is needed.
-
-This is done by means of a Hamming code. I'll try to explain it in
-laymans terms (and apologies to all the pro's in the field in case I do
-not use the right terminology, my coding theory class was almost 30
-years ago, and I must admit it was not one of my favourites).
-
-As I said before the ecc calculation is performed on sectors of 256
-bytes. This is done by calculating several parity bits over the rows and
-columns. The parity used is even parity which means that the parity bit = 1
-if the data over which the parity is calculated is 1 and the parity bit = 0
-if the data over which the parity is calculated is 0. So the total
-number of bits over the data over which the parity is calculated + the
-parity bit is even. (see wikipedia if you can't follow this).
-Parity is often calculated by means of an exclusive or operation,
-sometimes also referred to as xor. In C the operator for xor is ^
-
-Back to ecc.
-Let's give a small figure:
-
-byte   0:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp4 ... rp14
-byte   1:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp2 rp4 ... rp14
-byte   2:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp4 ... rp14
-byte   3:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp4 ... rp14
-byte   4:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp5 ... rp14
-....
-byte 254:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp5 ... rp15
-byte 255:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp5 ... rp15
-           cp1  cp0  cp1  cp0  cp1  cp0  cp1  cp0
-           cp3  cp3  cp2  cp2  cp3  cp3  cp2  cp2
-           cp5  cp5  cp5  cp5  cp4  cp4  cp4  cp4
-
-This figure represents a sector of 256 bytes.
-cp is my abbreviation for column parity, rp for row parity.
-
-Let's start to explain column parity.
-cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
-so the sum of all bit0, bit2, bit4 and bit6 values + cp0 itself is even.
-Similarly cp1 is the sum of all bit1, bit3, bit5 and bit7.
-cp2 is the parity over bit0, bit1, bit4 and bit5
-cp3 is the parity over bit2, bit3, bit6 and bit7.
-cp4 is the parity over bit0, bit1, bit2 and bit3.
-cp5 is the parity over bit4, bit5, bit6 and bit7.
-Note that each of cp0 .. cp5 is exactly one bit.
-
-Row parity actually works almost the same.
-rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254)
-rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255)
-rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ...
-(so handle two bytes, then skip 2 bytes).
-rp3 is covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...)
-for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc.
-so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...)
-and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, ..
-The story now becomes quite boring. I guess you get the idea.
-rp6 covers 8 bytes then skips 8 etc
-rp7 skips 8 bytes then covers 8 etc
-rp8 covers 16 bytes then skips 16 etc
-rp9 skips 16 bytes then covers 16 etc
-rp10 covers 32 bytes then skips 32 etc
-rp11 skips 32 bytes then covers 32 etc
-rp12 covers 64 bytes then skips 64 etc
-rp13 skips 64 bytes then covers 64 etc
-rp14 covers 128 bytes then skips 128
-rp15 skips 128 bytes then covers 128
-
-In the end the parity bits are grouped together in three bytes as
-follows:
-ECC    Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
-ECC 0   rp07  rp06  rp05  rp04  rp03  rp02  rp01  rp00
-ECC 1   rp15  rp14  rp13  rp12  rp11  rp10  rp09  rp08
-ECC 2   cp5   cp4   cp3   cp2   cp1   cp0      1     1
-
-I detected after writing this that ST application note AN1823
-(http://www.st.com/stonline/) gives a much
-nicer picture.(but they use line parity as term where I use row parity)
-Oh well, I'm graphically challenged, so suffer with me for a moment :-)
-And I could not reuse the ST picture anyway for copyright reasons.
-
-
-Attempt 0
-=========
-
-Implementing the parity calculation is pretty simple.
-In C pseudocode:
-for (i = 0; i < 256; i++)
-{
-    if (i & 0x01)
-       rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
-    else
-       rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;
-    if (i & 0x02)
-       rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;
-    else
-       rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2;
-    if (i & 0x04)
-      rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5;
-    else
-      rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4;
-    if (i & 0x08)
-      rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7;
-    else
-      rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6;
-    if (i & 0x10)
-      rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9;
-    else
-      rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8;
-    if (i & 0x20)
-      rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;
-    else
-      rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;
-    if (i & 0x40)
-      rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;
-    else
-      rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12;
-    if (i & 0x80)
-      rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15;
-    else
-      rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14;
-    cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0;
-    cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1;
-    cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2;
-    cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3
-    cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4
-    cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5
-}
-
-
-Analysis 0
-==========
-
-C does have bitwise operators but not really operators to do the above
-efficiently (and most hardware has no such instructions either).
-Therefore without implementing this it was clear that the code above was
-not going to bring me a Nobel prize :-)
-
-Fortunately the exclusive or operation is commutative, so we can combine
-the values in any order. So instead of calculating all the bits
-individually, let us try to rearrange things.
-For the column parity this is easy. We can just xor the bytes and in the
-end filter out the relevant bits. This is pretty nice as it will bring
-all cp calculation out of the for loop.
-
-Similarly we can first xor the bytes for the various rows.
-This leads to:
-
-
-Attempt 1
-=========
-
-const char parity[256] = {
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-    0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
-};
-
-void ecc1(const unsigned char *buf, unsigned char *code)
-{
-    int i;
-    const unsigned char *bp = buf;
-    unsigned char cur;
-    unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
-    unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
-    unsigned char par;
-
-    par = 0;
-    rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
-    rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
-    rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
-    rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
-
-    for (i = 0; i < 256; i++)
-    {
-        cur = *bp++;
-        par ^= cur;
-        if (i & 0x01) rp1 ^= cur; else rp0 ^= cur;
-        if (i & 0x02) rp3 ^= cur; else rp2 ^= cur;
-        if (i & 0x04) rp5 ^= cur; else rp4 ^= cur;
-        if (i & 0x08) rp7 ^= cur; else rp6 ^= cur;
-        if (i & 0x10) rp9 ^= cur; else rp8 ^= cur;
-        if (i & 0x20) rp11 ^= cur; else rp10 ^= cur;
-        if (i & 0x40) rp13 ^= cur; else rp12 ^= cur;
-        if (i & 0x80) rp15 ^= cur; else rp14 ^= cur;
-    }
-    code[0] =
-        (parity[rp7] << 7) |
-        (parity[rp6] << 6) |
-        (parity[rp5] << 5) |
-        (parity[rp4] << 4) |
-        (parity[rp3] << 3) |
-        (parity[rp2] << 2) |
-        (parity[rp1] << 1) |
-        (parity[rp0]);
-    code[1] =
-        (parity[rp15] << 7) |
-        (parity[rp14] << 6) |
-        (parity[rp13] << 5) |
-        (parity[rp12] << 4) |
-        (parity[rp11] << 3) |
-        (parity[rp10] << 2) |
-        (parity[rp9]  << 1) |
-        (parity[rp8]);
-    code[2] =
-        (parity[par & 0xf0] << 7) |
-        (parity[par & 0x0f] << 6) |
-        (parity[par & 0xcc] << 5) |
-        (parity[par & 0x33] << 4) |
-        (parity[par & 0xaa] << 3) |
-        (parity[par & 0x55] << 2);
-    code[0] = ~code[0];
-    code[1] = ~code[1];
-    code[2] = ~code[2];
-}
-
-Still pretty straightforward. The last three invert statements are there to
-give a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash
-all data is 0xff, so the checksum then matches.
-
-I also introduced the parity lookup. I expected this to be the fastest
-way to calculate the parity, but I will investigate alternatives later
-on.
-
-
-Analysis 1
-==========
-
-The code works, but is not terribly efficient. On my system it took
-almost 4 times as much time as the linux driver code. But hey, if it was
-*that* easy this would have been done long before.
-No pain. no gain.
-
-Fortunately there is plenty of room for improvement.
-
-In step 1 we moved from bit-wise calculation to byte-wise calculation.
-However in C we can also use the unsigned long data type and virtually
-every modern microprocessor supports 32 bit operations, so why not try
-to write our code in such a way that we process data in 32 bit chunks.
-
-Of course this means some modification as the row parity is byte by
-byte. A quick analysis:
-for the column parity we use the par variable. When extending to 32 bits
-we can in the end easily calculate rp0 and rp1 from it.
-(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0
-respectively, from MSB to LSB)
-also rp2 and rp3 can be easily retrieved from par as rp3 covers the
-first two MSBs and rp2 covers the last two LSBs.
-
-Note that of course now the loop is executed only 64 times (256/4).
-And note that care must taken wrt byte ordering. The way bytes are
-ordered in a long is machine dependent, and might affect us.
-Anyway, if there is an issue: this code is developed on x86 (to be
-precise: a DELL PC with a D920 Intel CPU)
-
-And of course the performance might depend on alignment, but I expect
-that the I/O buffers in the nand driver are aligned properly (and
-otherwise that should be fixed to get maximum performance).
-
-Let's give it a try...
-
-
-Attempt 2
-=========
-
-extern const char parity[256];
-
-void ecc2(const unsigned char *buf, unsigned char *code)
-{
-    int i;
-    const unsigned long *bp = (unsigned long *)buf;
-    unsigned long cur;
-    unsigned long rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
-    unsigned long rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
-    unsigned long par;
-
-    par = 0;
-    rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
-    rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
-    rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
-    rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
-
-    for (i = 0; i < 64; i++)
-    {
-        cur = *bp++;
-        par ^= cur;
-        if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
-        if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
-        if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
-        if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
-        if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
-        if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
-    }
-    /*
-       we need to adapt the code generation for the fact that rp vars are now
-       long; also the column parity calculation needs to be changed.
-       we'll bring rp4 to 15 back to single byte entities by shifting and
-       xoring
-    */
-    rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff;
-    rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff;
-    rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff;
-    rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff;
-    rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff;
-    rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff;
-    rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff;
-    rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff;
-    rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff;
-    rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff;
-    rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff;
-    rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff;
-    rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff;
-    rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff;
-    par ^= (par >> 16);
-    rp1 = (par >> 8); rp1 &= 0xff;
-    rp0 = (par & 0xff);
-    par ^= (par >> 8); par &= 0xff;
-
-    code[0] =
-        (parity[rp7] << 7) |
-        (parity[rp6] << 6) |
-        (parity[rp5] << 5) |
-        (parity[rp4] << 4) |
-        (parity[rp3] << 3) |
-        (parity[rp2] << 2) |
-        (parity[rp1] << 1) |
-        (parity[rp0]);
-    code[1] =
-        (parity[rp15] << 7) |
-        (parity[rp14] << 6) |
-        (parity[rp13] << 5) |
-        (parity[rp12] << 4) |
-        (parity[rp11] << 3) |
-        (parity[rp10] << 2) |
-        (parity[rp9]  << 1) |
-        (parity[rp8]);
-    code[2] =
-        (parity[par & 0xf0] << 7) |
-        (parity[par & 0x0f] << 6) |
-        (parity[par & 0xcc] << 5) |
-        (parity[par & 0x33] << 4) |
-        (parity[par & 0xaa] << 3) |
-        (parity[par & 0x55] << 2);
-    code[0] = ~code[0];
-    code[1] = ~code[1];
-    code[2] = ~code[2];
-}
-
-The parity array is not shown any more. Note also that for these
-examples I kinda deviated from my regular programming style by allowing
-multiple statements on a line, not using { } in then and else blocks
-with only a single statement and by using operators like ^=
-
-
-Analysis 2
-==========
-
-The code (of course) works, and hurray: we are a little bit faster than
-the linux driver code (about 15%). But wait, don't cheer too quickly.
-There is more to be gained.
-If we look at e.g. rp14 and rp15 we see that we either xor our data with
-rp14 or with rp15. However we also have par which goes over all data.
-This means there is no need to calculate rp14 as it can be calculated from
-rp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15;
-(or if desired we can avoid calculating rp15 and calculate it from
-rp14).  That is why some places refer to inverse parity.
-Of course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.
-Effectively this means we can eliminate the else clause from the if
-statements. Also we can optimise the calculation in the end a little bit
-by going from long to byte first. Actually we can even avoid the table
-lookups
-
-Attempt 3
-=========
-
-Odd replaced:
-        if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
-        if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
-        if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
-        if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
-        if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
-        if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
-with
-        if (i & 0x01) rp5 ^= cur;
-        if (i & 0x02) rp7 ^= cur;
-        if (i & 0x04) rp9 ^= cur;
-        if (i & 0x08) rp11 ^= cur;
-        if (i & 0x10) rp13 ^= cur;
-        if (i & 0x20) rp15 ^= cur;
-
-        and outside the loop added:
-        rp4  = par ^ rp5;
-        rp6  = par ^ rp7;
-        rp8  = par ^ rp9;
-        rp10  = par ^ rp11;
-        rp12  = par ^ rp13;
-        rp14  = par ^ rp15;
-
-And after that the code takes about 30% more time, although the number of
-statements is reduced. This is also reflected in the assembly code.
-
-
-Analysis 3
-==========
-
-Very weird. Guess it has to do with caching or instruction parallellism
-or so. I also tried on an eeePC (Celeron, clocked at 900 Mhz). Interesting
-observation was that this one is only 30% slower (according to time)
-executing the code as my 3Ghz D920 processor.
-
-Well, it was expected not to be easy so maybe instead move to a
-different track: let's move back to the code from attempt2 and do some
-loop unrolling. This will eliminate a few if statements. I'll try
-different amounts of unrolling to see what works best.
-
-
-Attempt 4
-=========
-
-Unrolled the loop 1, 2, 3 and 4 times.
-For 4 the code starts with:
-
-    for (i = 0; i < 4; i++)
-    {
-        cur = *bp++;
-        par ^= cur;
-        rp4 ^= cur;
-        rp6 ^= cur;
-        rp8 ^= cur;
-        rp10 ^= cur;
-        if (i & 0x1) rp13 ^= cur; else rp12 ^= cur;
-        if (i & 0x2) rp15 ^= cur; else rp14 ^= cur;
-        cur = *bp++;
-        par ^= cur;
-        rp5 ^= cur;
-        rp6 ^= cur;
-        ...
-
-
-Analysis 4
-==========
-
-Unrolling once gains about 15%
-Unrolling twice keeps the gain at about 15%
-Unrolling three times gives a gain of 30% compared to attempt 2.
-Unrolling four times gives a marginal improvement compared to unrolling
-three times.
-
-I decided to proceed with a four time unrolled loop anyway. It was my gut
-feeling that in the next steps I would obtain additional gain from it.
-
-The next step was triggered by the fact that par contains the xor of all
-bytes and rp4 and rp5 each contain the xor of half of the bytes.
-So in effect par = rp4 ^ rp5. But as xor is commutative we can also say
-that rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can
-eliminate rp5 (or rp4, but I already foresaw another optimisation).
-The same holds for rp6/7, rp8/9, rp10/11 rp12/13 and rp14/15.
-
-
-Attempt 5
-=========
-
-Effectively so all odd digit rp assignments in the loop were removed.
-This included the else clause of the if statements.
-Of course after the loop we need to correct things by adding code like:
-    rp5 = par ^ rp4;
-Also the initial assignments (rp5 = 0; etc) could be removed.
-Along the line I also removed the initialisation of rp0/1/2/3.
-
-
-Analysis 5
-==========
-
-Measurements showed this was a good move. The run-time roughly halved
-compared with attempt 4 with 4 times unrolled, and we only require 1/3rd
-of the processor time compared to the current code in the linux kernel.
-
-However, still I thought there was more. I didn't like all the if
-statements. Why not keep a running parity and only keep the last if
-statement. Time for yet another version!
-
-
-Attempt 6
-=========
-
-THe code within the for loop was changed to:
-
-    for (i = 0; i < 4; i++)
-    {
-        cur = *bp++; tmppar  = cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
-
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
-
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp8 ^= cur;
-
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur;
-
-        par ^= tmppar;
-        if ((i & 0x1) == 0) rp12 ^= tmppar;
-        if ((i & 0x2) == 0) rp14 ^= tmppar;
-    }
-
-As you can see tmppar is used to accumulate the parity within a for
-iteration. In the last 3 statements is added to par and, if needed,
-to rp12 and rp14.
-
-While making the changes I also found that I could exploit that tmppar
-contains the running parity for this iteration. So instead of having:
-rp4 ^= cur; rp6 ^= cur;
-I removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on next
-statement. A similar change was done for rp8 and rp10
-
-
-Analysis 6
-==========
-
-Measuring this code again showed big gain. When executing the original
-linux code 1 million times, this took about 1 second on my system.
-(using time to measure the performance). After this iteration I was back
-to 0.075 sec. Actually I had to decide to start measuring over 10
-million iterations in order not to lose too much accuracy. This one
-definitely seemed to be the jackpot!
-
-There is a little bit more room for improvement though. There are three
-places with statements:
-rp4 ^= cur; rp6 ^= cur;
-It seems more efficient to also maintain a variable rp4_6 in the while
-loop; This eliminates 3 statements per loop. Of course after the loop we
-need to correct by adding:
-    rp4 ^= rp4_6;
-    rp6 ^= rp4_6
-Furthermore there are 4 sequential assignments to rp8. This can be
-encoded slightly more efficiently by saving tmppar before those 4 lines
-and later do rp8 = rp8 ^ tmppar ^ notrp8;
-(where notrp8 is the value of rp8 before those 4 lines).
-Again a use of the commutative property of xor.
-Time for a new test!
-
-
-Attempt 7
-=========
-
-The new code now looks like:
-
-    for (i = 0; i < 4; i++)
-    {
-        cur = *bp++; tmppar  = cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
-
-        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
-
-        notrp8 = tmppar;
-        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur;
-        rp8 = rp8 ^ tmppar ^ notrp8;
-
-        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur;
-
-        par ^= tmppar;
-        if ((i & 0x1) == 0) rp12 ^= tmppar;
-        if ((i & 0x2) == 0) rp14 ^= tmppar;
-    }
-    rp4 ^= rp4_6;
-    rp6 ^= rp4_6;
-
-
-Not a big change, but every penny counts :-)
-
-
-Analysis 7
-==========
-
-Actually this made things worse. Not very much, but I don't want to move
-into the wrong direction. Maybe something to investigate later. Could
-have to do with caching again.
-
-Guess that is what there is to win within the loop. Maybe unrolling one
-more time will help. I'll keep the optimisations from 7 for now.
-
-
-Attempt 8
-=========
-
-Unrolled the loop one more time.
-
-
-Analysis 8
-==========
-
-This makes things worse. Let's stick with attempt 6 and continue from there.
-Although it seems that the code within the loop cannot be optimised
-further there is still room to optimize the generation of the ecc codes.
-We can simply calculate the total parity. If this is 0 then rp4 = rp5
-etc. If the parity is 1, then rp4 = !rp5;
-But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
-in the result byte and then do something like
-    code[0] |= (code[0] << 1);
-Lets test this.
-
-
-Attempt 9
-=========
-
-Changed the code but again this slightly degrades performance. Tried all
-kind of other things, like having dedicated parity arrays to avoid the
-shift after parity[rp7] << 7; No gain.
-Change the lookup using the parity array by using shift operators (e.g.
-replace parity[rp7] << 7 with:
-rp7 ^= (rp7 << 4);
-rp7 ^= (rp7 << 2);
-rp7 ^= (rp7 << 1);
-rp7 &= 0x80;
-No gain.
-
-The only marginal change was inverting the parity bits, so we can remove
-the last three invert statements.
-
-Ah well, pity this does not deliver more. Then again 10 million
-iterations using the linux driver code takes between 13 and 13.5
-seconds, whereas my code now takes about 0.73 seconds for those 10
-million iterations. So basically I've improved the performance by a
-factor 18 on my system. Not that bad. Of course on different hardware
-you will get different results. No warranties!
-
-But of course there is no such thing as a free lunch. The codesize almost
-tripled (from 562 bytes to 1434 bytes). Then again, it is not that much.
-
-
-Correcting errors
-=================
-
-For correcting errors I again used the ST application note as a starter,
-but I also peeked at the existing code.
-The algorithm itself is pretty straightforward. Just xor the given and
-the calculated ecc. If all bytes are 0 there is no problem. If 11 bits
-are 1 we have one correctable bit error. If there is 1 bit 1, we have an
-error in the given ecc code.
-It proved to be fastest to do some table lookups. Performance gain
-introduced by this is about a factor 2 on my system when a repair had to
-be done, and 1% or so if no repair had to be done.
-Code size increased from 330 bytes to 686 bytes for this function.
-(gcc 4.2, -O3)
-
-
-Conclusion
-==========
-
-The gain when calculating the ecc is tremendous. Om my development hardware
-a speedup of a factor of 18 for ecc calculation was achieved. On a test on an
-embedded system with a MIPS core a factor 7 was obtained.
-On a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
-5 (big endian mode, gcc 4.1.2, -O3)
-For correction not much gain could be obtained (as bitflips are rare). Then
-again there are also much less cycles spent there.
-
-It seems there is not much more gain possible in this, at least when
-programmed in C. Of course it might be possible to squeeze something more
-out of it with an assembler program, but due to pipeline behaviour etc
-this is very tricky (at least for intel hw).
-
-Author: Frans Meulenbroeks
-Copyright (C) 2008 Koninklijke Philips Electronics NV.
diff --git a/Documentation/mtd/spi-nor.rst b/Documentation/mtd/spi-nor.rst
new file mode 100644
index 000000000000..f5333e3bf486
--- /dev/null
+++ b/Documentation/mtd/spi-nor.rst
@@ -0,0 +1,66 @@
+=================
+SPI NOR framework
+=================
+
+Part I - Why do we need this framework?
+---------------------------------------
+
+SPI bus controllers (drivers/spi/) only deal with streams of bytes; the bus
+controller operates agnostic of the specific device attached. However, some
+controllers (such as Freescale's QuadSPI controller) cannot easily handle
+arbitrary streams of bytes, but rather are designed specifically for SPI NOR.
+
+In particular, Freescale's QuadSPI controller must know the NOR commands to
+find the right LUT sequence. Unfortunately, the SPI subsystem has no notion of
+opcodes, addresses, or data payloads; a SPI controller simply knows to send or
+receive bytes (Tx and Rx). Therefore, we must define a new layering scheme under
+which the controller driver is aware of the opcodes, addressing, and other
+details of the SPI NOR protocol.
+
+Part II - How does the framework work?
+--------------------------------------
+
+This framework just adds a new layer between the MTD and the SPI bus driver.
+With this new layer, the SPI NOR controller driver does not depend on the
+m25p80 code anymore.
+
+Before this framework, the layer is like::
+
+                   MTD
+         ------------------------
+                  m25p80
+         ------------------------
+	       SPI bus driver
+         ------------------------
+	        SPI NOR chip
+
+   After this framework, the layer is like:
+                   MTD
+         ------------------------
+              SPI NOR framework
+         ------------------------
+                  m25p80
+         ------------------------
+	       SPI bus driver
+         ------------------------
+	       SPI NOR chip
+
+  With the SPI NOR controller driver (Freescale QuadSPI), it looks like:
+                   MTD
+         ------------------------
+              SPI NOR framework
+         ------------------------
+                fsl-quadSPI
+         ------------------------
+	       SPI NOR chip
+
+Part III - How can drivers use the framework?
+---------------------------------------------
+
+The main API is spi_nor_scan(). Before calling it, a driver should
+initialize the necessary fields of spi_nor{}. Please see
+drivers/mtd/spi-nor/spi-nor.c for details. Please also refer to fsl-quadspi.c
+when you want to write a new driver for a SPI NOR controller.
+Another API is spi_nor_restore(), which restores the state of the SPI flash
+chip, such as its addressing mode. Call it whenever the driver is detached
+from the device or the system is rebooted.
diff --git a/Documentation/mtd/spi-nor.txt b/Documentation/mtd/spi-nor.txt
deleted file mode 100644
index da1fbff5a24c..000000000000
--- a/Documentation/mtd/spi-nor.txt
+++ /dev/null
@@ -1,65 +0,0 @@
-                          SPI NOR framework
-               ============================================
-
-Part I - Why do we need this framework?
----------------------------------------
-
-SPI bus controllers (drivers/spi/) only deal with streams of bytes; the bus
-controller operates agnostic of the specific device attached. However, some
-controllers (such as Freescale's QuadSPI controller) cannot easily handle
-arbitrary streams of bytes, but rather are designed specifically for SPI NOR.
-
-In particular, Freescale's QuadSPI controller must know the NOR commands to
-find the right LUT sequence. Unfortunately, the SPI subsystem has no notion of
-opcodes, addresses, or data payloads; a SPI controller simply knows to send or
-receive bytes (Tx and Rx). Therefore, we must define a new layering scheme under
-which the controller driver is aware of the opcodes, addressing, and other
-details of the SPI NOR protocol.
-
-Part II - How does the framework work?
---------------------------------------
-
-This framework just adds a new layer between the MTD and the SPI bus driver.
-With this new layer, the SPI NOR controller driver does not depend on the
-m25p80 code anymore.
-
-   Before this framework, the layer is like:
-
-                   MTD
-         ------------------------
-                  m25p80
-         ------------------------
-	       SPI bus driver
-         ------------------------
-	        SPI NOR chip
-
-   After this framework, the layer is like:
-                   MTD
-         ------------------------
-              SPI NOR framework
-         ------------------------
-                  m25p80
-         ------------------------
-	       SPI bus driver
-         ------------------------
-	       SPI NOR chip
-
-  With the SPI NOR controller driver (Freescale QuadSPI), it looks like:
-                   MTD
-         ------------------------
-              SPI NOR framework
-         ------------------------
-                fsl-quadSPI
-         ------------------------
-	       SPI NOR chip
-
-Part III - How can drivers use the framework?
----------------------------------------------
-
-The main API is spi_nor_scan(). Before you call the hook, a driver should
-initialize the necessary fields for spi_nor{}. Please see
-drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to fsl-quadspi.c
-when you want to write a new driver for a SPI NOR controller.
-Another API is spi_nor_restore(), this is used to restore the status of SPI
-flash chip such as addressing mode. Call it whenever detach the driver from
-device or reboot the system.
diff --git a/drivers/mtd/nand/raw/nand_ecc.c b/drivers/mtd/nand/raw/nand_ecc.c
index 223fbd8052b3..f6a7808db818 100644
--- a/drivers/mtd/nand/raw/nand_ecc.c
+++ b/drivers/mtd/nand/raw/nand_ecc.c
@@ -11,7 +11,7 @@
  *   Thomas Gleixner (tglx@linutronix.de)
  *
  * Information on how this algorithm works and how it was developed
- * can be found in Documentation/mtd/nand_ecc.txt
+ * can be found in Documentation/mtd/nand_ecc.rst
  */
 
 #include <linux/types.h>
-- 
cgit v1.2.3-55-g7522


From b0a4aa950c68b5010831ecfc450510c64e4d80ba Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 14:21:14 -0300
Subject: docs: nvdimm: convert to ReST

Rename the nvdimm documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/nvdimm/btt.rst      | 285 ++++++++++++
 Documentation/nvdimm/btt.txt      | 273 ------------
 Documentation/nvdimm/index.rst    |  12 +
 Documentation/nvdimm/nvdimm.rst   | 887 ++++++++++++++++++++++++++++++++++++++
 Documentation/nvdimm/nvdimm.txt   | 815 ----------------------------------
 Documentation/nvdimm/security.rst | 143 ++++++
 Documentation/nvdimm/security.txt | 141 ------
 drivers/nvdimm/Kconfig            |   2 +-
 8 files changed, 1328 insertions(+), 1230 deletions(-)
 create mode 100644 Documentation/nvdimm/btt.rst
 delete mode 100644 Documentation/nvdimm/btt.txt
 create mode 100644 Documentation/nvdimm/index.rst
 create mode 100644 Documentation/nvdimm/nvdimm.rst
 delete mode 100644 Documentation/nvdimm/nvdimm.txt
 create mode 100644 Documentation/nvdimm/security.rst
 delete mode 100644 Documentation/nvdimm/security.txt

diff --git a/Documentation/nvdimm/btt.rst b/Documentation/nvdimm/btt.rst
new file mode 100644
index 000000000000..2d8269f834bd
--- /dev/null
+++ b/Documentation/nvdimm/btt.rst
@@ -0,0 +1,285 @@
+=============================
+BTT - Block Translation Table
+=============================
+
+
+1. Introduction
+===============
+
+Persistent memory based storage is able to perform IO at byte (or more
+accurately, cache line) granularity. However, we often want to expose such
+storage as traditional block devices. The block drivers for persistent memory
+will do exactly this. However, they do not provide any atomicity guarantees.
+Traditional SSDs typically provide protection against torn sectors in hardware,
+using stored energy in capacitors to complete in-flight block writes, or perhaps
+in firmware. We don't have this luxury with persistent memory - if a write is in
+progress, and we experience a power failure, the block will contain a mix of old
+and new data. Applications may not be prepared to handle such a scenario.
+
+The Block Translation Table (BTT) provides atomic sector update semantics for
+persistent memory devices, so that applications that rely on sector writes not
+being torn can continue to do so. The BTT manifests itself as a stacked block
+device, and reserves a portion of the underlying storage for its metadata. At
+the heart of it is an indirection table that re-maps all the blocks on the
+volume. It can be thought of as an extremely simple file system that only
+provides atomic sector updates.
+
+
+2. Static Layout
+================
+
+The underlying storage on which a BTT can be laid out is not limited in any way.
+The BTT, however, splits the available space into chunks of up to 512 GiB,
+called "Arenas".
+
+Each arena follows the same layout for its metadata, and all references in an
+arena are internal to it (with the exception of one field that points to the
+next arena). The following depicts the "On-disk" metadata layout::
+
+
+    Backing Store     +------->  Arena
+  +---------------+   |   +------------------+
+  |               |   |   | Arena info block |
+  |    Arena 0    +---+   |       4K         |
+  |     512G      |       +------------------+
+  |               |       |                  |
+  +---------------+       |                  |
+  |               |       |                  |
+  |    Arena 1    |       |   Data Blocks    |
+  |     512G      |       |                  |
+  |               |       |                  |
+  +---------------+       |                  |
+  |       .       |       |                  |
+  |       .       |       |                  |
+  |       .       |       |                  |
+  |               |       |                  |
+  |               |       |                  |
+  +---------------+       +------------------+
+                          |                  |
+                          |     BTT Map      |
+                          |                  |
+                          |                  |
+                          +------------------+
+                          |                  |
+                          |     BTT Flog     |
+                          |                  |
+                          +------------------+
+                          | Info block copy  |
+                          |       4K         |
+                          +------------------+
+
+
+3. Theory of Operation
+======================
+
+
+a. The BTT Map
+--------------
+
+The map is a simple lookup/indirection table that maps an LBA to an internal
+block. Each map entry is 32 bits. The two most significant bits are special
+flags, and the remaining bits form the internal block number.
+
+======== =============================================================
+Bit      Description
+======== =============================================================
+31 - 30	 Error and Zero flags - Used in the following way:
+
+	   == ==  ====================================================
+	   31 30  Description
+	   == ==  ====================================================
+	   0  0	  Initial state. Reads return zeroes; Premap = Postmap
+	   0  1	  Zero state: Reads return zeroes
+	   1  0	  Error state: Reads fail; Writes clear 'E' bit
+	   1  1	  Normal Block – has valid postmap
+	   == ==  ====================================================
+
+29 - 0	 Mappings to internal 'postmap' blocks
+======== =============================================================
+
+
+Some of the terminology that will be subsequently used:
+
+============	================================================================
+External LBA	LBA as made visible to upper layers.
+ABA		Arena Block Address - Block offset/number within an arena
+Premap ABA	The block offset into an arena, which was decided upon by range
+		checking the External LBA
+Postmap ABA	The block number in the "Data Blocks" area obtained after
+		indirection from the map
+nfree		The number of free blocks that are maintained at any given time.
+		This is the number of concurrent writes that can happen to the
+		arena.
+============	================================================================
+
+
+For example, after adding a BTT, we surface a disk of 1024G. We get a read for
+the external LBA at 768G. This falls into the second arena, and of the 512G
+worth of blocks that this arena contributes, this block is at 256G. Thus, the
+premap ABA is 256G. We now refer to the map, and find out the mapping for block
+'X' (256G) points to block 'Y', say '64'. Thus the postmap ABA is 64.
+
+
+b. The BTT Flog
+---------------
+
+The BTT provides sector atomicity by making every write an "allocating write",
+i.e. every write goes to a "free" block. A running list of free blocks is
+maintained in the form of the BTT flog. 'Flog' is a combination of the words
+"free list" and "log". The flog contains 'nfree' entries, and an entry contains:
+
+========  =====================================================================
+lba       The premap ABA that is being written to
+old_map   The old postmap ABA - after 'this' write completes, this will be a
+	  free block.
+new_map   The new postmap ABA. The map will be updated to reflect this
+	  lba->postmap_aba mapping, but we log it here in case we have to
+	  recover.
+seq	  Sequence number to mark which of the 2 sections of this flog entry is
+	  valid/newest. It cycles between 01->10->11->01 (binary) under normal
+	  operation, with 00 indicating an uninitialized state.
+lba'	  alternate lba entry
+old_map'  alternate old postmap entry
+new_map'  alternate new postmap entry
+seq'	  alternate sequence number.
+========  =====================================================================
+
+Each of the above fields is 32-bit, making one entry 32 bytes. Entries are also
+padded to 64 bytes to avoid cache line sharing or aliasing. Flog updates are
+done such that for any entry being written, it:
+a. overwrites the 'old' section in the entry based on sequence numbers
+b. writes the 'new' section such that the sequence number is written last.
+
+
+c. The concept of lanes
+-----------------------
+
+While 'nfree' describes the number of IOs an arena can process
+concurrently, 'nlanes' is the number of IOs the BTT device as a whole can
+process::
+
+	nlanes = min(nfree, num_cpus)
+
+A lane number is obtained at the start of any IO, and is used for indexing into
+all the on-disk and in-memory data structures for the duration of the IO. If
+there are more CPUs than the max number of available lanes, then lanes are
+protected by spinlocks.
+
+
+d. In-memory data structure: Read Tracking Table (RTT)
+------------------------------------------------------
+
+Consider a case where we have two threads, one doing reads and the other,
+writes. We can hit a condition where the writer thread grabs a free block to do
+a new IO, but the (slow) reader thread is still reading from it. In other words,
+the reader consulted a map entry, and started reading the corresponding block. A
+writer started writing to the same external LBA, and finished the write updating
+the map for that external LBA to point to its new postmap ABA. At this point the
+internal, postmap block that the reader is (still) reading has been inserted
+into the list of free blocks. If another write comes in for the same LBA, it can
+grab this free block, and start writing to it, causing the reader to read
+incorrect data. To prevent this, we introduce the RTT.
+
+The RTT is a simple, per arena table with 'nfree' entries. Every reader inserts
+into rtt[lane_number], the postmap ABA it is reading, and clears it after the
+read is complete. Every writer thread, after grabbing a free block, checks the
+RTT for its presence. If the postmap free block is in the RTT, it waits till the
+reader clears the RTT entry, and only then starts writing to it.
+
+
+e. In-memory data structure: map locks
+--------------------------------------
+
+Consider a case where two writer threads are writing to the same LBA. There can
+be a race in the following sequence of steps::
+
+	free[lane] = map[premap_aba]
+	map[premap_aba] = postmap_aba
+
+Both threads can update their respective free[lane] with the same old, freed
+postmap_aba. This has made the layout inconsistent by losing a free entry, and
+at the same time, duplicating another free entry for two lanes.
+
+To solve this, we could have a single map lock (per arena) that has to be taken
+before performing the above sequence, but we feel that could be too contentious.
+Instead we use an array of (nfree) map_locks that is indexed by
+(premap_aba modulo nfree).
+
+
+f. Reconstruction from the Flog
+-------------------------------
+
+On startup, we analyze the BTT flog to create our list of free blocks. We walk
+through all the entries, and for each lane, of the set of two possible
+'sections', we always look at the most recent one only (based on the sequence
+number). The reconstruction rules/steps are simple:
+
+- Read map[log_entry.lba].
+- If log_entry.new matches the map entry, then log_entry.old is free.
+- If log_entry.new does not match the map entry, then log_entry.new is free.
+  (This case can only be caused by power-fails/unsafe shutdowns)
+
+
+g. Summarizing - Read and Write flows
+-------------------------------------
+
+Read:
+
+1.  Convert external LBA to arena number + pre-map ABA
+2.  Get a lane (and take lane_lock)
+3.  Read map to get the entry for this pre-map ABA
+4.  Enter post-map ABA into RTT[lane]
+5.  If TRIM flag set in map, return zeroes, and end IO (go to step 8)
+6.  If ERROR flag set in map, end IO with EIO (go to step 8)
+7.  Read data from this block
+8.  Remove post-map ABA entry from RTT[lane]
+9.  Release lane (and lane_lock)
+
+Write:
+
+1.  Convert external LBA to Arena number + pre-map ABA
+2.  Get a lane (and take lane_lock)
+3.  Use lane to index into in-memory free list and obtain a new block, next flog
+    index, next sequence number
+4.  Scan the RTT to check if free block is present, and spin/wait if it is.
+5.  Write data to this free block
+6.  Read map to get the existing post-map ABA entry for this pre-map ABA
+7.  Write flog entry: [premap_aba / old postmap_aba / new postmap_aba / seq_num]
+8.  Write new post-map ABA into map.
+9.  Write old post-map entry into the free list
+10. Calculate next sequence number and write into the free list entry
+11. Release lane (and lane_lock)
+
+
+4. Error Handling
+=================
+
+An arena would be in an error state if any of the metadata is corrupted
+irrecoverably, either due to a bug or a media error. The following conditions
+indicate an error:
+
+- Info block checksum does not match (and recovering from the copy also fails)
+- All internal available blocks are not uniquely and entirely addressed by the
+  sum of mapped blocks and free blocks (from the BTT flog).
+- Rebuilding free list from the flog reveals missing/duplicate/impossible
+  entries
+- A map entry is out of bounds
+
+If any of these error conditions is encountered, the arena is put into a
+read-only state using a flag in the info block.
+
+
+5. Usage
+========
+
+The BTT can be set up on any disk (namespace) exposed by the libnvdimm subsystem
+(pmem, or blk mode). The easiest way to set up such a namespace is using the
+'ndctl' utility [1].
+
+For example, the ndctl command line to set up a BTT with a 4k sector size is::
+
+    ndctl create-namespace -f -e namespace0.0 -m sector -l 4k
+
+See ndctl create-namespace --help for more options.
+
+[1]: https://github.com/pmem/ndctl
diff --git a/Documentation/nvdimm/btt.txt b/Documentation/nvdimm/btt.txt
deleted file mode 100644
index e293fb664924..000000000000
--- a/Documentation/nvdimm/btt.txt
+++ /dev/null
@@ -1,273 +0,0 @@
-BTT - Block Translation Table
-=============================
-
-
-1. Introduction
----------------
-
-Persistent memory based storage is able to perform IO at byte (or more
-accurately, cache line) granularity. However, we often want to expose such
-storage as traditional block devices. The block drivers for persistent memory
-will do exactly this. However, they do not provide any atomicity guarantees.
-Traditional SSDs typically provide protection against torn sectors in hardware,
-using stored energy in capacitors to complete in-flight block writes, or perhaps
-in firmware. We don't have this luxury with persistent memory - if a write is in
-progress, and we experience a power failure, the block will contain a mix of old
-and new data. Applications may not be prepared to handle such a scenario.
-
-The Block Translation Table (BTT) provides atomic sector update semantics for
-persistent memory devices, so that applications that rely on sector writes not
-being torn can continue to do so. The BTT manifests itself as a stacked block
-device, and reserves a portion of the underlying storage for its metadata. At
-the heart of it, is an indirection table that re-maps all the blocks on the
-volume. It can be thought of as an extremely simple file system that only
-provides atomic sector updates.
-
-
-2. Static Layout
-----------------
-
-The underlying storage on which a BTT can be laid out is not limited in any way.
-The BTT, however, splits the available space into chunks of up to 512 GiB,
-called "Arenas".
-
-Each arena follows the same layout for its metadata, and all references in an
-arena are internal to it (with the exception of one field that points to the
-next arena). The following depicts the "On-disk" metadata layout:
-
-
-  Backing Store     +------->  Arena
-+---------------+   |   +------------------+
-|               |   |   | Arena info block |
-|    Arena 0    +---+   |       4K         |
-|     512G      |       +------------------+
-|               |       |                  |
-+---------------+       |                  |
-|               |       |                  |
-|    Arena 1    |       |   Data Blocks    |
-|     512G      |       |                  |
-|               |       |                  |
-+---------------+       |                  |
-|       .       |       |                  |
-|       .       |       |                  |
-|       .       |       |                  |
-|               |       |                  |
-|               |       |                  |
-+---------------+       +------------------+
-                        |                  |
-                        |     BTT Map      |
-                        |                  |
-                        |                  |
-                        +------------------+
-                        |                  |
-                        |     BTT Flog     |
-                        |                  |
-                        +------------------+
-                        | Info block copy  |
-                        |       4K         |
-                        +------------------+
-
-
-3. Theory of Operation
-----------------------
-
-
-a. The BTT Map
---------------
-
-The map is a simple lookup/indirection table that maps an LBA to an internal
-block. Each map entry is 32 bits. The two most significant bits are special
-flags, and the remaining form the internal block number.
-
-Bit      Description
-31 - 30	: Error and Zero flags - Used in the following way:
-	 Bit		      Description
-	31 30
-	-----------------------------------------------------------------------
-	 00	Initial state. Reads return zeroes; Premap = Postmap
-	 01	Zero state: Reads return zeroes
-	 10	Error state: Reads fail; Writes clear 'E' bit
-	 11	Normal Block – has valid postmap
-
-
-29 - 0	: Mappings to internal 'postmap' blocks
-
-
-Some of the terminology that will be subsequently used:
-
-External LBA  : LBA as made visible to upper layers.
-ABA           : Arena Block Address - Block offset/number within an arena
-Premap ABA    : The block offset into an arena, which was decided upon by range
-		checking the External LBA
-Postmap ABA   : The block number in the "Data Blocks" area obtained after
-		indirection from the map
-nfree	      : The number of free blocks that are maintained at any given time.
-		This is the number of concurrent writes that can happen to the
-		arena.
-
-
-For example, after adding a BTT, we surface a disk of 1024G. We get a read for
-the external LBA at 768G. This falls into the second arena, and of the 512G
-worth of blocks that this arena contributes, this block is at 256G. Thus, the
-premap ABA is 256G. We now refer to the map, and find out the mapping for block
-'X' (256G) points to block 'Y', say '64'. Thus the postmap ABA is 64.
-
-
-b. The BTT Flog
----------------
-
-The BTT provides sector atomicity by making every write an "allocating write",
-i.e. Every write goes to a "free" block. A running list of free blocks is
-maintained in the form of the BTT flog. 'Flog' is a combination of the words
-"free list" and "log". The flog contains 'nfree' entries, and an entry contains:
-
-lba     : The premap ABA that is being written to
-old_map : The old postmap ABA - after 'this' write completes, this will be a
-	  free block.
-new_map : The new postmap ABA. The map will up updated to reflect this
-	  lba->postmap_aba mapping, but we log it here in case we have to
-	  recover.
-seq	: Sequence number to mark which of the 2 sections of this flog entry is
-	  valid/newest. It cycles between 01->10->11->01 (binary) under normal
-	  operation, with 00 indicating an uninitialized state.
-lba'	: alternate lba entry
-old_map': alternate old postmap entry
-new_map': alternate new postmap entry
-seq'	: alternate sequence number.
-
-Each of the above fields is 32-bit, making one entry 32 bytes. Entries are also
-padded to 64 bytes to avoid cache line sharing or aliasing. Flog updates are
-done such that for any entry being written, it:
-a. overwrites the 'old' section in the entry based on sequence numbers
-b. writes the 'new' section such that the sequence number is written last.
-
-
-c. The concept of lanes
------------------------
-
-While 'nfree' describes the number of concurrent IOs an arena can process
-concurrently, 'nlanes' is the number of IOs the BTT device as a whole can
-process.
- nlanes = min(nfree, num_cpus)
-A lane number is obtained at the start of any IO, and is used for indexing into
-all the on-disk and in-memory data structures for the duration of the IO. If
-there are more CPUs than the max number of available lanes, than lanes are
-protected by spinlocks.
-
-
-d. In-memory data structure: Read Tracking Table (RTT)
-------------------------------------------------------
-
-Consider a case where we have two threads, one doing reads and the other,
-writes. We can hit a condition where the writer thread grabs a free block to do
-a new IO, but the (slow) reader thread is still reading from it. In other words,
-the reader consulted a map entry, and started reading the corresponding block. A
-writer started writing to the same external LBA, and finished the write updating
-the map for that external LBA to point to its new postmap ABA. At this point the
-internal, postmap block that the reader is (still) reading has been inserted
-into the list of free blocks. If another write comes in for the same LBA, it can
-grab this free block, and start writing to it, causing the reader to read
-incorrect data. To prevent this, we introduce the RTT.
-
-The RTT is a simple, per arena table with 'nfree' entries. Every reader inserts
-into rtt[lane_number], the postmap ABA it is reading, and clears it after the
-read is complete. Every writer thread, after grabbing a free block, checks the
-RTT for its presence. If the postmap free block is in the RTT, it waits till the
-reader clears the RTT entry, and only then starts writing to it.
-
-
-e. In-memory data structure: map locks
---------------------------------------
-
-Consider a case where two writer threads are writing to the same LBA. There can
-be a race in the following sequence of steps:
-
-free[lane] = map[premap_aba]
-map[premap_aba] = postmap_aba
-
-Both threads can update their respective free[lane] with the same old, freed
-postmap_aba. This has made the layout inconsistent by losing a free entry, and
-at the same time, duplicating another free entry for two lanes.
-
-To solve this, we could have a single map lock (per arena) that has to be taken
-before performing the above sequence, but we feel that could be too contentious.
-Instead we use an array of (nfree) map_locks that is indexed by
-(premap_aba modulo nfree).
-
-
-f. Reconstruction from the Flog
--------------------------------
-
-On startup, we analyze the BTT flog to create our list of free blocks. We walk
-through all the entries, and for each lane, of the set of two possible
-'sections', we always look at the most recent one only (based on the sequence
-number). The reconstruction rules/steps are simple:
-- Read map[log_entry.lba].
-- If log_entry.new matches the map entry, then log_entry.old is free.
-- If log_entry.new does not match the map entry, then log_entry.new is free.
-  (This case can only be caused by power-fails/unsafe shutdowns)
-
-
-g. Summarizing - Read and Write flows
--------------------------------------
-
-Read:
-
-1.  Convert external LBA to arena number + pre-map ABA
-2.  Get a lane (and take lane_lock)
-3.  Read map to get the entry for this pre-map ABA
-4.  Enter post-map ABA into RTT[lane]
-5.  If TRIM flag set in map, return zeroes, and end IO (go to step 8)
-6.  If ERROR flag set in map, end IO with EIO (go to step 8)
-7.  Read data from this block
-8.  Remove post-map ABA entry from RTT[lane]
-9.  Release lane (and lane_lock)
-
-Write:
-
-1.  Convert external LBA to Arena number + pre-map ABA
-2.  Get a lane (and take lane_lock)
-3.  Use lane to index into in-memory free list and obtain a new block, next flog
-        index, next sequence number
-4.  Scan the RTT to check if free block is present, and spin/wait if it is.
-5.  Write data to this free block
-6.  Read map to get the existing post-map ABA entry for this pre-map ABA
-7.  Write flog entry: [premap_aba / old postmap_aba / new postmap_aba / seq_num]
-8.  Write new post-map ABA into map.
-9.  Write old post-map entry into the free list
-10. Calculate next sequence number and write into the free list entry
-11. Release lane (and lane_lock)
-
-
-4. Error Handling
-=================
-
-An arena would be in an error state if any of the metadata is corrupted
-irrecoverably, either due to a bug or a media error. The following conditions
-indicate an error:
-- Info block checksum does not match (and recovering from the copy also fails)
-- All internal available blocks are not uniquely and entirely addressed by the
-  sum of mapped blocks and free blocks (from the BTT flog).
-- Rebuilding free list from the flog reveals missing/duplicate/impossible
-  entries
-- A map entry is out of bounds
-
-If any of these error conditions are encountered, the arena is put into a read
-only state using a flag in the info block.
-
-
-5. Usage
-========
-
-The BTT can be set up on any disk (namespace) exposed by the libnvdimm subsystem
-(pmem, or blk mode). The easiest way to set up such a namespace is using the
-'ndctl' utility [1]:
-
-For example, the ndctl command line to setup a btt with a 4k sector size is:
-
-    ndctl create-namespace -f -e namespace0.0 -m sector -l 4k
-
-See ndctl create-namespace --help for more options.
-
-[1]: https://github.com/pmem/ndctl
-
diff --git a/Documentation/nvdimm/index.rst b/Documentation/nvdimm/index.rst
new file mode 100644
index 000000000000..1a3402d3775e
--- /dev/null
+++ b/Documentation/nvdimm/index.rst
@@ -0,0 +1,12 @@
+:orphan:
+
+===================================
+Non-Volatile Memory Device (NVDIMM)
+===================================
+
+.. toctree::
+   :maxdepth: 1
+
+   nvdimm
+   btt
+   security
diff --git a/Documentation/nvdimm/nvdimm.rst b/Documentation/nvdimm/nvdimm.rst
new file mode 100644
index 000000000000..08f855cbb4e6
--- /dev/null
+++ b/Documentation/nvdimm/nvdimm.rst
@@ -0,0 +1,887 @@
+===============================
+LIBNVDIMM: Non-Volatile Devices
+===============================
+
+libnvdimm - kernel / libndctl - userspace helper library
+
+linux-nvdimm@lists.01.org
+
+Version 13
+
+.. contents:
+
+	Glossary
+	Overview
+	    Supporting Documents
+	    Git Trees
+	LIBNVDIMM PMEM and BLK
+	Why BLK?
+	    PMEM vs BLK
+	        BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
+	Example NVDIMM Platform
+	LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
+	    LIBNDCTL: Context
+	        libndctl: instantiate a new library context example
+	    LIBNVDIMM/LIBNDCTL: Bus
+	        libnvdimm: control class device in /sys/class
+	        libnvdimm: bus
+	        libndctl: bus enumeration example
+	    LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
+	        libnvdimm: DIMM (NMEM)
+	        libndctl: DIMM enumeration example
+	    LIBNVDIMM/LIBNDCTL: Region
+	        libnvdimm: region
+	        libndctl: region enumeration example
+	        Why Not Encode the Region Type into the Region Name?
+	        How Do I Determine the Major Type of a Region?
+	    LIBNVDIMM/LIBNDCTL: Namespace
+	        libnvdimm: namespace
+	        libndctl: namespace enumeration example
+	        libndctl: namespace creation example
+	        Why the Term "namespace"?
+	    LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
+	        libnvdimm: btt layout
+	        libndctl: btt creation example
+	Summary LIBNDCTL Diagram
+
+
+Glossary
+========
+
+PMEM:
+  A system-physical-address range where writes are persistent.  A
+  block device composed of PMEM is capable of DAX.  A PMEM address range
+  may span an interleave of several DIMMs.
+
+BLK:
+  A set of one or more programmable memory mapped apertures provided
+  by a DIMM to access its media.  This indirection precludes the
+  performance benefit of interleaving, but enables DIMM-bounded failure
+  modes.
+
+DPA:
+  DIMM Physical Address, is a DIMM-relative offset.  With one DIMM in
+  the system there would be a 1:1 system-physical-address:DPA association.
+  Once more DIMMs are added a memory controller interleave must be
+  decoded to determine the DPA associated with a given
+  system-physical-address.  BLK capacity always has a 1:1 relationship
+  with a single-DIMM's DPA range.
+
+DAX:
+  File system extensions to bypass the page cache and block layer to
+  mmap persistent memory, from a PMEM block device, directly into a
+  process address space.
+
+DSM:
+  Device Specific Method: ACPI method to control a specific
+  device - in this case the firmware.
+
+DCR:
+  NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
+  It defines a vendor-id, device-id, and interface format for a given DIMM.
+
+BTT:
+  Block Translation Table: Persistent memory is byte addressable.
+  Existing software may have an expectation that the power-fail-atomicity
+  of writes is at least one sector, 512 bytes.  The BTT is an indirection
+  table with atomic update semantics to front a PMEM/BLK block device
+  driver and present arbitrary atomic sector sizes.
+
+LABEL:
+  Metadata stored on a DIMM device that partitions and identifies
+  (persistently names) storage between PMEM and BLK.  It also partitions
+  BLK storage to host BTTs with different parameters per BLK-partition.
+  Note that traditional partition tables, GPT/MBR, are layered on top of a
+  BLK or PMEM device.
+
+
+Overview
+========
+
+The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
+PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
+and BLK mode access.  These three modes of operation are described by
+the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6.  While the LIBNVDIMM
+implementation is generic and supports pre-NFIT platforms, it was guided
+by the superset of capabilities needed to support this ACPI 6 definition
+for NVDIMM resources.  The bulk of the kernel implementation is in place
+to handle the case where DPA accessible via PMEM is aliased with DPA
+accessible via BLK.  When that occurs a LABEL is needed to reserve DPA
+for exclusive access via one mode at a time.
+
+Supporting Documents
+--------------------
+
+ACPI 6:
+	http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
+NVDIMM Namespace:
+	http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
+DSM Interface Example:
+	http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+Driver Writer's Guide:
+	http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
+
+Git Trees
+---------
+
+LIBNVDIMM:
+	https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
+LIBNDCTL:
+	https://github.com/pmem/ndctl.git
+PMEM:
+	https://github.com/01org/prd
+
+
+LIBNVDIMM PMEM and BLK
+======================
+
+Prior to the arrival of the NFIT, non-volatile memory was described to a
+system in various ad-hoc ways.  Usually only the bare minimum was
+provided, namely, a single system-physical-address range where writes
+are expected to be durable after a system power loss.  Now, the NFIT
+specification standardizes not only the description of PMEM, but also
+BLK and platform message-passing entry points for control and
+configuration.
+
+For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
+device driver:
+
+    1. PMEM (nd_pmem.ko): Drives a system-physical-address range.  This
+       range is contiguous in system memory and may be interleaved (hardware
+       memory controller striped) across multiple DIMMs.  When interleaved the
+       platform may optionally provide details of which DIMMs are participating
+       in the interleave.
+
+       Note that while LIBNVDIMM describes system-physical-address ranges that may
+       alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
+       alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
+       distinction.  The different device-types are an implementation detail
+       that userspace can exploit to implement policies like "only interface
+       with address ranges from certain DIMMs".  It is worth noting that when
+       aliasing is present and a DIMM lacks a label, then no block device can
+       be created by default as userspace needs to do at least one allocation
+       of DPA to the PMEM range.  In contrast ND_NAMESPACE_IO ranges, once
+       registered, can be immediately attached to nd_pmem.
+
+    2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
+       defined apertures.  A set of apertures will access just one DIMM.
+       Multiple windows (apertures) allow multiple concurrent accesses, much like
+       tagged-command-queuing, and would likely be used by different threads or
+       different CPUs.
+
+       The NFIT specification defines a standard format for a BLK-aperture, but
+       the spec also allows for vendor specific layouts, and non-NFIT BLK
+       implementations may have other designs for BLK I/O.  For this reason
+       "nd_blk" calls back into platform-specific code to perform the I/O.
+
+       One such implementation is defined in the "Driver Writer's Guide" and "DSM
+       Interface Example".
+
+
+Why BLK?
+========
+
+While PMEM provides direct byte-addressable CPU-load/store access to
+NVDIMM storage, it does not provide the best system RAS (reliability,
+availability, and serviceability) model.  An access to a corrupted
+system-physical-address causes a CPU exception while an access
+to a corrupted address through a BLK-aperture causes that block window
+to raise an error status in a register.  The latter is more aligned with
+the standard error model that host-bus-adapter attached disks present.
+
+Also, if an administrator ever wants to replace a memory module it is easier to
+service a system at DIMM module boundaries.  Compare this to PMEM where
+data could be interleaved in an opaque hardware specific manner across
+several DIMMs.
+
+PMEM vs BLK
+-----------
+
+BLK-apertures solve these RAS problems, but their presence is also the
+major contributing factor to the complexity of the ND subsystem.  They
+complicate the implementation because PMEM and BLK alias in DPA space.
+Any given DIMM's DPA-range may contribute to one or more
+system-physical-address sets of interleaved DIMMs, *and* may also be
+accessed in its entirety through its BLK-aperture.  Accessing a DPA
+through a system-physical-address while simultaneously accessing the
+same DPA through a BLK-aperture has undefined results.  For this reason,
+DIMMs with this dual interface configuration include a DSM function to
+store/retrieve a LABEL.  The LABEL effectively partitions the DPA-space
+into exclusive system-physical-address and BLK-aperture accessible
+regions.  For simplicity a DIMM is allowed a PMEM "region" per each
+interleave set in which it is a member.  The remaining DPA space can be
+carved into an arbitrary number of BLK devices with discontiguous
+extents.
+
+BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+One of the few
+reasons to allow multiple BLK namespaces per REGION is so that each
+BLK-namespace can be configured with a BTT with unique atomic sector
+sizes.  While a PMEM device can host a BTT the LABEL specification does
+not provide for a sector size to be specified for a PMEM namespace.
+
+This is due to the expectation that the primary usage model for PMEM is
+via DAX, and the BTT is incompatible with DAX.  However, for the cases
+where an application or filesystem still needs atomic sector update
+guarantees it can register a BTT on a PMEM device or partition.  See
+LIBNVDIMM/LIBNDCTL: Block Translation Table "btt".
+
+
+Example NVDIMM Platform
+=======================
+
+For the remainder of this document the following diagram will be
+referenced for any example sysfs layouts::
+
+
+                               (a)               (b)           DIMM   BLK-REGION
+            +-------------------+--------+--------+--------+
+  +------+  |       pm0.0       | blk2.0 | pm1.0  | blk2.1 |    0      region2
+  | imc0 +--+- - - region0- - - +--------+        +--------+
+  +--+---+  |       pm0.0       | blk3.0 | pm1.0  | blk3.1 |    1      region3
+     |      +-------------------+--------v        v--------+
+  +--+---+                               |                 |
+  | cpu0 |                                     region1
+  +--+---+                               |                 |
+     |      +----------------------------^        ^--------+
+  +--+---+  |           blk4.0           | pm1.0  | blk4.0 |    2      region4
+  | imc1 +--+----------------------------|        +--------+
+  +------+  |           blk5.0           | pm1.0  | blk5.0 |    3      region5
+            +----------------------------+--------+--------+
+
+In this platform we have four DIMMs and two memory controllers in one
+socket.  Each unique interface (BLK or PMEM) to DPA space is identified
+by a region device with a dynamically assigned id (REGION0 - REGION5).
+
+    1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
+       single PMEM namespace is created in the REGION0-SPA-range that spans most
+       of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
+       interleaved system-physical-address range is reclaimed as BLK-aperture
+       accessed space starting at DPA-offset (a) into each DIMM.  In that
+       reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
+       REGION3 where "blk2.0" and "blk3.0" are just human readable names that
+       could be set to any user-desired name in the LABEL.
+
+    2. In the last portion of DIMM0 and DIMM1 we have an interleaved
+       system-physical-address range, REGION1, that spans those two DIMMs as
+       well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
+       named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
+       each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
+       "blk5.0".
+
+    3. The portions of DIMM2 and DIMM3 that do not participate in the REGION1
+       interleaved system-physical-address range (i.e. the DPA addresses past
+       offset (b)) are also included in the "blk4.0" and "blk5.0" namespaces.
+       Note that this example shows that BLK-aperture namespaces don't need to
+       be contiguous in DPA-space.
+
+    This bus is provided by the kernel under the device
+    /sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
+    the nfit_test.ko module is loaded.  This tests not only LIBNVDIMM but the
+    acpi_nfit.ko driver as well.
+
+
+LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
+========================================================
+
+What follows is a description of the LIBNVDIMM sysfs layout and a
+corresponding object hierarchy diagram as viewed through the LIBNDCTL
+API.  The example sysfs paths and diagrams are relative to the Example
+NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
+test.
+
+LIBNDCTL: Context
+-----------------
+
+Every API call in the LIBNDCTL library requires a context that holds the
+logging parameters and other library instance state.  The library is
+based on the libabc template:
+
+	https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
+
+LIBNDCTL: instantiate a new library context example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+	struct ndctl_ctx *ctx;
+
+	if (ndctl_new(&ctx) == 0)
+		return ctx;
+	else
+		return NULL;
+
+LIBNVDIMM/LIBNDCTL: Bus
+-----------------------
+
+A bus has a 1:1 relationship with an NFIT.  The current expectation for
+ACPI based systems is that there is only ever one platform-global NFIT.
+That said, it is trivial to register multiple NFITs; the specification
+does not preclude it.  The infrastructure supports multiple busses and
+we use this capability to test multiple NFIT configurations in the unit
+test.
+
+LIBNVDIMM: control class device in /sys/class
+---------------------------------------------
+
+This character device accepts DSM messages to be passed to a DIMM
+identified by its NFIT handle::
+
+	/sys/class/nd/ndctl0
+	|-- dev
+	|-- device -> ../../../ndbus0
+	|-- subsystem -> ../../../../../../../class/nd
+
+
+
+LIBNVDIMM: bus
+--------------
+
+::
+
+	struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
+	       struct nvdimm_bus_descriptor *nfit_desc);
+
+::
+
+	/sys/devices/platform/nfit_test.0/ndbus0
+	|-- commands
+	|-- nd
+	|-- nfit
+	|-- nmem0
+	|-- nmem1
+	|-- nmem2
+	|-- nmem3
+	|-- power
+	|-- provider
+	|-- region0
+	|-- region1
+	|-- region2
+	|-- region3
+	|-- region4
+	|-- region5
+	|-- uevent
+	`-- wait_probe
+
+LIBNDCTL: bus enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Find the bus handle that describes the bus from Example NVDIMM Platform::
+
+	static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
+			const char *provider)
+	{
+		struct ndctl_bus *bus;
+
+		ndctl_bus_foreach(ctx, bus)
+			if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
+				return bus;
+
+		return NULL;
+	}
+
+	bus = get_bus_by_provider(ctx, "nfit_test.0");
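+
+As a hedged sketch only, the context and bus fragments above can be combined
+into a small standalone program.  It assumes the libndctl development headers
+are installed and that the program is linked with -lndctl; the main() wrapper
+and the output format are illustrative and not part of the library::
+
+	#include <stdio.h>
+	#include <ndctl/libndctl.h>
+
+	int main(void)
+	{
+		struct ndctl_ctx *ctx;
+		struct ndctl_bus *bus;
+
+		/* ndctl_new() returns 0 on success */
+		if (ndctl_new(&ctx) != 0)
+			return 1;
+
+		/* print the provider name of every bus found */
+		ndctl_bus_foreach(ctx, bus)
+			printf("%s\n", ndctl_bus_get_provider(bus));
+
+		/* drop the reference taken by ndctl_new() */
+		ndctl_unref(ctx);
+		return 0;
+	}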
+
+
+LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
+-------------------------------
+
+The DIMM device provides a character device for sending commands to
+hardware, and it is a container for LABELs.  If the DIMM is defined by
+NFIT then an optional 'nfit' attribute sub-directory is available to add
+NFIT-specifics.
+
+Note that the kernel device name for "DIMMs" is "nmemX".  The NFIT
+describes these devices via "Memory Device to System Physical Address
+Range Mapping Structure", and there is no requirement that they actually
+be physical DIMMs, so we use a more generic name.
+
+LIBNVDIMM: DIMM (NMEM)
+^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+	struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
+			const struct attribute_group **groups, unsigned long flags,
+			unsigned long *dsm_mask);
+
+::
+
+	/sys/devices/platform/nfit_test.0/ndbus0
+	|-- nmem0
+	|   |-- available_slots
+	|   |-- commands
+	|   |-- dev
+	|   |-- devtype
+	|   |-- driver -> ../../../../../bus/nd/drivers/nvdimm
+	|   |-- modalias
+	|   |-- nfit
+	|   |   |-- device
+	|   |   |-- format
+	|   |   |-- handle
+	|   |   |-- phys_id
+	|   |   |-- rev_id
+	|   |   |-- serial
+	|   |   `-- vendor
+	|   |-- state
+	|   |-- subsystem -> ../../../../../bus/nd
+	|   `-- uevent
+	|-- nmem1
+	[..]
+
+
+LIBNDCTL: DIMM enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Note, in this example we are assuming NFIT-defined DIMMs which are
+identified by an "nfit_handle", a 32-bit value where:
+
+   - Bit 3:0 DIMM number within the memory channel
+   - Bit 7:4 memory channel number
+   - Bit 11:8 memory controller ID
+   - Bit 15:12 socket ID (within scope of a Node controller if node
+     controller is present)
+   - Bit 27:16 Node Controller ID
+   - Bit 31:28 Reserved
+
+::
+
+	static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
+	       unsigned int handle)
+	{
+		struct ndctl_dimm *dimm;
+
+		ndctl_dimm_foreach(bus, dimm)
+			if (ndctl_dimm_get_handle(dimm) == handle)
+				return dimm;
+
+		return NULL;
+	}
+
+	#define DIMM_HANDLE(n, s, i, c, d) \
+		((((n) & 0xfff) << 16) | (((s) & 0xf) << 12) | (((i) & 0xf) << 8) \
+		 | (((c) & 0xf) << 4) | ((d) & 0xf))
+
+	dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));
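+
+Going the other way, a handle can be split back into the fields listed
+above.  The following is a minimal sketch in plain C; the struct and
+function names are illustrative only and are not part of libndctl::
+
+	#include <stdio.h>
+
+	struct dimm_handle_fields {
+		unsigned int node, socket, imc, chan, dimm;
+	};
+
+	static struct dimm_handle_fields decode_handle(unsigned int handle)
+	{
+		struct dimm_handle_fields f = {
+			.node   = (handle >> 16) & 0xfff,	/* bits 27:16 */
+			.socket = (handle >> 12) & 0xf,		/* bits 15:12 */
+			.imc    = (handle >> 8) & 0xf,		/* bits 11:8 */
+			.chan   = (handle >> 4) & 0xf,		/* bits 7:4 */
+			.dimm   = handle & 0xf,			/* bits 3:0 */
+		};
+
+		return f;
+	}
+
+	int main(void)
+	{
+		/* 0x0101: memory controller 1, channel 0, DIMM 1 */
+		struct dimm_handle_fields f = decode_handle(0x0101);
+
+		printf("node %u socket %u imc %u chan %u dimm %u\n",
+		       f.node, f.socket, f.imc, f.chan, f.dimm);
+		return 0;
+	}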
+
+LIBNVDIMM/LIBNDCTL: Region
+--------------------------
+
+A generic REGION device is registered for each PMEM range or BLK-aperture
+set.  Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
+sets on the "nfit_test.0" bus.  The primary role of regions is to be a
+container of "mappings".  A mapping is a tuple of <DIMM,
+DPA-start-offset, length>.
+
+LIBNVDIMM provides a built-in driver for these REGION devices.  This driver
+is responsible for reconciling the aliased DPA mappings across all
+regions, parsing the LABEL, if present, and then emitting NAMESPACE
+devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
+nd_blk device driver to consume.
+
+In addition to the generic attributes of "mapping"s, "interleave_ways"
+and "size" the REGION device also exports some convenience attributes.
+"nstype" indicates the integer type of namespace-device this region
+emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
+'add' event, "modalias" duplicates the MODALIAS variable stored by udev
+at the 'add' event, and finally, the optional "spa_index" is provided in
+the case where the region is defined by a SPA.
+
+LIBNVDIMM: region::
+
+	struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
+			struct nd_region_desc *ndr_desc);
+	struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
+			struct nd_region_desc *ndr_desc);
+
+::
+
+	/sys/devices/platform/nfit_test.0/ndbus0
+	|-- region0
+	|   |-- available_size
+	|   |-- btt0
+	|   |-- btt_seed
+	|   |-- devtype
+	|   |-- driver -> ../../../../../bus/nd/drivers/nd_region
+	|   |-- init_namespaces
+	|   |-- mapping0
+	|   |-- mapping1
+	|   |-- mappings
+	|   |-- modalias
+	|   |-- namespace0.0
+	|   |-- namespace_seed
+	|   |-- numa_node
+	|   |-- nfit
+	|   |   `-- spa_index
+	|   |-- nstype
+	|   |-- set_cookie
+	|   |-- size
+	|   |-- subsystem -> ../../../../../bus/nd
+	|   `-- uevent
+	|-- region1
+	[..]
+
+LIBNDCTL: region enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Sample region retrieval routines based on NFIT-unique data like
+"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
+BLK::
+
+	static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
+			unsigned int spa_index)
+	{
+		struct ndctl_region *region;
+
+		ndctl_region_foreach(bus, region) {
+			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
+				continue;
+			if (ndctl_region_get_spa_index(region) == spa_index)
+				return region;
+		}
+		return NULL;
+	}
+
+	static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
+			unsigned int handle)
+	{
+		struct ndctl_region *region;
+
+		ndctl_region_foreach(bus, region) {
+			struct ndctl_mapping *map;
+
+			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLK)
+				continue;
+			ndctl_mapping_foreach(region, map) {
+				struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
+
+				if (ndctl_dimm_get_handle(dimm) == handle)
+					return region;
+			}
+		}
+		return NULL;
+	}
+
+
+Why Not Encode the Region Type into the Region Name?
+----------------------------------------------------
+
+At first glance it seems since NFIT defines just PMEM and BLK interface
+types that we should simply name REGION devices with something derived
+from those type names.  However, the ND subsystem explicitly keeps the
+REGION name generic and expects userspace to always consider the
+region-attributes for four reasons:
+
+    1. There are already more than two REGION and "namespace" types.  For
+       PMEM there are two subtypes.  As mentioned previously we have PMEM where
+       the constituent DIMM devices are known and anonymous PMEM.  For BLK
+       regions the NFIT specification already anticipates vendor specific
+       implementations.  The exact distinction of what a region contains is in
+       the region-attributes not the region-name or the region-devtype.
+
+    2. A region with zero child-namespaces is a possible configuration.  For
+       example, the NFIT allows for a DCR to be published without a
+       corresponding BLK-aperture.  This equates to a DIMM that can only accept
+       control/configuration messages, but no i/o through a descendant block
+       device.  Again, this "type" is advertised in the attributes ('mappings'
+       == 0) and the name does not tell you much.
+
+    3. What if a third major interface type arises in the future?  Outside
+       of vendor specific implementations, it's not difficult to envision a
+       third class of interface type beyond BLK and PMEM.  With a generic name
+       for the REGION level of the device-hierarchy old userspace
+       implementations can still make sense of new kernel advertised
+       region-types.  Userspace can always rely on the generic region
+       attributes like "mappings", "size", etc and the expected child devices
+       named "namespace".  This generic format of the device-model hierarchy
+       allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
+       future-proof.
+
+    4. There are more robust mechanisms for determining the major type of a
+       region than a device name.  See the next section, How Do I Determine the
+       Major Type of a Region?
+
+How Do I Determine the Major Type of a Region?
+----------------------------------------------
+
+Outside of the blanket recommendation of "use libndctl", or simply
+looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
+"nstype" integer attribute, here are some other options.
+
+1. module alias lookup
+^^^^^^^^^^^^^^^^^^^^^^
+
+    The whole point of region/namespace device type differentiation is to
+    decide which block-device driver will attach to a given LIBNVDIMM namespace.
+    One can simply use the modalias to look up the resulting module.  It's
+    important to note that this method is robust in the presence of a
+    vendor-specific driver down the road.  If a vendor-specific
+    implementation wants to supplant the standard nd_blk driver it can with
+    minimal impact to the rest of LIBNVDIMM.
+
+    In fact, a vendor may also want to have a vendor-specific region-driver
+    (outside of nd_region).  For example, if a vendor defined its own LABEL
+    format it would need its own region driver to parse that LABEL and emit
+    the resulting namespaces.  The output from module resolution is more
+    accurate than a region-name or region-devtype.
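+
+    As a hedged illustration, assuming the kmod version of modprobe (whose
+    -R/--resolve-alias option prints the module names matching an alias) and
+    the "namespace0.0" device from the Example NVDIMM Platform, the lookup
+    could be done as follows::
+
+	# alias=$(cat /sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0/modalias)
+	# modprobe --resolve-alias "$alias"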
+
+2. udev
+^^^^^^^
+
+    The kernel "devtype" is registered in the udev database::
+
+	# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
+	P: /devices/platform/nfit_test.0/ndbus0/region0
+	E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
+	E: DEVTYPE=nd_pmem
+	E: MODALIAS=nd:t2
+	E: SUBSYSTEM=nd
+
+	# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
+	P: /devices/platform/nfit_test.0/ndbus0/region4
+	E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
+	E: DEVTYPE=nd_blk
+	E: MODALIAS=nd:t3
+	E: SUBSYSTEM=nd
+
+    ...and is available as a region attribute, but keep in mind that the
+    "devtype" does not indicate sub-type variations and scripts should
+    really examine the other attributes.
+
+3. type specific attributes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    As it currently stands a BLK-aperture region will never have a
+    "nfit/spa_index" attribute, but neither will a non-NFIT PMEM region.  A
+    BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
+    that does not allow I/O.  A PMEM region with a "mappings" value of zero
+    is a simple system-physical-address range.
+
+
+LIBNVDIMM/LIBNDCTL: Namespace
+-----------------------------
+
+A REGION, after resolving DPA aliasing and LABEL specified boundaries,
+surfaces one or more "namespace" devices.  The arrival of a "namespace"
+device currently triggers either the nd_blk or nd_pmem driver to load
+and register a disk/block device.
+
+LIBNVDIMM: namespace
+^^^^^^^^^^^^^^^^^^^^
+
+Here is a sample layout from the three major types of NAMESPACE where
+namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
+attribute), namespace2.0 represents a BLK namespace (note that it has a
+'sector_size' attribute), and namespace6.0 represents an anonymous PMEM
+namespace (note that it has no 'uuid' attribute because it does not
+support a LABEL)::
+
+	/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
+	|-- alt_name
+	|-- devtype
+	|-- dpa_extents
+	|-- force_raw
+	|-- modalias
+	|-- numa_node
+	|-- resource
+	|-- size
+	|-- subsystem -> ../../../../../../bus/nd
+	|-- type
+	|-- uevent
+	`-- uuid
+	/sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
+	|-- alt_name
+	|-- devtype
+	|-- dpa_extents
+	|-- force_raw
+	|-- modalias
+	|-- numa_node
+	|-- sector_size
+	|-- size
+	|-- subsystem -> ../../../../../../bus/nd
+	|-- type
+	|-- uevent
+	`-- uuid
+	/sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
+	|-- block
+	|   `-- pmem0
+	|-- devtype
+	|-- driver -> ../../../../../../bus/nd/drivers/pmem
+	|-- force_raw
+	|-- modalias
+	|-- numa_node
+	|-- resource
+	|-- size
+	|-- subsystem -> ../../../../../../bus/nd
+	|-- type
+	`-- uevent
+
+LIBNDCTL: namespace enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Namespaces are indexed relative to their parent region, as in the example
+below.  These indexes are mostly static from boot to boot, but the subsystem
+makes no guarantees in this regard.  For a static namespace identifier use
+its 'uuid' attribute.
+
+::
+
+  static struct ndctl_namespace
+  *get_namespace_by_id(struct ndctl_region *region, unsigned int id)
+  {
+          struct ndctl_namespace *ndns;
+
+          ndctl_namespace_foreach(region, ndns)
+                  if (ndctl_namespace_get_id(ndns) == id)
+                          return ndns;
+
+          return NULL;
+  }
+
+LIBNDCTL: namespace creation example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Idle namespaces are automatically created by the kernel if a given
+region has enough available capacity to create a new namespace.
+Namespace instantiation involves finding an idle namespace and
+configuring it.  For the most part the setting of namespace attributes
+can occur in any order; the only constraint is that 'uuid' must be set
+before 'size'.  This enables the kernel to track DPA allocations
+internally with a static identifier::
+
+  static int configure_namespace(struct ndctl_region *region,
+                  struct ndctl_namespace *ndns,
+                  struct namespace_parameters *parameters)
+  {
+          char devname[50];
+
+          snprintf(devname, sizeof(devname), "namespace%d.%d",
+                          ndctl_region_get_id(region), parameters->id);
+
+          ndctl_namespace_set_alt_name(ndns, devname);
+          /* 'uuid' must be set prior to setting size! */
+          ndctl_namespace_set_uuid(ndns, parameters->uuid);
+          ndctl_namespace_set_size(ndns, parameters->size);
+          /* unlike pmem namespaces, blk namespaces have a sector size */
+          if (parameters->lbasize)
+                  ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
+          return ndctl_namespace_enable(ndns);
+  }
+
+
+Why the Term "namespace"?
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    1. Why not "volume" for instance?  "volume" ran the risk of confusing
+       ND (libnvdimm subsystem) with a volume manager like device-mapper.
+
+    2. The term originated to describe the sub-devices that can be created
+       within an NVME controller (see the nvme specification:
+       http://www.nvmexpress.org/specifications/), and NFIT namespaces are
+       meant to parallel the capabilities and configurability of
+       NVME-namespaces.
+
+
+LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
+-------------------------------------------------
+
+A BTT (design document: http://pmem.io/2014/09/23/btt.html) is a stacked
+block device driver that fronts either the whole block device or a
+partition of a block device emitted by either a PMEM or BLK NAMESPACE.
+
+LIBNVDIMM: btt layout
+^^^^^^^^^^^^^^^^^^^^^
+
+Every region will start out with at least one BTT device which is the
+seed device.  To activate it set the "namespace", "uuid", and
+"sector_size" attributes and then bind the device to the nd_pmem or
+nd_blk driver depending on the region type::
+
+	/sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/
+	|-- namespace
+	|-- delete
+	|-- devtype
+	|-- modalias
+	|-- numa_node
+	|-- sector_size
+	|-- subsystem -> ../../../../../bus/nd
+	|-- uevent
+	`-- uuid
+
+LIBNDCTL: btt creation example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Similar to namespaces, an idle BTT device is automatically created per
+region.  Each time this "seed" btt device is configured and enabled, a new
+seed is created.  Creating a BTT configuration involves two steps:
+finding an idle BTT and assigning it to consume a PMEM or BLK namespace::
+
+	static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
+	{
+		struct ndctl_btt *btt;
+
+		ndctl_btt_foreach(region, btt)
+			if (!ndctl_btt_is_enabled(btt)
+					&& !ndctl_btt_is_configured(btt))
+				return btt;
+
+		return NULL;
+	}
+
+	static int configure_btt(struct ndctl_region *region,
+			struct btt_parameters *parameters)
+	{
+		struct ndctl_btt *btt = get_idle_btt(region);
+
+		ndctl_btt_set_uuid(btt, parameters->uuid);
+		ndctl_btt_set_sector_size(btt, parameters->sector_size);
+		ndctl_btt_set_namespace(btt, parameters->ndns);
+		/* turn off raw mode device */
+		ndctl_namespace_disable(parameters->ndns);
+		/* turn on btt access */
+		return ndctl_btt_enable(btt);
+	}
+
+Once instantiated, a new inactive btt seed device will appear underneath
+the region.
+
+Once a "namespace" is removed from a BTT, that instance of the BTT device
+will be deleted or otherwise reset to default values.  This deletion is
+only at the device model level.  In order to destroy a BTT the "info
+block" needs to be destroyed.  Note that to destroy a BTT the media
+needs to be written in raw mode.  By default, the kernel will autodetect
+the presence of a BTT and disable raw mode.  This autodetect behavior
+can be suppressed by enabling raw mode for the namespace via the
+ndctl_namespace_set_raw_mode() API.
+
+
+Summary LIBNDCTL Diagram
+------------------------
+
+For the example given above, here is the view of the objects as seen by the
+LIBNDCTL API::
+
+              +---+
+              |CTX|    +---------+   +--------------+  +---------------+
+              +-+-+  +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
+                |    | +---------+   +--------------+  +---------------+
+  +-------+     |    | +---------+   +--------------+  +---------------+
+  | DIMM0 <-+   |    +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
+  +-------+ |   |    | +---------+   +--------------+  +---------------+
+  | DIMM1 <-+ +-v--+ | +---------+   +--------------+  +---------------+
+  +-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6  "blk2.0" |
+  | DIMM2 <-+ +----+ | +---------+ | +--------------+  +----------------------+
+  +-------+ |        |             +-> NAMESPACE2.1 +--> ND5  "blk2.1" | BTT2 |
+  | DIMM3 <-+        |               +--------------+  +----------------------+
+  +-------+          | +---------+   +--------------+  +---------------+
+                     +-> REGION3 +-+-> NAMESPACE3.0 +--> ND4  "blk3.0" |
+                     | +---------+ | +--------------+  +----------------------+
+                     |             +-> NAMESPACE3.1 +--> ND3  "blk3.1" | BTT1 |
+                     |               +--------------+  +----------------------+
+                     | +---------+   +--------------+  +---------------+
+                     +-> REGION4 +---> NAMESPACE4.0 +--> ND2  "blk4.0" |
+                     | +---------+   +--------------+  +---------------+
+                     | +---------+   +--------------+  +----------------------+
+                     +-> REGION5 +---> NAMESPACE5.0 +--> ND1  "blk5.0" | BTT0 |
+                       +---------+   +--------------+  +---------------+------+
diff --git a/Documentation/nvdimm/nvdimm.txt b/Documentation/nvdimm/nvdimm.txt
deleted file mode 100644
index 1669f626b037..000000000000
--- a/Documentation/nvdimm/nvdimm.txt
+++ /dev/null
@@ -1,815 +0,0 @@
-			  LIBNVDIMM: Non-Volatile Devices
-	      libnvdimm - kernel / libndctl - userspace helper library
-			   linux-nvdimm@lists.01.org
-				      v13
-
-
-	Glossary
-	Overview
-	    Supporting Documents
-	    Git Trees
-	LIBNVDIMM PMEM and BLK
-	Why BLK?
-	    PMEM vs BLK
-	        BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
-	Example NVDIMM Platform
-	LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
-	    LIBNDCTL: Context
-	        libndctl: instantiate a new library context example
-	    LIBNVDIMM/LIBNDCTL: Bus
-	        libnvdimm: control class device in /sys/class
-	        libnvdimm: bus
-	        libndctl: bus enumeration example
-	    LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
-	        libnvdimm: DIMM (NMEM)
-	        libndctl: DIMM enumeration example
-	    LIBNVDIMM/LIBNDCTL: Region
-	        libnvdimm: region
-	        libndctl: region enumeration example
-	        Why Not Encode the Region Type into the Region Name?
-	        How Do I Determine the Major Type of a Region?
-	    LIBNVDIMM/LIBNDCTL: Namespace
-	        libnvdimm: namespace
-	        libndctl: namespace enumeration example
-	        libndctl: namespace creation example
-	        Why the Term "namespace"?
-	    LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
-	        libnvdimm: btt layout
-	        libndctl: btt creation example
-	Summary LIBNDCTL Diagram
-
-
-Glossary
---------
-
-PMEM: A system-physical-address range where writes are persistent.  A
-block device composed of PMEM is capable of DAX.  A PMEM address range
-may span an interleave of several DIMMs.
-
-BLK: A set of one or more programmable memory mapped apertures provided
-by a DIMM to access its media.  This indirection precludes the
-performance benefit of interleaving, but enables DIMM-bounded failure
-modes.
-
-DPA: DIMM Physical Address, is a DIMM-relative offset.  With one DIMM in
-the system there would be a 1:1 system-physical-address:DPA association.
-Once more DIMMs are added a memory controller interleave must be
-decoded to determine the DPA associated with a given
-system-physical-address.  BLK capacity always has a 1:1 relationship
-with a single-DIMM's DPA range.
-
-DAX: File system extensions to bypass the page cache and block layer to
-mmap persistent memory, from a PMEM block device, directly into a
-process address space.
-
-DSM: Device Specific Method: ACPI method to to control specific
-device - in this case the firmware.
-
-DCR: NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
-It defines a vendor-id, device-id, and interface format for a given DIMM.
-
-BTT: Block Translation Table: Persistent memory is byte addressable.
-Existing software may have an expectation that the power-fail-atomicity
-of writes is at least one sector, 512 bytes.  The BTT is an indirection
-table with atomic update semantics to front a PMEM/BLK block device
-driver and present arbitrary atomic sector sizes.
-
-LABEL: Metadata stored on a DIMM device that partitions and identifies
-(persistently names) storage between PMEM and BLK.  It also partitions
-BLK storage to host BTTs with different parameters per BLK-partition.
-Note that traditional partition tables, GPT/MBR, are layered on top of a
-BLK or PMEM device.
-
-
-Overview
---------
-
-The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
-PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
-and BLK mode access.  These three modes of operation are described by
-the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6.  While the LIBNVDIMM
-implementation is generic and supports pre-NFIT platforms, it was guided
-by the superset of capabilities need to support this ACPI 6 definition
-for NVDIMM resources.  The bulk of the kernel implementation is in place
-to handle the case where DPA accessible via PMEM is aliased with DPA
-accessible via BLK.  When that occurs a LABEL is needed to reserve DPA
-for exclusive access via one mode a time.
-
-Supporting Documents
-ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
-NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
-DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
-Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
-
-Git Trees
-LIBNVDIMM: https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
-LIBNDCTL: https://github.com/pmem/ndctl.git
-PMEM: https://github.com/01org/prd
-
-
-LIBNVDIMM PMEM and BLK
-------------------
-
-Prior to the arrival of the NFIT, non-volatile memory was described to a
-system in various ad-hoc ways.  Usually only the bare minimum was
-provided, namely, a single system-physical-address range where writes
-are expected to be durable after a system power loss.  Now, the NFIT
-specification standardizes not only the description of PMEM, but also
-BLK and platform message-passing entry points for control and
-configuration.
-
-For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
-device driver:
-
-    1. PMEM (nd_pmem.ko): Drives a system-physical-address range.  This
-    range is contiguous in system memory and may be interleaved (hardware
-    memory controller striped) across multiple DIMMs.  When interleaved the
-    platform may optionally provide details of which DIMMs are participating
-    in the interleave.
-
-    Note that while LIBNVDIMM describes system-physical-address ranges that may
-    alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
-    alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
-    distinction.  The different device-types are an implementation detail
-    that userspace can exploit to implement policies like "only interface
-    with address ranges from certain DIMMs".  It is worth noting that when
-    aliasing is present and a DIMM lacks a label, then no block device can
-    be created by default as userspace needs to do at least one allocation
-    of DPA to the PMEM range.  In contrast ND_NAMESPACE_IO ranges, once
-    registered, can be immediately attached to nd_pmem.
-
-    2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
-    defined apertures.  A set of apertures will access just one DIMM.
-    Multiple windows (apertures) allow multiple concurrent accesses, much like
-    tagged-command-queuing, and would likely be used by different threads or
-    different CPUs.
-
-    The NFIT specification defines a standard format for a BLK-aperture, but
-    the spec also allows for vendor specific layouts, and non-NFIT BLK
-    implementations may have other designs for BLK I/O.  For this reason
-    "nd_blk" calls back into platform-specific code to perform the I/O.
-    One such implementation is defined in the "Driver Writer's Guide" and "DSM
-    Interface Example".
-
-
-Why BLK?
---------
-
-While PMEM provides direct byte-addressable CPU-load/store access to
-NVDIMM storage, it does not provide the best system RAS (recovery,
-availability, and serviceability) model.  An access to a corrupted
-system-physical-address address causes a CPU exception while an access
-to a corrupted address through an BLK-aperture causes that block window
-to raise an error status in a register.  The latter is more aligned with
-the standard error model that host-bus-adapter attached disks present.
-Also, if an administrator ever wants to replace a memory it is easier to
-service a system at DIMM module boundaries.  Compare this to PMEM where
-data could be interleaved in an opaque hardware specific manner across
-several DIMMs.
-
-PMEM vs BLK
-BLK-apertures solve these RAS problems, but their presence is also the
-major contributing factor to the complexity of the ND subsystem.  They
-complicate the implementation because PMEM and BLK alias in DPA space.
-Any given DIMM's DPA-range may contribute to one or more
-system-physical-address sets of interleaved DIMMs, *and* may also be
-accessed in its entirety through its BLK-aperture.  Accessing a DPA
-through a system-physical-address while simultaneously accessing the
-same DPA through a BLK-aperture has undefined results.  For this reason,
-DIMMs with this dual interface configuration include a DSM function to
-store/retrieve a LABEL.  The LABEL effectively partitions the DPA-space
-into exclusive system-physical-address and BLK-aperture accessible
-regions.  For simplicity a DIMM is allowed a PMEM "region" per each
-interleave set in which it is a member.  The remaining DPA space can be
-carved into an arbitrary number of BLK devices with discontiguous
-extents.
-
-BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
---------------------------------------------------
-
-One of the few
-reasons to allow multiple BLK namespaces per REGION is so that each
-BLK-namespace can be configured with a BTT with unique atomic sector
-sizes.  While a PMEM device can host a BTT the LABEL specification does
-not provide for a sector size to be specified for a PMEM namespace.
-This is due to the expectation that the primary usage model for PMEM is
-via DAX, and the BTT is incompatible with DAX.  However, for the cases
-where an application or filesystem still needs atomic sector update
-guarantees it can register a BTT on a PMEM device or partition.  See
-LIBNVDIMM/NDCTL: Block Translation Table "btt"
-
-
-Example NVDIMM Platform
------------------------
-
-For the remainder of this document the following diagram will be
-referenced for any example sysfs layouts.
-
-
-                             (a)               (b)           DIMM   BLK-REGION
-          +-------------------+--------+--------+--------+
-+------+  |       pm0.0       | blk2.0 | pm1.0  | blk2.1 |    0      region2
-| imc0 +--+- - - region0- - - +--------+        +--------+
-+--+---+  |       pm0.0       | blk3.0 | pm1.0  | blk3.1 |    1      region3
-   |      +-------------------+--------v        v--------+
-+--+---+                               |                 |
-| cpu0 |                                     region1
-+--+---+                               |                 |
-   |      +----------------------------^        ^--------+
-+--+---+  |           blk4.0           | pm1.0  | blk4.0 |    2      region4
-| imc1 +--+----------------------------|        +--------+
-+------+  |           blk5.0           | pm1.0  | blk5.0 |    3      region5
-          +----------------------------+--------+--------+
-
-In this platform we have four DIMMs and two memory controllers in one
-socket.  Each unique interface (BLK or PMEM) to DPA space is identified
-by a region device with a dynamically assigned id (REGION0 - REGION5).
-
-    1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
-    single PMEM namespace is created in the REGION0-SPA-range that spans most
-    of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
-    interleaved system-physical-address range is reclaimed as BLK-aperture
-    accessed space starting at DPA-offset (a) into each DIMM.  In that
-    reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
-    REGION3 where "blk2.0" and "blk3.0" are just human readable names that
-    could be set to any user-desired name in the LABEL.
-
-    2. In the last portion of DIMM0 and DIMM1 we have an interleaved
-    system-physical-address range, REGION1, that spans those two DIMMs as
-    well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
-    named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
-    each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
-    "blk5.0".
-
-    3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
-    interleaved system-physical-address range (i.e. the DPA address past
-    offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
-    Note, that this example shows that BLK-aperture namespaces don't need to
-    be contiguous in DPA-space.
-
-    This bus is provided by the kernel under the device
-    /sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
-    the nfit_test.ko module is loaded.  This not only test LIBNVDIMM but the
-    acpi_nfit.ko driver as well.
-
-
-LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
-----------------------------------------------------
-
-What follows is a description of the LIBNVDIMM sysfs layout and a
-corresponding object hierarchy diagram as viewed through the LIBNDCTL
-API.  The example sysfs paths and diagrams are relative to the Example
-NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
-test.
-
-LIBNDCTL: Context
-Every API call in the LIBNDCTL library requires a context that holds the
-logging parameters and other library instance state.  The library is
-based on the libabc template:
-https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
-
-LIBNDCTL: instantiate a new library context example
-
-	struct ndctl_ctx *ctx;
-
-	if (ndctl_new(&ctx) == 0)
-		return ctx;
-	else
-		return NULL;
-
-LIBNVDIMM/LIBNDCTL: Bus
--------------------
-
-A bus has a 1:1 relationship with an NFIT.  The current expectation for
-ACPI based systems is that there is only ever one platform-global NFIT.
-That said, it is trivial to register multiple NFITs, the specification
-does not preclude it.  The infrastructure supports multiple busses and
-we use this capability to test multiple NFIT configurations in the unit
-test.
-
-LIBNVDIMM: control class device in /sys/class
-
-This character device accepts DSM messages to be passed to DIMM
-identified by its NFIT handle.
-
-	/sys/class/nd/ndctl0
-	|-- dev
-	|-- device -> ../../../ndbus0
-	|-- subsystem -> ../../../../../../../class/nd
-
-
-
-LIBNVDIMM: bus
-
-	struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
-	       struct nvdimm_bus_descriptor *nfit_desc);
-
-	/sys/devices/platform/nfit_test.0/ndbus0
-	|-- commands
-	|-- nd
-	|-- nfit
-	|-- nmem0
-	|-- nmem1
-	|-- nmem2
-	|-- nmem3
-	|-- power
-	|-- provider
-	|-- region0
-	|-- region1
-	|-- region2
-	|-- region3
-	|-- region4
-	|-- region5
-	|-- uevent
-	`-- wait_probe
-
-LIBNDCTL: bus enumeration example
-Find the bus handle that describes the bus from Example NVDIMM Platform
-
-	static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
-			const char *provider)
-	{
-		struct ndctl_bus *bus;
-
-		ndctl_bus_foreach(ctx, bus)
-			if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
-				return bus;
-
-		return NULL;
-	}
-
-	bus = get_bus_by_provider(ctx, "nfit_test.0");
-
-
-LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
----------------------------
-
-The DIMM device provides a character device for sending commands to
-hardware, and it is a container for LABELs.  If the DIMM is defined by
-NFIT then an optional 'nfit' attribute sub-directory is available to add
-NFIT-specifics.
-
-Note that the kernel device name for "DIMMs" is "nmemX".  The NFIT
-describes these devices via "Memory Device to System Physical Address
-Range Mapping Structure", and there is no requirement that they actually
-be physical DIMMs, so we use a more generic name.
-
-LIBNVDIMM: DIMM (NMEM)
-
-	struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
-			const struct attribute_group **groups, unsigned long flags,
-			unsigned long *dsm_mask);
-
-	/sys/devices/platform/nfit_test.0/ndbus0
-	|-- nmem0
-	|   |-- available_slots
-	|   |-- commands
-	|   |-- dev
-	|   |-- devtype
-	|   |-- driver -> ../../../../../bus/nd/drivers/nvdimm
-	|   |-- modalias
-	|   |-- nfit
-	|   |   |-- device
-	|   |   |-- format
-	|   |   |-- handle
-	|   |   |-- phys_id
-	|   |   |-- rev_id
-	|   |   |-- serial
-	|   |   `-- vendor
-	|   |-- state
-	|   |-- subsystem -> ../../../../../bus/nd
-	|   `-- uevent
-	|-- nmem1
-	[..]
-
-
-LIBNDCTL: DIMM enumeration example
-
-Note, in this example we are assuming NFIT-defined DIMMs which are
-identified by an "nfit_handle" a 32-bit value where:
-Bit 3:0 DIMM number within the memory channel
-Bit 7:4 memory channel number
-Bit 11:8 memory controller ID
-Bit 15:12 socket ID (within scope of a Node controller if node controller is present)
-Bit 27:16 Node Controller ID
-Bit 31:28 Reserved
-
-	static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
-	       unsigned int handle)
-	{
-		struct ndctl_dimm *dimm;
-
-		ndctl_dimm_foreach(bus, dimm)
-			if (ndctl_dimm_get_handle(dimm) == handle)
-				return dimm;
-
-		return NULL;
-	}
-
-	#define DIMM_HANDLE(n, s, i, c, d) \
-		(((n & 0xfff) << 16) | ((s & 0xf) << 12) | ((i & 0xf) << 8) \
-		 | ((c & 0xf) << 4) | (d & 0xf))
-
-	dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));
-
-LIBNVDIMM/LIBNDCTL: Region
-----------------------
-
-A generic REGION device is registered for each PMEM range or BLK-aperture
-set.  Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
-sets on the "nfit_test.0" bus.  The primary role of regions are to be a
-container of "mappings".  A mapping is a tuple of <DIMM,
-DPA-start-offset, length>.
-
-LIBNVDIMM provides a built-in driver for these REGION devices.  This driver
-is responsible for reconciling the aliased DPA mappings across all
-regions, parsing the LABEL, if present, and then emitting NAMESPACE
-devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
-nd_blk device driver to consume.
-
-In addition to the generic attributes of "mapping"s, "interleave_ways"
-and "size" the REGION device also exports some convenience attributes.
-"nstype" indicates the integer type of namespace-device this region
-emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
-'add' event, "modalias" duplicates the MODALIAS variable stored by udev
-at the 'add' event, and finally, the optional "spa_index" is provided in
-the case where the region is defined by a SPA.
-
-LIBNVDIMM: region
-
-	struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
-			struct nd_region_desc *ndr_desc);
-	struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
-			struct nd_region_desc *ndr_desc);
-
-	/sys/devices/platform/nfit_test.0/ndbus0
-	|-- region0
-	|   |-- available_size
-	|   |-- btt0
-	|   |-- btt_seed
-	|   |-- devtype
-	|   |-- driver -> ../../../../../bus/nd/drivers/nd_region
-	|   |-- init_namespaces
-	|   |-- mapping0
-	|   |-- mapping1
-	|   |-- mappings
-	|   |-- modalias
-	|   |-- namespace0.0
-	|   |-- namespace_seed
-	|   |-- numa_node
-	|   |-- nfit
-	|   |   `-- spa_index
-	|   |-- nstype
-	|   |-- set_cookie
-	|   |-- size
-	|   |-- subsystem -> ../../../../../bus/nd
-	|   `-- uevent
-	|-- region1
-	[..]
-
-LIBNDCTL: region enumeration example
-
-Sample region retrieval routines based on NFIT-unique data like
-"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
-BLK.
-
-	static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
-			unsigned int spa_index)
-	{
-		struct ndctl_region *region;
-
-		ndctl_region_foreach(bus, region) {
-			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
-				continue;
-			if (ndctl_region_get_spa_index(region) == spa_index)
-				return region;
-		}
-		return NULL;
-	}
-
-	static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
-			unsigned int handle)
-	{
-		struct ndctl_region *region;
-
-		ndctl_region_foreach(bus, region) {
-			struct ndctl_mapping *map;
-
-			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
-				continue;
-			ndctl_mapping_foreach(region, map) {
-				struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
-
-				if (ndctl_dimm_get_handle(dimm) == handle)
-					return region;
-			}
-		}
-		return NULL;
-	}
-
-
-Why Not Encode the Region Type into the Region Name?
-----------------------------------------------------
-
-At first glance it seems since NFIT defines just PMEM and BLK interface
-types that we should simply name REGION devices with something derived
-from those type names.  However, the ND subsystem explicitly keeps the
-REGION name generic and expects userspace to always consider the
-region-attributes for four reasons:
-
-    1. There are already more than two REGION and "namespace" types.  For
-    PMEM there are two subtypes.  As mentioned previously we have PMEM where
-    the constituent DIMM devices are known and anonymous PMEM.  For BLK
-    regions the NFIT specification already anticipates vendor specific
-    implementations.  The exact distinction of what a region contains is in
-    the region-attributes not the region-name or the region-devtype.
-
-    2. A region with zero child-namespaces is a possible configuration.  For
-    example, the NFIT allows for a DCR to be published without a
-    corresponding BLK-aperture.  This equates to a DIMM that can only accept
-    control/configuration messages, but no i/o through a descendant block
-    device.  Again, this "type" is advertised in the attributes ('mappings'
-    == 0) and the name does not tell you much.
-
-    3. What if a third major interface type arises in the future?  Outside
-    of vendor specific implementations, it's not difficult to envision a
-    third class of interface type beyond BLK and PMEM.  With a generic name
-    for the REGION level of the device-hierarchy old userspace
-    implementations can still make sense of new kernel advertised
-    region-types.  Userspace can always rely on the generic region
-    attributes like "mappings", "size", etc and the expected child devices
-    named "namespace".  This generic format of the device-model hierarchy
-    allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
-    future-proof.
-
-    4. There are more robust mechanisms for determining the major type of a
-    region than a device name.  See the next section, How Do I Determine the
-    Major Type of a Region?
-
-How Do I Determine the Major Type of a Region?
-----------------------------------------------
-
-Outside of the blanket recommendation of "use libndctl", or simply
-looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
-"nstype" integer attribute, here are some other options.
-
-    1. module alias lookup:
-
-    The whole point of region/namespace device type differentiation is to
-    decide which block-device driver will attach to a given LIBNVDIMM namespace.
-    One can simply use the modalias to lookup the resulting module.  It's
-    important to note that this method is robust in the presence of a
-    vendor-specific driver down the road.  If a vendor-specific
-    implementation wants to supplant the standard nd_blk driver it can with
-    minimal impact to the rest of LIBNVDIMM.
-
-    In fact, a vendor may also want to have a vendor-specific region-driver
-    (outside of nd_region).  For example, if a vendor defined its own LABEL
-    format it would need its own region driver to parse that LABEL and emit
-    the resulting namespaces.  The output from module resolution is more
-    accurate than a region-name or region-devtype.
-
-    2. udev:
-
-    The kernel "devtype" is registered in the udev database
-    # udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
-    P: /devices/platform/nfit_test.0/ndbus0/region0
-    E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
-    E: DEVTYPE=nd_pmem
-    E: MODALIAS=nd:t2
-    E: SUBSYSTEM=nd
-
-    # udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
-    P: /devices/platform/nfit_test.0/ndbus0/region4
-    E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
-    E: DEVTYPE=nd_blk
-    E: MODALIAS=nd:t3
-    E: SUBSYSTEM=nd
-
-    ...and is available as a region attribute, but keep in mind that the
-    "devtype" does not indicate sub-type variations and scripts should
-    really be understanding the other attributes.
-
-    3. type specific attributes:
-
-    As it currently stands a BLK-aperture region will never have a
-    "nfit/spa_index" attribute, but neither will a non-NFIT PMEM region.  A
-    BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
-    that does not allow I/O.  A PMEM region with a "mappings" value of zero
-    is a simple system-physical-address range.
-
-
-LIBNVDIMM/LIBNDCTL: Namespace
--------------------------
-
-A REGION, after resolving DPA aliasing and LABEL specified boundaries,
-surfaces one or more "namespace" devices.  The arrival of a "namespace"
-device currently triggers either the nd_blk or nd_pmem driver to load
-and register a disk/block device.
-
-LIBNVDIMM: namespace
-Here is a sample layout from the three major types of NAMESPACE where
-namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
-attribute), namespace2.0 represents a BLK namespace (note it has a
-'sector_size' attribute) that, and namespace6.0 represents an anonymous
-PMEM namespace (note that has no 'uuid' attribute due to not support a
-LABEL).
-
-	/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
-	|-- alt_name
-	|-- devtype
-	|-- dpa_extents
-	|-- force_raw
-	|-- modalias
-	|-- numa_node
-	|-- resource
-	|-- size
-	|-- subsystem -> ../../../../../../bus/nd
-	|-- type
-	|-- uevent
-	`-- uuid
-	/sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
-	|-- alt_name
-	|-- devtype
-	|-- dpa_extents
-	|-- force_raw
-	|-- modalias
-	|-- numa_node
-	|-- sector_size
-	|-- size
-	|-- subsystem -> ../../../../../../bus/nd
-	|-- type
-	|-- uevent
-	`-- uuid
-	/sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
-	|-- block
-	|   `-- pmem0
-	|-- devtype
-	|-- driver -> ../../../../../../bus/nd/drivers/pmem
-	|-- force_raw
-	|-- modalias
-	|-- numa_node
-	|-- resource
-	|-- size
-	|-- subsystem -> ../../../../../../bus/nd
-	|-- type
-	`-- uevent
-
-LIBNDCTL: namespace enumeration example
-Namespaces are indexed relative to their parent region, example below.
-These indexes are mostly static from boot to boot, but subsystem makes
-no guarantees in this regard.  For a static namespace identifier use its
-'uuid' attribute.
-
-static struct ndctl_namespace *get_namespace_by_id(struct ndctl_region *region,
-                unsigned int id)
-{
-        struct ndctl_namespace *ndns;
-
-        ndctl_namespace_foreach(region, ndns)
-                if (ndctl_namespace_get_id(ndns) == id)
-                        return ndns;
-
-        return NULL;
-}
-
-LIBNDCTL: namespace creation example
-Idle namespaces are automatically created by the kernel if a given
-region has enough available capacity to create a new namespace.
-Namespace instantiation involves finding an idle namespace and
-configuring it.  For the most part the setting of namespace attributes
-can occur in any order, the only constraint is that 'uuid' must be set
-before 'size'.  This enables the kernel to track DPA allocations
-internally with a static identifier.
-
-static int configure_namespace(struct ndctl_region *region,
-                struct ndctl_namespace *ndns,
-                struct namespace_parameters *parameters)
-{
-        char devname[50];
-
-        snprintf(devname, sizeof(devname), "namespace%d.%d",
-                        ndctl_region_get_id(region), paramaters->id);
-
-        ndctl_namespace_set_alt_name(ndns, devname);
-        /* 'uuid' must be set prior to setting size! */
-        ndctl_namespace_set_uuid(ndns, paramaters->uuid);
-        ndctl_namespace_set_size(ndns, paramaters->size);
-        /* unlike pmem namespaces, blk namespaces have a sector size */
-        if (parameters->lbasize)
-                ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
-        ndctl_namespace_enable(ndns);
-}
-
-
-Why the Term "namespace"?
-
-    1. Why not "volume" for instance?  "volume" ran the risk of confusing
-    ND (libnvdimm subsystem) to a volume manager like device-mapper.
-
-    2. The term originated to describe the sub-devices that can be created
-    within a NVME controller (see the nvme specification:
-    http://www.nvmexpress.org/specifications/), and NFIT namespaces are
-    meant to parallel the capabilities and configurability of
-    NVME-namespaces.
-
-
-LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
----------------------------------------------
-
-A BTT (design document: http://pmem.io/2014/09/23/btt.html) is a stacked
-block device driver that fronts either the whole block device or a
-partition of a block device emitted by either a PMEM or BLK NAMESPACE.
-
-LIBNVDIMM: btt layout
-Every region will start out with at least one BTT device which is the
-seed device.  To activate it set the "namespace", "uuid", and
-"sector_size" attributes and then bind the device to the nd_pmem or
-nd_blk driver depending on the region type.
-
-	/sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/
-	|-- namespace
-	|-- delete
-	|-- devtype
-	|-- modalias
-	|-- numa_node
-	|-- sector_size
-	|-- subsystem -> ../../../../../bus/nd
-	|-- uevent
-	`-- uuid
-
-LIBNDCTL: btt creation example
-Similar to namespaces an idle BTT device is automatically created per
-region.  Each time this "seed" btt device is configured and enabled a new
-seed is created.  Creating a BTT configuration involves two steps of
-finding and idle BTT and assigning it to consume a PMEM or BLK namespace.
-
-	static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
-	{
-		struct ndctl_btt *btt;
-
-		ndctl_btt_foreach(region, btt)
-			if (!ndctl_btt_is_enabled(btt)
-					&& !ndctl_btt_is_configured(btt))
-				return btt;
-
-		return NULL;
-	}
-
-	static int configure_btt(struct ndctl_region *region,
-			struct btt_parameters *parameters)
-	{
-		btt = get_idle_btt(region);
-
-		ndctl_btt_set_uuid(btt, parameters->uuid);
-		ndctl_btt_set_sector_size(btt, parameters->sector_size);
-		ndctl_btt_set_namespace(btt, parameters->ndns);
-		/* turn off raw mode device */
-		ndctl_namespace_disable(parameters->ndns);
-		/* turn on btt access */
-		ndctl_btt_enable(btt);
-	}
-
-Once instantiated a new inactive btt seed device will appear underneath
-the region.
-
-Once a "namespace" is removed from a BTT that instance of the BTT device
-will be deleted or otherwise reset to default values.  This deletion is
-only at the device model level.  In order to destroy a BTT the "info
-block" needs to be destroyed.  Note, that to destroy a BTT the media
-needs to be written in raw mode.  By default, the kernel will autodetect
-the presence of a BTT and disable raw mode.  This autodetect behavior
-can be suppressed by enabling raw mode for the namespace via the
-ndctl_namespace_set_raw_mode() API.
-
-
-Summary LIBNDCTL Diagram
-------------------------
-
-For the given example above, here is the view of the objects as seen by the
-LIBNDCTL API:
-            +---+
-            |CTX|    +---------+   +--------------+  +---------------+
-            +-+-+  +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
-              |    | +---------+   +--------------+  +---------------+
-+-------+     |    | +---------+   +--------------+  +---------------+
-| DIMM0 <-+   |    +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
-+-------+ |   |    | +---------+   +--------------+  +---------------+
-| DIMM1 <-+ +-v--+ | +---------+   +--------------+  +---------------+
-+-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6  "blk2.0" |
-| DIMM2 <-+ +----+ | +---------+ | +--------------+  +----------------------+
-+-------+ |        |             +-> NAMESPACE2.1 +--> ND5  "blk2.1" | BTT2 |
-| DIMM3 <-+        |               +--------------+  +----------------------+
-+-------+          | +---------+   +--------------+  +---------------+
-                   +-> REGION3 +-+-> NAMESPACE3.0 +--> ND4  "blk3.0" |
-                   | +---------+ | +--------------+  +----------------------+
-                   |             +-> NAMESPACE3.1 +--> ND3  "blk3.1" | BTT1 |
-                   |               +--------------+  +----------------------+
-                   | +---------+   +--------------+  +---------------+
-                   +-> REGION4 +---> NAMESPACE4.0 +--> ND2  "blk4.0" |
-                   | +---------+   +--------------+  +---------------+
-                   | +---------+   +--------------+  +----------------------+
-                   +-> REGION5 +---> NAMESPACE5.0 +--> ND1  "blk5.0" | BTT0 |
-                     +---------+   +--------------+  +---------------+------+
-
-
diff --git a/Documentation/nvdimm/security.rst b/Documentation/nvdimm/security.rst
new file mode 100644
index 000000000000..ad9dea099b34
--- /dev/null
+++ b/Documentation/nvdimm/security.rst
@@ -0,0 +1,143 @@
+===============
+NVDIMM Security
+===============
+
+1. Introduction
+---------------
+
+With the introduction of Intel Device Specific Methods (DSM) v1.8
+specification [1], security DSMs are introduced. The spec added the following
+security DSMs: "get security state", "set passphrase", "disable passphrase",
+"unlock unit", "freeze lock", "secure erase", and "overwrite". A security_ops
+data structure has been added to struct dimm in order to support the security
+operations and generic APIs are exposed to allow vendor neutral operations.
+
+2. Sysfs Interface
+------------------
+The "security" sysfs attribute is provided in the nvdimm sysfs directory. For
+example:
+/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/security
+
+The "show" method of that attribute will display the security state for
+that DIMM. The following states are available: disabled, unlocked, locked,
+frozen, and overwrite. If security is not supported, the sysfs attribute
+will not be visible.
+
+The "store" attribute takes several commands when it is being written to
+in order to support some of the security functionalities:
+update <old_keyid> <new_keyid> - enable or update passphrase.
+disable <keyid> - disable enabled security and remove key.
+freeze - freeze changing of security states.
+erase <keyid> - delete existing user encryption key.
+overwrite <keyid> - wipe the entire nvdimm.
+master_update <keyid> <new_keyid> - enable or update master passphrase.
+master_erase <keyid> - delete existing user encryption key.
+
+3. Key Management
+-----------------
+
+The key is associated to the payload by the DIMM id. For example:
+# cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/nfit/id
+8089-a2-1740-00000133
+The DIMM id would be provided along with the key payload (passphrase) to
+the kernel.
+
+The security keys are managed on the basis of a single key per DIMM. The
+key "passphrase" is expected to be 32 bytes long. This is similar to the ATA
+security specification [2]. A key is initially acquired via the request_key()
+kernel API call during nvdimm unlock. It is up to the user to make sure that
+all the keys are in the kernel user keyring for unlock.
+
+An nvdimm encrypted-key of format enc32 has the description format of:
+nvdimm:<bus-provider-specific-unique-id>
+
+See file ``Documentation/security/keys/trusted-encrypted.rst`` for creating
+encrypted-keys of enc32 format. TPM usage with a master trusted key is
+preferred for sealing the encrypted-keys.
+
+4. Unlocking
+------------
+When the DIMMs are being enumerated by the kernel, the kernel will attempt to
+retrieve the key from the kernel user keyring. This is the only time
+a locked DIMM can be unlocked. Once unlocked, the DIMM will remain unlocked
+until reboot. Typically an entity (e.g. a shell script) will inject all the
+relevant encrypted-keys into the kernel user keyring during the initramfs phase.
+This provides the unlock function access to all the related keys that contain
+the passphrase for the respective nvdimms.  It is also recommended that the
+keys are injected before libnvdimm is loaded by modprobe.
+
+5. Update
+---------
+When doing an update, it is expected that the existing key is removed from
+the kernel user keyring and reinjected as a different (old) key. It's irrelevant
+what the key description is for the old key since we are only interested in the
+keyid when doing the update operation. It is also expected that the new key
+is injected with the description format described earlier in this
+document.  The update command written to the sysfs attribute will be with
+the format:
+update <old keyid> <new keyid>
+
+If there is no old keyid due to a security enabling, then a 0 should be
+passed in.
+
+6. Freeze
+---------
+The freeze operation does not require any keys. The security config can be
+frozen by a user with root privilege.
+
+7. Disable
+----------
+The security disable command format is:
+disable <keyid>
+
+A key with the current passphrase payload that is tied to the nvdimm should be
+in the kernel user keyring.
+
+8. Secure Erase
+---------------
+The command format for doing a secure erase is:
+erase <keyid>
+
+A key with the current passphrase payload that is tied to the nvdimm should be
+in the kernel user keyring.
+
+9. Overwrite
+------------
+The command format for doing an overwrite is:
+overwrite <keyid>
+
+Overwrite can be done without a key if security is not enabled. A key serial
+of 0 can be passed in to indicate no key.
+
+The sysfs attribute "security" can be polled to wait on overwrite completion.
+Overwrite can last tens of minutes or more depending on nvdimm size.
+
+An encrypted-key with the current user passphrase that is tied to the nvdimm
+should be injected and its keyid should be passed in via sysfs.
+
+10. Master Update
+-----------------
+The command format for doing a master update is:
+master_update <old keyid> <new keyid>
+
+The operating mechanism for master update is identical to update except the
+master passphrase key is passed to the kernel. The master passphrase key
+is just another encrypted-key.
+
+This command is only available when security is disabled.
+
+11. Master Erase
+----------------
+The command format for doing a master erase is:
+master_erase <current keyid>
+
+This command has the same operating mechanism as erase except the master
+passphrase key is passed to the kernel. The master passphrase key is just
+another encrypted-key.
+
+This command is only available when the master security is enabled, indicated
+by the extended security status.
+
+[1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
+
+[2]: http://www.t13.org/documents/UploadedDocuments/docs2006/e05179r4-ACS-SecurityClarifications.pdf
diff --git a/Documentation/nvdimm/security.txt b/Documentation/nvdimm/security.txt
deleted file mode 100644
index 4c36c05ca98e..000000000000
--- a/Documentation/nvdimm/security.txt
+++ /dev/null
@@ -1,141 +0,0 @@
-NVDIMM SECURITY
-===============
-
-1. Introduction
----------------
-
-With the introduction of Intel Device Specific Methods (DSM) v1.8
-specification [1], security DSMs are introduced. The spec added the following
-security DSMs: "get security state", "set passphrase", "disable passphrase",
-"unlock unit", "freeze lock", "secure erase", and "overwrite". A security_ops
-data structure has been added to struct dimm in order to support the security
-operations and generic APIs are exposed to allow vendor neutral operations.
-
-2. Sysfs Interface
-------------------
-The "security" sysfs attribute is provided in the nvdimm sysfs directory. For
-example:
-/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/security
-
-The "show" attribute of that attribute will display the security state for
-that DIMM. The following states are available: disabled, unlocked, locked,
-frozen, and overwrite. If security is not supported, the sysfs attribute
-will not be visible.
-
-The "store" attribute takes several commands when it is being written to
-in order to support some of the security functionalities:
-update <old_keyid> <new_keyid> - enable or update passphrase.
-disable <keyid> - disable enabled security and remove key.
-freeze - freeze changing of security states.
-erase <keyid> - delete existing user encryption key.
-overwrite <keyid> - wipe the entire nvdimm.
-master_update <keyid> <new_keyid> - enable or update master passphrase.
-master_erase <keyid> - delete existing user encryption key.
-
-3. Key Management
------------------
-
-The key is associated to the payload by the DIMM id. For example:
-# cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/nfit/id
-8089-a2-1740-00000133
-The DIMM id would be provided along with the key payload (passphrase) to
-the kernel.
-
-The security keys are managed on the basis of a single key per DIMM. The
-key "passphrase" is expected to be 32bytes long. This is similar to the ATA
-security specification [2]. A key is initially acquired via the request_key()
-kernel API call during nvdimm unlock. It is up to the user to make sure that
-all the keys are in the kernel user keyring for unlock.
-
-A nvdimm encrypted-key of format enc32 has the description format of:
-nvdimm:<bus-provider-specific-unique-id>
-
-See file ``Documentation/security/keys/trusted-encrypted.rst`` for creating
-encrypted-keys of enc32 format. TPM usage with a master trusted key is
-preferred for sealing the encrypted-keys.
-
-4. Unlocking
-------------
-When the DIMMs are being enumerated by the kernel, the kernel will attempt to
-retrieve the key from the kernel user keyring. This is the only time
-a locked DIMM can be unlocked. Once unlocked, the DIMM will remain unlocked
-until reboot. Typically an entity (i.e. shell script) will inject all the
-relevant encrypted-keys into the kernel user keyring during the initramfs phase.
-This provides the unlock function access to all the related keys that contain
-the passphrase for the respective nvdimms.  It is also recommended that the
-keys are injected before libnvdimm is loaded by modprobe.
-
-5. Update
----------
-When doing an update, it is expected that the existing key is removed from
-the kernel user keyring and reinjected as different (old) key. It's irrelevant
-what the key description is for the old key since we are only interested in the
-keyid when doing the update operation. It is also expected that the new key
-is injected with the description format described from earlier in this
-document.  The update command written to the sysfs attribute will be with
-the format:
-update <old keyid> <new keyid>
-
-If there is no old keyid due to a security enabling, then a 0 should be
-passed in.
-
-6. Freeze
----------
-The freeze operation does not require any keys. The security config can be
-frozen by a user with root privelege.
-
-7. Disable
-----------
-The security disable command format is:
-disable <keyid>
-
-An key with the current passphrase payload that is tied to the nvdimm should be
-in the kernel user keyring.
-
-8. Secure Erase
----------------
-The command format for doing a secure erase is:
-erase <keyid>
-
-An key with the current passphrase payload that is tied to the nvdimm should be
-in the kernel user keyring.
-
-9. Overwrite
-------------
-The command format for doing an overwrite is:
-overwrite <keyid>
-
-Overwrite can be done without a key if security is not enabled. A key serial
-of 0 can be passed in to indicate no key.
-
-The sysfs attribute "security" can be polled to wait on overwrite completion.
-Overwrite can last tens of minutes or more depending on nvdimm size.
-
-An encrypted-key with the current user passphrase that is tied to the nvdimm
-should be injected and its keyid should be passed in via sysfs.
-
-10. Master Update
------------------
-The command format for doing a master update is:
-update <old keyid> <new keyid>
-
-The operating mechanism for master update is identical to update except the
-master passphrase key is passed to the kernel. The master passphrase key
-is just another encrypted-key.
-
-This command is only available when security is disabled.
-
-11. Master Erase
-----------------
-The command format for doing a master erase is:
-master_erase <current keyid>
-
-This command has the same operating mechanism as erase except the master
-passphrase key is passed to the kernel. The master passphrase key is just
-another encrypted-key.
-
-This command is only available when the master security is enabled, indicated
-by the extended security status.
-
-[1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
-[2]: http://www.t13.org/documents/UploadedDocuments/docs2006/e05179r4-ACS-SecurityClarifications.pdf
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 54500798f23a..e89c1c332407 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -33,7 +33,7 @@ config BLK_DEV_PMEM
 	  Documentation/admin-guide/kernel-parameters.rst).  This driver converts
 	  these persistent memory ranges into block devices that are
 	  capable of DAX (direct-access) file system mappings.  See
-	  Documentation/nvdimm/nvdimm.txt for more details.
+	  Documentation/nvdimm/nvdimm.rst for more details.
 
 	  Say Y if you want to use an NVDIMM
 
-- 
cgit v1.2.3-55-g7522


From 8ea0afa3b801e9fe3ff676c3e60e74afa1a0848a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 14:34:34 -0300
Subject: docs: xtensa: convert to ReST

Rename the xtensa documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/xtensa/atomctl.rst         |  51 ++++++++
 Documentation/xtensa/atomctl.txt         |  44 -------
 Documentation/xtensa/booting.rst         |  22 ++++
 Documentation/xtensa/booting.txt         |  19 ---
 Documentation/xtensa/index.rst           |  12 ++
 Documentation/xtensa/mmu.rst             | 195 +++++++++++++++++++++++++++++++
 Documentation/xtensa/mmu.txt             | 189 ------------------------------
 arch/xtensa/include/asm/initialize_mmu.h |   2 +-
 8 files changed, 281 insertions(+), 253 deletions(-)
 create mode 100644 Documentation/xtensa/atomctl.rst
 delete mode 100644 Documentation/xtensa/atomctl.txt
 create mode 100644 Documentation/xtensa/booting.rst
 delete mode 100644 Documentation/xtensa/booting.txt
 create mode 100644 Documentation/xtensa/index.rst
 create mode 100644 Documentation/xtensa/mmu.rst
 delete mode 100644 Documentation/xtensa/mmu.txt

diff --git a/Documentation/xtensa/atomctl.rst b/Documentation/xtensa/atomctl.rst
new file mode 100644
index 000000000000..1ecbd0ba9a2e
--- /dev/null
+++ b/Documentation/xtensa/atomctl.rst
@@ -0,0 +1,51 @@
+===========================================
+Atomic Operation Control (ATOMCTL) Register
+===========================================
+
+We have an Atomic Operation Control (ATOMCTL) register.
+This register determines the effect of using a S32C1I instruction
+with various combinations of:
+
+     1. With and without a Coherent Cache Controller which
+        can do Atomic Transactions to the memory internally.
+
+     2. With and without an Intelligent Memory Controller which
+        can do Atomic Transactions itself.
+
+The Core comes up with a default value for the three types of cache ops::
+
+      0x28: (WB: Internal, WT: Internal, BY:Exception)
+
+On the FPGA Cards we typically simulate an Intelligent Memory controller
+which can implement RCW transactions. For FPGA cards with an External
+Memory controller we let it do the atomic operations internally while
+doing a Cached (WB) transaction and use the Memory RCW for un-cached
+operations.
+
+For systems without a coherent cache controller, non-MX, we always
+use the memory controller's RCW, though non-MX controllers likely
+support the Internal Operation.
+
+CUSTOMER-WARNING:
+   Virtually all customers buy their memory controllers from vendors that
+   don't support atomic RCW memory transactions and will likely want to
+   configure this register to not use RCW.
+
+Developers might find using RCW in Bypass mode convenient when testing
+with the cache being bypassed; for example studying cache alias problems.
+
+See Section 4.3.12.4 of ISA; Bits::
+
+                             WB     WT      BY
+                           5   4 | 3   2 | 1   0
+
+=========    ==================      ==================      ===============
+  2 Bit
+  Field
+  Values     WB - Write Back         WT - Write Thru         BY - Bypass
+=========    ==================      ==================      ===============
+    0        Exception               Exception               Exception
+    1        RCW Transaction         RCW Transaction         RCW Transaction
+    2        Internal Operation      Internal Operation      Reserved
+    3        Reserved                Reserved                Reserved
+=========    ==================      ==================      ===============
diff --git a/Documentation/xtensa/atomctl.txt b/Documentation/xtensa/atomctl.txt
deleted file mode 100644
index 1da783ac200c..000000000000
--- a/Documentation/xtensa/atomctl.txt
+++ /dev/null
@@ -1,44 +0,0 @@
-We Have Atomic Operation Control (ATOMCTL) Register.
-This register determines the effect of using a S32C1I instruction
-with various combinations of:
-
-     1. With and without an Coherent Cache Controller which
-        can do Atomic Transactions to the memory internally.
-
-     2. With and without An Intelligent Memory Controller which
-        can do Atomic Transactions itself.
-
-The Core comes up with a default value of for the three types of cache ops:
-
-      0x28: (WB: Internal, WT: Internal, BY:Exception)
-
-On the FPGA Cards we typically simulate an Intelligent Memory controller
-which can implement  RCW transactions. For FPGA cards with an External
-Memory controller we let it to the atomic operations internally while
-doing a Cached (WB) transaction and use the Memory RCW for un-cached
-operations.
-
-For systems without an coherent cache controller, non-MX, we always
-use the memory controllers RCW, thought non-MX controlers likely
-support the Internal Operation.
-
-CUSTOMER-WARNING:
-   Virtually all customers buy their memory controllers from vendors that
-   don't support atomic RCW memory transactions and will likely want to
-   configure this register to not use RCW.
-
-Developers might find using RCW in Bypass mode convenient when testing
-with the cache being bypassed; for example studying cache alias problems.
-
-See Section 4.3.12.4 of ISA; Bits:
-
-                             WB     WT      BY
-                           5   4 | 3   2 | 1   0
-  2 Bit
-  Field
-  Values     WB - Write Back         WT - Write Thru         BY - Bypass
----------    ---------------         -----------------     ----------------
-    0        Exception               Exception               Exception
-    1        RCW Transaction         RCW Transaction         RCW Transaction
-    2        Internal Operation      Internal Operation      Reserved
-    3        Reserved                Reserved                Reserved
diff --git a/Documentation/xtensa/booting.rst b/Documentation/xtensa/booting.rst
new file mode 100644
index 000000000000..e1b83707e5b6
--- /dev/null
+++ b/Documentation/xtensa/booting.rst
@@ -0,0 +1,22 @@
+=====================================
+Passing boot parameters to the kernel
+=====================================
+
+Boot parameters are represented as a TLV list in the memory. Please see
+arch/xtensa/include/asm/bootparam.h for definition of the bp_tag structure and
+tag value constants. First entry in the list must have type BP_TAG_FIRST, last
+entry must have type BP_TAG_LAST. The address of the first list entry is
+passed to the kernel in the register a2. The address type depends on MMU type:
+
+- For configurations without MMU, with region protection or with MPU the
+  address must be the physical address.
+- For configurations with region translation MMU or with MMUv3 and CONFIG_MMU=n
+  the address must be a valid address in the current mapping. The kernel will
+  not change the mapping on its own.
+- For configurations with MMUv2 the address must be a virtual address in the
+  default virtual mapping (0xd0000000..0xffffffff).
+- For configurations with MMUv3 and CONFIG_MMU=y the address may be either a
+  virtual or physical address. In either case it must be within the default
+  virtual mapping. It is considered physical if it is within the range of
+  physical addresses covered by the default KSEG mapping (XCHAL_KSEG_PADDR..
+  XCHAL_KSEG_PADDR + XCHAL_KSEG_SIZE), otherwise it is considered virtual.
diff --git a/Documentation/xtensa/booting.txt b/Documentation/xtensa/booting.txt
deleted file mode 100644
index 402b33a2619f..000000000000
--- a/Documentation/xtensa/booting.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-Passing boot parameters to the kernel.
-
-Boot parameters are represented as a TLV list in the memory. Please see
-arch/xtensa/include/asm/bootparam.h for definition of the bp_tag structure and
-tag value constants. First entry in the list must have type BP_TAG_FIRST, last
-entry must have type BP_TAG_LAST. The address of the first list entry is
-passed to the kernel in the register a2. The address type depends on MMU type:
-- For configurations without MMU, with region protection or with MPU the
-  address must be the physical address.
-- For configurations with region translarion MMU or with MMUv3 and CONFIG_MMU=n
-  the address must be a valid address in the current mapping. The kernel will
-  not change the mapping on its own.
-- For configurations with MMUv2 the address must be a virtual address in the
-  default virtual mapping (0xd0000000..0xffffffff).
-- For configurations with MMUv3 and CONFIG_MMU=y the address may be either a
-  virtual or physical address. In either case it must be within the default
-  virtual mapping. It is considered physical if it is within the range of
-  physical addresses covered by the default KSEG mapping (XCHAL_KSEG_PADDR..
-  XCHAL_KSEG_PADDR + XCHAL_KSEG_SIZE), otherwise it is considered virtual.
diff --git a/Documentation/xtensa/index.rst b/Documentation/xtensa/index.rst
new file mode 100644
index 000000000000..5a24e365e35f
--- /dev/null
+++ b/Documentation/xtensa/index.rst
@@ -0,0 +1,12 @@
+:orphan:
+
+===================
+Xtensa Architecture
+===================
+
+.. toctree::
+   :maxdepth: 1
+
+   atomctl
+   booting
+   mmu
diff --git a/Documentation/xtensa/mmu.rst b/Documentation/xtensa/mmu.rst
new file mode 100644
index 000000000000..e52a12960fdc
--- /dev/null
+++ b/Documentation/xtensa/mmu.rst
@@ -0,0 +1,195 @@
+=============================
+MMUv3 initialization sequence
+=============================
+
+The code in the initialize_mmu macro sets up MMUv3 memory mapping
+identically to MMUv2 fixed memory mapping. Depending on
+CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX symbol this code is
+located in addresses it was linked for (symbol undefined), or not
+(symbol defined), so it needs to be position-independent.
+
+The code has the following assumptions:
+
+  - This code fragment is run only on an MMU v3.
+  - TLBs are in their reset state.
+  - ITLBCFG and DTLBCFG are zero (reset state).
+  - RASID is 0x04030201 (reset state).
+  - PS.RING is zero (reset state).
+  - LITBASE is zero (reset state, PC-relative literals); required to be PIC.
+
+TLB setup proceeds along the following steps.
+
+  Legend:
+
+    - VA = virtual address (two upper nibbles of it);
+    - PA = physical address (two upper nibbles of it);
+    - pc = physical range that contains this code;
+
+After step 2, we jump to a virtual address in the range 0x40000000..0x5fffffff
+or 0x00000000..0x1fffffff, depending on whether the kernel was loaded below
+0x40000000 or above. That address corresponds to the next instruction to execute
+in this code. After step 4, we jump to the intended (linked) address of this code.
+The scheme below assumes that the kernel is loaded below 0x40000000.
+
+ ====== =====  =====  =====  =====   ====== =====  =====
+ -      Step0  Step1  Step2  Step3          Step4  Step5
+
+   VA      PA     PA     PA     PA     VA      PA     PA
+ ====== =====  =====  =====  =====   ====== =====  =====
+ E0..FF -> E0  -> E0  -> E0          F0..FF -> F0  -> F0
+ C0..DF -> C0  -> C0  -> C0          E0..EF -> F0  -> F0
+ A0..BF -> A0  -> A0  -> A0          D8..DF -> 00  -> 00
+ 80..9F -> 80  -> 80  -> 80          D0..D7 -> 00  -> 00
+ 60..7F -> 60  -> 60  -> 60
+ 40..5F -> 40         -> pc  -> pc   40..5F -> pc
+ 20..3F -> 20  -> 20  -> 20
+ 00..1F -> 00  -> 00  -> 00
+ ====== =====  =====  =====  =====   ====== =====  =====
+
+The default location of IO peripherals is above 0xf0000000. This may be changed
+using a "ranges" property in a device tree simple-bus node. See the Devicetree
+Specification, section 4.5 for details on the syntax and semantics of
+simple-bus nodes. The following limitations apply:
+
+1. Only top level simple-bus nodes are considered
+
+2. Only one (first) simple-bus node is considered
+
+3. Empty "ranges" properties are not supported
+
+4. Only the first triplet in the "ranges" property is considered
+
+5. The parent-bus-address value is rounded down to the nearest 256MB boundary
+
+6. The IO area covers the entire 256MB segment of parent-bus-address; the
+   "ranges" triplet length field is ignored
+
+
+MMUv3 address space layouts
+============================
+
+Default MMUv2-compatible layout::
+
+                        Symbol                   VADDR       Size
+  +------------------+
+  | Userspace        |                           0x00000000  TASK_SIZE
+  +------------------+                           0x40000000
+  +------------------+
+  | Page table       |  XCHAL_PAGE_TABLE_VADDR   0x80000000  XCHAL_PAGE_TABLE_SIZE
+  +------------------+
+  | KASAN shadow map |  KASAN_SHADOW_START       0x80400000  KASAN_SHADOW_SIZE
+  +------------------+                           0x8e400000
+  +------------------+
+  | VMALLOC area     |  VMALLOC_START            0xc0000000  128MB - 64KB
+  +------------------+  VMALLOC_END
+  | Cache aliasing   |  TLBTEMP_BASE_1           0xc7ff0000  DCACHE_WAY_SIZE
+  | remap area 1     |
+  +------------------+
+  | Cache aliasing   |  TLBTEMP_BASE_2                       DCACHE_WAY_SIZE
+  | remap area 2     |
+  +------------------+
+  +------------------+
+  | KMAP area        |  PKMAP_BASE                           PTRS_PER_PTE *
+  |                  |                                       DCACHE_N_COLORS *
+  |                  |                                       PAGE_SIZE
+  |                  |                                       (4MB * DCACHE_N_COLORS)
+  +------------------+
+  | Atomic KMAP area |  FIXADDR_START                        KM_TYPE_NR *
+  |                  |                                       NR_CPUS *
+  |                  |                                       DCACHE_N_COLORS *
+  |                  |                                       PAGE_SIZE
+  +------------------+  FIXADDR_TOP              0xcffff000
+  +------------------+
+  | Cached KSEG      |  XCHAL_KSEG_CACHED_VADDR  0xd0000000  128MB
+  +------------------+
+  | Uncached KSEG    |  XCHAL_KSEG_BYPASS_VADDR  0xd8000000  128MB
+  +------------------+
+  | Cached KIO       |  XCHAL_KIO_CACHED_VADDR   0xe0000000  256MB
+  +------------------+
+  | Uncached KIO     |  XCHAL_KIO_BYPASS_VADDR   0xf0000000  256MB
+  +------------------+
+
+
+256MB cached + 256MB uncached layout::
+
+                        Symbol                   VADDR       Size
+  +------------------+
+  | Userspace        |                           0x00000000  TASK_SIZE
+  +------------------+                           0x40000000
+  +------------------+
+  | Page table       |  XCHAL_PAGE_TABLE_VADDR   0x80000000  XCHAL_PAGE_TABLE_SIZE
+  +------------------+
+  | KASAN shadow map |  KASAN_SHADOW_START       0x80400000  KASAN_SHADOW_SIZE
+  +------------------+                           0x8e400000
+  +------------------+
+  | VMALLOC area     |  VMALLOC_START            0xa0000000  128MB - 64KB
+  +------------------+  VMALLOC_END
+  | Cache aliasing   |  TLBTEMP_BASE_1           0xa7ff0000  DCACHE_WAY_SIZE
+  | remap area 1     |
+  +------------------+
+  | Cache aliasing   |  TLBTEMP_BASE_2                       DCACHE_WAY_SIZE
+  | remap area 2     |
+  +------------------+
+  +------------------+
+  | KMAP area        |  PKMAP_BASE                           PTRS_PER_PTE *
+  |                  |                                       DCACHE_N_COLORS *
+  |                  |                                       PAGE_SIZE
+  |                  |                                       (4MB * DCACHE_N_COLORS)
+  +------------------+
+  | Atomic KMAP area |  FIXADDR_START                        KM_TYPE_NR *
+  |                  |                                       NR_CPUS *
+  |                  |                                       DCACHE_N_COLORS *
+  |                  |                                       PAGE_SIZE
+  +------------------+  FIXADDR_TOP              0xaffff000
+  +------------------+
+  | Cached KSEG      |  XCHAL_KSEG_CACHED_VADDR  0xb0000000  256MB
+  +------------------+
+  | Uncached KSEG    |  XCHAL_KSEG_BYPASS_VADDR  0xc0000000  256MB
+  +------------------+
+  +------------------+
+  | Cached KIO       |  XCHAL_KIO_CACHED_VADDR   0xe0000000  256MB
+  +------------------+
+  | Uncached KIO     |  XCHAL_KIO_BYPASS_VADDR   0xf0000000  256MB
+  +------------------+
+
+
+512MB cached + 512MB uncached layout::
+
+                        Symbol                   VADDR       Size
+  +------------------+
+  | Userspace        |                           0x00000000  TASK_SIZE
+  +------------------+                           0x40000000
+  +------------------+
+  | Page table       |  XCHAL_PAGE_TABLE_VADDR   0x80000000  XCHAL_PAGE_TABLE_SIZE
+  +------------------+
+  | KASAN shadow map |  KASAN_SHADOW_START       0x80400000  KASAN_SHADOW_SIZE
+  +------------------+                           0x8e400000
+  +------------------+
+  | VMALLOC area     |  VMALLOC_START            0x90000000  128MB - 64KB
+  +------------------+  VMALLOC_END
+  | Cache aliasing   |  TLBTEMP_BASE_1           0x97ff0000  DCACHE_WAY_SIZE
+  | remap area 1     |
+  +------------------+
+  | Cache aliasing   |  TLBTEMP_BASE_2                       DCACHE_WAY_SIZE
+  | remap area 2     |
+  +------------------+
+  +------------------+
+  | KMAP area        |  PKMAP_BASE                           PTRS_PER_PTE *
+  |                  |                                       DCACHE_N_COLORS *
+  |                  |                                       PAGE_SIZE
+  |                  |                                       (4MB * DCACHE_N_COLORS)
+  +------------------+
+  | Atomic KMAP area |  FIXADDR_START                        KM_TYPE_NR *
+  |                  |                                       NR_CPUS *
+  |                  |                                       DCACHE_N_COLORS *
+  |                  |                                       PAGE_SIZE
+  +------------------+  FIXADDR_TOP              0x9ffff000
+  +------------------+
+  | Cached KSEG      |  XCHAL_KSEG_CACHED_VADDR  0xa0000000  512MB
+  +------------------+
+  | Uncached KSEG    |  XCHAL_KSEG_BYPASS_VADDR  0xc0000000  512MB
+  +------------------+
+  | Cached KIO       |  XCHAL_KIO_CACHED_VADDR   0xe0000000  256MB
+  +------------------+
+  | Uncached KIO     |  XCHAL_KIO_BYPASS_VADDR   0xf0000000  256MB
+  +------------------+
diff --git a/Documentation/xtensa/mmu.txt b/Documentation/xtensa/mmu.txt
deleted file mode 100644
index 318114de63f3..000000000000
--- a/Documentation/xtensa/mmu.txt
+++ /dev/null
@@ -1,189 +0,0 @@
-MMUv3 initialization sequence.
-
-The code in the initialize_mmu macro sets up MMUv3 memory mapping
-identically to MMUv2 fixed memory mapping. Depending on
-CONFIG_INITIALIZE_XTENSA_MMU_INSIDE_VMLINUX symbol this code is
-located in addresses it was linked for (symbol undefined), or not
-(symbol defined), so it needs to be position-independent.
-
-The code has the following assumptions:
-  This code fragment is run only on an MMU v3.
-  TLBs are in their reset state.
-  ITLBCFG and DTLBCFG are zero (reset state).
-  RASID is 0x04030201 (reset state).
-  PS.RING is zero (reset state).
-  LITBASE is zero (reset state, PC-relative literals); required to be PIC.
-
-TLB setup proceeds along the following steps.
-
-  Legend:
-    VA = virtual address (two upper nibbles of it);
-    PA = physical address (two upper nibbles of it);
-    pc = physical range that contains this code;
-
-After step 2, we jump to virtual address in the range 0x40000000..0x5fffffff
-or 0x00000000..0x1fffffff, depending on whether the kernel was loaded below
-0x40000000 or above. That address corresponds to next instruction to execute
-in this code. After step 4, we jump to intended (linked) address of this code.
-The scheme below assumes that the kernel is loaded below 0x40000000.
-
-        Step0  Step1  Step2  Step3          Step4  Step5
-        =====  =====  =====  =====          =====  =====
-   VA      PA     PA     PA     PA     VA      PA     PA
- ------    --     --     --     --   ------    --     --
- E0..FF -> E0  -> E0  -> E0          F0..FF -> F0  -> F0
- C0..DF -> C0  -> C0  -> C0          E0..EF -> F0  -> F0
- A0..BF -> A0  -> A0  -> A0          D8..DF -> 00  -> 00
- 80..9F -> 80  -> 80  -> 80          D0..D7 -> 00  -> 00
- 60..7F -> 60  -> 60  -> 60
- 40..5F -> 40         -> pc  -> pc   40..5F -> pc
- 20..3F -> 20  -> 20  -> 20
- 00..1F -> 00  -> 00  -> 00
-
-The default location of IO peripherals is above 0xf0000000. This may be changed
-using a "ranges" property in a device tree simple-bus node. See the Devicetree
-Specification, section 4.5 for details on the syntax and semantics of
-simple-bus nodes. The following limitations apply:
-
-1. Only top level simple-bus nodes are considered
-
-2. Only one (first) simple-bus node is considered
-
-3. Empty "ranges" properties are not supported
-
-4. Only the first triplet in the "ranges" property is considered
-
-5. The parent-bus-address value is rounded down to the nearest 256MB boundary
-
-6. The IO area covers the entire 256MB segment of parent-bus-address; the
-   "ranges" triplet length field is ignored
-
-
-MMUv3 address space layouts.
-============================
-
-Default MMUv2-compatible layout.
-
-                      Symbol                   VADDR       Size
-+------------------+
-| Userspace        |                           0x00000000  TASK_SIZE
-+------------------+                           0x40000000
-+------------------+
-| Page table       |  XCHAL_PAGE_TABLE_VADDR   0x80000000  XCHAL_PAGE_TABLE_SIZE
-+------------------+
-| KASAN shadow map |  KASAN_SHADOW_START       0x80400000  KASAN_SHADOW_SIZE
-+------------------+                           0x8e400000
-+------------------+
-| VMALLOC area     |  VMALLOC_START            0xc0000000  128MB - 64KB
-+------------------+  VMALLOC_END
-| Cache aliasing   |  TLBTEMP_BASE_1           0xc7ff0000  DCACHE_WAY_SIZE
-| remap area 1     |
-+------------------+
-| Cache aliasing   |  TLBTEMP_BASE_2                       DCACHE_WAY_SIZE
-| remap area 2     |
-+------------------+
-+------------------+
-| KMAP area        |  PKMAP_BASE                           PTRS_PER_PTE *
-|                  |                                       DCACHE_N_COLORS *
-|                  |                                       PAGE_SIZE
-|                  |                                       (4MB * DCACHE_N_COLORS)
-+------------------+
-| Atomic KMAP area |  FIXADDR_START                        KM_TYPE_NR *
-|                  |                                       NR_CPUS *
-|                  |                                       DCACHE_N_COLORS *
-|                  |                                       PAGE_SIZE
-+------------------+  FIXADDR_TOP              0xcffff000
-+------------------+
-| Cached KSEG      |  XCHAL_KSEG_CACHED_VADDR  0xd0000000  128MB
-+------------------+
-| Uncached KSEG    |  XCHAL_KSEG_BYPASS_VADDR  0xd8000000  128MB
-+------------------+
-| Cached KIO       |  XCHAL_KIO_CACHED_VADDR   0xe0000000  256MB
-+------------------+
-| Uncached KIO     |  XCHAL_KIO_BYPASS_VADDR   0xf0000000  256MB
-+------------------+
-
-
-256MB cached + 256MB uncached layout.
-
-                      Symbol                   VADDR       Size
-+------------------+
-| Userspace        |                           0x00000000  TASK_SIZE
-+------------------+                           0x40000000
-+------------------+
-| Page table       |  XCHAL_PAGE_TABLE_VADDR   0x80000000  XCHAL_PAGE_TABLE_SIZE
-+------------------+
-| KASAN shadow map |  KASAN_SHADOW_START       0x80400000  KASAN_SHADOW_SIZE
-+------------------+                           0x8e400000
-+------------------+
-| VMALLOC area     |  VMALLOC_START            0xa0000000  128MB - 64KB
-+------------------+  VMALLOC_END
-| Cache aliasing   |  TLBTEMP_BASE_1           0xa7ff0000  DCACHE_WAY_SIZE
-| remap area 1     |
-+------------------+
-| Cache aliasing   |  TLBTEMP_BASE_2                       DCACHE_WAY_SIZE
-| remap area 2     |
-+------------------+
-+------------------+
-| KMAP area        |  PKMAP_BASE                           PTRS_PER_PTE *
-|                  |                                       DCACHE_N_COLORS *
-|                  |                                       PAGE_SIZE
-|                  |                                       (4MB * DCACHE_N_COLORS)
-+------------------+
-| Atomic KMAP area |  FIXADDR_START                        KM_TYPE_NR *
-|                  |                                       NR_CPUS *
-|                  |                                       DCACHE_N_COLORS *
-|                  |                                       PAGE_SIZE
-+------------------+  FIXADDR_TOP              0xaffff000
-+------------------+
-| Cached KSEG      |  XCHAL_KSEG_CACHED_VADDR  0xb0000000  256MB
-+------------------+
-| Uncached KSEG    |  XCHAL_KSEG_BYPASS_VADDR  0xc0000000  256MB
-+------------------+
-+------------------+
-| Cached KIO       |  XCHAL_KIO_CACHED_VADDR   0xe0000000  256MB
-+------------------+
-| Uncached KIO     |  XCHAL_KIO_BYPASS_VADDR   0xf0000000  256MB
-+------------------+
-
-
-512MB cached + 512MB uncached layout.
-
-                      Symbol                   VADDR       Size
-+------------------+
-| Userspace        |                           0x00000000  TASK_SIZE
-+------------------+                           0x40000000
-+------------------+
-| Page table       |  XCHAL_PAGE_TABLE_VADDR   0x80000000  XCHAL_PAGE_TABLE_SIZE
-+------------------+
-| KASAN shadow map |  KASAN_SHADOW_START       0x80400000  KASAN_SHADOW_SIZE
-+------------------+                           0x8e400000
-+------------------+
-| VMALLOC area     |  VMALLOC_START            0x90000000  128MB - 64KB
-+------------------+  VMALLOC_END
-| Cache aliasing   |  TLBTEMP_BASE_1           0x97ff0000  DCACHE_WAY_SIZE
-| remap area 1     |
-+------------------+
-| Cache aliasing   |  TLBTEMP_BASE_2                       DCACHE_WAY_SIZE
-| remap area 2     |
-+------------------+
-+------------------+
-| KMAP area        |  PKMAP_BASE                           PTRS_PER_PTE *
-|                  |                                       DCACHE_N_COLORS *
-|                  |                                       PAGE_SIZE
-|                  |                                       (4MB * DCACHE_N_COLORS)
-+------------------+
-| Atomic KMAP area |  FIXADDR_START                        KM_TYPE_NR *
-|                  |                                       NR_CPUS *
-|                  |                                       DCACHE_N_COLORS *
-|                  |                                       PAGE_SIZE
-+------------------+  FIXADDR_TOP              0x9ffff000
-+------------------+
-| Cached KSEG      |  XCHAL_KSEG_CACHED_VADDR  0xa0000000  512MB
-+------------------+
-| Uncached KSEG    |  XCHAL_KSEG_BYPASS_VADDR  0xc0000000  512MB
-+------------------+
-| Cached KIO       |  XCHAL_KIO_CACHED_VADDR   0xe0000000  256MB
-+------------------+
-| Uncached KIO     |  XCHAL_KIO_BYPASS_VADDR   0xf0000000  256MB
-+------------------+
diff --git a/arch/xtensa/include/asm/initialize_mmu.h b/arch/xtensa/include/asm/initialize_mmu.h
index 323d05789159..3b054d2bede0 100644
--- a/arch/xtensa/include/asm/initialize_mmu.h
+++ b/arch/xtensa/include/asm/initialize_mmu.h
@@ -42,7 +42,7 @@
 #if XCHAL_HAVE_S32C1I && (XCHAL_HW_MIN_VERSION >= XTENSA_HWVERSION_RC_2009_0)
 /*
  * We Have Atomic Operation Control (ATOMCTL) Register; Initialize it.
- * For details see Documentation/xtensa/atomctl.txt
+ * For details see Documentation/xtensa/atomctl.rst
  */
 #if XCHAL_DCACHE_IS_COHERENT
 	movi	a3, 0x25	/* For SMP/MX -- internal for writeback,
-- 
cgit v1.2.3-55-g7522


From f408510c4ff38965289bb53e8462861ad05dfada Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 14:44:06 -0300
Subject: docs: mmc: convert to ReST

Rename the mmc documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
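Note for reviewers (not part of the commit message): the "Note on
raw_rpmb_size_mult" text in mmc-dev-attrs.rst states that
RPMB partition size = 128kB x raw_rpmb_size_mult. A minimal C sketch of that
arithmetic follows. It assumes that "128kB" means 128 * 1024 bytes; the macro
and function names are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/* Assumption: "128kB" in the document means 128 * 1024 bytes. */
#define RPMB_UNIT_BYTES (128u * 1024u)

/*
 * Hypothetical helper: convert the value read from the
 * raw_rpmb_size_mult attribute into a size in bytes.
 * The multiplication is done in 64 bits to avoid overflow.
 */
static uint64_t rpmb_size_bytes(uint32_t raw_rpmb_size_mult)
{
	return (uint64_t)RPMB_UNIT_BYTES * raw_rpmb_size_mult;
}
```

Under that assumption, a multiplier of 4 corresponds to 4 * 131072 = 524288
bytes (512kB).
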
 Documentation/mmc/index.rst         | 13 +++++
 Documentation/mmc/mmc-async-req.rst | 98 +++++++++++++++++++++++++++++++++++++
 Documentation/mmc/mmc-async-req.txt | 87 --------------------------------
 Documentation/mmc/mmc-dev-attrs.rst | 91 ++++++++++++++++++++++++++++++++++
 Documentation/mmc/mmc-dev-attrs.txt | 77 -----------------------------
 Documentation/mmc/mmc-dev-parts.rst | 41 ++++++++++++++++
 Documentation/mmc/mmc-dev-parts.txt | 40 ---------------
 Documentation/mmc/mmc-tools.rst     | 37 ++++++++++++++
 Documentation/mmc/mmc-tools.txt     | 34 -------------
 9 files changed, 280 insertions(+), 238 deletions(-)
 create mode 100644 Documentation/mmc/index.rst
 create mode 100644 Documentation/mmc/mmc-async-req.rst
 delete mode 100644 Documentation/mmc/mmc-async-req.txt
 create mode 100644 Documentation/mmc/mmc-dev-attrs.rst
 delete mode 100644 Documentation/mmc/mmc-dev-attrs.txt
 create mode 100644 Documentation/mmc/mmc-dev-parts.rst
 delete mode 100644 Documentation/mmc/mmc-dev-parts.txt
 create mode 100644 Documentation/mmc/mmc-tools.rst
 delete mode 100644 Documentation/mmc/mmc-tools.txt

diff --git a/Documentation/mmc/index.rst b/Documentation/mmc/index.rst
new file mode 100644
index 000000000000..3305478ddadb
--- /dev/null
+++ b/Documentation/mmc/index.rst
@@ -0,0 +1,13 @@
+:orphan:
+
+========================
+MMC/SD/SDIO card support
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   mmc-dev-attrs
+   mmc-dev-parts
+   mmc-async-req
+   mmc-tools
diff --git a/Documentation/mmc/mmc-async-req.rst b/Documentation/mmc/mmc-async-req.rst
new file mode 100644
index 000000000000..0f7197c9c3b5
--- /dev/null
+++ b/Documentation/mmc/mmc-async-req.rst
@@ -0,0 +1,98 @@
+========================
+MMC Asynchronous Request
+========================
+
+Rationale
+=========
+
+How significant is the cache maintenance overhead?
+
+It depends. Fast eMMC and multiple cache levels with speculative cache
+pre-fetch make the cache overhead relatively significant. If the DMA
+preparations for the next request are done in parallel with the current
+transfer, the DMA preparation overhead would not affect the MMC performance.
+
+The intention of non-blocking (asynchronous) MMC requests is to minimize the
+time between when an MMC request ends and another MMC request begins.
+
+Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg() and
+dma_unmap_sg() are processing. Using non-blocking MMC requests makes it
+possible to prepare the caches for the next job in parallel with an active
+MMC request.
+
+MMC block driver
+================
+
+The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.
+
+The increase in throughput is proportional to the time it takes to
+prepare a request (the major part of the preparation is dma_map_sg() and
+dma_unmap_sg()) and to how fast the memory is. The faster the MMC/SD is, the
+more significant the prepare request time becomes. Roughly, the expected
+performance gain is 5% for large writes and 10% for large reads on an L2 cache
+platform. In power save mode, when clocks run at a lower frequency, the DMA
+preparation may cost even more. As long as these slower preparations are run
+in parallel with the transfer, performance won't be affected.
+
+Details on measurements from IOZone and mmc_test
+================================================
+
+https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
+
+MMC core API extension
+======================
+
+There is one new public function, mmc_start_req().
+
+It starts a new MMC command request for a host. The function isn't
+truly non-blocking. If there is an ongoing async request, it waits
+for completion of that request, starts the new one, and returns. It
+doesn't wait for the new request to complete. If there is no ongoing
+request, it starts the new request and returns immediately.
+
+MMC host extensions
+===================
+
+There are two optional members in the mmc_host_ops -- pre_req() and
+post_req() -- that the host driver may implement in order to move work
+to before and after the actual mmc_host_ops.request() function is called.
+
+In the DMA case pre_req() may do dma_map_sg() and prepare the DMA
+descriptor, and post_req() runs the dma_unmap_sg().
+
+Optimize for the first request
+==============================
+
+The first request in a series of requests can't be prepared in parallel
+with the previous transfer, since there is no previous request.
+
+The argument is_first_req in pre_req() indicates that there is no previous
+request. The host driver may optimize for this scenario to minimize
+the performance loss. A way to optimize for this is to split the current
+request in two chunks, prepare the first chunk and start the request,
+and finally prepare the second chunk and start the transfer.
+
+Pseudocode to handle is_first_req scenario with minimal prepare overhead::
+
+  if (is_first_req && req->size > threshold) {
+      /* start MMC transfer for the complete transfer size */
+      mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
+
+      /*
+       * Begin to prepare DMA while cmd is being processed by MMC.
+       * The first chunk should take about as long to prepare as the
+       * "MMC process command time". If prepare time exceeds MMC cmd
+       * time, the transfer is delayed; guesstimate max 4k as first chunk.
+       */
+      prepare_1st_chunk_for_dma(req);
+      /* flush pending desc to the DMAC (dmaengine.h) */
+      dma_issue_pending(req->dma_desc);
+
+      prepare_2nd_chunk_for_dma(req);
+      /*
+       * The second issue_pending should be called before MMC runs out
+       * of the first chunk. If the MMC runs out of the first data chunk
+       * before this call, the transfer is delayed.
+       */
+      dma_issue_pending(req->dma_desc);
+  }
diff --git a/Documentation/mmc/mmc-async-req.txt b/Documentation/mmc/mmc-async-req.txt
deleted file mode 100644
index ae1907b10e4a..000000000000
--- a/Documentation/mmc/mmc-async-req.txt
+++ /dev/null
@@ -1,87 +0,0 @@
-Rationale
-=========
-
-How significant is the cache maintenance overhead?
-It depends. Fast eMMC and multiple cache levels with speculative cache
-pre-fetch makes the cache overhead relatively significant. If the DMA
-preparations for the next request are done in parallel with the current
-transfer, the DMA preparation overhead would not affect the MMC performance.
-The intention of non-blocking (asynchronous) MMC requests is to minimize the
-time between when an MMC request ends and another MMC request begins.
-Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg and
-dma_unmap_sg are processing. Using non-blocking MMC requests makes it
-possible to prepare the caches for next job in parallel with an active
-MMC request.
-
-MMC block driver
-================
-
-The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.
-The increase in throughput is proportional to the time it takes to
-prepare (major part of preparations are dma_map_sg() and dma_unmap_sg())
-a request and how fast the memory is. The faster the MMC/SD is the
-more significant the prepare request time becomes. Roughly the expected
-performance gain is 5% for large writes and 10% on large reads on a L2 cache
-platform. In power save mode, when clocks run on a lower frequency, the DMA
-preparation may cost even more. As long as these slower preparations are run
-in parallel with the transfer performance won't be affected.
-
-Details on measurements from IOZone and mmc_test
-================================================
-
-https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
-
-MMC core API extension
-======================
-
-There is one new public function mmc_start_req().
-It starts a new MMC command request for a host. The function isn't
-truly non-blocking. If there is an ongoing async request it waits
-for completion of that request and starts the new one and returns. It
-doesn't wait for the new request to complete. If there is no ongoing
-request it starts the new request and returns immediately.
-
-MMC host extensions
-===================
-
-There are two optional members in the mmc_host_ops -- pre_req() and
-post_req() -- that the host driver may implement in order to move work
-to before and after the actual mmc_host_ops.request() function is called.
-In the DMA case pre_req() may do dma_map_sg() and prepare the DMA
-descriptor, and post_req() runs the dma_unmap_sg().
-
-Optimize for the first request
-==============================
-
-The first request in a series of requests can't be prepared in parallel
-with the previous transfer, since there is no previous request.
-The argument is_first_req in pre_req() indicates that there is no previous
-request. The host driver may optimize for this scenario to minimize
-the performance loss. A way to optimize for this is to split the current
-request in two chunks, prepare the first chunk and start the request,
-and finally prepare the second chunk and start the transfer.
-
-Pseudocode to handle is_first_req scenario with minimal prepare overhead:
-
-if (is_first_req && req->size > threshold)
-   /* start MMC transfer for the complete transfer size */
-   mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
-
-   /*
-    * Begin to prepare DMA while cmd is being processed by MMC.
-    * The first chunk of the request should take the same time
-    * to prepare as the "MMC process command time".
-    * If prepare time exceeds MMC cmd time
-    * the transfer is delayed, guesstimate max 4k as first chunk size.
-    */
-    prepare_1st_chunk_for_dma(req);
-    /* flush pending desc to the DMAC (dmaengine.h) */
-    dma_issue_pending(req->dma_desc);
-
-    prepare_2nd_chunk_for_dma(req);
-    /*
-     * The second issue_pending should be called before MMC runs out
-     * of the first chunk. If the MMC runs out of the first data chunk
-     * before this call, the transfer is delayed.
-     */
-    dma_issue_pending(req->dma_desc);
diff --git a/Documentation/mmc/mmc-dev-attrs.rst b/Documentation/mmc/mmc-dev-attrs.rst
new file mode 100644
index 000000000000..4f44b1b730d6
--- /dev/null
+++ b/Documentation/mmc/mmc-dev-attrs.rst
@@ -0,0 +1,91 @@
+==================================
+SD and MMC Block Device Attributes
+==================================
+
+These attributes are defined for the block devices associated with the
+SD or MMC device.
+
+The following attributes are read/write.
+
+	========		===============================================
+	force_ro		Enforce read-only access even if write protect switch is off.
+	========		===============================================
+
+SD and MMC Device Attributes
+============================
+
+All attributes are read-only.
+
+	======================	===============================================
+	cid			Card Identification Register
+	csd			Card Specific Data Register
+	scr			SD Card Configuration Register (SD only)
+	date			Manufacturing Date (from CID Register)
+	fwrev			Firmware/Product Revision (from CID Register)
+				(SD and MMCv1 only)
+	hwrev			Hardware/Product Revision (from CID Register)
+				(SD and MMCv1 only)
+	manfid			Manufacturer ID (from CID Register)
+	name			Product Name (from CID Register)
+	oemid			OEM/Application ID (from CID Register)
+	prv			Product Revision (from CID Register)
+				(SD and MMCv4 only)
+	serial			Product Serial Number (from CID Register)
+	erase_size		Erase group size
+	preferred_erase_size	Preferred erase size
+	raw_rpmb_size_mult	RPMB partition size
+	rel_sectors		Reliable write sector count
+	ocr 			Operation Conditions Register
+	dsr			Driver Stage Register
+	cmdq_en			Command Queue enabled:
+
+					1 => enabled, 0 => not enabled
+	======================	===============================================
+
+Note on Erase Size and Preferred Erase Size:
+
+	"erase_size" is the minimum size, in bytes, of an erase
+	operation.  For MMC, "erase_size" is the erase group size
+	reported by the card.  Note that "erase_size" does not apply
+	to trim or secure trim operations where the minimum size is
+	always one 512 byte sector.  For SD, "erase_size" is 512
+	if the card is block-addressed, 0 otherwise.
+
+	SD/MMC cards can erase an arbitrarily large area up to and
+	including the whole card.  When erasing a large area it may
+	be desirable to do it in smaller chunks for three reasons:
+
+	     1. A single erase command will make all other I/O on
+		the card wait.  This is not a problem if the whole card
+		is being erased, but erasing one partition will make
+		I/O for another partition on the same card wait for the
+		duration of the erase, which could be several
+		minutes.
+	     2. To be able to inform the user of erase progress.
+	     3. The erase timeout becomes too large to be very
+		useful.  Because the erase timeout contains a margin
+		which is multiplied by the size of the erase area,
+		the value can end up being several minutes for large
+		areas.
+
+	"erase_size" is not the most efficient unit to erase
+	(especially for SD where it is just one sector),
+	hence "preferred_erase_size" provides a good chunk
+	size for erasing large areas.
+
+	For MMC, "preferred_erase_size" is the high-capacity
+	erase size if a card specifies one, otherwise it is
+	based on the capacity of the card.
+
+	For SD, "preferred_erase_size" is the allocation unit
+	size specified by the card.
+
+	"preferred_erase_size" is in bytes.
+
+Note on raw_rpmb_size_mult:
+
+	"raw_rpmb_size_mult" is the RPMB partition size in units of 128kB blocks.
+
+	RPMB size in bytes is calculated by using the following equation:
+
+		RPMB partition size = 128kB x raw_rpmb_size_mult
diff --git a/Documentation/mmc/mmc-dev-attrs.txt b/Documentation/mmc/mmc-dev-attrs.txt
deleted file mode 100644
index 4ad0bb17f343..000000000000
--- a/Documentation/mmc/mmc-dev-attrs.txt
+++ /dev/null
@@ -1,77 +0,0 @@
-SD and MMC Block Device Attributes
-==================================
-
-These attributes are defined for the block devices associated with the
-SD or MMC device.
-
-The following attributes are read/write.
-
-	force_ro		Enforce read-only access even if write protect switch is off.
-
-SD and MMC Device Attributes
-============================
-
-All attributes are read-only.
-
-	cid			Card Identification Register
-	csd			Card Specific Data Register
-	scr			SD Card Configuration Register (SD only)
-	date			Manufacturing Date (from CID Register)
-	fwrev			Firmware/Product Revision (from CID Register) (SD and MMCv1 only)
-	hwrev			Hardware/Product Revision (from CID Register) (SD and MMCv1 only)
-	manfid			Manufacturer ID (from CID Register)
-	name			Product Name (from CID Register)
-	oemid			OEM/Application ID (from CID Register)
-	prv			Product Revision (from CID Register) (SD and MMCv4 only)
-	serial			Product Serial Number (from CID Register)
-	erase_size		Erase group size
-	preferred_erase_size	Preferred erase size
-	raw_rpmb_size_mult	RPMB partition size
-	rel_sectors		Reliable write sector count
-	ocr 			Operation Conditions Register
-	dsr			Driver Stage Register
-	cmdq_en			Command Queue enabled: 1 => enabled, 0 => not enabled
-
-Note on Erase Size and Preferred Erase Size:
-
-	"erase_size" is the  minimum size, in bytes, of an erase
-	operation.  For MMC, "erase_size" is the erase group size
-	reported by the card.  Note that "erase_size" does not apply
-	to trim or secure trim operations where the minimum size is
-	always one 512 byte sector.  For SD, "erase_size" is 512
-	if the card is block-addressed, 0 otherwise.
-
-	SD/MMC cards can erase an arbitrarily large area up to and
-	including the whole card.  When erasing a large area it may
-	be desirable to do it in smaller chunks for three reasons:
-		1. A single erase command will make all other I/O on
-		the card wait.  This is not a problem if the whole card
-		is being erased, but erasing one partition will make
-		I/O for another partition on the same card wait for the
-		duration of the erase - which could be a several
-		minutes.
-		2. To be able to inform the user of erase progress.
-		3. The erase timeout becomes too large to be very
-		useful.  Because the erase timeout contains a margin
-		which is multiplied by the size of the erase area,
-		the value can end up being several minutes for large
-		areas.
-
-	"erase_size" is not the most efficient unit to erase
-	(especially for SD where it is just one sector),
-	hence "preferred_erase_size" provides a good chunk
-	size for erasing large areas.
-
-	For MMC, "preferred_erase_size" is the high-capacity
-	erase size if a card specifies one, otherwise it is
-	based on the capacity of the card.
-
-	For SD, "preferred_erase_size" is the allocation unit
-	size specified by the card.
-
-	"preferred_erase_size" is in bytes.
-
-Note on raw_rpmb_size_mult:
-	"raw_rpmb_size_mult" is a multiple of 128kB block.
-	RPMB size in byte is calculated by using the following equation:
-	RPMB partition size = 128kB x raw_rpmb_size_mult
diff --git a/Documentation/mmc/mmc-dev-parts.rst b/Documentation/mmc/mmc-dev-parts.rst
new file mode 100644
index 000000000000..995922f1f744
--- /dev/null
+++ b/Documentation/mmc/mmc-dev-parts.rst
@@ -0,0 +1,41 @@
+============================
+SD and MMC Device Partitions
+============================
+
+Device partitions are additional logical block devices present on the
+SD/MMC device.
+
+As of this writing, MMC boot partitions are supported and exposed as
+/dev/mmcblkXboot0 and /dev/mmcblkXboot1, where X is the index of the
+parent /dev/mmcblkX.
+
+MMC Boot Partitions
+===================
+
+Read and write access is provided to the two MMC boot partitions. Due to
+the sensitive nature of the boot partition contents, which often store
+a bootloader or bootloader configuration tables crucial to booting the
+platform, write access is disabled by default to reduce the chance of
+accidental bricking.
+
+To enable write access to /dev/mmcblkXbootY, disable the forced read-only
+access with::
+
+	echo 0 > /sys/block/mmcblkXbootY/force_ro
+
+To re-enable read-only access::
+
+	echo 1 > /sys/block/mmcblkXbootY/force_ro
+
+The boot partitions can also be locked read only until the next power on,
+with::
+
+	echo 1 > /sys/block/mmcblkXbootY/ro_lock_until_next_power_on
+
+This is a feature of the card and not of the kernel. If the card does
+not support boot partition locking, the file will not exist. If the
+feature has been disabled on the card, the file will be read-only.
+
+The boot partitions can also be locked permanently, but this feature is
+not accessible through sysfs in order to avoid accidental or malicious
+bricking.
diff --git a/Documentation/mmc/mmc-dev-parts.txt b/Documentation/mmc/mmc-dev-parts.txt
deleted file mode 100644
index f08d078d43cf..000000000000
--- a/Documentation/mmc/mmc-dev-parts.txt
+++ /dev/null
@@ -1,40 +0,0 @@
-SD and MMC Device Partitions
-============================
-
-Device partitions are additional logical block devices present on the
-SD/MMC device.
-
-As of this writing, MMC boot partitions as supported and exposed as
-/dev/mmcblkXboot0 and /dev/mmcblkXboot1, where X is the index of the
-parent /dev/mmcblkX.
-
-MMC Boot Partitions
-===================
-
-Read and write access is provided to the two MMC boot partitions. Due to
-the sensitive nature of the boot partition contents, which often store
-a bootloader or bootloader configuration tables crucial to booting the
-platform, write access is disabled by default to reduce the chance of
-accidental bricking.
-
-To enable write access to /dev/mmcblkXbootY, disable the forced read-only
-access with:
-
-echo 0 > /sys/block/mmcblkXbootY/force_ro
-
-To re-enable read-only access:
-
-echo 1 > /sys/block/mmcblkXbootY/force_ro
-
-The boot partitions can also be locked read only until the next power on,
-with:
-
-echo 1 > /sys/block/mmcblkXbootY/ro_lock_until_next_power_on
-
-This is a feature of the card and not of the kernel. If the card does
-not support boot partition locking, the file will not exist. If the
-feature has been disabled on the card, the file will be read-only.
-
-The boot partitions can also be locked permanently, but this feature is
-not accessible through sysfs in order to avoid accidental or malicious
-bricking.
diff --git a/Documentation/mmc/mmc-tools.rst b/Documentation/mmc/mmc-tools.rst
new file mode 100644
index 000000000000..54406093768b
--- /dev/null
+++ b/Documentation/mmc/mmc-tools.rst
@@ -0,0 +1,37 @@
+======================
+MMC tools introduction
+======================
+
+There is one MMC test tool called mmc-utils, which is maintained by Chris Ball.
+You can find it at the public git repository below:
+
+	http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc-utils.git/
+
+Functions
+=========
+
+The mmc-utils tools can do the following:
+
+ - Print and parse extcsd data.
+ - Determine the eMMC writeprotect status.
+ - Set the eMMC writeprotect status.
+ - Set the eMMC data sector size to 4KB by disabling emulation.
+ - Create general purpose partition.
+ - Enable the enhanced user area.
+ - Enable write reliability per partition.
+ - Print the response to STATUS_SEND (CMD13).
+ - Enable the boot partition.
+ - Set Boot Bus Conditions.
+ - Enable the eMMC BKOPS feature.
+ - Permanently enable the eMMC H/W Reset feature.
+ - Permanently disable the eMMC H/W Reset feature.
+ - Send Sanitize command.
+ - Program authentication key for the device.
+ - Read the counter value for the rpmb device to stdout.
+ - Read from rpmb device to output.
+ - Write to rpmb device from data file.
+ - Enable the eMMC cache feature.
+ - Disable the eMMC cache feature.
+ - Print and parse CID data.
+ - Print and parse CSD data.
+ - Print and parse SCR data.
diff --git a/Documentation/mmc/mmc-tools.txt b/Documentation/mmc/mmc-tools.txt
deleted file mode 100644
index 735509c165d5..000000000000
--- a/Documentation/mmc/mmc-tools.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-MMC tools introduction
-======================
-
-There is one MMC test tools called mmc-utils, which is maintained by Chris Ball,
-you can find it at the below public git repository:
-http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc-utils.git/
-
-Functions
-=========
-
-The mmc-utils tools can do the following:
- - Print and parse extcsd data.
- - Determine the eMMC writeprotect status.
- - Set the eMMC writeprotect status.
- - Set the eMMC data sector size to 4KB by disabling emulation.
- - Create general purpose partition.
- - Enable the enhanced user area.
- - Enable write reliability per partition.
- - Print the response to STATUS_SEND (CMD13).
- - Enable the boot partition.
- - Set Boot Bus Conditions.
- - Enable the eMMC BKOPS feature.
- - Permanently enable the eMMC H/W Reset feature.
- - Permanently disable the eMMC H/W Reset feature.
- - Send Sanitize command.
- - Program authentication key for the device.
- - Counter value for the rpmb device will be read to stdout.
- - Read from rpmb device to output.
- - Write to rpmb device from data file.
- - Enable the eMMC cache feature.
- - Disable the eMMC cache feature.
- - Print and parse CID data.
- - Print and parse CSD data.
- - Print and parse SCR data.
-- 
cgit v1.2.3-55-g7522


From 08536105d93fe371743709b85350db141bafc51f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 11:21:26 -0300
Subject: docs: ioctl-number.txt: convert it to ReST format

The conversion itself is simple: add a markup for the
title of this file and add markups for both tables.

Yet, the big table here with IOCTL numbers is badly formatted:
on several lines, the "Include File" column has some values that
are bigger than the reserved space there.

Also, in several places, a comment was misplaced in the "Include
File" space.

So, most of the work here is to actually ensure that each field
will be properly fixed.

It is also worth mentioning that some URLs contain an asterisk
character. Sphinx has an issue with asterisks in the middle
of a string. As these are URLs, use the alternate format: %2A.

As a side effect of this patch, it is now a lot easier to see that
some reserved ioctl numbers are missing the include files
where they are supposed to be used.

PS.: While this is part of a subdir, I opted to convert this
single file alone, as this file has a high potential for conflicts,
since most subsystem maintainers touch it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/ioctl/ioctl-number.rst               | 363 +++++++++++++++++++++
 Documentation/ioctl/ioctl-number.txt               | 351 --------------------
 Documentation/process/submit-checklist.rst         |   2 +-
 .../it_IT/process/submit-checklist.rst             |   2 +-
 .../zh_CN/process/submit-checklist.rst             |   2 +-
 include/uapi/rdma/rdma_user_ioctl_cmds.h           |   2 +-
 6 files changed, 367 insertions(+), 355 deletions(-)
 create mode 100644 Documentation/ioctl/ioctl-number.rst
 delete mode 100644 Documentation/ioctl/ioctl-number.txt

diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst
new file mode 100644
index 000000000000..fcf9623a599f
--- /dev/null
+++ b/Documentation/ioctl/ioctl-number.rst
@@ -0,0 +1,363 @@
+:orphan:
+
+=============
+Ioctl Numbers
+=============
+
+19 October 1999
+
+Michael Elizabeth Chastain
+<mec@shout.net>
+
+If you are adding new ioctl's to the kernel, you should use the _IO
+macros defined in <linux/ioctl.h>:
+
+    ====== == ============================================
+    _IO    an ioctl with no parameters
+    _IOW   an ioctl with write parameters (copy_from_user)
+    _IOR   an ioctl with read parameters  (copy_to_user)
+    _IOWR  an ioctl with both write and read parameters.
+    ====== == ============================================
+
+'Write' and 'read' are from the user's point of view, just like the
+system calls 'write' and 'read'.  For example, a SET_FOO ioctl would
+be _IOW, although the kernel would actually read data from user space;
+a GET_FOO ioctl would be _IOR, although the kernel would actually write
+data to user space.
+
+The first argument to _IO, _IOW, _IOR, or _IOWR is an identifying letter
+or number from the table below.  Because of the large number of drivers,
+many drivers share a partial letter with other drivers.
+
+If you are writing a driver for a new device and need a letter, pick an
+unused block with enough room for expansion: 32 to 256 ioctl commands.
+You can register the block by patching this file and submitting the
+patch to Linus Torvalds.  Or you can e-mail me at <mec@shout.net> and
+I'll register one for you.
+
+The second argument to _IO, _IOW, _IOR, or _IOWR is a sequence number
+to distinguish ioctls from each other.  The third argument to _IOW,
+_IOR, or _IOWR is the type of the data going into the kernel or coming
+out of the kernel (e.g.  'int' or 'struct foo').  NOTE!  Do NOT use
+sizeof(arg) as the third argument as this results in your ioctl thinking
+it passes an argument of type size_t.
+
+Some devices use their major number as the identifier; this is OK, as
+long as it is unique.  Some devices are irregular and don't follow any
+convention at all.
+
+Following this convention is good because:
+
+(1) Keeping the ioctl's globally unique helps error checking:
+    if a program calls an ioctl on the wrong device, it will get an
+    error rather than some unexpected behaviour.
+
+(2) The 'strace' build procedure automatically finds ioctl numbers
+    defined with _IO, _IOW, _IOR, or _IOWR.
+
+(3) 'strace' can decode numbers back into useful names when the
+    numbers are unique.
+
+(4) People looking for ioctls can grep for them more easily when
+    this convention is used to define the ioctl numbers.
+
+(5) When following the convention, the driver code can use generic
+    code to copy the parameters between user and kernel space.
+
+This table lists ioctls visible from user land for Linux/x86.  It contains
+most drivers up to 2.6.31, but I know I am missing some.  There has been
+no attempt to list non-X86 architectures or ioctls from drivers/staging/.
+
+====  =====  ======================================================= ================================================================
+Code  Seq#    Include File                                           Comments
+      (hex)
+====  =====  ======================================================= ================================================================
+0x00  00-1F  linux/fs.h                                              conflict!
+0x00  00-1F  scsi/scsi_ioctl.h                                       conflict!
+0x00  00-1F  linux/fb.h                                              conflict!
+0x00  00-1F  linux/wavefront.h                                       conflict!
+0x02  all    linux/fd.h
+0x03  all    linux/hdreg.h
+0x04  D2-DC  linux/umsdos_fs.h                                       Dead since 2.6.11, but don't reuse these.
+0x06  all    linux/lp.h
+0x09  all    linux/raid/md_u.h
+0x10  00-0F  drivers/char/s390/vmcp.h
+0x10  10-1F  arch/s390/include/uapi/sclp_ctl.h
+0x10  20-2F  arch/s390/include/uapi/asm/hypfs.h
+0x12  all    linux/fs.h
+             linux/blkpg.h
+0x1b  all                                                            InfiniBand Subsystem
+                                                                     <http://infiniband.sourceforge.net/>
+0x20  all    drivers/cdrom/cm206.h
+0x22  all    scsi/sg.h
+'!'   00-1F  uapi/linux/seccomp.h
+'#'   00-3F                                                          IEEE 1394 Subsystem
+                                                                     Block for the entire subsystem
+'$'   00-0F  linux/perf_counter.h, linux/perf_event.h
+'%'   00-0F  include/uapi/linux/stm.h                                System Trace Module subsystem
+                                                                     <mailto:alexander.shishkin@linux.intel.com>
+'&'   00-07  drivers/firewire/nosy-user.h
+'1'   00-1F  linux/timepps.h                                         PPS kit from Ulrich Windl
+                                                                     <ftp://ftp.de.kernel.org/pub/linux/daemons/ntp/PPS/>
+'2'   01-04  linux/i2o.h
+'3'   00-0F  drivers/s390/char/raw3270.h                             conflict!
+'3'   00-1F  linux/suspend_ioctls.h,                                 conflict!
+             kernel/power/user.c
+'8'   all                                                            SNP8023 advanced NIC card
+                                                                     <mailto:mcr@solidum.com>
+';'   64-7F  linux/vfio.h
+'@'   00-0F  linux/radeonfb.h                                        conflict!
+'@'   00-0F  drivers/video/aty/aty128fb.c                            conflict!
+'A'   00-1F  linux/apm_bios.h                                        conflict!
+'A'   00-0F  linux/agpgart.h,                                        conflict!
+             drivers/char/agp/compat_ioctl.h
+'A'   00-7F  sound/asound.h                                          conflict!
+'B'   00-1F  linux/cciss_ioctl.h                                     conflict!
+'B'   00-0F  include/linux/pmu.h                                     conflict!
+'B'   C0-FF  advanced bbus                                           <mailto:maassen@uni-freiburg.de>
+'C'   all    linux/soundcard.h                                       conflict!
+'C'   01-2F  linux/capi.h                                            conflict!
+'C'   F0-FF  drivers/net/wan/cosa.h                                  conflict!
+'D'   all    arch/s390/include/asm/dasd.h
+'D'   40-5F  drivers/scsi/dpt/dtpi_ioctl.h
+'D'   05     drivers/scsi/pmcraid.h
+'E'   all    linux/input.h                                           conflict!
+'E'   00-0F  xen/evtchn.h                                            conflict!
+'F'   all    linux/fb.h                                              conflict!
+'F'   01-02  drivers/scsi/pmcraid.h                                  conflict!
+'F'   20     drivers/video/fsl-diu-fb.h                              conflict!
+'F'   20     drivers/video/intelfb/intelfb.h                         conflict!
+'F'   20     linux/ivtvfb.h                                          conflict!
+'F'   20     linux/matroxfb.h                                        conflict!
+'F'   20     drivers/video/aty/atyfb_base.c                          conflict!
+'F'   00-0F  video/da8xx-fb.h                                        conflict!
+'F'   80-8F  linux/arcfb.h                                           conflict!
+'F'   DD     video/sstfb.h                                           conflict!
+'G'   00-3F  drivers/misc/sgi-gru/grulib.h                           conflict!
+'G'   00-0F  linux/gigaset_dev.h                                     conflict!
+'H'   00-7F  linux/hiddev.h                                          conflict!
+'H'   00-0F  linux/hidraw.h                                          conflict!
+'H'   01     linux/mei.h                                             conflict!
+'H'   02     linux/mei.h                                             conflict!
+'H'   03     linux/mei.h                                             conflict!
+'H'   00-0F  sound/asound.h                                          conflict!
+'H'   20-40  sound/asound_fm.h                                       conflict!
+'H'   80-8F  sound/sfnt_info.h                                       conflict!
+'H'   10-8F  sound/emu10k1.h                                         conflict!
+'H'   10-1F  sound/sb16_csp.h                                        conflict!
+'H'   10-1F  sound/hda_hwdep.h                                       conflict!
+'H'   40-4F  sound/hdspm.h                                           conflict!
+'H'   40-4F  sound/hdsp.h                                            conflict!
+'H'   90     sound/usb/usx2y/usb_stream.h
+'H'   A0     uapi/linux/usb/cdc-wdm.h
+'H'   C0-F0  net/bluetooth/hci.h                                     conflict!
+'H'   C0-DF  net/bluetooth/hidp/hidp.h                               conflict!
+'H'   C0-DF  net/bluetooth/cmtp/cmtp.h                               conflict!
+'H'   C0-DF  net/bluetooth/bnep/bnep.h                               conflict!
+'H'   F1     linux/hid-roccat.h                                      <mailto:erazor_de@users.sourceforge.net>
+'H'   F8-FA  sound/firewire.h
+'I'   all    linux/isdn.h                                            conflict!
+'I'   00-0F  drivers/isdn/divert/isdn_divert.h                       conflict!
+'I'   40-4F  linux/mISDNif.h                                         conflict!
+'J'   00-1F  drivers/scsi/gdth_ioctl.h
+'K'   all    linux/kd.h
+'L'   00-1F  linux/loop.h                                            conflict!
+'L'   10-1F  drivers/scsi/mpt3sas/mpt3sas_ctl.h                      conflict!
+'L'   20-2F  linux/lightnvm.h
+'L'   E0-FF  linux/ppdd.h                                            encrypted disk device driver
+                                                                     <http://linux01.gwdg.de/~alatham/ppdd.html>
+'M'   all    linux/soundcard.h                                       conflict!
+'M'   01-16  mtd/mtd-abi.h                                           conflict!
+      and    drivers/mtd/mtdchar.c
+'M'   01-03  drivers/scsi/megaraid/megaraid_sas.h
+'M'   00-0F  drivers/video/fsl-diu-fb.h                              conflict!
+'N'   00-1F  drivers/usb/scanner.h
+'N'   40-7F  drivers/block/nvme.c
+'O'   00-06  mtd/ubi-user.h                                          UBI
+'P'   all    linux/soundcard.h                                       conflict!
+'P'   60-6F  sound/sscape_ioctl.h                                    conflict!
+'P'   00-0F  drivers/usb/class/usblp.c                               conflict!
+'P'   01-09  drivers/misc/pci_endpoint_test.c                        conflict!
+'Q'   all    linux/soundcard.h
+'R'   00-1F  linux/random.h                                          conflict!
+'R'   01     linux/rfkill.h                                          conflict!
+'R'   C0-DF  net/bluetooth/rfcomm.h
+'S'   all    linux/cdrom.h                                           conflict!
+'S'   80-81  scsi/scsi_ioctl.h                                       conflict!
+'S'   82-FF  scsi/scsi.h                                             conflict!
+'S'   00-7F  sound/asequencer.h                                      conflict!
+'T'   all    linux/soundcard.h                                       conflict!
+'T'   00-AF  sound/asound.h                                          conflict!
+'T'   all    arch/x86/include/asm/ioctls.h                           conflict!
+'T'   C0-DF  linux/if_tun.h                                          conflict!
+'U'   all    sound/asound.h                                          conflict!
+'U'   00-CF  linux/uinput.h                                          conflict!
+'U'   00-EF  linux/usbdevice_fs.h
+'U'   C0-CF  drivers/bluetooth/hci_uart.h
+'V'   all    linux/vt.h                                              conflict!
+'V'   all    linux/videodev2.h                                       conflict!
+'V'   C0     linux/ivtvfb.h                                          conflict!
+'V'   C0     linux/ivtv.h                                            conflict!
+'V'   C0     media/davinci/vpfe_capture.h                            conflict!
+'V'   C0     media/si4713.h                                          conflict!
+'W'   00-1F  linux/watchdog.h                                        conflict!
+'W'   00-1F  linux/wanrouter.h                                       conflict! (pre 3.9)
+'W'   00-3F  sound/asound.h                                          conflict!
+'W'   40-5F  drivers/pci/switch/switchtec.c
+'X'   all    fs/xfs/xfs_fs.h,                                        conflict!
+             fs/xfs/linux-2.6/xfs_ioctl32.h,
+             include/linux/falloc.h,
+             linux/fs.h,
+'X'   all    fs/ocfs2/ocfs_fs.h                                      conflict!
+'X'   01     linux/pktcdvd.h                                         conflict!
+'Y'   all    linux/cyclades.h
+'Z'   14-15  drivers/message/fusion/mptctl.h
+'['   00-3F  linux/usb/tmc.h                                         USB Test and Measurement Devices
+                                                                     <mailto:gregkh@linuxfoundation.org>
+'a'   all    linux/atm*.h, linux/sonet.h                             ATM on linux
+                                                                     <http://lrcwww.epfl.ch/>
+'a'   00-0F  drivers/crypto/qat/qat_common/adf_cfg_common.h          conflict! qat driver
+'b'   00-FF                                                          conflict! bit3 vme host bridge
+                                                                     <mailto:natalia@nikhefk.nikhef.nl>
+'c'   all    linux/cm4000_cs.h                                       conflict!
+'c'   00-7F  linux/comstats.h                                        conflict!
+'c'   00-7F  linux/coda.h                                            conflict!
+'c'   00-1F  linux/chio.h                                            conflict!
+'c'   80-9F  arch/s390/include/asm/chsc.h                            conflict!
+'c'   A0-AF  arch/x86/include/asm/msr.h                              conflict!
+'d'   00-FF  linux/char/drm/drm.h                                    conflict!
+'d'   02-40  pcmcia/ds.h                                             conflict!
+'d'   F0-FF  linux/digi1.h
+'e'   all    linux/digi1.h                                           conflict!
+'f'   00-1F  linux/ext2_fs.h                                         conflict!
+'f'   00-1F  linux/ext3_fs.h                                         conflict!
+'f'   00-0F  fs/jfs/jfs_dinode.h                                     conflict!
+'f'   00-0F  fs/ext4/ext4.h                                          conflict!
+'f'   00-0F  linux/fs.h                                              conflict!
+'f'   00-0F  fs/ocfs2/ocfs2_fs.h                                     conflict!
+'g'   00-0F  linux/usb/gadgetfs.h
+'g'   20-2F  linux/usb/g_printer.h
+'h'   00-7F                                                          conflict! Charon filesystem
+                                                                     <mailto:zapman@interlan.net>
+'h'   00-1F  linux/hpet.h                                            conflict!
+'h'   80-8F  fs/hfsplus/ioctl.c
+'i'   00-3F  linux/i2o-dev.h                                         conflict!
+'i'   0B-1F  linux/ipmi.h                                            conflict!
+'i'   80-8F  linux/i8k.h
+'j'   00-3F  linux/joystick.h
+'k'   00-0F  linux/spi/spidev.h                                      conflict!
+'k'   00-05  video/kyro.h                                            conflict!
+'k'   10-17  linux/hsi/hsi_char.h                                    HSI character device
+'l'   00-3F  linux/tcfs_fs.h                                         transparent cryptographic file system
+                                                                     <http://web.archive.org/web/%2A/http://mikonos.dia.unisa.it/tcfs>
+'l'   40-7F  linux/udf_fs_i.h                                        in development:
+                                                                     <http://sourceforge.net/projects/linux-udf/>
+'m'   00-09  linux/mmtimer.h                                         conflict!
+'m'   all    linux/mtio.h                                            conflict!
+'m'   all    linux/soundcard.h                                       conflict!
+'m'   all    linux/synclink.h                                        conflict!
+'m'   00-19  drivers/message/fusion/mptctl.h                         conflict!
+'m'   00     drivers/scsi/megaraid/megaraid_ioctl.h                  conflict!
+'n'   00-7F  linux/ncp_fs.h and fs/ncpfs/ioctl.c
+'n'   80-8F  uapi/linux/nilfs2_api.h                                 NILFS2
+'n'   E0-FF  linux/matroxfb.h                                        matroxfb
+'o'   00-1F  fs/ocfs2/ocfs2_fs.h                                     OCFS2
+'o'   00-03  mtd/ubi-user.h                                          conflict! (OCFS2 and UBI overlap)
+'o'   40-41  mtd/ubi-user.h                                          UBI
+'o'   01-A1  `linux/dvb/*.h`                                         DVB
+'p'   00-0F  linux/phantom.h                                         conflict! (OpenHaptics needs this)
+'p'   00-1F  linux/rtc.h                                             conflict!
+'p'   00-3F  linux/mc146818rtc.h                                     conflict!
+'p'   40-7F  linux/nvram.h
+'p'   80-9F  linux/ppdev.h                                           user-space parport
+                                                                     <mailto:tim@cyberelk.net>
+'p'   A1-A5  linux/pps.h                                             LinuxPPS
+                                                                     <mailto:giometti@linux.it>
+'q'   00-1F  linux/serio.h
+'q'   80-FF  linux/telephony.h                                       Internet PhoneJACK, Internet LineJACK
+             linux/ixjuser.h                                         <http://web.archive.org/web/%2A/http://www.quicknet.net>
+'r'   00-1F  linux/msdos_fs.h and fs/fat/dir.c
+'s'   all    linux/cdk.h
+'t'   00-7F  linux/ppp-ioctl.h
+'t'   80-8F  linux/isdn_ppp.h
+'t'   90-91  linux/toshiba.h                                         toshiba and toshiba_acpi SMM
+'u'   00-1F  linux/smb_fs.h                                          gone
+'u'   20-3F  linux/uvcvideo.h                                        USB video class host driver
+'u'   40-4f  linux/udmabuf.h                                         userspace dma-buf misc device
+'v'   00-1F  linux/ext2_fs.h                                         conflict!
+'v'   00-1F  linux/fs.h                                              conflict!
+'v'   00-0F  linux/sonypi.h                                          conflict!
+'v'   00-0F  media/v4l2-subdev.h                                     conflict!
+'v'   C0-FF  linux/meye.h                                            conflict!
+'w'   all                                                            CERN SCI driver
+'y'   00-1F                                                          packet based user level communications
+                                                                     <mailto:zapman@interlan.net>
+'z'   00-3F                                                          CAN bus card conflict!
+                                                                     <mailto:hdstich@connectu.ulm.circular.de>
+'z'   40-7F                                                          CAN bus card conflict!
+                                                                     <mailto:oe@port.de>
+'z'   10-4F  drivers/s390/crypto/zcrypt_api.h                        conflict!
+'|'   00-7F  linux/media.h
+0x80  00-1F  linux/fb.h
+0x89  00-06  arch/x86/include/asm/sockios.h
+0x89  0B-DF  linux/sockios.h
+0x89  E0-EF  linux/sockios.h                                         SIOCPROTOPRIVATE range
+0x89  E0-EF  linux/dn.h                                              PROTOPRIVATE range
+0x89  F0-FF  linux/sockios.h                                         SIOCDEVPRIVATE range
+0x8B  all    linux/wireless.h
+0x8C  00-3F                                                          WiNRADiO driver
+                                                                     <http://www.winradio.com.au/>
+0x90  00     drivers/cdrom/sbpcd.h
+0x92  00-0F  drivers/usb/mon/mon_bin.c
+0x93  60-7F  linux/auto_fs.h
+0x94  all    fs/btrfs/ioctl.h                                        Btrfs filesystem
+             and linux/fs.h                                          some lifted to vfs/generic
+0x97  00-7F  fs/ceph/ioctl.h                                         Ceph file system
+0x99  00-0F                                                          537-Addinboard driver
+                                                                     <mailto:buk@buks.ipn.de>
+0xA0  all    linux/sdp/sdp.h                                         Industrial Device Project
+                                                                     <mailto:kenji@bitgate.com>
+0xA1  0      linux/vtpm_proxy.h                                      TPM Emulator Proxy Driver
+0xA3  80-8F                                                          Port ACL  in development:
+                                                                     <mailto:tlewis@mindspring.com>
+0xA3  90-9F  linux/dtlk.h
+0xA4  00-1F  uapi/linux/tee.h                                        Generic TEE subsystem
+0xAA  00-3F  linux/uapi/linux/userfaultfd.h
+0xAB  00-1F  linux/nbd.h
+0xAC  00-1F  linux/raw.h
+0xAD  00                                                             Netfilter device in development:
+                                                                     <mailto:rusty@rustcorp.com.au>
+0xAE  all    linux/kvm.h                                             Kernel-based Virtual Machine
+                                                                     <mailto:kvm@vger.kernel.org>
+0xAF  00-1F  linux/fsl_hypervisor.h                                  Freescale hypervisor
+0xB0  all                                                            RATIO devices in development:
+                                                                     <mailto:vgo@ratio.de>
+0xB1  00-1F                                                          PPPoX
+                                                                     <mailto:mostrows@styx.uwaterloo.ca>
+0xB3  00     linux/mmc/ioctl.h
+0xB4  00-0F  linux/gpio.h                                            <mailto:linux-gpio@vger.kernel.org>
+0xB5  00-0F  uapi/linux/rpmsg.h                                      <mailto:linux-remoteproc@vger.kernel.org>
+0xB6  all    linux/fpga-dfl.h
+0xC0  00-0F  linux/usb/iowarrior.h
+0xCA  00-0F  uapi/misc/cxl.h
+0xCA  10-2F  uapi/misc/ocxl.h
+0xCA  80-BF  uapi/scsi/cxlflash_ioctl.h
+0xCB  00-1F                                                          CBM serial IEC bus in development:
+                                                                     <mailto:michael.klein@puffin.lb.shuttle.de>
+0xCC  00-0F  drivers/misc/ibmvmc.h                                   pseries VMC driver
+0xCD  01     linux/reiserfs_fs.h
+0xCF  02     fs/cifs/ioctl.c
+0xDB  00-0F  drivers/char/mwave/mwavepub.h
+0xDD  00-3F                                                          ZFCP device driver see drivers/s390/scsi/
+                                                                     <mailto:aherrman@de.ibm.com>
+0xE5  00-3F  linux/fuse.h
+0xEC  00-01  drivers/platform/chrome/cros_ec_dev.h                   ChromeOS EC driver
+0xF3  00-3F  drivers/usb/misc/sisusbvga/sisusb.h                     sisfb (in development)
+                                                                     <mailto:thomas@winischhofer.net>
+0xF4  00-1F  video/mbxfb.h                                           mbxfb
+                                                                     <mailto:raph@8d.com>
+0xF6  all                                                            LTTng Linux Trace Toolkit Next Generation
+                                                                     <mailto:mathieu.desnoyers@efficios.com>
+0xFD  all    linux/dm-ioctl.h
+0xFE  all    linux/isst_if.h
+====  =====  ======================================================= ================================================================
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
deleted file mode 100644
index ab0b3f686454..000000000000
--- a/Documentation/ioctl/ioctl-number.txt
+++ /dev/null
@@ -1,351 +0,0 @@
-Ioctl Numbers
-19 October 1999
-Michael Elizabeth Chastain
-<mec@shout.net>
-
-If you are adding new ioctl's to the kernel, you should use the _IO
-macros defined in <linux/ioctl.h>:
-
-    _IO    an ioctl with no parameters
-    _IOW   an ioctl with write parameters (copy_from_user)
-    _IOR   an ioctl with read parameters  (copy_to_user)
-    _IOWR  an ioctl with both write and read parameters.
-
-'Write' and 'read' are from the user's point of view, just like the
-system calls 'write' and 'read'.  For example, a SET_FOO ioctl would
-be _IOW, although the kernel would actually read data from user space;
-a GET_FOO ioctl would be _IOR, although the kernel would actually write
-data to user space.
-
-The first argument to _IO, _IOW, _IOR, or _IOWR is an identifying letter
-or number from the table below.  Because of the large number of drivers,
-many drivers share a partial letter with other drivers.
-
-If you are writing a driver for a new device and need a letter, pick an
-unused block with enough room for expansion: 32 to 256 ioctl commands.
-You can register the block by patching this file and submitting the
-patch to Linus Torvalds.  Or you can e-mail me at <mec@shout.net> and
-I'll register one for you.
-
-The second argument to _IO, _IOW, _IOR, or _IOWR is a sequence number
-to distinguish ioctls from each other.  The third argument to _IOW,
-_IOR, or _IOWR is the type of the data going into the kernel or coming
-out of the kernel (e.g.  'int' or 'struct foo').  NOTE!  Do NOT use
-sizeof(arg) as the third argument as this results in your ioctl thinking
-it passes an argument of type size_t.
-
-Some devices use their major number as the identifier; this is OK, as
-long as it is unique.  Some devices are irregular and don't follow any
-convention at all.
-
-Following this convention is good because:
-
-(1) Keeping the ioctl's globally unique helps error checking:
-    if a program calls an ioctl on the wrong device, it will get an
-    error rather than some unexpected behaviour.
-
-(2) The 'strace' build procedure automatically finds ioctl numbers
-    defined with _IO, _IOW, _IOR, or _IOWR.
-
-(3) 'strace' can decode numbers back into useful names when the
-    numbers are unique.
-
-(4) People looking for ioctls can grep for them more easily when
-    this convention is used to define the ioctl numbers.
-
-(5) When following the convention, the driver code can use generic
-    code to copy the parameters between user and kernel space.
-
-This table lists ioctls visible from user land for Linux/x86.  It contains
-most drivers up to 2.6.31, but I know I am missing some.  There has been
-no attempt to list non-X86 architectures or ioctls from drivers/staging/.
-
-Code  Seq#(hex)	Include File		Comments
-========================================================
-0x00	00-1F	linux/fs.h		conflict!
-0x00	00-1F	scsi/scsi_ioctl.h	conflict!
-0x00	00-1F	linux/fb.h		conflict!
-0x00	00-1F	linux/wavefront.h	conflict!
-0x02	all	linux/fd.h
-0x03	all	linux/hdreg.h
-0x04	D2-DC	linux/umsdos_fs.h	Dead since 2.6.11, but don't reuse these.
-0x06	all	linux/lp.h
-0x09	all	linux/raid/md_u.h
-0x10	00-0F	drivers/char/s390/vmcp.h
-0x10	10-1F	arch/s390/include/uapi/sclp_ctl.h
-0x10	20-2F	arch/s390/include/uapi/asm/hypfs.h
-0x12	all	linux/fs.h
-		linux/blkpg.h
-0x1b	all	InfiniBand Subsystem	<http://infiniband.sourceforge.net/>
-0x20	all	drivers/cdrom/cm206.h
-0x22	all	scsi/sg.h
-'!'	00-1F	uapi/linux/seccomp.h
-'#'	00-3F	IEEE 1394 Subsystem	Block for the entire subsystem
-'$'	00-0F	linux/perf_counter.h, linux/perf_event.h
-'%'	00-0F	include/uapi/linux/stm.h
-					System Trace Module subsystem
-					<mailto:alexander.shishkin@linux.intel.com>
-'&'	00-07	drivers/firewire/nosy-user.h
-'1'	00-1F	<linux/timepps.h>	PPS kit from Ulrich Windl
-					<ftp://ftp.de.kernel.org/pub/linux/daemons/ntp/PPS/>
-'2'	01-04	linux/i2o.h
-'3'	00-0F	drivers/s390/char/raw3270.h	conflict!
-'3'	00-1F	linux/suspend_ioctls.h	conflict!
-		and kernel/power/user.c
-'8'	all				SNP8023 advanced NIC card
-					<mailto:mcr@solidum.com>
-';'	64-7F	linux/vfio.h
-'@'	00-0F	linux/radeonfb.h	conflict!
-'@'	00-0F	drivers/video/aty/aty128fb.c	conflict!
-'A'	00-1F	linux/apm_bios.h	conflict!
-'A'	00-0F	linux/agpgart.h		conflict!
-		and drivers/char/agp/compat_ioctl.h
-'A'	00-7F	sound/asound.h		conflict!
-'B'	00-1F	linux/cciss_ioctl.h	conflict!
-'B'	00-0F	include/linux/pmu.h	conflict!
-'B'	C0-FF				advanced bbus
-					<mailto:maassen@uni-freiburg.de>
-'C'	all	linux/soundcard.h	conflict!
-'C'	01-2F	linux/capi.h		conflict!
-'C'	F0-FF	drivers/net/wan/cosa.h	conflict!
-'D'	all	arch/s390/include/asm/dasd.h
-'D'	40-5F	drivers/scsi/dpt/dtpi_ioctl.h
-'D'	05	drivers/scsi/pmcraid.h
-'E'	all	linux/input.h		conflict!
-'E'	00-0F	xen/evtchn.h		conflict!
-'F'	all	linux/fb.h		conflict!
-'F'	01-02	drivers/scsi/pmcraid.h	conflict!
-'F'	20	drivers/video/fsl-diu-fb.h	conflict!
-'F'	20	drivers/video/intelfb/intelfb.h	conflict!
-'F'	20	linux/ivtvfb.h		conflict!
-'F'	20	linux/matroxfb.h	conflict!
-'F'	20	drivers/video/aty/atyfb_base.c	conflict!
-'F'	00-0F	video/da8xx-fb.h	conflict!
-'F'	80-8F	linux/arcfb.h		conflict!
-'F'	DD	video/sstfb.h		conflict!
-'G'	00-3F	drivers/misc/sgi-gru/grulib.h	conflict!
-'G'	00-0F	linux/gigaset_dev.h	conflict!
-'H'	00-7F	linux/hiddev.h		conflict!
-'H'	00-0F	linux/hidraw.h		conflict!
-'H'	01	linux/mei.h		conflict!
-'H'	02	linux/mei.h		conflict!
-'H'	03	linux/mei.h		conflict!
-'H'	00-0F	sound/asound.h		conflict!
-'H'	20-40	sound/asound_fm.h	conflict!
-'H'	80-8F	sound/sfnt_info.h	conflict!
-'H'	10-8F	sound/emu10k1.h		conflict!
-'H'	10-1F	sound/sb16_csp.h	conflict!
-'H'	10-1F	sound/hda_hwdep.h	conflict!
-'H'	40-4F	sound/hdspm.h		conflict!
-'H'	40-4F	sound/hdsp.h		conflict!
-'H'	90	sound/usb/usx2y/usb_stream.h
-'H'	A0	uapi/linux/usb/cdc-wdm.h
-'H'	C0-F0	net/bluetooth/hci.h	conflict!
-'H'	C0-DF	net/bluetooth/hidp/hidp.h	conflict!
-'H'	C0-DF	net/bluetooth/cmtp/cmtp.h	conflict!
-'H'	C0-DF	net/bluetooth/bnep/bnep.h	conflict!
-'H'	F1	linux/hid-roccat.h	<mailto:erazor_de@users.sourceforge.net>
-'H'	F8-FA	sound/firewire.h
-'I'	all	linux/isdn.h		conflict!
-'I'	00-0F	drivers/isdn/divert/isdn_divert.h	conflict!
-'I'	40-4F	linux/mISDNif.h		conflict!
-'J'	00-1F	drivers/scsi/gdth_ioctl.h
-'K'	all	linux/kd.h
-'L'	00-1F	linux/loop.h		conflict!
-'L'	10-1F	drivers/scsi/mpt3sas/mpt3sas_ctl.h	conflict!
-'L'	20-2F	linux/lightnvm.h
-'L'	E0-FF	linux/ppdd.h		encrypted disk device driver
-					<http://linux01.gwdg.de/~alatham/ppdd.html>
-'M'	all	linux/soundcard.h	conflict!
-'M'	01-16	mtd/mtd-abi.h		conflict!
-		and drivers/mtd/mtdchar.c
-'M'	01-03	drivers/scsi/megaraid/megaraid_sas.h
-'M'	00-0F	drivers/video/fsl-diu-fb.h	conflict!
-'N'	00-1F	drivers/usb/scanner.h
-'N'	40-7F	drivers/block/nvme.c
-'O'     00-06   mtd/ubi-user.h		UBI
-'P'	all	linux/soundcard.h	conflict!
-'P'	60-6F	sound/sscape_ioctl.h	conflict!
-'P'	00-0F	drivers/usb/class/usblp.c	conflict!
-'P'	01-09	drivers/misc/pci_endpoint_test.c	conflict!
-'Q'	all	linux/soundcard.h
-'R'	00-1F	linux/random.h		conflict!
-'R'	01	linux/rfkill.h		conflict!
-'R'	C0-DF	net/bluetooth/rfcomm.h
-'S'	all	linux/cdrom.h		conflict!
-'S'	80-81	scsi/scsi_ioctl.h	conflict!
-'S'	82-FF	scsi/scsi.h		conflict!
-'S'	00-7F	sound/asequencer.h	conflict!
-'T'	all	linux/soundcard.h	conflict!
-'T'	00-AF	sound/asound.h		conflict!
-'T'	all	arch/x86/include/asm/ioctls.h	conflict!
-'T'	C0-DF	linux/if_tun.h		conflict!
-'U'	all	sound/asound.h		conflict!
-'U'	00-CF	linux/uinput.h		conflict!
-'U'	00-EF	linux/usbdevice_fs.h
-'U'	C0-CF	drivers/bluetooth/hci_uart.h
-'V'	all	linux/vt.h		conflict!
-'V'	all	linux/videodev2.h	conflict!
-'V'	C0	linux/ivtvfb.h		conflict!
-'V'	C0	linux/ivtv.h		conflict!
-'V'	C0	media/davinci/vpfe_capture.h	conflict!
-'V'	C0	media/si4713.h		conflict!
-'W'	00-1F	linux/watchdog.h	conflict!
-'W'	00-1F	linux/wanrouter.h	conflict!		(pre 3.9)
-'W'	00-3F	sound/asound.h		conflict!
-'W'	40-5F   drivers/pci/switch/switchtec.c
-'X'	all	fs/xfs/xfs_fs.h		conflict!
-		and fs/xfs/linux-2.6/xfs_ioctl32.h
-		and include/linux/falloc.h
-		and linux/fs.h
-'X'	all	fs/ocfs2/ocfs_fs.h	conflict!
-'X'	01	linux/pktcdvd.h		conflict!
-'Y'	all	linux/cyclades.h
-'Z'	14-15	drivers/message/fusion/mptctl.h
-'['	00-3F	linux/usb/tmc.h		USB Test and Measurement Devices
-					<mailto:gregkh@linuxfoundation.org>
-'a'	all	linux/atm*.h, linux/sonet.h	ATM on linux
-					<http://lrcwww.epfl.ch/>
-'a'	00-0F	drivers/crypto/qat/qat_common/adf_cfg_common.h	conflict! qat driver
-'b'	00-FF				conflict! bit3 vme host bridge
-					<mailto:natalia@nikhefk.nikhef.nl>
-'c'	all	linux/cm4000_cs.h	conflict!
-'c'	00-7F	linux/comstats.h	conflict!
-'c'	00-7F	linux/coda.h		conflict!
-'c'	00-1F	linux/chio.h		conflict!
-'c'	80-9F	arch/s390/include/asm/chsc.h	conflict!
-'c'	A0-AF   arch/x86/include/asm/msr.h	conflict!
-'d'	00-FF	linux/char/drm/drm.h	conflict!
-'d'	02-40	pcmcia/ds.h		conflict!
-'d'	F0-FF	linux/digi1.h
-'e'	all	linux/digi1.h		conflict!
-'f'	00-1F	linux/ext2_fs.h		conflict!
-'f'	00-1F	linux/ext3_fs.h		conflict!
-'f'	00-0F	fs/jfs/jfs_dinode.h	conflict!
-'f'	00-0F	fs/ext4/ext4.h		conflict!
-'f'	00-0F	linux/fs.h		conflict!
-'f'	00-0F	fs/ocfs2/ocfs2_fs.h	conflict!
-'g'	00-0F	linux/usb/gadgetfs.h
-'g'	20-2F	linux/usb/g_printer.h
-'h'	00-7F				conflict! Charon filesystem
-					<mailto:zapman@interlan.net>
-'h'	00-1F	linux/hpet.h		conflict!
-'h'	80-8F	fs/hfsplus/ioctl.c
-'i'	00-3F	linux/i2o-dev.h		conflict!
-'i'	0B-1F	linux/ipmi.h		conflict!
-'i'	80-8F	linux/i8k.h
-'j'	00-3F	linux/joystick.h
-'k'	00-0F	linux/spi/spidev.h	conflict!
-'k'	00-05	video/kyro.h		conflict!
-'k'	10-17	linux/hsi/hsi_char.h	HSI character device
-'l'	00-3F	linux/tcfs_fs.h		transparent cryptographic file system
-					<http://web.archive.org/web/*/http://mikonos.dia.unisa.it/tcfs>
-'l'	40-7F	linux/udf_fs_i.h	in development:
-					<http://sourceforge.net/projects/linux-udf/>
-'m'	00-09	linux/mmtimer.h		conflict!
-'m'	all	linux/mtio.h		conflict!
-'m'	all	linux/soundcard.h	conflict!
-'m'	all	linux/synclink.h	conflict!
-'m'	00-19	drivers/message/fusion/mptctl.h	conflict!
-'m'	00	drivers/scsi/megaraid/megaraid_ioctl.h	conflict!
-'n'	00-7F	linux/ncp_fs.h and fs/ncpfs/ioctl.c
-'n'	80-8F	uapi/linux/nilfs2_api.h	NILFS2
-'n'	E0-FF	linux/matroxfb.h	matroxfb
-'o'	00-1F	fs/ocfs2/ocfs2_fs.h	OCFS2
-'o'     00-03   mtd/ubi-user.h		conflict! (OCFS2 and UBI overlaps)
-'o'     40-41   mtd/ubi-user.h		UBI
-'o'     01-A1   linux/dvb/*.h		DVB
-'p'	00-0F	linux/phantom.h		conflict! (OpenHaptics needs this)
-'p'	00-1F	linux/rtc.h		conflict!
-'p'	00-3F	linux/mc146818rtc.h	conflict!
-'p'	40-7F	linux/nvram.h
-'p'	80-9F	linux/ppdev.h		user-space parport
-					<mailto:tim@cyberelk.net>
-'p'	A1-A5	linux/pps.h		LinuxPPS
-					<mailto:giometti@linux.it>
-'q'	00-1F	linux/serio.h
-'q'	80-FF	linux/telephony.h	Internet PhoneJACK, Internet LineJACK
-		linux/ixjuser.h		<http://web.archive.org/web/*/http://www.quicknet.net>
-'r'	00-1F	linux/msdos_fs.h and fs/fat/dir.c
-'s'	all	linux/cdk.h
-'t'	00-7F	linux/ppp-ioctl.h
-'t'	80-8F	linux/isdn_ppp.h
-'t'	90-91	linux/toshiba.h		toshiba and toshiba_acpi SMM
-'u'	00-1F	linux/smb_fs.h		gone
-'u'	20-3F	linux/uvcvideo.h	USB video class host driver
-'u'	40-4f	linux/udmabuf.h		userspace dma-buf misc device
-'v'	00-1F	linux/ext2_fs.h		conflict!
-'v'	00-1F	linux/fs.h		conflict!
-'v'	00-0F	linux/sonypi.h		conflict!
-'v'	00-0F	media/v4l2-subdev.h	conflict!
-'v'	C0-FF	linux/meye.h		conflict!
-'w'	all				CERN SCI driver
-'y'	00-1F				packet based user level communications
-					<mailto:zapman@interlan.net>
-'z'	00-3F				CAN bus card	conflict!
-					<mailto:hdstich@connectu.ulm.circular.de>
-'z'	40-7F				CAN bus card	conflict!
-					<mailto:oe@port.de>
-'z'	10-4F	drivers/s390/crypto/zcrypt_api.h	conflict!
-'|'	00-7F	linux/media.h
-0x80	00-1F	linux/fb.h
-0x89	00-06	arch/x86/include/asm/sockios.h
-0x89	0B-DF	linux/sockios.h
-0x89	E0-EF	linux/sockios.h		SIOCPROTOPRIVATE range
-0x89	E0-EF	linux/dn.h		PROTOPRIVATE range
-0x89	F0-FF	linux/sockios.h		SIOCDEVPRIVATE range
-0x8B	all	linux/wireless.h
-0x8C	00-3F				WiNRADiO driver
-					<http://www.winradio.com.au/>
-0x90	00	drivers/cdrom/sbpcd.h
-0x92	00-0F	drivers/usb/mon/mon_bin.c
-0x93	60-7F	linux/auto_fs.h
-0x94	all	fs/btrfs/ioctl.h	Btrfs filesystem
-		and linux/fs.h		some lifted to vfs/generic
-0x97	00-7F	fs/ceph/ioctl.h		Ceph file system
-0x99	00-0F				537-Addinboard driver
-					<mailto:buk@buks.ipn.de>
-0xA0	all	linux/sdp/sdp.h		Industrial Device Project
-					<mailto:kenji@bitgate.com>
-0xA1	0	linux/vtpm_proxy.h	TPM Emulator Proxy Driver
-0xA3	80-8F	Port ACL		in development:
-					<mailto:tlewis@mindspring.com>
-0xA3	90-9F	linux/dtlk.h
-0xA4	00-1F	uapi/linux/tee.h	Generic TEE subsystem
-0xAA	00-3F	linux/uapi/linux/userfaultfd.h
-0xAB	00-1F	linux/nbd.h
-0xAC	00-1F	linux/raw.h
-0xAD	00	Netfilter device	in development:
-					<mailto:rusty@rustcorp.com.au>
-0xAE	all	linux/kvm.h		Kernel-based Virtual Machine
-					<mailto:kvm@vger.kernel.org>
-0xAF	00-1F	linux/fsl_hypervisor.h	Freescale hypervisor
-0xB0	all	RATIO devices		in development:
-					<mailto:vgo@ratio.de>
-0xB1	00-1F	PPPoX			<mailto:mostrows@styx.uwaterloo.ca>
-0xB3	00	linux/mmc/ioctl.h
-0xB4	00-0F	linux/gpio.h		<mailto:linux-gpio@vger.kernel.org>
-0xB5	00-0F	uapi/linux/rpmsg.h	<mailto:linux-remoteproc@vger.kernel.org>
-0xB6	all	linux/fpga-dfl.h
-0xC0	00-0F	linux/usb/iowarrior.h
-0xCA	00-0F	uapi/misc/cxl.h
-0xCA	10-2F	uapi/misc/ocxl.h
-0xCA	80-BF	uapi/scsi/cxlflash_ioctl.h
-0xCB	00-1F	CBM serial IEC bus	in development:
-					<mailto:michael.klein@puffin.lb.shuttle.de>
-0xCC	00-0F	drivers/misc/ibmvmc.h    pseries VMC driver
-0xCD	01	linux/reiserfs_fs.h
-0xCF	02	fs/cifs/ioctl.c
-0xDB	00-0F	drivers/char/mwave/mwavepub.h
-0xDD	00-3F	ZFCP device driver	see drivers/s390/scsi/
-					<mailto:aherrman@de.ibm.com>
-0xE5	00-3F	linux/fuse.h
-0xEC	00-01	drivers/platform/chrome/cros_ec_dev.h	ChromeOS EC driver
-0xF3	00-3F	drivers/usb/misc/sisusbvga/sisusb.h	sisfb (in development)
-					<mailto:thomas@winischhofer.net>
-0xF4	00-1F	video/mbxfb.h		mbxfb
-					<mailto:raph@8d.com>
-0xF6	all	LTTng			Linux Trace Toolkit Next Generation
-					<mailto:mathieu.desnoyers@efficios.com>
-0xFD	all	linux/dm-ioctl.h
-0xFE	all	linux/isst_if.h
diff --git a/Documentation/process/submit-checklist.rst b/Documentation/process/submit-checklist.rst
index 365efc9e4aa8..8e56337d422d 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
     and why.
 
 26) If any ioctl's are added by the patch, then also update
-    ``Documentation/ioctl/ioctl-number.txt``.
+    ``Documentation/ioctl/ioctl-number.rst``.
 
 27) If your modified source code depends on or uses any of the kernel
     APIs or features that are related to the following ``Kconfig`` symbols,
diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst b/Documentation/translations/it_IT/process/submit-checklist.rst
index ea74cae958d7..995ee69fab11 100644
--- a/Documentation/translations/it_IT/process/submit-checklist.rst
+++ b/Documentation/translations/it_IT/process/submit-checklist.rst
@@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
     sorgenti che ne spieghi la logica: cosa fanno e perché.
 
 25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
-    ``Documentation/ioctl/ioctl-number.txt``.
+    ``Documentation/ioctl/ioctl-number.rst``.
 
 26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o
     funzionalità del kernel che è associata a uno dei seguenti simboli
diff --git a/Documentation/translations/zh_CN/process/submit-checklist.rst b/Documentation/translations/zh_CN/process/submit-checklist.rst
index f4785d2b0491..8738c55e42a2 100644
--- a/Documentation/translations/zh_CN/process/submit-checklist.rst
+++ b/Documentation/translations/zh_CN/process/submit-checklist.rst
@@ -97,7 +97,7 @@ Linux内核补丁提交清单
 24) 所有内存屏障例如 ``barrier()``, ``rmb()``, ``wmb()`` 都需要源代码中的注
     释来解释它们正在执行的操作及其原因的逻辑。
 
-25) 如果补丁添加了任何ioctl，那么也要更新 ``Documentation/ioctl/ioctl-number.txt``
+25) 如果补丁添加了任何ioctl，那么也要更新 ``Documentation/ioctl/ioctl-number.rst``
 
 26) 如果修改后的源代码依赖或使用与以下 ``Kconfig`` 符号相关的任何内核API或
     功能，则在禁用相关 ``Kconfig`` 符号和/或 ``=m`` （如果该选项可用）的情况
diff --git a/include/uapi/rdma/rdma_user_ioctl_cmds.h b/include/uapi/rdma/rdma_user_ioctl_cmds.h
index 26213f49f5c8..54e16a589472 100644
--- a/include/uapi/rdma/rdma_user_ioctl_cmds.h
+++ b/include/uapi/rdma/rdma_user_ioctl_cmds.h
@@ -36,7 +36,7 @@
 #include <linux/types.h>
 #include <linux/ioctl.h>
 
-/* Documentation/ioctl/ioctl-number.txt */
+/* Documentation/ioctl/ioctl-number.rst */
 #define RDMA_IOCTL_MAGIC	0x1b
 #define RDMA_VERBS_IOCTL \
 	_IOWR(RDMA_IOCTL_MAGIC, 1, struct ib_uverbs_ioctl_hdr)
-- 
cgit v1.2.3-55-g7522


From 5c04dceaa152d9dd9fe94dec6594965069e19e9e Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 11:38:26 -0300
Subject: docs: ioctl: convert to ReST

Rename the ioctl documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

The cdrom.txt and hdio.txt have their own particular syntax.
In order to speed up the conversion, I used a small ancillary
perl script:

	my $d;
	$d .= $_ while(<>);
	$d =~ s/(\nCDROM\S+)\s+(\w[^\n]*)/$1\n\t$2\n/g;
	$d =~ s/(\nHDIO\S+)\s+(\w[^\n]*)/$1\n\t$2\n/g;
	$d =~ s/(\n\s*usage:)[\s\n]*(\w[^\n]*)/$1:\n\n\t  $2\n/g;
	$d =~ s/(\n\s*)(E\w+[\s\n]*\w[^\n]*)/$1- $2/g;
	$d =~ s/(\n\s*)(inputs|outputs|notes):\s*(\w[^\n]*)/$1$2:\n\t\t$3\n/g;
	print $d;

It basically adds blank lines in a few interesting places. The
script is not perfect: several things still require manual work,
but it saved quite some time on the obvious parts.
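
As a rough illustration of what the first two substitutions do, here is
a Python re-expression of them. This is not the script that was actually
run: the function name split_ioctl_heading and the sample entry are made
up for this sketch. It moves the description that follows a CDROM* or
HDIO* name onto its own tab-indented line:

```python
import re


def split_ioctl_heading(text: str, prefix: str = "CDROM") -> str:
    """Hypothetical Python counterpart of the first two perl substitutions.

    Finds a name starting with `prefix` right after a newline (for example
    CDROMPAUSE) and moves the description that follows it onto its own
    tab-indented line, followed by a newline.
    """
    pattern = r"(\n" + re.escape(prefix) + r"\S+)\s+(\w[^\n]*)"
    # \1 is the ioctl name (with its leading newline), \2 the description.
    return re.sub(pattern, r"\1\n\t\2\n", text)


if __name__ == "__main__":
    sample = "\nCDROMPAUSE\t\tPause Audio Operation\n"  # invented sample entry
    print(repr(split_ioctl_heading(sample)))
```

Passing prefix="HDIO" mirrors the second substitution. Like the perl
original, the pattern only matches names that directly follow a newline,
so an entry on the very first line of the input is left alone.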

In the new index.rst, add :orphan: for as long as it is not linked from
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/ioctl/botching-up-ioctls.rst |  225 +++++
 Documentation/ioctl/botching-up-ioctls.txt |  224 -----
 Documentation/ioctl/cdrom.rst              | 1233 +++++++++++++++++++++++++
 Documentation/ioctl/cdrom.txt              |  967 --------------------
 Documentation/ioctl/hdio.rst               | 1342 ++++++++++++++++++++++++++++
 Documentation/ioctl/hdio.txt               | 1071 ----------------------
 Documentation/ioctl/index.rst              |   16 +
 Documentation/ioctl/ioctl-decoding.rst     |   31 +
 Documentation/ioctl/ioctl-decoding.txt     |   24 -
 drivers/gpu/drm/drm_ioctl.c                |    2 +-
 10 files changed, 2848 insertions(+), 2287 deletions(-)
 create mode 100644 Documentation/ioctl/botching-up-ioctls.rst
 delete mode 100644 Documentation/ioctl/botching-up-ioctls.txt
 create mode 100644 Documentation/ioctl/cdrom.rst
 delete mode 100644 Documentation/ioctl/cdrom.txt
 create mode 100644 Documentation/ioctl/hdio.rst
 delete mode 100644 Documentation/ioctl/hdio.txt
 create mode 100644 Documentation/ioctl/index.rst
 create mode 100644 Documentation/ioctl/ioctl-decoding.rst
 delete mode 100644 Documentation/ioctl/ioctl-decoding.txt

diff --git a/Documentation/ioctl/botching-up-ioctls.rst b/Documentation/ioctl/botching-up-ioctls.rst
new file mode 100644
index 000000000000..ac697fef3545
--- /dev/null
+++ b/Documentation/ioctl/botching-up-ioctls.rst
@@ -0,0 +1,225 @@
+=================================
+(How to avoid) Botching up ioctls
+=================================
+
+From: http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
+
+By: Daniel Vetter, Copyright © 2013 Intel Corporation
+
+One clear insight kernel graphics hackers gained in the past few years is that
+trying to come up with a unified interface to manage the execution units and
+memory on completely different GPUs is a futile effort. So nowadays every
+driver has its own set of ioctls to allocate memory and submit work to the GPU.
+Which is nice, since there's no more insanity in the form of fake-generic, but
+actually only-used-once interfaces. But the clear downside is that there's much
+more potential to screw things up.
+
+To avoid repeating all the same mistakes again I've written up some of the
+lessons learned while botching the job for the drm/i915 driver. Most of these
+only cover technicalities and not the big-picture issues like what the command
+submission ioctl exactly should look like. Learning these lessons is probably
+something every GPU driver has to do on its own.
+
+
+Prerequisites
+-------------
+
+First the prerequisites. Without these you have already failed, because you
+will need to add a 32-bit compat layer:
+
+ * Only use fixed-size integers. To avoid conflicts with typedefs in userspace
+   the kernel has special types like __u32, __s64. Use them.
+
+ * Align everything to the natural size and use explicit padding. 32-bit
+   platforms don't necessarily align 64-bit values to 64-bit boundaries, but
+   64-bit platforms do. So we always need padding to the natural size to get
+   this right.
+
+ * Pad the entire struct to a multiple of 64-bits if the structure contains
+   64-bit types - the structure size will otherwise differ on 32-bit versus
+   64-bit. Having a different structure size hurts when passing arrays of
+   structures to the kernel, or if the kernel checks the structure size, which
+   e.g. the drm core does.
+
+ * Pointers are __u64, cast from/to a uintptr_t on the userspace side and
+   from/to a void __user * in the kernel. Try really hard not to delay this
+   conversion or, worse, fiddle the raw __u64 through your code since that
+   diminishes the checking tools like sparse can provide. The macro
+   u64_to_user_ptr can be used in the kernel to avoid warnings about integers
+   and pointers of different sizes.
+
+
+Basics
+------
+
+With the joys of writing a compat layer avoided we can take a look at the basic
+fumbles. Neglecting these will make backward and forward compatibility a real
+pain. And since getting things wrong on the first attempt is guaranteed, you
+will have a second iteration or at least an extension for any given interface.
+
+ * Have a clear way for userspace to figure out whether your new ioctl or ioctl
+   extension is supported on a given kernel. If you can't rely on old kernels
+   rejecting the new flags/modes or ioctls (since doing that was botched in the
+   past) then you need a driver feature flag or revision number somewhere.
+
+ * Have a plan for extending ioctls with new flags or new fields at the end of
+   the structure. The drm core checks the passed-in size for each ioctl call
+   and zero-extends any mismatches between kernel and userspace. That helps,
+   but isn't a complete solution since newer userspace on older kernels won't
+   notice that the newly added fields at the end get ignored. So this still
+   needs a new driver feature flag.
+
+ * Check all unused fields and flags and all the padding for whether it's 0,
+   and reject the ioctl if that's not the case. Otherwise your nice plan for
+   future extensions is going right down the gutters since someone will submit
+   an ioctl struct with random stack garbage in the yet unused parts. Which
+   then bakes in the ABI that those fields can never be used for anything else
+   but garbage. This is also the reason why you must explicitly pad all
+   structures, even if you never use them in an array - the padding the compiler
+   might insert could contain garbage.
+
+ * Have simple testcases for all of the above.
+
+
+Fun with Error Paths
+--------------------
+
+Nowadays we have no excuse left for drm drivers being neat
+little root exploits. This means we need both full input validation and solid
+error handling paths - GPUs will die eventually in the oddest corner cases
+anyway:
+
+ * The ioctl must check for array overflows. Also it needs to check for
+   over/underflows and clamping issues of integer values in general. The usual
+   example is sprite positioning values fed directly into the hardware with the
+   hardware just having 12 bits or so. Works nicely until some odd display
+   server doesn't bother with clamping itself and the cursor wraps around the
+   screen.
+
+ * Have simple testcases for every input validation failure case in your ioctl.
+   Check that the error code matches your expectations. And finally make sure
+   that you only test for one single error path in each subtest by submitting
+   otherwise perfectly valid data. Without this an earlier check might reject
+   the ioctl already and shadow the codepath you actually want to test, hiding
+   bugs and regressions.
+
+ * Make all your ioctls restartable. First, X really loves signals, and second,
+   this will allow you to test 90% of all error handling paths by just
+   interrupting your main test suite constantly with signals. Thanks to X's
+   love for signals you'll get an excellent base coverage of all your error
+   paths pretty much for free for graphics drivers. Also, be consistent with
+   how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
+   userspace library. The i915 driver botched this with the set_tiling ioctl,
+   now we're stuck forever with some arcane semantics in both the kernel and
+   userspace.
+
+ * If you can't make a given codepath restartable, make a stuck task at least
+   killable. GPUs just die and your users won't like you more if you hang their
+   entire box (by means of an unkillable X process). If the state recovery is
+   still too tricky have a timeout or hangcheck safety net as a last-ditch
+   effort in case the hardware has gone bananas.
+
+ * Have testcases for the really tricky corner cases in your error recovery code
+   - it's way too easy to create a deadlock between your hangcheck code and
+   waiters.
+
+
+Time, Waiting and Missing it
+----------------------------
+
+GPUs do most everything asynchronously, so we have a need to time operations and
+wait for outstanding ones. This is really tricky business; at the moment none of
+the ioctls supported by the drm/i915 driver get this fully right, which means
+there are still tons more lessons to learn here.
+
+ * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
+   v4l use by default nowadays. But let userspace know which timestamps are
+   derived from different clock domains like your main system clock (provided
+   by the kernel) or some independent hardware counter somewhere else. Clocks
+   will mismatch if you look close enough, but if performance measuring tools
+   have this information they can at least compensate. If your userspace can
+   get at the raw values of some clocks (e.g. through in-command-stream
+   performance counter sampling instructions) consider exposing those also.
+
+ * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
+   convenient time specification, but it's mostly the standard.
+
+ * Check that input time values are normalized and reject them if not. Note
+   that the kernel native struct ktime has a signed integer for both seconds
+   and nanoseconds, so beware here.
+
+ * For timeouts, use absolute times. If you're a good fellow and made your
+   ioctl restartable, relative timeouts tend to be too coarse and can
+   indefinitely extend your wait time due to rounding on each restart.
+   Especially if your reference clock is something really slow like the display
+   frame counter. With a spec lawyer hat on this isn't a bug since timeouts can
+   always be extended - but users will surely hate you if their neat animations
+   start to stutter due to this.
+
+ * Consider ditching any synchronous wait ioctls with timeouts and just deliver
+   an asynchronous event on a pollable file descriptor. It fits much better
+   into event driven applications' main loop.
+
+ * Have testcases for corner-cases, especially whether the return values for
+   already-completed events, successful waits and timed-out waits are all sane
+   and suited to your needs.
+
+
+Leaking Resources, Not
+----------------------
+
+A full-blown drm driver essentially implements a little OS, but specialized to
+the given GPU platforms. This means a driver needs to expose tons of handles
+for different objects and other resources to userspace. Doing that right
+entails its own little set of pitfalls:
+
+ * Always attach the lifetime of your dynamically created resources to the
+   lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
+   needs to be shared across processes - fd-passing over unix domain sockets
+   also simplifies lifetime management for userspace.
+
+ * Always have O_CLOEXEC support.
+
+ * Ensure that you have sufficient insulation between different clients. By
+   default pick a private per-fd namespace which forces any sharing to be done
+   explicitly. Only go with a more global per-device namespace if the objects
+   are truly device-unique. One counterexample in the drm modeset interfaces is
+   that the per-device modeset objects like connectors share a namespace with
+   framebuffer objects, which mostly are not shared at all. A separate
+   namespace, private by default, for framebuffers would have been more
+   suitable.
+
+ * Think about uniqueness requirements for userspace handles. E.g. for most drm
+   drivers it's a userspace bug to submit the same object twice in the same
+   command submission ioctl. But then if objects are shareable userspace needs
+   to know whether it has seen an imported object from a different process
+   already or not. I haven't tried this myself yet due to lack of a new class
+   of objects, but consider using inode numbers on your shared file descriptors
+   as unique identifiers - it's how real files are told apart, too.
+   Unfortunately this requires a full-blown virtual filesystem in the kernel.
+
+
+Last, but not Least
+-------------------
+
+Not every problem needs a new ioctl:
+
+ * Think hard whether you really want a driver-private interface. Of course
+   it's much quicker to push a driver-private interface than engaging in
+   lengthy discussions for a more generic solution. And occasionally doing a
+   private interface to spearhead a new concept is what's required. But in the
+   end, once the generic interface comes around you'll end up maintaining two
+   interfaces. Indefinitely.
+
+ * Consider other interfaces than ioctls. A sysfs attribute is much better for
+   per-device settings, or for child objects with fairly static lifetimes (like
+   output connectors in drm with all the detection override attributes). Or
+   maybe only your testsuite needs this interface, and then debugfs with its
+   disclaimer of not having a stable ABI would be better.
+
+Finally, the name of the game is to get it right on the first attempt, since if
+your driver proves popular and your hardware platforms long-lived then you'll
+be stuck with a given ioctl essentially forever. You can try to deprecate
+horrible ioctls on newer iterations of your hardware, but generally it takes
+years to accomplish this. And then again years until the last user able to
+complain about regressions disappears, too.
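
The time-handling advice in the document above (signed 64-bit seconds plus unsigned 64-bit nanoseconds, absolute deadlines, rejecting non-normalized input) can be sketched as follows. This is only an illustration and not part of the patch: the struct, the macro and both function names are hypothetical, plain <stdint.h> types stand in for __s64/__u64 so that the snippet builds in userspace, and rejecting negative seconds is an extra assumption made here for an absolute CLOCK_MONOTONIC deadline.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/* Hypothetical ioctl argument: an absolute CLOCK_MONOTONIC deadline. */
struct example_wait_args {
	int64_t  deadline_sec;   /* would be __s64 in a real uapi header */
	uint64_t deadline_nsec;  /* would be __u64 in a real uapi header */
};

/* Normalized means the nanoseconds part is strictly below one second. */
static bool example_time_is_normalized(const struct example_wait_args *args)
{
	return args->deadline_nsec < EXAMPLE_NSEC_PER_SEC;
}

/*
 * Returns 0 if the deadline is acceptable and -1 if it must be rejected;
 * a real ioctl handler would typically return -EINVAL instead of -1.
 * Rejecting negative seconds is an assumption of this sketch.
 */
static int example_check_deadline(const struct example_wait_args *args)
{
	if (args->deadline_sec < 0 || !example_time_is_normalized(args))
		return -1;
	return 0;
}
```

Because the deadline is absolute, a restarted ioctl can be handed the same arguments again without the wait being stretched by per-restart rounding.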
diff --git a/Documentation/ioctl/botching-up-ioctls.txt b/Documentation/ioctl/botching-up-ioctls.txt
deleted file mode 100644
index 883fb034bd04..000000000000
--- a/Documentation/ioctl/botching-up-ioctls.txt
+++ /dev/null
@@ -1,224 +0,0 @@
-(How to avoid) Botching up ioctls
-=================================
-
-From: http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
-
-By: Daniel Vetter, Copyright © 2013 Intel Corporation
-
-One clear insight kernel graphics hackers gained in the past few years is that
-trying to come up with a unified interface to manage the execution units and
-memory on completely different GPUs is a futile effort. So nowadays every
-driver has its own set of ioctls to allocate memory and submit work to the GPU.
-Which is nice, since there's no more insanity in the form of fake-generic, but
-actually only used once interfaces. But the clear downside is that there's much
-more potential to screw things up.
-
-To avoid repeating all the same mistakes again I've written up some of the
-lessons learned while botching the job for the drm/i915 driver. Most of these
-only cover technicalities and not the big-picture issues like what the command
-submission ioctl exactly should look like. Learning these lessons is probably
-something every GPU driver has to do on its own.
-
-
-Prerequisites
--------------
-
-First the prerequisites. Without these you have already failed, because you
-will need to add a 32-bit compat layer:
-
- * Only use fixed sized integers. To avoid conflicts with typedefs in userspace
-   the kernel has special types like __u32, __s64. Use them.
-
- * Align everything to the natural size and use explicit padding. 32-bit
-   platforms don't necessarily align 64-bit values to 64-bit boundaries, but
-   64-bit platforms do. So we always need padding to the natural size to get
-   this right.
-
- * Pad the entire struct to a multiple of 64-bits if the structure contains
-   64-bit types - the structure size will otherwise differ on 32-bit versus
-   64-bit. Having a different structure size hurts when passing arrays of
-   structures to the kernel, or if the kernel checks the structure size, which
-   e.g. the drm core does.
-
- * Pointers are __u64, cast from/to a uintprt_t on the userspace side and
-   from/to a void __user * in the kernel. Try really hard not to delay this
-   conversion or worse, fiddle the raw __u64 through your code since that
-   diminishes the checking tools like sparse can provide. The macro
-   u64_to_user_ptr can be used in the kernel to avoid warnings about integers
-   and pointres of different sizes.
-
-
-Basics
-------
-
-With the joys of writing a compat layer avoided we can take a look at the basic
-fumbles. Neglecting these will make backward and forward compatibility a real
-pain. And since getting things wrong on the first attempt is guaranteed you
-will have a second iteration or at least an extension for any given interface.
-
- * Have a clear way for userspace to figure out whether your new ioctl or ioctl
-   extension is supported on a given kernel. If you can't rely on old kernels
-   rejecting the new flags/modes or ioctls (since doing that was botched in the
-   past) then you need a driver feature flag or revision number somewhere.
-
- * Have a plan for extending ioctls with new flags or new fields at the end of
-   the structure. The drm core checks the passed-in size for each ioctl call
-   and zero-extends any mismatches between kernel and userspace. That helps,
-   but isn't a complete solution since newer userspace on older kernels won't
-   notice that the newly added fields at the end get ignored. So this still
-   needs a new driver feature flags.
-
- * Check all unused fields and flags and all the padding for whether it's 0,
-   and reject the ioctl if that's not the case. Otherwise your nice plan for
-   future extensions is going right down the gutters since someone will submit
-   an ioctl struct with random stack garbage in the yet unused parts. Which
-   then bakes in the ABI that those fields can never be used for anything else
-   but garbage. This is also the reason why you must explicitly pad all
-   structures, even if you never use them in an array - the padding the compiler
-   might insert could contain garbage.
-
- * Have simple testcases for all of the above.
-
-
-Fun with Error Paths
---------------------
-
-Nowadays we don't have any excuse left any more for drm drivers being neat
-little root exploits. This means we both need full input validation and solid
-error handling paths - GPUs will die eventually in the oddmost corner cases
-anyway:
-
- * The ioctl must check for array overflows. Also it needs to check for
-   over/underflows and clamping issues of integer values in general. The usual
-   example is sprite positioning values fed directly into the hardware with the
-   hardware just having 12 bits or so. Works nicely until some odd display
-   server doesn't bother with clamping itself and the cursor wraps around the
-   screen.
-
- * Have simple testcases for every input validation failure case in your ioctl.
-   Check that the error code matches your expectations. And finally make sure
-   that you only test for one single error path in each subtest by submitting
-   otherwise perfectly valid data. Without this an earlier check might reject
-   the ioctl already and shadow the codepath you actually want to test, hiding
-   bugs and regressions.
-
- * Make all your ioctls restartable. First X really loves signals and second
-   this will allow you to test 90% of all error handling paths by just
-   interrupting your main test suite constantly with signals. Thanks to X's
-   love for signal you'll get an excellent base coverage of all your error
-   paths pretty much for free for graphics drivers. Also, be consistent with
-   how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
-   userspace library. The i915 driver botched this with the set_tiling ioctl,
-   now we're stuck forever with some arcane semantics in both the kernel and
-   userspace.
-
- * If you can't make a given codepath restartable make a stuck task at least
-   killable. GPUs just die and your users won't like you more if you hang their
-   entire box (by means of an unkillable X process). If the state recovery is
-   still too tricky have a timeout or hangcheck safety net as a last-ditch
-   effort in case the hardware has gone bananas.
-
- * Have testcases for the really tricky corner cases in your error recovery code
-   - it's way too easy to create a deadlock between your hangcheck code and
-   waiters.
-
-
-Time, Waiting and Missing it
-----------------------------
-
-GPUs do most everything asynchronously, so we have a need to time operations and
-wait for outstanding ones. This is really tricky business; at the moment none of
-the ioctls supported by the drm/i915 get this fully right, which means there's
-still tons more lessons to learn here.
-
- * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
-   v4l use by default nowadays. But let userspace know which timestamps are
-   derived from different clock domains like your main system clock (provided
-   by the kernel) or some independent hardware counter somewhere else. Clocks
-   will mismatch if you look close enough, but if performance measuring tools
-   have this information they can at least compensate. If your userspace can
-   get at the raw values of some clocks (e.g. through in-command-stream
-   performance counter sampling instructions) consider exposing those also.
-
- * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
-   convenient time specification, but it's mostly the standard.
-
- * Check that input time values are normalized and reject them if not. Note
-   that the kernel native struct ktime has a signed integer for both seconds
-   and nanoseconds, so beware here.
-
- * For timeouts, use absolute times. If you're a good fellow and made your
-   ioctl restartable relative timeouts tend to be too coarse and can
-   indefinitely extend your wait time due to rounding on each restart.
-   Especially if your reference clock is something really slow like the display
-   frame counter. With a spec lawyer hat on this isn't a bug since timeouts can
-   always be extended - but users will surely hate you if their neat animations
-   starts to stutter due to this.
-
- * Consider ditching any synchronous wait ioctls with timeouts and just deliver
-   an asynchronous event on a pollable file descriptor. It fits much better
-   into event driven applications' main loop.
-
- * Have testcases for corner-cases, especially whether the return values for
-   already-completed events, successful waits and timed-out waits are all sane
-   and suiting to your needs.
-
-
-Leaking Resources, Not
-----------------------
-
-A full-blown drm driver essentially implements a little OS, but specialized to
-the given GPU platforms. This means a driver needs to expose tons of handles
-for different objects and other resources to userspace. Doing that right
-entails its own little set of pitfalls:
-
- * Always attach the lifetime of your dynamically created resources to the
-   lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
-   needs to be shared across processes -  fd-passing over unix domain sockets
-   also simplifies lifetime management for userspace.
-
- * Always have O_CLOEXEC support.
-
- * Ensure that you have sufficient insulation between different clients. By
-   default pick a private per-fd namespace which forces any sharing to be done
-   explicitly. Only go with a more global per-device namespace if the objects
-   are truly device-unique. One counterexample in the drm modeset interfaces is
-   that the per-device modeset objects like connectors share a namespace with
-   framebuffer objects, which mostly are not shared at all. A separate
-   namespace, private by default, for framebuffers would have been more
-   suitable.
-
- * Think about uniqueness requirements for userspace handles. E.g. for most drm
-   drivers it's a userspace bug to submit the same object twice in the same
-   command submission ioctl. But then if objects are shareable userspace needs
-   to know whether it has seen an imported object from a different process
-   already or not. I haven't tried this myself yet due to lack of a new class
-   of objects, but consider using inode numbers on your shared file descriptors
-   as unique identifiers - it's how real files are told apart, too.
-   Unfortunately this requires a full-blown virtual filesystem in the kernel.
-
-
-Last, but not Least
--------------------
-
-Not every problem needs a new ioctl:
-
- * Think hard whether you really want a driver-private interface. Of course
-   it's much quicker to push a driver-private interface than engaging in
-   lengthy discussions for a more generic solution. And occasionally doing a
-   private interface to spearhead a new concept is what's required. But in the
-   end, once the generic interface comes around you'll end up maintainer two
-   interfaces. Indefinitely.
-
- * Consider other interfaces than ioctls. A sysfs attribute is much better for
-   per-device settings, or for child objects with fairly static lifetimes (like
-   output connectors in drm with all the detection override attributes). Or
-   maybe only your testsuite needs this interface, and then debugfs with its
-   disclaimer of not having a stable ABI would be better.
-
-Finally, the name of the game is to get it right on the first attempt, since if
-your driver proves popular and your hardware platforms long-lived then you'll
-be stuck with a given ioctl essentially forever. You can try to deprecate
-horrible ioctls on newer iterations of your hardware, but generally it takes
-years to accomplish this. And then again years until the last user able to
-complain about regressions disappears, too.
diff --git a/Documentation/ioctl/cdrom.rst b/Documentation/ioctl/cdrom.rst
new file mode 100644
index 000000000000..3b4c0506de46
--- /dev/null
+++ b/Documentation/ioctl/cdrom.rst
@@ -0,0 +1,1233 @@
+============================
+Summary of CDROM ioctl calls
+============================
+
+- Edward A. Falk <efalk@google.com>
+
+November, 2004
+
+This document attempts to describe the ioctl(2) calls supported by
+the CDROM layer.  These are by-and-large implemented (as of Linux 2.6)
+in drivers/cdrom/cdrom.c and drivers/block/scsi_ioctl.c.
+
+ioctl values are listed in <linux/cdrom.h>.  As of this writing, they
+are as follows:
+
+	======================	===============================================
+	CDROMPAUSE		Pause Audio Operation
+	CDROMRESUME		Resume paused Audio Operation
+	CDROMPLAYMSF		Play Audio MSF (struct cdrom_msf)
+	CDROMPLAYTRKIND		Play Audio Track/index (struct cdrom_ti)
+	CDROMREADTOCHDR		Read TOC header (struct cdrom_tochdr)
+	CDROMREADTOCENTRY	Read TOC entry (struct cdrom_tocentry)
+	CDROMSTOP		Stop the cdrom drive
+	CDROMSTART		Start the cdrom drive
+	CDROMEJECT		Ejects the cdrom media
+	CDROMVOLCTRL		Control output volume (struct cdrom_volctrl)
+	CDROMSUBCHNL		Read subchannel data (struct cdrom_subchnl)
+	CDROMREADMODE2		Read CDROM mode 2 data (2336 Bytes)
+				(struct cdrom_read)
+	CDROMREADMODE1		Read CDROM mode 1 data (2048 Bytes)
+				(struct cdrom_read)
+	CDROMREADAUDIO		(struct cdrom_read_audio)
+	CDROMEJECT_SW		enable(1)/disable(0) auto-ejecting
+	CDROMMULTISESSION	Obtain the start-of-last-session
+				address of multi session disks
+				(struct cdrom_multisession)
+	CDROM_GET_MCN		Obtain the "Universal Product Code"
+				if available (struct cdrom_mcn)
+	CDROM_GET_UPC		Deprecated, use CDROM_GET_MCN instead.
+	CDROMRESET		hard-reset the drive
+	CDROMVOLREAD		Get the drive's volume setting
+				(struct cdrom_volctrl)
+	CDROMREADRAW		read data in raw mode (2352 Bytes)
+				(struct cdrom_read)
+	CDROMREADCOOKED		read data in cooked mode
+	CDROMSEEK		seek msf address
+	CDROMPLAYBLK		scsi-cd only, (struct cdrom_blk)
+	CDROMREADALL		read all 2646 bytes
+	CDROMGETSPINDOWN	return 4-bit spindown value
+	CDROMSETSPINDOWN	set 4-bit spindown value
+	CDROMCLOSETRAY		Close the tray (counterpart of CDROMEJECT)
+	CDROM_SET_OPTIONS	Set behavior options
+	CDROM_CLEAR_OPTIONS	Clear behavior options
+	CDROM_SELECT_SPEED	Set the CD-ROM speed
+	CDROM_SELECT_DISC	Select disc (for juke-boxes)
+	CDROM_MEDIA_CHANGED	Check if media changed
+	CDROM_DRIVE_STATUS	Get tray position, etc.
+	CDROM_DISC_STATUS	Get disc type, etc.
+	CDROM_CHANGER_NSLOTS	Get number of slots
+	CDROM_LOCKDOOR		lock or unlock door
+	CDROM_DEBUG		Turn debug messages on/off
+	CDROM_GET_CAPABILITY	get capabilities
+	CDROMAUDIOBUFSIZ	set the audio buffer size
+	DVD_READ_STRUCT		Read structure
+	DVD_WRITE_STRUCT	Write structure
+	DVD_AUTH		Authentication
+	CDROM_SEND_PACKET	send a packet to the drive
+	CDROM_NEXT_WRITABLE	get next writable block
+	CDROM_LAST_WRITTEN	get last block written on disc
+	======================	===============================================
+
+
+The information that follows was determined from reading kernel source
+code.  It is likely that some corrections will be made over time.
+
+------------------------------------------------------------------------------
+
+General:
+
+	Unless otherwise specified, all ioctl calls return 0 on success
+	and -1 with errno set to an appropriate value on error.  (Some
+	ioctls return non-negative data values.)
+
+	Unless otherwise specified, all ioctl calls return -1 and set
+	errno to EFAULT on a failed attempt to copy data to or from user
+	address space.
+
+	Individual drivers may return error codes not listed here.
+
+	Unless otherwise specified, all data structures and constants
+	are defined in <linux/cdrom.h>
+
+------------------------------------------------------------------------------
+
+
+CDROMPAUSE
+	Pause Audio Operation
+
+
+	usage::
+
+	  ioctl(fd, CDROMPAUSE, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+
+CDROMRESUME
+	Resume paused Audio Operation
+
+
+	usage::
+
+	  ioctl(fd, CDROMRESUME, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+
+CDROMPLAYMSF
+	Play Audio MSF
+
+	(struct cdrom_msf)
+
+
+	usage::
+
+	  struct cdrom_msf msf;
+
+	  ioctl(fd, CDROMPLAYMSF, &msf);
+
+	inputs:
+		cdrom_msf structure, describing a segment of music to play
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+	notes:
+		- MSF stands for minutes-seconds-frames
+		- LBA stands for logical block address
+		- Segment is described as start and end times, where each time
+		  is described as minutes:seconds:frames.
+		  A frame is 1/75 of a second.
+
+
+CDROMPLAYTRKIND
+	Play Audio Track/index
+
+	(struct cdrom_ti)
+
+
+	usage::
+
+	  struct cdrom_ti ti;
+
+	  ioctl(fd, CDROMPLAYTRKIND, &ti);
+
+	inputs:
+		cdrom_ti structure, describing a segment of music to play
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+	notes:
+		- Segment is described as start and end times, where each time
+		  is described as a track and an index.
+
+
+
+CDROMREADTOCHDR
+	Read TOC header
+
+	(struct cdrom_tochdr)
+
+
+	usage::
+
+	  struct cdrom_tochdr header;
+
+	  ioctl(fd, CDROMREADTOCHDR, &header);
+
+	inputs:
+		cdrom_tochdr structure
+
+
+	outputs:
+		cdrom_tochdr structure
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+
+
+CDROMREADTOCENTRY
+	Read TOC entry
+
+	(struct cdrom_tocentry)
+
+
+	usage::
+
+	  struct cdrom_tocentry entry;
+
+	  ioctl(fd, CDROMREADTOCENTRY, &entry);
+
+	inputs:
+		cdrom_tocentry structure
+
+
+	outputs:
+		cdrom_tocentry structure
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+	  - EINVAL	entry.cdte_format not CDROM_MSF or CDROM_LBA
+	  - EINVAL	requested track out of bounds
+	  - EIO		I/O error reading TOC
+
+	notes:
+		- TOC stands for Table Of Contents
+		- MSF stands for minutes-seconds-frames
+		- LBA stands for logical block address
+
+
+
+CDROMSTOP
+	Stop the cdrom drive
+
+
+	usage::
+
+	  ioctl(fd, CDROMSTOP, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+	notes:
+	  - Exact interpretation of this ioctl depends on the device,
+	    but most seem to spin the drive down.
+
+
+CDROMSTART
+	Start the cdrom drive
+
+
+	usage::
+
+	  ioctl(fd, CDROMSTART, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+	notes:
+	  - Exact interpretation of this ioctl depends on the device,
+	    but most seem to spin the drive up and/or close the tray.
+	    Other devices ignore the ioctl completely.
+
+
+CDROMEJECT
+	Ejects the cdrom media
+
+
+	usage::
+
+	  ioctl(fd, CDROMEJECT, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		none
+
+
+	error returns:
+	  - ENOSYS	cd drive not capable of ejecting
+	  - EBUSY	other processes are accessing drive, or door is locked
+
+	notes:
+		- See CDROM_LOCKDOOR, below.
+
+
+
+
+CDROMCLOSETRAY
+	Close the tray (counterpart of CDROMEJECT)
+
+
+	usage::
+
+	  ioctl(fd, CDROMCLOSETRAY, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		none
+
+
+	error returns:
+	  - ENOSYS	cd drive not capable of closing the tray
+	  - EBUSY	other processes are accessing drive, or door is locked
+
+	notes:
+		- See CDROM_LOCKDOOR, below.
+
+
+
+
+CDROMVOLCTRL
+	Control output volume (struct cdrom_volctrl)
+
+
+	usage::
+
+	  struct cdrom_volctrl volume;
+
+	  ioctl(fd, CDROMVOLCTRL, &volume);
+
+	inputs:
+		cdrom_volctrl structure containing volumes for up to 4
+		channels.
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+
+
+CDROMVOLREAD
+	Get the drive's volume setting
+
+	(struct cdrom_volctrl)
+
+
+	usage::
+
+	  struct cdrom_volctrl volume;
+
+	  ioctl(fd, CDROMVOLREAD, &volume);
+
+	inputs:
+		none
+
+
+	outputs:
+		The current volume settings.
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+
+
+
+CDROMSUBCHNL
+	Read subchannel data
+
+	(struct cdrom_subchnl)
+
+
+	usage::
+
+	  struct cdrom_subchnl q;
+
+	  ioctl(fd, CDROMSUBCHNL, &q);
+
+	inputs:
+		cdrom_subchnl structure
+
+
+	outputs:
+		cdrom_subchnl structure
+
+
+	error return:
+	  - ENOSYS	cd drive not audio-capable.
+	  - EINVAL	format not CDROM_MSF or CDROM_LBA
+
+	notes:
+		- Format is converted to CDROM_MSF or CDROM_LBA
+		  as per user request on return
+
+
+
+CDROMREADRAW
+	read data in raw mode (2352 Bytes)
+
+	(struct cdrom_read)
+
+	usage::
+
+	  union {
+
+	    struct cdrom_msf msf;		/* input */
+	    char buffer[CD_FRAMESIZE_RAW];	/* return */
+	  } arg;
+	  ioctl(fd, CDROMREADRAW, &arg);
+
+	inputs:
+		cdrom_msf structure indicating an address to read.
+
+		Only the start values are significant.
+
+	outputs:
+		Data written to address provided by user.
+
+
+	error return:
+	  - EINVAL	address less than 0, or msf less than 0:2:0
+	  - ENOMEM	out of memory
+
+	notes:
+		- As of 2.6.8.1, comments in <linux/cdrom.h> indicate that this
+		  ioctl accepts a cdrom_read structure, but actual source code
+		  reads a cdrom_msf structure and writes a buffer of data to
+		  the same address.
+
+		- MSF values are converted to LBA values via this formula::
+
+		    lba = (((m * CD_SECS) + s) * CD_FRAMES + f) - CD_MSF_OFFSET;
+
+
+
+
+CDROMREADMODE1
+	Read CDROM mode 1 data (2048 Bytes)
+
+	(struct cdrom_read)
+
+	notes:
+		Identical to CDROMREADRAW except that block size is
+		CD_FRAMESIZE (2048) bytes
+
+
+
+CDROMREADMODE2
+	Read CDROM mode 2 data (2336 Bytes)
+
+	(struct cdrom_read)
+
+	notes:
+		Identical to CDROMREADRAW except that block size is
+		CD_FRAMESIZE_RAW0 (2336) bytes
+
+
+
+CDROMREADAUDIO
+	(struct cdrom_read_audio)
+
+	usage::
+
+	  struct cdrom_read_audio ra;
+
+	  ioctl(fd, CDROMREADAUDIO, &ra);
+
+	inputs:
+		cdrom_read_audio structure containing read start
+		point and length
+
+	outputs:
+		audio data, returned to buffer indicated by ra
+
+
+	error return:
+	  - EINVAL	format not CDROM_MSF or CDROM_LBA
+	  - EINVAL	nframes not in range [1, 75]
+	  - ENXIO	drive has no queue (probably means invalid fd)
+	  - ENOMEM	out of memory
+
+
+CDROMEJECT_SW
+	enable(1)/disable(0) auto-ejecting
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, CDROMEJECT_SW, val);
+
+	inputs:
+		Flag specifying auto-eject flag.
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	Drive is not capable of ejecting.
+	  - EBUSY	Door is locked
+
+
+
+
+CDROMMULTISESSION
+	Obtain the start-of-last-session address of multi session disks
+
+	(struct cdrom_multisession)
+
+	usage::
+
+	  struct cdrom_multisession ms_info;
+
+	  ioctl(fd, CDROMMULTISESSION, &ms_info);
+
+	inputs:
+		cdrom_multisession structure containing desired
+		format.
+
+
+	outputs:
+		cdrom_multisession structure is filled with last_session
+		information.
+
+	error return:
+	  - EINVAL	format not CDROM_MSF or CDROM_LBA
+
+
+CDROM_GET_MCN
+	Obtain the "Universal Product Code"
+	if available
+
+	(struct cdrom_mcn)
+
+
+	usage::
+
+	  struct cdrom_mcn mcn;
+
+	  ioctl(fd, CDROM_GET_MCN, &mcn);
+
+	inputs:
+		none
+
+
+	outputs:
+		Universal Product Code
+
+
+	error return:
+	  - ENOSYS	Drive is not capable of reading MCN data.
+
+	notes:
+		- Source code comments state::
+
+		    The following function is implemented, although very few
+		    audio discs give Universal Product Code information, which
+		    should just be the Medium Catalog Number on the box.  Note,
+		    that the way the code is written on the CD is /not/ uniform
+		    across all discs!
+
+
+
+
+CDROM_GET_UPC
+	CDROM_GET_MCN  (deprecated)
+
+
+	Not implemented, as of 2.6.8.1
+
+
+
+CDROMRESET
+	hard-reset the drive
+
+
+	usage::
+
+	  ioctl(fd, CDROMRESET, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - ENOSYS	Drive is not capable of resetting.
+
+
+
+
+CDROMREADCOOKED
+	read data in cooked mode
+
+
+	usage::
+
+	  u8 buffer[CD_FRAMESIZE];
+
+	  ioctl(fd, CDROMREADCOOKED, buffer);
+
+	inputs:
+		none
+
+
+	outputs:
+		2048 bytes of data, "cooked" mode.
+
+
+	notes:
+		Not implemented on all drives.
+
+
+
+
+
+CDROMREADALL
+	read all 2646 bytes
+
+
+	Same as CDROMREADCOOKED, but reads 2646 bytes.
+
+
+
+CDROMSEEK
+	seek msf address
+
+
+	usage::
+
+	  struct cdrom_msf msf;
+
+	  ioctl(fd, CDROMSEEK, &msf);
+
+	inputs:
+		MSF address to seek to.
+
+
+	outputs:
+		none
+
+
+
+
+CDROMPLAYBLK
+	scsi-cd only
+
+	(struct cdrom_blk)
+
+
+	usage::
+
+	  struct cdrom_blk blk;
+
+	  ioctl(fd, CDROMPLAYBLK, &blk);
+
+	inputs:
+		Region to play
+
+
+	outputs:
+		none
+
+
+
+
+CDROMGETSPINDOWN
+	usage::
+
+	  char spindown;
+
+	  ioctl(fd, CDROMGETSPINDOWN, &spindown);
+
+	inputs:
+		none
+
+
+	outputs:
+		The value of the current 4-bit spindown value.
+
+
+
+
+
+CDROMSETSPINDOWN
+	usage::
+
+	  char spindown;
+
+	  ioctl(fd, CDROMSETSPINDOWN, &spindown);
+
+	inputs:
+		4-bit value used to control spindown (TODO: more detail here)
+
+
+	outputs:
+		none
+
+
+
+
+
+
+CDROM_SET_OPTIONS
+	Set behavior options
+
+
+	usage::
+
+	  int options;
+
+	  ioctl(fd, CDROM_SET_OPTIONS, options);
+
+	inputs:
+		New values for drive options.  The logical 'or' of:
+
+	    ==============      ==================================
+	    CDO_AUTO_CLOSE	close tray on first open(2)
+	    CDO_AUTO_EJECT	open tray on last release
+	    CDO_USE_FFLAGS	use O_NONBLOCK information on open
+	    CDO_LOCK		lock tray on open files
+	    CDO_CHECK_TYPE	check type on open for data
+	    ==============      ==================================
+
+	outputs:
+		Returns the resulting options settings in the
+		ioctl return value.  Returns -1 on error.
+
+	error return:
+	  - ENOSYS	selected option(s) not supported by drive.
+
+
+
+
+CDROM_CLEAR_OPTIONS
+	Clear behavior options
+
+
+	Same as CDROM_SET_OPTIONS, except that selected options are
+	turned off.
+
+
+
+CDROM_SELECT_SPEED
+	Set the CD-ROM speed
+
+
+	usage::
+
+	  int speed;
+
+	  ioctl(fd, CDROM_SELECT_SPEED, speed);
+
+	inputs:
+		New drive speed.
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - ENOSYS	speed selection not supported by drive.
+
+
+
+CDROM_SELECT_DISC
+	Select disc (for juke-boxes)
+
+
+	usage::
+
+	  int disk;
+
+	  ioctl(fd, CDROM_SELECT_DISC, disk);
+
+	inputs:
+		Disk to load into drive.
+
+
+	outputs:
+		none
+
+
+	error return:
+	  - EINVAL	Disk number beyond capacity of drive
+
+
+
+CDROM_MEDIA_CHANGED
+	Check if media changed
+
+
+	usage::
+
+	  int slot;
+
+	  ioctl(fd, CDROM_MEDIA_CHANGED, slot);
+
+	inputs:
+		Slot number to be tested, always zero except for jukeboxes.
+
+		May also be special values CDSL_NONE or CDSL_CURRENT
+
+	outputs:
+		Ioctl return value is 0 or 1 depending on whether the media
+		has been changed, or -1 on error.
+
+
+	error returns:
+	  - ENOSYS	Drive can't detect media change
+	  - EINVAL	Slot number beyond capacity of drive
+	  - ENOMEM	Out of memory
+
+
+
+CDROM_DRIVE_STATUS
+	Get tray position, etc.
+
+
+	usage::
+
+	  int slot;
+
+	  ioctl(fd, CDROM_DRIVE_STATUS, slot);
+
+	inputs:
+		Slot number to be tested, always zero except for jukeboxes.
+
+		May also be special values CDSL_NONE or CDSL_CURRENT
+
+	outputs:
+		Ioctl return value will be one of the following values
+		from <linux/cdrom.h>:
+
+
+	    =================== ==========================
+	    CDS_NO_INFO		Information not available.
+	    CDS_NO_DISC
+	    CDS_TRAY_OPEN
+	    CDS_DRIVE_NOT_READY
+	    CDS_DISC_OK
+	    -1			error
+	    =================== ==========================
+
+	error returns:
+	  - ENOSYS	Drive can't detect drive status
+	  - EINVAL	Slot number beyond capacity of drive
+	  - ENOMEM	Out of memory
+
+
+
+
+CDROM_DISC_STATUS
+	Get disc type, etc.
+
+
+	usage::
+
+	  ioctl(fd, CDROM_DISC_STATUS, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		Ioctl return value will be one of the following values
+		from <linux/cdrom.h>:
+
+
+	    - CDS_NO_INFO
+	    - CDS_AUDIO
+	    - CDS_MIXED
+	    - CDS_XA_2_2
+	    - CDS_XA_2_1
+	    - CDS_DATA_1
+
+	error returns:
+		none at present
+
+	notes:
+	    - Source code comments state::
+
+
+		Ok, this is where problems start.  The current interface for
+		the CDROM_DISC_STATUS ioctl is flawed.  It makes the false
+		assumption that CDs are all CDS_DATA_1 or all CDS_AUDIO, etc.
+		Unfortunately, while this is often the case, it is also
+		very common for CDs to have some tracks with data, and some
+		tracks with audio.	Just because I feel like it, I declare
+		the following to be the best way to cope.  If the CD has
+		ANY data tracks on it, it will be returned as a data CD.
+		If it has any XA tracks, I will return it as that.	Now I
+		could simplify this interface by combining these returns with
+		the above, but this more clearly demonstrates the problem
+		with the current interface.  Too bad this wasn't designed
+		to use bitmasks...	       -Erik
+
+		Well, now we have the option CDS_MIXED: a mixed-type CD.
+		User level programmers might feel the ioctl is not very
+		useful.
+				---david
+
+
+
+
+CDROM_CHANGER_NSLOTS
+	Get number of slots
+
+
+	usage::
+
+	  ioctl(fd, CDROM_CHANGER_NSLOTS, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		The ioctl return value will be the number of slots in a
+		CD changer.  Typically 1 for non-multi-disk devices.
+
+	error returns:
+		none
+
+
+
+CDROM_LOCKDOOR
+	lock or unlock door
+
+
+	usage::
+
+	  int lock;
+
+	  ioctl(fd, CDROM_LOCKDOOR, lock);
+
+	inputs:
+		Door lock flag, 1=lock, 0=unlock
+
+
+	outputs:
+		none
+
+
+	error returns:
+	  - EDRIVE_CANT_DO_THIS
+
+				Door lock function not supported.
+	  - EBUSY
+
+				Attempt to unlock when multiple users have the
+				drive open and the caller lacks CAP_SYS_ADMIN
+
+	notes:
+		As of 2.6.8.1, the lock flag is a global lock, meaning that
+		all CD drives will be locked or unlocked together.  This is
+		probably a bug.
+
+		The EDRIVE_CANT_DO_THIS value is defined in <linux/cdrom.h>
+		and is currently (2.6.8.1) the same as EOPNOTSUPP.
+
+
+
+CDROM_DEBUG
+	Turn debug messages on/off
+
+
+	usage::
+
+	  int debug;
+
+	  ioctl(fd, CDROM_DEBUG, debug);
+
+	inputs:
+		Cdrom debug flag, 0=disable, 1=enable
+
+
+	outputs:
+		The ioctl return value will be the new debug flag.
+
+
+	error return:
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+
+
+
+CDROM_GET_CAPABILITY
+	get capabilities
+
+
+	usage::
+
+	  ioctl(fd, CDROM_GET_CAPABILITY, 0);
+
+
+	inputs:
+		none
+
+
+	outputs:
+		The ioctl return value is the current device capability
+		flags.  See CDC_CLOSE_TRAY, CDC_OPEN_TRAY, etc.
+
+
+
+CDROMAUDIOBUFSIZ
+	set the audio buffer size
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, CDROMAUDIOBUFSIZ, val);
+
+	inputs:
+		New audio buffer size
+
+
+	outputs:
+		The ioctl return value is the new audio buffer size, or -1
+		on error.
+
+	error return:
+	  - ENOSYS	Not supported by this driver.
+
+	notes:
+		Not supported by all drivers.
+
+
+
+
+DVD_READ_STRUCT			Read structure
+
+	usage::
+
+	  dvd_struct s;
+
+	  ioctl(fd, DVD_READ_STRUCT, &s);
+
+	inputs:
+		dvd_struct structure, containing:
+
+	    =================== ==========================================
+	    type		specifies the information desired, one of
+				DVD_STRUCT_PHYSICAL, DVD_STRUCT_COPYRIGHT,
+				DVD_STRUCT_DISCKEY, DVD_STRUCT_BCA,
+				DVD_STRUCT_MANUFACT
+	    physical.layer_num	desired layer, indexed from 0
+	    copyright.layer_num	desired layer, indexed from 0
+	    disckey.agid
+	    =================== ==========================================
+
+	outputs:
+		dvd_struct structure, containing:
+
+	    ==================== ================================
+	    physical             for type == DVD_STRUCT_PHYSICAL
+	    copyright            for type == DVD_STRUCT_COPYRIGHT
+	    disckey.value        for type == DVD_STRUCT_DISCKEY
+	    bca.{len,value}      for type == DVD_STRUCT_BCA
+	    manufact.{len,value} for type == DVD_STRUCT_MANUFACT
+	    ==================== ================================
+
+	error returns:
+	  - EINVAL	physical.layer_num exceeds number of layers
+	  - EIO		Received invalid response from drive
+
+
+
+DVD_WRITE_STRUCT		Write structure
+
+	Not implemented, as of 2.6.8.1
+
+
+
+DVD_AUTH			Authentication
+
+	usage::
+
+	  dvd_authinfo ai;
+
+	  ioctl(fd, DVD_AUTH, &ai);
+
+	inputs:
+		dvd_authinfo structure.  See <linux/cdrom.h>
+
+
+	outputs:
+		dvd_authinfo structure.
+
+
+	error return:
+	  - ENOTTY	ai.type not recognized.
+
+
+
+CDROM_SEND_PACKET
+	send a packet to the drive
+
+
+	usage::
+
+	  struct cdrom_generic_command cgc;
+
+	  ioctl(fd, CDROM_SEND_PACKET, &cgc);
+
+	inputs:
+		cdrom_generic_command structure containing the packet to send.
+
+
+	outputs:
+		cdrom_generic_command structure containing results.
+
+
+
+	error return:
+	  - EIO
+
+			command failed.
+	  - EPERM
+
+			Operation not permitted, either because a
+			write command was attempted on a drive which
+			is opened read-only, or because the command
+			requires CAP_SYS_RAWIO
+	  - EINVAL
+
+			cgc.data_direction not set
+
+
+
+CDROM_NEXT_WRITABLE
+	get next writable block
+
+
+	usage::
+
+	  long next;
+
+	  ioctl(fd, CDROM_NEXT_WRITABLE, &next);
+
+	inputs:
+		none
+
+
+	outputs:
+		The next writable block.
+
+
+	notes:
+		If the device does not support this ioctl directly, the
+		ioctl will return CDROM_LAST_WRITTEN + 7.
+
+
+
+
+CDROM_LAST_WRITTEN
+	get last block written on disc
+
+
+	usage::
+
+	  long last;
+
+	  ioctl(fd, CDROM_LAST_WRITTEN, &last);
+
+	inputs:
+		none
+
+
+	outputs:
+		The last block written on disc
+
+
+	notes:
+		If the device does not support this ioctl directly, the
+		result is derived from the disc's table of contents.  If the
+		table of contents can't be read, this ioctl returns an
+		error.
diff --git a/Documentation/ioctl/cdrom.txt b/Documentation/ioctl/cdrom.txt
deleted file mode 100644
index a4d62a9d6771..000000000000
--- a/Documentation/ioctl/cdrom.txt
+++ /dev/null
@@ -1,967 +0,0 @@
-		Summary of CDROM ioctl calls.
-		============================
-
-		Edward A. Falk <efalk@google.com>
-
-		November, 2004
-
-This document attempts to describe the ioctl(2) calls supported by
-the CDROM layer.  These are by-and-large implemented (as of Linux 2.6)
-in drivers/cdrom/cdrom.c and drivers/block/scsi_ioctl.c
-
-ioctl values are listed in <linux/cdrom.h>.  As of this writing, they
-are as follows:
-
-	CDROMPAUSE		Pause Audio Operation
-	CDROMRESUME		Resume paused Audio Operation
-	CDROMPLAYMSF		Play Audio MSF (struct cdrom_msf)
-	CDROMPLAYTRKIND		Play Audio Track/index (struct cdrom_ti)
-	CDROMREADTOCHDR		Read TOC header (struct cdrom_tochdr)
-	CDROMREADTOCENTRY	Read TOC entry (struct cdrom_tocentry)
-	CDROMSTOP		Stop the cdrom drive
-	CDROMSTART		Start the cdrom drive
-	CDROMEJECT		Ejects the cdrom media
-	CDROMVOLCTRL		Control output volume (struct cdrom_volctrl)
-	CDROMSUBCHNL		Read subchannel data (struct cdrom_subchnl)
-	CDROMREADMODE2		Read CDROM mode 2 data (2336 Bytes)
-					   (struct cdrom_read)
-	CDROMREADMODE1		Read CDROM mode 1 data (2048 Bytes)
-					   (struct cdrom_read)
-	CDROMREADAUDIO		(struct cdrom_read_audio)
-	CDROMEJECT_SW		enable(1)/disable(0) auto-ejecting
-	CDROMMULTISESSION	Obtain the start-of-last-session
-				  address of multi session disks
-				  (struct cdrom_multisession)
-	CDROM_GET_MCN		Obtain the "Universal Product Code"
-				   if available (struct cdrom_mcn)
-	CDROM_GET_UPC		Deprecated, use CDROM_GET_MCN instead.
-	CDROMRESET		hard-reset the drive
-	CDROMVOLREAD		Get the drive's volume setting
-					  (struct cdrom_volctrl)
-	CDROMREADRAW		read data in raw mode (2352 Bytes)
-					   (struct cdrom_read)
-	CDROMREADCOOKED		read data in cooked mode
-	CDROMSEEK		seek msf address
-	CDROMPLAYBLK		scsi-cd only, (struct cdrom_blk)
-	CDROMREADALL		read all 2646 bytes
-	CDROMGETSPINDOWN	return 4-bit spindown value
-	CDROMSETSPINDOWN	set 4-bit spindown value
-	CDROMCLOSETRAY		pendant of CDROMEJECT
-	CDROM_SET_OPTIONS	Set behavior options
-	CDROM_CLEAR_OPTIONS	Clear behavior options
-	CDROM_SELECT_SPEED	Set the CD-ROM speed
-	CDROM_SELECT_DISC	Select disc (for juke-boxes)
-	CDROM_MEDIA_CHANGED	Check is media changed
-	CDROM_DRIVE_STATUS	Get tray position, etc.
-	CDROM_DISC_STATUS	Get disc type, etc.
-	CDROM_CHANGER_NSLOTS	Get number of slots
-	CDROM_LOCKDOOR		lock or unlock door
-	CDROM_DEBUG		Turn debug messages on/off
-	CDROM_GET_CAPABILITY	get capabilities
-	CDROMAUDIOBUFSIZ	set the audio buffer size
-	DVD_READ_STRUCT		Read structure
-	DVD_WRITE_STRUCT	Write structure
-	DVD_AUTH		Authentication
-	CDROM_SEND_PACKET	send a packet to the drive
-	CDROM_NEXT_WRITABLE	get next writable block
-	CDROM_LAST_WRITTEN	get last block written on disc
-
-
-The information that follows was determined from reading kernel source
-code.  It is likely that some corrections will be made over time.
-
-
-
-
-
-
-
-General:
-
-	Unless otherwise specified, all ioctl calls return 0 on success
-	and -1 with errno set to an appropriate value on error.  (Some
-	ioctls return non-negative data values.)
-
-	Unless otherwise specified, all ioctl calls return -1 and set
-	errno to EFAULT on a failed attempt to copy data to or from user
-	address space.
-
-	Individual drivers may return error codes not listed here.
-
-	Unless otherwise specified, all data structures and constants
-	are defined in <linux/cdrom.h>
-
-
-
-
-CDROMPAUSE			Pause Audio Operation
-
-	usage:
-
-	  ioctl(fd, CDROMPAUSE, 0);
-
-	inputs:		none
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-
-CDROMRESUME			Resume paused Audio Operation
-
-	usage:
-
-	  ioctl(fd, CDROMRESUME, 0);
-
-	inputs:		none
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-
-CDROMPLAYMSF			Play Audio MSF (struct cdrom_msf)
-
-	usage:
-
-	  struct cdrom_msf msf;
-	  ioctl(fd, CDROMPLAYMSF, &msf);
-
-	inputs:
-	  cdrom_msf structure, describing a segment of music to play
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-	notes:
-	  MSF stands for minutes-seconds-frames
-	  LBA stands for logical block address
-
-	  Segment is described as start and end times, where each time
-	  is described as minutes:seconds:frames.  A frame is 1/75 of
-	  a second.
-
-
-CDROMPLAYTRKIND			Play Audio Track/index (struct cdrom_ti)
-
-	usage:
-
-	  struct cdrom_ti ti;
-	  ioctl(fd, CDROMPLAYTRKIND, &ti);
-
-	inputs:
-	  cdrom_ti structure, describing a segment of music to play
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-	notes:
-	  Segment is described as start and end times, where each time
-	  is described as a track and an index.
-
-
-
-CDROMREADTOCHDR			Read TOC header (struct cdrom_tochdr)
-
-	usage:
-
-	  cdrom_tochdr header;
-	  ioctl(fd, CDROMREADTOCHDR, &header);
-
-	inputs:
-	  cdrom_tochdr structure
-
-	outputs:
-	  cdrom_tochdr structure
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-
-
-CDROMREADTOCENTRY		Read TOC entry (struct cdrom_tocentry)
-
-	usage:
-
-	  struct cdrom_tocentry entry;
-	  ioctl(fd, CDROMREADTOCENTRY, &entry);
-
-	inputs:
-	  cdrom_tocentry structure
-
-	outputs:
-	  cdrom_tocentry structure
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-	  EINVAL	entry.cdte_format not CDROM_MSF or CDROM_LBA
-	  EINVAL	requested track out of bounds
-	  EIO		I/O error reading TOC
-
-	notes:
-	  TOC stands for Table Of Contents
-	  MSF stands for minutes-seconds-frames
-	  LBA stands for logical block address
-
-
-
-CDROMSTOP			Stop the cdrom drive
-
-	usage:
-
-	  ioctl(fd, CDROMSTOP, 0);
-
-	inputs:		none
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-	notes:
-	  Exact interpretation of this ioctl depends on the device,
-	  but most seem to spin the drive down.
-
-
-CDROMSTART			Start the cdrom drive
-
-	usage:
-
-	  ioctl(fd, CDROMSTART, 0);
-
-	inputs:		none
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-	notes:
-	  Exact interpretation of this ioctl depends on the device,
-	  but most seem to spin the drive up and/or close the tray.
-	  Other devices ignore the ioctl completely.
-
-
-CDROMEJECT			Ejects the cdrom media
-
-	usage:
-
-	  ioctl(fd, CDROMEJECT, 0);
-
-	inputs:		none
-
-	outputs:	none
-
-	error returns:
-	  ENOSYS	cd drive not capable of ejecting
-	  EBUSY		other processes are accessing drive, or door is locked
-
-	notes:
-	  See CDROM_LOCKDOOR, below.
-
-
-
-CDROMCLOSETRAY			pendant of CDROMEJECT
-
-	usage:
-
-	  ioctl(fd, CDROMCLOSETRAY, 0);
-
-	inputs:		none
-
-	outputs:	none
-
-	error returns:
-	  ENOSYS	cd drive not capable of closing the tray
-	  EBUSY		other processes are accessing drive, or door is locked
-
-	notes:
-	  See CDROM_LOCKDOOR, below.
-
-
-
-CDROMVOLCTRL			Control output volume (struct cdrom_volctrl)
-
-	usage:
-
-	  struct cdrom_volctrl volume;
-	  ioctl(fd, CDROMVOLCTRL, &volume);
-
-	inputs:
-	  cdrom_volctrl structure containing volumes for up to 4
-	  channels.
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-
-
-CDROMVOLREAD			Get the drive's volume setting
-					  (struct cdrom_volctrl)
-
-	usage:
-
-	  struct cdrom_volctrl volume;
-	  ioctl(fd, CDROMVOLREAD, &volume);
-
-	inputs:		none
-
-	outputs:
-	  The current volume settings.
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-
-
-
-CDROMSUBCHNL			Read subchannel data (struct cdrom_subchnl)
-
-	usage:
-
-	  struct cdrom_subchnl q;
-	  ioctl(fd, CDROMSUBCHNL, &q);
-
-	inputs:
-	  cdrom_subchnl structure
-
-	outputs:
-	  cdrom_subchnl structure
-
-	error return:
-	  ENOSYS	cd drive not audio-capable.
-	  EINVAL	format not CDROM_MSF or CDROM_LBA
-
-	notes:
-	  Format is converted to CDROM_MSF or CDROM_LBA
-	  as per user request on return
-
-
-
-CDROMREADRAW			read data in raw mode (2352 Bytes)
-					   (struct cdrom_read)
-
-	usage:
-
-	  union {
-	    struct cdrom_msf msf;		/* input */
-	    char buffer[CD_FRAMESIZE_RAW];	/* return */
-	  } arg;
-	  ioctl(fd, CDROMREADRAW, &arg);
-
-	inputs:
-	  cdrom_msf structure indicating an address to read.
-	  Only the start values are significant.
-
-	outputs:
-	  Data written to address provided by user.
-
-	error return:
-	  EINVAL	address less than 0, or msf less than 0:2:0
-	  ENOMEM	out of memory
-
-	notes:
-	  As of 2.6.8.1, comments in <linux/cdrom.h> indicate that this
-	  ioctl accepts a cdrom_read structure, but actual source code
-	  reads a cdrom_msf structure and writes a buffer of data to
-	  the same address.
-
-	  MSF values are converted to LBA values via this formula:
-
-	    lba = (((m * CD_SECS) + s) * CD_FRAMES + f) - CD_MSF_OFFSET;
-
-
-
-
-CDROMREADMODE1			Read CDROM mode 1 data (2048 Bytes)
-					   (struct cdrom_read)
-
-	notes:
-	  Identical to CDROMREADRAW except that block size is
-	  CD_FRAMESIZE (2048) bytes
-
-
-
-CDROMREADMODE2			Read CDROM mode 2 data (2336 Bytes)
-					   (struct cdrom_read)
-
-	notes:
-	  Identical to CDROMREADRAW except that block size is
-	  CD_FRAMESIZE_RAW0 (2336) bytes
-
-
-
-CDROMREADAUDIO			(struct cdrom_read_audio)
-
-	usage:
-
-	  struct cdrom_read_audio ra;
-	  ioctl(fd, CDROMREADAUDIO, &ra);
-
-	inputs:
-	  cdrom_read_audio structure containing read start
-	  point and length
-
-	outputs:
-	  audio data, returned to buffer indicated by ra
-
-	error return:
-	  EINVAL	format not CDROM_MSF or CDROM_LBA
-	  EINVAL	nframes not in range [1 75]
-	  ENXIO		drive has no queue (probably means invalid fd)
-	  ENOMEM	out of memory
-
-
-CDROMEJECT_SW			enable(1)/disable(0) auto-ejecting
-
-	usage:
-
-	  int val;
-	  ioctl(fd, CDROMEJECT_SW, val);
-
-	inputs:
-	  Flag specifying auto-eject flag.
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	Drive is not capable of ejecting.
-	  EBUSY		Door is locked
-
-
-
-
-CDROMMULTISESSION		Obtain the start-of-last-session
-				  address of multi session disks
-				  (struct cdrom_multisession)
-	usage:
-
-	  struct cdrom_multisession ms_info;
-	  ioctl(fd, CDROMMULTISESSION, &ms_info);
-
-	inputs:
-	  cdrom_multisession structure containing desired
-	  format.
-
-	outputs:
-	  cdrom_multisession structure is filled with last_session
-	  information.
-
-	error return:
-	  EINVAL	format not CDROM_MSF or CDROM_LBA
-
-
-CDROM_GET_MCN			Obtain the "Universal Product Code"
-				   if available (struct cdrom_mcn)
-
-	usage:
-
-	  struct cdrom_mcn mcn;
-	  ioctl(fd, CDROM_GET_MCN, &mcn);
-
-	inputs:		none
-
-	outputs:
-	  Universal Product Code
-
-	error return:
-	  ENOSYS	Drive is not capable of reading MCN data.
-
-	notes:
-	  Source code comments state:
-
-	    The following function is implemented, although very few
-	    audio discs give Universal Product Code information, which
-	    should just be the Medium Catalog Number on the box.  Note,
-	    that the way the code is written on the CD is /not/ uniform
-	    across all discs!
-
-
-
-
-CDROM_GET_UPC			CDROM_GET_MCN  (deprecated)
-
-	Not implemented, as of 2.6.8.1
-
-
-
-CDROMRESET			hard-reset the drive
-
-	usage:
-
-	  ioctl(fd, CDROMRESET, 0);
-
-	inputs:		none
-
-	outputs:	none
-
-	error return:
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  ENOSYS	Drive is not capable of resetting.
-
-
-
-
-CDROMREADCOOKED			read data in cooked mode
-
-	usage:
-
-	  u8 buffer[CD_FRAMESIZE]
-	  ioctl(fd, CDROMREADCOOKED, buffer);
-
-	inputs:		none
-
-	outputs:
-	  2048 bytes of data, "cooked" mode.
-
-	notes:
-	  Not implemented on all drives.
-
-
-
-
-CDROMREADALL			read all 2646 bytes
-
-	Same as CDROMREADCOOKED, but reads 2646 bytes.
-
-
-
-CDROMSEEK			seek msf address
-
-	usage:
-
-	  struct cdrom_msf msf;
-	  ioctl(fd, CDROMSEEK, &msf);
-
-	inputs:
-	  MSF address to seek to.
-
-	outputs:	none
-
-
-
-CDROMPLAYBLK			scsi-cd only, (struct cdrom_blk)
-
-	usage:
-
-	  struct cdrom_blk blk;
-	  ioctl(fd, CDROMPLAYBLK, &blk);
-
-	inputs:
-	  Region to play
-
-	outputs:	none
-
-
-
-CDROMGETSPINDOWN
-
-	usage:
-
-	  char spindown;
-	  ioctl(fd, CDROMGETSPINDOWN, &spindown);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current 4-bit spindown value.
-
-
-
-
-CDROMSETSPINDOWN
-
-	usage:
-
-	  char spindown
-	  ioctl(fd, CDROMSETSPINDOWN, &spindown);
-
-	inputs:
-	  4-bit value used to control spindown (TODO: more detail here)
-
-	outputs:	none
-
-
-
-
-
-CDROM_SET_OPTIONS		Set behavior options
-
-	usage:
-
-	  int options;
-	  ioctl(fd, CDROM_SET_OPTIONS, options);
-
-	inputs:
-	  New values for drive options.  The logical 'or' of:
-	    CDO_AUTO_CLOSE	close tray on first open(2)
-	    CDO_AUTO_EJECT	open tray on last release
-	    CDO_USE_FFLAGS	use O_NONBLOCK information on open
-	    CDO_LOCK		lock tray on open files
-	    CDO_CHECK_TYPE	check type on open for data
-
-	outputs:
-	  Returns the resulting options settings in the
-	  ioctl return value.  Returns -1 on error.
-
-	error return:
-	  ENOSYS	selected option(s) not supported by drive.
-
-
-
-
-CDROM_CLEAR_OPTIONS		Clear behavior options
-
-	Same as CDROM_SET_OPTIONS, except that selected options are
-	turned off.
-
-
-
-CDROM_SELECT_SPEED		Set the CD-ROM speed
-
-	usage:
-
-	  int speed;
-	  ioctl(fd, CDROM_SELECT_SPEED, speed);
-
-	inputs:
-	  New drive speed.
-
-	outputs:	none
-
-	error return:
-	  ENOSYS	speed selection not supported by drive.
-
-
-
-CDROM_SELECT_DISC		Select disc (for juke-boxes)
-
-	usage:
-
-	  int disk;
-	  ioctl(fd, CDROM_SELECT_DISC, disk);
-
-	inputs:
-	  Disk to load into drive.
-
-	outputs:	none
-
-	error return:
-	  EINVAL	Disk number beyond capacity of drive
-
-
-
-CDROM_MEDIA_CHANGED		Check is media changed
-
-	usage:
-
-	  int slot;
-	  ioctl(fd, CDROM_MEDIA_CHANGED, slot);
-
-	inputs:
-	  Slot number to be tested, always zero except for jukeboxes.
-	  May also be special values CDSL_NONE or CDSL_CURRENT
-
-	outputs:
-	  Ioctl return value is 0 or 1 depending on whether the media
-	  has been changed, or -1 on error.
-
-	error returns:
-	  ENOSYS	Drive can't detect media change
-	  EINVAL	Slot number beyond capacity of drive
-	  ENOMEM	Out of memory
-
-
-
-CDROM_DRIVE_STATUS		Get tray position, etc.
-
-	usage:
-
-	  int slot;
-	  ioctl(fd, CDROM_DRIVE_STATUS, slot);
-
-	inputs:
-	  Slot number to be tested, always zero except for jukeboxes.
-	  May also be special values CDSL_NONE or CDSL_CURRENT
-
-	outputs:
-	  Ioctl return value will be one of the following values
-	  from <linux/cdrom.h>:
-
-	    CDS_NO_INFO		Information not available.
-	    CDS_NO_DISC
-	    CDS_TRAY_OPEN
-	    CDS_DRIVE_NOT_READY
-	    CDS_DISC_OK
-	    -1			error
-
-	error returns:
-	  ENOSYS	Drive can't detect drive status
-	  EINVAL	Slot number beyond capacity of drive
-	  ENOMEM	Out of memory
-
-
-
-
-CDROM_DISC_STATUS		Get disc type, etc.
-
-	usage:
-
-	  ioctl(fd, CDROM_DISC_STATUS, 0);
-
-	inputs:		none
-
-	outputs:
-	  Ioctl return value will be one of the following values
-	  from <linux/cdrom.h>:
-	    CDS_NO_INFO
-	    CDS_AUDIO
-	    CDS_MIXED
-	    CDS_XA_2_2
-	    CDS_XA_2_1
-	    CDS_DATA_1
-
-	error returns:	none at present
-
-	notes:
-	  Source code comments state:
-
-	    Ok, this is where problems start.  The current interface for
-	    the CDROM_DISC_STATUS ioctl is flawed.  It makes the false
-	    assumption that CDs are all CDS_DATA_1 or all CDS_AUDIO, etc.
-	    Unfortunately, while this is often the case, it is also
-	    very common for CDs to have some tracks with data, and some
-	    tracks with audio.	Just because I feel like it, I declare
-	    the following to be the best way to cope.  If the CD has
-	    ANY data tracks on it, it will be returned as a data CD.
-	    If it has any XA tracks, I will return it as that.	Now I
-	    could simplify this interface by combining these returns with
-	    the above, but this more clearly demonstrates the problem
-	    with the current interface.  Too bad this wasn't designed
-	    to use bitmasks...	       -Erik
-
-	    Well, now we have the option CDS_MIXED: a mixed-type CD.
-	    User level programmers might feel the ioctl is not very
-	    useful.
-			---david
-
-
-
-
-CDROM_CHANGER_NSLOTS		Get number of slots
-
-	usage:
-
-	  ioctl(fd, CDROM_CHANGER_NSLOTS, 0);
-
-	inputs:		none
-
-	outputs:
-	  The ioctl return value will be the number of slots in a
-	  CD changer.  Typically 1 for non-multi-disk devices.
-
-	error returns:	none
-
-
-
-CDROM_LOCKDOOR			lock or unlock door
-
-	usage:
-
-	  int lock;
-	  ioctl(fd, CDROM_LOCKDOOR, lock);
-
-	inputs:
-	  Door lock flag, 1=lock, 0=unlock
-
-	outputs:	none
-
-	error returns:
-	  EDRIVE_CANT_DO_THIS	Door lock function not supported.
-	  EBUSY			Attempt to unlock when multiple users
-	  			have the drive open and not CAP_SYS_ADMIN
-
-	notes:
-	  As of 2.6.8.1, the lock flag is a global lock, meaning that
-	  all CD drives will be locked or unlocked together.  This is
-	  probably a bug.
-
-	  The EDRIVE_CANT_DO_THIS value is defined in <linux/cdrom.h>
-	  and is currently (2.6.8.1) the same as EOPNOTSUPP
-
-
-
-CDROM_DEBUG			Turn debug messages on/off
-
-	usage:
-
-	  int debug;
-	  ioctl(fd, CDROM_DEBUG, debug);
-
-	inputs:
-	  Cdrom debug flag, 0=disable, 1=enable
-
-	outputs:
-	  The ioctl return value will be the new debug flag.
-
-	error return:
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-
-
-
-CDROM_GET_CAPABILITY		get capabilities
-
-	usage:
-
-	  ioctl(fd, CDROM_GET_CAPABILITY, 0);
-
-	inputs:		none
-
-	outputs:
-	  The ioctl return value is the current device capability
-	  flags.  See CDC_CLOSE_TRAY, CDC_OPEN_TRAY, etc.
-
-
-
-CDROMAUDIOBUFSIZ		set the audio buffer size
-
-	usage:
-
-	  int arg;
-	  ioctl(fd, CDROMAUDIOBUFSIZ, val);
-
-	inputs:
-	  New audio buffer size
-
-	outputs:
-	  The ioctl return value is the new audio buffer size, or -1
-	  on error.
-
-	error return:
-	  ENOSYS	Not supported by this driver.
-
-	notes:
-	  Not supported by all drivers.
-
-
-
-DVD_READ_STRUCT			Read structure
-
-	usage:
-
-	  dvd_struct s;
-	  ioctl(fd, DVD_READ_STRUCT, &s);
-
-	inputs:
-	  dvd_struct structure, containing:
-	    type		specifies the information desired, one of
-	    			DVD_STRUCT_PHYSICAL, DVD_STRUCT_COPYRIGHT,
-				DVD_STRUCT_DISCKEY, DVD_STRUCT_BCA,
-				DVD_STRUCT_MANUFACT
-	    physical.layer_num	desired layer, indexed from 0
-	    copyright.layer_num	desired layer, indexed from 0
-	    disckey.agid
-
-	outputs:
-	  dvd_struct structure, containing:
-	    physical		for type == DVD_STRUCT_PHYSICAL
-	    copyright		for type == DVD_STRUCT_COPYRIGHT
-	    disckey.value	for type == DVD_STRUCT_DISCKEY
-	    bca.{len,value}	for type == DVD_STRUCT_BCA
-	    manufact.{len,valu}	for type == DVD_STRUCT_MANUFACT
-
-	error returns:
-	  EINVAL	physical.layer_num exceeds number of layers
-	  EIO		Received invalid response from drive
-
-
-
-DVD_WRITE_STRUCT		Write structure
-
-	Not implemented, as of 2.6.8.1
-
-
-
-DVD_AUTH			Authentication
-
-	usage:
-
-	  dvd_authinfo ai;
-	  ioctl(fd, DVD_AUTH, &ai);
-
-	inputs:
-	  dvd_authinfo structure.  See <linux/cdrom.h>
-
-	outputs:
-	  dvd_authinfo structure.
-
-	error return:
-	  ENOTTY	ai.type not recognized.
-
-
-
-CDROM_SEND_PACKET		send a packet to the drive
-
-	usage:
-
-	  struct cdrom_generic_command cgc;
-	  ioctl(fd, CDROM_SEND_PACKET, &cgc);
-
-	inputs:
-	  cdrom_generic_command structure containing the packet to send.
-
-	outputs:	none
-	  cdrom_generic_command structure containing results.
-
-	error return:
-	  EIO		command failed.
-	  EPERM		Operation not permitted, either because a
-			write command was attempted on a drive which
-			is opened read-only, or because the command
-			requires CAP_SYS_RAWIO
-	  EINVAL	cgc.data_direction not set
-
-
-
-CDROM_NEXT_WRITABLE		get next writable block
-
-	usage:
-
-	  long next;
-	  ioctl(fd, CDROM_NEXT_WRITABLE, &next);
-
-	inputs:		none
-
-	outputs:
-	  The next writable block.
-
-	notes:
-	  If the device does not support this ioctl directly, the
-	  ioctl will return CDROM_LAST_WRITTEN + 7.
-
-
-
-CDROM_LAST_WRITTEN		get last block written on disc
-
-	usage:
-
-	  long last;
-	  ioctl(fd, CDROM_LAST_WRITTEN, &last);
-
-	inputs:		none
-
-	outputs:
-	  The last block written on disc
-
-	notes:
-	  If the device does not support this ioctl directly, the
-	  result is derived from the disc's table of contents.  If the
-	  table of contents can't be read, this ioctl returns an
-	  error.
diff --git a/Documentation/ioctl/hdio.rst b/Documentation/ioctl/hdio.rst
new file mode 100644
index 000000000000..e822e3dff176
--- /dev/null
+++ b/Documentation/ioctl/hdio.rst
@@ -0,0 +1,1342 @@
+==============================
+Summary of `HDIO_` ioctl calls
+==============================
+
+- Edward A. Falk <efalk@google.com>
+
+November, 2004
+
+This document attempts to describe the ioctl(2) calls supported by
+the HD/IDE layer.  These are by and large implemented (as of Linux 2.6)
+in drivers/ide/ide.c and drivers/block/scsi_ioctl.c.
+
+ioctl values are listed in <linux/hdreg.h>.  As of this writing, they
+are as follows:
+
+    ioctls that pass argument pointers to user space:
+
+	=======================	=======================================
+	HDIO_GETGEO		get device geometry
+	HDIO_GET_UNMASKINTR	get current unmask setting
+	HDIO_GET_MULTCOUNT	get current IDE blockmode setting
+	HDIO_GET_QDMA		get use-qdma flag
+	HDIO_SET_XFER		set transfer rate via proc
+	HDIO_OBSOLETE_IDENTITY	OBSOLETE, DO NOT USE
+	HDIO_GET_KEEPSETTINGS	get keep-settings-on-reset flag
+	HDIO_GET_32BIT		get current io_32bit setting
+	HDIO_GET_NOWERR		get ignore-write-error flag
+	HDIO_GET_DMA		get use-dma flag
+	HDIO_GET_NICE		get nice flags
+	HDIO_GET_IDENTITY	get IDE identification info
+	HDIO_GET_WCACHE		get write cache mode on|off
+	HDIO_GET_ACOUSTIC	get acoustic value
+	HDIO_GET_ADDRESS	get sector addressing mode
+	HDIO_GET_BUSSTATE	get the bus state of the hwif
+	HDIO_TRISTATE_HWIF	execute a channel tristate
+	HDIO_DRIVE_RESET	execute a device reset
+	HDIO_DRIVE_TASKFILE	execute raw taskfile
+	HDIO_DRIVE_TASK		execute task and special drive command
+	HDIO_DRIVE_CMD		execute a special drive command
+	HDIO_DRIVE_CMD_AEB	HDIO_DRIVE_TASK
+	=======================	=======================================
+
+    ioctls that pass non-pointer values:
+
+	=======================	=======================================
+	HDIO_SET_MULTCOUNT	change IDE blockmode
+	HDIO_SET_UNMASKINTR	permit other irqs during I/O
+	HDIO_SET_KEEPSETTINGS	keep ioctl settings on reset
+	HDIO_SET_32BIT		change io_32bit flags
+	HDIO_SET_NOWERR		change ignore-write-error flag
+	HDIO_SET_DMA		change use-dma flag
+	HDIO_SET_PIO_MODE	reconfig interface to new speed
+	HDIO_SCAN_HWIF		register and (re)scan interface
+	HDIO_SET_NICE		set nice flags
+	HDIO_UNREGISTER_HWIF	unregister interface
+	HDIO_SET_WCACHE		change write cache enable-disable
+	HDIO_SET_ACOUSTIC	change acoustic behavior
+	HDIO_SET_BUSSTATE	set the bus state of the hwif
+	HDIO_SET_QDMA		change use-qdma flag
+	HDIO_SET_ADDRESS	change lba addressing modes
+
+	HDIO_SET_IDE_SCSI	Set scsi emulation mode on/off
+	HDIO_SET_SCSI_IDE	not implemented yet
+	=======================	=======================================
+
+
+The information that follows was determined from reading kernel source
+code.  It is likely that some corrections will be made over time.
+
+------------------------------------------------------------------------------
+
+General:
+
+	Unless otherwise specified, all ioctl calls return 0 on success
+	and -1 with errno set to an appropriate value on error.
+
+	Unless otherwise specified, all ioctl calls return -1 and set
+	errno to EFAULT on a failed attempt to copy data to or from user
+	address space.
+
+	Unless otherwise specified, all data structures and constants
+	are defined in <linux/hdreg.h>
+
+------------------------------------------------------------------------------
+
+HDIO_GETGEO
+	get device geometry
+
+
+	usage::
+
+	  struct hd_geometry geom;
+
+	  ioctl(fd, HDIO_GETGEO, &geom);
+
+
+	inputs:
+		none
+
+
+
+	outputs:
+		hd_geometry structure containing:
+
+
+	    =========	==================================
+	    heads	number of heads
+	    sectors	number of sectors/track
+	    cylinders	number of cylinders, mod 65536
+	    start	starting sector of this partition.
+	    =========	==================================
+
+
+	error returns:
+	  - EINVAL
+
+			if the device is not a disk drive or floppy drive,
+			or if the user passes a null pointer
+
+
+	notes:
+		Not particularly useful with modern disk drives, whose geometry
+		is a polite fiction anyway.  Modern drives are addressed
+		purely by sector number nowadays (lba addressing), and the
+		drive geometry is an abstraction which is actually subject
+		to change.  Currently (as of Nov 2004), the geometry values
+		are the "bios" values -- presumably the values the drive had
+		when Linux first booted.
+
+		In addition, the cylinders field of the hd_geometry is an
+		unsigned short, meaning that on most architectures, this
+		ioctl will not return a meaningful value on drives with more
+		than 65535 cylinders.
+
+		The start field is unsigned long; where long is 32 bits it cannot
+		describe offsets of 2 TiB or more (2^32 sectors of 512 bytes).
+
+
+
+
+HDIO_GET_UNMASKINTR
+	get current unmask setting
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_UNMASKINTR, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the drive's current unmask setting
+
+
+
+
+
+HDIO_SET_UNMASKINTR
+	permit other irqs during I/O
+
+
+	usage::
+
+	  unsigned long val;
+
+	  ioctl(fd, HDIO_SET_UNMASKINTR, val);
+
+	inputs:
+		New value for unmask flag
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 1]
+	  - EBUSY	Controller busy
+
+
+
+
+HDIO_GET_MULTCOUNT
+	get current IDE blockmode setting
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_MULTCOUNT, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current IDE block mode setting.  This
+		controls how many sectors the drive will transfer per
+		interrupt.
+
+
+
+HDIO_SET_MULTCOUNT
+	change IDE blockmode
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, HDIO_SET_MULTCOUNT, val);
+
+	inputs:
+		New value for IDE block mode setting.  This controls how many
+		sectors the drive will transfer per interrupt.
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range supported by disk.
+	  - EBUSY	Controller busy or blockmode already set.
+	  - EIO		Drive did not accept new block mode.
+
+	notes:
+	  Source code comments read::
+
+	    This is tightly woven into the driver->do_special cannot
+	    touch.  DON'T do it again until a total personality rewrite
+	    is committed.
+
+	  If blockmode has already been set, this ioctl will fail with
+	  EBUSY.
+
+
+
+HDIO_GET_QDMA
+	get use-qdma flag
+
+
+	Not implemented, as of 2.6.8.1
+
+
+
+HDIO_SET_XFER
+	set transfer rate via proc
+
+
+	Not implemented, as of 2.6.8.1
+
+
+
+HDIO_OBSOLETE_IDENTITY
+	OBSOLETE, DO NOT USE
+
+
+	Same as HDIO_GET_IDENTITY (see below), except that it only
+	returns the first 142 bytes of drive identity information.
+
+
+
+HDIO_GET_IDENTITY
+	get IDE identification info
+
+
+	usage::
+
+	  unsigned char identity[512];
+
+	  ioctl(fd, HDIO_GET_IDENTITY, identity);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		ATA drive identity information.  For full description, see
+		the IDENTIFY DEVICE and IDENTIFY PACKET DEVICE commands in
+		the ATA specification.
+
+	error returns:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - ENOMSG	IDENTIFY DEVICE information not available
+
+	notes:
+		Returns information that was obtained when the drive was
+		probed.  Some of this information is subject to change, and
+		this ioctl does not re-probe the drive to update the
+		information.
+
+		This information is also available from /proc/ide/hdX/identify
+
+
+
+HDIO_GET_KEEPSETTINGS
+	get keep-settings-on-reset flag
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_KEEPSETTINGS, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current "keep settings" flag
+
+
+
+	notes:
+		When set, indicates that kernel should restore settings
+		after a drive reset.
+
+
+
+HDIO_SET_KEEPSETTINGS
+	keep ioctl settings on reset
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_SET_KEEPSETTINGS, val);
+
+	inputs:
+		New value for keep_settings flag
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 1]
+	  - EBUSY	Controller busy
+
+
+
+HDIO_GET_32BIT
+	get current io_32bit setting
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_32BIT, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current io_32bit setting
+
+
+
+	notes:
+		0=16-bit, 1=32-bit, 2,3 = 32bit+sync
+
+
+
+
+
+HDIO_GET_NOWERR
+	get ignore-write-error flag
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_NOWERR, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current ignore-write-error flag
+
+
+
+
+
+HDIO_GET_DMA
+	get use-dma flag
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_DMA, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current use-dma flag
+
+
+
+
+
+HDIO_GET_NICE
+	get nice flags
+
+
+	usage::
+
+	  long nice;
+
+	  ioctl(fd, HDIO_GET_NICE, &nice);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The drive's "nice" values.
+
+
+
+	notes:
+		Per-drive flags which determine when the system will give more
+		bandwidth to other devices sharing the same IDE bus.
+
+		See <linux/hdreg.h>, near symbol IDE_NICE_DSC_OVERLAP.
+
+
+
+
+HDIO_SET_NICE
+	set nice flags
+
+
+	usage::
+
+	  unsigned long nice;
+
+	  ...
+	  ioctl(fd, HDIO_SET_NICE, nice);
+
+	inputs:
+		bitmask of nice flags.
+
+
+
+	outputs:
+		none
+
+
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EPERM	Flags other than DSC_OVERLAP and NICE_1 set.
+	  - EPERM	DSC_OVERLAP specified but not supported by drive
+
+	notes:
+		This ioctl sets the DSC_OVERLAP and NICE_1 flags from values
+		provided by the user.
+
+		Nice flags are listed in <linux/hdreg.h>, starting with
+		IDE_NICE_DSC_OVERLAP.  These values represent shifts.
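Because the nice constants are shift counts rather than masks, the argument to HDIO_SET_NICE has to be assembled with shifts. The sketch below illustrates that; the two shift values are an assumption about what `<linux/hdreg.h>` defines and are redeclared under local, hypothetical names so the example builds without kernel headers. Check them against your header before relying on them.

```c
#include <assert.h>

/*
 * ASSUMED shift values, believed to mirror IDE_NICE_DSC_OVERLAP and
 * IDE_NICE_1 in <linux/hdreg.h>.  The SKETCH_ names are local to this
 * example and do not exist in the header.
 */
#define SKETCH_NICE_DSC_OVERLAP	0
#define SKETCH_NICE_1		3

/* Build the bitmask passed as the third ioctl argument. */
static unsigned long nice_mask(int dsc_overlap, int nice1)
{
	unsigned long mask = 0;

	if (dsc_overlap)
		mask |= 1UL << SKETCH_NICE_DSC_OVERLAP;
	if (nice1)
		mask |= 1UL << SKETCH_NICE_1;
	return mask;
}
```

Passing the raw constant instead of `1UL << constant` would set the wrong bits, which is the mistake the "these values represent shifts" remark warns about.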
+
+
+
+
+
+HDIO_GET_WCACHE
+	get write cache mode on|off
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_WCACHE, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current write cache mode
+
+
+
+
+
+HDIO_GET_ACOUSTIC
+	get acoustic value
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_ACOUSTIC, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current acoustic settings
+
+
+
+	notes:
+		See HDIO_SET_ACOUSTIC
+
+
+
+
+
+HDIO_GET_ADDRESS
+	get sector addressing mode
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_GET_ADDRESS, &val);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		The value of the current addressing mode:
+
+	    =  ===================
+	    0  28-bit
+	    1  48-bit
+	    2  48-bit doing 28-bit
+	    3  64-bit
+	    =  ===================
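For logging or diagnostics, the value returned by HDIO_GET_ADDRESS can be mapped to the labels in the table above. A small sketch follows; `hdio_address_name` is a hypothetical helper, and the "unknown" fallback is this example's own choice rather than anything the driver defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map an addressing-mode value to the label used in the table above.
 * Values outside the documented 0..3 range yield "unknown".
 */
static const char *hdio_address_name(long mode)
{
	static const char *const names[] = {
		"28-bit",
		"48-bit",
		"48-bit doing 28-bit",
		"64-bit",
	};

	if (mode < 0 || (size_t)mode >= sizeof(names) / sizeof(names[0]))
		return "unknown";
	return names[mode];
}
```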
+
+
+
+HDIO_GET_BUSSTATE
+	get the bus state of the hwif
+
+
+	usage::
+
+	  long state;
+
+	  ioctl(fd, HDIO_GET_BUSSTATE, &state);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		Current power state of the IDE bus.  One of BUSSTATE_OFF,
+		BUSSTATE_ON, or BUSSTATE_TRISTATE
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+
+
+
+
+HDIO_SET_BUSSTATE
+	set the bus state of the hwif
+
+
+	usage::
+
+	  int state;
+
+	  ...
+	  ioctl(fd, HDIO_SET_BUSSTATE, state);
+
+	inputs:
+		Desired IDE power state.  One of BUSSTATE_OFF, BUSSTATE_ON,
+		or BUSSTATE_TRISTATE
+
+	outputs:
+		none
+
+
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_RAWIO
+	  - EOPNOTSUPP	Hardware interface does not support bus power control
+
+
+
+
+HDIO_TRISTATE_HWIF
+	execute a channel tristate
+
+
+	Not implemented, as of 2.6.8.1.  See HDIO_SET_BUSSTATE
+
+
+
+HDIO_DRIVE_RESET
+	execute a device reset
+
+
+	usage::
+
+	  int args[3];
+
+	  ...
+	  ioctl(fd, HDIO_DRIVE_RESET, args);
+
+	inputs:
+		none
+
+
+
+	outputs:
+		none
+
+
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - ENXIO	No such device:	phy dead or ctl_addr == 0
+	  - EIO		I/O error:	reset timed out or hardware error
+
+	notes:
+
+	  - Execute a reset on the device as soon as the current IO
+	    operation has completed.
+
+	  - Executes an ATAPI soft reset if applicable, otherwise
+	    executes an ATA soft reset on the controller.
+
+
+
+HDIO_DRIVE_TASKFILE
+	execute raw taskfile
+
+
+	Note:
+		If you don't have a copy of the ANSI ATA specification
+		handy, you should probably ignore this ioctl.
+
+	Execute an ATA disk command directly by writing the "taskfile"
+	registers of the drive.  Requires CAP_SYS_ADMIN and CAP_SYS_RAWIO
+	privileges.
+
+	usage::
+
+	  struct {
+	    ide_task_request_t req_task;
+	    u8 outbuf[OUTPUT_SIZE];
+	    u8 inbuf[INPUT_SIZE];
+	  } task;
+	  memset(&task.req_task, 0, sizeof(task.req_task));
+	  task.req_task.out_size = sizeof(task.outbuf);
+	  task.req_task.in_size = sizeof(task.inbuf);
+	  ...
+	  ioctl(fd, HDIO_DRIVE_TASKFILE, &task);
+	  ...
+
+	inputs:
+
+	  (See below for details on memory area passed to ioctl.)
+
+	  ============	===================================================
+	  io_ports[8]	values to be written to taskfile registers
+	  hob_ports[8]	high-order bytes, for extended commands.
+	  out_flags	flags indicating which registers are valid
+	  in_flags	flags indicating which registers should be returned
+	  data_phase	see below
+	  req_cmd	command type to be executed
+	  out_size	size of output buffer
+	  outbuf	buffer of data to be transmitted to disk
+	  inbuf		buffer of data to be received from disk (see [1])
+	  ============	===================================================
+
+	outputs:
+
+	  ===========	====================================================
+	  io_ports[]	values returned in the taskfile registers
+	  hob_ports[]	high-order bytes, for extended commands.
+	  out_flags	flags indicating which registers are valid (see [2])
+	  in_flags	flags indicating which registers should be returned
+	  outbuf	buffer of data to be transmitted to disk (see [1])
+	  inbuf		buffer of data to be received from disk
+	  ===========	====================================================
+
+	error returns:
+	  - EACCES	CAP_SYS_ADMIN or CAP_SYS_RAWIO privilege not set.
+	  - ENOMSG	Device is not a disk drive.
+	  - ENOMEM	Unable to allocate memory for task
+	  - EFAULT	req_cmd == TASKFILE_IN_OUT (not implemented as of 2.6.8)
+	  - EPERM	req_cmd == TASKFILE_MULTI_OUT and drive
+			multi-count not yet set.
+	  - EIO		Drive failed the command.
+
+	notes:
+
+	  [1] READ THE FOLLOWING NOTES *CAREFULLY*.  THIS IOCTL IS
+	  FULL OF GOTCHAS.  Extreme caution should be used when using
+	  this ioctl.  A mistake can easily corrupt data or hang the
+	  system.
+
+	  [2] Both the input and output buffers are copied from the
+	  user and written back to the user, even when not used.
+
+	  [3] If one or more bits are set in out_flags and in_flags is
+	  zero, the following values are used for in_flags.all and
+	  written back into in_flags on completion.
+
+	   * IDE_TASKFILE_STD_IN_FLAGS | (IDE_HOB_STD_IN_FLAGS << 8)
+	     if LBA48 addressing is enabled for the drive
+	   * IDE_TASKFILE_STD_IN_FLAGS
+	     if CHS/LBA28
+
+	  The association between in_flags.all and each enable
+	  bitfield flips depending on endianness; fortunately, TASKFILE
+	  only uses the in_flags.b.data bit and ignores all other bits.
+	  The end result is that, on machines of either endianness, it
+	  has no effect other than modifying in_flags on completion.
+
+	  [4] The default value of SELECT is (0xa0|DEV_bit|LBA_bit)
+	  except for four drives per port chipsets.  For four drives
+	  per port chipsets, it's (0xa0|DEV_bit|LBA_bit) for the first
+	  pair and (0x80|DEV_bit|LBA_bit) for the second pair.
+
+	  [5] The argument to the ioctl is a pointer to a region of
+	  memory containing a ide_task_request_t structure, followed
+	  by an optional buffer of data to be transmitted to the
+	  drive, followed by an optional buffer to receive data from
+	  the drive.
+
+	  Command is passed to the disk drive via the ide_task_request_t
+	  structure, which contains these fields:
+
+	    ============	===============================================
+	    io_ports[8]		values for the taskfile registers
+	    hob_ports[8]	high-order bytes, for extended commands
+	    out_flags		flags indicating which entries in the
+				io_ports[] and hob_ports[] arrays
+				contain valid values.  Type ide_reg_valid_t.
+	    in_flags		flags indicating which entries in the
+				io_ports[] and hob_ports[] arrays
+				are expected to contain valid values
+				on return.
+	    data_phase		See below
+	    req_cmd		Command type, see below
+	    out_size		output (user->drive) buffer size, bytes
+	    in_size		input (drive->user) buffer size, bytes
+	    ============	===============================================
+
+	  When out_flags is zero, the following registers are loaded.
+
+	    ============	===============================================
+	    HOB_FEATURE		If the drive supports LBA48
+	    HOB_NSECTOR		If the drive supports LBA48
+	    HOB_SECTOR		If the drive supports LBA48
+	    HOB_LCYL		If the drive supports LBA48
+	    HOB_HCYL		If the drive supports LBA48
+	    FEATURE
+	    NSECTOR
+	    SECTOR
+	    LCYL
+	    HCYL
+	    SELECT		First, masked with 0xE0 if LBA48, 0xEF
+				otherwise; then, or'ed with the default
+				value of SELECT.
+	    ============	===============================================
+
+	  If any bit in out_flags is set, the following registers are loaded.
+
+	    ============	===============================================
+	    HOB_DATA		If out_flags.b.data is set.  HOB_DATA will
+				travel on DD8-DD15 on little endian machines
+				and on DD0-DD7 on big endian machines.
+	    DATA		If out_flags.b.data is set.  DATA will
+				travel on DD0-DD7 on little endian machines
+				and on DD8-DD15 on big endian machines.
+	    HOB_NSECTOR		If out_flags.b.nsector_hob is set
+	    HOB_SECTOR		If out_flags.b.sector_hob is set
+	    HOB_LCYL		If out_flags.b.lcyl_hob is set
+	    HOB_HCYL		If out_flags.b.hcyl_hob is set
+	    FEATURE		If out_flags.b.feature is set
+	    NSECTOR		If out_flags.b.nsector is set
+	    SECTOR		If out_flags.b.sector is set
+	    LCYL		If out_flags.b.lcyl is set
+	    HCYL		If out_flags.b.hcyl is set
+	    SELECT		Or'ed with the default value of SELECT and
+				loaded regardless of out_flags.b.select.
+	    ============	===============================================
+
+	  Taskfile registers are read back from the drive into
+	  {io|hob}_ports[] after the command completes iff one of the
+	  following conditions is met; otherwise, the original values
+	  will be written back, unchanged.
+
+	    1. The drive fails the command (EIO).
+	    2. One or more bits are set in out_flags.
+	    3. The requested data_phase is TASKFILE_NO_DATA.
+
+	    ============	===============================================
+	    HOB_DATA		If in_flags.b.data is set.  It will contain
+				DD8-DD15 on little endian machines and
+				DD0-DD7 on big endian machines.
+	    DATA		If in_flags.b.data is set.  It will contain
+				DD0-DD7 on little endian machines and
+				DD8-DD15 on big endian machines.
+	    HOB_FEATURE		If the drive supports LBA48
+	    HOB_NSECTOR		If the drive supports LBA48
+	    HOB_SECTOR		If the drive supports LBA48
+	    HOB_LCYL		If the drive supports LBA48
+	    HOB_HCYL		If the drive supports LBA48
+	    NSECTOR
+	    SECTOR
+	    LCYL
+	    HCYL
+	    ============	===============================================
+
+	  The data_phase field describes the data transfer to be
+	  performed.  Value is one of:
+
+	    ===================        ========================================
+	    TASKFILE_IN
+	    TASKFILE_MULTI_IN
+	    TASKFILE_OUT
+	    TASKFILE_MULTI_OUT
+	    TASKFILE_IN_OUT
+	    TASKFILE_IN_DMA
+	    TASKFILE_IN_DMAQ		== IN_DMA (queueing not supported)
+	    TASKFILE_OUT_DMA
+	    TASKFILE_OUT_DMAQ		== OUT_DMA (queueing not supported)
+	    TASKFILE_P_IN		unimplemented
+	    TASKFILE_P_IN_DMA		unimplemented
+	    TASKFILE_P_IN_DMAQ		unimplemented
+	    TASKFILE_P_OUT		unimplemented
+	    TASKFILE_P_OUT_DMA		unimplemented
+	    TASKFILE_P_OUT_DMAQ		unimplemented
+	    ===================        ========================================
+
+	  The req_cmd field classifies the command type.  It may be
+	  one of:
+
+	    ========================    =======================================
+	    IDE_DRIVE_TASK_NO_DATA
+	    IDE_DRIVE_TASK_SET_XFER	unimplemented
+	    IDE_DRIVE_TASK_IN
+	    IDE_DRIVE_TASK_OUT		unimplemented
+	    IDE_DRIVE_TASK_RAW_WRITE
+	    ========================    =======================================
+
+	  [6] Do not access {in|out}_flags->all except for resetting
+	  all the bits.  Always access individual bit fields.  ->all
+	  value will flip depending on endianness.  For the same
+	  reason, do not use IDE_{TASKFILE|HOB}_STD_{OUT|IN}_FLAGS
+	  constants defined in hdreg.h.
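The SELECT handling described for the case where out_flags is zero (mask the caller's value, then OR in the drive's default) can be written out as a few lines of arithmetic. This is only a sketch of the rule as the notes state it, not code taken from the driver; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the SELECT register value loaded when out_flags is zero:
 * the caller's value is masked with 0xE0 (LBA48) or 0xEF (otherwise),
 * then OR'ed with the drive's default SELECT value.
 */
static uint8_t taskfile_select_when_no_out_flags(uint8_t user_select,
						 int lba48,
						 uint8_t default_select)
{
	uint8_t mask = lba48 ? 0xE0 : 0xEF;

	return (uint8_t)((user_select & mask) | default_select);
}
```

In the non-LBA48 case the 0xEF mask clears bit 4 (the DEV bit) of the caller's value, so the device selection always comes from the default.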
+
+
+
+HDIO_DRIVE_CMD
+	execute a special drive command
+
+
+	Note:  If you don't have a copy of the ANSI ATA specification
+	handy, you should probably ignore this ioctl.
+
+	usage::
+
+	  u8 args[4+XFER_SIZE];
+
+	  ...
+	  ioctl(fd, HDIO_DRIVE_CMD, args);
+
+	inputs:
+	    Commands other than WIN_SMART:
+
+	    =======     =======
+	    args[0]	COMMAND
+	    args[1]	NSECTOR
+	    args[2]	FEATURE
+	    args[3]	NSECTOR
+	    =======     =======
+
+	    WIN_SMART:
+
+	    =======     =======
+	    args[0]	COMMAND
+	    args[1]	SECTOR
+	    args[2]	FEATURE
+	    args[3]	NSECTOR
+	    =======     =======
+
+	outputs:
+		args[] buffer is filled with register values followed by any
+		data returned by the disk.
+
+	    ========	====================================================
+	    args[0]	status
+	    args[1]	error
+	    args[2]	NSECTOR
+	    args[3]	undefined
+	    args[4+]	NSECTOR * 512 bytes of data returned by the command.
+	    ========	====================================================
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_RAWIO
+	  - ENOMEM	Unable to allocate memory for task
+	  - EIO		Drive reports error
+
+	notes:
+
+	  [1] For commands other than WIN_SMART, args[1] should equal
+	  args[3].  SECTOR, LCYL and HCYL are undefined.  For
+	  WIN_SMART, 0x4f and 0xc2 are loaded into LCYL and HCYL
+	  respectively.  In both cases SELECT will contain the default
+	  value for the drive.  Please refer to HDIO_DRIVE_TASKFILE
+	  notes for the default value of SELECT.
+
+	  [2] If NSECTOR value is greater than zero and the drive sets
+	  DRQ when interrupting for the command, NSECTOR * 512 bytes
+	  are read from the device into the area following NSECTOR.
+	  In the above example, the area would be
+	  args[4..4+XFER_SIZE].  16bit PIO is used regardless of
+	  HDIO_SET_32BIT setting.
+
+	  [3] If COMMAND == WIN_SETFEATURES && FEATURE == SETFEATURES_XFER
+	  && NSECTOR >= XFER_SW_DMA_0 && the drive supports any DMA
+	  mode, IDE driver will try to tune the transfer mode of the
+	  drive accordingly.
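The args[] layout for commands other than WIN_SMART, together with the space needed for returned data, can be sketched as follows. The helper names and `SKETCH_SECTOR_SIZE` are hypothetical; the layout and the "NSECTOR * 512 bytes" sizing follow the tables and notes above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512	/* bytes per sector, as in the notes above */

/* Bytes needed for args[]: 4 register bytes plus the returned data. */
static size_t drive_cmd_buf_size(uint8_t nsector)
{
	return 4 + (size_t)nsector * SKETCH_SECTOR_SIZE;
}

/*
 * Fill args[] for a command other than WIN_SMART.  Per note [1],
 * args[1] and args[3] both carry NSECTOR.  The caller must provide
 * at least drive_cmd_buf_size(nsector) bytes.
 */
static void drive_cmd_fill(uint8_t *args, uint8_t command,
			   uint8_t feature, uint8_t nsector)
{
	memset(args, 0, drive_cmd_buf_size(nsector));
	args[0] = command;
	args[1] = nsector;
	args[2] = feature;
	args[3] = nsector;
}
```

After a successful ioctl(fd, HDIO_DRIVE_CMD, args), the first four bytes hold the output registers listed above and any data starts at args[4].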
+
+
+
+HDIO_DRIVE_TASK
+	execute task and special drive command
+
+
+	Note:  If you don't have a copy of the ANSI ATA specification
+	handy, you should probably ignore this ioctl.
+
+	usage::
+
+	  u8 args[7];
+
+	  ...
+	  ioctl(fd, HDIO_DRIVE_TASK, args);
+
+	inputs:
+	    Taskfile register values:
+
+	    =======	=======
+	    args[0]	COMMAND
+	    args[1]	FEATURE
+	    args[2]	NSECTOR
+	    args[3]	SECTOR
+	    args[4]	LCYL
+	    args[5]	HCYL
+	    args[6]	SELECT
+	    =======	=======
+
+	outputs:
+	    Taskfile register values:
+
+
+	    =======	=======
+	    args[0]	status
+	    args[1]	error
+	    args[2]	NSECTOR
+	    args[3]	SECTOR
+	    args[4]	LCYL
+	    args[5]	HCYL
+	    args[6]	SELECT
+	    =======	=======
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_RAWIO
+	  - ENOMEM	Unable to allocate memory for task
+	  - ENOMSG	Device is not a disk drive.
+	  - EIO		Drive failed the command.
+
+	notes:
+
+	  [1] DEV bit (0x10) of SELECT register is ignored and the
+	  appropriate value for the drive is used.  All other bits
+	  are used unaltered.
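Since HDIO_DRIVE_TASK reuses the same seven bytes for input and output, with args[0] and args[1] changing meaning, it can help to copy the result into named fields straight after the call. A sketch, with a hypothetical struct and helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Output registers in the order given by the "outputs" table above. */
struct drive_task_result {
	uint8_t status;
	uint8_t error;
	uint8_t nsector;
	uint8_t sector;
	uint8_t lcyl;
	uint8_t hcyl;
	uint8_t select;
};

/* Decode args[] as left by a completed HDIO_DRIVE_TASK call. */
static struct drive_task_result drive_task_decode(const uint8_t args[7])
{
	struct drive_task_result r;

	r.status  = args[0];	/* was COMMAND on input */
	r.error   = args[1];	/* was FEATURE on input */
	r.nsector = args[2];
	r.sector  = args[3];
	r.lcyl    = args[4];
	r.hcyl    = args[5];
	r.select  = args[6];
	return r;
}
```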
+
+
+
+HDIO_DRIVE_CMD_AEB
+	HDIO_DRIVE_TASK
+
+
+	Not implemented, as of 2.6.8.1
+
+
+
+HDIO_SET_32BIT
+	change io_32bit flags
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, HDIO_SET_32BIT, val);
+
+	inputs:
+		New value for io_32bit flag
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 3]
+	  - EBUSY	Controller busy
+
+
+
+
+HDIO_SET_NOWERR
+	change ignore-write-error flag
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, HDIO_SET_NOWERR, val);
+
+	inputs:
+		New value for ignore-write-error flag.  Used for ignoring
+		WRERR_STAT
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 1]
+	  - EBUSY	Controller busy
+
+
+
+HDIO_SET_DMA
+	change use-dma flag
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_SET_DMA, val);
+
+	inputs:
+		New value for use-dma flag
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 1]
+	  - EBUSY	Controller busy
+
+
+
+HDIO_SET_PIO_MODE
+	reconfig interface to new speed
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_SET_PIO_MODE, val);
+
+	inputs:
+		New interface speed.
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 255]
+	  - EBUSY	Controller busy
+
+
+
+HDIO_SCAN_HWIF
+	register and (re)scan interface
+
+
+	usage::
+
+	  int args[3];
+
+	  ...
+	  ioctl(fd, HDIO_SCAN_HWIF, args);
+
+	inputs:
+
+	  =======	=========================
+	  args[0]	io address to probe
+	  args[1]	control address to probe
+	  args[2]	irq number
+	  =======	=========================
+
+	outputs:
+		none
+
+
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_RAWIO
+	  - EIO		Probe failed.
+
+	notes:
+		This ioctl initializes the addresses and irq for a disk
+		controller, probes for drives, and creates /proc/ide
+		interfaces as appropriate.
+
+
+
+HDIO_UNREGISTER_HWIF
+	unregister interface
+
+
+	usage::
+
+	  int index;
+
+	  ioctl(fd, HDIO_UNREGISTER_HWIF, index);
+
+	inputs:
+		index		index of hardware interface to unregister
+
+
+
+	outputs:
+		none
+
+
+
+	error returns:
+	  - EACCES	Access denied:  requires CAP_SYS_RAWIO
+
+	notes:
+		This ioctl removes a hardware interface from the kernel.
+
+		Currently (2.6.8) this ioctl silently fails if any drive on
+		the interface is busy.
+
+
+
+HDIO_SET_WCACHE
+	change write cache enable-disable
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, HDIO_SET_WCACHE, val);
+
+	inputs:
+		New value for write cache enable
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 1]
+	  - EBUSY	Controller busy
+
+
+
+HDIO_SET_ACOUSTIC
+	change acoustic behavior
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, HDIO_SET_ACOUSTIC, val);
+
+	inputs:
+		New value for drive acoustic settings
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 254]
+	  - EBUSY	Controller busy
+
+
+
+HDIO_SET_QDMA
+	change use-qdma flag
+
+
+	Not implemented, as of 2.6.8.1
+
+
+
+HDIO_SET_ADDRESS
+	change lba addressing modes
+
+
+	usage::
+
+	  int val;
+
+	  ioctl(fd, HDIO_SET_ADDRESS, val);
+
+	inputs:
+		New value for addressing mode
+
+	    =   ===================
+	    0   28-bit
+	    1   48-bit
+	    2   48-bit doing 28-bit
+	    =   ===================
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 2]
+	  - EBUSY	Controller busy
+	  - EIO		Drive does not support lba48 mode.
+
+
+HDIO_SET_IDE_SCSI
+	Set scsi emulation mode on/off
+
+
+	usage::
+
+	  long val;
+
+	  ioctl(fd, HDIO_SET_IDE_SCSI, val);
+
+	inputs:
+		New value for scsi emulation mode (?)
+
+
+
+	outputs:
+		none
+
+
+
+	error return:
+	  - EINVAL	fd refers to a partition, not the whole disk (bdev != bdev->bd_contains)
+	  - EACCES	Access denied:  requires CAP_SYS_ADMIN
+	  - EINVAL	value out of range [0 1]
+	  - EBUSY	Controller busy
+
+
+
+HDIO_SET_SCSI_IDE
+	Not implemented, as of 2.6.8.1
diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt
deleted file mode 100644
index 18eb98c44ffe..000000000000
--- a/Documentation/ioctl/hdio.txt
+++ /dev/null
@@ -1,1071 +0,0 @@
-		Summary of HDIO_ ioctl calls.
-		============================
-
-		Edward A. Falk <efalk@google.com>
-
-		November, 2004
-
-This document attempts to describe the ioctl(2) calls supported by
-the HD/IDE layer.  These are by-and-large implemented (as of Linux 2.6)
-in drivers/ide/ide.c and drivers/block/scsi_ioctl.c
-
-ioctl values are listed in <linux/hdreg.h>.  As of this writing, they
-are as follows:
-
-    ioctls that pass argument pointers to user space:
-
-	HDIO_GETGEO		get device geometry
-	HDIO_GET_UNMASKINTR	get current unmask setting
-	HDIO_GET_MULTCOUNT	get current IDE blockmode setting
-	HDIO_GET_QDMA		get use-qdma flag
-	HDIO_SET_XFER		set transfer rate via proc
-	HDIO_OBSOLETE_IDENTITY	OBSOLETE, DO NOT USE
-	HDIO_GET_KEEPSETTINGS	get keep-settings-on-reset flag
-	HDIO_GET_32BIT		get current io_32bit setting
-	HDIO_GET_NOWERR		get ignore-write-error flag
-	HDIO_GET_DMA		get use-dma flag
-	HDIO_GET_NICE		get nice flags
-	HDIO_GET_IDENTITY	get IDE identification info
-	HDIO_GET_WCACHE		get write cache mode on|off
-	HDIO_GET_ACOUSTIC	get acoustic value
-	HDIO_GET_ADDRESS	get sector addressing mode
-	HDIO_GET_BUSSTATE	get the bus state of the hwif
-	HDIO_TRISTATE_HWIF	execute a channel tristate
-	HDIO_DRIVE_RESET	execute a device reset
-	HDIO_DRIVE_TASKFILE	execute raw taskfile
-	HDIO_DRIVE_TASK		execute task and special drive command
-	HDIO_DRIVE_CMD		execute a special drive command
-	HDIO_DRIVE_CMD_AEB	HDIO_DRIVE_TASK
-
-    ioctls that pass non-pointer values:
-
-	HDIO_SET_MULTCOUNT	change IDE blockmode
-	HDIO_SET_UNMASKINTR	permit other irqs during I/O
-	HDIO_SET_KEEPSETTINGS	keep ioctl settings on reset
-	HDIO_SET_32BIT		change io_32bit flags
-	HDIO_SET_NOWERR		change ignore-write-error flag
-	HDIO_SET_DMA		change use-dma flag
-	HDIO_SET_PIO_MODE	reconfig interface to new speed
-	HDIO_SCAN_HWIF		register and (re)scan interface
-	HDIO_SET_NICE		set nice flags
-	HDIO_UNREGISTER_HWIF	unregister interface
-	HDIO_SET_WCACHE		change write cache enable-disable
-	HDIO_SET_ACOUSTIC	change acoustic behavior
-	HDIO_SET_BUSSTATE	set the bus state of the hwif
-	HDIO_SET_QDMA		change use-qdma flag
-	HDIO_SET_ADDRESS	change lba addressing modes
-
-	HDIO_SET_IDE_SCSI	Set scsi emulation mode on/off
-	HDIO_SET_SCSI_IDE	not implemented yet
-
-
-The information that follows was determined from reading kernel source
-code.  It is likely that some corrections will be made over time.
-
-
-
-
-
-
-
-General:
-
-	Unless otherwise specified, all ioctl calls return 0 on success
-	and -1 with errno set to an appropriate value on error.
-
-	Unless otherwise specified, all ioctl calls return -1 and set
-	errno to EFAULT on a failed attempt to copy data to or from user
-	address space.
-
-	Unless otherwise specified, all data structures and constants
-	are defined in <linux/hdreg.h>
-
-
-
-HDIO_GETGEO			get device geometry
-
-	usage:
-
-	  struct hd_geometry geom;
-	  ioctl(fd, HDIO_GETGEO, &geom);
-
-
-	inputs:		none
-
-	outputs:
-
-	  hd_geometry structure containing:
-
-	    heads	number of heads
-	    sectors	number of sectors/track
-	    cylinders	number of cylinders, mod 65536
-	    start	starting sector of this partition.
-
-
-	error returns:
-	  EINVAL	if the device is not a disk drive or floppy drive,
-	  		or if the user passes a null pointer
-
-
-	notes:
-
-	  Not particularly useful with modern disk drives, whose geometry
-	  is a polite fiction anyway.  Modern drives are addressed
-	  purely by sector number nowadays (lba addressing), and the
-	  drive geometry is an abstraction which is actually subject
-	  to change.  Currently (as of Nov 2004), the geometry values
-	  are the "bios" values -- presumably the values the drive had
-	  when Linux first booted.
-
-	  In addition, the cylinders field of the hd_geometry is an
-	  unsigned short, meaning that on most architectures, this
-	  ioctl will not return a meaningful value on drives with more
-	  than 65535 tracks.
-
-	  The start field is unsigned long, meaning that it will not
-	  contain a meaningful value for disks over 219 Gb in size.
-
-
-
-
-HDIO_GET_UNMASKINTR		get current unmask setting
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_UNMASKINTR, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the drive's current unmask setting
-
-
-
-HDIO_SET_UNMASKINTR		permit other irqs during I/O
-
-	usage:
-
-	  unsigned long val;
-	  ioctl(fd, HDIO_SET_UNMASKINTR, val);
-
-	inputs:
-	  New value for unmask flag
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 1]
-	  EBUSY		Controller busy
-
-
-
-
-HDIO_GET_MULTCOUNT		get current IDE blockmode setting
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_MULTCOUNT, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current IDE block mode setting.  This
-	  controls how many sectors the drive will transfer per
-	  interrupt.
-
-
-
-HDIO_SET_MULTCOUNT		change IDE blockmode
-
-	usage:
-
-	  int val;
-	  ioctl(fd, HDIO_SET_MULTCOUNT, val);
-
-	inputs:
-	  New value for IDE block mode setting.  This controls how many
-	  sectors the drive will transfer per interrupt.
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range supported by disk.
-	  EBUSY		Controller busy or blockmode already set.
-	  EIO		Drive did not accept new block mode.
-
-	notes:
-
-	  Source code comments read:
-
-	    This is tightly woven into the driver->do_special cannot
-	    touch.  DON'T do it again until a total personality rewrite
-	    is committed.
-
-	  If blockmode has already been set, this ioctl will fail with
-	  EBUSY
-
-
-
-HDIO_GET_QDMA			get use-qdma flag
-
-	Not implemented, as of 2.6.8.1
-
-
-
-HDIO_SET_XFER			set transfer rate via proc
-
-	Not implemented, as of 2.6.8.1
-
-
-
-HDIO_OBSOLETE_IDENTITY		OBSOLETE, DO NOT USE
-
-	Same as HDIO_GET_IDENTITY (see below), except that it only
-	returns the first 142 bytes of drive identity information.
-
-
-
-HDIO_GET_IDENTITY		get IDE identification info
-
-	usage:
-
-	  unsigned char identity[512];
-	  ioctl(fd, HDIO_GET_IDENTITY, identity);
-
-	inputs:		none
-
-	outputs:
-
-	  ATA drive identity information.  For full description, see
-	  the IDENTIFY DEVICE and IDENTIFY PACKET DEVICE commands in
-	  the ATA specification.
-
-	error returns:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  ENOMSG	IDENTIFY DEVICE information not available
-
-	notes:
-
-	  Returns information that was obtained when the drive was
-	  probed.  Some of this information is subject to change, and
-	  this ioctl does not re-probe the drive to update the
-	  information.
-
-	  This information is also available from /proc/ide/hdX/identify
-
-
-
-HDIO_GET_KEEPSETTINGS		get keep-settings-on-reset flag
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_KEEPSETTINGS, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current "keep settings" flag
-
-	notes:
-
-	  When set, indicates that kernel should restore settings
-	  after a drive reset.
-
-
-
-HDIO_SET_KEEPSETTINGS		keep ioctl settings on reset
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_SET_KEEPSETTINGS, val);
-
-	inputs:
-	  New value for keep_settings flag
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 1]
-	  EBUSY		Controller busy
-
-
-
-HDIO_GET_32BIT			get current io_32bit setting
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_32BIT, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current io_32bit setting
-
-	notes:
-
-	  0=16-bit, 1=32-bit, 2,3 = 32bit+sync
-
-
-
-HDIO_GET_NOWERR			get ignore-write-error flag
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_NOWERR, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current ignore-write-error flag
-
-
-
-HDIO_GET_DMA			get use-dma flag
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_DMA, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current use-dma flag
-
-
-
-HDIO_GET_NICE			get nice flags
-
-	usage:
-
-	  long nice;
-	  ioctl(fd, HDIO_GET_NICE, &nice);
-
-	inputs:		none
-
-	outputs:
-
-	  The drive's "nice" values.
-
-	notes:
-
-	  Per-drive flags which determine when the system will give more
-	  bandwidth to other devices sharing the same IDE bus.
-	  See <linux/hdreg.h>, near symbol IDE_NICE_DSC_OVERLAP.
-
-
-
-
-HDIO_SET_NICE			set nice flags
-
-	usage:
-
-	  unsigned long nice;
-	  ...
-	  ioctl(fd, HDIO_SET_NICE, nice);
-
-	inputs:
-	  bitmask of nice flags.
-
-	outputs:	none
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EPERM		Flags other than DSC_OVERLAP and NICE_1 set.
-	  EPERM		DSC_OVERLAP specified but not supported by drive
-
-	notes:
-
-	  This ioctl sets the DSC_OVERLAP and NICE_1 flags from values
-	  provided by the user.
-
-	  Nice flags are listed in <linux/hdreg.h>, starting with
-	  IDE_NICE_DSC_OVERLAP.  These values represent shifts.
-
-
-
-
-
-HDIO_GET_WCACHE			get write cache mode on|off
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_WCACHE, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current write cache mode
-
-
-
-HDIO_GET_ACOUSTIC		get acoustic value
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_ACOUSTIC, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current acoustic settings
-
-	notes:
-
-	  See HDIO_SET_ACOUSTIC
-
-
-
-HDIO_GET_ADDRESS
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_GET_ADDRESS, &val);
-
-	inputs:		none
-
-	outputs:
-	  The value of the current addressing mode:
-	    0 = 28-bit
-	    1 = 48-bit
-	    2 = 48-bit doing 28-bit
-	    3 = 64-bit
-
-
-
-HDIO_GET_BUSSTATE		get the bus state of the hwif
-
-	usage:
-
-	  long state;
-	  ioctl(fd, HDIO_SCAN_HWIF, &state);
-
-	inputs:		none
-
-	outputs:
-	  Current power state of the IDE bus.  One of BUSSTATE_OFF,
-	  BUSSTATE_ON, or BUSSTATE_TRISTATE
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-
-
-
-
-HDIO_SET_BUSSTATE		set the bus state of the hwif
-
-	usage:
-
-	  int state;
-	  ...
-	  ioctl(fd, HDIO_SCAN_HWIF, state);
-
-	inputs:
-	  Desired IDE power state.  One of BUSSTATE_OFF, BUSSTATE_ON,
-	  or BUSSTATE_TRISTATE
-
-	outputs:	none
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_RAWIO
-	  EOPNOTSUPP	Hardware interface does not support bus power control
-
-
-
-
-HDIO_TRISTATE_HWIF		execute a channel tristate
-
-	Not implemented, as of 2.6.8.1.  See HDIO_SET_BUSSTATE
-
-
-
-HDIO_DRIVE_RESET		execute a device reset
-
-	usage:
-
-	  int args[3]
-	  ...
-	  ioctl(fd, HDIO_DRIVE_RESET, args);
-
-	inputs:		none
-
-	outputs:	none
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  ENXIO		No such device:	phy dead or ctl_addr == 0
-	  EIO		I/O error:	reset timed out or hardware error
-
-	notes:
-
-	  Execute a reset on the device as soon as the current IO
-	  operation has completed.
-
-	  Executes an ATAPI soft reset if applicable, otherwise
-	  executes an ATA soft reset on the controller.
-
-
-
-HDIO_DRIVE_TASKFILE		execute raw taskfile
-
-	Note:  If you don't have a copy of the ANSI ATA specification
-	handy, you should probably ignore this ioctl.
-
-	Execute an ATA disk command directly by writing the "taskfile"
-	registers of the drive.  Requires ADMIN and RAWIO access
-	privileges.
-
-	usage:
-
-	  struct {
-	    ide_task_request_t req_task;
-	    u8 outbuf[OUTPUT_SIZE];
-	    u8 inbuf[INPUT_SIZE];
-	  } task;
-	  memset(&task.req_task, 0, sizeof(task.req_task));
-	  task.req_task.out_size = sizeof(task.outbuf);
-	  task.req_task.in_size = sizeof(task.inbuf);
-	  ...
-	  ioctl(fd, HDIO_DRIVE_TASKFILE, &task);
-	  ...
-
-	inputs:
-
-	  (See below for details on memory area passed to ioctl.)
-
-	  io_ports[8]	values to be written to taskfile registers
-	  hob_ports[8]	high-order bytes, for extended commands.
-	  out_flags	flags indicating which registers are valid
-	  in_flags	flags indicating which registers should be returned
-	  data_phase	see below
-	  req_cmd	command type to be executed
-	  out_size	size of output buffer
-	  outbuf	buffer of data to be transmitted to disk
-	  inbuf		buffer of data to be received from disk (see [1])
-
-	outputs:
-
-	  io_ports[]	values returned in the taskfile registers
-	  hob_ports[]	high-order bytes, for extended commands.
-	  out_flags	flags indicating which registers are valid (see [2])
-	  in_flags	flags indicating which registers should be returned
-	  outbuf	buffer of data to be transmitted to disk (see [1])
-	  inbuf		buffer of data to be received from disk
-
-	error returns:
-	  EACCES	CAP_SYS_ADMIN or CAP_SYS_RAWIO privilege not set.
-	  ENOMSG	Device is not a disk drive.
-	  ENOMEM	Unable to allocate memory for task
-	  EFAULT	req_cmd == TASKFILE_IN_OUT (not implemented as of 2.6.8)
-	  EPERM		req_cmd == TASKFILE_MULTI_OUT and drive
-	  		multi-count not yet set.
-	  EIO		Drive failed the command.
-
-	notes:
-
-	  [1] READ THE FOLLOWING NOTES *CAREFULLY*.  THIS IOCTL IS
-	  FULL OF GOTCHAS.  Extreme caution should be used with using
-	  this ioctl.  A mistake can easily corrupt data or hang the
-	  system.
-
-	  [2] Both the input and output buffers are copied from the
-	  user and written back to the user, even when not used.
-
-	  [3] If one or more bits are set in out_flags and in_flags is
-	  zero, the following values are used for in_flags.all and
-	  written back into in_flags on completion.
-
-	   * IDE_TASKFILE_STD_IN_FLAGS | (IDE_HOB_STD_IN_FLAGS << 8)
-	     if LBA48 addressing is enabled for the drive
-	   * IDE_TASKFILE_STD_IN_FLAGS
-	     if CHS/LBA28
-
-	  The association between in_flags.all and each enable
-	  bitfield flips depending on endianness; fortunately, TASKFILE
-	  only uses inflags.b.data bit and ignores all other bits.
-	  The end result is that, on any endian machines, it has no
-	  effect other than modifying in_flags on completion.
-
-	  [4] The default value of SELECT is (0xa0|DEV_bit|LBA_bit)
-	  except for four drives per port chipsets.  For four drives
-	  per port chipsets, it's (0xa0|DEV_bit|LBA_bit) for the first
-	  pair and (0x80|DEV_bit|LBA_bit) for the second pair.
-
-	  [5] The argument to the ioctl is a pointer to a region of
-	  memory containing a ide_task_request_t structure, followed
-	  by an optional buffer of data to be transmitted to the
-	  drive, followed by an optional buffer to receive data from
-	  the drive.
-
-	  Command is passed to the disk drive via the ide_task_request_t
-	  structure, which contains these fields:
-
-	    io_ports[8]		values for the taskfile registers
-	    hob_ports[8]	high-order bytes, for extended commands
-	    out_flags		flags indicating which entries in the
-	    			io_ports[] and hob_ports[] arrays
-				contain valid values.  Type ide_reg_valid_t.
-	    in_flags		flags indicating which entries in the
-	    			io_ports[] and hob_ports[] arrays
-				are expected to contain valid values
-				on return.
-	    data_phase		See below
-	    req_cmd		Command type, see below
-	    out_size		output (user->drive) buffer size, bytes
-	    in_size		input (drive->user) buffer size, bytes
-
-	  When out_flags is zero, the following registers are loaded.
-
-	    HOB_FEATURE		If the drive supports LBA48
-	    HOB_NSECTOR		If the drive supports LBA48
-	    HOB_SECTOR		If the drive supports LBA48
-	    HOB_LCYL		If the drive supports LBA48
-	    HOB_HCYL		If the drive supports LBA48
-	    FEATURE
-	    NSECTOR
-	    SECTOR
-	    LCYL
-	    HCYL
-	    SELECT		First, masked with 0xE0 if LBA48, 0xEF
-				otherwise; then, or'ed with the default
-				value of SELECT.
-
-	  If any bit in out_flags is set, the following registers are loaded.
-
-	    HOB_DATA		If out_flags.b.data is set.  HOB_DATA will
-				travel on DD8-DD15 on little endian machines
-				and on DD0-DD7 on big endian machines.
-	    DATA		If out_flags.b.data is set.  DATA will
-				travel on DD0-DD7 on little endian machines
-				and on DD8-DD15 on big endian machines.
-	    HOB_NSECTOR		If out_flags.b.nsector_hob is set
-	    HOB_SECTOR		If out_flags.b.sector_hob is set
-	    HOB_LCYL		If out_flags.b.lcyl_hob is set
-	    HOB_HCYL		If out_flags.b.hcyl_hob is set
-	    FEATURE		If out_flags.b.feature is set
-	    NSECTOR		If out_flags.b.nsector is set
-	    SECTOR		If out_flags.b.sector is set
-	    LCYL		If out_flags.b.lcyl is set
-	    HCYL		If out_flags.b.hcyl is set
-	    SELECT		Or'ed with the default value of SELECT and
-				loaded regardless of out_flags.b.select.
-
-	  Taskfile registers are read back from the drive into
-	  {io|hob}_ports[] after the command completes iff one of the
-	  following conditions is met; otherwise, the original values
-	  will be written back, unchanged.
-
-	    1. The drive fails the command (EIO).
-	    2. One or more than one bits are set in out_flags.
-	    3. The requested data_phase is TASKFILE_NO_DATA.
-
-	    HOB_DATA		If in_flags.b.data is set.  It will contain
-				DD8-DD15 on little endian machines and
-				DD0-DD7 on big endian machines.
-	    DATA		If in_flags.b.data is set.  It will contain
-				DD0-DD7 on little endian machines and
-				DD8-DD15 on big endian machines.
-	    HOB_FEATURE		If the drive supports LBA48
-	    HOB_NSECTOR		If the drive supports LBA48
-	    HOB_SECTOR		If the drive supports LBA48
-	    HOB_LCYL		If the drive supports LBA48
-	    HOB_HCYL		If the drive supports LBA48
-	    NSECTOR
-	    SECTOR
-	    LCYL
-	    HCYL
-
-	  The data_phase field describes the data transfer to be
-	  performed.  Value is one of:
-
-	    TASKFILE_IN
-	    TASKFILE_MULTI_IN
-	    TASKFILE_OUT
-	    TASKFILE_MULTI_OUT
-	    TASKFILE_IN_OUT
-	    TASKFILE_IN_DMA
-	    TASKFILE_IN_DMAQ		== IN_DMA (queueing not supported)
-	    TASKFILE_OUT_DMA
-	    TASKFILE_OUT_DMAQ		== OUT_DMA (queueing not supported)
-	    TASKFILE_P_IN		unimplemented
-	    TASKFILE_P_IN_DMA		unimplemented
-	    TASKFILE_P_IN_DMAQ		unimplemented
-	    TASKFILE_P_OUT		unimplemented
-	    TASKFILE_P_OUT_DMA		unimplemented
-	    TASKFILE_P_OUT_DMAQ		unimplemented
-
-	  The req_cmd field classifies the command type.  It may be
-	  one of:
-
-	    IDE_DRIVE_TASK_NO_DATA
-	    IDE_DRIVE_TASK_SET_XFER	unimplemented
-	    IDE_DRIVE_TASK_IN
-	    IDE_DRIVE_TASK_OUT		unimplemented
-	    IDE_DRIVE_TASK_RAW_WRITE
-
-	  [6] Do not access {in|out}_flags->all except for resetting
-	  all the bits.  Always access individual bit fields.  ->all
-	  value will flip depending on endianness.  For the same
-	  reason, do not use IDE_{TASKFILE|HOB}_STD_{OUT|IN}_FLAGS
-	  constants defined in hdreg.h.
-
-
-
-HDIO_DRIVE_CMD			execute a special drive command
-
-	Note:  If you don't have a copy of the ANSI ATA specification
-	handy, you should probably ignore this ioctl.
-
-	usage:
-
-	  u8 args[4+XFER_SIZE];
-	  ...
-	  ioctl(fd, HDIO_DRIVE_CMD, args);
-
-	inputs:
-
-	  Commands other than WIN_SMART
-	    args[0]	COMMAND
-	    args[1]	NSECTOR
-	    args[2]	FEATURE
-	    args[3]	NSECTOR
-
-	  WIN_SMART
-	    args[0]	COMMAND
-	    args[1]	SECTOR
-	    args[2]	FEATURE
-	    args[3]	NSECTOR
-
-	outputs:
-
-	  args[] buffer is filled with register values followed by any
-	  data returned by the disk.
-	    args[0]	status
-	    args[1]	error
-	    args[2]	NSECTOR
-	    args[3]	undefined
-	    args[4+]	NSECTOR * 512 bytes of data returned by the command.
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_RAWIO
-	  ENOMEM	Unable to allocate memory for task
-	  EIO		Drive reports error
-
-	notes:
-
-	  [1] For commands other than WIN_SMART, args[1] should equal
-	  args[3].  SECTOR, LCYL and HCYL are undefined.  For
-	  WIN_SMART, 0x4f and 0xc2 are loaded into LCYL and HCYL
-	  respectively.  In both cases SELECT will contain the default
-	  value for the drive.  Please refer to HDIO_DRIVE_TASKFILE
-	  notes for the default value of SELECT.
-
-	  [2] If NSECTOR value is greater than zero and the drive sets
-	  DRQ when interrupting for the command, NSECTOR * 512 bytes
-	  are read from the device into the area following NSECTOR.
-	  In the above example, the area would be
-	  args[4..4+XFER_SIZE].  16bit PIO is used regardless of
-	  HDIO_SET_32BIT setting.
-
-	  [3] If COMMAND == WIN_SETFEATURES && FEATURE == SETFEATURES_XFER
-	  && NSECTOR >= XFER_SW_DMA_0 && the drive supports any DMA
-	  mode, IDE driver will try to tune the transfer mode of the
-	  drive accordingly.
-
-
-
-HDIO_DRIVE_TASK			execute task and special drive command
-
-	Note:  If you don't have a copy of the ANSI ATA specification
-	handy, you should probably ignore this ioctl.
-
-	usage:
-
-	  u8 args[7];
-	  ...
-	  ioctl(fd, HDIO_DRIVE_TASK, args);
-
-	inputs:
-
-	  Taskfile register values:
-	    args[0]	COMMAND
-	    args[1]	FEATURE
-	    args[2]	NSECTOR
-	    args[3]	SECTOR
-	    args[4]	LCYL
-	    args[5]	HCYL
-	    args[6]	SELECT
-
-	outputs:
-
-	  Taskfile register values:
-	    args[0]	status
-	    args[1]	error
-	    args[2]	NSECTOR
-	    args[3]	SECTOR
-	    args[4]	LCYL
-	    args[5]	HCYL
-	    args[6]	SELECT
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_RAWIO
-	  ENOMEM	Unable to allocate memory for task
-	  ENOMSG	Device is not a disk drive.
-	  EIO		Drive failed the command.
-
-	notes:
-
-	  [1] DEV bit (0x10) of SELECT register is ignored and the
-	  appropriate value for the drive is used.  All other bits
-	  are used unaltered.
-
-
-
-HDIO_DRIVE_CMD_AEB		HDIO_DRIVE_TASK
-
-	Not implemented, as of 2.6.8.1
-
-
-
-HDIO_SET_32BIT			change io_32bit flags
-
-	usage:
-
-	  int val;
-	  ioctl(fd, HDIO_SET_32BIT, val);
-
-	inputs:
-	  New value for io_32bit flag
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 3]
-	  EBUSY		Controller busy
-
-
-
-
-HDIO_SET_NOWERR			change ignore-write-error flag
-
-	usage:
-
-	  int val;
-	  ioctl(fd, HDIO_SET_NOWERR, val);
-
-	inputs:
-	  New value for ignore-write-error flag.  Used for ignoring
-	  WRERR_STAT
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 1]
-	  EBUSY		Controller busy
-
-
-
-HDIO_SET_DMA			change use-dma flag
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_SET_DMA, val);
-
-	inputs:
-	  New value for use-dma flag
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 1]
-	  EBUSY		Controller busy
-
-
-
-HDIO_SET_PIO_MODE		reconfig interface to new speed
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_SET_PIO_MODE, val);
-
-	inputs:
-	  New interface speed.
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 255]
-	  EBUSY		Controller busy
-
-
-
-HDIO_SCAN_HWIF			register and (re)scan interface
-
-	usage:
-
-	  int args[3]
-	  ...
-	  ioctl(fd, HDIO_SCAN_HWIF, args);
-
-	inputs:
-	  args[0]	io address to probe
-	  args[1]	control address to probe
-	  args[2]	irq number
-
-	outputs:	none
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_RAWIO
-	  EIO		Probe failed.
-
-	notes:
-
-	  This ioctl initializes the addresses and irq for a disk
-	  controller, probes for drives, and creates /proc/ide
-	  interfaces as appropriate.
-
-
-
-HDIO_UNREGISTER_HWIF		unregister interface
-
-	usage:
-
-	  int index;
-	  ioctl(fd, HDIO_UNREGISTER_HWIF, index);
-
-	inputs:
-	  index		index of hardware interface to unregister
-
-	outputs:	none
-
-	error returns:
-	  EACCES	Access denied:  requires CAP_SYS_RAWIO
-
-	notes:
-
-	  This ioctl removes a hardware interface from the kernel.
-
-	  Currently (2.6.8) this ioctl silently fails if any drive on
-	  the interface is busy.
-
-
-
-HDIO_SET_WCACHE			change write cache enable-disable
-
-	usage:
-
-	  int val;
-	  ioctl(fd, HDIO_SET_WCACHE, val);
-
-	inputs:
-	  New value for write cache enable
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 1]
-	  EBUSY		Controller busy
-
-
-
-HDIO_SET_ACOUSTIC		change acoustic behavior
-
-	usage:
-
-	  int val;
-	  ioctl(fd, HDIO_SET_ACOUSTIC, val);
-
-	inputs:
-	  New value for drive acoustic settings
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 254]
-	  EBUSY		Controller busy
-
-
-
-HDIO_SET_QDMA			change use-qdma flag
-
-	Not implemented, as of 2.6.8.1
-
-
-
-HDIO_SET_ADDRESS		change lba addressing modes
-
-	usage:
-
-	  int val;
-	  ioctl(fd, HDIO_SET_ADDRESS, val);
-
-	inputs:
-	  New value for addressing mode
-	    0 = 28-bit
-	    1 = 48-bit
-	    2 = 48-bit doing 28-bit
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 2]
-	  EBUSY		Controller busy
-	  EIO		Drive does not support lba48 mode.
-
-
-HDIO_SET_IDE_SCSI
-
-	usage:
-
-	  long val;
-	  ioctl(fd, HDIO_SET_IDE_SCSI, val);
-
-	inputs:
-	  New value for scsi emulation mode (?)
-
-	outputs:	none
-
-	error return:
-	  EINVAL	(bdev != bdev->bd_contains) (not sure what this means)
-	  EACCES	Access denied:  requires CAP_SYS_ADMIN
-	  EINVAL	value out of range [0 1]
-	  EBUSY		Controller busy
-
-
-
-HDIO_SET_SCSI_IDE
-
-	Not implemented, as of 2.6.8.1
-
-
diff --git a/Documentation/ioctl/index.rst b/Documentation/ioctl/index.rst
new file mode 100644
index 000000000000..1a6f437566e3
--- /dev/null
+++ b/Documentation/ioctl/index.rst
@@ -0,0 +1,16 @@
+:orphan:
+
+======
+IOCTLs
+======
+
+.. toctree::
+   :maxdepth: 1
+
+   ioctl-number
+
+   botching-up-ioctls
+   ioctl-decoding
+
+   cdrom
+   hdio
diff --git a/Documentation/ioctl/ioctl-decoding.rst b/Documentation/ioctl/ioctl-decoding.rst
new file mode 100644
index 000000000000..380d6bb3e3ea
--- /dev/null
+++ b/Documentation/ioctl/ioctl-decoding.rst
@@ -0,0 +1,31 @@
+==============================
+Decoding an IOCTL Magic Number
+==============================
+
+To decode a hex IOCTL code:
+
+Most architectures use this generic format, but check
+include/ARCH/ioctl.h for specifics, e.g. powerpc
+uses 3 bits to encode read/write and 13 bits for size.
+
+ ====== ==================================
+ bits   meaning
+ ====== ==================================
+ 31-30	00 - no parameters: uses _IO macro
+	10 - read: _IOR
+	01 - write: _IOW
+	11 - read/write: _IOWR
+
+ 29-16	size of arguments
+
+ 15-8	ascii character supposedly
+	unique to each driver
+
+ 7-0	function #
+ ====== ==================================
+
+
+So for example 0x82187201 is a read with arg length of 0x218,
+character 'r' function 1. Grepping the source reveals this is::
+
+	#define VFAT_IOCTL_READDIR_BOTH         _IOR('r', 1, struct dirent [2])
diff --git a/Documentation/ioctl/ioctl-decoding.txt b/Documentation/ioctl/ioctl-decoding.txt
deleted file mode 100644
index e35efb0cec2e..000000000000
--- a/Documentation/ioctl/ioctl-decoding.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-To decode a hex IOCTL code:
-
-Most architectures use this generic format, but check
-include/ARCH/ioctl.h for specifics, e.g. powerpc
-uses 3 bits to encode read/write and 13 bits for size.
-
- bits    meaning
- 31-30	00 - no parameters: uses _IO macro
-	10 - read: _IOR
-	01 - write: _IOW
-	11 - read/write: _IOWR
-
- 29-16	size of arguments
-
- 15-8	ascii character supposedly
-	unique to each driver
-
- 7-0	function #
-
-
-So for example 0x82187201 is a read with arg length of 0x218,
-character 'r' function 1. Grepping the source reveals this is:
-
-#define VFAT_IOCTL_READDIR_BOTH         _IOR('r', 1, struct dirent [2])
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 2263e3ddd822..42fb0d6d6034 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -730,7 +730,7 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  *     };
  *
  * Please make sure that you follow all the best practices from
- * ``Documentation/ioctl/botching-up-ioctls.txt``. Note that drm_ioctl()
+ * ``Documentation/ioctl/botching-up-ioctls.rst``. Note that drm_ioctl()
  * automatically zero-extends structures, hence make sure you can add more stuff
  * at the end, i.e. don't put a variable sized array there.
  *
-- 
cgit v1.2.3-55-g7522
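
The bit layout tabulated in ioctl-decoding.rst in the patch above can be sketched in C as follows. This is only an illustration of the generic encoding, assuming the 2/14/8/8 split from that table; architectures such as powerpc use a different split. The names `ioctl_fields`, `decode_ioctl`, `dir_name` and `print_ioctl` are hypothetical helpers, not kernel interfaces. Real code would normally use the `_IOC_DIR()`, `_IOC_SIZE()`, `_IOC_TYPE()` and `_IOC_NR()` macros from the ioctl headers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the four fields of a generic ioctl code. */
struct ioctl_fields {
	unsigned int dir;   /* bits 31-30: 0 = none, 1 = write, 2 = read, 3 = both */
	unsigned int size;  /* bits 29-16: size of the argument in bytes */
	unsigned char type; /* bits 15-8: ASCII "magic" character */
	unsigned int nr;    /* bits 7-0: function number */
};

/* Split a 32-bit ioctl code according to the generic layout (assumed). */
static struct ioctl_fields decode_ioctl(uint32_t code)
{
	struct ioctl_fields f;

	f.nr = code & 0xff;
	f.type = (unsigned char)((code >> 8) & 0xff);
	f.size = (code >> 16) & 0x3fff;
	f.dir = (code >> 30) & 0x3;
	return f;
}

/* Map the direction bits to the macro family that produces them. */
static const char *dir_name(unsigned int dir)
{
	switch (dir & 0x3) {
	case 0:
		return "_IO";
	case 1:
		return "_IOW";
	case 2:
		return "_IOR";
	default:
		return "_IOWR";
	}
}

/* Print the decoded fields of one ioctl code on a single line. */
static void print_ioctl(uint32_t code)
{
	struct ioctl_fields f = decode_ioctl(code);

	printf("0x%08x: %s, size 0x%x, type '%c', nr %u\n",
	       (unsigned int)code, dir_name(f.dir), f.size, f.type, f.nr);
}
```

Passing the document's example value, `decode_ioctl(0x82187201)`, yields direction 2 (`_IOR`), size 0x218, type 'r' and function number 1, which matches the `VFAT_IOCTL_READDIR_BOTH` definition quoted in the text.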


From e0ae154404c33477473244f286b1193364144289 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 16:49:39 -0300
Subject: docs: rapidio: convert to ReST

Rename the rapidio documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/rapidio/index.rst      |  15 ++
 Documentation/rapidio/mport_cdev.rst | 110 +++++++++++
 Documentation/rapidio/mport_cdev.txt | 107 -----------
 Documentation/rapidio/rapidio.rst    | 362 +++++++++++++++++++++++++++++++++++
 Documentation/rapidio/rapidio.txt    | 351 ---------------------------------
 Documentation/rapidio/rio_cm.rst     | 135 +++++++++++++
 Documentation/rapidio/rio_cm.txt     | 119 ------------
 Documentation/rapidio/sysfs.rst      |   7 +
 Documentation/rapidio/sysfs.txt      |   3 -
 Documentation/rapidio/tsi721.rst     | 112 +++++++++++
 Documentation/rapidio/tsi721.txt     |  97 ----------
 drivers/rapidio/Kconfig              |   2 +-
 12 files changed, 742 insertions(+), 678 deletions(-)
 create mode 100644 Documentation/rapidio/index.rst
 create mode 100644 Documentation/rapidio/mport_cdev.rst
 delete mode 100644 Documentation/rapidio/mport_cdev.txt
 create mode 100644 Documentation/rapidio/rapidio.rst
 delete mode 100644 Documentation/rapidio/rapidio.txt
 create mode 100644 Documentation/rapidio/rio_cm.rst
 delete mode 100644 Documentation/rapidio/rio_cm.txt
 create mode 100644 Documentation/rapidio/sysfs.rst
 delete mode 100644 Documentation/rapidio/sysfs.txt
 create mode 100644 Documentation/rapidio/tsi721.rst
 delete mode 100644 Documentation/rapidio/tsi721.txt

diff --git a/Documentation/rapidio/index.rst b/Documentation/rapidio/index.rst
new file mode 100644
index 000000000000..ab7b5541b346
--- /dev/null
+++ b/Documentation/rapidio/index.rst
@@ -0,0 +1,15 @@
+:orphan:
+
+===========================
+The Linux RapidIO Subsystem
+===========================
+
+.. toctree::
+   :maxdepth: 1
+
+   rapidio
+   sysfs
+
+   tsi721
+   mport_cdev
+   rio_cm
diff --git a/Documentation/rapidio/mport_cdev.rst b/Documentation/rapidio/mport_cdev.rst
new file mode 100644
index 000000000000..df77a7f7be7d
--- /dev/null
+++ b/Documentation/rapidio/mport_cdev.rst
@@ -0,0 +1,110 @@
+==================================================================
+RapidIO subsystem mport character device driver (rio_mport_cdev.c)
+==================================================================
+
+1. Overview
+===========
+
+This device driver is the result of collaboration within the RapidIO.org
+Software Task Group (STG) between Texas Instruments, Freescale,
+Prodrive Technologies, Nokia Networks, BAE and IDT.  Additional input was
+received from other members of RapidIO.org. The objective was to create a
+character mode driver interface which exposes the capabilities of RapidIO
+devices directly to applications, in a manner that allows the numerous and
+varied RapidIO implementations to interoperate.
+
+This driver (MPORT_CDEV) provides access to basic RapidIO subsystem operations
+for user-space applications. Most RapidIO operations are supported through
+'ioctl' system calls.
+
+When loaded, this device driver creates filesystem nodes named rio_mportX in the
+/dev directory for each registered RapidIO mport device. 'X' in the node name
+matches the unique port ID assigned to each local mport device.
+
+Using the available set of ioctl commands, user-space applications can perform
+the following RapidIO bus and subsystem operations:
+
+- Reads and writes from/to configuration registers of mport devices
+  (RIO_MPORT_MAINT_READ_LOCAL/RIO_MPORT_MAINT_WRITE_LOCAL)
+- Reads and writes from/to configuration registers of remote RapidIO devices.
+  These operations are defined as RapidIO Maintenance reads/writes in the RIO spec.
+  (RIO_MPORT_MAINT_READ_REMOTE/RIO_MPORT_MAINT_WRITE_REMOTE)
+- Set RapidIO Destination ID for mport devices (RIO_MPORT_MAINT_HDID_SET)
+- Set RapidIO Component Tag for mport devices (RIO_MPORT_MAINT_COMPTAG_SET)
+- Query logical index of mport devices (RIO_MPORT_MAINT_PORT_IDX_GET)
+- Query capabilities and RapidIO link configuration of mport devices
+  (RIO_MPORT_GET_PROPERTIES)
+- Enable/Disable reporting of RapidIO doorbell events to user-space applications
+  (RIO_ENABLE_DOORBELL_RANGE/RIO_DISABLE_DOORBELL_RANGE)
+- Enable/Disable reporting of RIO port-write events to user-space applications
+  (RIO_ENABLE_PORTWRITE_RANGE/RIO_DISABLE_PORTWRITE_RANGE)
+- Query/Control type of events reported through this driver: doorbells,
+  port-writes or both (RIO_SET_EVENT_MASK/RIO_GET_EVENT_MASK)
+- Configure/Map mport's outbound requests window(s) for specific size,
+  RapidIO destination ID, hopcount and request type
+  (RIO_MAP_OUTBOUND/RIO_UNMAP_OUTBOUND)
+- Configure/Map mport's inbound requests window(s) for specific size,
+  RapidIO base address and local memory base address
+  (RIO_MAP_INBOUND/RIO_UNMAP_INBOUND)
+- Allocate/Free contiguous DMA coherent memory buffer for DMA data transfers
+  to/from remote RapidIO devices (RIO_ALLOC_DMA/RIO_FREE_DMA)
+- Initiate DMA data transfers to/from remote RapidIO devices (RIO_TRANSFER).
+  Supports blocking, asynchronous and posted (a.k.a 'fire-and-forget') data
+  transfer modes.
+- Check/Wait for completion of asynchronous DMA data transfer
+  (RIO_WAIT_FOR_ASYNC)
+- Manage device objects supported by RapidIO subsystem (RIO_DEV_ADD/RIO_DEV_DEL).
+  This allows implementation of various RapidIO fabric enumeration algorithms
+  as user-space applications while using remaining functionality provided by
+  kernel RapidIO subsystem.
+
+2. Hardware Compatibility
+=========================
+
+This device driver uses standard interfaces defined by the kernel RapidIO
+subsystem, so it can be used with any mport device driver registered by the
+RapidIO subsystem, within the limits of the available mport implementation.
+
+At this moment, the most common limitation is the availability of a
+RapidIO-specific DMA engine framework for a given mport device. Users should
+verify what their platform supports when planning to use this driver:
+
+- IDT Tsi721 PCIe-to-RapidIO bridge device and its mport device driver are fully
+  compatible with this driver.
+- The Freescale SoC 'fsl_rio' mport driver does not implement RapidIO-specific
+  DMA engine support, and therefore DMA data transfers through the mport_cdev
+  driver are not available.
+
+3. Module parameters
+====================
+
+- 'dma_timeout'
+      - DMA transfer completion timeout (in msec, default value 3000).
+        This parameter sets a maximum completion wait time for SYNC mode DMA
+        transfer requests and for RIO_WAIT_FOR_ASYNC ioctl requests.
+
+- 'dbg_level'
+      - This parameter controls the amount of debug information
+        generated by this device driver. This parameter is formed by a set of
+        bit masks that correspond to the specific functional blocks.
+        For mask definitions see 'drivers/rapidio/devices/rio_mport_cdev.c'.
+        This parameter can be changed dynamically.
+        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
+
+4. Known problems
+=================
+
+  None.
+
+5. User-space Applications and API
+==================================
+
+API library and applications that use this device driver are available from
+RapidIO.org.
+
+6. TODO List
+============
+
+- Add support for sending/receiving "raw" RapidIO messaging packets.
+- Add memory mapped DMA data transfers as an option when RapidIO-specific DMA
+  is not available.
diff --git a/Documentation/rapidio/mport_cdev.txt b/Documentation/rapidio/mport_cdev.txt
deleted file mode 100644
index a53f786ee2e9..000000000000
--- a/Documentation/rapidio/mport_cdev.txt
+++ /dev/null
@@ -1,107 +0,0 @@
-RapidIO subsystem mport character device driver (rio_mport_cdev.c)
-==================================================================
-
-Version History:
-----------------
-  1.0.0 - Initial driver release.
-
-==================================================================
-
-I. Overview
-
-This device driver is the result of collaboration within the RapidIO.org
-Software Task Group (STG) between Texas Instruments, Freescale,
-Prodrive Technologies, Nokia Networks, BAE and IDT.  Additional input was
-received from other members of RapidIO.org. The objective was to create a
-character mode driver interface which exposes the capabilities of RapidIO
-devices directly to applications, in a manner that allows the numerous and
-varied RapidIO implementations to interoperate.
-
-This driver (MPORT_CDEV) provides access to basic RapidIO subsystem operations
-for user-space applications. Most of RapidIO operations are supported through
-'ioctl' system calls.
-
-When loaded this device driver creates filesystem nodes named rio_mportX in /dev
-directory for each registered RapidIO mport device. 'X' in the node name matches
-to unique port ID assigned to each local mport device.
-
-Using available set of ioctl commands user-space applications can perform
-following RapidIO bus and subsystem operations:
-
-- Reads and writes from/to configuration registers of mport devices
-    (RIO_MPORT_MAINT_READ_LOCAL/RIO_MPORT_MAINT_WRITE_LOCAL)
-- Reads and writes from/to configuration registers of remote RapidIO devices.
-  This operations are defined as RapidIO Maintenance reads/writes in RIO spec.
-    (RIO_MPORT_MAINT_READ_REMOTE/RIO_MPORT_MAINT_WRITE_REMOTE)
-- Set RapidIO Destination ID for mport devices (RIO_MPORT_MAINT_HDID_SET)
-- Set RapidIO Component Tag for mport devices (RIO_MPORT_MAINT_COMPTAG_SET)
-- Query logical index of mport devices (RIO_MPORT_MAINT_PORT_IDX_GET)
-- Query capabilities and RapidIO link configuration of mport devices
-    (RIO_MPORT_GET_PROPERTIES)
-- Enable/Disable reporting of RapidIO doorbell events to user-space applications
-    (RIO_ENABLE_DOORBELL_RANGE/RIO_DISABLE_DOORBELL_RANGE)
-- Enable/Disable reporting of RIO port-write events to user-space applications
-    (RIO_ENABLE_PORTWRITE_RANGE/RIO_DISABLE_PORTWRITE_RANGE)
-- Query/Control type of events reported through this driver: doorbells,
-  port-writes or both (RIO_SET_EVENT_MASK/RIO_GET_EVENT_MASK)
-- Configure/Map mport's outbound requests window(s) for specific size,
-  RapidIO destination ID, hopcount and request type
-    (RIO_MAP_OUTBOUND/RIO_UNMAP_OUTBOUND)
-- Configure/Map mport's inbound requests window(s) for specific size,
-  RapidIO base address and local memory base address
-    (RIO_MAP_INBOUND/RIO_UNMAP_INBOUND)
-- Allocate/Free contiguous DMA coherent memory buffer for DMA data transfers
-  to/from remote RapidIO devices (RIO_ALLOC_DMA/RIO_FREE_DMA)
-- Initiate DMA data transfers to/from remote RapidIO devices (RIO_TRANSFER).
-  Supports blocking, asynchronous and posted (a.k.a 'fire-and-forget') data
-  transfer modes.
-- Check/Wait for completion of asynchronous DMA data transfer
-    (RIO_WAIT_FOR_ASYNC)
-- Manage device objects supported by RapidIO subsystem (RIO_DEV_ADD/RIO_DEV_DEL).
-  This allows implementation of various RapidIO fabric enumeration algorithms
-  as user-space applications while using remaining functionality provided by
-  kernel RapidIO subsystem.
-
-II. Hardware Compatibility
-
-This device driver uses standard interfaces defined by kernel RapidIO subsystem
-and therefore it can be used with any mport device driver registered by RapidIO
-subsystem with limitations set by available mport implementation.
-
-At this moment the most common limitation is availability of RapidIO-specific
-DMA engine framework for specific mport device. Users should verify available
-functionality of their platform when planning to use this driver:
-
-- IDT Tsi721 PCIe-to-RapidIO bridge device and its mport device driver are fully
-  compatible with this driver.
-- Freescale SoCs 'fsl_rio' mport driver does not have implementation for RapidIO
-  specific DMA engine support and therefore DMA data transfers mport_cdev driver
-  are not available.
-
-III. Module parameters
-
-- 'dma_timeout' - DMA transfer completion timeout (in msec, default value 3000).
-        This parameter set a maximum completion wait time for SYNC mode DMA
-        transfer requests and for RIO_WAIT_FOR_ASYNC ioctl requests.
-
-- 'dbg_level' - This parameter allows to control amount of debug information
-        generated by this device driver. This parameter is formed by set of
-        bit masks that correspond to the specific functional blocks.
-        For mask definitions see 'drivers/rapidio/devices/rio_mport_cdev.c'
-        This parameter can be changed dynamically.
-        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
-
-IV. Known problems
-
-  None.
-
-V. User-space Applications and API
-
-API library and applications that use this device driver are available from
-RapidIO.org.
-
-VI. TODO List
-
-- Add support for sending/receiving "raw" RapidIO messaging packets.
-- Add memory mapped DMA data transfers as an option when RapidIO-specific DMA
-  is not available.
diff --git a/Documentation/rapidio/rapidio.rst b/Documentation/rapidio/rapidio.rst
new file mode 100644
index 000000000000..fb8942d3ba85
--- /dev/null
+++ b/Documentation/rapidio/rapidio.rst
@@ -0,0 +1,362 @@
+============
+Introduction
+============
+
+The RapidIO standard is a packet-based fabric interconnect standard designed for
+use in embedded systems. Development of the RapidIO standard is directed by the
+RapidIO Trade Association (RTA). The current version of the RapidIO specification
+is publicly available for download from the RTA web-site [1].
+
+This document describes the basics of the Linux RapidIO subsystem and provides
+information on its major components.
+
+1. Overview
+===========
+
+Because the RapidIO subsystem follows the Linux device model it is integrated
+into the kernel similarly to other buses by defining RapidIO-specific device and
+bus types and registering them within the device model.
+
+The Linux RapidIO subsystem is architecture independent and therefore defines
+architecture-specific interfaces that provide support for common RapidIO
+subsystem operations.
+
+2. Core Components
+==================
+
+A typical RapidIO network is a combination of endpoints and switches.
+Each of these components is represented in the subsystem by an associated data
+structure. The core logical components of the RapidIO subsystem are defined
+in the include/linux/rio.h file.
+
+2.1 Master Port
+---------------
+
+A master port (or mport) is a RapidIO interface controller that is local to the
+processor executing the Linux code. A master port generates and receives RapidIO
+packets (transactions). In the RapidIO subsystem each master port is represented
+by a rio_mport data structure. This structure contains master port specific
+resources such as mailboxes and doorbells. The rio_mport also includes a unique
+host device ID that is valid when a master port is configured as an enumerating
+host.
+
+RapidIO master ports are serviced by subsystem-specific mport device drivers
+that provide functionality defined for this subsystem. To provide a
+hardware-independent interface for RapidIO subsystem operations, the rio_mport
+structure includes a rio_ops data structure, which contains pointers to
+hardware-specific implementations of RapidIO functions.
+
+2.2 Device
+----------
+
+A RapidIO device is any endpoint (other than mport) or switch in the network.
+All devices are represented in the RapidIO subsystem by a corresponding
+rio_dev data structure. Devices form one global device list and per-network
+device lists (depending on the number of available mports and networks).
+
+2.3 Switch
+----------
+
+A RapidIO switch is a special class of device that routes packets between its
+ports towards their final destination. The packet destination port within a
+switch is defined by an internal routing table. A switch is represented in the
+RapidIO subsystem by a rio_dev data structure extended by an additional
+rio_switch data structure, which contains switch-specific information such as
+a copy of the routing table and pointers to switch-specific functions.
+
+The RapidIO subsystem defines the format and initialization method for subsystem
+specific switch drivers that are designed to provide hardware-specific
+implementation of common switch management routines.
+
+2.4 Network
+-----------
+
+A RapidIO network is a combination of interconnected endpoint and switch devices.
+Each RapidIO network known to the system is represented by a corresponding
+rio_net data structure. This structure includes lists of all devices and local master
+ports that form the same network. It also contains a pointer to the default
+master port that is used to communicate with devices within the network.
+
+2.5 Device Drivers
+------------------
+
+RapidIO device-specific drivers follow the Linux Kernel Driver Model and are
+intended to support specific RapidIO devices attached to the RapidIO network.
+
+2.6 Subsystem Interfaces
+------------------------
+
+The RapidIO interconnect specification defines features that may be used to
+provide one or more common service layers for all participating RapidIO
+devices. These common services may act separately from device-specific drivers
+or be used by device-specific drivers. An example of such a service provider is
+the RIONET driver, which implements an Ethernet-over-RapidIO interface. Because
+only one driver can be registered for a device, all common RapidIO services
+have to be registered as subsystem interfaces. This allows multiple common
+services to be attached to the same device without blocking attachment of a
+device-specific driver.
+
+3. Subsystem Initialization
+===========================
+
+In order to initialize the RapidIO subsystem, a platform must initialize and
+register at least one master port within the RapidIO network. To register an
+mport with the subsystem, the controller driver's initialization code calls
+rio_register_mport() for each available master port.
+
+After all active master ports are registered with the RapidIO subsystem,
+an enumeration and/or discovery routine may be called automatically or
+by a user-space command.
+
+The RapidIO subsystem can be built as a statically linked or modular component
+of the kernel (see details below).
+
+4. Enumeration and Discovery
+============================
+
+4.1 Overview
+------------
+
+RapidIO subsystem configuration options allow users to build enumeration and
+discovery methods as statically linked components or loadable modules.
+An enumeration/discovery method implementation and available input parameters
+define how any given method can be attached to available RapidIO mports:
+simply to all available mports OR individually to the specified mport device.
+
+Depending on selected enumeration/discovery build configuration, there are
+several methods to initiate an enumeration and/or discovery process:
+
+  (a) A statically linked enumeration and discovery process can be started
+  automatically during kernel initialization using the corresponding module
+  parameters. This was the original method used since the introduction of the
+  RapidIO subsystem. This method now relies on the enumerator module
+  parameter, which is 'rio-scan.scan' for the existing basic
+  enumeration/discovery method.
+  When automatic start of enumeration/discovery is used, a user has to ensure
+  that all discovering endpoints are started before the enumerating endpoint
+  and are waiting for enumeration to be completed.
+  The configuration option CONFIG_RAPIDIO_DISC_TIMEOUT defines the time that a
+  discovering endpoint waits for enumeration to be completed. If the specified
+  timeout expires, the discovery process is terminated without obtaining
+  RapidIO network information. NOTE: a timed-out discovery process may be
+  restarted later using a user-space command as described below (if the given
+  endpoint was enumerated successfully).
+
+  (b) A statically linked enumeration and discovery process can be started by
+  a command from user space. This initiation method provides more flexibility
+  for system startup compared to option (a) above. After all participating
+  endpoints have been successfully booted, an enumeration process shall be
+  started first by issuing a user-space command. After enumeration is
+  completed, a discovery process can be started on all remaining endpoints.
+
+  (c) A modular enumeration and discovery process can be started by a command
+  from user space. After an enumeration/discovery module is loaded, a network
+  scan process can be started by issuing a user-space command.
+  Similar to option (b) above, an enumerator has to be started first.
+
+  (d) A modular enumeration and discovery process can be started by a module
+  initialization routine. In this case an enumerating module shall be loaded
+  first.
+
+When a network scan process is started it calls an enumeration or discovery
+routine depending on the configured role of a master port: host or agent.
+
+Enumeration is performed by a master port if it is configured as a host port by
+assigning a host destination ID greater than or equal to zero. The host
+destination ID can be assigned to a master port using various methods depending
+on RapidIO subsystem build configuration:
+
+  (a) For a statically linked RapidIO subsystem core use command line parameter
+  "rapidio.hdid=" with a list of destination ID assignments in order of mport
+  device registration. For example, in a system with two RapidIO controllers
+  the command line parameter "rapidio.hdid=-1,7" will result in assignment of
+  the host destination ID=7 to the second RapidIO controller, while the first
+  one will be assigned destination ID=-1.
+
+  (b) If the RapidIO subsystem core is built as a loadable module, in addition
+  to the method shown above, the host destination ID(s) can be specified using
+  traditional methods of passing module parameter "hdid=" during its loading:
+
+  - from command line: "modprobe rapidio hdid=-1,7", or
+  - from modprobe configuration file using configuration command "options",
+    like in this example: "options rapidio hdid=-1,7". An example of modprobe
+    configuration file is provided in the section below.
+
+NOTES:
+  (i) if the "hdid=" parameter is omitted, all available mports will be
+  assigned destination ID = -1;
+
+  (ii) the "hdid=" parameter in systems with multiple mports can have
+  destination ID assignments omitted from the end of list (default = -1).
+
+If the host device ID for a specific master port is set to -1, the discovery
+process will be performed for it.
+
+The enumeration and discovery routines use RapidIO maintenance transactions
+to access the configuration space of devices.
+
+NOTE: If RapidIO switch-specific device drivers are built as loadable modules,
+they must be loaded before the enumeration/discovery process starts.
+This requirement is caused by the fact that enumeration/discovery methods
+invoke vendor-specific callbacks at early stages.
+
+4.2 Automatic Start of Enumeration and Discovery
+------------------------------------------------
+
+The automatic enumeration/discovery start method is applicable only to the
+built-in enumeration/discovery RapidIO configuration selection. To enable
+automatic enumeration/discovery start by the existing basic enumeration method,
+use the boot command line parameter "rio-scan.scan=1".
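+
+As an illustration only, the two boot parameters described in this document
+could be combined on the kernel command line of an enumerating host with a
+single mport, assuming both the RapidIO core and the basic enumerator are
+built in. The destination ID value 0 is an arbitrary assumption; discovering
+endpoints would omit "rapidio.hdid=" and keep the default of -1::
+
+  rapidio.hdid=0 rio-scan.scan=1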
+
+This configuration requires synchronized start of all RapidIO endpoints that
+form a network which will be enumerated/discovered. Discovering endpoints have
+to be started before an enumeration starts to ensure that all RapidIO
+controllers have been initialized and are ready to be discovered. Configuration
+parameter CONFIG_RAPIDIO_DISC_TIMEOUT defines time (in seconds) which
+a discovering endpoint will wait for enumeration to be completed.
+
+When automatic enumeration/discovery start is selected, the basic method's
+initialization routine calls rio_init_mports() to perform enumeration or
+discovery for all known mport devices.
+
+Depending on RapidIO network size and configuration this automatic
+enumeration/discovery start method may be difficult to use due to the
+requirement for synchronized start of all endpoints.
+
+4.3 User-space Start of Enumeration and Discovery
+-------------------------------------------------
+
+User-space start of enumeration and discovery can be used with built-in and
+modular build configurations. For a user-space controlled start, the RapidIO
+subsystem creates the write-only sysfs attribute file '/sys/bus/rapidio/scan'.
+To initiate an enumeration or discovery process on a specific mport device, a
+user needs to write the mport_ID (not the RapidIO destination ID) into that
+file. The mport_ID is a sequential number (0 ... RIO_MAX_MPORTS) assigned
+during mport device registration. For example, on a machine with a single
+RapidIO controller, the mport_ID for that controller will always be 0.
+
+To initiate RapidIO enumeration/discovery on all available mports a user may
+write '-1' (or RIO_MPORT_ANY) into the scan attribute file.
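+
+A minimal sketch of these writes from a shell, assuming the scan attribute
+file is present and the commands are run with sufficient privileges to write
+to it::
+
+  # Start enumeration/discovery on the mport registered with mport_ID 0
+  echo 0 > /sys/bus/rapidio/scan
+
+  # Start enumeration/discovery on all available mports
+  echo -1 > /sys/bus/rapidio/scan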
+
+4.4 Basic Enumeration Method
+----------------------------
+
+This is the original enumeration/discovery method, which has been available
+since the first release of the RapidIO subsystem code. The enumeration process
+is implemented according to the enumeration algorithm outlined in the RapidIO
+Interconnect Specification: Annex I [1].
+
+This method can be configured as a statically linked or loadable module.
+The method's single parameter "scan" allows triggering the
+enumeration/discovery process from the module initialization routine.
+
+This enumeration/discovery method can be started only once and does not support
+unloading if it is built as a module.
+
+The enumeration process traverses the network using a recursive depth-first
+algorithm. When a new device is found, the enumerator takes ownership of that
+device by writing into the Host Device ID Lock CSR. It does this to ensure that
+the enumerator has exclusive right to enumerate the device. If device ownership
+is successfully acquired, the enumerator allocates a new rio_dev structure and
+initializes it according to device capabilities.
+
+If the device is an endpoint, a unique device ID is assigned to it and its value
+is written into the device's Base Device ID CSR.
+
+If the device is a switch, the enumerator allocates an additional rio_switch
+structure to store switch specific information. Then the switch's vendor ID and
+device ID are queried against a table of known RapidIO switches. Each switch
+table entry contains a pointer to a switch-specific initialization routine that
+initializes pointers to the rest of switch specific operations, and performs
+hardware initialization if necessary. A RapidIO switch does not have a unique
+device ID; it relies on hopcount and routing for the device ID of an attached
+endpoint if access to its configuration registers is required. If a switch (or
+chain of switches) does not have any endpoint (except the enumerator) attached
+to it, a fake device ID will be assigned to configure a route to that switch.
+In the case of a chain of switches without an endpoint, one fake device ID is
+used to configure a route through the entire chain and switches are
+differentiated by their hopcount value.
+
+For both endpoints and switches the enumerator writes a unique component tag
+into the device's Component Tag CSR. That unique value is used by the error
+management notification mechanism to identify a device that is reporting an
+error management event.
+
+Enumeration beyond a switch is completed by iterating over each active egress
+port of that switch. For each active link, a route to a default device ID
+(0xFF for 8-bit systems and 0xFFFF for 16-bit systems) is temporarily written
+into the routing table. The algorithm recurs by calling itself with hopcount + 1
+and the default device ID in order to access the device on the active port.
+
+After the host has completed enumeration of the entire network it releases
+devices by clearing device ID locks (calls rio_clear_locks()). For each endpoint
+in the system, it sets the Discovered bit in the Port General Control CSR
+to indicate that enumeration is completed and agents are allowed to execute
+passive discovery of the network.
+
+The discovery process is performed by agents and is similar to the enumeration
+process that is described above. However, the discovery process is performed
+without changes to the existing routing because agents only gather information
+about RapidIO network structure and are building an internal map of discovered
+devices. This way each Linux-based component of the RapidIO subsystem has
+a complete view of the network. The discovery process can be performed
+simultaneously by several agents. After initializing its RapidIO master port
+each agent waits for enumeration completion by the host for the configured wait
+time period. If this wait time period expires before enumeration is completed,
+an agent skips RapidIO discovery and continues with remaining kernel
+initialization.
+
+4.5 Adding New Enumeration/Discovery Method
+-------------------------------------------
+
+The RapidIO subsystem code organization allows addition of new
+enumeration/discovery methods as new configuration options without significant
+impact on the core RapidIO code.
+
+A new enumeration/discovery method has to be attached to one or more mport
+devices before an enumeration/discovery process can be started. Normally, the
+method's module initialization routine calls rio_register_scan() to attach
+an enumerator to a specified mport device (or devices). The basic enumerator
+implementation demonstrates this process.
+
+4.6 Using Loadable RapidIO Switch Drivers
+-----------------------------------------
+
+In the case when RapidIO switch drivers are built as loadable modules a user
+must ensure that they are loaded before the enumeration/discovery starts.
+This process can be automated by specifying pre- or post- dependencies in the
+RapidIO-specific modprobe configuration file as shown in the example below.
+
+File /etc/modprobe.d/rapidio.conf::
+
+  # Configure RapidIO subsystem modules
+
+  # Set enumerator host destination ID (overrides kernel command line option)
+  options rapidio hdid=-1,2
+
+  # Load RapidIO switch drivers immediately after rapidio core module was loaded
+  softdep rapidio post: idt_gen2 idtcps tsi57x
+
+  # OR :
+
+  # Load RapidIO switch drivers just before rio-scan enumerator module is loaded
+  softdep rio-scan pre: idt_gen2 idtcps tsi57x
+
+NOTE:
+  In the example above, one of "softdep" commands must be removed or
+  commented out to keep required module loading sequence.
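+
+With such a configuration file in place, a modular start could look like the
+following sketch. It assumes that the RapidIO core and the basic enumerator
+are both built as modules, and it uses only the "scan" module parameter and
+the scan attribute file described in section 4. Only one of the two variants
+is needed::
+
+  # Variant 1: start enumeration/discovery from the module initialization
+  # routine
+  modprobe rio-scan scan=1
+
+  # Variant 2: load the enumerator first, then trigger the scan on all
+  # available mports from user space
+  modprobe rio-scan
+  echo -1 > /sys/bus/rapidio/scan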
+
+5. References
+=============
+
+[1] RapidIO Trade Association. RapidIO Interconnect Specifications.
+    http://www.rapidio.org.
+
+[2] RapidIO TA. Technology Comparisons.
+    http://www.rapidio.org/education/technology_comparisons/
+
+[3] RapidIO support for Linux.
+    http://lwn.net/Articles/139118/
+
+[4] Matt Porter. RapidIO for Linux. Ottawa Linux Symposium, 2005
+    http://www.kernel.org/doc/ols/2005/ols2005v2-pages-43-56.pdf
diff --git a/Documentation/rapidio/rapidio.txt b/Documentation/rapidio/rapidio.txt
deleted file mode 100644
index 28fbd877f85a..000000000000
--- a/Documentation/rapidio/rapidio.txt
+++ /dev/null
@@ -1,351 +0,0 @@
-                          The Linux RapidIO Subsystem
-
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The RapidIO standard is a packet-based fabric interconnect standard designed for
-use in embedded systems. Development of the RapidIO standard is directed by the
-RapidIO Trade Association (RTA). The current version of the RapidIO specification
-is publicly available for download from the RTA web-site [1].
-
-This document describes the basics of the Linux RapidIO subsystem and provides
-information on its major components.
-
-1 Overview
-----------
-
-Because the RapidIO subsystem follows the Linux device model it is integrated
-into the kernel similarly to other buses by defining RapidIO-specific device and
-bus types and registering them within the device model.
-
-The Linux RapidIO subsystem is architecture independent and therefore defines
-architecture-specific interfaces that provide support for common RapidIO
-subsystem operations.
-
-2. Core Components
-------------------
-
-A typical RapidIO network is a combination of endpoints and switches.
-Each of these components is represented in the subsystem by an associated data
-structure. The core logical components of the RapidIO subsystem are defined
-in include/linux/rio.h file.
-
-2.1 Master Port
-
-A master port (or mport) is a RapidIO interface controller that is local to the
-processor executing the Linux code. A master port generates and receives RapidIO
-packets (transactions). In the RapidIO subsystem each master port is represented
-by a rio_mport data structure. This structure contains master port specific
-resources such as mailboxes and doorbells. The rio_mport also includes a unique
-host device ID that is valid when a master port is configured as an enumerating
-host.
-
-RapidIO master ports are serviced by subsystem specific mport device drivers
-that provide functionality defined for this subsystem. To provide a hardware
-independent interface for RapidIO subsystem operations, rio_mport structure
-includes rio_ops data structure which contains pointers to hardware specific
-implementations of RapidIO functions.
-
-2.2 Device
-
-A RapidIO device is any endpoint (other than mport) or switch in the network.
-All devices are presented in the RapidIO subsystem by corresponding rio_dev data
-structure. Devices form one global device list and per-network device lists
-(depending on number of available mports and networks).
-
-2.3 Switch
-
-A RapidIO switch is a special class of device that routes packets between its
-ports towards their final destination. The packet destination port within a
-switch is defined by an internal routing table. A switch is presented in the
-RapidIO subsystem by rio_dev data structure expanded by additional rio_switch
-data structure, which contains switch specific information such as copy of the
-routing table and pointers to switch specific functions.
-
-The RapidIO subsystem defines the format and initialization method for subsystem
-specific switch drivers that are designed to provide hardware-specific
-implementation of common switch management routines.
-
-2.4 Network
-
-A RapidIO network is a combination of interconnected endpoint and switch devices.
-Each RapidIO network known to the system is represented by corresponding rio_net
-data structure. This structure includes lists of all devices and local master
-ports that form the same network. It also contains a pointer to the default
-master port that is used to communicate with devices within the network.
-
-2.5 Device Drivers
-
-RapidIO device-specific drivers follow Linux Kernel Driver Model and are
-intended to support specific RapidIO devices attached to the RapidIO network.
-
-2.6 Subsystem Interfaces
-
-RapidIO interconnect specification defines features that may be used to provide
-one or more common service layers for all participating RapidIO devices. These
-common services may act separately from device-specific drivers or be used by
-device-specific drivers. Example of such service provider is the RIONET driver
-which implements Ethernet-over-RapidIO interface. Because only one driver can be
-registered for a device, all common RapidIO services have to be registered as
-subsystem interfaces. This allows to have multiple common services attached to
-the same device without blocking attachment of a device-specific driver.
-
-3. Subsystem Initialization
----------------------------
-
-In order to initialize the RapidIO subsystem, a platform must initialize and
-register at least one master port within the RapidIO network. To register mport
-within the subsystem controller driver's initialization code calls function
-rio_register_mport() for each available master port.
-
-After all active master ports are registered with a RapidIO subsystem,
-an enumeration and/or discovery routine may be called automatically or
-by user-space command.
-
-RapidIO subsystem can be configured to be built as a statically linked or
-modular component of the kernel (see details below).
-
-4. Enumeration and Discovery
-----------------------------
-
-4.1 Overview
-------------
-
-RapidIO subsystem configuration options allow users to build enumeration and
-discovery methods as statically linked components or loadable modules.
-An enumeration/discovery method implementation and available input parameters
-define how any given method can be attached to available RapidIO mports:
-simply to all available mports OR individually to the specified mport device.
-
-Depending on selected enumeration/discovery build configuration, there are
-several methods to initiate an enumeration and/or discovery process:
-
-  (a) Statically linked enumeration and discovery process can be started
-  automatically during kernel initialization time using corresponding module
-  parameters. This was the original method used since introduction of RapidIO
-  subsystem. Now this method relies on enumerator module parameter which is
-  'rio-scan.scan' for existing basic enumeration/discovery method.
-  When automatic start of enumeration/discovery is used a user has to ensure
-  that all discovering endpoints are started before the enumerating endpoint
-  and are waiting for enumeration to be completed.
-  Configuration option CONFIG_RAPIDIO_DISC_TIMEOUT defines time that discovering
-  endpoint waits for enumeration to be completed. If the specified timeout
-  expires the discovery process is terminated without obtaining RapidIO network
-  information. NOTE: a timed out discovery process may be restarted later using
-  a user-space command as it is described below (if the given endpoint was
-  enumerated successfully).
-
-  (b) Statically linked enumeration and discovery process can be started by
-  a command from user space. This initiation method provides more flexibility
-  for a system startup compared to the option (a) above. After all participating
-  endpoints have been successfully booted, an enumeration process shall be
-  started first by issuing a user-space command, after an enumeration is
-  completed a discovery process can be started on all remaining endpoints.
-
-  (c) Modular enumeration and discovery process can be started by a command from
-  user space. After an enumeration/discovery module is loaded, a network scan
-  process can be started by issuing a user-space command.
-  Similar to the option (b) above, an enumerator has to be started first.
-
-  (d) Modular enumeration and discovery process can be started by a module
-  initialization routine. In this case an enumerating module shall be loaded
-  first.
-
-When a network scan process is started it calls an enumeration or discovery
-routine depending on the configured role of a master port: host or agent.
-
-Enumeration is performed by a master port if it is configured as a host port by
-assigning a host destination ID greater than or equal to zero. The host
-destination ID can be assigned to a master port using various methods depending
-on RapidIO subsystem build configuration:
-
-  (a) For a statically linked RapidIO subsystem core use command line parameter
-  "rapidio.hdid=" with a list of destination ID assignments in order of mport
-  device registration. For example, in a system with two RapidIO controllers
-  the command line parameter "rapidio.hdid=-1,7" will result in assignment of
-  the host destination ID=7 to the second RapidIO controller, while the first
-  one will be assigned destination ID=-1.
-
-  (b) If the RapidIO subsystem core is built as a loadable module, in addition
-  to the method shown above, the host destination ID(s) can be specified using
-  traditional methods of passing module parameter "hdid=" during its loading:
-  - from command line: "modprobe rapidio hdid=-1,7", or
-  - from modprobe configuration file using configuration command "options",
-    like in this example: "options rapidio hdid=-1,7". An example of modprobe
-    configuration file is provided in the section below.
-
-  NOTES:
-  (i) if "hdid=" parameter is omitted all available mport will be assigned
-  destination ID = -1;
-  (ii) the "hdid=" parameter in systems with multiple mports can have
-  destination ID assignments omitted from the end of list (default = -1).
-
-If the host device ID for a specific master port is set to -1, the discovery
-process will be performed for it.
-
-The enumeration and discovery routines use RapidIO maintenance transactions
-to access the configuration space of devices.
-
-NOTE: If RapidIO switch-specific device drivers are built as loadable modules
-they must be loaded before enumeration/discovery process starts.
-This requirement is cased by the fact that enumeration/discovery methods invoke
-vendor-specific callbacks on early stages.
-
-4.2 Automatic Start of Enumeration and Discovery
-------------------------------------------------
-
-Automatic enumeration/discovery start method is applicable only to built-in
-enumeration/discovery RapidIO configuration selection. To enable automatic
-enumeration/discovery start by existing basic enumerator method set use boot
-command line parameter "rio-scan.scan=1".
-
-This configuration requires synchronized start of all RapidIO endpoints that
-form a network which will be enumerated/discovered. Discovering endpoints have
-to be started before an enumeration starts to ensure that all RapidIO
-controllers have been initialized and are ready to be discovered. Configuration
-parameter CONFIG_RAPIDIO_DISC_TIMEOUT defines time (in seconds) which
-a discovering endpoint will wait for enumeration to be completed.
-
-When automatic enumeration/discovery start is selected, basic method's
-initialization routine calls rio_init_mports() to perform enumeration or
-discovery for all known mport devices.
-
-Depending on RapidIO network size and configuration this automatic
-enumeration/discovery start method may be difficult to use due to the
-requirement for synchronized start of all endpoints.
-
-4.3 User-space Start of Enumeration and Discovery
--------------------------------------------------
-
-User-space start of enumeration and discovery can be used with built-in and
-modular build configurations. For user-space controlled start RapidIO subsystem
-creates the sysfs write-only attribute file '/sys/bus/rapidio/scan'. To initiate
-an enumeration or discovery process on specific mport device, a user needs to
-write mport_ID (not RapidIO destination ID) into that file. The mport_ID is a
-sequential number (0 ... RIO_MAX_MPORTS) assigned during mport device
-registration. For example for machine with single RapidIO controller, mport_ID
-for that controller always will be 0.
-
-To initiate RapidIO enumeration/discovery on all available mports a user may
-write '-1' (or RIO_MPORT_ANY) into the scan attribute file.
-
-4.4 Basic Enumeration Method
-----------------------------
-
-This is an original enumeration/discovery method which is available since
-first release of RapidIO subsystem code. The enumeration process is
-implemented according to the enumeration algorithm outlined in the RapidIO
-Interconnect Specification: Annex I [1].
-
-This method can be configured as statically linked or loadable module.
-The method's single parameter "scan" allows to trigger the enumeration/discovery
-process from module initialization routine.
-
-This enumeration/discovery method can be started only once and does not support
-unloading if it is built as a module.
-
-The enumeration process traverses the network using a recursive depth-first
-algorithm. When a new device is found, the enumerator takes ownership of that
-device by writing into the Host Device ID Lock CSR. It does this to ensure that
-the enumerator has exclusive right to enumerate the device. If device ownership
-is successfully acquired, the enumerator allocates a new rio_dev structure and
-initializes it according to device capabilities.
-
-If the device is an endpoint, a unique device ID is assigned to it and its value
-is written into the device's Base Device ID CSR.
-
-If the device is a switch, the enumerator allocates an additional rio_switch
-structure to store switch specific information. Then the switch's vendor ID and
-device ID are queried against a table of known RapidIO switches. Each switch
-table entry contains a pointer to a switch-specific initialization routine that
-initializes pointers to the rest of switch specific operations, and performs
-hardware initialization if necessary. A RapidIO switch does not have a unique
-device ID; it relies on hopcount and routing for device ID of an attached
-endpoint if access to its configuration registers is required. If a switch (or
-chain of switches) does not have any endpoint (except enumerator) attached to
-it, a fake device ID will be assigned to configure a route to that switch.
-In the case of a chain of switches without endpoint, one fake device ID is used
-to configure a route through the entire chain and switches are differentiated by
-their hopcount value.
-
-For both endpoints and switches the enumerator writes a unique component tag
-into device's Component Tag CSR. That unique value is used by the error
-management notification mechanism to identify a device that is reporting an
-error management event.
-
-Enumeration beyond a switch is completed by iterating over each active egress
-port of that switch. For each active link, a route to a default device ID
-(0xFF for 8-bit systems and 0xFFFF for 16-bit systems) is temporarily written
-into the routing table. The algorithm recurs by calling itself with hopcount + 1
-and the default device ID in order to access the device on the active port.
-
-After the host has completed enumeration of the entire network it releases
-devices by clearing device ID locks (calls rio_clear_locks()). For each endpoint
-in the system, it sets the Discovered bit in the Port General Control CSR
-to indicate that enumeration is completed and agents are allowed to execute
-passive discovery of the network.
-
-The discovery process is performed by agents and is similar to the enumeration
-process that is described above. However, the discovery process is performed
-without changes to the existing routing because agents only gather information
-about RapidIO network structure and are building an internal map of discovered
-devices. This way each Linux-based component of the RapidIO subsystem has
-a complete view of the network. The discovery process can be performed
-simultaneously by several agents. After initializing its RapidIO master port
-each agent waits for enumeration completion by the host for the configured wait
-time period. If this wait time period expires before enumeration is completed,
-an agent skips RapidIO discovery and continues with remaining kernel
-initialization.
-
-4.5 Adding New Enumeration/Discovery Method
--------------------------------------------
-
-RapidIO subsystem code organization allows addition of new enumeration/discovery
-methods as new configuration options without significant impact to the core
-RapidIO code.
-
-A new enumeration/discovery method has to be attached to one or more mport
-devices before an enumeration/discovery process can be started. Normally,
-method's module initialization routine calls rio_register_scan() to attach
-an enumerator to a specified mport device (or devices). The basic enumerator
-implementation demonstrates this process.
-
-4.6 Using Loadable RapidIO Switch Drivers
------------------------------------------
-
-In the case when RapidIO switch drivers are built as loadable modules a user
-must ensure that they are loaded before the enumeration/discovery starts.
-This process can be automated by specifying pre- or post- dependencies in the
-RapidIO-specific modprobe configuration file as shown in the example below.
-
-  File /etc/modprobe.d/rapidio.conf:
-  ----------------------------------
-
-  # Configure RapidIO subsystem modules
-
-  # Set enumerator host destination ID (overrides kernel command line option)
-  options rapidio hdid=-1,2
-
-  # Load RapidIO switch drivers immediately after rapidio core module was loaded
-  softdep rapidio post: idt_gen2 idtcps tsi57x
-
-  # OR :
-
-  # Load RapidIO switch drivers just before rio-scan enumerator module is loaded
-  softdep rio-scan pre: idt_gen2 idtcps tsi57x
-
-  --------------------------
-
-NOTE: In the example above, one of "softdep" commands must be removed or
-commented out to keep required module loading sequence.
-
-A. References
--------------
-
-[1] RapidIO Trade Association. RapidIO Interconnect Specifications.
-    http://www.rapidio.org.
-[2] Rapidio TA. Technology Comparisons.
-    http://www.rapidio.org/education/technology_comparisons/
-[3] RapidIO support for Linux.
-    http://lwn.net/Articles/139118/
-[4] Matt Porter. RapidIO for Linux. Ottawa Linux Symposium, 2005
-    http://www.kernel.org/doc/ols/2005/ols2005v2-pages-43-56.pdf
diff --git a/Documentation/rapidio/rio_cm.rst b/Documentation/rapidio/rio_cm.rst
new file mode 100644
index 000000000000..5294430a7a74
--- /dev/null
+++ b/Documentation/rapidio/rio_cm.rst
@@ -0,0 +1,135 @@
+==========================================================================
+RapidIO subsystem Channelized Messaging character device driver (rio_cm.c)
+==========================================================================
+
+
+1. Overview
+===========
+
+This device driver is the result of collaboration within the RapidIO.org
+Software Task Group (STG) between Texas Instruments, Prodrive Technologies,
+Nokia Networks, BAE and IDT.  Additional input was received from other members
+of RapidIO.org.
+
+The objective was to create a character mode driver interface which exposes
+messaging capabilities of RapidIO endpoint devices (mports) directly
+to applications, in a manner that allows the numerous and varied RapidIO
+implementations to interoperate.
+
+This driver (RIO_CM) provides to user-space applications shared access to
+RapidIO mailbox messaging resources.
+
+RapidIO specification (Part 2) defines that endpoint devices may have up to four
+messaging mailboxes in case of multi-packet message (up to 4KB) and
+up to 64 mailboxes if single-packet messages (up to 256 B) are used. In addition
+to protocol definition limitations, a particular hardware implementation can
+have reduced number of messaging mailboxes.  RapidIO aware applications must
+therefore share the messaging resources of a RapidIO endpoint.
+
+Main purpose of this device driver is to provide RapidIO mailbox messaging
+capability to large number of user-space processes by introducing socket-like
+operations using a single messaging mailbox.  This allows applications to
+use the limited RapidIO messaging hardware resources efficiently.
+
+Most of device driver's operations are supported through 'ioctl' system calls.
+
+When loaded this device driver creates a single file system node named rio_cm
+in /dev directory common for all registered RapidIO mport devices.
+
+Following ioctl commands are available to user-space applications:
+
+- RIO_CM_MPORT_GET_LIST:
+    Returns to caller list of local mport devices that
+    support messaging operations (number of entries up to RIO_MAX_MPORTS).
+    Each list entry is combination of mport's index in the system and RapidIO
+    destination ID assigned to the port.
+- RIO_CM_EP_GET_LIST_SIZE:
+    Returns number of messaging capable remote endpoints
+    in a RapidIO network associated with the specified mport device.
+- RIO_CM_EP_GET_LIST:
+    Returns list of RapidIO destination IDs for messaging
+    capable remote endpoints (peers) available in a RapidIO network associated
+    with the specified mport device.
+- RIO_CM_CHAN_CREATE:
+    Creates RapidIO message exchange channel data structure
+    with channel ID assigned automatically or as requested by a caller.
+- RIO_CM_CHAN_BIND:
+    Binds the specified channel data structure to the specified
+    mport device.
+- RIO_CM_CHAN_LISTEN:
+    Enables listening for connection requests on the specified
+    channel.
+- RIO_CM_CHAN_ACCEPT:
+    Accepts a connection request from peer on the specified
+    channel. If the caller specifies a wait timeout, this is a blocking call.
+    If the timeout is set to 0, this is a non-blocking call: the ioctl handler
+    checks for a pending connection request and, if none is available, exits
+    with -EAGAIN error status immediately.
+- RIO_CM_CHAN_CONNECT:
+    Sends a connection request to a remote peer/channel.
+- RIO_CM_CHAN_SEND:
+    Sends a data message through the specified channel.
+    The handler for this request assumes that message buffer specified by
+    a caller includes the reserved space for a packet header required by
+    this driver.
+- RIO_CM_CHAN_RECEIVE:
+    Receives a data message through a connected channel.
+    If the channel does not have an incoming message ready to return this ioctl
+    handler will wait for new message until timeout specified by a caller
+    expires. If timeout value is set to 0, ioctl handler uses a default value
+    defined by MAX_SCHEDULE_TIMEOUT.
+- RIO_CM_CHAN_CLOSE:
+    Closes a specified channel and frees associated buffers.
+    If the specified channel is in the CONNECTED state, sends close notification
+    to the remote peer.
+
+The ioctl command codes and corresponding data structures intended for use by
+user-space applications are defined in 'include/uapi/linux/rio_cm_cdev.h'.
+
+2. Hardware Compatibility
+=========================
+
+This device driver uses standard interfaces defined by kernel RapidIO subsystem
+and therefore it can be used with any mport device driver registered by RapidIO
+subsystem with limitations set by available mport HW implementation of messaging
+mailboxes.
+
+3. Module parameters
+====================
+
+- 'dbg_level'
+      - This parameter allows to control amount of debug information
+        generated by this device driver. This parameter is formed by set of
+        bit masks that correspond to the specific functional block.
+        For mask definitions see 'drivers/rapidio/devices/rio_cm.c'
+        This parameter can be changed dynamically.
+        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
+
+- 'cmbox'
+      - Number of RapidIO mailbox to use (default value is 1).
+        This parameter allows to set messaging mailbox number that will be used
+        within entire RapidIO network. It can be used when default mailbox is
+        used by other device drivers or is not supported by some nodes in the
+        RapidIO network.
+
+- 'chstart'
+      - Start channel number for dynamic assignment. Default value - 256.
+        Allows to exclude channel numbers below this parameter from dynamic
+        allocation to avoid conflicts with software components that use
+        reserved predefined channel numbers.
+
+4. Known problems
+=================
+
+  None.
+
+5. User-space Applications and API Library
+==========================================
+
+Messaging API library and applications that use this device driver are available
+from RapidIO.org.
+
+6. TODO List
+============
+
+- Add support for system notification messages (reserved channel 0).
diff --git a/Documentation/rapidio/rio_cm.txt b/Documentation/rapidio/rio_cm.txt
deleted file mode 100644
index 27aa401f1126..000000000000
--- a/Documentation/rapidio/rio_cm.txt
+++ /dev/null
@@ -1,119 +0,0 @@
-RapidIO subsystem Channelized Messaging character device driver (rio_cm.c)
-==========================================================================
-
-Version History:
-----------------
-  1.0.0 - Initial driver release.
-
-==========================================================================
-
-I. Overview
-
-This device driver is the result of collaboration within the RapidIO.org
-Software Task Group (STG) between Texas Instruments, Prodrive Technologies,
-Nokia Networks, BAE and IDT.  Additional input was received from other members
-of RapidIO.org.
-
-The objective was to create a character mode driver interface which exposes
-messaging capabilities of RapidIO endpoint devices (mports) directly
-to applications, in a manner that allows the numerous and varied RapidIO
-implementations to interoperate.
-
-This driver (RIO_CM) provides to user-space applications shared access to
-RapidIO mailbox messaging resources.
-
-RapidIO specification (Part 2) defines that endpoint devices may have up to four
-messaging mailboxes in case of multi-packet message (up to 4KB) and
-up to 64 mailboxes if single-packet messages (up to 256 B) are used. In addition
-to protocol definition limitations, a particular hardware implementation can
-have reduced number of messaging mailboxes.  RapidIO aware applications must
-therefore share the messaging resources of a RapidIO endpoint.
-
-Main purpose of this device driver is to provide RapidIO mailbox messaging
-capability to large number of user-space processes by introducing socket-like
-operations using a single messaging mailbox.  This allows applications to
-use the limited RapidIO messaging hardware resources efficiently.
-
-Most of device driver's operations are supported through 'ioctl' system calls.
-
-When loaded this device driver creates a single file system node named rio_cm
-in /dev directory common for all registered RapidIO mport devices.
-
-Following ioctl commands are available to user-space applications:
-
-- RIO_CM_MPORT_GET_LIST : Returns to caller list of local mport devices that
-    support messaging operations (number of entries up to RIO_MAX_MPORTS).
-    Each list entry is combination of mport's index in the system and RapidIO
-    destination ID assigned to the port.
-- RIO_CM_EP_GET_LIST_SIZE : Returns number of messaging capable remote endpoints
-    in a RapidIO network associated with the specified mport device.
-- RIO_CM_EP_GET_LIST : Returns list of RapidIO destination IDs for messaging
-    capable remote endpoints (peers) available in a RapidIO network associated
-    with the specified mport device.
-- RIO_CM_CHAN_CREATE : Creates RapidIO message exchange channel data structure
-    with channel ID assigned automatically or as requested by a caller.
-- RIO_CM_CHAN_BIND : Binds the specified channel data structure to the specified
-    mport device.
-- RIO_CM_CHAN_LISTEN : Enables listening for connection requests on the specified
-    channel.
-- RIO_CM_CHAN_ACCEPT : Accepts a connection request from peer on the specified
-    channel. If wait timeout for this request is specified by a caller it is
-    a blocking call. If timeout set to 0 this is non-blocking call - ioctl
-    handler checks for a pending connection request and if one is not available
-    exits with -EGAIN error status immediately.
-- RIO_CM_CHAN_CONNECT : Sends a connection request to a remote peer/channel.
-- RIO_CM_CHAN_SEND : Sends a data message through the specified channel.
-    The handler for this request assumes that message buffer specified by
-    a caller includes the reserved space for a packet header required by
-    this driver.
-- RIO_CM_CHAN_RECEIVE : Receives a data message through a connected channel.
-    If the channel does not have an incoming message ready to return this ioctl
-    handler will wait for new message until timeout specified by a caller
-    expires. If timeout value is set to 0, ioctl handler uses a default value
-    defined by MAX_SCHEDULE_TIMEOUT.
-- RIO_CM_CHAN_CLOSE : Closes a specified channel and frees associated buffers.
-    If the specified channel is in the CONNECTED state, sends close notification
-    to the remote peer.
-
-The ioctl command codes and corresponding data structures intended for use by
-user-space applications are defined in 'include/uapi/linux/rio_cm_cdev.h'.
-
-II. Hardware Compatibility
-
-This device driver uses standard interfaces defined by kernel RapidIO subsystem
-and therefore it can be used with any mport device driver registered by RapidIO
-subsystem with limitations set by available mport HW implementation of messaging
-mailboxes.
-
-III. Module parameters
-
-- 'dbg_level' - This parameter allows to control amount of debug information
-        generated by this device driver. This parameter is formed by set of
-        bit masks that correspond to the specific functional block.
-        For mask definitions see 'drivers/rapidio/devices/rio_cm.c'
-        This parameter can be changed dynamically.
-        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
-
-- 'cmbox' - Number of RapidIO mailbox to use (default value is 1).
-        This parameter allows to set messaging mailbox number that will be used
-        within entire RapidIO network. It can be used when default mailbox is
-        used by other device drivers or is not supported by some nodes in the
-        RapidIO network.
-
-- 'chstart' - Start channel number for dynamic assignment. Default value - 256.
-        Allows to exclude channel numbers below this parameter from dynamic
-        allocation to avoid conflicts with software components that use
-        reserved predefined channel numbers.
-
-IV. Known problems
-
-  None.
-
-V. User-space Applications and API Library
-
-Messaging API library and applications that use this device driver are available
-from RapidIO.org.
-
-VI. TODO List
-
-- Add support for system notification messages (reserved channel 0).
diff --git a/Documentation/rapidio/sysfs.rst b/Documentation/rapidio/sysfs.rst
new file mode 100644
index 000000000000..540f72683496
--- /dev/null
+++ b/Documentation/rapidio/sysfs.rst
@@ -0,0 +1,7 @@
+=============
+Sysfs entries
+=============
+
+The RapidIO sysfs files have moved to:
+Documentation/ABI/testing/sysfs-bus-rapidio and
+Documentation/ABI/testing/sysfs-class-rapidio
diff --git a/Documentation/rapidio/sysfs.txt b/Documentation/rapidio/sysfs.txt
deleted file mode 100644
index a1adac888e6e..000000000000
--- a/Documentation/rapidio/sysfs.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-The RapidIO sysfs files have moved to:
-Documentation/ABI/testing/sysfs-bus-rapidio and
-Documentation/ABI/testing/sysfs-class-rapidio
diff --git a/Documentation/rapidio/tsi721.rst b/Documentation/rapidio/tsi721.rst
new file mode 100644
index 000000000000..42aea438cd20
--- /dev/null
+++ b/Documentation/rapidio/tsi721.rst
@@ -0,0 +1,112 @@
+=========================================================================
+RapidIO subsystem mport driver for IDT Tsi721 PCI Express-to-SRIO bridge.
+=========================================================================
+
+1. Overview
+===========
+
+This driver implements all currently defined RapidIO mport callback functions.
+It supports maintenance read and write operations, inbound and outbound RapidIO
+doorbells, inbound maintenance port-writes and RapidIO messaging.
+
+To generate SRIO maintenance transactions this driver uses one of Tsi721 DMA
+channels. This mechanism provides access to larger range of hop counts and
+destination IDs without need for changes in outbound window translation.
+
+RapidIO messaging support uses dedicated messaging channels for each mailbox.
+For inbound messages this driver uses destination ID matching to forward messages
+into the corresponding message queue. Messaging callbacks are implemented to be
+fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
+
+1. Module parameters:
+
+- 'dbg_level'
+      - This parameter controls the amount of debug information
+        generated by this device driver. It is formed by a set of
+        bit masks, each of which corresponds to a specific
+        functional block.
+        For mask definitions see 'drivers/rapidio/devices/tsi721.h'
+        This parameter can be changed dynamically.
+        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
+
+- 'dma_desc_per_channel'
+      - This parameter defines number of hardware buffer
+        descriptors allocated for each registered Tsi721 DMA channel.
+        Its default value is 128.
+
+- 'dma_txqueue_sz'
+      - DMA transactions queue size. Defines number of pending
+        transaction requests that can be accepted by each DMA channel.
+        Default value is 16.
+
+- 'dma_sel'
+      - DMA channel selection mask. Bitmask that defines which hardware
+        DMA channels (0 ... 6) will be registered with DmaEngine core.
+        If bit is set to 1, the corresponding DMA channel will be registered.
+        DMA channels not selected by this mask will not be used by this device
+        driver. Default value is 0x7f (use all channels).
+
+- 'pcie_mrrs'
+      - override value for PCIe Maximum Read Request Size (MRRS).
+        This parameter gives an ability to override MRRS value set during PCIe
+        configuration process. Tsi721 supports read request sizes up to 4096B.
+        Value for this parameter must be set as defined by PCIe specification:
+        0 = 128B, 1 = 256B, 2 = 512B, 3 = 1024B, 4 = 2048B and 5 = 4096B.
+        Default value is '-1' (= keep platform setting).
+
+- 'mbox_sel'
+      - RIO messaging MBOX selection mask. This is a bitmask that defines
+        which messaging MBOXes are managed by this device driver. Mask bits 0 - 3
+        correspond to MBOX0 - MBOX3. MBOX is under driver's control if the
+        corresponding bit is set to '1'. Default value is 0x0f (= all).
+
+2. Known problems
+=================
+
+  None.
+
+3. DMA Engine Support
+=====================
+
+Tsi721 mport driver supports DMA data transfers between local system memory and
+remote RapidIO devices. This functionality is implemented according to SLAVE
+mode API defined by common Linux kernel DMA Engine framework.
+
+Depending on system requirements RapidIO DMA operations can be included/excluded
+by setting CONFIG_RAPIDIO_DMA_ENGINE option. Tsi721 miniport driver uses seven
+out of eight available BDMA channels to support DMA data transfers.
+One BDMA channel is reserved for generation of maintenance read/write requests.
+
+If Tsi721 mport driver has been built with RAPIDIO_DMA_ENGINE support included,
+this driver will accept DMA-specific module parameter:
+
+  "dma_desc_per_channel"
+			 - defines number of hardware buffer descriptors used by
+                           each BDMA channel of Tsi721 (by default - 128).
+
+4. Version History
+
+  =====   ====================================================================
+  1.1.0   DMA operations re-worked to support data scatter/gather lists larger
+          than hardware buffer descriptors ring.
+  1.0.0   Initial driver release.
+  =====   ====================================================================
+
+5.  License
+===========
+
+  Copyright(c) 2011 Integrated Device Technology, Inc. All rights reserved.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms of the GNU General Public License as published by the Free
+  Software Foundation; either version 2 of the License, or (at your option)
+  any later version.
+
+  This program is distributed in the hope that it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
diff --git a/Documentation/rapidio/tsi721.txt b/Documentation/rapidio/tsi721.txt
deleted file mode 100644
index cd2a2935d51d..000000000000
--- a/Documentation/rapidio/tsi721.txt
+++ /dev/null
@@ -1,97 +0,0 @@
-RapidIO subsystem mport driver for IDT Tsi721 PCI Express-to-SRIO bridge.
-=========================================================================
-
-I. Overview
-
-This driver implements all currently defined RapidIO mport callback functions.
-It supports maintenance read and write operations, inbound and outbound RapidIO
-doorbells, inbound maintenance port-writes and RapidIO messaging.
-
-To generate SRIO maintenance transactions this driver uses one of Tsi721 DMA
-channels. This mechanism provides access to larger range of hop counts and
-destination IDs without need for changes in outbound window translation.
-
-RapidIO messaging support uses dedicated messaging channels for each mailbox.
-For inbound messages this driver uses destination ID matching to forward messages
-into the corresponding message queue. Messaging callbacks are implemented to be
-fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
-
-1. Module parameters:
-- 'dbg_level' - This parameter allows to control amount of debug information
-        generated by this device driver. This parameter is formed by set of
-        This parameter can be changed bit masks that correspond to the specific
-        functional block.
-        For mask definitions see 'drivers/rapidio/devices/tsi721.h'
-        This parameter can be changed dynamically.
-        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
-
-- 'dma_desc_per_channel' - This parameter defines number of hardware buffer
-        descriptors allocated for each registered Tsi721 DMA channel.
-        Its default value is 128.
-
-- 'dma_txqueue_sz' - DMA transactions queue size. Defines number of pending
-        transaction requests that can be accepted by each DMA channel.
-        Default value is 16.
-
-- 'dma_sel' - DMA channel selection mask. Bitmask that defines which hardware
-        DMA channels (0 ... 6) will be registered with DmaEngine core.
-        If bit is set to 1, the corresponding DMA channel will be registered.
-        DMA channels not selected by this mask will not be used by this device
-        driver. Default value is 0x7f (use all channels).
-
-- 'pcie_mrrs' - override value for PCIe Maximum Read Request Size (MRRS).
-        This parameter gives an ability to override MRRS value set during PCIe
-        configuration process. Tsi721 supports read request sizes up to 4096B.
-        Value for this parameter must be set as defined by PCIe specification:
-        0 = 128B, 1 = 256B, 2 = 512B, 3 = 1024B, 4 = 2048B and 5 = 4096B.
-        Default value is '-1' (= keep platform setting).
-
-- 'mbox_sel' - RIO messaging MBOX selection mask. This is a bitmask that defines
-        messaging MBOXes are managed by this device driver. Mask bits 0 - 3
-        correspond to MBOX0 - MBOX3. MBOX is under driver's control if the
-        corresponding bit is set to '1'. Default value is 0x0f (= all).
-
-II. Known problems
-
-  None.
-
-III. DMA Engine Support
-
-Tsi721 mport driver supports DMA data transfers between local system memory and
-remote RapidIO devices. This functionality is implemented according to SLAVE
-mode API defined by common Linux kernel DMA Engine framework.
-
-Depending on system requirements RapidIO DMA operations can be included/excluded
-by setting CONFIG_RAPIDIO_DMA_ENGINE option. Tsi721 miniport driver uses seven
-out of eight available BDMA channels to support DMA data transfers.
-One BDMA channel is reserved for generation of maintenance read/write requests.
-
-If Tsi721 mport driver have been built with RAPIDIO_DMA_ENGINE support included,
-this driver will accept DMA-specific module parameter:
-  "dma_desc_per_channel" - defines number of hardware buffer descriptors used by
-                           each BDMA channel of Tsi721 (by default - 128).
-
-IV. Version History
-
-  1.1.0 - DMA operations re-worked to support data scatter/gather lists larger
-          than hardware buffer descriptors ring.
-  1.0.0 - Initial driver release.
-
-V.  License
------------------------------------------------
-
-  Copyright(c) 2011 Integrated Device Technology, Inc. All rights reserved.
-
-  This program is free software; you can redistribute it and/or modify it
-  under the terms of the GNU General Public License as published by the Free
-  Software Foundation; either version 2 of the License, or (at your option)
-  any later version.
-
-  This program is distributed in the hope that it will be useful, but WITHOUT
-  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-  more details.
-
-  You should have received a copy of the GNU General Public License along with
-  this program; if not, write to the Free Software Foundation, Inc.,
-  59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
diff --git a/drivers/rapidio/Kconfig b/drivers/rapidio/Kconfig
index fadafc64705f..467e8fa06904 100644
--- a/drivers/rapidio/Kconfig
+++ b/drivers/rapidio/Kconfig
@@ -86,7 +86,7 @@ config RAPIDIO_CHMAN
 	  This option includes RapidIO channelized messaging driver which
 	  provides socket-like interface to allow sharing of single RapidIO
 	  messaging mailbox between multiple user-space applications.
-	  See "Documentation/rapidio/rio_cm.txt" for driver description.
+	  See "Documentation/rapidio/rio_cm.rst" for driver description.
 
 config RAPIDIO_MPORT_CDEV
 	tristate "RapidIO /dev mport device driver"
-- 
cgit v1.2.3-55-g7522


From 39443104c7d3f2b05a4a330fbcef6da68f80d60b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 17:29:24 -0300
Subject: docs: blockdev: convert to ReST

Rename the blockdev documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

The drbd sub-directory contains some graphs and data flows.
Add those too to the documentation.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt   |  18 +-
 Documentation/blockdev/drbd/README.txt            |  16 -
 Documentation/blockdev/drbd/data-structure-v9.rst |  42 +++
 Documentation/blockdev/drbd/data-structure-v9.txt |  38 --
 Documentation/blockdev/drbd/figures.rst           |  28 ++
 Documentation/blockdev/drbd/index.rst             |  19 +
 Documentation/blockdev/floppy.rst                 | 255 +++++++++++++
 Documentation/blockdev/floppy.txt                 | 245 ------------
 Documentation/blockdev/index.rst                  |  16 +
 Documentation/blockdev/nbd.rst                    |  31 ++
 Documentation/blockdev/nbd.txt                    |  31 --
 Documentation/blockdev/paride.rst                 | 439 ++++++++++++++++++++++
 Documentation/blockdev/paride.txt                 | 417 --------------------
 Documentation/blockdev/ramdisk.rst                | 177 +++++++++
 Documentation/blockdev/ramdisk.txt                | 174 ---------
 Documentation/blockdev/zram.rst                   | 422 +++++++++++++++++++++
 Documentation/blockdev/zram.txt                   | 355 -----------------
 MAINTAINERS                                       |   8 +-
 drivers/block/Kconfig                             |   8 +-
 drivers/block/floppy.c                            |   2 +-
 drivers/block/zram/Kconfig                        |   6 +-
 tools/testing/selftests/zram/README               |   2 +-
 22 files changed, 1451 insertions(+), 1298 deletions(-)
 delete mode 100644 Documentation/blockdev/drbd/README.txt
 create mode 100644 Documentation/blockdev/drbd/data-structure-v9.rst
 delete mode 100644 Documentation/blockdev/drbd/data-structure-v9.txt
 create mode 100644 Documentation/blockdev/drbd/figures.rst
 create mode 100644 Documentation/blockdev/drbd/index.rst
 create mode 100644 Documentation/blockdev/floppy.rst
 delete mode 100644 Documentation/blockdev/floppy.txt
 create mode 100644 Documentation/blockdev/index.rst
 create mode 100644 Documentation/blockdev/nbd.rst
 delete mode 100644 Documentation/blockdev/nbd.txt
 create mode 100644 Documentation/blockdev/paride.rst
 delete mode 100644 Documentation/blockdev/paride.txt
 create mode 100644 Documentation/blockdev/ramdisk.rst
 delete mode 100644 Documentation/blockdev/ramdisk.txt
 create mode 100644 Documentation/blockdev/zram.rst
 delete mode 100644 Documentation/blockdev/zram.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a342dd5c95a9..6b2adda1cc03 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1249,7 +1249,7 @@
 			See also Documentation/fault-injection/.
 
 	floppy=		[HW]
-			See Documentation/blockdev/floppy.txt.
+			See Documentation/blockdev/floppy.rst.
 
 	force_pal_cache_flush
 			[IA-64] Avoid check_sal_cache_flush which may hang on
@@ -2234,7 +2234,7 @@
 	memblock=debug	[KNL] Enable memblock debug messages.
 
 	load_ramdisk=	[RAM] List of ramdisks to load from floppy
-			See Documentation/blockdev/ramdisk.txt.
+			See Documentation/blockdev/ramdisk.rst.
 
 	lockd.nlm_grace_period=P  [NFS] Assign grace period.
 			Format: <integer>
@@ -3268,7 +3268,7 @@
 
 	pcd.		[PARIDE]
 			See header of drivers/block/paride/pcd.c.
-			See also Documentation/blockdev/paride.txt.
+			See also Documentation/blockdev/paride.rst.
 
 	pci=option[,option...]	[PCI] various PCI subsystem options.
 
@@ -3512,7 +3512,7 @@
 			needed on a platform with proper driver support.
 
 	pd.		[PARIDE]
-			See Documentation/blockdev/paride.txt.
+			See Documentation/blockdev/paride.rst.
 
 	pdcchassis=	[PARISC,HW] Disable/Enable PDC Chassis Status codes at
 			boot time.
@@ -3527,10 +3527,10 @@
 			and performance comparison.
 
 	pf.		[PARIDE]
-			See Documentation/blockdev/paride.txt.
+			See Documentation/blockdev/paride.rst.
 
 	pg.		[PARIDE]
-			See Documentation/blockdev/paride.txt.
+			See Documentation/blockdev/paride.rst.
 
 	pirq=		[SMP,APIC] Manual mp-table setup
 			See Documentation/x86/i386/IO-APIC.rst.
@@ -3642,7 +3642,7 @@
 
 	prompt_ramdisk=	[RAM] List of RAM disks to prompt for floppy disk
 			before loading.
-			See Documentation/blockdev/ramdisk.txt.
+			See Documentation/blockdev/ramdisk.rst.
 
 	psi=		[KNL] Enable or disable pressure stall information
 			tracking.
@@ -3664,7 +3664,7 @@
 	pstore.backend=	Specify the name of the pstore backend to use
 
 	pt.		[PARIDE]
-			See Documentation/blockdev/paride.txt.
+			See Documentation/blockdev/paride.rst.
 
 	pti=		[X86_64] Control Page Table Isolation of user and
 			kernel address spaces.  Disabling this feature
@@ -3693,7 +3693,7 @@
 			See Documentation/admin-guide/md.rst.
 
 	ramdisk_size=	[RAM] Sizes of RAM disks in kilobytes
-			See Documentation/blockdev/ramdisk.txt.
+			See Documentation/blockdev/ramdisk.rst.
 
 	random.trust_cpu={on,off}
 			[KNL] Enable or disable trusting the use of the
diff --git a/Documentation/blockdev/drbd/README.txt b/Documentation/blockdev/drbd/README.txt
deleted file mode 100644
index 627b0a1bf35e..000000000000
--- a/Documentation/blockdev/drbd/README.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-Description
-
-  DRBD is a shared-nothing, synchronously replicated block device. It
-  is designed to serve as a building block for high availability
-  clusters and in this context, is a "drop-in" replacement for shared
-  storage. Simplistically, you could see it as a network RAID 1.
-
-  Please visit http://www.drbd.org to find out more.
-
-The here included files are intended to help understand the implementation
-
-DRBD-8.3-data-packets.svg, DRBD-data-packets.svg  
-  relates some functions, and write packets.
-
-conn-states-8.dot, disk-states-8.dot, node-states-8.dot
-  The sub graphs of DRBD's state transitions
diff --git a/Documentation/blockdev/drbd/data-structure-v9.rst b/Documentation/blockdev/drbd/data-structure-v9.rst
new file mode 100644
index 000000000000..66036b901644
--- /dev/null
+++ b/Documentation/blockdev/drbd/data-structure-v9.rst
@@ -0,0 +1,42 @@
+================================
+kernel data structure for DRBD-9
+================================
+
+This describes the in-kernel data structure for DRBD-9. Starting with
+Linux v3.14 we are reorganizing DRBD to use this data structure.
+
+Basic Data Structure
+====================
+
+A node has a number of DRBD resources.  Each such resource has a number of
+devices (aka volumes) and connections to other nodes ("peer nodes"). Each DRBD
+device is represented by a block device locally.
+
+The DRBD objects are interconnected to form a matrix as depicted below; a
+drbd_peer_device object sits at each intersection between a drbd_device and a
+drbd_connection::
+
+  /--------------+---------------+.....+---------------\
+  |   resource   |    device     |     |    device     |
+  +--------------+---------------+.....+---------------+
+  |  connection  |  peer_device  |     |  peer_device  |
+  +--------------+---------------+.....+---------------+
+  :              :               :     :               :
+  :              :               :     :               :
+  +--------------+---------------+.....+---------------+
+  |  connection  |  peer_device  |     |  peer_device  |
+  \--------------+---------------+.....+---------------/
+
+In this table, horizontally, devices can be accessed from resources by their
+volume number.  Likewise, peer_devices can be accessed from connections by
+their volume number.  Objects in the vertical direction are connected by doubly
+linked lists.  There are back pointers from peer_devices to their connections
+and devices, and from connections and devices to their resource.
+
+All resources are in the drbd_resources doubly linked list.  In addition, all
+devices can be accessed by their minor device number via the drbd_devices idr.
+
+The drbd_resource, drbd_connection, and drbd_device objects are reference
+counted.  The peer_device objects only serve to establish the links between
+devices and connections; their lifetime is determined by the lifetime of the
+device and connection which they reference.
diff --git a/Documentation/blockdev/drbd/data-structure-v9.txt b/Documentation/blockdev/drbd/data-structure-v9.txt
deleted file mode 100644
index 1e52a0e32624..000000000000
--- a/Documentation/blockdev/drbd/data-structure-v9.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-This describes the in kernel data structure for DRBD-9. Starting with
-Linux v3.14 we are reorganizing DRBD to use this data structure.
-
-Basic Data Structure
-====================
-
-A node has a number of DRBD resources.  Each such resource has a number of
-devices (aka volumes) and connections to other nodes ("peer nodes"). Each DRBD
-device is represented by a block device locally.
-
-The DRBD objects are interconnected to form a matrix as depicted below; a
-drbd_peer_device object sits at each intersection between a drbd_device and a
-drbd_connection:
-
-  /--------------+---------------+.....+---------------\
-  |   resource   |    device     |     |    device     |
-  +--------------+---------------+.....+---------------+
-  |  connection  |  peer_device  |     |  peer_device  |
-  +--------------+---------------+.....+---------------+
-  :              :               :     :               :
-  :              :               :     :               :
-  +--------------+---------------+.....+---------------+
-  |  connection  |  peer_device  |     |  peer_device  |
-  \--------------+---------------+.....+---------------/
-
-In this table, horizontally, devices can be accessed from resources by their
-volume number.  Likewise, peer_devices can be accessed from connections by
-their volume number.  Objects in the vertical direction are connected by double
-linked lists.  There are back pointers from peer_devices to their connections a
-devices, and from connections and devices to their resource.
-
-All resources are in the drbd_resources double-linked list.  In addition, all
-devices can be accessed by their minor device number via the drbd_devices idr.
-
-The drbd_resource, drbd_connection, and drbd_device objects are reference
-counted.  The peer_device objects only serve to establish the links between
-devices and connections; their lifetime is determined by the lifetime of the
-device and connection which they reference.
diff --git a/Documentation/blockdev/drbd/figures.rst b/Documentation/blockdev/drbd/figures.rst
new file mode 100644
index 000000000000..3e3fd4b8a478
--- /dev/null
+++ b/Documentation/blockdev/drbd/figures.rst
@@ -0,0 +1,28 @@
+.. The here included files are intended to help understand the implementation
+
+Data flows that relate some functions, and write packets
+========================================================
+
+.. kernel-figure:: DRBD-8.3-data-packets.svg
+    :alt:   DRBD-8.3-data-packets.svg
+    :align: center
+
+.. kernel-figure:: DRBD-data-packets.svg
+    :alt:   DRBD-data-packets.svg
+    :align: center
+
+
+Sub graphs of DRBD's state transitions
+======================================
+
+.. kernel-figure:: conn-states-8.dot
+    :alt:   conn-states-8.dot
+    :align: center
+
+.. kernel-figure:: disk-states-8.dot
+    :alt:   disk-states-8.dot
+    :align: center
+
+.. kernel-figure:: node-states-8.dot
+    :alt:   node-states-8.dot
+    :align: center
diff --git a/Documentation/blockdev/drbd/index.rst b/Documentation/blockdev/drbd/index.rst
new file mode 100644
index 000000000000..68ecd5c113e9
--- /dev/null
+++ b/Documentation/blockdev/drbd/index.rst
@@ -0,0 +1,19 @@
+==========================================
+Distributed Replicated Block Device - DRBD
+==========================================
+
+Description
+===========
+
+  DRBD is a shared-nothing, synchronously replicated block device. It
+  is designed to serve as a building block for high availability
+  clusters and in this context, is a "drop-in" replacement for shared
+  storage. Simplistically, you could see it as a network RAID 1.
+
+  Please visit http://www.drbd.org to find out more.
+
+.. toctree::
+   :maxdepth: 1
+
+   data-structure-v9
+   figures
diff --git a/Documentation/blockdev/floppy.rst b/Documentation/blockdev/floppy.rst
new file mode 100644
index 000000000000..4a8f31cf4139
--- /dev/null
+++ b/Documentation/blockdev/floppy.rst
@@ -0,0 +1,255 @@
+=============
+Floppy Driver
+=============
+
+FAQ list:
+=========
+
+A FAQ list may be found in the fdutils package (see below), and also
+at <http://fdutils.linux.lu/faq.html>.
+
+
+LILO configuration options (Thinkpad users, read this)
+======================================================
+
+The floppy driver is configured using the 'floppy=' option in
+lilo. This option can be typed at the boot prompt, or entered in the
+lilo configuration file.
+
+Example: If your kernel is called linux-2.6.9, type the following line
+at the lilo boot prompt (if you have a Thinkpad)::
+
+ linux-2.6.9 floppy=thinkpad
+
+You may also enter the following line in /etc/lilo.conf, in the description
+of linux-2.6.9::
+
+ append = "floppy=thinkpad"
+
+Several floppy related options may be given, for example::
+
+ linux-2.6.9 floppy=daring floppy=two_fdc
+ append = "floppy=daring floppy=two_fdc"
+
+If you give options both in the lilo config file and on the boot
+prompt, the option strings of both places are concatenated, the boot
+prompt options coming last. That's why there are also options to
+restore the default behavior.
+
+
+Module configuration options
+============================
+
+If you use the floppy driver as a module, use the following syntax::
+
+	modprobe floppy floppy="<options>"
+
+Example::
+
+	modprobe floppy floppy="omnibook messages"
+
+If you need certain options enabled every time you load the floppy driver,
+you can put::
+
+	options floppy floppy="omnibook messages"
+
+in a configuration file in /etc/modprobe.d/.
+
+
+The floppy driver related options are:
+
+ floppy=asus_pci
+	Sets the bit mask to allow only units 0 and 1. (default)
+
+ floppy=daring
+	Tells the floppy driver that you have a well behaved floppy controller.
+	This allows more efficient and smoother operation, but may fail on
+	certain controllers. This may speed up certain operations.
+
+ floppy=0,daring
+	Tells the floppy driver that your floppy controller should be used
+	with caution.
+
+ floppy=one_fdc
+	Tells the floppy driver that you have only one floppy controller.
+	(default)
+
+ floppy=two_fdc / floppy=<address>,two_fdc
+	Tells the floppy driver that you have two floppy controllers.
+	The second floppy controller is assumed to be at <address>.
+	This option is not needed if the second controller is at address
+	0x370, and if you use the 'cmos' option.
+
+ floppy=thinkpad
+	Tells the floppy driver that you have a Thinkpad. Thinkpads use an
+	inverted convention for the disk change line.
+
+ floppy=0,thinkpad
+	Tells the floppy driver that you don't have a Thinkpad.
+
+ floppy=omnibook / floppy=nodma
+	Tells the floppy driver not to use DMA for data transfers.
+	This is needed on HP Omnibooks, which don't have a workable
+	DMA channel for the floppy driver. This option is also useful
+	if you frequently get "Unable to allocate DMA memory" messages.
+	Indeed, DMA memory needs to be contiguous in physical memory,
+	and is thus harder to find, whereas non-DMA buffers may be
+	allocated in virtual memory. However, I advise against this if
+	you have an FDC without a FIFO (8272A or 82072). 82072A and
+	later are OK. You also need at least a 486 to use nodma.
+	If you use nodma mode, I suggest you also set the FIFO
+	threshold to 10 or lower, in order to limit the number of data
+	transfer interrupts.
+
+	If you have a FIFO-able FDC, the floppy driver automatically
+	falls back on non-DMA mode if no DMA-able memory can be found.
+	If you want to avoid this, explicitly ask for 'yesdma'.
+
+ floppy=yesdma
+	Tells the floppy driver that a workable DMA channel is available.
+	(default)
+
+ floppy=nofifo
+	Disables the FIFO entirely. This is needed if you get "Bus
+	master arbitration error" messages from your Ethernet card (or
+	from other devices) while accessing the floppy.
+
+ floppy=usefifo
+	Enables the FIFO. (default)
+
+ floppy=<threshold>,fifo_depth
+	Sets the FIFO threshold. This is mostly relevant in DMA
+	mode. If this is higher, the floppy driver tolerates more
+	interrupt latency, but it triggers more interrupts (i.e. it
+	imposes more load on the rest of the system). If this is
+	lower, the interrupt latency should be lower too (faster
+	processor). The benefit of a lower threshold is fewer
+	interrupts.
+
+	To tune the fifo threshold, switch on over/underrun messages
+	using 'floppycontrol --messages'. Then access a floppy
+	disk. If you get a huge amount of "Over/Underrun - retrying"
+	messages, then the fifo threshold is too low. Try with a
+	higher value, until you only get an occasional Over/Underrun.
+	It is a good idea to compile the floppy driver as a module
+	when doing this tuning. Indeed, it allows you to try different
+	fifo values without rebooting the machine for each test. Note
+	that you need to do 'floppycontrol --messages' every time you
+	re-insert the module.
+
+	Usually, tuning the fifo threshold should not be needed, as
+	the default (0xa) is reasonable.
+
+ floppy=<drive>,<type>,cmos
+	Sets the CMOS type of <drive> to <type>. This is mandatory if
+	you have more than two floppy drives (only two can be
+	described in the physical CMOS), or if your BIOS uses
+	non-standard CMOS types. The CMOS types are:
+
+	       ==  ==================================
+		0  Use the value of the physical CMOS
+		1  5 1/4 DD
+		2  5 1/4 HD
+		3  3 1/2 DD
+		4  3 1/2 HD
+		5  3 1/2 ED
+		6  3 1/2 ED
+	       16  unknown or not installed
+	       ==  ==================================
+
+	(Note: there are two valid types for ED drives. This is because 5 was
+	initially chosen to represent floppy *tapes*, and 6 for ED drives.
+	AMI ignored this, and used 5 for ED drives. That's why the floppy
+	driver handles both.)
+
+ floppy=unexpected_interrupts
+	Print a warning message when an unexpected interrupt is received.
+	(default)
+
+ floppy=no_unexpected_interrupts / floppy=L40SX
+	Don't print a message when an unexpected interrupt is received. This
+	is needed on IBM L40SX laptops in certain video modes. (There seems
+	to be an interaction between video and floppy. The unexpected
+	interrupts affect only performance, and can be safely ignored.)
+
+ floppy=broken_dcl
+	Don't use the disk change line, but assume that the disk was
+	changed whenever the device node is reopened. Needed on some
+	boxes where the disk change line is broken or unsupported.
+	This should be regarded as a stopgap measure; indeed, it makes
+	floppy operation less efficient due to unneeded cache
+	flushings, and slightly more unreliable. Please verify your
+	cable, connection and jumper settings if you have any DCL
+	problems. However, some older drives, and also some laptops
+	are known not to have a DCL.
+
+ floppy=debug
+	Print debugging messages.
+
+ floppy=messages
+	Print informational messages for some operations (disk change
+	notifications, warnings about over and underruns, and about
+	autodetection).
+
+ floppy=silent_dcl_clear
+	Uses a less noisy way to clear the disk change line (which
+	doesn't involve seeks). Implied by 'daring' option.
+
+ floppy=<nr>,irq
+	Sets the floppy IRQ to <nr> instead of 6.
+
+ floppy=<nr>,dma
+	Sets the floppy DMA channel to <nr> instead of 2.
+
+ floppy=slow
+	Use PS/2 stepping rate::
+
+	   PS/2 floppies have much slower step rates than regular floppies.
+	   It's been recommended that take about 1/4 of the default speed
+	   in some more extreme cases.
+
+
+Supporting utilities and additional documentation:
+==================================================
+
+Additional parameters of the floppy driver can be configured at
+runtime. Utilities which do this can be found in the fdutils package.
+This package also contains a new version of mtools which allows you to
+access high capacity disks (up to 1992K on a high density 3 1/2 disk!).
+It also contains additional documentation about the floppy driver.
+
+The latest version can be found at the fdutils homepage:
+
+ http://fdutils.linux.lu
+
+The fdutils releases can be found at:
+
+ http://fdutils.linux.lu/download.html
+
+ http://www.tux.org/pub/knaff/fdutils/
+
+ ftp://metalab.unc.edu/pub/Linux/utils/disk-management/
+
+Reporting problems about the floppy driver
+==========================================
+
+If you have a question or a bug report about the floppy driver, mail
+me at Alain.Knaff@poboxes.com . If you post to Usenet, preferably use
+comp.os.linux.hardware. As the volume in these groups is rather high,
+be sure to include the word "floppy" (or "FLOPPY") in the subject
+line.  If the reported problem happens when mounting floppy disks, be
+sure to mention also the type of the filesystem in the subject line.
+
+Be sure to read the FAQ before mailing/posting any bug reports!
+
+Alain
+
+Changelog
+=========
+
+10-30-2004 :
+		Cleanup, updating, add reference to module configuration.
+		James Nelson <james4765@gmail.com>
+
+6-3-2000 :
+		Original Document
diff --git a/Documentation/blockdev/floppy.txt b/Documentation/blockdev/floppy.txt
deleted file mode 100644
index e2240f5ab64d..000000000000
--- a/Documentation/blockdev/floppy.txt
+++ /dev/null
@@ -1,245 +0,0 @@
-This file describes the floppy driver.
-
-FAQ list:
-=========
-
- A FAQ list may be found in the fdutils package (see below), and also
-at <http://fdutils.linux.lu/faq.html>.
-
-
-LILO configuration options (Thinkpad users, read this)
-======================================================
-
- The floppy driver is configured using the 'floppy=' option in
-lilo. This option can be typed at the boot prompt, or entered in the
-lilo configuration file.
-
- Example: If your kernel is called linux-2.6.9, type the following line
-at the lilo boot prompt (if you have a thinkpad):
-
- linux-2.6.9 floppy=thinkpad
-
-You may also enter the following line in /etc/lilo.conf, in the description
-of linux-2.6.9:
-
- append = "floppy=thinkpad"
-
- Several floppy related options may be given, example:
-
- linux-2.6.9 floppy=daring floppy=two_fdc
- append = "floppy=daring floppy=two_fdc"
-
- If you give options both in the lilo config file and on the boot
-prompt, the option strings of both places are concatenated, the boot
-prompt options coming last. That's why there are also options to
-restore the default behavior.
-
-
-Module configuration options
-============================
-
- If you use the floppy driver as a module, use the following syntax:
-modprobe floppy floppy="<options>"
-
-Example:
- modprobe floppy floppy="omnibook messages"
-
- If you need certain options enabled every time you load the floppy driver,
-you can put:
-
- options floppy floppy="omnibook messages"
-
-in a configuration file in /etc/modprobe.d/.
-
-
- The floppy driver related options are:
-
- floppy=asus_pci
-	Sets the bit mask to allow only units 0 and 1. (default)
-
- floppy=daring
-	Tells the floppy driver that you have a well behaved floppy controller.
-	This allows more efficient and smoother operation, but may fail on
-	certain controllers. This may speed up certain operations.
-
- floppy=0,daring
-	Tells the floppy driver that your floppy controller should be used
-	with caution.
-
- floppy=one_fdc
-	Tells the floppy driver that you have only one floppy controller.
-	(default)
-
- floppy=two_fdc
- floppy=<address>,two_fdc
-	Tells the floppy driver that you have two floppy controllers.
-	The second floppy controller is assumed to be at <address>.
-	This option is not needed if the second controller is at address
-	0x370, and if you use the 'cmos' option.
-
- floppy=thinkpad
-	Tells the floppy driver that you have a Thinkpad. Thinkpads use an
-	inverted convention for the disk change line.
-
- floppy=0,thinkpad
-	Tells the floppy driver that you don't have a Thinkpad.
-
- floppy=omnibook
- floppy=nodma
-	Tells the floppy driver not to use Dma for data transfers.
-	This is needed on HP Omnibooks, which don't have a workable
-	DMA channel for the floppy driver. This option is also useful
-	if you frequently get "Unable to allocate DMA memory" messages.
-	Indeed, dma memory needs to be continuous in physical memory,
-	and is thus harder to find, whereas non-dma buffers may be
-	allocated in virtual memory. However, I advise against this if
-	you have an FDC without a FIFO (8272A or 82072). 82072A and
-	later are OK. You also need at least a 486 to use nodma.
-	If you use nodma mode, I suggest you also set the FIFO
-	threshold to 10 or lower, in order to limit the number of data
-	transfer interrupts.
-
-	If you have a FIFO-able FDC, the floppy driver automatically
-	falls back on non DMA mode if no DMA-able memory can be found.
-	If you want to avoid this, explicitly ask for 'yesdma'.
-
- floppy=yesdma
-	Tells the floppy driver that a workable DMA channel is available.
-	(default)
-
- floppy=nofifo
-	Disables the FIFO entirely. This is needed if you get "Bus
-	master arbitration error" messages from your Ethernet card (or
-	from other devices) while accessing the floppy.
-
- floppy=usefifo
-	Enables the FIFO. (default)
-
- floppy=<threshold>,fifo_depth
-	Sets the FIFO threshold. This is mostly relevant in DMA
-	mode. If this is higher, the floppy driver tolerates more
-	interrupt latency, but it triggers more interrupts (i.e. it
-	imposes more load on the rest of the system). If this is
-	lower, the interrupt latency should be lower too (faster
-	processor). The benefit of a lower threshold is less
-	interrupts.
-
-	To tune the fifo threshold, switch on over/underrun messages
-	using 'floppycontrol --messages'. Then access a floppy
-	disk. If you get a huge amount of "Over/Underrun - retrying"
-	messages, then the fifo threshold is too low. Try with a
-	higher value, until you only get an occasional Over/Underrun.
-	It is a good idea to compile the floppy driver as a module
-	when doing this tuning. Indeed, it allows to try different
-	fifo values without rebooting the machine for each test. Note
-	that you need to do 'floppycontrol --messages' every time you
-	re-insert the module.
-
-	Usually, tuning the fifo threshold should not be needed, as
-	the default (0xa) is reasonable.
-
- floppy=<drive>,<type>,cmos
-	Sets the CMOS type of <drive> to <type>. This is mandatory if
-	you have more than two floppy drives (only two can be
-	described in the physical CMOS), or if your BIOS uses
-	non-standard CMOS types. The CMOS types are:
-
-		0 - Use the value of the physical CMOS
-		1 - 5 1/4 DD
-		2 - 5 1/4 HD
-		3 - 3 1/2 DD
-		4 - 3 1/2 HD
-		5 - 3 1/2 ED
-		6 - 3 1/2 ED
-	       16 - unknown or not installed
-
-	(Note: there are two valid types for ED drives. This is because 5 was
-	initially chosen to represent floppy *tapes*, and 6 for ED drives.
-	AMI ignored this, and used 5 for ED drives. That's why the floppy
-	driver handles both.)
-
- floppy=unexpected_interrupts
-	Print a warning message when an unexpected interrupt is received.
-	(default)
-
- floppy=no_unexpected_interrupts
- floppy=L40SX
-	Don't print a message when an unexpected interrupt is received. This
-	is needed on IBM L40SX laptops in certain video modes. (There seems
-	to be an interaction between video and floppy. The unexpected
-	interrupts affect only performance, and can be safely ignored.)
-
- floppy=broken_dcl
-	Don't use the disk change line, but assume that the disk was
-	changed whenever the device node is reopened. Needed on some
-	boxes where the disk change line is broken or unsupported.
-	This should be regarded as a stopgap measure, indeed it makes
-	floppy operation less efficient due to unneeded cache
-	flushings, and slightly more unreliable. Please verify your
-	cable, connection and jumper settings if you have any DCL
-	problems. However, some older drives, and also some laptops
-	are known not to have a DCL.
-
- floppy=debug
-	Print debugging messages.
-
- floppy=messages
-	Print informational messages for some operations (disk change
-	notifications, warnings about over and underruns, and about
-	autodetection).
-
- floppy=silent_dcl_clear
-	Uses a less noisy way to clear the disk change line (which
-	doesn't involve seeks). Implied by 'daring' option.
-
- floppy=<nr>,irq
-	Sets the floppy IRQ to <nr> instead of 6.
-
- floppy=<nr>,dma
-	Sets the floppy DMA channel to <nr> instead of 2.
-
- floppy=slow
-	Use PS/2 stepping rate:
-	 " PS/2 floppies have much slower step rates than regular floppies.
-	   It's been recommended that take about 1/4 of the default speed
-	   in some more extreme cases."
-
-
-Supporting utilities and additional documentation:
-==================================================
-
- Additional parameters of the floppy driver can be configured at
-runtime. Utilities which do this can be found in the fdutils package.
-This package also contains a new version of mtools which allows to
-access high capacity disks (up to 1992K on a high density 3 1/2 disk!).
-It also contains additional documentation about the floppy driver.
-
-The latest version can be found at fdutils homepage:
- http://fdutils.linux.lu
-
-The fdutils releases can be found at:
- http://fdutils.linux.lu/download.html
- http://www.tux.org/pub/knaff/fdutils/
- ftp://metalab.unc.edu/pub/Linux/utils/disk-management/
-
-Reporting problems about the floppy driver
-==========================================
-
- If you have a question or a bug report about the floppy driver, mail
-me at Alain.Knaff@poboxes.com . If you post to Usenet, preferably use
-comp.os.linux.hardware. As the volume in these groups is rather high,
-be sure to include the word "floppy" (or "FLOPPY") in the subject
-line.  If the reported problem happens when mounting floppy disks, be
-sure to mention also the type of the filesystem in the subject line.
-
- Be sure to read the FAQ before mailing/posting any bug reports!
-
- Alain
-
-Changelog
-=========
-
-10-30-2004 :	Cleanup, updating, add reference to module configuration.
-		James Nelson <james4765@gmail.com>
-
-6-3-2000 :	Original Document
diff --git a/Documentation/blockdev/index.rst b/Documentation/blockdev/index.rst
new file mode 100644
index 000000000000..a9af6ed8b4aa
--- /dev/null
+++ b/Documentation/blockdev/index.rst
@@ -0,0 +1,16 @@
+:orphan:
+
+=============
+Block Devices
+=============
+
+.. toctree::
+   :maxdepth: 1
+
+   floppy
+   nbd
+   paride
+   ramdisk
+   zram
+
+   drbd/index
diff --git a/Documentation/blockdev/nbd.rst b/Documentation/blockdev/nbd.rst
new file mode 100644
index 000000000000..d78dfe559dcf
--- /dev/null
+++ b/Documentation/blockdev/nbd.rst
@@ -0,0 +1,31 @@
+==================================
+Network Block Device (TCP version)
+==================================
+
+1) Overview
+-----------
+
+What is it: With this compiled in the kernel (or as a module), Linux
+can use a remote server as one of its block devices. So every time
+the client computer wants to read, e.g., /dev/nb0, it sends a
+request over TCP to the server, which will reply with the data read.
+This can be used for stations with low disk space (or even diskless)
+to borrow disk space from another computer.
+Unlike NFS, it is possible to put any filesystem on it, etc.
+
+For more information, or to download the nbd-client and nbd-server
+tools, go to http://nbd.sf.net/.
+
+The nbd kernel module need only be installed on the client
+system, as the nbd-server is completely in userspace. In fact,
+the nbd-server has been successfully ported to other operating
+systems, including Windows.
+
+A) NBD parameters
+-----------------
+
+max_part
+	Number of partitions per device (default: 0).
+
+nbds_max
+	Number of block devices that should be initialized (default: 16).
diff --git a/Documentation/blockdev/nbd.txt b/Documentation/blockdev/nbd.txt
deleted file mode 100644
index db242ea2bce8..000000000000
--- a/Documentation/blockdev/nbd.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-Network Block Device (TCP version)
-==================================
-
-1) Overview
------------
-
-What is it: With this compiled in the kernel (or as a module), Linux
-can use a remote server as one of its block devices. So every time
-the client computer wants to read, e.g., /dev/nb0, it sends a
-request over TCP to the server, which will reply with the data read.
-This can be used for stations with low disk space (or even diskless)
-to borrow disk space from another computer.
-Unlike NFS, it is possible to put any filesystem on it, etc.
-
-For more information, or to download the nbd-client and nbd-server
-tools, go to http://nbd.sf.net/.
-
-The nbd kernel module need only be installed on the client
-system, as the nbd-server is completely in userspace. In fact,
-the nbd-server has been successfully ported to other operating
-systems, including Windows.
-
-A) NBD parameters
------------------
-
-max_part
-	Number of partitions per device (default: 0).
-
-nbds_max
-	Number of block devices that should be initialized (default: 16).
-
diff --git a/Documentation/blockdev/paride.rst b/Documentation/blockdev/paride.rst
new file mode 100644
index 000000000000..87b4278bf314
--- /dev/null
+++ b/Documentation/blockdev/paride.rst
@@ -0,0 +1,439 @@
+===================================
+Linux and parallel port IDE devices
+===================================
+
+PARIDE v1.03   (c) 1997-8  Grant Guenther <grant@torque.net>
+
+1. Introduction
+===============
+
+Owing to the simplicity and near universality of the parallel port interface
+to personal computers, many external devices such as portable hard-disk,
+CD-ROM, LS-120 and tape drives use the parallel port to connect to their
+host computer.  While some devices (notably scanners) use ad-hoc methods
+to pass commands and data through the parallel port interface, most
+external devices are actually identical to an internal model, but with
+a parallel-port adapter chip added in.  Some of the original parallel port
+adapters were little more than mechanisms for multiplexing a SCSI bus.
+(The Iomega PPA-3 adapter used in the ZIP drives is an example of this
+approach).  Most current designs, however, take a different approach.
+The adapter chip reproduces a small ISA or IDE bus in the external device
+and the communication protocol provides operations for reading and writing
+device registers, as well as data block transfer functions.  Sometimes,
+the device being addressed via the parallel cable is a standard SCSI
+controller like an NCR 5380.  The "ditto" family of external tape
+drives use the ISA replicator to interface a floppy disk controller,
+which is then connected to a floppy-tape mechanism.  The vast majority
+of external parallel port devices, however, are now based on standard
+IDE type devices, which require no intermediate controller.  If one
+were to open up a parallel port CD-ROM drive, for instance, one would
+find a standard ATAPI CD-ROM drive, a power supply, and a single adapter
+that interconnected a standard PC parallel port cable and a standard
+IDE cable.  It is usually possible to exchange the CD-ROM device with
+any other device using the IDE interface.
+
+This document describes the support in Linux for parallel port IDE
+devices.  It does not cover parallel port SCSI devices, "ditto" tape
+drives or scanners.  Many different devices are supported by the
+parallel port IDE subsystem, including:
+
+	- MicroSolutions backpack CD-ROM
+	- MicroSolutions backpack PD/CD
+	- MicroSolutions backpack hard-drives
+	- MicroSolutions backpack 8000t tape drive
+	- SyQuest EZ-135, EZ-230 & SparQ drives
+	- Avatar Shark
+	- Imation Superdisk LS-120
+	- Maxell Superdisk LS-120
+	- FreeCom Power CD
+	- Hewlett-Packard 5GB and 8GB tape drives
+	- Hewlett-Packard 7100 and 7200 CD-RW drives
+
+as well as most of the clone and no-name products on the market.
+
+To support such a wide range of devices, PARIDE, the parallel port IDE
+subsystem, is actually structured in three parts.   There is a base
+paride module which provides a registry and some common methods for
+accessing the parallel ports.  The second component is a set of
+high-level drivers for each of the different types of supported devices:
+
+	===	=============
+	pd	IDE disk
+	pcd	ATAPI CD-ROM
+	pf	ATAPI disk
+	pt	ATAPI tape
+	pg	ATAPI generic
+	===	=============
+
+(Currently, the pg driver is only used with CD-R drives).
+
+The high-level drivers function according to the relevant standards.
+The third component of PARIDE is a set of low-level protocol drivers
+for each of the parallel port IDE adapter chips.  Thanks to the interest
+and encouragement of Linux users from many parts of the world,
+support is available for almost all known adapter protocols:
+
+	====    ====================================== ====
+	aten    ATEN EH-100                            (HK)
+	bpck    Microsolutions backpack                (US)
+	comm    DataStor (old-type) "commuter" adapter (TW)
+	dstr    DataStor EP-2000                       (TW)
+	epat    Shuttle EPAT                           (UK)
+	epia    Shuttle EPIA                           (UK)
+	fit2    FIT TD-2000                            (US)
+	fit3    FIT TD-3000                            (US)
+	friq    Freecom IQ cable                       (DE)
+	frpw    Freecom Power                          (DE)
+	kbic    KingByte KBIC-951A and KBIC-971A       (TW)
+	ktti    KT Technology PHd adapter              (SG)
+	on20    OnSpec 90c20                           (US)
+	on26    OnSpec 90c26                           (US)
+	====    ====================================== ====
+
+
+2. Using the PARIDE subsystem
+=============================
+
+While configuring the Linux kernel, you may choose either to build
+the PARIDE drivers into your kernel, or to build them as modules.
+
+In either case, you will need to select "Parallel port IDE device support"
+as well as at least one of the high-level drivers and at least one
+of the parallel port communication protocols.  If you do not know
+what kind of parallel port adapter is used in your drive, you could
+begin by checking the file names and any text files on your DOS
+installation floppy.  Alternatively, you can look at the markings on
+the adapter chip itself.  That's usually sufficient to identify the
+correct device.
+
+You can actually select all the protocol modules, and allow the PARIDE
+subsystem to try them all for you.
+
+For the "brand-name" products listed above, here are the protocol
+and high-level drivers that you would use:
+
+	================	============	======	========
+	Manufacturer		Model		Driver	Protocol
+	================	============	======	========
+	MicroSolutions		CD-ROM		pcd	bpck
+	MicroSolutions		PD drive	pf	bpck
+	MicroSolutions		hard-drive	pd	bpck
+	MicroSolutions		8000t tape	pt	bpck
+	SyQuest			EZ, SparQ	pd	epat
+	Imation			Superdisk	pf	epat
+	Maxell			Superdisk	pf	friq
+	Avatar			Shark		pd	epat
+	FreeCom			CD-ROM		pcd	frpw
+	Hewlett-Packard		5GB Tape	pt	epat
+	Hewlett-Packard		7200e (CD)	pcd	epat
+	Hewlett-Packard		7200e (CD-R)	pg	epat
+	================	============	======	========
+
+2.1  Configuring built-in drivers
+---------------------------------
+
+We recommend that you get to know how the drivers work and how to
+configure them as loadable modules, before attempting to compile a
+kernel with the drivers built-in.
+
+If you built all of your PARIDE support directly into your kernel,
+and you have just a single parallel port IDE device, your kernel should
+locate it automatically for you.  If you have more than one device,
+you may need to give some command line options to your bootloader
+(e.g. LILO); how to do that is beyond the scope of this document.
+
+The high-level drivers accept a number of command line parameters, all
+of which are documented in the source files in linux/drivers/block/paride.
+By default, each driver will automatically try all parallel ports it
+can find, and all protocol types that have been installed, until it finds
+a parallel port IDE adapter.  Once it finds one, the probe stops.  So,
+if you have more than one device, you will need to tell the drivers
+how to identify them.  This requires specifying the port address, the
+protocol identification number and, for some devices, the drive's
+chain ID.  While your system is booting, a number of messages are
+displayed on the console.  Like all such messages, they can be
+reviewed with the 'dmesg' command.  Among those messages will be
+some lines like::
+
+	paride: bpck registered as protocol 0
+	paride: epat registered as protocol 1
+
+The numbers will always be the same until you build a new kernel with
+different protocol selections.  You should note these numbers as you
+will need them to identify the devices.
+
+If you happen to be using a MicroSolutions backpack device, you will
+also need to know the unit ID number for each drive.  This is usually
+the last two digits of the drive's serial number (but read MicroSolutions'
+documentation about this).
+
+As an example, let's assume that you have a MicroSolutions PD/CD drive
+with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
+EZ-135 connected to the chained port on the PD/CD drive and also an
+Imation Superdisk connected to port 0x278.  You could give the following
+options on your boot command::
+
+	pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
+
+In the last option, pf.drive1 configures device /dev/pf1, the 0x378
+is the parallel port base address, the 0 is the protocol registration
+number and 36 is the chain ID.
+
+Please note:  while PARIDE will work both with and without the
+PARPORT parallel port sharing system that is included by the
+"Parallel port support" option, PARPORT must be included and enabled
+if you want to use chains of devices on the same parallel port.
+
+2.2  Loading and configuring PARIDE as modules
+----------------------------------------------
+
+It is much faster and simpler to get to understand the PARIDE drivers
+if you use them as loadable kernel modules.
+
+Note 1:
+	using these drivers with the "kerneld" automatic module loading
+	system is not recommended for beginners, and is not documented here.
+
+Note 2:
+	if you build PARPORT support as a loadable module, PARIDE must
+	also be built as loadable modules, and PARPORT must be loaded before
+	the PARIDE modules.
+
+To use PARIDE, you must begin by::
+
+	insmod paride
+
+This loads a base module which provides a registry for the protocols,
+among other tasks.
+
+Then, load as many of the protocol modules as you think you might need.
+As you load each module, it will register the protocols that it supports,
+and print a log message to your kernel log file and your console. For
+example::
+
+	# insmod epat
+	paride: epat registered as protocol 0
+	# insmod kbic
+	paride: k951 registered as protocol 1
+	paride: k971 registered as protocol 2
+
+Finally, you can load high-level drivers for each kind of device that
+you have connected.  By default, each driver will autoprobe for a single
+device, but you can support up to four similar devices by giving their
+individual co-ordinates when you load the driver.
+
+For example, if you had two no-name CD-ROM drives both using the
+KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
+you could give the following command::
+
+	# insmod pcd drive0=0x378,1 drive1=0x3bc,1
+
+For most adapters, giving a port address and protocol number is sufficient,
+but check the source files in linux/drivers/block/paride for more
+information.  (Hopefully someone will write some man pages one day!)
+
+As another example, here's what happens when PARPORT is installed, and
+a SyQuest EZ-135 is attached to port 0x378::
+
+	# insmod paride
+	paride: version 1.0 installed
+	# insmod epat
+	paride: epat registered as protocol 0
+	# insmod pd
+	pd: pd version 1.0, major 45, cluster 64, nice 0
+	pda: Sharing parport1 at 0x378
+	pda: epat 1.0, Shuttle EPAT chip c3 at 0x378, mode 5 (EPP-32), delay 1
+	pda: SyQuest EZ135A, 262144 blocks [128M], (512/16/32), removable media
+	 pda: pda1
+
+Note that the last line is the output from the generic partition table
+scanner - in this case it reports that it has found a disk with one partition.
+
+2.3  Using a PARIDE device
+--------------------------
+
+Once the drivers have been loaded, you can access PARIDE devices in the
+same way as their traditional counterparts.  You will probably need to
+create the device "special files".  Here is a simple script that you can
+save to a file and execute::
+
+  #!/bin/bash
+  #
+  # mkd -- a script to create the device special files for the PARIDE subsystem
+  #
+  function mkdev {
+    mknod "$1" "$2" "$3" "$4" ; chmod 0660 "$1" ; chown root:disk "$1"
+  }
+  # pd N: create /dev/pd<letter> (a, b, ...) and its 15 partition nodes, major 45
+  function pd {
+    D=$( printf \\$( printf "x%02x" $(( $1 + 97 )) ) )
+    mkdev pd$D b 45 $(( $1 * 16 ))
+    for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+    do mkdev pd$D$P b 45 $(( $1 * 16 + P ))
+    done
+  }
+  #
+  cd /dev
+  #
+  for u in 0 1 2 3 ; do pd $u ; done
+  for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
+  for u in 0 1 2 3 ; do mkdev pf$u  b 47 $u ; done
+  for u in 0 1 2 3 ; do mkdev pt$u  c 96 $u ; done
+  for u in 0 1 2 3 ; do mkdev npt$u c 96 $(( u + 128 )) ; done
+  for u in 0 1 2 3 ; do mkdev pg$u  c 97 $u ; done
+  #
+  # end of mkd
+
+With the device files and drivers in place, you can access PARIDE devices
+like any other Linux device.   For example, to mount a CD-ROM in pcd0, use::
+
+	mount /dev/pcd0 /cdrom
+
+If you have a fresh Avatar Shark cartridge, and the drive is pda, you
+might do something like::
+
+	fdisk /dev/pda		-- make a new partition table with
+				   partition 1 of type 83
+
+	mke2fs /dev/pda1	-- to build the file system
+
+	mkdir /shark		-- make a place to mount the disk
+
+	mount /dev/pda1 /shark
+
+Devices like the Imation superdisk work in the same way, except that
+they do not have a partition table.  For example, to make a 120MB
+floppy that you could share with a DOS system::
+
+	mkdosfs /dev/pf0
+	mount /dev/pf0 /mnt
+
+
+2.4  The pf driver
+------------------
+
+The pf driver is intended for use with parallel port ATAPI disk
+devices.  The most common devices in this category are PD drives
+and LS-120 drives.  Traditionally, media for these devices are not
+partitioned.  Consequently, the pf driver does not support partitioned
+media.  This may be changed in a future version of the driver.
+
+2.5  Using the pt driver
+------------------------
+
+The pt driver for parallel port ATAPI tape drives is a minimal driver.
+It does not yet support many of the standard tape ioctl operations.
+For best performance, a block size of 32KB should be used.  You will
+probably want to set the parallel port delay to 0, if you can.
+
+2.6  Using the pg driver
+------------------------
+
+The pg driver can be used in conjunction with the cdrecord program
+to create CD-ROMs.  Please get cdrecord version 1.6.1 or later
+from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ .  To record CD-R media
+your parallel port should ideally be set to EPP mode, and the "port delay"
+should be set to 0.  With those settings it is possible to record at 2x
+speed without any buffer underruns.  If you cannot get the driver to work
+in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
+
+
+3. Troubleshooting
+==================
+
+3.1  Use EPP mode if you can
+----------------------------
+
+The most common problems that people report with the PARIDE drivers
+concern the parallel port CMOS settings.  At this time, none of the
+PARIDE protocol modules support ECP mode, or any ECP combination modes.
+If you are able to do so, please set your parallel port into EPP mode
+using your CMOS setup procedure.
+
+3.2  Check the port delay
+-------------------------
+
+Some parallel ports cannot reliably transfer data at full speed.  To
+offset the errors, the PARIDE protocol modules introduce a "port
+delay" between each access to the i/o ports.  Each protocol sets
+a default value for this delay.  In most cases, the user can override
+the default and set it to 0 - resulting in somewhat higher transfer
+rates.  In some rare cases (especially with older 486 systems) the
+default delays are not long enough.  If you experience corrupt data
+transfers, or unexpected failures, you may wish to increase the
+port delay.   The delay can be programmed using the "driveN" parameters
+to each of the high-level drivers.  Please see the notes above, or
+read the comments at the beginning of the driver source files in
+linux/drivers/block/paride.
+
+3.3  Some drives need a printer reset
+-------------------------------------
+
+There appear to be a number of "noname" external drives on the market
+that do not always power up correctly.  We have noticed this with some
+drives based on OnSpec and older Freecom adapters.  In these rare cases,
+the adapter can often be reinitialised by issuing a "printer reset" on
+the parallel port.  As the reset operation is potentially disruptive in
+multiple device environments, the PARIDE drivers will not do it
+automatically.  You can, however, force a printer reset by doing::
+
+	insmod lp reset=1
+	rmmod lp
+
+If you have one of these marginal cases, you should probably build
+your PARIDE drivers as modules, and arrange to do the printer reset
+before loading the PARIDE drivers.
+
+3.4  Use the verbose option and dmesg if you need help
+------------------------------------------------------
+
+While a lot of testing has gone into these drivers to make them work
+as smoothly as possible, problems will arise.  If you do have problems,
+please check all the obvious things first:  does the drive work in
+DOS with the manufacturer's drivers?  If that doesn't yield any useful
+clues, then please make sure that only one drive is hooked to your system,
+and that either (a) PARPORT is enabled or (b) no other device driver
+is using your parallel port (check in /proc/ioports).  Then, load the
+appropriate drivers (you can load several protocol modules if you want)
+as in::
+
+	# insmod paride
+	# insmod epat
+	# insmod bpck
+	# insmod kbic
+	...
+	# insmod pd verbose=1
+
+(using the correct driver for the type of device you have, of course).
+The verbose=1 parameter will cause the drivers to log a trace of their
+activity as they attempt to locate your drive.
+
+Use 'dmesg' to capture a log of all the PARIDE messages (any messages
+beginning with paride:, a protocol module's name or a driver's name) and
+include that with your bug report.  You can submit a bug report in one
+of two ways.  Either send it directly to the author of the PARIDE suite,
+by e-mail to grant@torque.net, or join the linux-parport mailing list
+and post your report there.
+
+3.5  For more information or help
+---------------------------------
+
+You can join the linux-parport mailing list by sending a mail message
+to:
+
+		linux-parport-request@torque.net
+
+with the single word::
+
+		subscribe
+
+in the body of the mail message (not in the subject line).   Please be
+sure that your mail program is correctly set up when you do this,  as
+the list manager is a robot that will subscribe you using the reply
+address in your mail headers.  REMOVE any anti-spam gimmicks you may
+have in your mail headers, when sending mail to the list server.
+
+You might also find some useful information on the linux-parport
+web pages (although they are not always up to date) at
+
+	http://web.archive.org/web/%2E/http://www.torque.net/parport/
diff --git a/Documentation/blockdev/paride.txt b/Documentation/blockdev/paride.txt
deleted file mode 100644
index ee6717e3771d..000000000000
--- a/Documentation/blockdev/paride.txt
+++ /dev/null
@@ -1,417 +0,0 @@
-
-		Linux and parallel port IDE devices
-
-PARIDE v1.03   (c) 1997-8  Grant Guenther <grant@torque.net>
-
-1. Introduction
-
-Owing to the simplicity and near universality of the parallel port interface
-to personal computers, many external devices such as portable hard-disk,
-CD-ROM, LS-120 and tape drives use the parallel port to connect to their
-host computer.  While some devices (notably scanners) use ad-hoc methods
-to pass commands and data through the parallel port interface, most 
-external devices are actually identical to an internal model, but with
-a parallel-port adapter chip added in.  Some of the original parallel port
-adapters were little more than mechanisms for multiplexing a SCSI bus.
-(The Iomega PPA-3 adapter used in the ZIP drives is an example of this
-approach).  Most current designs, however, take a different approach.
-The adapter chip reproduces a small ISA or IDE bus in the external device
-and the communication protocol provides operations for reading and writing
-device registers, as well as data block transfer functions.  Sometimes,
-the device being addressed via the parallel cable is a standard SCSI
-controller like an NCR 5380.  The "ditto" family of external tape
-drives use the ISA replicator to interface a floppy disk controller,
-which is then connected to a floppy-tape mechanism.  The vast majority
-of external parallel port devices, however, are now based on standard
-IDE type devices, which require no intermediate controller.  If one
-were to open up a parallel port CD-ROM drive, for instance, one would
-find a standard ATAPI CD-ROM drive, a power supply, and a single adapter
-that interconnected a standard PC parallel port cable and a standard
-IDE cable.  It is usually possible to exchange the CD-ROM device with
-any other device using the IDE interface. 
-
-The document describes the support in Linux for parallel port IDE
-devices.  It does not cover parallel port SCSI devices, "ditto" tape
-drives or scanners.  Many different devices are supported by the 
-parallel port IDE subsystem, including:
-
-	MicroSolutions backpack CD-ROM
-	MicroSolutions backpack PD/CD
-	MicroSolutions backpack hard-drives
-	MicroSolutions backpack 8000t tape drive
-	SyQuest EZ-135, EZ-230 & SparQ drives
-	Avatar Shark
-	Imation Superdisk LS-120
-	Maxell Superdisk LS-120
-	FreeCom Power CD 
-	Hewlett-Packard 5GB and 8GB tape drives
-	Hewlett-Packard 7100 and 7200 CD-RW drives
-
-as well as most of the clone and no-name products on the market.
-
-To support such a wide range of devices, PARIDE, the parallel port IDE
-subsystem, is actually structured in three parts.   There is a base
-paride module which provides a registry and some common methods for
-accessing the parallel ports.  The second component is a set of 
-high-level drivers for each of the different types of supported devices: 
-
-	pd	IDE disk
-	pcd	ATAPI CD-ROM
-	pf	ATAPI disk
-	pt	ATAPI tape
-	pg	ATAPI generic
-
-(Currently, the pg driver is only used with CD-R drives).
-
-The high-level drivers function according to the relevant standards.
-The third component of PARIDE is a set of low-level protocol drivers
-for each of the parallel port IDE adapter chips.  Thanks to the interest
-and encouragement of Linux users from many parts of the world, 
-support is available for almost all known adapter protocols:
-
-        aten    ATEN EH-100                            (HK)
-        bpck    Microsolutions backpack                (US)
-        comm    DataStor (old-type) "commuter" adapter (TW)
-        dstr    DataStor EP-2000                       (TW)
-        epat    Shuttle EPAT                           (UK)
-        epia    Shuttle EPIA                           (UK)
-	fit2    FIT TD-2000			       (US)
-	fit3    FIT TD-3000			       (US)
-	friq    Freecom IQ cable                       (DE)
-        frpw    Freecom Power                          (DE)
-        kbic    KingByte KBIC-951A and KBIC-971A       (TW)
-	ktti    KT Technology PHd adapter              (SG)
-        on20    OnSpec 90c20                           (US)
-        on26    OnSpec 90c26                           (US)
-
-
-2. Using the PARIDE subsystem
-
-While configuring the Linux kernel, you may choose either to build
-the PARIDE drivers into your kernel, or to build them as modules.
-
-In either case, you will need to select "Parallel port IDE device support"
-as well as at least one of the high-level drivers and at least one
-of the parallel port communication protocols.  If you do not know
-what kind of parallel port adapter is used in your drive, you could
-begin by checking the file names and any text files on your DOS 
-installation floppy.  Alternatively, you can look at the markings on
-the adapter chip itself.  That's usually sufficient to identify the
-correct device.  
-
-You can actually select all the protocol modules, and allow the PARIDE
-subsystem to try them all for you.
-
-For the "brand-name" products listed above, here are the protocol
-and high-level drivers that you would use:
-
-	Manufacturer		Model		Driver	Protocol
-	
-	MicroSolutions		CD-ROM		pcd	bpck
-	MicroSolutions		PD drive	pf	bpck
-	MicroSolutions		hard-drive	pd	bpck
-	MicroSolutions          8000t tape      pt      bpck
-	SyQuest			EZ, SparQ	pd	epat
-	Imation			Superdisk	pf	epat
-	Maxell                  Superdisk       pf      friq
-	Avatar			Shark		pd	epat
-	FreeCom			CD-ROM		pcd	frpw
-	Hewlett-Packard		5GB Tape	pt	epat
-	Hewlett-Packard		7200e (CD)	pcd	epat
-	Hewlett-Packard		7200e (CD-R)	pg	epat
-
-2.1  Configuring built-in drivers
-
-We recommend that you get to know how the drivers work and how to
-configure them as loadable modules, before attempting to compile a
-kernel with the drivers built-in.
-
-If you built all of your PARIDE support directly into your kernel,
-and you have just a single parallel port IDE device, your kernel should
-locate it automatically for you.  If you have more than one device,
-you may need to give some command line options to your bootloader
-(eg: LILO), how to do that is beyond the scope of this document.
-
-The high-level drivers accept a number of command line parameters, all
-of which are documented in the source files in linux/drivers/block/paride.
-By default, each driver will automatically try all parallel ports it
-can find, and all protocol types that have been installed, until it finds
-a parallel port IDE adapter.  Once it finds one, the probe stops.  So,
-if you have more than one device, you will need to tell the drivers
-how to identify them.  This requires specifying the port address, the
-protocol identification number and, for some devices, the drive's
-chain ID.  While your system is booting, a number of messages are
-displayed on the console.  Like all such messages, they can be
-reviewed with the 'dmesg' command.  Among those messages will be
-some lines like:
-
-	paride: bpck registered as protocol 0
-	paride: epat registered as protocol 1
-
-The numbers will always be the same until you build a new kernel with
-different protocol selections.  You should note these numbers as you
-will need them to identify the devices.
-
-If you happen to be using a MicroSolutions backpack device, you will
-also need to know the unit ID number for each drive.  This is usually
-the last two digits of the drive's serial number (but read MicroSolutions'
-documentation about this).
-
-As an example, let's assume that you have a MicroSolutions PD/CD drive
-with unit ID number 36 connected to the parallel port at 0x378, a SyQuest 
-EZ-135 connected to the chained port on the PD/CD drive and also an 
-Imation Superdisk connected to port 0x278.  You could give the following 
-options on your boot command:
-
-	pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
-
-In the last option, pf.drive1 configures device /dev/pf1, the 0x378
-is the parallel port base address, the 0 is the protocol registration
-number and 36 is the chain ID.
-
-Please note:  while PARIDE will work both with and without the 
-PARPORT parallel port sharing system that is included by the
-"Parallel port support" option, PARPORT must be included and enabled
-if you want to use chains of devices on the same parallel port.
-
-2.2  Loading and configuring PARIDE as modules
-
-It is much faster and simpler to get to understand the PARIDE drivers
-if you use them as loadable kernel modules.   
-
-Note 1:  using these drivers with the "kerneld" automatic module loading
-system is not recommended for beginners, and is not documented here.  
-
-Note 2:  if you build PARPORT support as a loadable module, PARIDE must
-also be built as loadable modules, and PARPORT must be loaded before the
-PARIDE modules.
-
-To use PARIDE, you must begin by 
-
-	insmod paride
-
-this loads a base module which provides a registry for the protocols,
-among other tasks.
-
-Then, load as many of the protocol modules as you think you might need.
-As you load each module, it will register the protocols that it supports,
-and print a log message to your kernel log file and your console. For 
-example:
-
-	# insmod epat
-	paride: epat registered as protocol 0
-	# insmod kbic
-	paride: k951 registered as protocol 1
-        paride: k971 registered as protocol 2
-
-Finally, you can load high-level drivers for each kind of device that
-you have connected.  By default, each driver will autoprobe for a single 
-device, but you can support up to four similar devices by giving their
-individual co-ordinates when you load the driver.
-
-For example, if you had two no-name CD-ROM drives both using the
-KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
-you could give the following command:
-
-	# insmod pcd drive0=0x378,1 drive1=0x3bc,1
-
-For most adapters, giving a port address and protocol number is sufficient,
-but check the source files in linux/drivers/block/paride for more 
-information.  (Hopefully someone will write some man pages one day !).
-
-As another example, here's what happens when PARPORT is installed, and
-a SyQuest EZ-135 is attached to port 0x378:
-
-	# insmod paride
-	paride: version 1.0 installed
-	# insmod epat
-	paride: epat registered as protocol 0
-	# insmod pd
-	pd: pd version 1.0, major 45, cluster 64, nice 0
-	pda: Sharing parport1 at 0x378
-	pda: epat 1.0, Shuttle EPAT chip c3 at 0x378, mode 5 (EPP-32), delay 1
-	pda: SyQuest EZ135A, 262144 blocks [128M], (512/16/32), removable media
-	 pda: pda1
-
-Note that the last line is the output from the generic partition table
-scanner - in this case it reports that it has found a disk with one partition.
-
-2.3  Using a PARIDE device
-
-Once the drivers have been loaded, you can access PARIDE devices in the
-same way as their traditional counterparts.  You will probably need to
-create the device "special files".  Here is a simple script that you can
-cut to a file and execute:
-
-#!/bin/bash
-#
-# mkd -- a script to create the device special files for the PARIDE subsystem
-#
-function mkdev {
-  mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
-}
-#
-function pd {
-  D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
-  mkdev pd$D b 45 $[ $1 * 16 ]
-  for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-  do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
-  done
-}
-#
-cd /dev
-#
-for u in 0 1 2 3 ; do pd $u ; done
-for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done 
-for u in 0 1 2 3 ; do mkdev pf$u  b 47 $u ; done 
-for u in 0 1 2 3 ; do mkdev pt$u  c 96 $u ; done 
-for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done 
-for u in 0 1 2 3 ; do mkdev pg$u  c 97 $u ; done 
-#
-# end of mkd
-
-With the device files and drivers in place, you can access PARIDE devices
-like any other Linux device.   For example, to mount a CD-ROM in pcd0, use:
-
-	mount /dev/pcd0 /cdrom
-
-If you have a fresh Avatar Shark cartridge, and the drive is pda, you
-might do something like:
-
-	fdisk /dev/pda		-- make a new partition table with
-				   partition 1 of type 83
-
-	mke2fs /dev/pda1	-- to build the file system
-
-	mkdir /shark		-- make a place to mount the disk
-
-	mount /dev/pda1 /shark
-
-Devices like the Imation superdisk work in the same way, except that
-they do not have a partition table.  For example to make a 120MB
-floppy that you could share with a DOS system:
-
-	mkdosfs /dev/pf0
-	mount /dev/pf0 /mnt
-
-
-2.4  The pf driver
-
-The pf driver is intended for use with parallel port ATAPI disk
-devices.  The most common devices in this category are PD drives
-and LS-120 drives.  Traditionally, media for these devices are not
-partitioned.  Consequently, the pf driver does not support partitioned
-media.  This may be changed in a future version of the driver. 
-
-2.5  Using the pt driver
-
-The pt driver for parallel port ATAPI tape drives is a minimal driver.
-It does not yet support many of the standard tape ioctl operations. 
-For best performance, a block size of 32KB should be used.  You will
-probably want to set the parallel port delay to 0, if you can.
-
-2.6  Using the pg driver
-
-The pg driver can be used in conjunction with the cdrecord program
-to create CD-ROMs.  Please get cdrecord version 1.6.1 or later
-from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ .  To record CD-R media 
-your parallel port should ideally be set to EPP mode, and the "port delay" 
-should be set to 0.  With those settings it is possible to record at 2x 
-speed without any buffer underruns.  If you cannot get the driver to work
-in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
-
-
-3. Troubleshooting
-
-3.1  Use EPP mode if you can
-
-The most common problems that people report with the PARIDE drivers
-concern the parallel port CMOS settings.  At this time, none of the
-PARIDE protocol modules support ECP mode, or any ECP combination modes.
-If you are able to do so, please set your parallel port into EPP mode
-using your CMOS setup procedure.
-
-3.2  Check the port delay
-
-Some parallel ports cannot reliably transfer data at full speed.  To
-offset the errors, the PARIDE protocol modules introduce a "port
-delay" between each access to the i/o ports.  Each protocol sets
-a default value for this delay.  In most cases, the user can override
-the default and set it to 0 - resulting in somewhat higher transfer
-rates.  In some rare cases (especially with older 486 systems) the
-default delays are not long enough.  if you experience corrupt data
-transfers, or unexpected failures, you may wish to increase the
-port delay.   The delay can be programmed using the "driveN" parameters
-to each of the high-level drivers.  Please see the notes above, or
-read the comments at the beginning of the driver source files in
-linux/drivers/block/paride.
-
-3.3  Some drives need a printer reset
-
-There appear to be a number of "noname" external drives on the market
-that do not always power up correctly.  We have noticed this with some
-drives based on OnSpec and older Freecom adapters.  In these rare cases,
-the adapter can often be reinitialised by issuing a "printer reset" on
-the parallel port.  As the reset operation is potentially disruptive in 
-multiple device environments, the PARIDE drivers will not do it 
-automatically.  You can however, force a printer reset by doing:
-
-	insmod lp reset=1
-	rmmod lp
-
-If you have one of these marginal cases, you should probably build
-your paride drivers as modules, and arrange to do the printer reset
-before loading the PARIDE drivers. 
-
-3.4  Use the verbose option and dmesg if you need help
-
-While a lot of testing has gone into these drivers to make them work
-as smoothly as possible, problems will arise.  If you do have problems,
-please check all the obvious things first:  does the drive work in
-DOS with the manufacturer's drivers ?  If that doesn't yield any useful
-clues, then please make sure that only one drive is hooked to your system,
-and that either (a) PARPORT is enabled or (b) no other device driver
-is using your parallel port (check in /proc/ioports).  Then, load the
-appropriate drivers (you can load several protocol modules if you want)
-as in:
-
-	# insmod paride
-	# insmod epat
-	# insmod bpck
-	# insmod kbic
-	...
-	# insmod pd verbose=1
-
-(using the correct driver for the type of device you have, of course).
-The verbose=1 parameter will cause the drivers to log a trace of their
-activity as they attempt to locate your drive.
-
-Use 'dmesg' to capture a log of all the PARIDE messages (any messages
-beginning with paride:, a protocol module's name or a driver's name) and
-include that with your bug report.  You can submit a bug report in one
-of two ways.  Either send it directly to the author of the PARIDE suite,
-by e-mail to grant@torque.net, or join the linux-parport mailing list
-and post your report there.
-
-3.5  For more information or help
-
-You can join the linux-parport mailing list by sending a mail message
-to 
-		linux-parport-request@torque.net
-
-with the single word 
-
-		subscribe
-
-in the body of the mail message (not in the subject line).   Please be
-sure that your mail program is correctly set up when you do this,  as
-the list manager is a robot that will subscribe you using the reply
-address in your mail headers.  REMOVE any anti-spam gimmicks you may
-have in your mail headers, when sending mail to the list server.
-
-You might also find some useful information on the linux-parport
-web pages (although they are not always up to date) at
-
-	http://web.archive.org/web/*/http://www.torque.net/parport/
-
-
diff --git a/Documentation/blockdev/ramdisk.rst b/Documentation/blockdev/ramdisk.rst
new file mode 100644
index 000000000000..b7c2268f8dec
--- /dev/null
+++ b/Documentation/blockdev/ramdisk.rst
@@ -0,0 +1,177 @@
+==========================================
+Using the RAM disk block device with Linux
+==========================================
+
+.. Contents:
+
+	1) Overview
+	2) Parameters
+	3) Using "rdev -r"
+	4) An Example of Creating a Compressed RAM Disk
+
+
+1) Overview
+-----------
+
+The RAM disk driver is a way to use main system memory as a block device.  It
+is required for initrd, an initial filesystem used if you need to load modules
+in order to access the root filesystem (see Documentation/admin-guide/initrd.rst).  It can
+also be used for a temporary filesystem for crypto work, since the contents
+are erased on reboot.
+
+The RAM disk dynamically grows as more space is required. It does this by using
+RAM from the buffer cache. The driver marks the buffers it is using as dirty
+so that the VM subsystem does not try to reclaim them later.
+
+The RAM disk driver supports up to 16 RAM disks by default, and can be reconfigured
+to support an unlimited number of RAM disks (at your own risk).  Just change
+the configuration symbol BLK_DEV_RAM_COUNT in the Block drivers config menu
+and (re)build the kernel.
+
+To use RAM disk support with your system, run './MAKEDEV ram' from the /dev
+directory.  RAM disks are all major number 1, and start with minor number 0
+for /dev/ram0, etc.  If used, modern kernels use /dev/ram0 for an initrd.
+
+The new RAM disk also has the ability to load compressed RAM disk images,
+allowing one to squeeze more programs onto an average installation or
+rescue floppy disk.
+
+
+2) Parameters
+---------------------------------
+
+2a) Kernel Command Line Parameters
+
+	ramdisk_size=N
+		Size of the ramdisk.
+
+This parameter tells the RAM disk driver to set up RAM disks of N kB in size.
+The default is 4096 (4 MB).
+
+2b) Module parameters
+
+	rd_nr
+		/dev/ramX devices created.
+
+	max_part
+		Maximum partition number.
+
+	rd_size
+		See ramdisk_size.
+
+3) Using "rdev -r"
+------------------
+
+The usage of the word (two bytes) that "rdev -r" sets in the kernel image is
+as follows. The low 11 bits (0 -> 10) specify an offset (in 1 k blocks) of up
+to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
+14 indicates that a RAM disk is to be loaded, and bit 15 indicates whether a
+prompt/wait sequence is to be given before trying to read the RAM disk. Since
+the RAM disk dynamically grows as data is being written into it, a size field
+is not required. Bits 11 to 13 are not currently used and may as well be zero.
+These numbers are no magical secrets, as seen below::
+
+  ./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK     0x07FF
+  ./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG          0x8000
+  ./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG            0x4000
+
+Consider a typical two floppy disk setup, where you will have the
+kernel on disk one, and have already put a RAM disk image onto disk #2.
+
+Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
+starts at an offset of 0 kB from the beginning of the floppy.
+The command line equivalent is: "ramdisk_start=0"
+
+You want bit 14 as one, indicating that a RAM disk is to be loaded.
+The command line equivalent is: "load_ramdisk=1"
+
+You want bit 15 as one, indicating that you want a prompt/keypress
+sequence so that you have a chance to switch floppy disks.
+The command line equivalent is: "prompt_ramdisk=1"
+
+Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
+So to create disk one of the set, you would do::
+
+	/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
+	/usr/src/linux# rdev /dev/fd0 /dev/fd0
+	/usr/src/linux# rdev -r /dev/fd0 49152
+
+If you make a boot disk that has LILO, then for the above, you would use::
+
+	append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
+
+Since the default start = 0 and the default prompt = 1, you could use::
+
+	append = "load_ramdisk=1"
+
+
+4) An Example of Creating a Compressed RAM Disk
+-----------------------------------------------
+
+To create a RAM disk image, you will need a spare block device to
+construct it on. This can be the RAM disk device itself, or an
+unused disk partition (such as an unmounted swap partition). For this
+example, we will use the RAM disk device, "/dev/ram0".
+
+Note: This technique should not be done on a machine with less than 8 MB
+of RAM. If using a spare disk partition instead of /dev/ram0, then this
+restriction does not apply.
+
+a) Decide on the RAM disk size that you want. Say 2 MB for this example.
+   Create it by writing to the RAM disk device. (This step is not currently
+   required, but may be in the future.) It is wise to zero out the
+   area (esp. for disks) so that maximal compression is achieved for
+   the unused blocks of the image that you are about to create::
+
+	dd if=/dev/zero of=/dev/ram0 bs=1k count=2048
+
+b) Make a filesystem on it. Say ext2fs for this example::
+
+	mke2fs -vm0 /dev/ram0 2048
+
+c) Mount it, copy the files you want to it (eg: /etc/* /dev/* ...)
+   and unmount it again.
+
+d) Compress the contents of the RAM disk. The level of compression
+   will be approximately 50% of the space used by the files. Unused
+   space on the RAM disk will compress to almost nothing::
+
+	dd if=/dev/ram0 bs=1k count=2048 | gzip -v9 > /tmp/ram_image.gz
+
+e) Put the kernel onto the floppy::
+
+	dd if=zImage of=/dev/fd0 bs=1k
+
+f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
+   that is slightly larger than the kernel, so that you can put another
+   (possibly larger) kernel onto the same floppy later without overlapping
+   the RAM disk image. An offset of 400 kB for kernels about 350 kB in
+   size would be reasonable. Make sure offset+size of ram_image.gz is
+   not larger than the total space on your floppy (usually 1440 kB)::
+
+	dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
+
+g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
+   For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
+   have 2^15 + 2^14 + 400 = 49552::
+
+	rdev /dev/fd0 /dev/fd0
+	rdev -r /dev/fd0 49552
+
+That is it. You now have your boot/root compressed RAM disk floppy. Some
+users may wish to combine steps (d) and (f) by using a pipe.
+
+
+						Paul Gortmaker 12/95
+
+Changelog:
+----------
+
+10-22-04 :
+		Updated to reflect changes in command line options, remove
+		obsolete references, general cleanup.
+		James Nelson (james4765@gmail.com)
+
+
+12-95 :
+		Original Document
diff --git a/Documentation/blockdev/ramdisk.txt b/Documentation/blockdev/ramdisk.txt
deleted file mode 100644
index 501e12e0323e..000000000000
--- a/Documentation/blockdev/ramdisk.txt
+++ /dev/null
@@ -1,174 +0,0 @@
-Using the RAM disk block device with Linux
-------------------------------------------
-
-Contents:
-
-	1) Overview
-	2) Kernel Command Line Parameters
-	3) Using "rdev -r"
-	4) An Example of Creating a Compressed RAM Disk
-
-
-1) Overview
------------
-
-The RAM disk driver is a way to use main system memory as a block device.  It
-is required for initrd, an initial filesystem used if you need to load modules
-in order to access the root filesystem (see Documentation/admin-guide/initrd.rst).  It can
-also be used for a temporary filesystem for crypto work, since the contents
-are erased on reboot.
-
-The RAM disk dynamically grows as more space is required. It does this by using
-RAM from the buffer cache. The driver marks the buffers it is using as dirty
-so that the VM subsystem does not try to reclaim them later.
-
-The RAM disk supports up to 16 RAM disks by default, and can be reconfigured
-to support an unlimited number of RAM disks (at your own risk).  Just change
-the configuration symbol BLK_DEV_RAM_COUNT in the Block drivers config menu
-and (re)build the kernel.
-
-To use RAM disk support with your system, run './MAKEDEV ram' from the /dev
-directory.  RAM disks are all major number 1, and start with minor number 0
-for /dev/ram0, etc.  If used, modern kernels use /dev/ram0 for an initrd.
-
-The new RAM disk also has the ability to load compressed RAM disk images,
-allowing one to squeeze more programs onto an average installation or
-rescue floppy disk.
-
-
-2) Parameters
----------------------------------
-
-2a) Kernel Command Line Parameters
-
-	ramdisk_size=N
-	==============
-
-This parameter tells the RAM disk driver to set up RAM disks of N k size.  The
-default is 4096 (4 MB).
-
-2b) Module parameters
-
-	rd_nr
-	=====
-	/dev/ramX devices created.
-
-	max_part
-	========
-	Maximum partition number.
-
-	rd_size
-	=======
-	See ramdisk_size.
-
-3) Using "rdev -r"
-------------------
-
-The usage of the word (two bytes) that "rdev -r" sets in the kernel image is
-as follows. The low 11 bits (0 -> 10) specify an offset (in 1 k blocks) of up
-to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
-14 indicates that a RAM disk is to be loaded, and bit 15 indicates whether a
-prompt/wait sequence is to be given before trying to read the RAM disk. Since
-the RAM disk dynamically grows as data is being written into it, a size field
-is not required. Bits 11 to 13 are not currently used and may as well be zero.
-These numbers are no magical secrets, as seen below:
-
-./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK     0x07FF
-./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG          0x8000
-./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG            0x4000
-
-Consider a typical two floppy disk setup, where you will have the
-kernel on disk one, and have already put a RAM disk image onto disk #2.
-
-Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
-starts at an offset of 0 kB from the beginning of the floppy.
-The command line equivalent is: "ramdisk_start=0"
-
-You want bit 14 as one, indicating that a RAM disk is to be loaded.
-The command line equivalent is: "load_ramdisk=1"
-
-You want bit 15 as one, indicating that you want a prompt/keypress
-sequence so that you have a chance to switch floppy disks.
-The command line equivalent is: "prompt_ramdisk=1"
-
-Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
-So to create disk one of the set, you would do:
-
-	/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
-	/usr/src/linux# rdev /dev/fd0 /dev/fd0
-	/usr/src/linux# rdev -r /dev/fd0 49152
-
-If you make a boot disk that has LILO, then for the above, you would use:
-	append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
-Since the default start = 0 and the default prompt = 1, you could use:
-	append = "load_ramdisk=1"
-
-
-4) An Example of Creating a Compressed RAM Disk
-----------------------------------------------
-
-To create a RAM disk image, you will need a spare block device to
-construct it on. This can be the RAM disk device itself, or an
-unused disk partition (such as an unmounted swap partition). For this
-example, we will use the RAM disk device, "/dev/ram0".
-
-Note: This technique should not be done on a machine with less than 8 MB
-of RAM. If using a spare disk partition instead of /dev/ram0, then this
-restriction does not apply.
-
-a) Decide on the RAM disk size that you want. Say 2 MB for this example.
-   Create it by writing to the RAM disk device. (This step is not currently
-   required, but may be in the future.) It is wise to zero out the
-   area (esp. for disks) so that maximal compression is achieved for
-   the unused blocks of the image that you are about to create.
-
-	dd if=/dev/zero of=/dev/ram0 bs=1k count=2048
-
-b) Make a filesystem on it. Say ext2fs for this example.
-
-	mke2fs -vm0 /dev/ram0 2048
-
-c) Mount it, copy the files you want to it (eg: /etc/* /dev/* ...)
-   and unmount it again.
-
-d) Compress the contents of the RAM disk. The level of compression
-   will be approximately 50% of the space used by the files. Unused
-   space on the RAM disk will compress to almost nothing.
-
-	dd if=/dev/ram0 bs=1k count=2048 | gzip -v9 > /tmp/ram_image.gz
-
-e) Put the kernel onto the floppy
-
-	dd if=zImage of=/dev/fd0 bs=1k
-
-f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
-   that is slightly larger than the kernel, so that you can put another
-   (possibly larger) kernel onto the same floppy later without overlapping
-   the RAM disk image. An offset of 400 kB for kernels about 350 kB in
-   size would be reasonable. Make sure offset+size of ram_image.gz is
-   not larger than the total space on your floppy (usually 1440 kB).
-
-	dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
-
-g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
-   For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
-   have 2^15 + 2^14 + 400 = 49552.
-
-	rdev /dev/fd0 /dev/fd0
-	rdev -r /dev/fd0 49552
-
-That is it. You now have your boot/root compressed RAM disk floppy. Some
-users may wish to combine steps (d) and (f) by using a pipe.
-
---------------------------------------------------------------------------
-						Paul Gortmaker 12/95
-
-Changelog:
-----------
-
-10-22-04 :	Updated to reflect changes in command line options, remove
-		obsolete references, general cleanup.
-		James Nelson (james4765@gmail.com)
-
-
-12-95 :		Original Document
diff --git a/Documentation/blockdev/zram.rst b/Documentation/blockdev/zram.rst
new file mode 100644
index 000000000000..2111231c9c0f
--- /dev/null
+++ b/Documentation/blockdev/zram.rst
@@ -0,0 +1,422 @@
+========================================
+zram: Compressed RAM based block devices
+========================================
+
+Introduction
+============
+
+The zram module creates RAM based block devices named /dev/zram<id>
+(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
+in memory itself. These disks allow very fast I/O and compression provides
+good amounts of memory savings. Some of the use cases include /tmp storage,
+use as swap disks, various caches under /var and maybe many more :)
+
+Statistics for individual zram devices are exported through sysfs nodes at
+/sys/block/zram<id>/
+
+Usage
+=====
+
+There are several ways to configure and manage zram device(-s):
+
+a) using zram and zram_control sysfs attributes
+b) using zramctl utility, provided by util-linux (util-linux@vger.kernel.org).
+
+In this document we will describe only 'manual' zram configuration steps,
+IOW, zram and zram_control sysfs attributes.
+
+To get a better idea about zramctl, please consult the util-linux
+documentation, the zramctl man page or `zramctl --help`. Please note that
+zram maintainers do not develop or maintain util-linux or zramctl; should
+you have any questions, please contact util-linux@vger.kernel.org.
+
+The following shows a typical sequence of steps for using zram.
+
+WARNING
+=======
+
+For the sake of simplicity we skip error checking parts in most of the
+examples below. However, it is your sole responsibility to handle errors.
+
+zram sysfs attributes always return negative values in case of errors.
+The list of possible return codes:
+
+========  =============================================================
+-EBUSY	  an attempt to modify an attribute that cannot be changed once
+	  the device has been initialised. Please reset device first;
+-ENOMEM	  zram was not able to allocate enough memory to fulfil your
+	  needs;
+-EINVAL	  invalid input has been provided.
+========  =============================================================
+
+If you use 'echo', the returned value is set by the 'echo' utility
+and, in the general case, something like::
+
+	echo 3 > /sys/block/zram0/max_comp_streams
+	if [ $? -ne 0 ]; then
+		handle_error
+	fi
+
+should suffice.
+
+1) Load Module
+==============
+
+::
+
+	modprobe zram num_devices=4
+	# This creates 4 devices: /dev/zram{0,1,2,3}
+
+The num_devices parameter is optional and tells zram how many devices should
+be pre-created. Default: 1.
+
+2) Set max number of compression streams
+========================================
+
+Regardless of the value passed to this attribute, ZRAM will always
+allocate multiple compression streams - one per online CPU - thus
+allowing several concurrent compression operations. The number of
+allocated compression streams goes down when some of the CPUs
+go offline. There is no single-compression-stream mode anymore,
+unless you are running a UP system or have only 1 CPU online.
+
+To find out how many streams are currently available::
+
+	cat /sys/block/zram0/max_comp_streams
+
+3) Select compression algorithm
+===============================
+
+Using the comp_algorithm device attribute, one can see the available and
+currently selected (shown in square brackets) compression algorithms, and
+change the selected compression algorithm (once the device is initialised,
+there is no way to change the compression algorithm).
+
+Examples::
+
+	#show supported compression algorithms
+	cat /sys/block/zram0/comp_algorithm
+	lzo [lz4]
+
+	#select lzo compression algorithm
+	echo lzo > /sys/block/zram0/comp_algorithm
+
+For the time being, the `comp_algorithm` content does not necessarily
+show every compression algorithm supported by the kernel. We keep this
+list primarily to simplify device configuration and one can configure
+a new device with a compression algorithm that is not listed in
+`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
+and, if some of the algorithms were built as modules, it's impossible
+to list all of them using, for instance, /proc/crypto or any other
+method. This, however, has an advantage of permitting the usage of
+custom crypto compression modules (implementing S/W or H/W compression).
+
+4) Set Disksize
+===============
+
+Set disk size by writing the value to sysfs node 'disksize'.
+The value can be either in bytes or you can use mem suffixes.
+Examples::
+
+	# Initialize /dev/zram0 with 50MB disksize
+	echo $((50*1024*1024)) > /sys/block/zram0/disksize
+
+	# Using mem suffixes
+	echo 256K > /sys/block/zram0/disksize
+	echo 512M > /sys/block/zram0/disksize
+	echo 1G > /sys/block/zram0/disksize
+
+Note:
+There is little point in creating a zram device larger than twice the size of
+memory, since we expect a 2:1 compression ratio. Note that zram uses about 0.1%
+of the size of the disk when not in use, so a huge zram device is wasteful.
+
+5) Set memory limit: Optional
+=============================
+
+Set memory limit by writing the value to sysfs node 'mem_limit'.
+The value can be either in bytes or you can use mem suffixes.
+In addition, you can change the value at runtime.
+Examples::
+
+	# limit /dev/zram0 with 50MB memory
+	echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
+
+	# Using mem suffixes
+	echo 256K > /sys/block/zram0/mem_limit
+	echo 512M > /sys/block/zram0/mem_limit
+	echo 1G > /sys/block/zram0/mem_limit
+
+	# To disable memory limit
+	echo 0 > /sys/block/zram0/mem_limit
+
+6) Activate
+===========
+
+::
+
+	mkswap /dev/zram0
+	swapon /dev/zram0
+
+	mkfs.ext4 /dev/zram1
+	mount /dev/zram1 /tmp
+
+7) Add/remove zram devices
+==========================
+
+zram provides a control interface, which enables dynamic (on-demand) device
+addition and removal.
+
+In order to add a new /dev/zramX device, perform a read operation on the hot_add
+attribute. This will return either the new device's device id (meaning that you
+can use /dev/zram<id>) or an error code.
+
+Example::
+
+	cat /sys/class/zram-control/hot_add
+	1
+
+To remove the existing /dev/zramX device (where X is a device id)
+execute::
+
+	echo X > /sys/class/zram-control/hot_remove
+
+8) Stats
+========
+
+Per-device statistics are exported as various nodes under /sys/block/zram<id>/
+
+A brief description of exported device attributes. For more details please
+read Documentation/ABI/testing/sysfs-block-zram.
+
+======================  ======  ===============================================
+Name            	access            description
+======================  ======  ===============================================
+disksize          	RW	show and set the device's disk size
+initstate         	RO	shows the initialization state of the device
+reset             	WO	trigger device reset
+mem_used_max      	WO	reset the `mem_used_max` counter (see later)
+mem_limit         	WO	specifies the maximum amount of memory ZRAM can
+				use to store the compressed data
+writeback_limit   	WO	specifies the maximum amount of write IO zram
+				can write out to backing device in 4KB units
+writeback_limit_enable  RW	show and set writeback_limit feature
+max_comp_streams  	RW	the number of possible concurrent compress
+				operations
+comp_algorithm    	RW	show and change the compression algorithm
+compact           	WO	trigger memory compaction
+debug_stat        	RO	this file is used for zram debugging purposes
+backing_dev	  	RW	set up backend storage for zram to write out
+idle		  	WO	mark allocated slot as idle
+======================  ======  ===============================================
+
+
+User space is advised to use the following files to read the device statistics.
+
+File /sys/block/zram<id>/stat
+
+Represents block layer statistics. Read Documentation/block/stat.txt for
+details.
+
+File /sys/block/zram<id>/io_stat
+
+The stat file represents the device's I/O statistics that are not accounted
+for by the block layer and, thus, not available in the zram<id>/stat file. It
+consists of a single line of text and contains the following stats separated
+by whitespace:
+
+ =============    =============================================================
+ failed_reads     The number of failed reads
+ failed_writes    The number of failed writes
+ invalid_io       The number of non-page-size-aligned I/O requests
+ notify_free      Depending on device usage scenario it may account
+
+                  a) the number of pages freed because of swap slot free
+                     notifications
+                  b) the number of pages freed because of
+                     REQ_OP_DISCARD requests sent by bio. The former ones are
+                     sent to a swap block device when a swap slot is freed,
+                     which implies that this disk is being used as a swap disk.
+
+                  The latter ones are sent by filesystem mounted with
+                  discard option, whenever some data blocks are getting
+                  discarded.
+ =============    =============================================================
+
+File /sys/block/zram<id>/mm_stat
+
+The stat file represents device's mm statistics. It consists of a single
+line of text and contains the following stats separated by whitespace:
+
+ ================ =============================================================
+ orig_data_size   uncompressed size of data stored in this disk.
+		  This excludes same-element-filled pages (same_pages) since
+		  no memory is allocated for them.
+                  Unit: bytes
+ compr_data_size  compressed size of data stored in this disk
+ mem_used_total   the amount of memory allocated for this disk. This
+                  includes allocator fragmentation and metadata overhead,
+                  allocated for this disk. So, allocator space efficiency
+                  can be calculated using compr_data_size and this statistic.
+                  Unit: bytes
+ mem_limit        the maximum amount of memory ZRAM can use to store
+                  the compressed data
+ mem_used_max     the maximum amount of memory zram has consumed to
+                  store the data
+ same_pages       the number of same element filled pages written to this disk.
+                  No memory is allocated for such pages.
+ pages_compacted  the number of pages freed during compaction
+ huge_pages	  the number of incompressible pages
+ ================ =============================================================
+
+File /sys/block/zram<id>/bd_stat
+
+The stat file represents device's backing device statistics. It consists of
+a single line of text and contains the following stats separated by whitespace:
+
+ ============== =============================================================
+ bd_count	size of data written to the backing device.
+		Unit: 4K bytes
+ bd_reads	the number of reads from backing device
+		Unit: 4K bytes
+ bd_writes	the number of writes to backing device
+		Unit: 4K bytes
+ ============== =============================================================
+
+9) Deactivate
+=============
+
+::
+
+	swapoff /dev/zram0
+	umount /dev/zram1
+
+10) Reset
+=========
+
+	Write any positive value to 'reset' sysfs node::
+
+		echo 1 > /sys/block/zram0/reset
+		echo 1 > /sys/block/zram1/reset
+
+	This frees all the memory allocated for the given device and
+	resets the disksize to zero. You must set the disksize again
+	before reusing the device.
+
+Optional Feature
+================
+
+writeback
+---------
+
+With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible pages
+to backing storage rather than keeping them in memory.
+To use the feature, the admin should set up the backing device via::
+
+	echo /dev/sda5 > /sys/block/zramX/backing_dev
+
+before setting disksize. Only partitions are supported at this moment.
+To use incompressible page writeback, the admin can do so via::
+
+	echo huge > /sys/block/zramX/writeback
+
+To use idle page writeback, the user first needs to declare zram pages
+as idle::
+
+	echo all > /sys/block/zramX/idle
+
+From now on, all pages on zram are idle pages. The idle mark
+is removed only when someone requests access to the block.
+IOW, unless there is an access request, those pages remain idle pages.
+
+The admin can request writeback of those idle pages at the right time via::
+
+	echo idle > /sys/block/zramX/writeback
+
+With this command, zram writes back idle pages from memory to the storage.
+
+If there is a lot of write IO to a flash device, it can potentially cause
+flash wearout, so the admin needs to design a write limitation
+to guarantee storage health for the entire product life.
+
+To address this concern, zram supports the "writeback_limit" feature.
+The default value of "writeback_limit_enable" is 0, so writeback is not
+limited. IOW, if the admin wants to apply a writeback budget, they should
+enable writeback_limit_enable via::
+
+	$ echo 1 > /sys/block/zramX/writeback_limit_enable
+
+Once writeback_limit_enable is set, zram doesn't allow any writeback
+until the admin sets the budget via /sys/block/zramX/writeback_limit.
+
+(If the admin doesn't enable writeback_limit_enable, the value
+assigned via /sys/block/zramX/writeback_limit is meaningless.)
+
+If the admin wants to limit writeback to 400M per day, they could do it
+like below::
+
+	$ MB_SHIFT=20
+	$ SHIFT_4K=12
+	$ echo $((400<<MB_SHIFT>>SHIFT_4K)) > \
+		/sys/block/zram0/writeback_limit
+	$ echo 1 > /sys/block/zram0/writeback_limit_enable
+
+If the admin wants to allow further writes once the budget is exhausted,
+they could do it like below::
+
+	$ echo $((400<<MB_SHIFT>>SHIFT_4K)) > \
+		/sys/block/zram0/writeback_limit
+
+To see the remaining writeback budget since it was set::
+
+	$ cat /sys/block/zramX/writeback_limit
+
+To disable the writeback limit::
+
+	$ echo 0 > /sys/block/zramX/writeback_limit_enable
+
+The writeback_limit count is reset whenever you reset zram (e.g.,
+system reboot, echo 1 > /sys/block/zramX/reset), so it is the user's job to
+track how much writeback happened before the reset in order to allocate
+the extra writeback budget in the next setting.
+
+To measure the writeback count over a certain period, the admin can
+read the 3rd column of /sys/block/zram0/bd_stat.
+
+memory tracking
+---------------
+
+With CONFIG_ZRAM_MEMORY_TRACKING, the user can get information about each
+zram block. It could be useful to catch cold or incompressible
+pages of the process with pagemap.
+
+If you enable the feature, you can see the block state via
+/sys/kernel/debug/zram/zram0/block_state. The output is as follows::
+
+	  300    75.033841 .wh.
+	  301    63.806904 s...
+	  302    63.806919 ..hi
+
+First column
+	zram's block index.
+Second column
+	access time since the system was booted
+Third column
+	state of the block:
+
+	s:
+		same page
+	w:
+		written page to backing store
+	h:
+		huge page
+	i:
+		idle page
+
+The first line of the above example says that the 300th block was accessed at
+75.033841 sec and that the block's state is huge, so it was written back to the
+backing storage. This is a debugging feature, so no one should rely on it to
+work properly.
+
+Nitin Gupta
+ngupta@vflare.org
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
deleted file mode 100644
index 4df0ce271085..000000000000
--- a/Documentation/blockdev/zram.txt
+++ /dev/null
@@ -1,355 +0,0 @@
-zram: Compressed RAM based block devices
-----------------------------------------
-
-* Introduction
-
-The zram module creates RAM based block devices named /dev/zram<id>
-(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
-in memory itself. These disks allow very fast I/O and compression provides
-good amounts of memory savings. Some of the usecases include /tmp storage,
-use as swap disks, various caches under /var and maybe many more :)
-
-Statistics for individual zram devices are exported through sysfs nodes at
-/sys/block/zram<id>/
-
-* Usage
-
-There are several ways to configure and manage zram device(-s):
-a) using zram and zram_control sysfs attributes
-b) using zramctl utility, provided by util-linux (util-linux@vger.kernel.org).
-
-In this document we will describe only 'manual' zram configuration steps,
-IOW, zram and zram_control sysfs attributes.
-
-In order to get a better idea about zramctl please consult util-linux
-documentation, zramctl man-page or `zramctl --help'. Please be informed
-that zram maintainers do not develop/maintain util-linux or zramctl, should
-you have any questions please contact util-linux@vger.kernel.org
-
-Following shows a typical sequence of steps for using zram.
-
-WARNING
-=======
-For the sake of simplicity we skip error checking parts in most of the
-examples below. However, it is your sole responsibility to handle errors.
-
-zram sysfs attributes always return negative values in case of errors.
-The list of possible return codes:
--EBUSY	-- an attempt to modify an attribute that cannot be changed once
-the device has been initialised. Please reset device first;
--ENOMEM	-- zram was not able to allocate enough memory to fulfil your
-needs;
--EINVAL	-- invalid input has been provided.
-
-If you use 'echo', the returned value that is changed by 'echo' utility,
-and, in general case, something like:
-
-	echo 3 > /sys/block/zram0/max_comp_streams
-	if [ $? -ne 0 ];
-		handle_error
-	fi
-
-should suffice.
-
-1) Load Module:
-	modprobe zram num_devices=4
-	This creates 4 devices: /dev/zram{0,1,2,3}
-
-num_devices parameter is optional and tells zram how many devices should be
-pre-created. Default: 1.
-
-2) Set max number of compression streams
-Regardless the value passed to this attribute, ZRAM will always
-allocate multiple compression streams - one per online CPUs - thus
-allowing several concurrent compression operations. The number of
-allocated compression streams goes down when some of the CPUs
-become offline. There is no single-compression-stream mode anymore,
-unless you are running a UP system or has only 1 CPU online.
-
-To find out how many streams are currently available:
-	cat /sys/block/zram0/max_comp_streams
-
-3) Select compression algorithm
-Using comp_algorithm device attribute one can see available and
-currently selected (shown in square brackets) compression algorithms,
-change selected compression algorithm (once the device is initialised
-there is no way to change compression algorithm).
-
-Examples:
-	#show supported compression algorithms
-	cat /sys/block/zram0/comp_algorithm
-	lzo [lz4]
-
-	#select lzo compression algorithm
-	echo lzo > /sys/block/zram0/comp_algorithm
-
-For the time being, the `comp_algorithm' content does not necessarily
-show every compression algorithm supported by the kernel. We keep this
-list primarily to simplify device configuration and one can configure
-a new device with a compression algorithm that is not listed in
-`comp_algorithm'. The thing is that, internally, ZRAM uses Crypto API
-and, if some of the algorithms were built as modules, it's impossible
-to list all of them using, for instance, /proc/crypto or any other
-method. This, however, has an advantage of permitting the usage of
-custom crypto compression modules (implementing S/W or H/W compression).
-
-4) Set Disksize
-Set disk size by writing the value to sysfs node 'disksize'.
-The value can be either in bytes or you can use mem suffixes.
-Examples:
-	# Initialize /dev/zram0 with 50MB disksize
-	echo $((50*1024*1024)) > /sys/block/zram0/disksize
-
-	# Using mem suffixes
-	echo 256K > /sys/block/zram0/disksize
-	echo 512M > /sys/block/zram0/disksize
-	echo 1G > /sys/block/zram0/disksize
-
-Note:
-There is little point creating a zram of greater than twice the size of memory
-since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
-size of the disk when not in use so a huge zram is wasteful.
-
-5) Set memory limit: Optional
-Set memory limit by writing the value to sysfs node 'mem_limit'.
-The value can be either in bytes or you can use mem suffixes.
-In addition, you could change the value in runtime.
-Examples:
-	# limit /dev/zram0 with 50MB memory
-	echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
-
-	# Using mem suffixes
-	echo 256K > /sys/block/zram0/mem_limit
-	echo 512M > /sys/block/zram0/mem_limit
-	echo 1G > /sys/block/zram0/mem_limit
-
-	# To disable memory limit
-	echo 0 > /sys/block/zram0/mem_limit
-
-6) Activate:
-	mkswap /dev/zram0
-	swapon /dev/zram0
-
-	mkfs.ext4 /dev/zram1
-	mount /dev/zram1 /tmp
-
-7) Add/remove zram devices
-
-zram provides a control interface, which enables dynamic (on-demand) device
-addition and removal.
-
-In order to add a new /dev/zramX device, perform read operation on hot_add
-attribute. This will return either new device's device id (meaning that you
-can use /dev/zram<id>) or error code.
-
-Example:
-	cat /sys/class/zram-control/hot_add
-	1
-
-To remove the existing /dev/zramX device (where X is a device id)
-execute
-	echo X > /sys/class/zram-control/hot_remove
-
-8) Stats:
-Per-device statistics are exported as various nodes under /sys/block/zram<id>/
-
-A brief description of exported device attributes. For more details please
-read Documentation/ABI/testing/sysfs-block-zram.
-
-Name            	access            description
-----            	------            -----------
-disksize          	RW	show and set the device's disk size
-initstate         	RO	shows the initialization state of the device
-reset             	WO	trigger device reset
-mem_used_max      	WO	reset the `mem_used_max' counter (see later)
-mem_limit         	WO	specifies the maximum amount of memory ZRAM can use
-				to store the compressed data
-writeback_limit   	WO	specifies the maximum amount of write IO zram can
-				write out to backing device as 4KB unit
-writeback_limit_enable  RW	show and set writeback_limit feature
-max_comp_streams  	RW	the number of possible concurrent compress operations
-comp_algorithm    	RW	show and change the compression algorithm
-compact           	WO	trigger memory compaction
-debug_stat        	RO	this file is used for zram debugging purposes
-backing_dev	  	RW	set up backend storage for zram to write out
-idle		  	WO	mark allocated slot as idle
-
-
-User space is advised to use the following files to read the device statistics.
-
-File /sys/block/zram<id>/stat
-
-Represents block layer statistics. Read Documentation/block/stat.txt for
-details.
-
-File /sys/block/zram<id>/io_stat
-
-The stat file represents device's I/O statistics not accounted by block
-layer and, thus, not available in zram<id>/stat file. It consists of a
-single line of text and contains the following stats separated by
-whitespace:
- failed_reads     the number of failed reads
- failed_writes    the number of failed writes
- invalid_io       the number of non-page-size-aligned I/O requests
- notify_free      Depending on device usage scenario it may account
-                  a) the number of pages freed because of swap slot free
-                  notifications or b) the number of pages freed because of
-                  REQ_OP_DISCARD requests sent by bio. The former ones are
-                  sent to a swap block device when a swap slot is freed,
-                  which implies that this disk is being used as a swap disk.
-                  The latter ones are sent by filesystem mounted with
-                  discard option, whenever some data blocks are getting
-                  discarded.
-
-File /sys/block/zram<id>/mm_stat
-
-The stat file represents device's mm statistics. It consists of a single
-line of text and contains the following stats separated by whitespace:
- orig_data_size   uncompressed size of data stored in this disk.
-		  This excludes same-element-filled pages (same_pages) since
-		  no memory is allocated for them.
-                  Unit: bytes
- compr_data_size  compressed size of data stored in this disk
- mem_used_total   the amount of memory allocated for this disk. This
-                  includes allocator fragmentation and metadata overhead,
-                  allocated for this disk. So, allocator space efficiency
-                  can be calculated using compr_data_size and this statistic.
-                  Unit: bytes
- mem_limit        the maximum amount of memory ZRAM can use to store
-                  the compressed data
- mem_used_max     the maximum amount of memory zram have consumed to
-                  store the data
- same_pages       the number of same element filled pages written to this disk.
-                  No memory is allocated for such pages.
- pages_compacted  the number of pages freed during compaction
- huge_pages	  the number of incompressible pages
-
-File /sys/block/zram<id>/bd_stat
-
-The stat file represents device's backing device statistics. It consists of
-a single line of text and contains the following stats separated by whitespace:
- bd_count	size of data written in backing device.
-		Unit: 4K bytes
- bd_reads	the number of reads from backing device
-		Unit: 4K bytes
- bd_writes	the number of writes to backing device
-		Unit: 4K bytes
-
-9) Deactivate:
-	swapoff /dev/zram0
-	umount /dev/zram1
-
-10) Reset:
-	Write any positive value to 'reset' sysfs node
-	echo 1 > /sys/block/zram0/reset
-	echo 1 > /sys/block/zram1/reset
-
-	This frees all the memory allocated for the given device and
-	resets the disksize to zero. You must set the disksize again
-	before reusing the device.
-
-* Optional Feature
-
-= writeback
-
-With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
-to backing storage rather than keeping it in memory.
-To use the feature, admin should set up backing device via
-
-	"echo /dev/sda5 > /sys/block/zramX/backing_dev"
-
-before disksize setting. It supports only partition at this moment.
-If admin want to use incompressible page writeback, they could do via
-
-	"echo huge > /sys/block/zramX/write"
-
-To use idle page writeback, first, user need to declare zram pages
-as idle.
-
-	"echo all > /sys/block/zramX/idle"
-
-From now on, any pages on zram are idle pages. The idle mark
-will be removed until someone request access of the block.
-IOW, unless there is access request, those pages are still idle pages.
-
-Admin can request writeback of those idle pages at right timing via
-
-	"echo idle > /sys/block/zramX/writeback"
-
-With the command, zram writeback idle pages from memory to the storage.
-
-If there are lots of write IO with flash device, potentially, it has
-flash wearout problem so that admin needs to design write limitation
-to guarantee storage health for entire product life.
-
-To overcome the concern, zram supports "writeback_limit" feature.
-The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
-any writeback. IOW, if admin want to apply writeback budget, he should
-enable writeback_limit_enable via
-
-	$ echo 1 > /sys/block/zramX/writeback_limit_enable
-
-Once writeback_limit_enable is set, zram doesn't allow any writeback
-until admin set the budget via /sys/block/zramX/writeback_limit.
-
-(If admin doesn't enable writeback_limit_enable, writeback_limit's value
-assigned via /sys/block/zramX/writeback_limit is meaninless.)
-
-If admin want to limit writeback as per-day 400M, he could do it
-like below.
-
-	$ MB_SHIFT=20
-	$ 4K_SHIFT=12
-	$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
-		/sys/block/zram0/writeback_limit.
-	$ echo 1 > /sys/block/zram0/writeback_limit_enable
-
-If admin want to allow further write again once the bugdet is exausted,
-he could do it like below
-
-	$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
-		/sys/block/zram0/writeback_limit
-
-If admin want to see remaining writeback budget since he set,
-
-	$ cat /sys/block/zramX/writeback_limit
-
-If admin want to disable writeback limit, he could do
-
-	$ echo 0 > /sys/block/zramX/writeback_limit_enable
-
-The writeback_limit count will reset whenever you reset zram(e.g.,
-system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
-writeback happened until you reset the zram to allocate extra writeback
-budget in next setting is user's job.
-
-If admin want to measure writeback count in a certain period, he could
-know it via /sys/block/zram0/bd_stat's 3rd column.
-
-= memory tracking
-
-With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
-zram block. It could be useful to catch cold or incompressible
-pages of the process with*pagemap.
-If you enable the feature, you could see block state via
-/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
-
-	  300    75.033841 .wh.
-	  301    63.806904 s...
-	  302    63.806919 ..hi
-
-First column is zram's block index.
-Second column is access time since the system was booted
-Third column is state of the block.
-(s: same page
-w: written page to backing store
-h: huge page
-i: idle page)
-
-First line of above example says 300th block is accessed at 75.033841sec
-and the block's state is huge so it is written back to the backing
-storage. It's a debugging feature so anyone shouldn't rely on it to work
-properly.
-
-Nitin Gupta
-ngupta@vflare.org
diff --git a/MAINTAINERS b/MAINTAINERS
index 3ee73751f56c..ec541c8dc645 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11076,7 +11076,7 @@ M:	Josef Bacik <josef@toxicpanda.com>
 S:	Maintained
 L:	linux-block@vger.kernel.org
 L:	nbd@other.debian.org
-F:	Documentation/blockdev/nbd.txt
+F:	Documentation/blockdev/nbd.rst
 F:	drivers/block/nbd.c
 F:	include/trace/events/nbd.h
 F:	include/uapi/linux/nbd.h
@@ -12086,7 +12086,7 @@ PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
 M:	Tim Waugh <tim@cyberelk.net>
 L:	linux-parport@lists.infradead.org (subscribers-only)
 S:	Maintained
-F:	Documentation/blockdev/paride.txt
+F:	Documentation/blockdev/paride.rst
 F:	drivers/block/paride/
 
 PARISC ARCHITECTURE
@@ -13367,7 +13367,7 @@ F:	drivers/net/wireless/ralink/rt2x00/
 RAMDISK RAM BLOCK DEVICE DRIVER
 M:	Jens Axboe <axboe@kernel.dk>
 S:	Maintained
-F:	Documentation/blockdev/ramdisk.txt
+F:	Documentation/blockdev/ramdisk.rst
 F:	drivers/block/brd.c
 
 RANCHU VIRTUAL BOARD FOR MIPS
@@ -17723,7 +17723,7 @@ R:	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
 L:	linux-kernel@vger.kernel.org
 S:	Maintained
 F:	drivers/block/zram/
-F:	Documentation/blockdev/zram.txt
+F:	Documentation/blockdev/zram.rst
 
 ZS DECSTATION Z85C30 SERIAL DRIVER
 M:	"Maciej W. Rozycki" <macro@linux-mips.org>
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 96ec7e0fc1ea..c43690b973d8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -31,7 +31,7 @@ config BLK_DEV_FD
 	  If you want to use the floppy disk drive(s) of your PC under Linux,
 	  say Y. Information about this driver, especially important for IBM
 	  Thinkpad users, is contained in
-	  <file:Documentation/blockdev/floppy.txt>.
+	  <file:Documentation/blockdev/floppy.rst>.
 	  That file also contains the location of the Floppy driver FAQ as
 	  well as location of the fdutils package used to configure additional
 	  parameters of the driver at run time.
@@ -96,7 +96,7 @@ config PARIDE
 	  your computer's parallel port. Most of them are actually IDE devices
 	  using a parallel port IDE adapter. This option enables the PARIDE
 	  subsystem which contains drivers for many of these external drives.
-	  Read <file:Documentation/blockdev/paride.txt> for more information.
+	  Read <file:Documentation/blockdev/paride.rst> for more information.
 
 	  If you have said Y to the "Parallel-port support" configuration
 	  option, you may share a single port between your printer and other
@@ -261,7 +261,7 @@ config BLK_DEV_NBD
 	  userland (making server and client physically the same computer,
 	  communicating using the loopback network device).
 
-	  Read <file:Documentation/blockdev/nbd.txt> for more information,
+	  Read <file:Documentation/blockdev/nbd.rst> for more information,
 	  especially about where to find the server code, which runs in user
 	  space and does not need special kernel support.
 
@@ -303,7 +303,7 @@ config BLK_DEV_RAM
 	  during the initial install of Linux.
 
 	  Note that the kernel command line option "ramdisk=XX" is now obsolete.
-	  For details, read <file:Documentation/blockdev/ramdisk.txt>.
+	  For details, read <file:Documentation/blockdev/ramdisk.rst>.
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called brd. An alias "rd" has been defined
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index b933a7eea52b..5c99e52f9dc1 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4424,7 +4424,7 @@ static int __init floppy_setup(char *str)
 		pr_cont("\n");
 	} else
 		DPRINT("botched floppy option\n");
-	DPRINT("Read Documentation/blockdev/floppy.txt\n");
+	DPRINT("Read Documentation/blockdev/floppy.rst\n");
 	return 0;
 }
 
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 1ffc64770643..e06b99d54816 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -12,7 +12,7 @@ config ZRAM
 	  It has several use cases, for example: /tmp storage, use as swap
 	  disks and maybe many more.
 
-	  See Documentation/blockdev/zram.txt for more information.
+	  See Documentation/blockdev/zram.rst for more information.
 
 config ZRAM_WRITEBACK
        bool "Write back incompressible or idle page to backing device"
@@ -26,7 +26,7 @@ config ZRAM_WRITEBACK
 	 With /sys/block/zramX/{idle,writeback}, application could ask
 	 idle page's writeback to the backing device to save in memory.
 
-	 See Documentation/blockdev/zram.txt for more information.
+	 See Documentation/blockdev/zram.rst for more information.
 
 config ZRAM_MEMORY_TRACKING
 	bool "Track zRam block status"
@@ -36,4 +36,4 @@ config ZRAM_MEMORY_TRACKING
 	  of zRAM. Admin could see the information via
 	  /sys/kernel/debug/zram/zramX/block_state.
 
-	  See Documentation/blockdev/zram.txt for more information.
+	  See Documentation/blockdev/zram.rst for more information.
diff --git a/tools/testing/selftests/zram/README b/tools/testing/selftests/zram/README
index 7972cc512408..5fa378391d3b 100644
--- a/tools/testing/selftests/zram/README
+++ b/tools/testing/selftests/zram/README
@@ -37,4 +37,4 @@ Commands required for testing:
  - mkfs/ mkfs.ext4
 
 For more information please refer:
-kernel-source-tree/Documentation/blockdev/zram.txt
+kernel-source-tree/Documentation/blockdev/zram.rst
-- 
cgit v1.2.3-55-g7522


From 6baec31591cee0f2f6d446abb81c828499a6ed23 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 17:35:40 -0300
Subject: docs: perf: convert to ReST

Rename the perf documentation files to ReST, add an
index for them and adjust them in order to produce nice HTML
output via the Sphinx build system.

In the new index.rst, add an :orphan: tag while it is not linked from
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/perf/arm-ccn.rst       | 61 ++++++++++++++++++++++++++++++++++++
 Documentation/perf/arm-ccn.txt       | 59 ----------------------------------
 Documentation/perf/arm_dsu_pmu.rst   | 29 +++++++++++++++++
 Documentation/perf/arm_dsu_pmu.txt   | 28 -----------------
 Documentation/perf/hisi-pmu.rst      | 60 +++++++++++++++++++++++++++++++++++
 Documentation/perf/hisi-pmu.txt      | 53 -------------------------------
 Documentation/perf/index.rst         | 16 ++++++++++
 Documentation/perf/qcom_l2_pmu.rst   | 39 +++++++++++++++++++++++
 Documentation/perf/qcom_l2_pmu.txt   | 38 ----------------------
 Documentation/perf/qcom_l3_pmu.rst   | 26 +++++++++++++++
 Documentation/perf/qcom_l3_pmu.txt   | 25 ---------------
 Documentation/perf/thunderx2-pmu.rst | 42 +++++++++++++++++++++++++
 Documentation/perf/thunderx2-pmu.txt | 41 ------------------------
 Documentation/perf/xgene-pmu.rst     | 49 +++++++++++++++++++++++++++++
 Documentation/perf/xgene-pmu.txt     | 48 ----------------------------
 MAINTAINERS                          |  4 +--
 drivers/perf/qcom_l3_pmu.c           |  2 +-
 17 files changed, 325 insertions(+), 295 deletions(-)
 create mode 100644 Documentation/perf/arm-ccn.rst
 delete mode 100644 Documentation/perf/arm-ccn.txt
 create mode 100644 Documentation/perf/arm_dsu_pmu.rst
 delete mode 100644 Documentation/perf/arm_dsu_pmu.txt
 create mode 100644 Documentation/perf/hisi-pmu.rst
 delete mode 100644 Documentation/perf/hisi-pmu.txt
 create mode 100644 Documentation/perf/index.rst
 create mode 100644 Documentation/perf/qcom_l2_pmu.rst
 delete mode 100644 Documentation/perf/qcom_l2_pmu.txt
 create mode 100644 Documentation/perf/qcom_l3_pmu.rst
 delete mode 100644 Documentation/perf/qcom_l3_pmu.txt
 create mode 100644 Documentation/perf/thunderx2-pmu.rst
 delete mode 100644 Documentation/perf/thunderx2-pmu.txt
 create mode 100644 Documentation/perf/xgene-pmu.rst
 delete mode 100644 Documentation/perf/xgene-pmu.txt

diff --git a/Documentation/perf/arm-ccn.rst b/Documentation/perf/arm-ccn.rst
new file mode 100644
index 000000000000..832b0c64023a
--- /dev/null
+++ b/Documentation/perf/arm-ccn.rst
@@ -0,0 +1,61 @@
+==========================
+ARM Cache Coherent Network
+==========================
+
+CCN-504 is a ring-bus interconnect consisting of 11 crosspoints
+(XPs), with each crosspoint supporting up to two device ports,
+so nodes (devices) 0 and 1 are connected to crosspoint 0,
+nodes 2 and 3 to crosspoint 1 etc.
+
+PMU (perf) driver
+-----------------
+
+The CCN driver registers a perf PMU driver, which provides
+description of available events and configuration options
+in sysfs, see /sys/bus/event_source/devices/ccn*.
+
+The "format" directory describes format of the config, config1
+and config2 fields of the perf_event_attr structure. The "events"
+directory provides configuration templates for all documented
+events, that can be used with perf tool. For example "xp_valid_flit"
+is an equivalent of "type=0x8,event=0x4". Other parameters must be
+explicitly specified.
+
+For events originating from device, "node" defines its index.
+
+Crosspoint PMU events require "xp" (index), "bus" (bus number)
+and "vc" (virtual channel ID).
+
+Crosspoint watchpoint-based events (special "event" value 0xfe)
+require "xp" and "vc" as above plus "port" (device port index),
+"dir" (transmit/receive direction), comparator values ("cmp_l"
+and "cmp_h") and "mask", being index of the comparator mask.
+
+Masks are defined separately from the event description
+(due to limited number of the config values) in the "cmp_mask"
+directory, with first 8 configurable by user and additional
+4 hardcoded for the most frequent use cases.
+
+Cycle counter is described by a "type" value 0xff and does
+not require any other settings.
+
+The driver also provides a "cpumask" sysfs attribute, which contains
+a single CPU ID, of the processor which will be used to handle all
+the CCN PMU events. It is recommended that the user space tools
+request the events on this processor (if not, the perf_event->cpu value
+will be overwritten anyway). In case of this processor being offlined,
+the events are migrated to another one and the attribute is updated.
+
+Example of perf tool use::
+
+  / # perf list | grep ccn
+    ccn/cycles/                                        [Kernel PMU event]
+  <...>
+    ccn/xp_valid_flit,xp=?,port=?,vc=?,dir=?/          [Kernel PMU event]
+  <...>
+
+  / # perf stat -a -e ccn/cycles/,ccn/xp_valid_flit,xp=1,port=0,vc=1,dir=1/ \
+                                                                         sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/perf/arm-ccn.txt b/Documentation/perf/arm-ccn.txt
deleted file mode 100644
index 15cdb7bc57c3..000000000000
--- a/Documentation/perf/arm-ccn.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-ARM Cache Coherent Network
-==========================
-
-CCN-504 is a ring-bus interconnect consisting of 11 crosspoints
-(XPs), with each crosspoint supporting up to two device ports,
-so nodes (devices) 0 and 1 are connected to crosspoint 0,
-nodes 2 and 3 to crosspoint 1 etc.
-
-PMU (perf) driver
------------------
-
-The CCN driver registers a perf PMU driver, which provides
-description of available events and configuration options
-in sysfs, see /sys/bus/event_source/devices/ccn*.
-
-The "format" directory describes format of the config, config1
-and config2 fields of the perf_event_attr structure. The "events"
-directory provides configuration templates for all documented
-events, that can be used with perf tool. For example "xp_valid_flit"
-is an equivalent of "type=0x8,event=0x4". Other parameters must be
-explicitly specified.
-
-For events originating from device, "node" defines its index.
-
-Crosspoint PMU events require "xp" (index), "bus" (bus number)
-and "vc" (virtual channel ID).
-
-Crosspoint watchpoint-based events (special "event" value 0xfe)
-require "xp" and "vc" as as above plus "port" (device port index),
-"dir" (transmit/receive direction), comparator values ("cmp_l"
-and "cmp_h") and "mask", being index of the comparator mask.
-Masks are defined separately from the event description
-(due to limited number of the config values) in the "cmp_mask"
-directory, with first 8 configurable by user and additional
-4 hardcoded for the most frequent use cases.
-
-Cycle counter is described by a "type" value 0xff and does
-not require any other settings.
-
-The driver also provides a "cpumask" sysfs attribute, which contains
-a single CPU ID, of the processor which will be used to handle all
-the CCN PMU events. It is recommended that the user space tools
-request the events on this processor (if not, the perf_event->cpu value
-will be overwritten anyway). In case of this processor being offlined,
-the events are migrated to another one and the attribute is updated.
-
-Example of perf tool use:
-
-/ # perf list | grep ccn
-  ccn/cycles/                                        [Kernel PMU event]
-<...>
-  ccn/xp_valid_flit,xp=?,port=?,vc=?,dir=?/          [Kernel PMU event]
-<...>
-
-/ # perf stat -a -e ccn/cycles/,ccn/xp_valid_flit,xp=1,port=0,vc=1,dir=1/ \
-                                                                       sleep 1
-
-The driver does not support sampling, therefore "perf record" will
-not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/perf/arm_dsu_pmu.rst b/Documentation/perf/arm_dsu_pmu.rst
new file mode 100644
index 000000000000..7fd34db75d13
--- /dev/null
+++ b/Documentation/perf/arm_dsu_pmu.rst
@@ -0,0 +1,29 @@
+==================================
+ARM DynamIQ Shared Unit (DSU) PMU
+==================================
+
+ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
+control logic and external interfaces to form a multicore cluster. The PMU
+allows counting the various events related to the L3 cache, Snoop Control Unit
+etc., using independent 32-bit counters. It also provides a 64-bit cycle counter.
+
+The PMU can only be accessed via CPU system registers and is common to the
+cores connected to the same DSU. Like most of the other uncore PMUs, DSU
+PMU doesn't support process specific events and cannot be used in sampling mode.
+
+The DSU provides a bitmap for a subset of implemented events via hardware
+registers. There is no way for the driver to determine if the other events
+are available or not. Hence the driver exposes only those events advertised
+by the DSU, in "events" directory under::
+
+  /sys/bus/event_source/devices/arm_dsu_<N>/
+
+The user should refer to the TRM of the product to figure out the supported events
+and use the raw event code for the unlisted events.
+
+The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".
+
+
+Example usage::
+
+	perf stat -a -e arm_dsu_0/cycles/
diff --git a/Documentation/perf/arm_dsu_pmu.txt b/Documentation/perf/arm_dsu_pmu.txt
deleted file mode 100644
index d611e15f5add..000000000000
--- a/Documentation/perf/arm_dsu_pmu.txt
+++ /dev/null
@@ -1,28 +0,0 @@
-ARM DynamIQ Shared Unit (DSU) PMU
-==================================
-
-ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
-control logic and external interfaces to form a multicore cluster. The PMU
-allows counting the various events related to the L3 cache, Snoop Control Unit
-etc, using 32bit independent counters. It also provides a 64bit cycle counter.
-
-The PMU can only be accessed via CPU system registers and are common to the
-cores connected to the same DSU. Like most of the other uncore PMUs, DSU
-PMU doesn't support process specific events and cannot be used in sampling mode.
-
-The DSU provides a bitmap for a subset of implemented events via hardware
-registers. There is no way for the driver to determine if the other events
-are available or not. Hence the driver exposes only those events advertised
-by the DSU, in "events" directory under :
-
-  /sys/bus/event_sources/devices/arm_dsu_<N>/
-
-The user should refer to the TRM of the product to figure out the supported events
-and use the raw event code for the unlisted events.
-
-The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".
-
-
-e.g usage :
-
-	perf stat -a -e arm_dsu_0/cycles/
diff --git a/Documentation/perf/hisi-pmu.rst b/Documentation/perf/hisi-pmu.rst
new file mode 100644
index 000000000000..404a5c3d9d00
--- /dev/null
+++ b/Documentation/perf/hisi-pmu.rst
@@ -0,0 +1,60 @@
+======================================================
+HiSilicon SoC uncore Performance Monitoring Unit (PMU)
+======================================================
+
+The HiSilicon SoC chip includes various independent system device PMUs
+such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
+independent and have hardware logic to gather statistics and performance
+information.
+
+The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
+(CCL) is made up of 4 cpu cores sharing one L3 cache; each CPU die is
+called Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
+two HHAs (0 - 1) and four DDRCs (0 - 3), respectively.
+
+HiSilicon SoC uncore PMU driver
+-------------------------------
+
+Each device PMU has separate registers for event counting, control and
+interrupt, and the PMU driver shall register perf PMU drivers like L3C,
+HHA and DDRC etc. The available events and configuration options shall
+be described in the sysfs, see:
+
+/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
+/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
+The "perf list" command shall list the available events from sysfs.
+
+Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
+name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>,
+where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
+the module.
+
+e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in
+SCCL ID #3.
+
+e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
+SCCL ID #1.
+
+The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
+ID used to count the uncore PMU event.
+
+Example usage of perf::
+
+  $# perf list
+  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+
+  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
+  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
+
+The current driver does not support sampling, so "perf record" is unsupported.
+Attaching to a task is also unsupported, as the events are all uncore.
+
+Note: Please contact the maintainer for a complete list of the events supported
+by the PMU devices in the SoC, and for details about them, if needed.
diff --git a/Documentation/perf/hisi-pmu.txt b/Documentation/perf/hisi-pmu.txt
deleted file mode 100644
index 267a028b2741..000000000000
--- a/Documentation/perf/hisi-pmu.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-HiSilicon SoC uncore Performance Monitoring Unit (PMU)
-======================================================
-The HiSilicon SoC chip includes various independent system device PMUs
-such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
-independent and have hardware logic to gather statistics and performance
-information.
-
-The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
-(CCL) is made up of 4 cpu cores sharing one L3 cache; each CPU die is
-called Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
-two HHAs (0 - 1) and four DDRCs (0 - 3), respectively.
-
-HiSilicon SoC uncore PMU driver
----------------------------------------
-Each device PMU has separate registers for event counting, control and
-interrupt, and the PMU driver shall register perf PMU drivers like L3C,
-HHA and DDRC etc. The available events and configuration options shall
-be described in the sysfs, see :
-/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
-/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
-The "perf list" command shall list the available events from sysfs.
-
-Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
-name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>.
-where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
-module.
-e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in
-SCCL ID #3.
-e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
-SCCL ID #1.
-
-The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
-ID used to count the uncore PMU event.
-
-Example usage of perf:
-$# perf list
-hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
-------------------------------------------
-hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
-------------------------------------------
-hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
-------------------------------------------
-hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
-------------------------------------------
-
-$# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
-$# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
-
-The current driver does not support sampling. So "perf record" is unsupported.
-Also attach to a task is unsupported as the events are all uncore.
-
-Note: Please contact the maintainer for a complete list of events supported for
-the PMU devices in the SoC and its information if needed.
diff --git a/Documentation/perf/index.rst b/Documentation/perf/index.rst
new file mode 100644
index 000000000000..4bf848e27f26
--- /dev/null
+++ b/Documentation/perf/index.rst
@@ -0,0 +1,16 @@
+:orphan:
+
+===========================
+Performance monitor support
+===========================
+
+.. toctree::
+   :maxdepth: 1
+
+   hisi-pmu
+   qcom_l2_pmu
+   qcom_l3_pmu
+   arm-ccn
+   xgene-pmu
+   arm_dsu_pmu
+   thunderx2-pmu
diff --git a/Documentation/perf/qcom_l2_pmu.rst b/Documentation/perf/qcom_l2_pmu.rst
new file mode 100644
index 000000000000..c130178a4a55
--- /dev/null
+++ b/Documentation/perf/qcom_l2_pmu.rst
@@ -0,0 +1,39 @@
+=====================================================================
+Qualcomm Technologies Level-2 Cache Performance Monitoring Unit (PMU)
+=====================================================================
+
+This driver supports the L2 cache clusters found in Qualcomm Technologies
+Centriq SoCs. There are multiple physical L2 cache clusters, each with its
+own PMU. Each cluster has one or more CPUs associated with it.
+
+There is one logical L2 PMU exposed, which aggregates the results from
+the physical PMUs.
+
+The driver provides a description of its available events and configuration
+options in sysfs, see /sys/devices/l2cache_0.
+
+The "format" directory describes the format of the events.
+
+Events can be envisioned as a 2-dimensional array. Each column represents
+a group of events. There are 8 groups. Only one entry from each
+group can be in use at a time. If multiple events from the same group
+are specified, the conflicting events cannot be counted at the same time.
+
+Events are specified as 0xCCG, where CC is 2 hex digits specifying
+the code (array row) and G specifies the group (column) 0-7.
+
+In addition there is a cycle counter event specified by the value 0xFE
+which is outside the above scheme.
+
+The driver provides a "cpumask" sysfs attribute which contains a mask
+consisting of one CPU per cluster which will be used to handle all the PMU
+events on that cluster.
+
+Examples for use with perf::
+
+  perf stat -e l2cache_0/config=0x001/,l2cache_0/config=0x042/ -a sleep 1
+
+  perf stat -e l2cache_0/config=0xfe/ -C 2 sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task perf sessions are not supported.
diff --git a/Documentation/perf/qcom_l2_pmu.txt b/Documentation/perf/qcom_l2_pmu.txt
deleted file mode 100644
index b25b97659ab9..000000000000
--- a/Documentation/perf/qcom_l2_pmu.txt
+++ /dev/null
@@ -1,38 +0,0 @@
-Qualcomm Technologies Level-2 Cache Performance Monitoring Unit (PMU)
-=====================================================================
-
-This driver supports the L2 cache clusters found in Qualcomm Technologies
-Centriq SoCs. There are multiple physical L2 cache clusters, each with their
-own PMU. Each cluster has one or more CPUs associated with it.
-
-There is one logical L2 PMU exposed, which aggregates the results from
-the physical PMUs.
-
-The driver provides a description of its available events and configuration
-options in sysfs, see /sys/devices/l2cache_0.
-
-The "format" directory describes the format of the events.
-
-Events can be envisioned as a 2-dimensional array. Each column represents
-a group of events. There are 8 groups. Only one entry from each
-group can be in use at a time. If multiple events from the same group
-are specified, the conflicting events cannot be counted at the same time.
-
-Events are specified as 0xCCG, where CC is 2 hex digits specifying
-the code (array row) and G specifies the group (column) 0-7.
-
-In addition there is a cycle counter event specified by the value 0xFE
-which is outside the above scheme.
-
-The driver provides a "cpumask" sysfs attribute which contains a mask
-consisting of one CPU per cluster which will be used to handle all the PMU
-events on that cluster.
-
-Examples for use with perf:
-
-  perf stat -e l2cache_0/config=0x001/,l2cache_0/config=0x042/ -a sleep 1
-
-  perf stat -e l2cache_0/config=0xfe/ -C 2 sleep 1
-
-The driver does not support sampling, therefore "perf record" will
-not work. Per-task perf sessions are not supported.
diff --git a/Documentation/perf/qcom_l3_pmu.rst b/Documentation/perf/qcom_l3_pmu.rst
new file mode 100644
index 000000000000..a3d014a46bfd
--- /dev/null
+++ b/Documentation/perf/qcom_l3_pmu.rst
@@ -0,0 +1,26 @@
+===========================================================================
+Qualcomm Datacenter Technologies L3 Cache Performance Monitoring Unit (PMU)
+===========================================================================
+
+This driver supports the L3 cache PMUs found in Qualcomm Datacenter Technologies
+Centriq SoCs. The L3 cache on these SoCs is composed of multiple slices, shared
+by all cores within a socket. Each slice is exposed as a separate uncore perf
+PMU with device name l3cache_<socket>_<instance>. User space is responsible
+for aggregating across slices.
+
+The driver provides a description of its available events and configuration
+options in sysfs, see /sys/devices/l3cache*. Given that these are uncore PMUs
+the driver also exposes a "cpumask" sysfs attribute which contains a mask
+consisting of one CPU per socket which will be used to handle all the PMU
+events on that socket.
+
+The hardware implements 32bit event counters and has a flat 8bit event space
+exposed via the "event" format attribute. In addition to the 32bit physical
+counters the driver supports virtual 64bit hardware counters by using hardware
+counter chaining. This feature is exposed via the "lc" (long counter) format
+flag. E.g.::
+
+  perf stat -e l3cache_0_0/read-miss,lc/
+
+Given that these are uncore PMUs the driver does not support sampling, therefore
+"perf record" will not work. Per-task perf sessions are not supported.
diff --git a/Documentation/perf/qcom_l3_pmu.txt b/Documentation/perf/qcom_l3_pmu.txt
deleted file mode 100644
index 96b3a9444a0d..000000000000
--- a/Documentation/perf/qcom_l3_pmu.txt
+++ /dev/null
@@ -1,25 +0,0 @@
-Qualcomm Datacenter Technologies L3 Cache Performance Monitoring Unit (PMU)
-===========================================================================
-
-This driver supports the L3 cache PMUs found in Qualcomm Datacenter Technologies
-Centriq SoCs. The L3 cache on these SOCs is composed of multiple slices, shared
-by all cores within a socket. Each slice is exposed as a separate uncore perf
-PMU with device name l3cache_<socket>_<instance>. User space is responsible
-for aggregating across slices.
-
-The driver provides a description of its available events and configuration
-options in sysfs, see /sys/devices/l3cache*. Given that these are uncore PMUs
-the driver also exposes a "cpumask" sysfs attribute which contains a mask
-consisting of one CPU per socket which will be used to handle all the PMU
-events on that socket.
-
-The hardware implements 32bit event counters and has a flat 8bit event space
-exposed via the "event" format attribute. In addition to the 32bit physical
-counters the driver supports virtual 64bit hardware counters by using hardware
-counter chaining. This feature is exposed via the "lc" (long counter) format
-flag. E.g.:
-
-  perf stat -e l3cache_0_0/read-miss,lc/
-
-Given that these are uncore PMUs the driver does not support sampling, therefore
-"perf record" will not work. Per-task perf sessions are not supported.
diff --git a/Documentation/perf/thunderx2-pmu.rst b/Documentation/perf/thunderx2-pmu.rst
new file mode 100644
index 000000000000..08e33675853a
--- /dev/null
+++ b/Documentation/perf/thunderx2-pmu.rst
@@ -0,0 +1,42 @@
+=============================================================
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+=============================================================
+
+The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
+PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
+
+The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
+Events are counted for the default channel (i.e. channel 0) and prorated
+to the total number of channels/tiles.
+
+The DMC and L3C support up to 4 counters. Counters are independently
+programmable and can be started and stopped individually. Each counter
+can be set to a different event. Counters are 32-bit and do not support
+an overflow interrupt; they are read every 2 seconds.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
+L3C devices.  Each PMU can be used to count up to 4 events
+simultaneously. The PMUs provide a description of their available events
+and configuration options under sysfs, see
+/sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
+
+The driver does not support sampling, therefore "perf record" will not
+work. Per-task perf sessions are also not supported.
+
+Examples::
+
+  # perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
+
+  # perf stat -a -e \
+  uncore_dmc_0/cnt_cycles/,\
+  uncore_dmc_0/data_transfers/,\
+  uncore_dmc_0/read_txns/,\
+  uncore_dmc_0/write_txns/ sleep 1
+
+  # perf stat -a -e \
+  uncore_l3c_0/read_request/,\
+  uncore_l3c_0/read_hit/,\
+  uncore_l3c_0/inv_request/,\
+  uncore_l3c_0/inv_hit/ sleep 1
diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
deleted file mode 100644
index dffc57143736..000000000000
--- a/Documentation/perf/thunderx2-pmu.txt
+++ /dev/null
@@ -1,41 +0,0 @@
-Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
-=============================================================
-
-The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
-PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
-
-The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
-Events are counted for the default channel (i.e. channel 0) and prorated
-to the total number of channels/tiles.
-
-The DMC and L3C support up to 4 counters. Counters are independently
-programmable and can be started and stopped individually. Each counter
-can be set to a different event. Counters are 32-bit and do not support
-an overflow interrupt; they are read every 2 seconds.
-
-PMU UNCORE (perf) driver:
-
-The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
-L3C devices.  Each PMU can be used to count up to 4 events
-simultaneously. The PMUs provide a description of their available events
-and configuration options under sysfs, see
-/sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
-
-The driver does not support sampling, therefore "perf record" will not
-work. Per-task perf sessions are also not supported.
-
-Examples:
-
-# perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
-
-# perf stat -a -e \
-uncore_dmc_0/cnt_cycles/,\
-uncore_dmc_0/data_transfers/,\
-uncore_dmc_0/read_txns/,\
-uncore_dmc_0/write_txns/ sleep 1
-
-# perf stat -a -e \
-uncore_l3c_0/read_request/,\
-uncore_l3c_0/read_hit/,\
-uncore_l3c_0/inv_request/,\
-uncore_l3c_0/inv_hit/ sleep 1
diff --git a/Documentation/perf/xgene-pmu.rst b/Documentation/perf/xgene-pmu.rst
new file mode 100644
index 000000000000..644f8ed89152
--- /dev/null
+++ b/Documentation/perf/xgene-pmu.rst
@@ -0,0 +1,49 @@
+================================================
+APM X-Gene SoC Performance Monitoring Unit (PMU)
+================================================
+
+X-Gene SoC PMU consists of various independent system device PMUs such as
+L3 cache(s), I/O bridge(s), memory controller bridge(s) and memory
+controller(s). These PMU devices are loosely architected to follow the
+same model as the PMU for ARM cores. The PMUs share the same top level
+interrupt and status CSR region.
+
+PMU (perf) driver
+-----------------
+
+The xgene-pmu driver registers several perf PMU drivers. Each of the perf
+drivers provides a description of its available events and configuration options
+in sysfs, see /sys/devices/<l3cX/iobX/mcbX/mcX>/.
+
+The "format" directory describes the format of the config (event ID) and
+config1 (agent ID) fields of the perf_event_attr structure. The "events"
+directory provides configuration templates for all supported event types that
+can be used with the perf tool. For example, "l3c0/bank-fifo-full/" is
+equivalent to "l3c0/config=0x0b/".
+
+Most of the SoC PMUs have a specific list of agent IDs used for monitoring the
+performance of a specific datapath. For example, agents of an L3 cache can be
+a specific CPU or an I/O bridge. Each PMU has a set of 2 registers capable of
+masking the agents from which the requests come. If the bit with
+the bit number corresponding to the agent is set, the event is counted only if
+it is caused by a request from that agent. Each agent ID bit is inversely mapped
+to a corresponding bit in the "config1" field. By default, the event will be
+counted for all agent requests (config1 = 0x0). For all the supported agents of
+each PMU, please refer to the APM X-Gene User Manual.
+
+Each perf driver also provides a "cpumask" sysfs attribute, which contains a
+single CPU ID of the processor which will be used to handle all the PMU events.
+
+Example for perf tool use::
+
+ / # perf list | grep -e l3c -e iob -e mcb -e mc
+   l3c0/ackq-full/                                    [Kernel PMU event]
+ <...>
+   mcb1/mcb-csw-stall/                                [Kernel PMU event]
+
+ / # perf stat -a -e l3c0/read-miss/,mcb1/csw-write-request/ sleep 1
+
+ / # perf stat -a -e l3c0/read-miss,config1=0xfffffffffffffffe/ sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/perf/xgene-pmu.txt b/Documentation/perf/xgene-pmu.txt
deleted file mode 100644
index d7cff4454e5b..000000000000
--- a/Documentation/perf/xgene-pmu.txt
+++ /dev/null
@@ -1,48 +0,0 @@
-APM X-Gene SoC Performance Monitoring Unit (PMU)
-================================================
-
-X-Gene SoC PMU consists of various independent system device PMUs such as
-L3 cache(s), I/O bridge(s), memory controller bridge(s) and memory
-controller(s). These PMU devices are loosely architected to follow the
-same model as the PMU for ARM cores. The PMUs share the same top level
-interrupt and status CSR region.
-
-PMU (perf) driver
------------------
-
-The xgene-pmu driver registers several perf PMU drivers. Each of the perf
-driver provides description of its available events and configuration options
-in sysfs, see /sys/devices/<l3cX/iobX/mcbX/mcX>/.
-
-The "format" directory describes format of the config (event ID),
-config1 (agent ID) fields of the perf_event_attr structure. The "events"
-directory provides configuration templates for all supported event types that
-can be used with perf tool. For example, "l3c0/bank-fifo-full/" is an
-equivalent of "l3c0/config=0x0b/".
-
-Most of the SoC PMU has a specific list of agent ID used for monitoring
-performance of a specific datapath. For example, agents of a L3 cache can be
-a specific CPU or an I/O bridge. Each PMU has a set of 2 registers capable of
-masking the agents from which the request come from. If the bit with
-the bit number corresponding to the agent is set, the event is counted only if
-it is caused by a request from that agent. Each agent ID bit is inversely mapped
-to a corresponding bit in "config1" field. By default, the event will be
-counted for all agent requests (config1 = 0x0). For all the supported agents of
-each PMU, please refer to APM X-Gene User Manual.
-
-Each perf driver also provides a "cpumask" sysfs attribute, which contains a
-single CPU ID of the processor which will be used to handle all the PMU events.
-
-Example for perf tool use:
-
- / # perf list | grep -e l3c -e iob -e mcb -e mc
-   l3c0/ackq-full/                                    [Kernel PMU event]
- <...>
-   mcb1/mcb-csw-stall/                                [Kernel PMU event]
-
- / # perf stat -a -e l3c0/read-miss/,mcb1/csw-write-request/ sleep 1
-
- / # perf stat -a -e l3c0/read-miss,config1=0xfffffffffffffffe/ sleep 1
-
-The driver does not support sampling, therefore "perf record" will
-not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/MAINTAINERS b/MAINTAINERS
index ec541c8dc645..93e5ac1de255 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1155,7 +1155,7 @@ APPLIED MICRO (APM) X-GENE SOC PMU
 M:	Khuong Dinh <khuong@os.amperecomputing.com>
 S:	Supported
 F:	drivers/perf/xgene_pmu.c
-F:	Documentation/perf/xgene-pmu.txt
+F:	Documentation/perf/xgene-pmu.rst
 F:	Documentation/devicetree/bindings/perf/apm-xgene-pmu.txt
 
 APTINA CAMERA SENSOR PLL
@@ -7262,7 +7262,7 @@ M:	Shaokun Zhang <zhangshaokun@hisilicon.com>
 W:	http://www.hisilicon.com
 S:	Supported
 F:	drivers/perf/hisilicon
-F:	Documentation/perf/hisi-pmu.txt
+F:	Documentation/perf/hisi-pmu.rst
 
 HISILICON ROCE DRIVER
 M:	Lijun Ou <oulijun@huawei.com>
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 15b8c10c2b2b..90f88ce5192b 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -8,7 +8,7 @@
  * the slices. User space needs to aggregate to individual counts to provide
  * a global picture.
  *
- * See Documentation/perf/qcom_l3_pmu.txt for more details.
+ * See Documentation/perf/qcom_l3_pmu.rst for more details.
  *
  * Copyright (c) 2015-2017, The Linux Foundation. All rights reserved.
  */
-- 
cgit v1.2.3-55-g7522


From 53b9537509654a6267c3f56b4d2e7409b9089686 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 18:35:54 -0300
Subject: docs: sysctl: convert to ReST

Rename the /proc/sys/ documentation files to ReST, using the
README file as a template for an index.rst, adding the other
files there via TOC markup.

Despite being written at different times with different
styles, try to make them somewhat coherent with a similar
look and feel, ensuring that they'll look nice both as
raw text files and as the HTML output produced by the
Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |    2 +-
 Documentation/admin-guide/mm/index.rst          |    2 +-
 Documentation/admin-guide/mm/ksm.rst            |    2 +-
 Documentation/core-api/printk-formats.rst       |    2 +-
 Documentation/networking/ip-sysctl.txt          |    2 +-
 Documentation/sysctl/README                     |   76 --
 Documentation/sysctl/abi.rst                    |   67 ++
 Documentation/sysctl/abi.txt                    |   54 --
 Documentation/sysctl/fs.rst                     |  384 ++++++++
 Documentation/sysctl/fs.txt                     |  374 -------
 Documentation/sysctl/index.rst                  |  100 ++
 Documentation/sysctl/kernel.rst                 | 1177 +++++++++++++++++++++++
 Documentation/sysctl/kernel.txt                 | 1129 ----------------------
 Documentation/sysctl/net.rst                    |  461 +++++++++
 Documentation/sysctl/net.txt                    |  422 --------
 Documentation/sysctl/sunrpc.rst                 |   25 +
 Documentation/sysctl/sunrpc.txt                 |   20 -
 Documentation/sysctl/user.rst                   |   78 ++
 Documentation/sysctl/user.txt                   |   66 --
 Documentation/sysctl/vm.rst                     |  964 +++++++++++++++++++
 Documentation/sysctl/vm.txt                     |  946 ------------------
 Documentation/vm/unevictable-lru.rst            |    2 +-
 kernel/panic.c                                  |    2 +-
 mm/swap.c                                       |    2 +-
 24 files changed, 3264 insertions(+), 3095 deletions(-)
 delete mode 100644 Documentation/sysctl/README
 create mode 100644 Documentation/sysctl/abi.rst
 delete mode 100644 Documentation/sysctl/abi.txt
 create mode 100644 Documentation/sysctl/fs.rst
 delete mode 100644 Documentation/sysctl/fs.txt
 create mode 100644 Documentation/sysctl/index.rst
 create mode 100644 Documentation/sysctl/kernel.rst
 delete mode 100644 Documentation/sysctl/kernel.txt
 create mode 100644 Documentation/sysctl/net.rst
 delete mode 100644 Documentation/sysctl/net.txt
 create mode 100644 Documentation/sysctl/sunrpc.rst
 delete mode 100644 Documentation/sysctl/sunrpc.txt
 create mode 100644 Documentation/sysctl/user.rst
 delete mode 100644 Documentation/sysctl/user.txt
 create mode 100644 Documentation/sysctl/vm.rst
 delete mode 100644 Documentation/sysctl/vm.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6b2adda1cc03..01123f1de354 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3144,7 +3144,7 @@
 	numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
 			'node', 'default' can be specified
 			This can be set from sysctl after boot.
-			See Documentation/sysctl/vm.txt for details.
+			See Documentation/sysctl/vm.rst for details.
 
 	ohci1394_dma=early	[HW] enable debugging via the ohci1394 driver.
 			See Documentation/debugging-via-ohci1394.txt for more
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index ddf8d8d33377..f5e92f33f96e 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -11,7 +11,7 @@ processes address space and many other cool things.
 Linux memory management is a complex system with many configurable
 settings. Most of these settings are available via ``/proc``
 filesystem and can be quired and adjusted using ``sysctl``. These APIs
-are described in Documentation/sysctl/vm.txt and in `man 5 proc`_.
+are described in Documentation/sysctl/vm.rst and in `man 5 proc`_.
 
 .. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
 
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 9303786632d1..7b2b8767c0b4 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -59,7 +59,7 @@ MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
 
 If a region of memory must be split into at least one new MADV_MERGEABLE
 or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
-will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.txt).
+will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.rst).
 
 Like other madvise calls, they are intended for use on mapped areas of
 the user address space: they will report ENOMEM if the specified range
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 75d2bbe9813f..1d8e748f909f 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -119,7 +119,7 @@ Kernel Pointers
 
 For printing kernel pointers which should be hidden from unprivileged
 users. The behaviour of %pK depends on the kptr_restrict sysctl - see
-Documentation/sysctl/kernel.txt for more details.
+Documentation/sysctl/kernel.rst for more details.
 
 Unmodified Addresses
 --------------------
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 48c79e78817b..5c3399cde1c4 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -2287,7 +2287,7 @@ addr_scope_policy - INTEGER
 
 
 /proc/sys/net/core/*
-	Please see: Documentation/sysctl/net.txt for descriptions of these entries.
+	Please see: Documentation/sysctl/net.rst for descriptions of these entries.
 
 
 /proc/sys/net/unix/*
diff --git a/Documentation/sysctl/README b/Documentation/sysctl/README
deleted file mode 100644
index d5f24ab0ecc3..000000000000
--- a/Documentation/sysctl/README
+++ /dev/null
@@ -1,76 +0,0 @@
-Documentation for /proc/sys/		kernel version 2.2.10
-	(c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-
-'Why', I hear you ask, 'would anyone even _want_ documentation
-for them sysctl files? If anybody really needs it, it's all in
-the source...'
-
-Well, this documentation is written because some people either
-don't know they need to tweak something, or because they don't
-have the time or knowledge to read the source code.
-
-Furthermore, the programmers who built sysctl have built it to
-be actually used, not just for the fun of programming it :-)
-
-==============================================================
-
-Legal blurb:
-
-As usual, there are two main things to consider:
-1. you get what you pay for
-2. it's free
-
-The consequences are that I won't guarantee the correctness of
-this document, and if you come to me complaining about how you
-screwed up your system because of wrong documentation, I won't
-feel sorry for you. I might even laugh at you...
-
-But of course, if you _do_ manage to screw up your system using
-only the sysctl options used in this file, I'd like to hear of
-it. Not only to have a great laugh, but also to make sure that
-you're the last RTFMing person to screw up.
-
-In short, e-mail your suggestions, corrections and / or horror
-stories to: <riel@nl.linux.org>
-
-Rik van Riel.
-
-==============================================================
-
-Introduction:
-
-Sysctl is a means of configuring certain aspects of the kernel
-at run-time, and the /proc/sys/ directory is there so that you
-don't even need special tools to do it!
-In fact, there are only four things needed to use these config
-facilities:
-- a running Linux system
-- root access
-- common sense (this is especially hard to come by these days)
-- knowledge of what all those values mean
-
-As a quick 'ls /proc/sys' will show, the directory consists of
-several (arch-dependent?) subdirs. Each subdir is mainly about
-one part of the kernel, so you can do configuration on a piece
-by piece basis, or just some 'thematic frobbing'.
-
-The subdirs are about:
-abi/		execution domains & personalities
-debug/		<empty>
-dev/		device specific information (eg dev/cdrom/info)
-fs/		specific filesystems
-		filehandle, inode, dentry and quota tuning
-		binfmt_misc <Documentation/admin-guide/binfmt-misc.rst>
-kernel/		global kernel info / tuning
-		miscellaneous stuff
-net/		networking stuff, for documentation look in:
-		<Documentation/networking/>
-proc/		<empty>
-sunrpc/		SUN Remote Procedure Call (NFS)
-vm/		memory management tuning
-		buffer and cache management
-user/		Per user per user namespace limits
-
-These are the subdirs I have on my system. There might be more
-or other subdirs in another setup. If you see another dir, I'd
-really like to hear about it :-)
diff --git a/Documentation/sysctl/abi.rst b/Documentation/sysctl/abi.rst
new file mode 100644
index 000000000000..599bcde7f0b7
--- /dev/null
+++ b/Documentation/sysctl/abi.rst
@@ -0,0 +1,67 @@
+================================
+Documentation for /proc/sys/abi/
+================================
+
+kernel version 2.6.0.test2
+
+Copyright (c) 2003,  Fabian Frederick <ffrederick@users.sourceforge.net>
+
+For general info: index.rst.
+
+------------------------------------------------------------------------------
+
+This path relates to binary emulation, also known as personality types or
+the ABI. When a process is executed, it is linked to an exec_domain whose
+personality is defined using values available from /proc/sys/abi.
+You can find further details about the ABI in include/linux/personality.h.
+
+These files are present in the 2.6 kernel:
+
+- defhandler_coff
+- defhandler_elf
+- defhandler_lcall7
+- defhandler_libcso
+- fake_utsname
+- trace
+
+defhandler_coff
+---------------
+
+defined value:
+	PER_SCOSVR3::
+
+		0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
+
+defhandler_elf
+--------------
+
+defined value:
+	PER_LINUX::
+
+		0
+
+defhandler_lcall7
+-----------------
+
+defined value:
+	PER_SVR4::
+
+		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+
+defhandler_libcso
+-----------------
+
+defined value:
+	PER_SVR4::
+
+		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+
+fake_utsname
+------------
+
+Unused
+
+trace
+-----
+
+Unused
diff --git a/Documentation/sysctl/abi.txt b/Documentation/sysctl/abi.txt
deleted file mode 100644
index 63f4ebcf652c..000000000000
--- a/Documentation/sysctl/abi.txt
+++ /dev/null
@@ -1,54 +0,0 @@
-Documentation for /proc/sys/abi/* kernel version 2.6.0.test2
-	(c) 2003,  Fabian Frederick <ffrederick@users.sourceforge.net>
-
-For general info : README.
-
-==============================================================
-
-This path is binary emulation relevant aka personality types aka abi.
-When a process is executed, it's linked to an exec_domain whose
-personality is defined using values available from /proc/sys/abi.
-You can find further details about abi in include/linux/personality.h.
-
-Here are the files featuring in 2.6 kernel :
-
-- defhandler_coff
-- defhandler_elf
-- defhandler_lcall7
-- defhandler_libcso
-- fake_utsname
-- trace
-
-===========================================================
-defhandler_coff:
-defined value :
-PER_SCOSVR3
-0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
-
-===========================================================
-defhandler_elf:
-defined value :
-PER_LINUX
-0
-
-===========================================================
-defhandler_lcall7:
-defined value :
-PER_SVR4
-0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-===========================================================
-defhandler_libsco:
-defined value:
-PER_SVR4
-0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-===========================================================
-fake_utsname:
-Unused
-
-===========================================================
-trace:
-Unused
-
-===========================================================
diff --git a/Documentation/sysctl/fs.rst b/Documentation/sysctl/fs.rst
new file mode 100644
index 000000000000..2a45119e3331
--- /dev/null
+++ b/Documentation/sysctl/fs.rst
@@ -0,0 +1,384 @@
+===============================
+Documentation for /proc/sys/fs/
+===============================
+
+kernel version 2.2.10
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2009,        Shen Feng <shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains documentation for the sysctl files in
+/proc/sys/fs/ and is valid for Linux kernel version 2.2.
+
+The files in this directory can be used to tune and monitor
+miscellaneous and general things in the operation of the Linux
+kernel. Since some of the files _can_ be used to screw up your
+system, it is advisable to read both documentation and source
+before actually making adjustments.
+
+1. /proc/sys/fs
+===============
+
+Currently, these files are in /proc/sys/fs:
+
+- aio-max-nr
+- aio-nr
+- dentry-state
+- dquot-max
+- dquot-nr
+- file-max
+- file-nr
+- inode-max
+- inode-nr
+- inode-state
+- nr_open
+- overflowuid
+- overflowgid
+- pipe-user-pages-hard
+- pipe-user-pages-soft
+- protected_fifos
+- protected_hardlinks
+- protected_regular
+- protected_symlinks
+- suid_dumpable
+- super-max
+- super-nr
+
+
+aio-nr & aio-max-nr
+-------------------
+
+aio-nr is the running total of the number of events specified on the
+io_setup system call for all currently active aio contexts.  If aio-nr
+reaches aio-max-nr then io_setup will fail with EAGAIN.  Note that
+raising aio-max-nr does not result in the pre-allocation or re-sizing
+of any kernel data structures.
+
+
+dentry-state
+------------
+
+From linux/include/linux/dcache.h::
+
+  struct dentry_stat_t dentry_stat {
+        int nr_dentry;
+        int nr_unused;
+        int age_limit;         /* age in seconds */
+        int want_pages;        /* pages requested by system */
+        int nr_negative;       /* # of unused negative dentries */
+        int dummy;             /* Reserved for future use */
+  };
+
+Dentries are dynamically allocated and deallocated.
+
+nr_dentry shows the total number of dentries allocated (active
++ unused). nr_unused shows the number of dentries that are not
+actively used, but are saved in the LRU list for future reuse.
+
+age_limit is the age in seconds after which dcache entries
+can be reclaimed when memory is short. want_pages is
+nonzero when shrink_dcache_pages() has been called and the
+dcache isn't pruned yet.
+
+nr_negative shows the number of unused dentries that are also
+negative dentries, which do not map to any files. Instead,
+they help speed up rejection of lookups for non-existent
+files requested by users.
+
+
+dquot-max & dquot-nr
+--------------------
+
+The file dquot-max shows the maximum number of cached disk
+quota entries.
+
+The file dquot-nr shows the number of allocated disk quota
+entries and the number of free disk quota entries.
+
+If the number of free cached disk quotas is very low and
+you have some awesome number of simultaneous system users,
+you might want to raise the limit.
+
+
+file-max & file-nr
+------------------
+
+The value in file-max denotes the maximum number of file
+handles that the Linux kernel will allocate. When you get lots
+of error messages about running out of file handles, you might
+want to increase this limit.
+
+Historically, the kernel was able to allocate file handles
+dynamically, but not to free them again. The three values in
+file-nr denote the number of allocated file handles, the number
+of allocated but unused file handles, and the maximum number of
+file handles. Linux 2.6 always reports 0 as the number of free
+file handles -- this is not an error, it just means that the
+number of allocated file handles exactly matches the number of
+used file handles.
+
+Attempts to allocate more file descriptors than file-max are
+reported with printk; look for "VFS: file-max limit <number>
+reached".
+
+
+nr_open
+-------
+
+This denotes the maximum number of file handles a process can
+allocate. The default value is 1024*1024 (1048576), which should
+be enough for most machines. The actual limit depends on the
+RLIMIT_NOFILE resource limit.
+
+
+inode-max, inode-nr & inode-state
+---------------------------------
+
+As with file handles, the kernel allocates the inode structures
+dynamically, but can't free them yet.
+
+The value in inode-max denotes the maximum number of inode
+handlers. This value should be 3-4 times larger than the value
+in file-max, since stdin, stdout and network sockets also
+need an inode struct to handle them. When you regularly run
+out of inodes, you need to increase this value.
+
+The file inode-nr contains the first two items from
+inode-state, so we'll skip to that file...
+
+Inode-state contains three actual numbers and four dummies.
+The actual numbers are, in order of appearance, nr_inodes,
+nr_free_inodes and preshrink.
+
+nr_inodes stands for the number of inodes the system has
+allocated; this can be slightly more than inode-max because
+Linux allocates them one pageful at a time.
+
+nr_free_inodes represents the number of free inodes (?) and
+preshrink is nonzero when nr_inodes > inode-max and the
+system needs to prune the inode list instead of allocating
+more.
+
+
+overflowgid & overflowuid
+-------------------------
+
+Some filesystems only support 16-bit UIDs and GIDs, although in Linux
+UIDs and GIDs are 32 bits. When one of these filesystems is mounted
+with writes enabled, any UID or GID that would exceed 65535 is translated
+to a fixed value before being written to disk.
+
+These sysctls allow you to change the value of the fixed UID and GID.
+The default is 65534.
+
+
+pipe-user-pages-hard
+--------------------
+
+Maximum total number of pages a non-privileged user may allocate for pipes.
+Once this limit is reached, no new pipes may be allocated until usage goes
+below the limit again. When set to 0, no limit is applied, which is the default
+setting.
+
+
+pipe-user-pages-soft
+--------------------
+
+Maximum total number of pages a non-privileged user may allocate for pipes
+before the pipe size gets limited to a single page. Once this limit is reached,
+new pipes will be limited to a single page in size for this user in order to
+limit total memory usage, and trying to increase them using fcntl() will be
+denied until usage goes below the limit again. The default value allows
+allocating up to 1024 pipes at their default size. When set to 0, no limit is
+applied.
+
+
+protected_fifos
+---------------
+
+The intent of this protection is to avoid unintentional writes to
+an attacker-controlled FIFO, where a program expected to create a regular
+file.
+
+When set to "0", writing to FIFOs is unrestricted.
+
+When set to "1", don't allow O_CREAT open on FIFOs that we don't own
+in world-writable sticky directories, unless they are owned by the
+owner of the directory.
+
+When set to "2", it also applies to group-writable sticky directories.
+
+This protection is based on the restrictions in Openwall.
+
+
+protected_hardlinks
+-------------------
+
+A long-standing class of security issues is the hardlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given hardlink (i.e. a
+root process follows a hardlink created by another user). Additionally,
+on systems without separated partitions, this stops unauthorized users
+from "pinning" vulnerable setuid/setgid files against being upgraded by
+the administrator, or linking to special files.
+
+When set to "0", hardlink creation behavior is unrestricted.
+
+When set to "1", hardlinks cannot be created by users if they do not
+already own the source file, or do not have read/write access to it.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+
+protected_regular
+-----------------
+
+This protection is similar to protected_fifos, but it
+avoids writes to an attacker-controlled regular file, where a program
+expected to create one.
+
+When set to "0", writing to regular files is unrestricted.
+
+When set to "1", don't allow O_CREAT open on regular files that we
+don't own in world-writable sticky directories, unless they are
+owned by the owner of the directory.
+
+When set to "2", it also applies to group-writable sticky directories.
+
+
+protected_symlinks
+------------------
+
+A long-standing class of security issues is the symlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given symlink (i.e. a
+root process follows a symlink belonging to another user). For a likely
+incomplete list of hundreds of examples across the years, please see:
+http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+
+When set to "0", symlink following behavior is unrestricted.
+
+When set to "1", symlinks are permitted to be followed only when outside
+a sticky world-writable directory, or when the uid of the symlink and
+follower match, or when the directory owner matches the symlink's owner.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+
+suid_dumpable
+-------------
+
+This value can be used to query and set the core dump mode for setuid
+or otherwise protected/tainted binaries. The modes are:
+
+=   ==========  ===============================================================
+0   (default)	traditional behaviour. Any process which has changed
+		privilege levels or is execute only will not be dumped.
+1   (debug)	all processes dump core when possible. The core dump is
+		owned by the current user and no security is applied. This is
+		intended for system debugging situations only.
+		Ptrace is unchecked.
+		This is insecure as it allows regular users to examine the
+		memory contents of privileged processes.
+2   (suidsafe)	any binary which normally would not be dumped is dumped
+		anyway, but only if the "core_pattern" kernel sysctl is set to
+		either a pipe handler or a fully qualified path. (For more
+		details on this limitation, see CVE-2006-2451.) This mode is
+		appropriate when administrators are attempting to debug
+		problems in a normal environment, and either have a core dump
+		pipe handler that knows to treat privileged core dumps with
+		care, or specific directory defined for catching core dumps.
+		If a core dump happens without a pipe handler or fully
+		qualified path, a message will be emitted to syslog warning
+		about the lack of a correct setting.
+=   ==========  ===============================================================
+
+
+super-max & super-nr
+--------------------
+
+These numbers control the maximum number of superblocks, and
+thus the maximum number of mounted filesystems the kernel
+can have. You only need to increase super-max if you need to
+mount more filesystems than the current value in super-max
+allows you to.
+
+
+aio-nr & aio-max-nr
+-------------------
+
+aio-nr shows the current system-wide number of asynchronous io
+requests.  aio-max-nr allows you to change the maximum value
+aio-nr can grow to.
+
+
+mount-max
+---------
+
+This denotes the maximum number of mounts that may exist
+in a mount namespace.
+
+
+
+2. /proc/sys/fs/binfmt_misc
+===========================
+
+Documentation for the files in /proc/sys/fs/binfmt_misc is
+in Documentation/admin-guide/binfmt-misc.rst.
+
+
+3. /proc/sys/fs/mqueue - POSIX message queues filesystem
+========================================================
+
+
+The "mqueue" filesystem provides the necessary kernel features to enable the
+creation of a user space library that implements the POSIX message queues
+API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System
+Interfaces specification).
+
+The "mqueue" filesystem contains values for determining/setting the amount of
+resources used by the file system.
+
+/proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the
+maximum number of message queues allowed on the system.
+
+/proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the
+maximum number of messages in a queue. In fact, it is the limiting value
+for another (user) limit which is set in the mq_open invocation. This attribute
+of a queue must be less than or equal to msg_max.
+
+/proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the
+maximum message size (it is an attribute of every message queue, set during
+its creation).
+
+/proc/sys/fs/mqueue/msg_default is a read/write file for setting/getting the
+default number of messages in a queue if the attr parameter of mq_open(2) is
+NULL. If it exceeds msg_max, the default value is initialized to msg_max.
+
+/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
+the default message size if the attr parameter of mq_open(2) is NULL. If it
+exceeds msgsize_max, the default value is initialized to msgsize_max.
+
+4. /proc/sys/fs/epoll - Configuration options for the epoll interface
+=====================================================================
+
+This directory contains configuration options for the epoll(7) interface.
+
+max_user_watches
+----------------
+
+Every epoll file descriptor can store a number of files to be monitored
+for event readiness. Each one of these monitored files constitutes a "watch".
+This configuration option sets the maximum number of "watches" that are
+allowed for each user.
+Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
+on a 64-bit one.
+The current default value for max_user_watches is 1/32 of the available
+low memory, divided by the "watch" cost in bytes.
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
deleted file mode 100644
index ebc679bcb2dc..000000000000
--- a/Documentation/sysctl/fs.txt
+++ /dev/null
@@ -1,374 +0,0 @@
-Documentation for /proc/sys/fs/*	kernel version 2.2.10
-	(c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-	(c) 2009,        Shen Feng<shen@cn.fujitsu.com>
-
-For general info and legal blurb, please look in README.
-
-==============================================================
-
-This file contains documentation for the sysctl files in
-/proc/sys/fs/ and is valid for Linux kernel version 2.2.
-
-The files in this directory can be used to tune and monitor
-miscellaneous and general things in the operation of the Linux
-kernel. Since some of the files _can_ be used to screw up your
-system, it is advisable to read both documentation and source
-before actually making adjustments.
-
-1. /proc/sys/fs
-----------------------------------------------------------
-
-Currently, these files are in /proc/sys/fs:
-- aio-max-nr
-- aio-nr
-- dentry-state
-- dquot-max
-- dquot-nr
-- file-max
-- file-nr
-- inode-max
-- inode-nr
-- inode-state
-- nr_open
-- overflowuid
-- overflowgid
-- pipe-user-pages-hard
-- pipe-user-pages-soft
-- protected_fifos
-- protected_hardlinks
-- protected_regular
-- protected_symlinks
-- suid_dumpable
-- super-max
-- super-nr
-
-==============================================================
-
-aio-nr & aio-max-nr:
-
-aio-nr is the running total of the number of events specified on the
-io_setup system call for all currently active aio contexts.  If aio-nr
-reaches aio-max-nr then io_setup will fail with EAGAIN.  Note that
-raising aio-max-nr does not result in the pre-allocation or re-sizing
-of any kernel data structures.
-
-==============================================================
-
-dentry-state:
-
-From linux/include/linux/dcache.h:
---------------------------------------------------------------
-struct dentry_stat_t dentry_stat {
-        int nr_dentry;
-        int nr_unused;
-        int age_limit;         /* age in seconds */
-        int want_pages;        /* pages requested by system */
-        int nr_negative;       /* # of unused negative dentries */
-        int dummy;             /* Reserved for future use */
-};
---------------------------------------------------------------
-
-Dentries are dynamically allocated and deallocated.
-
-nr_dentry shows the total number of dentries allocated (active
-+ unused). nr_unused shows the number of dentries that are not
-actively used, but are saved in the LRU list for future reuse.
-
-Age_limit is the age in seconds after which dcache entries
-can be reclaimed when memory is short and want_pages is
-nonzero when shrink_dcache_pages() has been called and the
-dcache isn't pruned yet.
-
-nr_negative shows the number of unused dentries that are also
-negative dentries which do not map to any files. Instead,
-they help speeding up rejection of non-existing files provided
-by the users.
-
-==============================================================
-
-dquot-max & dquot-nr:
-
-The file dquot-max shows the maximum number of cached disk
-quota entries.
-
-The file dquot-nr shows the number of allocated disk quota
-entries and the number of free disk quota entries.
-
-If the number of free cached disk quotas is very low and
-you have some awesome number of simultaneous system users,
-you might want to raise the limit.
-
-==============================================================
-
-file-max & file-nr:
-
-The value in file-max denotes the maximum number of file-
-handles that the Linux kernel will allocate. When you get lots
-of error messages about running out of file handles, you might
-want to increase this limit.
-
-Historically,the kernel was able to allocate file handles
-dynamically, but not to free them again. The three values in
-file-nr denote the number of allocated file handles, the number
-of allocated but unused file handles, and the maximum number of
-file handles. Linux 2.6 always reports 0 as the number of free
-file handles -- this is not an error, it just means that the
-number of allocated file handles exactly matches the number of
-used file handles.
-
-Attempts to allocate more file descriptors than file-max are
-reported with printk, look for "VFS: file-max limit <number>
-reached".
-==============================================================
-
-nr_open:
-
-This denotes the maximum number of file-handles a process can
-allocate. Default value is 1024*1024 (1048576) which should be
-enough for most machines. Actual limit depends on RLIMIT_NOFILE
-resource limit.
-
-==============================================================
-
-inode-max, inode-nr & inode-state:
-
-As with file handles, the kernel allocates the inode structures
-dynamically, but can't free them yet.
-
-The value in inode-max denotes the maximum number of inode
-handlers. This value should be 3-4 times larger than the value
-in file-max, since stdin, stdout and network sockets also
-need an inode struct to handle them. When you regularly run
-out of inodes, you need to increase this value.
-
-The file inode-nr contains the first two items from
-inode-state, so we'll skip to that file...
-
-Inode-state contains three actual numbers and four dummies.
-The actual numbers are, in order of appearance, nr_inodes,
-nr_free_inodes and preshrink.
-
-Nr_inodes stands for the number of inodes the system has
-allocated, this can be slightly more than inode-max because
-Linux allocates them one pageful at a time.
-
-Nr_free_inodes represents the number of free inodes (?) and
-preshrink is nonzero when the nr_inodes > inode-max and the
-system needs to prune the inode list instead of allocating
-more.
-
-==============================================================
-
-overflowgid & overflowuid:
-
-Some filesystems only support 16-bit UIDs and GIDs, although in Linux
-UIDs and GIDs are 32 bits. When one of these filesystems is mounted
-with writes enabled, any UID or GID that would exceed 65535 is translated
-to a fixed value before being written to disk.
-
-These sysctls allow you to change the value of the fixed UID and GID.
-The default is 65534.
-
-==============================================================
-
-pipe-user-pages-hard:
-
-Maximum total number of pages a non-privileged user may allocate for pipes.
-Once this limit is reached, no new pipes may be allocated until usage goes
-below the limit again. When set to 0, no limit is applied, which is the default
-setting.
-
-==============================================================
-
-pipe-user-pages-soft:
-
-Maximum total number of pages a non-privileged user may allocate for pipes
-before the pipe size gets limited to a single page. Once this limit is reached,
-new pipes will be limited to a single page in size for this user in order to
-limit total memory usage, and trying to increase them using fcntl() will be
-denied until usage goes below the limit again. The default value allows to
-allocate up to 1024 pipes at their default size. When set to 0, no limit is
-applied.
-
-==============================================================
-
-protected_fifos:
-
-The intent of this protection is to avoid unintentional writes to
-an attacker-controlled FIFO, where a program expected to create a regular
-file.
-
-When set to "0", writing to FIFOs is unrestricted.
-
-When set to "1" don't allow O_CREAT open on FIFOs that we don't own
-in world writable sticky directories, unless they are owned by the
-owner of the directory.
-
-When set to "2" it also applies to group writable sticky directories.
-
-This protection is based on the restrictions in Openwall.
-
-==============================================================
-
-protected_hardlinks:
-
-A long-standing class of security issues is the hardlink-based
-time-of-check-time-of-use race, most commonly seen in world-writable
-directories like /tmp. The common method of exploitation of this flaw
-is to cross privilege boundaries when following a given hardlink (i.e. a
-root process follows a hardlink created by another user). Additionally,
-on systems without separated partitions, this stops unauthorized users
-from "pinning" vulnerable setuid/setgid files against being upgraded by
-the administrator, or linking to special files.
-
-When set to "0", hardlink creation behavior is unrestricted.
-
-When set to "1" hardlinks cannot be created by users if they do not
-already own the source file, or do not have read/write access to it.
-
-This protection is based on the restrictions in Openwall and grsecurity.
-
-==============================================================
-
-protected_regular:
-
-This protection is similar to protected_fifos, but it
-avoids writes to an attacker-controlled regular file, where a program
-expected to create one.
-
-When set to "0", writing to regular files is unrestricted.
-
-When set to "1" don't allow O_CREAT open on regular files that we
-don't own in world writable sticky directories, unless they are
-owned by the owner of the directory.
-
-When set to "2" it also applies to group writable sticky directories.
-
-==============================================================
-
-protected_symlinks:
-
-A long-standing class of security issues is the symlink-based
-time-of-check-time-of-use race, most commonly seen in world-writable
-directories like /tmp. The common method of exploitation of this flaw
-is to cross privilege boundaries when following a given symlink (i.e. a
-root process follows a symlink belonging to another user). For a likely
-incomplete list of hundreds of examples across the years, please see:
-http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
-
-When set to "0", symlink following behavior is unrestricted.
-
-When set to "1" symlinks are permitted to be followed only when outside
-a sticky world-writable directory, or when the uid of the symlink and
-follower match, or when the directory owner matches the symlink's owner.
-
-This protection is based on the restrictions in Openwall and grsecurity.
-
-==============================================================
-
-suid_dumpable:
-
-This value can be used to query and set the core dump mode for setuid
-or otherwise protected/tainted binaries. The modes are
-
-0 - (default) - traditional behaviour. Any process which has changed
-	privilege levels or is execute only will not be dumped.
-1 - (debug) - all processes dump core when possible. The core dump is
-	owned by the current user and no security is applied. This is
-	intended for system debugging situations only. Ptrace is unchecked.
-	This is insecure as it allows regular users to examine the memory
-	contents of privileged processes.
-2 - (suidsafe) - any binary which normally would not be dumped is dumped
-	anyway, but only if the "core_pattern" kernel sysctl is set to
-	either a pipe handler or a fully qualified path. (For more details
-	on this limitation, see CVE-2006-2451.) This mode is appropriate
-	when administrators are attempting to debug problems in a normal
-	environment, and either have a core dump pipe handler that knows
-	to treat privileged core dumps with care, or specific directory
-	defined for catching core dumps. If a core dump happens without
-	a pipe handler or fully qualifid path, a message will be emitted
-	to syslog warning about the lack of a correct setting.
-
-==============================================================
-
-super-max & super-nr:
-
-These numbers control the maximum number of superblocks, and
-thus the maximum number of mounted filesystems the kernel
-can have. You only need to increase super-max if you need to
-mount more filesystems than the current value in super-max
-allows you to.
-
-==============================================================
-
-aio-nr & aio-max-nr:
-
-aio-nr shows the current system-wide number of asynchronous io
-requests.  aio-max-nr allows you to change the maximum value
-aio-nr can grow to.
-
-==============================================================
-
-mount-max:
-
-This denotes the maximum number of mounts that may exist
-in a mount namespace.
-
-==============================================================
-
-
-2. /proc/sys/fs/binfmt_misc
-----------------------------------------------------------
-
-Documentation for the files in /proc/sys/fs/binfmt_misc is
-in Documentation/admin-guide/binfmt-misc.rst.
-
-
-3. /proc/sys/fs/mqueue - POSIX message queues filesystem
-----------------------------------------------------------
-
-The "mqueue"  filesystem provides  the necessary kernel features to enable the
-creation of a  user space  library that  implements  the  POSIX message queues
-API (as noted by the  MSG tag in the  POSIX 1003.1-2001 version  of the System
-Interfaces specification.)
-
-The "mqueue" filesystem contains values for determining/setting  the amount of
-resources used by the file system.
-
-/proc/sys/fs/mqueue/queues_max is a read/write  file for  setting/getting  the
-maximum number of message queues allowed on the system.
-
-/proc/sys/fs/mqueue/msg_max  is  a  read/write file  for  setting/getting  the
-maximum number of messages in a queue value.  In fact it is the limiting value
-for another (user) limit which is set in mq_open invocation. This attribute of
-a queue must be less or equal then msg_max.
-
-/proc/sys/fs/mqueue/msgsize_max is  a read/write  file for setting/getting the
-maximum  message size value (it is every  message queue's attribute set during
-its creation).
-
-/proc/sys/fs/mqueue/msg_default is  a read/write  file for setting/getting the
-default number of messages in a queue value if attr parameter of mq_open(2) is
-NULL. If it exceed msg_max, the default value is initialized msg_max.
-
-/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
-the default message size value if attr parameter of mq_open(2) is NULL. If it
-exceed msgsize_max, the default value is initialized msgsize_max.
-
-4. /proc/sys/fs/epoll - Configuration options for the epoll interface
---------------------------------------------------------
-
-This directory contains configuration options for the epoll(7) interface.
-
-max_user_watches
-----------------
-
-Every epoll file descriptor can store a number of files to be monitored
-for event readiness. Each one of these monitored files constitutes a "watch".
-This configuration option sets the maximum number of "watches" that are
-allowed for each user.
-Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes
-on a 64bit one.
-The current default value for  max_user_watches  is the 1/32 of the available
-low memory, divided for the "watch" cost in bytes.
-
diff --git a/Documentation/sysctl/index.rst b/Documentation/sysctl/index.rst
new file mode 100644
index 000000000000..efbcde8c1c9c
--- /dev/null
+++ b/Documentation/sysctl/index.rst
@@ -0,0 +1,100 @@
+:orphan:
+
+===========================
+Documentation for /proc/sys
+===========================
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+------------------------------------------------------------------------------
+
+'Why', I hear you ask, 'would anyone even _want_ documentation
+for them sysctl files? If anybody really needs it, it's all in
+the source...'
+
+Well, this documentation is written because some people either
+don't know they need to tweak something, or because they don't
+have the time or knowledge to read the source code.
+
+Furthermore, the programmers who built sysctl have built it to
+be actually used, not just for the fun of programming it :-)
+
+------------------------------------------------------------------------------
+
+Legal blurb:
+
+As usual, there are two main things to consider:
+
+1. you get what you pay for
+2. it's free
+
+The consequences are that I won't guarantee the correctness of
+this document, and if you come to me complaining about how you
+screwed up your system because of wrong documentation, I won't
+feel sorry for you. I might even laugh at you...
+
+But of course, if you _do_ manage to screw up your system using
+only the sysctl options used in this file, I'd like to hear of
+it. Not only to have a great laugh, but also to make sure that
+you're the last RTFMing person to screw up.
+
+In short, e-mail your suggestions, corrections and / or horror
+stories to: <riel@nl.linux.org>
+
+Rik van Riel.
+
+--------------------------------------------------------------
+
+Introduction
+============
+
+Sysctl is a means of configuring certain aspects of the kernel
+at run-time, and the /proc/sys/ directory is there so that you
+don't even need special tools to do it!
+In fact, there are only four things needed to use these config
+facilities:
+
+- a running Linux system
+- root access
+- common sense (this is especially hard to come by these days)
+- knowledge of what all those values mean
+
+As a quick 'ls /proc/sys' will show, the directory consists of
+several (arch-dependent?) subdirs. Each subdir is mainly about
+one part of the kernel, so you can do configuration on a piece
+by piece basis, or just some 'thematic frobbing'.
+
+This documentation is about:
+
+=============== ===============================================================
+abi/		execution domains & personalities
+debug/		<empty>
+dev/		device-specific information (e.g. dev/cdrom/info)
+fs/		specific filesystems
+		filehandle, inode, dentry and quota tuning
+		binfmt_misc <Documentation/admin-guide/binfmt-misc.rst>
+kernel/		global kernel info / tuning
+		miscellaneous stuff
+net/		networking stuff, for documentation look in:
+		<Documentation/networking/>
+proc/		<empty>
+sunrpc/		SUN Remote Procedure Call (NFS)
+vm/		memory management tuning
+		buffer and cache management
+user/		Per-user, per-user-namespace limits
+=============== ===============================================================
+
+These are the subdirs I have on my system. There might be more
+or other subdirs in another setup. If you see another dir, I'd
+really like to hear about it :-)
+
+.. toctree::
+   :maxdepth: 1
+
+   abi
+   fs
+   kernel
+   net
+   sunrpc
+   user
+   vm
diff --git a/Documentation/sysctl/kernel.rst b/Documentation/sysctl/kernel.rst
new file mode 100644
index 000000000000..a0c1d4ce403a
--- /dev/null
+++ b/Documentation/sysctl/kernel.rst
@@ -0,0 +1,1177 @@
+===================================
+Documentation for /proc/sys/kernel/
+===================================
+
+kernel version 2.2.10
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2009,        Shen Feng <shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains documentation for the sysctl files in
+/proc/sys/kernel/ and is valid for Linux kernel version 2.2.
+
+The files in this directory can be used to tune and monitor
+miscellaneous and general things in the operation of the Linux
+kernel. Since some of the files _can_ be used to screw up your
+system, it is advisable to read both documentation and source
+before actually making adjustments.
+
+Currently, these files might (depending on your configuration)
+show up in /proc/sys/kernel:
+
+- acct
+- acpi_video_flags
+- auto_msgmni
+- bootloader_type	     [ X86 only ]
+- bootloader_version	     [ X86 only ]
+- cap_last_cap
+- core_pattern
+- core_pipe_limit
+- core_uses_pid
+- ctrl-alt-del
+- dmesg_restrict
+- domainname
+- hostname
+- hotplug
+- hardlockup_all_cpu_backtrace
+- hardlockup_panic
+- hung_task_panic
+- hung_task_check_count
+- hung_task_timeout_secs
+- hung_task_check_interval_secs
+- hung_task_warnings
+- hyperv_record_panic_msg
+- kexec_load_disabled
+- kptr_restrict
+- l2cr                        [ PPC only ]
+- modprobe                    ==> Documentation/debugging-modules.txt
+- modules_disabled
+- msg_next_id		      [ sysv ipc ]
+- msgmax
+- msgmnb
+- msgmni
+- nmi_watchdog
+- osrelease
+- ostype
+- overflowgid
+- overflowuid
+- panic
+- panic_on_oops
+- panic_on_stackoverflow
+- panic_on_unrecovered_nmi
+- panic_on_warn
+- panic_print
+- panic_on_rcu_stall
+- perf_cpu_time_max_percent
+- perf_event_paranoid
+- perf_event_max_stack
+- perf_event_mlock_kb
+- perf_event_max_contexts_per_stack
+- pid_max
+- powersave-nap               [ PPC only ]
+- printk
+- printk_delay
+- printk_ratelimit
+- printk_ratelimit_burst
+- pty                         ==> Documentation/filesystems/devpts.txt
+- randomize_va_space
+- real-root-dev               ==> Documentation/admin-guide/initrd.rst
+- reboot-cmd                  [ SPARC only ]
+- rtsig-max
+- rtsig-nr
+- sched_energy_aware
+- seccomp/                    ==> Documentation/userspace-api/seccomp_filter.rst
+- sem
+- sem_next_id		      [ sysv ipc ]
+- sg-big-buff                 [ generic SCSI device (sg) ]
+- shm_next_id		      [ sysv ipc ]
+- shm_rmid_forced
+- shmall
+- shmmax                      [ sysv ipc ]
+- shmmni
+- softlockup_all_cpu_backtrace
+- soft_watchdog
+- stack_erasing
+- stop-a                      [ SPARC only ]
+- sysrq                       ==> Documentation/admin-guide/sysrq.rst
+- sysctl_writes_strict
+- tainted                     ==> Documentation/admin-guide/tainted-kernels.rst
+- threads-max
+- unknown_nmi_panic
+- watchdog
+- watchdog_thresh
+- version
+
+
+acct:
+=====
+
+highwater lowwater frequency
+
+If BSD-style process accounting is enabled, these values control
+its behaviour. If free space on the filesystem where the log lives
+goes below <lowwater>%, accounting suspends. If free space gets
+above <highwater>%, accounting resumes. <frequency> determines
+how often the amount of free space is checked (value is in
+seconds). Default:
+4 2 30
+That is, suspend accounting if <= 2% of the space is free; resume
+it if >= 4% is free; consider information about the amount of free
+space valid for 30 seconds.
+
+
+acpi_video_flags:
+=================
+
+flags
+
+See Doc*/kernel/power/video.txt. This file allows the video boot mode
+to be set at run time.
+
+
+auto_msgmni:
+============
+
+This variable has no effect and may be removed in future kernel
+releases. Reading it always returns 0.
+Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni
+upon memory add/remove or upon ipc namespace creation/removal.
+Echoing "1" into this file enabled msgmni automatic recomputing.
+Echoing "0" turned it off. auto_msgmni default value was 1.
+
+
+bootloader_type:
+================
+
+x86 bootloader identification
+
+This gives the bootloader type number as indicated by the bootloader,
+shifted left by 4, and OR'd with the low four bits of the bootloader
+version.  The reason for this encoding is that this used to match the
+type_of_loader field in the kernel header; the encoding is kept for
+backwards compatibility.  That is, if the full bootloader type number
+is 0x15 and the full version number is 0x234, this file will contain
+the value 340 = 0x154.
+
+See the type_of_loader and ext_loader_type fields in
+Documentation/x86/boot.rst for additional information.
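+
+The encoding can be checked with shell arithmetic. This is only a
+sketch that reuses the example numbers above; the function name
+bootloader_type is made up for the illustration and nothing is read
+from the running system::
+
+	bootloader_type() {
+		# $1: full bootloader type number, $2: full version number
+		echo $(( ($1 << 4) | ($2 & 0xf) ))
+	}
+	bootloader_type 0x15 0x234	# prints 340, i.e. 0x154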
+
+
+bootloader_version:
+===================
+
+x86 bootloader version
+
+The complete bootloader version number.  In the example above, this
+file will contain the value 564 = 0x234.
+
+See the type_of_loader and ext_loader_ver fields in
+Documentation/x86/boot.rst for additional information.
+
+
+cap_last_cap:
+=============
+
+Highest valid capability of the running kernel.  Exports
+CAP_LAST_CAP from the kernel.
+
+
+core_pattern:
+=============
+
+core_pattern is used to specify a core dumpfile pattern name.
+
+* max length 127 characters; default value is "core"
+* core_pattern is used as a pattern template for the output filename;
+  certain string patterns (beginning with '%') are substituted with
+  their actual values.
+* backward compatibility with core_uses_pid:
+
+	If core_pattern does not include "%p" (default does not)
+	and core_uses_pid is set, then .PID will be appended to
+	the filename.
+
+* corename format specifiers::
+
+	%<NUL>	'%' is dropped
+	%%	output one '%'
+	%p	pid
+	%P	global pid (init PID namespace)
+	%i	tid
+	%I	global tid (init PID namespace)
+	%u	uid (in initial user namespace)
+	%g	gid (in initial user namespace)
+	%d	dump mode, matches PR_SET_DUMPABLE and
+		/proc/sys/fs/suid_dumpable
+	%s	signal number
+	%t	UNIX time of dump
+	%h	hostname
+	%e	executable filename (may be shortened)
+	%E	executable path
+	%<OTHER> both are dropped
+
+* If the first character of the pattern is a '|', the kernel will treat
+  the rest of the pattern as a command to run.  The core dump will be
+  written to the standard input of that program instead of to a file.
+
+
+core_pipe_limit:
+================
+
+This sysctl is only applicable when core_pattern is configured to pipe
+core files to a user space helper (when the first character of
+core_pattern is a '|', see above).  When collecting cores via a pipe
+to an application, it is occasionally useful for the collecting
+application to gather data about the crashing process from its
+/proc/pid directory.  In order to do this safely, the kernel must wait
+for the collecting process to exit, so as not to remove the crashing
+process's proc files prematurely.  This in turn creates the
+possibility that a misbehaving userspace collecting process can block
+the reaping of a crashed process simply by never exiting.  This sysctl
+defends against that.  It defines how many concurrent crashing
+processes may be piped to user space applications in parallel.  If
+this value is exceeded, then those crashing processes above that value
+are noted via the kernel log and their cores are skipped.  0 is a
+special value, indicating that unlimited processes may be captured in
+parallel, but that no waiting will take place (i.e. the collecting
+process is not guaranteed access to /proc/<crashing pid>/).  This
+value defaults to 0.
+
+
+core_uses_pid:
+==============
+
+The default coredump filename is "core".  By setting
+core_uses_pid to 1, the coredump filename becomes core.PID.
+If core_pattern does not include "%p" (default does not)
+and core_uses_pid is set, then .PID will be appended to
+the filename.
+
+
+ctrl-alt-del:
+=============
+
+When the value in this file is 0, ctrl-alt-del is trapped and
+sent to the init(1) program to handle a graceful restart.
+When, however, the value is > 0, Linux's reaction to a Vulcan
+Nerve Pinch (tm) will be an immediate reboot, without even
+syncing its dirty buffers.
+
+Note:
+  when a program (like dosemu) has the keyboard in 'raw'
+  mode, the ctrl-alt-del is intercepted by the program before it
+  ever reaches the kernel tty layer, and it's up to the program
+  to decide what to do with it.
+
+
+dmesg_restrict:
+===============
+
+This toggle indicates whether unprivileged users are prevented
+from using dmesg(8) to view messages from the kernel's log buffer.
+When dmesg_restrict is set to (0) there are no restrictions. When
+dmesg_restrict is set to (1), users must have CAP_SYSLOG to use
+dmesg(8).
+
+The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
+default value of dmesg_restrict.
+
+
+domainname & hostname:
+======================
+
+These files can be used to set the NIS/YP domainname and the
+hostname of your box in exactly the same way as the commands
+domainname and hostname, i.e.::
+
+	# echo "darkstar" > /proc/sys/kernel/hostname
+	# echo "mydomain" > /proc/sys/kernel/domainname
+
+has the same effect as::
+
+	# hostname "darkstar"
+	# domainname "mydomain"
+
+Note, however, that the classic darkstar.frop.org has the
+hostname "darkstar" and DNS (Internet Domain Name Server)
+domainname "frop.org", not to be confused with the NIS (Network
+Information Service) or YP (Yellow Pages) domainname. These two
+domain names are in general different. For a detailed discussion
+see the hostname(1) man page.
+
+
+hardlockup_all_cpu_backtrace:
+=============================
+
+This value controls whether the hard lockup detector gathers further
+debug information when a hard lockup condition is detected. If
+enabled, arch-specific all-CPU stack dumping
+will be initiated.
+
+0: do nothing. This is the default behavior.
+
+1: on detection capture more debug information.
+
+
+hardlockup_panic:
+=================
+
+This parameter can be used to control whether the kernel panics
+when a hard lockup is detected.
+
+   0 - don't panic on hard lockup
+   1 - panic on hard lockup
+
+See Documentation/lockup-watchdogs.txt for more information.  This can
+also be set using the nmi_watchdog kernel parameter.
+
+
+hotplug:
+========
+
+Path for the hotplug policy agent.
+Default value is "/sbin/hotplug".
+
+
+hung_task_panic:
+================
+
+Controls the kernel's behavior when a hung task is detected.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+0: continue operation. This is the default behavior.
+
+1: panic immediately.
+
+
+hung_task_check_count:
+======================
+
+The upper bound on the number of tasks that are checked.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+
+hung_task_timeout_secs:
+=======================
+
+When a task in D state has not been scheduled
+for more than this many seconds, report a warning.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+0: means infinite timeout - no checking done.
+
+Possible values to set are in range {0..LONG_MAX/HZ}.
+
+
+hung_task_check_interval_secs:
+==============================
+
+Hung task check interval. If hung task checking is enabled
+(see hung_task_timeout_secs), the check is done every
+hung_task_check_interval_secs seconds.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+0 (default): means use hung_task_timeout_secs as checking interval.
+Possible values to set are in range {0..LONG_MAX/HZ}.
+
+
+hung_task_warnings:
+===================
+
+The maximum number of warnings to report. During a check interval
+if a hung task is detected, this value is decreased by 1.
+When this value reaches 0, no more warnings will be reported.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+-1: report an infinite number of warnings.
+
+
+hyperv_record_panic_msg:
+========================
+
+Controls whether the panic kmsg data should be reported to Hyper-V.
+
+0: do not report panic kmsg data.
+
+1: report the panic kmsg data. This is the default behavior.
+
+
+kexec_load_disabled:
+====================
+
+A toggle indicating if the kexec_load syscall has been disabled. This
+value defaults to 0 (false: kexec_load enabled), but can be set to 1
+(true: kexec_load disabled). Once true, kexec can no longer be used, and
+the toggle cannot be set back to false. This allows a kexec image to be
+loaded before disabling the syscall, allowing a system to set up (and
+later use) an image without it being altered. Generally used together
+with the "modules_disabled" sysctl.
+
+
+kptr_restrict:
+==============
+
+This toggle indicates whether restrictions are placed on
+exposing kernel addresses via /proc and other interfaces.
+
+When kptr_restrict is set to 0 (the default), the address is hashed before
+printing. (This is equivalent to %p.)
+
+When kptr_restrict is set to (1), kernel pointers printed using the %pK
+format specifier will be replaced with 0's unless the user has CAP_SYSLOG
+and effective user and group ids are equal to the real ids. This is
+because %pK checks are done at read() time rather than open() time, so
+if permissions are elevated between the open() and the read() (e.g. via
+a setuid binary) then %pK will not leak kernel pointers to unprivileged
+users. Note, this is a temporary solution only. The correct long-term
+solution is to do the permission checks at open() time. Consider removing
+world read permissions from files that use %pK, and using dmesg_restrict
+to protect against uses of %pK in dmesg(8) if leaking kernel pointer
+values to unprivileged users is a concern.
+
+When kptr_restrict is set to (2), kernel pointers printed using
+%pK will be replaced with 0's regardless of privileges.
+
+
+l2cr: (PPC only)
+================
+
+This flag controls the L2 cache of G3 processor boards. If
+0, the cache is disabled. Enabled if nonzero.
+
+
+modules_disabled:
+=================
+
+A toggle value indicating if modules are allowed to be loaded
+in an otherwise modular kernel.  This toggle defaults to off
+(0), but can be set true (1).  Once true, modules can be
+neither loaded nor unloaded, and the toggle cannot be set back
+to false.  Generally used with the "kexec_load_disabled" toggle.
+
+
+msg_next_id, sem_next_id, and shm_next_id:
+==========================================
+
+These three toggles allow specifying the desired id for the next allocated
+IPC object: message, semaphore or shared memory respectively.
+
+By default they are equal to -1, which means generic allocation logic.
+Possible values to set are in range {0..INT_MAX}.
+
+Notes:
+  1) The kernel doesn't guarantee that the new object will have the desired
+     id, so it's up to userspace to handle an object with the "wrong" id.
+  2) A toggle with a non-default value will be set back to -1 by the kernel
+     after a successful IPC object allocation. If an IPC object allocation
+     syscall fails, it is undefined whether the value remains unmodified or
+     is reset to -1.
+
+
+nmi_watchdog:
+=============
+
+This parameter can be used to control the NMI watchdog
+(i.e. the hard lockup detector) on x86 systems.
+
+0 - disable the hard lockup detector
+
+1 - enable the hard lockup detector
+
+The hard lockup detector monitors each CPU for its ability to respond to
+timer interrupts. The mechanism utilizes CPU performance counter registers
+that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
+while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
+
+The NMI watchdog is disabled by default if the kernel is running as a guest
+in a KVM virtual machine. This default can be overridden by adding::
+
+   nmi_watchdog=1
+
+to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst).
+
+
+numa_balancing:
+===============
+
+Enables/disables automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes
+that access it often.
+
+On NUMA machines, there is a performance penalty if remote memory is
+accessed by a CPU. When this feature is enabled, the kernel samples which
+task thread is accessing memory by periodically unmapping pages and later
+trapping a page fault. At the time of the page fault, it is determined
+whether the data being accessed should be migrated to a local memory
+node.
+
+The unmapping of pages and trapping faults incur additional overhead that
+ideally is offset by improved memory locality but there is no universal
+guarantee. If the target workload is already bound to NUMA nodes then this
+feature should be disabled. Otherwise, if the system overhead from the
+feature is too high, the rate at which the kernel samples for NUMA hinting
+faults may be controlled by the numa_balancing_scan_period_min_ms,
+numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
+numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls.
+
+numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
+===============================================================================================================================
+
+
+Automatic NUMA balancing scans a task's address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running.  Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases.  The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases.  The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a task's memory is migrated to a local node if the
+workload pattern changes, which minimises the performance impact of remote
+memory accesses. These sysctls control the thresholds for scan delays and
+the number of pages scanned.
+
+numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
+scan a task's virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
+when it initially forks.
+
+numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
+scan a task's virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+numa_balancing_scan_size_mb is how many megabytes worth of pages are
+scanned for a given scan.
+
+
+osrelease, ostype & version:
+============================
+
+::
+
+  # cat osrelease
+  2.1.88
+  # cat ostype
+  Linux
+  # cat version
+  #5 Wed Feb 25 21:49:24 MET 1998
+
+The files osrelease and ostype should be clear enough. Version
+needs a little more clarification however. The '#5' means that
+this is the fifth kernel built from this source base and the
+date behind it indicates the time the kernel was built.
+The only way to tune these values is to rebuild the kernel :-)
+
+
+overflowgid & overflowuid:
+==========================
+
+If your architecture did not always support 32-bit UIDs (i.e. arm,
+i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
+applications that use the old 16-bit UID/GID system calls, if the
+actual UID or GID would exceed 65535.
+
+These sysctls allow you to change the value of the fixed UID and GID.
+The default is 65534.
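+
+The rule can be sketched in shell as follows. This is only an
+illustration of the mapping described above, not kernel code; the
+function name legacy_id is made up for this example::
+
+	legacy_id() {
+		# $1: actual UID or GID, $2: overflow value (default 65534)
+		if [ "$1" -gt 65535 ]; then
+			echo "${2:-65534}"
+		else
+			echo "$1"
+		fi
+	}
+	legacy_id 1000		# fits in 16 bits, prints 1000
+	legacy_id 100000	# exceeds 65535, prints 65534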
+
+
+panic:
+======
+
+The value in this file represents the number of seconds the kernel
+waits before rebooting on a panic. When you use the software watchdog,
+the recommended setting is 60.
+
+
+panic_on_io_nmi:
+================
+
+Controls the kernel's behavior when a CPU receives an NMI caused by
+an IO error.
+
+0: try to continue operation (default)
+
+1: panic immediately. The IO error triggered an NMI. This indicates a
+   serious system condition which could result in IO data corruption.
+   Rather than continuing, panicking might be a better choice. Some
+   servers issue this sort of NMI when the dump button is pushed,
+   and you can use this option to take a crash dump.
+
+
+panic_on_oops:
+==============
+
+Controls the kernel's behaviour when an oops or BUG is encountered.
+
+0: try to continue operation
+
+1: panic immediately.  If the `panic` sysctl is also non-zero then the
+   machine will be rebooted.
+
+
+panic_on_stackoverflow:
+=======================
+
+Controls the kernel's behavior when it detects an overflow of the
+kernel, IRQ or exception stacks (but not of a user stack).
+This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
+
+0: try to continue operation.
+
+1: panic immediately.
+
+
+panic_on_unrecovered_nmi:
+=========================
+
+The default Linux behaviour on an NMI caused by either a memory error or
+an unknown source is to continue operation. For many environments, such
+as scientific computing, it is preferable that the box is taken out and
+the error dealt with rather than an uncorrected parity/ECC error being
+propagated.
+
+A small number of systems do generate NMIs for bizarre random reasons
+such as power management, so the default is off. This sysctl works like
+the existing panic controls already in this directory.
+
+
+panic_on_warn:
+==============
+
+Calls panic() in the WARN() path when set to 1.  This is useful to avoid
+a kernel rebuild when attempting to kdump at the location of a WARN().
+
+0: only WARN(), default behaviour.
+
+1: call panic() after printing out WARN() location.
+
+
+panic_print:
+============
+
+Bitmask for printing system info when a panic happens. The user can choose
+a combination of the following bits:
+
+=====  ========================================
+bit 0  print all tasks info
+bit 1  print system memory info
+bit 2  print timer info
+bit 3  print locks info if CONFIG_LOCKDEP is on
+bit 4  print ftrace buffer
+=====  ========================================
+
+So, for example, to print tasks and memory info on panic, the user can run::
+
+  echo 3 > /proc/sys/kernel/panic_print
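+
+The mask for any other combination is the OR of 1 << bit over the
+wanted bits. A small sketch (the helper name bits_to_mask is made up
+for this example and is not part of any kernel interface)::
+
+	bits_to_mask() {
+		# arguments: the bit numbers to set, from the table above
+		mask=0
+		for bit in "$@"; do
+			mask=$(( mask | (1 << bit) ))
+		done
+		echo "$mask"
+	}
+	bits_to_mask 0 1	# tasks + memory info, prints 3
+	bits_to_mask 0 2 4	# tasks + timer info + ftrace buffer, prints 21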
+
+
+panic_on_rcu_stall:
+===================
+
+When set to 1, calls panic() after RCU stall detection messages. This
+is useful for determining the root cause of RCU stalls using a vmcore.
+
+0: do not panic() when RCU stall takes place, default behavior.
+
+1: panic() after printing RCU stall messages.
+
+
+perf_cpu_time_max_percent:
+==========================
+
+Hints to the kernel how much CPU time it should be allowed to
+use to handle perf sampling events.  If the perf subsystem
+is informed that its samples are exceeding this limit, it
+will drop its sampling frequency to attempt to reduce its CPU
+usage.
+
+Some perf sampling happens in NMIs.  If these samples
+unexpectedly take too long to execute, the NMIs can become
+stacked up next to each other so much that nothing else is
+allowed to execute.
+
+0:
+   disable the mechanism.  Do not monitor or correct perf's
+   sampling rate no matter how much CPU time it takes.
+
+1-100:
+   attempt to throttle perf's sample rate to this
+   percentage of CPU.  Note: the kernel calculates an
+   "expected" length of each sample event.  100 here means
+   100% of that expected length.  Even if this is set to
+   100, you may still see sample throttling if this
+   length is exceeded.  Set to 0 if you truly do not care
+   how much CPU is consumed.
+
+
+perf_event_paranoid:
+====================
+
+Controls use of the performance events system by unprivileged
+users (without CAP_SYS_ADMIN).  The default value is 2.
+
+===  ==================================================================
+ -1  Allow use of (almost) all events by all users
+
+     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
+
+>=0  Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
+
+     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
+
+>=1  Disallow CPU event access by users without CAP_SYS_ADMIN
+
+>=2  Disallow kernel profiling by users without CAP_SYS_ADMIN
+===  ==================================================================
+
+
+perf_event_max_stack:
+=====================
+
+Controls maximum number of stack frames to copy for (attr.sample_type &
+PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
+'perf record -g' or 'perf trace --call-graph fp'.
+
+This value can only be changed when no events are in use that have callchains
+enabled, otherwise writing to this file will return -EBUSY.
+
+The default value is 127.
+
+
+perf_event_mlock_kb:
+====================
+
+Control size of per-cpu ring buffer not counted against mlock limit.
+
+The default value is 512 + 1 page.
+
+
+perf_event_max_contexts_per_stack:
+==================================
+
+Controls maximum number of stack frame context entries for
+(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
+instance, when using 'perf record -g' or 'perf trace --call-graph fp'.
+
+This value can only be changed when no events are in use that have callchains
+enabled, otherwise writing to this file will return -EBUSY.
+
+The default value is 8.
+
+
+pid_max:
+========
+
+PID allocation wrap value.  When the kernel's next PID value
+reaches this value, it wraps back to a minimum PID value.
+PIDs of value pid_max or larger are not allocated.
+
+
+ns_last_pid:
+============
+
+The last pid allocated in the current pid namespace (the one that the
+task using this sysctl lives in). When selecting a pid for the next task
+on fork, the kernel tries to allocate a number starting from this one.
+
+
+powersave-nap: (PPC only)
+=========================
+
+If set, Linux-PPC will use the 'nap' mode of powersaving,
+otherwise the 'doze' mode will be used.
+
+
+printk:
+=======
+
+The four values in printk denote: console_loglevel,
+default_message_loglevel, minimum_console_loglevel and
+default_console_loglevel respectively.
+
+These values influence printk() behavior when printing or
+logging error messages. See 'man 2 syslog' for more info on
+the different loglevels.
+
+- console_loglevel:
+	messages with a higher priority than
+	this will be printed to the console
+- default_message_loglevel:
+	messages without an explicit priority
+	will be printed with this priority
+- minimum_console_loglevel:
+	minimum (highest) value to which
+	console_loglevel can be set
+- default_console_loglevel:
+	default value for console_loglevel
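+
+The four fields can be labelled with a few lines of shell. This is a
+sketch only: the function name show_printk is made up, and "4 4 1 7"
+is an illustrative sample line; on a live system the four arguments
+would come from the contents of /proc/sys/kernel/printk::
+
+	show_printk() {
+		# $1..$4: the four whitespace-separated fields, in order
+		echo "console_loglevel=$1"
+		echo "default_message_loglevel=$2"
+		echo "minimum_console_loglevel=$3"
+		echo "default_console_loglevel=$4"
+	}
+	show_printk 4 4 1 7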
+
+
+printk_delay:
+=============
+
+Delay each printk message by printk_delay milliseconds.
+
+Values from 0 to 10000 are allowed.
+
+
+printk_ratelimit:
+=================
+
+Some warning messages are rate limited. printk_ratelimit specifies
+the minimum length of time between these messages (in seconds); by
+default, one message is allowed every 5 seconds.
+
+A value of 0 will disable rate limiting.
+
+
+printk_ratelimit_burst:
+=======================
+
+While long term we enforce one message per printk_ratelimit
+seconds, we do allow a burst of messages to pass through.
+printk_ratelimit_burst specifies the number of messages we can
+send before ratelimiting kicks in.
+
+
+printk_devkmsg:
+===============
+
+Control the logging to /dev/kmsg from userspace:
+
+ratelimit:
+	default, ratelimited
+
+on: unlimited logging to /dev/kmsg from userspace
+
+off: logging to /dev/kmsg disabled
+
+The kernel command line parameter printk.devkmsg= overrides this and is
+a one-time setting until next reboot: once set, it cannot be changed by
+this sysctl interface anymore.
+
+
+randomize_va_space:
+===================
+
+This option can be used to select the type of process address
+space randomization that is used in the system, for architectures
+that support this feature.
+
+==  ===========================================================================
+0   Turn the process address space randomization off.  This is the
+    default for architectures that do not support this feature anyway,
+    and kernels that are booted with the "norandmaps" parameter.
+
+1   Make the addresses of mmap base, stack and VDSO page randomized.
+    This, among other things, implies that shared libraries will be
+    loaded to random addresses.  Also for PIE-linked binaries, the
+    location of code start is randomized.  This is the default if the
+    CONFIG_COMPAT_BRK option is enabled.
+
+2   Additionally enable heap randomization.  This is the default if
+    CONFIG_COMPAT_BRK is disabled.
+
+    There are a few legacy applications out there (such as some ancient
+    versions of libc.so.5 from 1996) that assume that brk area starts
+    just after the end of the code+bss.  These applications break when
+    start of the brk area is randomized.  There are however no known
+    non-legacy applications that would be broken this way, so for most
+    systems it is safe to choose full randomization.
+
+    Systems with ancient and/or broken binaries should be configured
+    with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
+    address space randomization.
+==  ===========================================================================
+
+
+reboot-cmd: (Sparc only)
+========================
+
+??? This seems to be a way to give an argument to the Sparc
+ROM/Flash boot loader. Maybe to tell it what to do after
+rebooting. ???
+
+
+rtsig-max & rtsig-nr:
+=====================
+
+The file rtsig-max can be used to tune the maximum number
+of POSIX realtime (queued) signals that can be outstanding
+in the system.
+
+rtsig-nr shows the number of RT signals currently queued.
+
+
+sched_energy_aware:
+===================
+
+Enables/disables Energy Aware Scheduling (EAS). EAS starts
+automatically on platforms where it can run (that is,
+platforms with asymmetric CPU topologies and having an Energy
+Model available). If your platform happens to meet the
+requirements for EAS but you do not want to use it, change
+this value to 0.
+
+
+sched_schedstats:
+=================
+
+Enables/disables scheduler statistics. Enabling this feature
+incurs a small amount of overhead in the scheduler but is
+useful for debugging and performance tuning.
+
+
+sg-big-buff:
+============
+
+This file shows the size of the generic SCSI (sg) buffer.
+You can't tune it just yet, but you could change it at
+compile time by editing include/scsi/sg.h and changing
+the value of SG_BIG_BUFF.
+
+There shouldn't be any reason to change this value. If
+you can come up with one, you probably know what you
+are doing anyway :)
+
+
+shmall:
+=======
+
+This parameter sets the total number of shared memory pages that
+can be used system wide. Hence, SHMALL should always be at least
+ceil(shmmax/PAGE_SIZE).
+
+If you are not sure what the default PAGE_SIZE is on your Linux
+system, you can run the following command::
+
+	# getconf PAGE_SIZE
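+
+The lower bound for shmall is a ceiling division. A sketch with the
+made-up helper name min_shmall and example numbers only (they are not
+defaults of any particular system)::
+
+	min_shmall() {
+		# $1: shmmax in bytes, $2: page size in bytes
+		echo $(( ($1 + $2 - 1) / $2 ))
+	}
+	min_shmall 33554432 4096	# 32 MiB, 4 KiB pages: prints 8192
+	min_shmall 33554433 4096	# one byte more: prints 8193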
+
+
+shmmax:
+=======
+
+This value can be used to query and set the run time limit
+on the maximum shared memory segment size that can be created.
+Shared memory segments up to 1GB are now supported in the
+kernel.  This value defaults to SHMMAX.
+
+
+shm_rmid_forced:
+================
+
+Linux lets you set resource limits, including how much memory one
+process can consume, via setrlimit(2).  Unfortunately, shared memory
+segments are allowed to exist without association with any process, and
+thus might not be counted against any resource limits.  If enabled,
+shared memory segments are automatically destroyed when their attach
+count becomes zero after a detach or a process termination.  It will
+also destroy segments that were created, but never attached to, on exit
+from the process.  The only use left for IPC_RMID is to immediately
+destroy an unattached segment.  Of course, this breaks the way things are
+defined, so some applications might stop working.  Note that this
+feature will do you no good unless you also configure your resource
+limits (in particular, RLIMIT_AS and RLIMIT_NPROC).  Most systems don't
+need this.
+
+Note that if you change this from 0 to 1, already created segments
+without users and with a dead creator process will be destroyed.
+
+
+sysctl_writes_strict:
+=====================
+
+Control how file position affects the behavior of updating sysctl values
+via the /proc/sys interface:
+
+  ==   ======================================================================
+  -1   Legacy per-write sysctl value handling, with no printk warnings.
+       Each write syscall must fully contain the sysctl value to be
+       written, and multiple writes on the same sysctl file descriptor
+       will rewrite the sysctl value, regardless of file position.
+   0   Same behavior as above, but warn about processes that perform writes
+       to a sysctl file descriptor when the file position is not 0.
+   1   (default) Respect file position when writing sysctl strings. Multiple
+       writes will append to the sysctl value buffer. Anything past the max
+       length of the sysctl value buffer will be ignored. Writes to numeric
+       sysctl entries must always be at file position 0 and the value must
+       be fully contained in the buffer sent in the write syscall.
+  ==   ======================================================================
+
+
+softlockup_all_cpu_backtrace:
+=============================
+
+This value controls the soft lockup detector thread's behavior
+when a soft lockup condition is detected as to whether or not
+to gather further debug information. If enabled, each CPU will
+be issued an NMI and instructed to capture a stack trace.
+
+This feature is only applicable for architectures which support
+NMI.
+
+0: do nothing. This is the default behavior.
+
+1: on detection capture more debug information.
+
+
+soft_watchdog:
+==============
+
+This parameter can be used to control the soft lockup detector.
+
+   0 - disable the soft lockup detector
+
+   1 - enable the soft lockup detector
+
+The soft lockup detector monitors CPUs for threads that are hogging the CPUs
+without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
+from running. The mechanism depends on the CPU's ability to respond to timer
+interrupts, which are needed for the 'watchdog/N' threads to be woken up by
+the watchdog timer function; otherwise the NMI watchdog, if enabled, can
+detect a hard lockup condition.
+
+
+stack_erasing:
+==============
+
+This parameter can be used to control kernel stack erasing at the end
+of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.
+
+That erasing reduces the information which kernel stack leak bugs
+can reveal and blocks some uninitialized stack variable attacks.
+The tradeoff is the performance impact: on a single CPU system, kernel
+compilation sees a 1% slowdown; other systems and workloads may vary.
+
+  0: kernel stack erasing is disabled; STACKLEAK_METRICS are not updated.
+
+  1: kernel stack erasing is enabled (default); it is performed before
+     returning to userspace at the end of syscalls.
+
+
+tainted
+=======
+
+Non-zero if the kernel has been tainted. The numeric values below can be
+ORed together. The letters appear in the "Tainted" line of Oops reports.
+
+======  =====  ==============================================================
+     1  `(P)`  proprietary module was loaded
+     2  `(F)`  module was force loaded
+     4  `(S)`  SMP kernel oops on an officially SMP incapable processor
+     8  `(R)`  module was force unloaded
+    16  `(M)`  processor reported a Machine Check Exception (MCE)
+    32  `(B)`  bad page referenced or some unexpected page flags
+    64  `(U)`  taint requested by userspace application
+   128  `(D)`  kernel died recently, i.e. there was an OOPS or BUG
+   256  `(A)`  an ACPI table was overridden by user
+   512  `(W)`  kernel issued warning
+  1024  `(C)`  staging driver was loaded
+  2048  `(I)`  workaround for bug in platform firmware applied
+  4096  `(O)`  externally-built ("out-of-tree") module was loaded
+  8192  `(E)`  unsigned module was loaded
+ 16384  `(L)`  soft lockup occurred
+ 32768  `(K)`  kernel has been live patched
+ 65536  `(X)`  Auxiliary taint, defined for and used by distros
+131072  `(T)`  The kernel was built with the struct randomization plugin
+======  =====  ==============================================================
+
+See Documentation/admin-guide/tainted-kernels.rst for more information.
+
+
+threads-max:
+============
+
+This value controls the maximum number of threads that can be created
+using fork().
+
+During initialization the kernel sets this value such that even if the
+maximum number of threads is created, the thread structures occupy only
+a part (1/8th) of the available RAM pages.
+
+The minimum value that can be written to threads-max is 20.
+
+The maximum value that can be written to threads-max is given by the
+constant FUTEX_TID_MASK (0x3fffffff).
+
+If a value outside of this range is written to threads-max, the write
+fails with EINVAL.
+
+The value written is checked against the available RAM pages. If the
+thread structures would occupy too much (more than 1/8th) of the
+available RAM pages, threads-max is reduced accordingly.
+
+
+unknown_nmi_panic:
+==================
+
+The value in this file affects how unknown NMIs are handled. When the
+value is non-zero, an unknown NMI is trapped and a panic occurs. At
+that time, kernel debugging information is displayed on the console.
+
+For example, the NMI switch found on most IA32 servers raises an unknown
+NMI.  If a system hangs, try pressing the NMI switch.
+
+
+watchdog:
+=========
+
+This parameter can be used to disable or enable the soft lockup detector
+_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
+
+   0 - disable both lockup detectors
+
+   1 - enable both lockup detectors
+
+The soft lockup detector and the NMI watchdog can also be disabled or
+enabled individually, using the soft_watchdog and nmi_watchdog parameters.
+If the watchdog parameter is read, for example by executing::
+
+   cat /proc/sys/kernel/watchdog
+
+the output of this command (0 or 1) shows the logical OR of soft_watchdog
+and nmi_watchdog.
+
+
+watchdog_cpumask:
+=================
+
+This value can be used to control on which cpus the watchdog may run.
+The default cpumask is all possible cores, but if NO_HZ_FULL is
+enabled in the kernel config, and cores are specified with the
+nohz_full= boot argument, those cores are excluded by default.
+Offline cores can be included in this mask, and if the core is later
+brought online, the watchdog will be started based on the mask value.
+
+Typically this value would only be touched in the nohz_full case
+to re-enable cores that by default were not running the watchdog,
+if a kernel lockup was suspected on those cores.
+
+The argument value is the standard cpulist format for cpumasks,
+so for example to enable the watchdog on cores 0, 2, 3, and 4 you
+might say::
+
+  echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
+
+
+watchdog_thresh:
+================
+
+This value can be used to control the frequency of hrtimer and NMI
+events and the soft and hard lockup thresholds. The default threshold
+is 10 seconds.
+
+The softlockup threshold is (2 * watchdog_thresh). Setting this
+tunable to zero will disable lockup detection altogether.
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
deleted file mode 100644
index 1b2fe17cd2fa..000000000000
--- a/Documentation/sysctl/kernel.txt
+++ /dev/null
@@ -1,1129 +0,0 @@
-Documentation for /proc/sys/kernel/*	kernel version 2.2.10
-	(c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-	(c) 2009,        Shen Feng<shen@cn.fujitsu.com>
-
-For general info and legal blurb, please look in README.
-
-==============================================================
-
-This file contains documentation for the sysctl files in
-/proc/sys/kernel/ and is valid for Linux kernel version 2.2.
-
-The files in this directory can be used to tune and monitor
-miscellaneous and general things in the operation of the Linux
-kernel. Since some of the files _can_ be used to screw up your
-system, it is advisable to read both documentation and source
-before actually making adjustments.
-
-Currently, these files might (depending on your configuration)
-show up in /proc/sys/kernel:
-
-- acct
-- acpi_video_flags
-- auto_msgmni
-- bootloader_type	     [ X86 only ]
-- bootloader_version	     [ X86 only ]
-- cap_last_cap
-- core_pattern
-- core_pipe_limit
-- core_uses_pid
-- ctrl-alt-del
-- dmesg_restrict
-- domainname
-- hostname
-- hotplug
-- hardlockup_all_cpu_backtrace
-- hardlockup_panic
-- hung_task_panic
-- hung_task_check_count
-- hung_task_timeout_secs
-- hung_task_check_interval_secs
-- hung_task_warnings
-- hyperv_record_panic_msg
-- kexec_load_disabled
-- kptr_restrict
-- l2cr                        [ PPC only ]
-- modprobe                    ==> Documentation/debugging-modules.txt
-- modules_disabled
-- msg_next_id		      [ sysv ipc ]
-- msgmax
-- msgmnb
-- msgmni
-- nmi_watchdog
-- osrelease
-- ostype
-- overflowgid
-- overflowuid
-- panic
-- panic_on_oops
-- panic_on_stackoverflow
-- panic_on_unrecovered_nmi
-- panic_on_warn
-- panic_print
-- panic_on_rcu_stall
-- perf_cpu_time_max_percent
-- perf_event_paranoid
-- perf_event_max_stack
-- perf_event_mlock_kb
-- perf_event_max_contexts_per_stack
-- pid_max
-- powersave-nap               [ PPC only ]
-- printk
-- printk_delay
-- printk_ratelimit
-- printk_ratelimit_burst
-- pty                         ==> Documentation/filesystems/devpts.txt
-- randomize_va_space
-- real-root-dev               ==> Documentation/admin-guide/initrd.rst
-- reboot-cmd                  [ SPARC only ]
-- rtsig-max
-- rtsig-nr
-- sched_energy_aware
-- seccomp/                    ==> Documentation/userspace-api/seccomp_filter.rst
-- sem
-- sem_next_id		      [ sysv ipc ]
-- sg-big-buff                 [ generic SCSI device (sg) ]
-- shm_next_id		      [ sysv ipc ]
-- shm_rmid_forced
-- shmall
-- shmmax                      [ sysv ipc ]
-- shmmni
-- softlockup_all_cpu_backtrace
-- soft_watchdog
-- stack_erasing
-- stop-a                      [ SPARC only ]
-- sysrq                       ==> Documentation/admin-guide/sysrq.rst
-- sysctl_writes_strict
-- tainted                     ==> Documentation/admin-guide/tainted-kernels.rst
-- threads-max
-- unknown_nmi_panic
-- watchdog
-- watchdog_thresh
-- version
-
-==============================================================
-
-acct:
-
-highwater lowwater frequency
-
-If BSD-style process accounting is enabled these values control
-its behaviour. If free space on filesystem where the log lives
-goes below <lowwater>% accounting suspends. If free space gets
-above <highwater>% accounting resumes. <Frequency> determines
-how often do we check the amount of free space (value is in
-seconds). Default:
-4 2 30
-That is, suspend accounting if there left <= 2% free; resume it
-if we got >=4%; consider information about amount of free space
-valid for 30 seconds.
-
-==============================================================
-
-acpi_video_flags:
-
-flags
-
-See Doc*/kernel/power/video.txt, it allows mode of video boot to be
-set during run time.
-
-==============================================================
-
-auto_msgmni:
-
-This variable has no effect and may be removed in future kernel
-releases. Reading it always returns 0.
-Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni
-upon memory add/remove or upon ipc namespace creation/removal.
-Echoing "1" into this file enabled msgmni automatic recomputing.
-Echoing "0" turned it off. auto_msgmni default value was 1.
-
-
-==============================================================
-
-bootloader_type:
-
-x86 bootloader identification
-
-This gives the bootloader type number as indicated by the bootloader,
-shifted left by 4, and OR'd with the low four bits of the bootloader
-version.  The reason for this encoding is that this used to match the
-type_of_loader field in the kernel header; the encoding is kept for
-backwards compatibility.  That is, if the full bootloader type number
-is 0x15 and the full version number is 0x234, this file will contain
-the value 340 = 0x154.
-
-See the type_of_loader and ext_loader_type fields in
-Documentation/x86/boot.rst for additional information.
-
-==============================================================
-
-bootloader_version:
-
-x86 bootloader version
-
-The complete bootloader version number.  In the example above, this
-file will contain the value 564 = 0x234.
-
-See the type_of_loader and ext_loader_ver fields in
-Documentation/x86/boot.rst for additional information.
-
-==============================================================
-
-cap_last_cap
-
-Highest valid capability of the running kernel.  Exports
-CAP_LAST_CAP from the kernel.
-
-==============================================================
-
-core_pattern:
-
-core_pattern is used to specify a core dumpfile pattern name.
-. max length 127 characters; default value is "core"
-. core_pattern is used as a pattern template for the output filename;
-  certain string patterns (beginning with '%') are substituted with
-  their actual values.
-. backward compatibility with core_uses_pid:
-	If core_pattern does not include "%p" (default does not)
-	and core_uses_pid is set, then .PID will be appended to
-	the filename.
-. corename format specifiers:
-	%<NUL>	'%' is dropped
-	%%	output one '%'
-	%p	pid
-	%P	global pid (init PID namespace)
-	%i	tid
-	%I	global tid (init PID namespace)
-	%u	uid (in initial user namespace)
-	%g	gid (in initial user namespace)
-	%d	dump mode, matches PR_SET_DUMPABLE and
-		/proc/sys/fs/suid_dumpable
-	%s	signal number
-	%t	UNIX time of dump
-	%h	hostname
-	%e	executable filename (may be shortened)
-	%E	executable path
-	%<OTHER> both are dropped
-. If the first character of the pattern is a '|', the kernel will treat
-  the rest of the pattern as a command to run.  The core dump will be
-  written to the standard input of that program instead of to a file.
-
-==============================================================
-
-core_pipe_limit:
-
-This sysctl is only applicable when core_pattern is configured to pipe
-core files to a user space helper (when the first character of
-core_pattern is a '|', see above).  When collecting cores via a pipe
-to an application, it is occasionally useful for the collecting
-application to gather data about the crashing process from its
-/proc/pid directory.  In order to do this safely, the kernel must wait
-for the collecting process to exit, so as not to remove the crashing
-processes proc files prematurely.  This in turn creates the
-possibility that a misbehaving userspace collecting process can block
-the reaping of a crashed process simply by never exiting.  This sysctl
-defends against that.  It defines how many concurrent crashing
-processes may be piped to user space applications in parallel.  If
-this value is exceeded, then those crashing processes above that value
-are noted via the kernel log and their cores are skipped.  0 is a
-special value, indicating that unlimited processes may be captured in
-parallel, but that no waiting will take place (i.e. the collecting
-process is not guaranteed access to /proc/<crashing pid>/).  This
-value defaults to 0.
-
-==============================================================
-
-core_uses_pid:
-
-The default coredump filename is "core".  By setting
-core_uses_pid to 1, the coredump filename becomes core.PID.
-If core_pattern does not include "%p" (default does not)
-and core_uses_pid is set, then .PID will be appended to
-the filename.
-
-==============================================================
-
-ctrl-alt-del:
-
-When the value in this file is 0, ctrl-alt-del is trapped and
-sent to the init(1) program to handle a graceful restart.
-When, however, the value is > 0, Linux's reaction to a Vulcan
-Nerve Pinch (tm) will be an immediate reboot, without even
-syncing its dirty buffers.
-
-Note: when a program (like dosemu) has the keyboard in 'raw'
-mode, the ctrl-alt-del is intercepted by the program before it
-ever reaches the kernel tty layer, and it's up to the program
-to decide what to do with it.
-
-==============================================================
-
-dmesg_restrict:
-
-This toggle indicates whether unprivileged users are prevented
-from using dmesg(8) to view messages from the kernel's log buffer.
-When dmesg_restrict is set to (0) there are no restrictions. When
-dmesg_restrict is set set to (1), users must have CAP_SYSLOG to use
-dmesg(8).
-
-The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
-default value of dmesg_restrict.
-
-==============================================================
-
-domainname & hostname:
-
-These files can be used to set the NIS/YP domainname and the
-hostname of your box in exactly the same way as the commands
-domainname and hostname, i.e.:
-# echo "darkstar" > /proc/sys/kernel/hostname
-# echo "mydomain" > /proc/sys/kernel/domainname
-has the same effect as
-# hostname "darkstar"
-# domainname "mydomain"
-
-Note, however, that the classic darkstar.frop.org has the
-hostname "darkstar" and DNS (Internet Domain Name Server)
-domainname "frop.org", not to be confused with the NIS (Network
-Information Service) or YP (Yellow Pages) domainname. These two
-domain names are in general different. For a detailed discussion
-see the hostname(1) man page.
-
-==============================================================
-hardlockup_all_cpu_backtrace:
-
-This value controls the hard lockup detector behavior when a hard
-lockup condition is detected as to whether or not to gather further
-debug information. If enabled, arch-specific all-CPU stack dumping
-will be initiated.
-
-0: do nothing. This is the default behavior.
-
-1: on detection capture more debug information.
-==============================================================
-
-hardlockup_panic:
-
-This parameter can be used to control whether the kernel panics
-when a hard lockup is detected.
-
-   0 - don't panic on hard lockup
-   1 - panic on hard lockup
-
-See Documentation/lockup-watchdogs.txt for more information.  This can
-also be set using the nmi_watchdog kernel parameter.
-
-==============================================================
-
-hotplug:
-
-Path for the hotplug policy agent.
-Default value is "/sbin/hotplug".
-
-==============================================================
-
-hung_task_panic:
-
-Controls the kernel's behavior when a hung task is detected.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-0: continue operation. This is the default behavior.
-
-1: panic immediately.
-
-==============================================================
-
-hung_task_check_count:
-
-The upper bound on the number of tasks that are checked.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-==============================================================
-
-hung_task_timeout_secs:
-
-When a task in D state did not get scheduled
-for more than this value report a warning.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-0: means infinite timeout - no checking done.
-Possible values to set are in range {0..LONG_MAX/HZ}.
-
-==============================================================
-
-hung_task_check_interval_secs:
-
-Hung task check interval. If hung task checking is enabled
-(see hung_task_timeout_secs), the check is done every
-hung_task_check_interval_secs seconds.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-0 (default): means use hung_task_timeout_secs as checking interval.
-Possible values to set are in range {0..LONG_MAX/HZ}.
-
-==============================================================
-
-hung_task_warnings:
-
-The maximum number of warnings to report. During a check interval
-if a hung task is detected, this value is decreased by 1.
-When this value reaches 0, no more warnings will be reported.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
--1: report an infinite number of warnings.
-
-==============================================================
-
-hyperv_record_panic_msg:
-
-Controls whether the panic kmsg data should be reported to Hyper-V.
-
-0: do not report panic kmsg data.
-
-1: report the panic kmsg data. This is the default behavior.
-
-==============================================================
-
-kexec_load_disabled:
-
-A toggle indicating if the kexec_load syscall has been disabled. This
-value defaults to 0 (false: kexec_load enabled), but can be set to 1
-(true: kexec_load disabled). Once true, kexec can no longer be used, and
-the toggle cannot be set back to false. This allows a kexec image to be
-loaded before disabling the syscall, allowing a system to set up (and
-later use) an image without it being altered. Generally used together
-with the "modules_disabled" sysctl.
-
-==============================================================
-
-kptr_restrict:
-
-This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces.
-
-When kptr_restrict is set to 0 (the default) the address is hashed before
-printing. (This is the equivalent to %p.)
-
-When kptr_restrict is set to (1), kernel pointers printed using the %pK
-format specifier will be replaced with 0's unless the user has CAP_SYSLOG
-and effective user and group ids are equal to the real ids. This is
-because %pK checks are done at read() time rather than open() time, so
-if permissions are elevated between the open() and the read() (e.g via
-a setuid binary) then %pK will not leak kernel pointers to unprivileged
-users. Note, this is a temporary solution only. The correct long-term
-solution is to do the permission checks at open() time. Consider removing
-world read permissions from files that use %pK, and using dmesg_restrict
-to protect against uses of %pK in dmesg(8) if leaking kernel pointer
-values to unprivileged users is a concern.
-
-When kptr_restrict is set to (2), kernel pointers printed using
-%pK will be replaced with 0's regardless of privileges.
-
-==============================================================
-
-l2cr: (PPC only)
-
-This flag controls the L2 cache of G3 processor boards. If
-0, the cache is disabled. Enabled if nonzero.
-
-==============================================================
-
-modules_disabled:
-
-A toggle value indicating if modules are allowed to be loaded
-in an otherwise modular kernel.  This toggle defaults to off
-(0), but can be set true (1).  Once true, modules can be
-neither loaded nor unloaded, and the toggle cannot be set back
-to false.  Generally used with the "kexec_load_disabled" toggle.
-
-==============================================================
-
-msg_next_id, sem_next_id, and shm_next_id:
-
-These three toggles allows to specify desired id for next allocated IPC
-object: message, semaphore or shared memory respectively.
-
-By default they are equal to -1, which means generic allocation logic.
-Possible values to set are in range {0..INT_MAX}.
-
-Notes:
-1) kernel doesn't guarantee, that new object will have desired id. So,
-it's up to userspace, how to handle an object with "wrong" id.
-2) Toggle with non-default value will be set back to -1 by kernel after
-successful IPC object allocation. If an IPC object allocation syscall
-fails, it is undefined if the value remains unmodified or is reset to -1.
-
-==============================================================
-
-nmi_watchdog:
-
-This parameter can be used to control the NMI watchdog
-(i.e. the hard lockup detector) on x86 systems.
-
-   0 - disable the hard lockup detector
-   1 - enable the hard lockup detector
-
-The hard lockup detector monitors each CPU for its ability to respond to
-timer interrupts. The mechanism utilizes CPU performance counter registers
-that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
-while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
-
-The NMI watchdog is disabled by default if the kernel is running as a guest
-in a KVM virtual machine. This default can be overridden by adding
-
-   nmi_watchdog=1
-
-to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst).
-
-==============================================================
-
-numa_balancing
-
-Enables/disables automatic page fault based NUMA memory
-balancing. Memory is moved automatically to nodes
-that access it often.
-
-Enables/disables automatic NUMA memory balancing. On NUMA machines, there
-is a performance penalty if remote memory is accessed by a CPU. When this
-feature is enabled the kernel samples what task thread is accessing memory
-by periodically unmapping pages and later trapping a page fault. At the
-time of the page fault, it is determined if the data being accessed should
-be migrated to a local memory node.
-
-The unmapping of pages and trapping faults incur additional overhead that
-ideally is offset by improved memory locality but there is no universal
-guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls.
-
-==============================================================
-
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
-numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-
-Automatic NUMA balancing scans tasks address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running.  Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
-
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases.  The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases.  The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
-
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a tasks memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
-
-numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the maximum scanning
-rate for each task.
-
-numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
-when it initially forks.
-
-numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the minimum scanning
-rate for each task.
-
-numa_balancing_scan_size_mb is how many megabytes worth of pages are
-scanned for a given scan.
-
-==============================================================
-
-osrelease, ostype & version:
-
-# cat osrelease
-2.1.88
-# cat ostype
-Linux
-# cat version
-#5 Wed Feb 25 21:49:24 MET 1998
-
-The files osrelease and ostype should be clear enough. Version
-needs a little more clarification however. The '#5' means that
-this is the fifth kernel built from this source base and the
-date behind it indicates the time the kernel was built.
-The only way to tune these values is to rebuild the kernel :-)
-
-==============================================================
-
-overflowgid & overflowuid:
-
-if your architecture did not always support 32-bit UIDs (i.e. arm,
-i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
-applications that use the old 16-bit UID/GID system calls, if the
-actual UID or GID would exceed 65535.
-
-These sysctls allow you to change the value of the fixed UID and GID.
-The default is 65534.
-
-==============================================================
-
-panic:
-
-The value in this file represents the number of seconds the kernel
-waits before rebooting on a panic. When you use the software watchdog,
-the recommended setting is 60.
-
-==============================================================
-
-panic_on_io_nmi:
-
-Controls the kernel's behavior when a CPU receives an NMI caused by
-an IO error.
-
-0: try to continue operation (default)
-
-1: panic immediately. The IO error triggered an NMI. This indicates a
-   serious system condition which could result in IO data corruption.
-   Rather than continuing, panicking might be a better choice. Some
-   servers issue this sort of NMI when the dump button is pushed,
-   and you can use this option to take a crash dump.
-
-==============================================================
-
-panic_on_oops:
-
-Controls the kernel's behaviour when an oops or BUG is encountered.
-
-0: try to continue operation
-
-1: panic immediately.  If the `panic' sysctl is also non-zero then the
-   machine will be rebooted.
-
-==============================================================
-
-panic_on_stackoverflow:
-
-Controls the kernel's behavior when detecting the overflows of
-kernel, IRQ and exception stacks except a user stack.
-This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
-
-0: try to continue operation.
-
-1: panic immediately.
-
-==============================================================
-
-panic_on_unrecovered_nmi:
-
-The default Linux behaviour on an NMI of either memory or unknown is
-to continue operation. For many environments such as scientific
-computing it is preferable that the box is taken out and the error
-dealt with than an uncorrected parity/ECC error get propagated.
-
-A small number of systems do generate NMI's for bizarre random reasons
-such as power management so the default is off. That sysctl works like
-the existing panic controls already in that directory.
-
-==============================================================
-
-panic_on_warn:
-
-Calls panic() in the WARN() path when set to 1.  This is useful to avoid
-a kernel rebuild when attempting to kdump at the location of a WARN().
-
-0: only WARN(), default behaviour.
-
-1: call panic() after printing out WARN() location.
-
-==============================================================
-
-panic_print:
-
-Bitmask for printing system info when panic happens. User can chose
-combination of the following bits:
-
-bit 0: print all tasks info
-bit 1: print system memory info
-bit 2: print timer info
-bit 3: print locks info if CONFIG_LOCKDEP is on
-bit 4: print ftrace buffer
-
-So for example to print tasks and memory info on panic, user can:
-  echo 3 > /proc/sys/kernel/panic_print
-
-==============================================================
-
-panic_on_rcu_stall:
-
-When set to 1, calls panic() after RCU stall detection messages. This
-is useful to define the root cause of RCU stalls using a vmcore.
-
-0: do not panic() when RCU stall takes place, default behavior.
-
-1: panic() after printing RCU stall messages.
-
-==============================================================
-
-perf_cpu_time_max_percent:
-
-Hints to the kernel how much CPU time it should be allowed to
-use to handle perf sampling events.  If the perf subsystem
-is informed that its samples are exceeding this limit, it
-will drop its sampling frequency to attempt to reduce its CPU
-usage.
-
-Some perf sampling happens in NMIs.  If these samples
-unexpectedly take too long to execute, the NMIs can become
-stacked up next to each other so much that nothing else is
-allowed to execute.
-
-0: disable the mechanism.  Do not monitor or correct perf's
-   sampling rate no matter how CPU time it takes.
-
-1-100: attempt to throttle perf's sample rate to this
-   percentage of CPU.  Note: the kernel calculates an
-   "expected" length of each sample event.  100 here means
-   100% of that expected length.  Even if this is set to
-   100, you may still see sample throttling if this
-   length is exceeded.  Set to 0 if you truly do not care
-   how much CPU is consumed.
-
-==============================================================
-
-perf_event_paranoid:
-
-Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN).  The default value is 2.
-
- -1: Allow use of (almost) all events by all users
-     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
->=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
-     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
->=1: Disallow CPU event access by users without CAP_SYS_ADMIN
->=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
-
-==============================================================
-
-perf_event_max_stack:
-
-Controls maximum number of stack frames to copy for (attr.sample_type &
-PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
-'perf record -g' or 'perf trace --call-graph fp'.
-
-This can only be done when no events are in use that have callchains
-enabled, otherwise writing to this file will return -EBUSY.
-
-The default value is 127.
-
-==============================================================
-
-perf_event_mlock_kb:
-
-Control size of per-cpu ring buffer not counted agains mlock limit.
-
-The default value is 512 + 1 page
-
-==============================================================
-
-perf_event_max_contexts_per_stack:
-
-Controls maximum number of stack frame context entries for
-(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
-instance, when using 'perf record -g' or 'perf trace --call-graph fp'.
-
-This can only be done when no events are in use that have callchains
-enabled, otherwise writing to this file will return -EBUSY.
-
-The default value is 8.
-
-==============================================================
-
-pid_max:
-
-PID allocation wrap value.  When the kernel's next PID value
-reaches this value, it wraps back to a minimum PID value.
-PIDs of value pid_max or larger are not allocated.
-
-==============================================================
-
-ns_last_pid:
-
-The last pid allocated in the current (the one task using this sysctl
-lives in) pid namespace. When selecting a pid for a next task on fork
-kernel tries to allocate a number starting from this one.
-
-==============================================================
-
-powersave-nap: (PPC only)
-
-If set, Linux-PPC will use the 'nap' mode of powersaving,
-otherwise the 'doze' mode will be used.
-
-==============================================================
-
-printk:
-
-The four values in printk denote: console_loglevel,
-default_message_loglevel, minimum_console_loglevel and
-default_console_loglevel respectively.
-
-These values influence printk() behavior when printing or
-logging error messages. See 'man 2 syslog' for more info on
-the different loglevels.
-
-- console_loglevel: messages with a higher priority than
-  this will be printed to the console
-- default_message_loglevel: messages without an explicit priority
-  will be printed with this priority
-- minimum_console_loglevel: minimum (highest) value to which
-  console_loglevel can be set
-- default_console_loglevel: default value for console_loglevel
-
-==============================================================
-
-printk_delay:
-
-Delay each printk message in printk_delay milliseconds
-
-Value from 0 - 10000 is allowed.
-
-==============================================================
-
-printk_ratelimit:
-
-Some warning messages are rate limited. printk_ratelimit specifies
-the minimum length of time between these messages (in jiffies), by
-default we allow one every 5 seconds.
-
-A value of 0 will disable rate limiting.
-
-==============================================================
-
-printk_ratelimit_burst:
-
-While long term we enforce one message per printk_ratelimit
-seconds, we do allow a burst of messages to pass through.
-printk_ratelimit_burst specifies the number of messages we can
-send before ratelimiting kicks in.
-
-==============================================================
-
-printk_devkmsg:
-
-Control the logging to /dev/kmsg from userspace:
-
-ratelimit: default, ratelimited
-on: unlimited logging to /dev/kmsg from userspace
-off: logging to /dev/kmsg disabled
-
-The kernel command line parameter printk.devkmsg= overrides this and is
-a one-time setting until next reboot: once set, it cannot be changed by
-this sysctl interface anymore.
-
-==============================================================
-
-randomize_va_space:
-
-This option can be used to select the type of process address
-space randomization that is used in the system, for architectures
-that support this feature.
-
-0 - Turn the process address space randomization off.  This is the
-    default for architectures that do not support this feature anyways,
-    and kernels that are booted with the "norandmaps" parameter.
-
-1 - Make the addresses of mmap base, stack and VDSO page randomized.
-    This, among other things, implies that shared libraries will be
-    loaded to random addresses.  Also for PIE-linked binaries, the
-    location of code start is randomized.  This is the default if the
-    CONFIG_COMPAT_BRK option is enabled.
-
-2 - Additionally enable heap randomization.  This is the default if
-    CONFIG_COMPAT_BRK is disabled.
-
-    There are a few legacy applications out there (such as some ancient
-    versions of libc.so.5 from 1996) that assume that brk area starts
-    just after the end of the code+bss.  These applications break when
-    start of the brk area is randomized.  There are however no known
-    non-legacy applications that would be broken this way, so for most
-    systems it is safe to choose full randomization.
-
-    Systems with ancient and/or broken binaries should be configured
-    with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
-    address space randomization.
-
-==============================================================
-
-reboot-cmd: (Sparc only)
-
-??? This seems to be a way to give an argument to the Sparc
-ROM/Flash boot loader. Maybe to tell it what to do after
-rebooting. ???
-
-==============================================================
-
-rtsig-max & rtsig-nr:
-
-The file rtsig-max can be used to tune the maximum number
-of POSIX realtime (queued) signals that can be outstanding
-in the system.
-
-rtsig-nr shows the number of RT signals currently queued.
-
-==============================================================
-
-sched_energy_aware:
-
-Enables/disables Energy Aware Scheduling (EAS). EAS starts
-automatically on platforms where it can run (that is,
-platforms with asymmetric CPU topologies and having an Energy
-Model available). If your platform happens to meet the
-requirements for EAS but you do not want to use it, change
-this value to 0.
-
-==============================================================
-
-sched_schedstats:
-
-Enables/disables scheduler statistics. Enabling this feature
-incurs a small amount of overhead in the scheduler but is
-useful for debugging and performance tuning.
-
-==============================================================
-
-sg-big-buff:
-
-This file shows the size of the generic SCSI (sg) buffer.
-You can't tune it just yet, but you could change it on
-compile time by editing include/scsi/sg.h and changing
-the value of SG_BIG_BUFF.
-
-There shouldn't be any reason to change this value. If
-you can come up with one, you probably know what you
-are doing anyway :)
-
-==============================================================
-
-shmall:
-
-This parameter sets the total amount of shared memory pages that
-can be used system wide. Hence, SHMALL should always be at least
-ceil(shmmax/PAGE_SIZE).
-
-If you are not sure what the default PAGE_SIZE is on your Linux
-system, you can run the following command:
-
-# getconf PAGE_SIZE
-
-==============================================================
-
-shmmax:
-
-This value can be used to query and set the run time limit
-on the maximum shared memory segment size that can be created.
-Shared memory segments up to 1Gb are now supported in the
-kernel.  This value defaults to SHMMAX.
-
-==============================================================
-
-shm_rmid_forced:
-
-Linux lets you set resource limits, including how much memory one
-process can consume, via setrlimit(2).  Unfortunately, shared memory
-segments are allowed to exist without association with any process, and
-thus might not be counted against any resource limits.  If enabled,
-shared memory segments are automatically destroyed when their attach
-count becomes zero after a detach or a process termination.  It will
-also destroy segments that were created, but never attached to, on exit
-from the process.  The only use left for IPC_RMID is to immediately
-destroy an unattached segment.  Of course, this breaks the way things are
-defined, so some applications might stop working.  Note that this
-feature will do you no good unless you also configure your resource
-limits (in particular, RLIMIT_AS and RLIMIT_NPROC).  Most systems don't
-need this.
-
-Note that if you change this from 0 to 1, already created segments
-without users and with a dead originative process will be destroyed.
-
-==============================================================
-
-sysctl_writes_strict:
-
-Control how file position affects the behavior of updating sysctl values
-via the /proc/sys interface:
-
-  -1 - Legacy per-write sysctl value handling, with no printk warnings.
-       Each write syscall must fully contain the sysctl value to be
-       written, and multiple writes on the same sysctl file descriptor
-       will rewrite the sysctl value, regardless of file position.
-   0 - Same behavior as above, but warn about processes that perform writes
-       to a sysctl file descriptor when the file position is not 0.
-   1 - (default) Respect file position when writing sysctl strings. Multiple
-       writes will append to the sysctl value buffer. Anything past the max
-       length of the sysctl value buffer will be ignored. Writes to numeric
-       sysctl entries must always be at file position 0 and the value must
-       be fully contained in the buffer sent in the write syscall.
-
-==============================================================
-
-softlockup_all_cpu_backtrace:
-
-This value controls the soft lockup detector thread's behavior
-when a soft lockup condition is detected as to whether or not
-to gather further debug information. If enabled, each cpu will
-be issued an NMI and instructed to capture stack trace.
-
-This feature is only applicable for architectures which support
-NMI.
-
-0: do nothing. This is the default behavior.
-
-1: on detection capture more debug information.
-
-==============================================================
-
-soft_watchdog
-
-This parameter can be used to control the soft lockup detector.
-
-   0 - disable the soft lockup detector
-   1 - enable the soft lockup detector
-
-The soft lockup detector monitors CPUs for threads that are hogging the CPUs
-without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
-from running. The mechanism depends on the CPUs ability to respond to timer
-interrupts which are needed for the 'watchdog/N' threads to be woken up by
-the watchdog timer function, otherwise the NMI watchdog - if enabled - can
-detect a hard lockup condition.
-
-==============================================================
-
-stack_erasing
-
-This parameter can be used to control kernel stack erasing at the end
-of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.
-
-That erasing reduces the information which kernel stack leak bugs
-can reveal and blocks some uninitialized stack variable attacks.
-The tradeoff is the performance impact: on a single CPU system kernel
-compilation sees a 1% slowdown, other systems and workloads may vary.
-
-  0: kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
-
-  1: kernel stack erasing is enabled (default), it is performed before
-     returning to the userspace at the end of syscalls.
-==============================================================
-
-tainted
-
-Non-zero if the kernel has been tainted. Numeric values, which can be
-ORed together. The letters are seen in "Tainted" line of Oops reports.
-
-     1 (P): proprietary module was loaded
-     2 (F): module was force loaded
-     4 (S): SMP kernel oops on an officially SMP incapable processor
-     8 (R): module was force unloaded
-    16 (M): processor reported a Machine Check Exception (MCE)
-    32 (B): bad page referenced or some unexpected page flags
-    64 (U): taint requested by userspace application
-   128 (D): kernel died recently, i.e. there was an OOPS or BUG
-   256 (A): an ACPI table was overridden by user
-   512 (W): kernel issued warning
-  1024 (C): staging driver was loaded
-  2048 (I): workaround for bug in platform firmware applied
-  4096 (O): externally-built ("out-of-tree") module was loaded
-  8192 (E): unsigned module was loaded
- 16384 (L): soft lockup occurred
- 32768 (K): kernel has been live patched
- 65536 (X): Auxiliary taint, defined and used by for distros
-131072 (T): The kernel was built with the struct randomization plugin
-
-See Documentation/admin-guide/tainted-kernels.rst for more information.
-
-==============================================================
-
-threads-max
-
-This value controls the maximum number of threads that can be created
-using fork().
-
-During initialization the kernel sets this value such that even if the
-maximum number of threads is created, the thread structures occupy only
-a part (1/8th) of the available RAM pages.
-
-The minimum value that can be written to threads-max is 20.
-The maximum value that can be written to threads-max is given by the
-constant FUTEX_TID_MASK (0x3fffffff).
-If a value outside of this range is written to threads-max an error
-EINVAL occurs.
-
-The value written is checked against the available RAM pages. If the
-thread structures would occupy too much (more than 1/8th) of the
-available RAM pages threads-max is reduced accordingly.
-
-==============================================================
-
-unknown_nmi_panic:
-
-The value in this file affects behavior of handling NMI. When the
-value is non-zero, unknown NMI is trapped and then panic occurs. At
-that time, kernel debugging information is displayed on console.
-
-NMI switch that most IA32 servers have fires unknown NMI up, for
-example.  If a system hangs up, try pressing the NMI switch.
-
-==============================================================
-
-watchdog:
-
-This parameter can be used to disable or enable the soft lockup detector
-_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
-
-   0 - disable both lockup detectors
-   1 - enable both lockup detectors
-
-The soft lockup detector and the NMI watchdog can also be disabled or
-enabled individually, using the soft_watchdog and nmi_watchdog parameters.
-If the watchdog parameter is read, for example by executing
-
-   cat /proc/sys/kernel/watchdog
-
-the output of this command (0 or 1) shows the logical OR of soft_watchdog
-and nmi_watchdog.
-
-==============================================================
-
-watchdog_cpumask:
-
-This value can be used to control on which cpus the watchdog may run.
-The default cpumask is all possible cores, but if NO_HZ_FULL is
-enabled in the kernel config, and cores are specified with the
-nohz_full= boot argument, those cores are excluded by default.
-Offline cores can be included in this mask, and if the core is later
-brought online, the watchdog will be started based on the mask value.
-
-Typically this value would only be touched in the nohz_full case
-to re-enable cores that by default were not running the watchdog,
-if a kernel lockup was suspected on those cores.
-
-The argument value is the standard cpulist format for cpumasks,
-so for example to enable the watchdog on cores 0, 2, 3, and 4 you
-might say:
-
-  echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
-
-==============================================================
-
-watchdog_thresh:
-
-This value can be used to control the frequency of hrtimer and NMI
-events and the soft and hard lockup thresholds. The default threshold
-is 10 seconds.
-
-The softlockup threshold is (2 * watchdog_thresh). Setting this
-tunable to zero will disable lockup detection altogether.
-
-==============================================================
diff --git a/Documentation/sysctl/net.rst b/Documentation/sysctl/net.rst
new file mode 100644
index 000000000000..a7d44e71019d
--- /dev/null
+++ b/Documentation/sysctl/net.rst
@@ -0,0 +1,461 @@
+================================
+Documentation for /proc/sys/net/
+================================
+
+Copyright
+
+Copyright (c) 1999
+
+	- Terrehon Bowden <terrehon@pacbell.net>
+	- Bodo Bauer <bb@ricochet.net>
+
+Copyright (c) 2000
+
+	- Jorge Nerin <comandante@zaralinux.com>
+
+Copyright (c) 2009
+
+	- Shen Feng <shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/net.
+
+The interface to the networking parts of the kernel is located in
+/proc/sys/net. The following table shows all possible subdirectories. You may
+see only some of them, depending on your kernel's configuration.
+
+
+Table : Subdirectories in /proc/sys/net
+
+ ========= =================== = ========== ==================
+ Directory Content               Directory  Content
+ ========= =================== = ========== ==================
+ core      General parameter     appletalk  Appletalk protocol
+ unix      Unix domain sockets   netrom     NET/ROM
+ 802       E802 protocol         ax25       AX25
+ ethernet  Ethernet protocol     rose       X.25 PLP layer
+ ipv4      IP version 4          x25        X.25 protocol
+ ipx       IPX                   token-ring IBM token ring
+ bridge    Bridging              decnet     DEC net
+ ipv6      IP version 6          tipc       TIPC
+ ========= =================== = ========== ==================
+
+1. /proc/sys/net/core - Network core options
+============================================
+
+bpf_jit_enable
+--------------
+
+This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
+and efficient infrastructure for executing bytecode at various
+hook points. It is used in a number of Linux kernel subsystems such
+as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints)
+and security (e.g. seccomp). LLVM has a BPF back end that can compile
+restricted C into a sequence of BPF instructions. After program load
+through bpf(2) and passing a verifier in the kernel, a JIT will then
+translate these BPF proglets into native CPU instructions. There are
+two flavors of JITs, the newer eBPF JIT currently supported on:
+
+  - x86_64
+  - x86_32
+  - arm64
+  - arm32
+  - ppc64
+  - sparc64
+  - mips64
+  - s390x
+  - riscv
+
+And the older cBPF JIT supported on the following archs:
+
+  - mips
+  - ppc
+  - sparc
+
+eBPF JITs are a superset of cBPF JITs, meaning the kernel will
+migrate cBPF instructions into eBPF instructions and then JIT
+compile them transparently. Older cBPF JITs can only translate
+tcpdump filters, seccomp rules, etc., but not the aforementioned eBPF
+programs loaded through bpf(2).
+
+Values:
+
+	- 0 - disable the JIT (default value)
+	- 1 - enable the JIT
+	- 2 - enable the JIT and ask the compiler to emit traces to the kernel log.
+
+bpf_jit_harden
+--------------
+
+This enables hardening for the BPF JIT compiler. eBPF JIT backends
+are supported. Enabling hardening costs some performance, but can
+mitigate JIT spraying.
+
+Values:
+
+	- 0 - disable JIT hardening (default value)
+	- 1 - enable JIT hardening for unprivileged users only
+	- 2 - enable JIT hardening for all users
+
+bpf_jit_kallsyms
+----------------
+
+When the BPF JIT compiler is enabled, compiled images live at addresses
+unknown to the kernel, meaning they neither show up in traces nor
+in /proc/kallsyms. This enables export of these addresses, which can
+be used for debugging/tracing. If bpf_jit_harden is enabled, this
+feature is disabled.
+
+Values:
+
+	- 0 - disable JIT kallsyms export (default value)
+	- 1 - enable JIT kallsyms export for privileged users only
+
+bpf_jit_limit
+-------------
+
+This enforces a global limit for memory allocations to the BPF JIT
+compiler in order to reject unprivileged JIT requests once it has
+been surpassed. bpf_jit_limit contains the value of the global limit
+in bytes.
+
+dev_weight
+----------
+
+The maximum number of packets that the kernel can handle on a NAPI interrupt.
+It is a per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
+aggregated packet is counted as one packet in this context.
+
+Default: 64
+
+dev_weight_rx_bias
+------------------
+
+RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
+of the driver for the per softirq cycle netdev_budget. This parameter influences
+the proportion of the configured netdev_budget that is spent on RPS based packet
+processing during RX softirq cycles. It is further meant for making current
+dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
+(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
+on dev_weight and is calculated multiplicatively (dev_weight * dev_weight_rx_bias).
+
+Default: 1
+
+dev_weight_tx_bias
+------------------
+
+Scales the maximum number of packets that can be processed during a TX softirq cycle.
+Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
+net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
+
+Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
+
+Default: 1
+
+default_qdisc
+-------------
+
+The default queuing discipline to use for network devices. This allows
+overriding the default of pfifo_fast with an alternative. Since the default
+queuing discipline is created without additional parameters, it is best suited
+to queuing disciplines that work well without configuration like stochastic
+fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
+queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
+which require setting up classes and bandwidths. Note that physical multiqueue
+interfaces still use mq as root qdisc, which in turn uses this default for its
+leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead
+default to noqueue.
+
+Default: pfifo_fast
+
+busy_read
+---------
+
+Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
+Approximate time in us to busy loop waiting for packets on the device queue.
+This sets the default value of the SO_BUSY_POLL socket option.
+Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
+which is the preferred method of enabling. If you need to enable the feature
+globally via sysctl, a value of 50 is recommended.
+
+Will increase power usage.
+
+Default: 0 (off)
+
+busy_poll
+---------
+Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
+Approximate time in us to busy loop waiting for events.
+Recommended value depends on the number of sockets you poll on.
+For several sockets 50, for several hundreds 100.
+For more than that you probably want to use epoll.
+Note that only sockets with SO_BUSY_POLL set will be busy polled,
+so you want to either selectively set SO_BUSY_POLL on those sockets or set
+the net.core.busy_read sysctl globally.
+
+Will increase power usage.
+
+Default: 0 (off)
+
+rmem_default
+------------
+
+The default setting of the socket receive buffer in bytes.
+
+rmem_max
+--------
+
+The maximum receive socket buffer size in bytes.
+
+tstamp_allow_data
+-----------------
+Allow processes to receive tx timestamps looped together with the original
+packet contents. If disabled, transmit timestamp requests from unprivileged
+processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
+
+Default: 1 (on)
+
+
+wmem_default
+------------
+
+The default setting (in bytes) of the socket send buffer.
+
+wmem_max
+--------
+
+The maximum send socket buffer size in bytes.
+
+message_burst and message_cost
+------------------------------
+
+These parameters are used to limit the warning messages written to the kernel
+log from the networking code. They enforce a rate limit to make a
+denial-of-service attack impossible. A higher message_cost factor results in
+fewer messages being written. message_burst controls when messages will
+be dropped. The default settings limit warning messages to one every five
+seconds.
+
+warnings
+--------
+
+This sysctl is now unused.
+
+This was used to control console messages from the networking stack that
+occur because of problems on the network like duplicate address or bad
+checksums.
+
+These messages are now emitted at KERN_DEBUG and can generally be enabled
+and controlled by the dynamic_debug facility.
+
+netdev_budget
+-------------
+
+Maximum number of packets taken from all interfaces in one polling cycle (NAPI
+poll). In one polling cycle interfaces which are registered to polling are
+probed in a round-robin manner. Also, a polling cycle may not exceed
+netdev_budget_usecs microseconds, even if netdev_budget has not been
+exhausted.
+
+netdev_budget_usecs
+-------------------
+
+Maximum number of microseconds in one NAPI polling cycle. Polling
+will exit when either netdev_budget_usecs have elapsed during the
+poll cycle or the number of packets processed reaches netdev_budget.
+
+netdev_max_backlog
+------------------
+
+Maximum number of packets queued on the INPUT side when the interface
+receives packets faster than the kernel can process them.
+
+netdev_rss_key
+--------------
+
+RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
+randomly generated.
+Some user space might need to gather its content even if drivers do not
+provide ethtool -x support yet.
+
+::
+
+  myhost:~# cat /proc/sys/net/core/netdev_rss_key
+  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
+
+The file contains NUL bytes if no driver has ever called netdev_rss_key_fill().
+
+Note:
+  /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
+  but most drivers only use 40 bytes of it.
+
+::
+
+  myhost:~# ethtool -x eth0
+  RX flow hash indirection table for eth0 with 8 RX ring(s):
+      0:    0     1     2     3     4     5     6     7
+  RSS hash key:
+  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
+
+netdev_tstamp_prequeue
+----------------------
+
+If set to 0, RX packet timestamps can be sampled after RPS processing, when
+the target CPU processes packets. This may delay the timestamps somewhat, but
+allows the load to be distributed across several CPUs.
+
+If set to 1 (default), timestamps are sampled as soon as possible, before
+queueing.
+
+optmem_max
+----------
+
+Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
+of struct cmsghdr structures with appended data.
+
+fb_tunnels_only_for_init_net
+----------------------------
+
+Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
+sit0, ip6tnl0, ip6gre0) are automatically created when a new
+network namespace is created, if the corresponding tunnel is present
+in the initial network namespace.
+If set to 1, these devices are not automatically created, and
+user space is responsible for creating them if needed.
+
+Default: 0 (for compatibility reasons)
+
+devconf_inherit_init_net
+------------------------
+
+Controls if a new network namespace should inherit all current
+settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
+default, we keep the current behavior: for IPv4 we inherit all current
+settings from init_net and for IPv6 we reset all settings to default.
+
+If set to 1, both IPv4 and IPv6 settings are forced to inherit from
+current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
+forced to reset to their default values.
+
+Default: 0 (for compatibility reasons)
+
+2. /proc/sys/net/unix - Parameters for Unix domain sockets
+==========================================================
+
+There is only one file in this directory.
+unix_dgram_qlen limits the max number of datagrams queued in a Unix domain
+socket's buffer. It will not take effect unless the PF_UNIX flag is specified.
+
+
+3. /proc/sys/net/ipv4 - IPv4 settings
+=====================================
+Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
+descriptions of these entries.
+
+
+4. Appletalk
+============
+
+The /proc/sys/net/appletalk directory holds the Appletalk configuration data
+when Appletalk is loaded. The configurable parameters are:
+
+aarp-expiry-time
+----------------
+
+The amount of time we keep an ARP entry before expiring it. Used to age out
+old hosts.
+
+aarp-resolve-time
+-----------------
+
+The amount of time we will spend trying to resolve an Appletalk address.
+
+aarp-retransmit-limit
+---------------------
+
+The number of times we will retransmit a query before giving up.
+
+aarp-tick-time
+--------------
+
+Controls the rate at which expires are checked.
+
+The directory /proc/net/appletalk holds the list of active Appletalk sockets
+on a machine.
+
+The fields indicate the DDP type, the local address (in network:node format),
+the remote address, the size of the transmit pending queue, the size of the
+received queue (bytes waiting for applications to read), the state and the uid
+owning the socket.
+
+/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
+shows the name of the interface, its Appletalk address, the network range on
+that address (or network number for phase 1 networks), and the status of the
+interface.
+
+/proc/net/atalk_route lists each known network route. It lists the target
+(network) that the route leads to, the router (may be directly connected), the
+route flags, and the device the route is using.
+
+
+5. IPX
+------
+
+The IPX protocol has no tunable values in /proc/sys/net.
+
+The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX
+socket, giving the local and remote addresses in Novell format (that is,
+network:node:port). In accordance with the strange Novell tradition,
+everything but the port is in hex. Not_Connected is displayed for sockets that
+are not tied to a specific remote address. The Tx and Rx queue sizes indicate
+the number of bytes pending for transmission and reception. The state
+indicates the state the socket is in and the uid is the owning uid of the
+socket.
+
+The /proc/net/ipx_interface file lists all IPX interfaces. For each interface
+it gives the network number, the node number, and indicates if the network is
+the primary network. It also indicates which device it is bound to (or
+Internal for internal networks) and the Frame Type if appropriate. Linux
+supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for
+IPX.
+
+The /proc/net/ipx_route table holds a list of IPX routes. For each route it
+gives the destination network, the router node (or Directly) and the network
+address of the router (or Connected) for internal networks.
+
+6. TIPC
+-------
+
+tipc_rmem
+---------
+
+The TIPC protocol now has a tunable for the receive memory, similar to
+tcp_rmem, i.e. a vector of 3 INTEGERs: (min, default, max)
+
+::
+
+    # cat /proc/sys/net/tipc/tipc_rmem
+    4252725 34021800        68043600
+    #
+
+The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
+are scaled (shifted) versions of that same value.  Note that the min value
+is not at this point in time used in any meaningful way, but the triplet is
+preserved in order to be consistent with things like tcp_rmem.
+
+named_timeout
+-------------
+
+TIPC name table updates are distributed asynchronously in a cluster, without
+any form of transaction handling. This means that different race scenarios are
+possible. One such is that a name withdrawal sent out by one node and received
+by another node may arrive after a second, overlapping name publication already
+has been accepted from a third node, although the conflicting updates
+originally may have been issued in the correct sequential order.
+If named_timeout is nonzero, failed topology updates will be placed on a defer
+queue until another event arrives that clears the error, or until the timeout
+expires. Value is in milliseconds.
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
deleted file mode 100644
index 2ae91d3873bb..000000000000
--- a/Documentation/sysctl/net.txt
+++ /dev/null
@@ -1,422 +0,0 @@
-Documentation for /proc/sys/net/*
-	(c) 1999		Terrehon Bowden <terrehon@pacbell.net>
-				Bodo Bauer <bb@ricochet.net>
-	(c) 2000		Jorge Nerin <comandante@zaralinux.com>
-	(c) 2009		Shen Feng <shen@cn.fujitsu.com>
-
-For general info and legal blurb, please look in README.
-
-==============================================================
-
-This file contains the documentation for the sysctl files in
-/proc/sys/net
-
-The interface  to  the  networking  parts  of  the  kernel  is  located  in
-/proc/sys/net. The following table shows all possible subdirectories.  You may
-see only some of them, depending on your kernel's configuration.
-
-
-Table : Subdirectories in /proc/sys/net
-..............................................................................
- Directory Content             Directory  Content
- core      General parameter   appletalk  Appletalk protocol
- unix      Unix domain sockets netrom     NET/ROM
- 802       E802 protocol       ax25       AX25
- ethernet  Ethernet protocol   rose       X.25 PLP layer
- ipv4      IP version 4        x25        X.25 protocol
- ipx       IPX                 token-ring IBM token ring
- bridge    Bridging            decnet     DEC net
- ipv6      IP version 6        tipc       TIPC
-..............................................................................
-
-1. /proc/sys/net/core - Network core options
--------------------------------------------------------
-
-bpf_jit_enable
---------------
-
-This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
-and efficient infrastructure allowing to execute bytecode at various
-hook points. It is used in a number of Linux kernel subsystems such
-as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints)
-and security (e.g. seccomp). LLVM has a BPF back end that can compile
-restricted C into a sequence of BPF instructions. After program load
-through bpf(2) and passing a verifier in the kernel, a JIT will then
-translate these BPF proglets into native CPU instructions. There are
-two flavors of JITs, the newer eBPF JIT currently supported on:
-  - x86_64
-  - x86_32
-  - arm64
-  - arm32
-  - ppc64
-  - sparc64
-  - mips64
-  - s390x
-  - riscv
-
-And the older cBPF JIT supported on the following archs:
-  - mips
-  - ppc
-  - sparc
-
-eBPF JITs are a superset of cBPF JITs, meaning the kernel will
-migrate cBPF instructions into eBPF instructions and then JIT
-compile them transparently. Older cBPF JITs can only translate
-tcpdump filters, seccomp rules, etc, but not mentioned eBPF
-programs loaded through bpf(2).
-
-Values :
-	0 - disable the JIT (default value)
-	1 - enable the JIT
-	2 - enable the JIT and ask the compiler to emit traces on kernel log.
-
-bpf_jit_harden
---------------
-
-This enables hardening for the BPF JIT compiler. Supported are eBPF
-JIT backends. Enabling hardening trades off performance, but can
-mitigate JIT spraying.
-Values :
-	0 - disable JIT hardening (default value)
-	1 - enable JIT hardening for unprivileged users only
-	2 - enable JIT hardening for all users
-
-bpf_jit_kallsyms
-----------------
-
-When BPF JIT compiler is enabled, then compiled images are unknown
-addresses to the kernel, meaning they neither show up in traces nor
-in /proc/kallsyms. This enables export of these addresses, which can
-be used for debugging/tracing. If bpf_jit_harden is enabled, this
-feature is disabled.
-Values :
-	0 - disable JIT kallsyms export (default value)
-	1 - enable JIT kallsyms export for privileged users only
-
-bpf_jit_limit
--------------
-
-This enforces a global limit for memory allocations to the BPF JIT
-compiler in order to reject unprivileged JIT requests once it has
-been surpassed. bpf_jit_limit contains the value of the global limit
-in bytes.
-
-dev_weight
---------------
-
-The maximum number of packets that kernel can handle on a NAPI interrupt,
-it's a Per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
-aggregated packet is counted as one packet in this context.
-
-Default: 64
-
-dev_weight_rx_bias
---------------
-
-RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
-of the driver for the per softirq cycle netdev_budget. This parameter influences
-the proportion of the configured netdev_budget that is spent on RPS based packet
-processing during RX softirq cycles. It is further meant for making current
-dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
-(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
-on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias).
-Default: 1
-
-dev_weight_tx_bias
---------------
-
-Scales the maximum number of packets that can be processed during a TX softirq cycle.
-Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
-net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
-Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
-Default: 1
-
-default_qdisc
---------------
-
-The default queuing discipline to use for network devices. This allows
-overriding the default of pfifo_fast with an alternative. Since the default
-queuing discipline is created without additional parameters so is best suited
-to queuing disciplines that work well without configuration like stochastic
-fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
-queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
-which require setting up classes and bandwidths. Note that physical multiqueue
-interfaces still use mq as root qdisc, which in turn uses this default for its
-leaves. Virtual devices (like e.g. lo or veth) ignore this setting and instead
-default to noqueue.
-Default: pfifo_fast
-
-busy_read
-----------------
-Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
-Approximate time in us to busy loop waiting for packets on the device queue.
-This sets the default value of the SO_BUSY_POLL socket option.
-Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
-which is the preferred method of enabling. If you need to enable the feature
-globally via sysctl, a value of 50 is recommended.
-Will increase power usage.
-Default: 0 (off)
-
-busy_poll
-----------------
-Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
-Approximate time in us to busy loop waiting for events.
-Recommended value depends on the number of sockets you poll on.
-For several sockets 50, for several hundreds 100.
-For more than that you probably want to use epoll.
-Note that only sockets with SO_BUSY_POLL set will be busy polled,
-so you want to either selectively set SO_BUSY_POLL on those sockets or set
-sysctl.net.busy_read globally.
-Will increase power usage.
-Default: 0 (off)
-
-rmem_default
-------------
-
-The default setting of the socket receive buffer in bytes.
-
-rmem_max
---------
-
-The maximum receive socket buffer size in bytes.
-
-tstamp_allow_data
------------------
-Allow processes to receive tx timestamps looped together with the original
-packet contents. If disabled, transmit timestamp requests from unprivileged
-processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
-Default: 1 (on)
-
-
-wmem_default
-------------
-
-The default setting (in bytes) of the socket send buffer.
-
-wmem_max
---------
-
-The maximum send socket buffer size in bytes.
-
-message_burst and message_cost
-------------------------------
-
-These parameters  are used to limit the warning messages written to the kernel
-log from  the  networking  code.  They  enforce  a  rate  limit  to  make  a
-denial-of-service attack  impossible. A higher message_cost factor, results in
-fewer messages that will be written. Message_burst controls when messages will
-be dropped.  The  default  settings  limit  warning messages to one every five
-seconds.
-
-warnings
---------
-
-This sysctl is now unused.
-
-This was used to control console messages from the networking stack that
-occur because of problems on the network like duplicate address or bad
-checksums.
-
-These messages are now emitted at KERN_DEBUG and can generally be enabled
-and controlled by the dynamic_debug facility.
-
-netdev_budget
--------------
-
-Maximum number of packets taken from all interfaces in one polling cycle (NAPI
-poll). In one polling cycle interfaces which are registered to polling are
-probed in a round-robin manner. Also, a polling cycle may not exceed
-netdev_budget_usecs microseconds, even if netdev_budget has not been
-exhausted.
-
-netdev_budget_usecs
----------------------
-
-Maximum number of microseconds in one NAPI polling cycle. Polling
-will exit when either netdev_budget_usecs have elapsed during the
-poll cycle or the number of packets processed reaches netdev_budget.
-
-netdev_max_backlog
-------------------
-
-Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
-receives packets faster than kernel can process them.
-
-netdev_rss_key
---------------
-
-RSS (Receive Side Scaling) enabled drivers use a 40 bytes host key that is
-randomly generated.
-Some user space might need to gather its content even if drivers do not
-provide ethtool -x support yet.
-
-myhost:~# cat /proc/sys/net/core/netdev_rss_key
-84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
-
-File contains nul bytes if no driver ever called netdev_rss_key_fill() function.
-Note:
-/proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
-but most drivers only use 40 bytes of it.
-
-myhost:~# ethtool -x eth0
-RX flow hash indirection table for eth0 with 8 RX ring(s):
-    0:    0     1     2     3     4     5     6     7
-RSS hash key:
-84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
-
-netdev_tstamp_prequeue
-----------------------
-
-If set to 0, RX packet timestamps can be sampled after RPS processing, when
-the target CPU processes packets. It might give some delay on timestamps, but
-permit to distribute the load on several cpus.
-
-If set to 1 (default), timestamps are sampled as soon as possible, before
-queueing.
-
-optmem_max
-----------
-
-Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
-of struct cmsghdr structures with appended data.
-
-fb_tunnels_only_for_init_net
-----------------------------
-
-Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
-sit0, ip6tnl0, ip6gre0) are automatically created when a new
-network namespace is created, if corresponding tunnel is present
-in initial network namespace.
-If set to 1, these devices are not automatically created, and
-user space is responsible for creating them if needed.
-
-Default : 0  (for compatibility reasons)
-
-devconf_inherit_init_net
-----------------------------
-
-Controls if a new network namespace should inherit all current
-settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
-default, we keep the current behavior: for IPv4 we inherit all current
-settings from init_net and for IPv6 we reset all settings to default.
-
-If set to 1, both IPv4 and IPv6 settings are forced to inherit from
-current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
-forced to reset to their default values.
-
-Default : 0  (for compatibility reasons)
-
-2. /proc/sys/net/unix - Parameters for Unix domain sockets
--------------------------------------------------------
-
-There is only one file in this directory.
-unix_dgram_qlen limits the max number of datagrams queued in Unix domain
-socket's buffer. It will not take effect unless PF_UNIX flag is specified.
-
-
-3. /proc/sys/net/ipv4 - IPV4 settings
--------------------------------------------------------
-Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
-descriptions of these entries.
-
-
-4. Appletalk
--------------------------------------------------------
-
-The /proc/sys/net/appletalk  directory  holds the Appletalk configuration data
-when Appletalk is loaded. The configurable parameters are:
-
-aarp-expiry-time
-----------------
-
-The amount  of  time  we keep an ARP entry before expiring it. Used to age out
-old hosts.
-
-aarp-resolve-time
------------------
-
-The amount of time we will spend trying to resolve an Appletalk address.
-
-aarp-retransmit-limit
----------------------
-
-The number of times we will retransmit a query before giving up.
-
-aarp-tick-time
---------------
-
-Controls the rate at which expires are checked.
-
-The directory  /proc/net/appletalk  holds the list of active Appletalk sockets
-on a machine.
-
-The fields  indicate  the DDP type, the local address (in network:node format)
-the remote  address,  the  size of the transmit pending queue, the size of the
-received queue  (bytes waiting for applications to read) the state and the uid
-owning the socket.
-
-/proc/net/atalk_iface lists  all  the  interfaces  configured for appletalk.It
-shows the  name  of the interface, its Appletalk address, the network range on
-that address  (or  network number for phase 1 networks), and the status of the
-interface.
-
-/proc/net/atalk_route lists  each  known  network  route.  It lists the target
-(network) that the route leads to, the router (may be directly connected), the
-route flags, and the device the route is using.
-
-
-5. IPX
--------------------------------------------------------
-
-The IPX protocol has no tunable values in proc/sys/net.
-
-The IPX  protocol  does,  however,  provide  proc/net/ipx. This lists each IPX
-socket giving  the  local  and  remote  addresses  in  Novell  format (that is
-network:node:port). In  accordance  with  the  strange  Novell  tradition,
-everything but the port is in hex. Not_Connected is displayed for sockets that
-are not  tied to a specific remote address. The Tx and Rx queue sizes indicate
-the number  of  bytes  pending  for  transmission  and  reception.  The  state
-indicates the  state  the  socket  is  in and the uid is the owning uid of the
-socket.
-
-The /proc/net/ipx_interface  file lists all IPX interfaces. For each interface
-it gives  the network number, the node number, and indicates if the network is
-the primary  network.  It  also  indicates  which  device  it  is bound to (or
-Internal for  internal  networks)  and  the  Frame  Type if appropriate. Linux
-supports 802.3,  802.2,  802.2  SNAP  and DIX (Blue Book) ethernet framing for
-IPX.
-
-The /proc/net/ipx_route  table  holds  a list of IPX routes. For each route it
-gives the  destination  network, the router node (or Directly) and the network
-address of the router (or Connected) for internal networks.
-
-6. TIPC
--------------------------------------------------------
-
-tipc_rmem
-----------
-
-The TIPC protocol now has a tunable for the receive memory, similar to the
-tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)
-
-    # cat /proc/sys/net/tipc/tipc_rmem
-    4252725 34021800        68043600
-    #
-
-The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
-are scaled (shifted) versions of that same value.  Note that the min value
-is not at this point in time used in any meaningful way, but the triplet is
-preserved in order to be consistent with things like tcp_rmem.
-
-named_timeout
---------------
-
-TIPC name table updates are distributed asynchronously in a cluster, without
-any form of transaction handling. This means that different race scenarios are
-possible. One such is that a name withdrawal sent out by one node and received
-by another node may arrive after a second, overlapping name publication already
-has been accepted from a third node, although the conflicting updates
-originally may have been issued in the correct sequential order.
-If named_timeout is nonzero, failed topology updates will be placed on a defer
-queue until another event arrives that clears the error, or until the timeout
-expires. Value is in milliseconds.
diff --git a/Documentation/sysctl/sunrpc.rst b/Documentation/sysctl/sunrpc.rst
new file mode 100644
index 000000000000..09780a682afd
--- /dev/null
+++ b/Documentation/sysctl/sunrpc.rst
@@ -0,0 +1,25 @@
+===================================
+Documentation for /proc/sys/sunrpc/
+===================================
+
+kernel version 2.2.10
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/sunrpc and is valid for Linux kernel version 2.2.
+
+The files in this directory can be used to (re)set the debug
+flags of the SUN Remote Procedure Call (RPC) subsystem in
+the Linux kernel. This stuff is used for NFS, KNFSD and
+maybe a few other things as well.
+
+The files in there are used to control the debugging flags:
+rpc_debug, nfs_debug, nfsd_debug and nlm_debug.
+
+These flags are for kernel hackers only. You should read the
+source code in net/sunrpc/ for more information.
diff --git a/Documentation/sysctl/sunrpc.txt b/Documentation/sysctl/sunrpc.txt
deleted file mode 100644
index ae1ecac6f85a..000000000000
--- a/Documentation/sysctl/sunrpc.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-Documentation for /proc/sys/sunrpc/*	kernel version 2.2.10
-	(c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-
-For general info and legal blurb, please look in README.
-
-==============================================================
-
-This file contains the documentation for the sysctl files in
-/proc/sys/sunrpc and is valid for Linux kernel version 2.2.
-
-The files in this directory can be used to (re)set the debug
-flags of the SUN Remote Procedure Call (RPC) subsystem in
-the Linux kernel. This stuff is used for NFS, KNFSD and
-maybe a few other things as well.
-
-The files in there are used to control the debugging flags:
-rpc_debug, nfs_debug, nfsd_debug and nlm_debug.
-
-These flags are for kernel hackers only. You should read the
-source code in net/sunrpc/ for more information.
diff --git a/Documentation/sysctl/user.rst b/Documentation/sysctl/user.rst
new file mode 100644
index 000000000000..650eaa03f15e
--- /dev/null
+++ b/Documentation/sysctl/user.rst
@@ -0,0 +1,78 @@
+=================================
+Documentation for /proc/sys/user/
+=================================
+
+kernel version 4.9.0
+
+Copyright (c) 2016		Eric Biederman <ebiederm@xmission.com>
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/user.
+
+The files in this directory can be used to override the default
+limits on the number of namespaces and other objects that have
+per-user, per-user-namespace limits.
+
+The primary purpose of these limits is to stop programs that
+malfunction and attempt to create a ridiculous number of objects,
+before the malfunction becomes a system wide problem.  It is the
+intention that the defaults of these limits are set high enough that
+no program in normal operation should run into these limits.
+
+The creation of per-user, per-user-namespace objects is charged to
+the user in the user namespace who created the object and
+verified to be below the per-user limit in that user namespace.
+
+The creation of objects is also charged to all of the users who
+created the user namespaces in which the object is created
+(user namespaces can be nested) and verified to be below the per-user
+limits in the user namespaces of those users.
+
+This recursive counting of created objects ensures that creating a
+user namespace does not allow a user to escape their current limits.
+
+Currently, these files are in /proc/sys/user:
+
+max_cgroup_namespaces
+=====================
+
+  The maximum number of cgroup namespaces that any user in the current
+  user namespace may create.
+
+max_ipc_namespaces
+==================
+
+  The maximum number of ipc namespaces that any user in the current
+  user namespace may create.
+
+max_mnt_namespaces
+==================
+
+  The maximum number of mount namespaces that any user in the current
+  user namespace may create.
+
+max_net_namespaces
+==================
+
+  The maximum number of network namespaces that any user in the
+  current user namespace may create.
+
+max_pid_namespaces
+==================
+
+  The maximum number of pid namespaces that any user in the current
+  user namespace may create.
+
+max_user_namespaces
+===================
+
+  The maximum number of user namespaces that any user in the current
+  user namespace may create.
+
+max_uts_namespaces
+==================
+
+  The maximum number of uts namespaces that any user in the current
+  user namespace may create.
diff --git a/Documentation/sysctl/user.txt b/Documentation/sysctl/user.txt
deleted file mode 100644
index a5882865836e..000000000000
--- a/Documentation/sysctl/user.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-Documentation for /proc/sys/user/*	kernel version 4.9.0
-	(c) 2016		Eric Biederman <ebiederm@xmission.com>
-
-==============================================================
-
-This file contains the documentation for the sysctl files in
-/proc/sys/user.
-
-The files in this directory can be used to override the default
-limits on the number of namespaces and other objects that have
-per user per user namespace limits.
-
-The primary purpose of these limits is to stop programs that
-malfunction and attempt to create a ridiculous number of objects,
-before the malfunction becomes a system wide problem.  It is the
-intention that the defaults of these limits are set high enough that
-no program in normal operation should run into these limits.
-
-The creation of per user per user namespace objects are charged to
-the user in the user namespace who created the object and
-verified to be below the per user limit in that user namespace.
-
-The creation of objects is also charged to all of the users
-who created user namespaces the creation of the object happens
-in (user namespaces can be nested) and verified to be below the per user
-limits in the user namespaces of those users.
-
-This recursive counting of created objects ensures that creating a
-user namespace does not allow a user to escape their current limits.
-
-Currently, these files are in /proc/sys/user:
-
-- max_cgroup_namespaces
-
-  The maximum number of cgroup namespaces that any user in the current
-  user namespace may create.
-
-- max_ipc_namespaces
-
-  The maximum number of ipc namespaces that any user in the current
-  user namespace may create.
-
-- max_mnt_namespaces
-
-  The maximum number of mount namespaces that any user in the current
-  user namespace may create.
-
-- max_net_namespaces
-
-  The maximum number of network namespaces that any user in the
-  current user namespace may create.
-
-- max_pid_namespaces
-
-  The maximum number of pid namespaces that any user in the current
-  user namespace may create.
-
-- max_user_namespaces
-
-  The maximum number of user namespaces that any user in the current
-  user namespace may create.
-
-- max_uts_namespaces
-
-  The maximum number of user namespaces that any user in the current
-  user namespace may create.
diff --git a/Documentation/sysctl/vm.rst b/Documentation/sysctl/vm.rst
new file mode 100644
index 000000000000..5aceb5cd5ce7
--- /dev/null
+++ b/Documentation/sysctl/vm.rst
@@ -0,0 +1,964 @@
+===============================
+Documentation for /proc/sys/vm/
+===============================
+
+kernel version 2.6.29
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2008         Peter W. Morreale <pmorreale@novell.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/vm and is valid for Linux kernel version 2.6.29.
+
+The files in this directory can be used to tune the operation
+of the virtual memory (VM) subsystem of the Linux kernel and
+the writeout of dirty data to disk.
+
+Default values and initialization routines for most of these
+files can be found in mm/swap.c.
+
+Currently, these files are in /proc/sys/vm:
+
+- admin_reserve_kbytes
+- block_dump
+- compact_memory
+- compact_unevictable_allowed
+- dirty_background_bytes
+- dirty_background_ratio
+- dirty_bytes
+- dirty_expire_centisecs
+- dirty_ratio
+- dirtytime_expire_seconds
+- dirty_writeback_centisecs
+- drop_caches
+- extfrag_threshold
+- hugetlb_shm_group
+- laptop_mode
+- legacy_va_layout
+- lowmem_reserve_ratio
+- max_map_count
+- memory_failure_early_kill
+- memory_failure_recovery
+- min_free_kbytes
+- min_slab_ratio
+- min_unmapped_ratio
+- mmap_min_addr
+- mmap_rnd_bits
+- mmap_rnd_compat_bits
+- nr_hugepages
+- nr_hugepages_mempolicy
+- nr_overcommit_hugepages
+- nr_trim_pages         (only if CONFIG_MMU=n)
+- numa_zonelist_order
+- oom_dump_tasks
+- oom_kill_allocating_task
+- overcommit_kbytes
+- overcommit_memory
+- overcommit_ratio
+- page-cluster
+- panic_on_oom
+- percpu_pagelist_fraction
+- stat_interval
+- stat_refresh
+- numa_stat
+- swappiness
+- unprivileged_userfaultfd
+- user_reserve_kbytes
+- vfs_cache_pressure
+- watermark_boost_factor
+- watermark_scale_factor
+- zone_reclaim_mode
+
+
+admin_reserve_kbytes
+====================
+
+The amount of free memory in the system that should be reserved for users
+with the capability cap_sys_admin.
+
+admin_reserve_kbytes defaults to min(3% of free pages, 8MB)
+
+That should provide enough for the admin to log in and kill a process,
+if necessary, under the default overcommit 'guess' mode.
+
+Systems running under overcommit 'never' should increase this to account
+for the full Virtual Memory Size of programs used to recover. Otherwise,
+root may not be able to log in to recover the system.
+
+How do you calculate a minimum useful reserve?
+
+sshd or login + bash (or some other shell) + top (or ps, kill, etc.)
+
+For overcommit 'guess', we can sum resident set sizes (RSS).
+On x86_64 this is about 8MB.
+
+For overcommit 'never', we can take the max of their virtual sizes (VSZ)
+and add the sum of their RSS.
+On x86_64 this is about 128MB.
+
+Changing this takes effect whenever an application requests memory.
+
+
+block_dump
+==========
+
+block_dump enables block I/O debugging when set to a nonzero value. More
+information on block I/O debugging is in Documentation/laptops/laptop-mode.rst.
+
+
+compact_memory
+==============
+
+Available only when CONFIG_COMPACTION is set. When 1 is written to the file,
+all zones are compacted such that free memory is available in contiguous
+blocks where possible. This can be important, for example, in the allocation of
+huge pages, although processes will also directly compact memory as required.
+
+
+compact_unevictable_allowed
+===========================
+
+Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
+allowed to examine the unevictable lru (mlocked pages) for pages to compact.
+This should be used on systems where stalls for minor page faults are an
+acceptable trade for large contiguous free memory.  Set to 0 to prevent
+compaction from moving pages that are unevictable.  Default value is 1.
+
+
+dirty_background_bytes
+======================
+
+Contains the amount of dirty memory at which the background kernel
+flusher threads will start writeback.
+
+Note:
+  dirty_background_bytes is the counterpart of dirty_background_ratio. Only
+  one of them may be specified at a time. When one sysctl is written it is
+  immediately taken into account to evaluate the dirty memory limits and the
+  other appears as 0 when read.
+
+
+dirty_background_ratio
+======================
+
+Contains, as a percentage of total available memory that contains free pages
+and reclaimable pages, the number of pages at which the background kernel
+flusher threads will start writing out dirty data.
+
+The total available memory is not equal to total system memory.
+
+
+dirty_bytes
+===========
+
+Contains the amount of dirty memory at which a process generating disk writes
+will itself start writeback.
+
+Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be
+specified at a time. When one sysctl is written it is immediately taken into
+account to evaluate the dirty memory limits and the other appears as 0 when
+read.
+
+Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
+value lower than this limit will be ignored and the old configuration will be
+retained.
+
+
+dirty_expire_centisecs
+======================
+
+This tunable is used to define when dirty data is old enough to be eligible
+for writeout by the kernel flusher threads.  It is expressed in hundredths
+of a second.  Data which has been dirty in-memory for longer than this
+interval will be written out next time a flusher thread wakes up.
+
+
+dirty_ratio
+===========
+
+Contains, as a percentage of total available memory that contains free pages
+and reclaimable pages, the number of pages at which a process which is
+generating disk writes will itself start writing out dirty data.
+
+The total available memory is not equal to total system memory.
+
+
+dirtytime_expire_seconds
+========================
+
+When a lazytime inode is constantly having its pages dirtied, the inode with
+an updated timestamp will never get a chance to be written out.  And, if the
+only thing that has happened on the file system is a dirtytime inode caused
+by an atime update, a worker will be scheduled to make sure that inode
+eventually gets pushed out to disk.  This tunable defines when a dirty inode
+is old enough to be eligible for writeback by the kernel flusher threads.  It
+is also used as the interval to wake up the dirtytime_writeback thread.
+
+
+dirty_writeback_centisecs
+=========================
+
+The kernel flusher threads will periodically wake up and write `old` data
+out to disk.  This tunable expresses the interval between those wakeups, in
+hundredths of a second.
+
+Setting this to zero disables periodic writeback altogether.
+
+
+drop_caches
+===========
+
+Writing to this will cause the kernel to drop clean caches, as well as
+reclaimable slab objects like dentries and inodes.  Once dropped, their
+memory becomes free.
+
+To free pagecache::
+
+	echo 1 > /proc/sys/vm/drop_caches
+
+To free reclaimable slab objects (includes dentries and inodes)::
+
+	echo 2 > /proc/sys/vm/drop_caches
+
+To free slab objects and pagecache::
+
+	echo 3 > /proc/sys/vm/drop_caches
+
+This is a non-destructive operation and will not free any dirty objects.
+To increase the number of objects freed by this operation, the user may run
+`sync` prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
+number of dirty objects on the system and create more candidates to be
+dropped.
+
+This file is not a means to control the growth of the various kernel caches
+(inodes, dentries, pagecache, etc.).  These objects are automatically
+reclaimed by the kernel when memory is needed elsewhere on the system.
+
+Use of this file can cause performance problems.  Since it discards cached
+objects, it may cost a significant amount of I/O and CPU to recreate the
+dropped objects, especially if they were under heavy use.  Because of this,
+use outside of a testing or debugging environment is not recommended.
+
+You may see informational messages in your kernel log when this file is
+used::
+
+	cat (1234): drop_caches: 3
+
+These are informational only.  They do not mean that anything is wrong
+with your system.  To disable them, echo 4 (bit 2) into drop_caches.
+
+
+extfrag_threshold
+=================
+
+This parameter affects whether the kernel will compact memory or direct
+reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
+debugfs shows what the fragmentation index for each order is in each zone in
+the system. Values tending towards 0 imply allocations would fail due to lack
+of memory, values towards 1000 imply failures are due to fragmentation and -1
+implies that the allocation will succeed as long as watermarks are met.
+
+The kernel will not compact memory in a zone if the
+fragmentation index is <= extfrag_threshold. The default value is 500.
+
+
+highmem_is_dirtyable
+====================
+
+Available only for systems with CONFIG_HIGHMEM enabled (32-bit systems).
+
+This parameter controls whether the high memory is considered for dirty
+writers throttling.  This is not the case by default which means that
+only the amount of memory directly visible/usable by the kernel can
+be dirtied. As a result, on systems with a large amount of memory and
+lowmem basically depleted, writers might be throttled too early and
+streaming writes can get very slow.
+
+Changing the value to non-zero would allow more memory to be dirtied
+and thus allow writers to write more data which can be flushed to the
+storage more effectively. Note this also comes with a risk of premature
+OOM kills because some writers (e.g. direct block device writes) can
+only use the low memory and they can fill it up with dirty data without
+any throttling.
+
+
+hugetlb_shm_group
+=================
+
+hugetlb_shm_group contains the group id that is allowed to create SysV
+shared memory segments using hugetlb pages.
+
+
+laptop_mode
+===========
+
+laptop_mode is a knob that controls "laptop mode". All the things that are
+controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst.
+
+
+legacy_va_layout
+================
+
+If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
+will use the legacy (2.4) layout for all processes.
+
+
+lowmem_reserve_ratio
+====================
+
+For some specialised workloads on highmem machines it is dangerous for
+the kernel to allow process memory to be allocated from the "lowmem"
+zone.  This is because that memory could then be pinned via the mlock()
+system call, or by unavailability of swapspace.
+
+And on large highmem machines this lack of reclaimable lowmem memory
+can be fatal.
+
+So the Linux page allocator has a mechanism which prevents allocations
+which *could* use highmem from using too much lowmem.  This means that
+a certain amount of lowmem is defended from the possibility of being
+captured into pinned user memory.
+
+(The same argument applies to the old 16 megabyte ISA DMA region.  This
+mechanism will also defend that region from allocations which could use
+highmem or lowmem).
+
+The `lowmem_reserve_ratio` tunable determines how aggressive the kernel is
+in defending these lower zones.
+
+If you have a machine which uses highmem or ISA DMA and your
+applications are using mlock(), or if you are running with no swap then
+you probably should change the lowmem_reserve_ratio setting.
+
+lowmem_reserve_ratio is an array. You can see its values by reading this file::
+
+	% cat /proc/sys/vm/lowmem_reserve_ratio
+	256     256     32
+
+However, these values are not used directly. The kernel calculates the number
+of protection pages for each zone from them. These are shown as an array of
+protection pages in /proc/zoneinfo, as follows (this example is from an x86-64
+box). Each zone has an array of protection pages like this::
+
+  Node 0, zone      DMA
+    pages free     1355
+          min      3
+          low      3
+          high     4
+	:
+	:
+      numa_other   0
+          protection: (0, 2004, 2004, 2004)
+	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    pagesets
+      cpu: 0 pcp: 0
+          :
+
+These protections are added to the score used to judge whether this zone
+should be used for page allocation or should be reclaimed.
+
+In this example, if normal pages (index=2) are requested from this DMA zone and
+watermark[WMARK_HIGH] is used as the watermark, the kernel judges that this zone
+should not be used because pages_free (1355) is smaller than watermark +
+protection[2] (4 + 2004 = 2008). If this protection value were 0, this zone
+would be used for the normal page request. If the request is for the DMA zone
+(index=0), protection[0] (=0) is used.
+
+zone[i]'s protection[j] is calculated by the following expression::
+
+  (i < j):
+    zone[i]->protection[j]
+    = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
+      / lowmem_reserve_ratio[i];
+  (i = j):
+     (should not be protected) = 0;
+  (i > j):
+     (not necessary, but looks 0)
+
+The default values of lowmem_reserve_ratio[i] are
+
+    === ====================================
+    256 (if zone[i] means DMA or DMA32 zone)
+    32  (others)
+    === ====================================
+
+As the expression above shows, these values are the reciprocals of the ratio.
+256 means 1/256. The number of protection pages becomes about "0.39%" of the
+total managed pages of higher zones on the node.
+
+If you would like to protect more pages, smaller values are effective.
+The minimum value is 1 (1/1 -> 100%). A value less than 1 completely
+disables protection of the pages.
+
+
+max_map_count
+=============
+
+This file contains the maximum number of memory map areas a process
+may have. Memory map areas are used as a side-effect of calling
+malloc, directly by mmap, mprotect, and madvise, and also when loading
+shared libraries.
+
+While most applications need less than a thousand maps, certain
+programs, particularly malloc debuggers, may consume lots of them,
+e.g., up to one or two maps per allocation.
+
+The default value is 65536.
+
+
+memory_failure_early_kill
+=========================
+
+Control how to kill processes when an uncorrected memory error (typically
+a 2-bit error in a memory module) that cannot be handled by the kernel is
+detected in the background by hardware. In some cases (like the page
+still having a valid copy on disk) the kernel will handle the failure
+transparently without affecting any applications. But if there is
+no other up-to-date copy of the data, it will kill processes to prevent
+any data corruption from propagating.
+
+1: Kill all processes that have the corrupted and not reloadable page mapped
+as soon as the corruption is detected.  Note this is not supported
+for a few types of pages, like kernel internally allocated data or
+the swap cache, but works for the majority of user pages.
+
+0: Only unmap the corrupted page from all processes and only kill a process
+that tries to access it.
+
+The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can
+handle this if they want to.
+
+This is only active on architectures/platforms with advanced machine
+check handling and depends on the hardware capabilities.
+
+Applications can override this setting individually with the PR_MCE_KILL prctl.
+
+
+memory_failure_recovery
+=======================
+
+Enable memory failure recovery (when supported by the platform).
+
+1: Attempt recovery.
+
+0: Always panic on a memory failure.
+
+
+min_free_kbytes
+===============
+
+This is used to force the Linux VM to keep a minimum number
+of kilobytes free.  The VM uses this number to compute a
+watermark[WMARK_MIN] value for each lowmem zone in the system.
+Each lowmem zone gets a number of reserved free pages based
+proportionally on its size.
+
+Some minimal amount of memory is needed to satisfy PF_MEMALLOC
+allocations; if you set this to lower than 1024KB, your system will
+become subtly broken, and prone to deadlock under high loads.
+
+Setting this too high will OOM your machine instantly.
+
+
+min_slab_ratio
+==============
+
+This is available only on NUMA kernels.
+
+A percentage of the total pages in each zone.  On Zone reclaim
+(fallback from the local zone occurs) slabs will be reclaimed if more
+than this percentage of pages in a zone are reclaimable slab pages.
+This ensures that the slab growth stays under control even in NUMA
+systems that rarely perform global reclaim.
+
+The default is 5 percent.
+
+Note that slab reclaim is triggered in a per zone / node fashion.
+The process of reclaiming slab memory is currently not node specific
+and may not be fast.
+
+
+min_unmapped_ratio
+==================
+
+This is available only on NUMA kernels.
+
+This is a percentage of the total pages in each zone. Zone reclaim will
+only occur if more than this percentage of pages are in a state that
+zone_reclaim_mode allows to be reclaimed.
+
+If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
+against all file-backed unmapped pages including swapcache pages and tmpfs
+files. Otherwise, only unmapped pages backed by normal files but not tmpfs
+files and similar are considered.
+
+The default is 1 percent.
+
+
+mmap_min_addr
+=============
+
+This file indicates the amount of address space which a user process will
+be restricted from mmapping.  Since kernel null dereference bugs could
+accidentally operate based on the information in the first couple of pages
+of memory, userspace processes should not be allowed to write to them.  By
+default this value is set to 0 and no protections will be enforced by the
+security module.  Setting this value to something like 64k will allow the
+vast majority of applications to work correctly and provide defense in depth
+against future potential kernel bugs.
+
+
+mmap_rnd_bits
+=============
+
+This value can be used to select the number of bits to use to
+determine the random offset to the base address of vma regions
+resulting from mmap allocations on architectures which support
+tuning address space randomization.  This value will be bounded
+by the architecture's minimum and maximum supported values.
+
+This value can be changed after boot using the
+/proc/sys/vm/mmap_rnd_bits tunable.
+
+
+mmap_rnd_compat_bits
+====================
+
+This value can be used to select the number of bits to use to
+determine the random offset to the base address of vma regions
+resulting from mmap allocations for applications run in
+compatibility mode on architectures which support tuning address
+space randomization.  This value will be bounded by the
+architecture's minimum and maximum supported values.
+
+This value can be changed after boot using the
+/proc/sys/vm/mmap_rnd_compat_bits tunable.
+
+
+nr_hugepages
+============
+
+Change the minimum size of the hugepage pool.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
+nr_hugepages_mempolicy
+======================
+
+Change the size of the hugepage pool at run-time on a specific
+set of NUMA nodes.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
+nr_overcommit_hugepages
+=======================
+
+Change the maximum size of the hugepage pool. The maximum is
+nr_hugepages + nr_overcommit_hugepages.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
+nr_trim_pages
+=============
+
+This is available only on NOMMU kernels.
+
+This value adjusts the excess page trimming behaviour of power-of-2 aligned
+NOMMU mmap allocations.
+
+A value of 0 disables trimming of allocations entirely, while a value of 1
+trims excess pages aggressively. Any value >= 1 acts as the watermark where
+trimming of allocations is initiated.
+
+The default value is 1.
+
+See Documentation/nommu-mmap.txt for more information.
+
+
+numa_zonelist_order
+===================
+
+This sysctl is only for NUMA and it is deprecated. Anything but
+Node order will fail!
+
+'where the memory is allocated from' is controlled by zonelists.
+
+(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simplicity.
+You may be able to read ZONE_DMA as ZONE_DMA32...)
+
+In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows:
+ZONE_NORMAL -> ZONE_DMA
+This means that a memory allocation request for GFP_KERNEL will
+get memory from ZONE_DMA only when ZONE_NORMAL is not available.
+
+In the NUMA case, you can think of the following 2 types of order.
+Assume a 2-node NUMA system; below is the zonelist of Node(0)'s GFP_KERNEL::
+
+  (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
+  (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.
+
+Type(A) offers the best locality for processes on Node(0), but ZONE_DMA
+will be used before ZONE_NORMAL exhaustion. This increases the possibility of
+out-of-memory (OOM) in ZONE_DMA because ZONE_DMA tends to be small.
+
+Type(B) cannot offer the best locality but is more robust against OOM of
+the DMA zone.
+
+Type(A) is called "Node" order. Type(B) is "Zone" order.
+
+"Node order" orders the zonelists by node, then by zone within each node.
+Specify "[Nn]ode" for node order.
+
+"Zone Order" orders the zonelists by zone type, then by node within each
+zone.  Specify "[Zz]one" for zone order.
+
+Specify "[Dd]efault" to request automatic configuration.
+
+On 32-bit, the Normal zone needs to be preserved for allocations accessible
+by the kernel, so "zone" order will be selected.
+
+On 64-bit, devices that require DMA32/DMA are relatively rare, so "node"
+order will be selected.
+
+Default order is recommended unless this is causing problems for your
+system/application.
+
+
+oom_dump_tasks
+==============
+
+Enables a system-wide task dump (excluding kernel threads) to be produced
+when the kernel performs an OOM-killing and includes such information as
+pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
+score, and name.  This is helpful to determine why the OOM killer was
+invoked, to identify the rogue task that caused it, and to determine why
+the OOM killer chose the task it did to kill.
+
+If this is set to zero, this information is suppressed.  On very
+large systems with thousands of tasks it may not be feasible to dump
+the memory state information for each one.  Such systems should not
+be forced to incur a performance penalty in OOM conditions when the
+information may not be desired.
+
+If this is set to non-zero, this information is shown whenever the
+OOM killer actually kills a memory-hogging task.
+
+The default value is 1 (enabled).
+
+
+oom_kill_allocating_task
+========================
+
+This enables or disables killing the OOM-triggering task in
+out-of-memory situations.
+
+If this is set to zero, the OOM killer will scan through the entire
+tasklist and select a task based on heuristics to kill.  This normally
+selects a rogue memory-hogging task that frees up a large amount of
+memory when killed.
+
+If this is set to non-zero, the OOM killer simply kills the task that
+triggered the out-of-memory condition.  This avoids the expensive
+tasklist scan.
+
+If panic_on_oom is selected, it takes precedence over whatever value
+is used in oom_kill_allocating_task.
+
+The default value is 0.
+
+
+overcommit_kbytes
+=================
+
+When overcommit_memory is set to 2, the committed address space is not
+permitted to exceed swap plus this amount of physical RAM. See below.
+
+Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one
+of them may be specified at a time. Setting one disables the other (which
+then appears as 0 when read).
+
+
+overcommit_memory
+=================
+
+This value contains a flag that enables memory overcommitment.
+
+When this flag is 0, the kernel attempts to estimate the amount
+of free memory left when userspace requests more memory.
+
+When this flag is 1, the kernel pretends there is always enough
+memory until it actually runs out.
+
+When this flag is 2, the kernel uses a "never overcommit"
+policy that attempts to prevent any overcommit of memory.
+Note that user_reserve_kbytes affects this policy.
+
+This feature can be very useful because there are a lot of
+programs that malloc() huge amounts of memory "just-in-case"
+and don't use much of it.
+
+The default value is 0.
+
+See Documentation/vm/overcommit-accounting.rst and
+mm/util.c::__vm_enough_memory() for more information.
+
+
+overcommit_ratio
+================
+
+When overcommit_memory is set to 2, the committed address
+space is not permitted to exceed swap plus this percentage
+of physical RAM.  See above.
+
+
+page-cluster
+============
+
+page-cluster controls the number of pages up to which consecutive pages
+are read in from swap in a single attempt. This is the swap counterpart
+to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.
+
+It is a logarithmic value - setting it to zero means "1 page", setting
+it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
+
+The default value is three (eight pages at a time).  There may be some
+small benefits in tuning this to a different value if your workload is
+swap-intensive.
+
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part
+of the consecutive pages that readahead would have brought in.
+
+
+panic_on_oom
+============
+
+This enables or disables the panic on out-of-memory feature.
+
+If this is set to 0, the kernel will kill some rogue process; this
+mechanism is called oom_killer.  Usually, oom_killer can kill rogue
+processes and the system will survive.
+
+If this is set to 1, the kernel panics when out-of-memory happens.
+However, if a process limits its allowed nodes by mempolicy/cpusets,
+and those nodes run out of memory, one process may be killed by
+oom-killer. No panic occurs in this case, because other nodes' memory
+may still be free. This means the state of the system as a whole
+may not be fatal yet.
+
+If this is set to 2, the kernel panics compulsorily even in the
+above-mentioned case. Even if oom happens under a memory cgroup, the whole
+system panics.
+
+The default value is 0.
+
+1 and 2 are for failover of clustering. Please select either
+according to your policy of failover.
+
+panic_on_oom=2+kdump gives you a very strong tool to investigate
+why oom happens. You can get a snapshot.
+
+
+percpu_pagelist_fraction
+========================
+
+This is the fraction of pages at most (high mark pcp->high) in each zone that
+are allocated for each per cpu page list.  The min value for this is 8.  It
+means that we don't allow more than 1/8th of pages in each zone to be
+allocated in any single per_cpu_pagelist.  This entry only changes the value
+of hot per cpu pagelists.  User can specify a number like 100 to allocate
+1/100th of each zone to each per cpu page list.
+
+The batch value of each per cpu pagelist is also updated as a result.  It is
+set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8).
+
+The initial value is zero.  Kernel does not use this value at boot time to set
+the high water marks for each per cpu page list.  If the user writes '0' to this
+sysctl, it will revert to this default behavior.
+
+
+stat_interval
+=============
+
+The time interval between which vm statistics are updated.  The default
+is 1 second.
+
+
+stat_refresh
+============
+
+Any read or write (by root only) flushes all the per-cpu vm statistics
+into their global totals, for more accurate reports when testing
+e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
+
+As a side-effect, it also checks for negative totals (elsewhere reported
+as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
+(At time of writing, a few stats are known sometimes to be found negative,
+with no ill effects: errors and warnings on these stats are suppressed.)
+
+
+numa_stat
+=========
+
+This interface allows runtime configuration of numa statistics.
+
+When page allocation performance becomes a bottleneck and you can tolerate
+some possible tool breakage and decreased numa counter precision, you can
+do::
+
+	echo 0 > /proc/sys/vm/numa_stat
+
+When page allocation performance is not a bottleneck and you want all
+tooling to work, you can do::
+
+	echo 1 > /proc/sys/vm/numa_stat
+
+
+swappiness
+==========
+
+This control is used to define how aggressively the kernel will swap
+memory pages.  Higher values will increase aggressiveness, lower values
+decrease the amount of swap.  A value of 0 instructs the kernel not to
+initiate swap until the amount of free and file-backed pages is less
+than the high water mark in a zone.
+
+The default value is 60.
+
+
+unprivileged_userfaultfd
+========================
+
+This flag controls whether unprivileged users can use the userfaultfd
+system calls.  Set this to 1 to allow unprivileged users to use the
+userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
+privileged users (with CAP_SYS_PTRACE capability).
+
+The default value is 1.
+
+
+user_reserve_kbytes
+===================
+
+When overcommit_memory is set to 2, "never overcommit" mode, reserve
+min(3% of current process size, user_reserve_kbytes) of free memory.
+This is intended to prevent a user from starting a single memory hogging
+process, such that they cannot recover (kill the hog).
+
+user_reserve_kbytes defaults to min(3% of the current process size, 128MB).
+
+If this is reduced to zero, then the user will be allowed to allocate
+all free memory with a single process, minus admin_reserve_kbytes.
+Any subsequent attempts to execute a command will result in
+"fork: Cannot allocate memory".
+
+Changing this takes effect whenever an application requests memory.
+
+
+vfs_cache_pressure
+==================
+
+This percentage value controls the tendency of the kernel to reclaim
+the memory which is used for caching of directory and inode objects.
+
+At the default value of vfs_cache_pressure=100 the kernel will attempt to
+reclaim dentries and inodes at a "fair" rate with respect to pagecache and
+swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
+to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
+never reclaim dentries and inodes due to memory pressure and this can easily
+lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
+causes the kernel to prefer to reclaim dentries and inodes.
+
+Increasing vfs_cache_pressure significantly beyond 100 may have negative
+performance impact. Reclaim code needs to take various locks to find freeable
+directory and inode objects. With vfs_cache_pressure=1000, it will look for
+ten times more freeable objects than there are.
+
+
+watermark_boost_factor
+======================
+
+This factor controls the level of reclaim when memory is being fragmented.
+It defines the percentage of the high watermark of a zone that will be
+reclaimed if pages of different mobility are being mixed within pageblocks.
+The intent is that compaction has less work to do in the future and to
+increase the success rate of future high-order allocations such as SLUB
+allocations, THP and hugetlbfs pages.
+
+To make it sensible with respect to the watermark_scale_factor
+parameter, the unit is in fractions of 10,000. The default value of
+15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
+watermark will be reclaimed in the event of a pageblock being mixed due
+to fragmentation. The level of reclaim is determined by the number of
+fragmentation events that occurred in the recent past. If this value is
+smaller than a pageblock then a pageblocks worth of pages will be reclaimed
+(e.g.  2MB on 64-bit x86). A boost factor of 0 will disable the feature.
+
+
+watermark_scale_factor
+======================
+
+This factor controls the aggressiveness of kswapd. It defines the
+amount of memory left in a node/system before kswapd is woken up and
+how much memory needs to be free before kswapd goes back to sleep.
+
+The unit is in fractions of 10,000. The default value of 10 means the
+distances between watermarks are 0.1% of the available memory in the
+node/system. The maximum value is 1000, or 10% of memory.
+
+A high rate of threads entering direct reclaim (allocstall) or kswapd
+going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
+that the number of free pages kswapd maintains for latency reasons is
+too small for the allocation bursts occurring in the system. This knob
+can then be used to tune kswapd aggressiveness accordingly.
+
+
+zone_reclaim_mode
+=================
+
+Zone_reclaim_mode allows someone to set more or less aggressive approaches to
+reclaim memory when a zone runs out of memory. If it is set to zero then no
+zone reclaim occurs. Allocations will be satisfied from other zones / nodes
+in the system.
+
+This value is the bitwise OR of the following:
+
+=	===================================
+1	Zone reclaim on
+2	Zone reclaim writes dirty pages out
+4	Zone reclaim swaps pages
+=	===================================
+
+zone_reclaim_mode is disabled by default.  For file servers or workloads
+that benefit from having their data cached, zone_reclaim_mode should be
+left disabled as the caching effect is likely to be more important than
+data locality.
+
+zone_reclaim may be enabled if it's known that the workload is partitioned
+such that each partition fits within a NUMA node and that accessing remote
+memory would cause a measurable performance reduction.  The page allocator
+will then reclaim easily reusable pages (those page cache pages that are
+currently not used) before allocating off node pages.
+
+Allowing zone reclaim to write out pages stops processes that are
+writing large amounts of data from dirtying pages on other nodes. Zone
+reclaim will write out dirty pages if a zone fills up and so effectively
+throttle the process. This may decrease the performance of a single process
+since it cannot use all of system memory to buffer the outgoing writes
+anymore but it preserves the memory on other nodes so that the performance
+of other processes running on other nodes will not be affected.
+
+Allowing regular swap effectively restricts allocations to the local
+node unless explicitly overridden by memory policies or cpuset
+configurations.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
deleted file mode 100644
index c5f0d44433a2..000000000000
--- a/Documentation/sysctl/vm.txt
+++ /dev/null
@@ -1,946 +0,0 @@
-Documentation for /proc/sys/vm/*	kernel version 2.6.29
-	(c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-	(c) 2008         Peter W. Morreale <pmorreale@novell.com>
-
-For general info and legal blurb, please look in README.
-
-==============================================================
-
-This file contains the documentation for the sysctl files in
-/proc/sys/vm and is valid for Linux kernel version 2.6.29.
-
-The files in this directory can be used to tune the operation
-of the virtual memory (VM) subsystem of the Linux kernel and
-the writeout of dirty data to disk.
-
-Default values and initialization routines for most of these
-files can be found in mm/swap.c.
-
-Currently, these files are in /proc/sys/vm:
-
-- admin_reserve_kbytes
-- block_dump
-- compact_memory
-- compact_unevictable_allowed
-- dirty_background_bytes
-- dirty_background_ratio
-- dirty_bytes
-- dirty_expire_centisecs
-- dirty_ratio
-- dirtytime_expire_seconds
-- dirty_writeback_centisecs
-- drop_caches
-- extfrag_threshold
-- hugetlb_shm_group
-- laptop_mode
-- legacy_va_layout
-- lowmem_reserve_ratio
-- max_map_count
-- memory_failure_early_kill
-- memory_failure_recovery
-- min_free_kbytes
-- min_slab_ratio
-- min_unmapped_ratio
-- mmap_min_addr
-- mmap_rnd_bits
-- mmap_rnd_compat_bits
-- nr_hugepages
-- nr_hugepages_mempolicy
-- nr_overcommit_hugepages
-- nr_trim_pages         (only if CONFIG_MMU=n)
-- numa_zonelist_order
-- oom_dump_tasks
-- oom_kill_allocating_task
-- overcommit_kbytes
-- overcommit_memory
-- overcommit_ratio
-- page-cluster
-- panic_on_oom
-- percpu_pagelist_fraction
-- stat_interval
-- stat_refresh
-- numa_stat
-- swappiness
-- unprivileged_userfaultfd
-- user_reserve_kbytes
-- vfs_cache_pressure
-- watermark_boost_factor
-- watermark_scale_factor
-- zone_reclaim_mode
-
-==============================================================
-
-admin_reserve_kbytes
-
-The amount of free memory in the system that should be reserved for users
-with the capability cap_sys_admin.
-
-admin_reserve_kbytes defaults to min(3% of free pages, 8MB)
-
-That should provide enough for the admin to log in and kill a process,
-if necessary, under the default overcommit 'guess' mode.
-
-Systems running under overcommit 'never' should increase this to account
-for the full Virtual Memory Size of programs used to recover. Otherwise,
-root may not be able to log in to recover the system.
-
-How do you calculate a minimum useful reserve?
-
-sshd or login + bash (or some other shell) + top (or ps, kill, etc.)
-
-For overcommit 'guess', we can sum resident set sizes (RSS).
-On x86_64 this is about 8MB.
-
-For overcommit 'never', we can take the max of their virtual sizes (VSZ)
-and add the sum of their RSS.
-On x86_64 this is about 128MB.
-
-Changing this takes effect whenever an application requests memory.
-
-==============================================================
-
-block_dump
-
-block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/laptops/laptop-mode.rst.
-
-==============================================================
-
-compact_memory
-
-Available only when CONFIG_COMPACTION is set. When 1 is written to the file,
-all zones are compacted such that free memory is available in contiguous
-blocks where possible. This can be important for example in the allocation of
-huge pages although processes will also directly compact memory as required.
-
-==============================================================
-
-compact_unevictable_allowed
-
-Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
-allowed to examine the unevictable lru (mlocked pages) for pages to compact.
-This should be used on systems where stalls for minor page faults are an
-acceptable trade for large contiguous free memory.  Set to 0 to prevent
-compaction from moving pages that are unevictable.  Default value is 1.
-
-==============================================================
-
-dirty_background_bytes
-
-Contains the amount of dirty memory at which the background kernel
-flusher threads will start writeback.
-
-Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only
-one of them may be specified at a time. When one sysctl is written it is
-immediately taken into account to evaluate the dirty memory limits and the
-other appears as 0 when read.
-
-==============================================================
-
-dirty_background_ratio
-
-Contains, as a percentage of total available memory that contains free pages
-and reclaimable pages, the number of pages at which the background kernel
-flusher threads will start writing out dirty data.
-
-The total available memory is not equal to total system memory.
-
-==============================================================
-
-dirty_bytes
-
-Contains the amount of dirty memory at which a process generating disk writes
-will itself start writeback.
-
-Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be
-specified at a time. When one sysctl is written it is immediately taken into
-account to evaluate the dirty memory limits and the other appears as 0 when
-read.
-
-Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
-value lower than this limit will be ignored and the old configuration will be
-retained.
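-
-As an illustrative sketch (the 256MB figure is an arbitrary assumption, not a
-recommendation), setting dirty_bytes and then reading its counterpart would
-look like this:
-	echo 268435456 > /proc/sys/vm/dirty_bytes
-	cat /proc/sys/vm/dirty_ratio
-As described above, the second command is expected to print 0.  Assuming a
-4KB page size, the two-page minimum for dirty_bytes works out to
-2 * 4096 = 8192 bytes.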
-
-==============================================================
-
-dirty_expire_centisecs
-
-This tunable is used to define when dirty data is old enough to be eligible
-for writeout by the kernel flusher threads.  It is expressed in hundredths
-of a second.  Data which has been dirty in memory for longer than this
-interval will be written out the next time a flusher thread wakes up.
-
-==============================================================
-
-dirty_ratio
-
-Contains, as a percentage of total available memory that contains free pages
-and reclaimable pages, the number of pages at which a process which is
-generating disk writes will itself start writing out dirty data.
-
-The total available memory is not equal to total system memory.
-
-==============================================================
-
-dirtytime_expire_seconds
-
-When a lazytime inode is constantly having its pages dirtied, the inode with
-an updated timestamp will never get a chance to be written out.  And, if the
-only thing that has happened on the file system is a dirtytime inode caused
-by an atime update, a worker will be scheduled to make sure that inode
-eventually gets pushed out to disk.  This tunable is used to define when a dirty
-inode is old enough to be eligible for writeback by the kernel flusher threads.
-It is also used as the interval at which to wake up the dirtytime_writeback thread.
-
-==============================================================
-
-dirty_writeback_centisecs
-
-The kernel flusher threads will periodically wake up and write `old' data
-out to disk.  This tunable expresses the interval between those wakeups, in
-hundredths of a second.
-
-Setting this to zero disables periodic writeback altogether.
-
-==============================================================
-
-drop_caches
-
-Writing to this will cause the kernel to drop clean caches, as well as
-reclaimable slab objects like dentries and inodes.  Once dropped, their
-memory becomes free.
-
-To free pagecache:
-	echo 1 > /proc/sys/vm/drop_caches
-To free reclaimable slab objects (includes dentries and inodes):
-	echo 2 > /proc/sys/vm/drop_caches
-To free slab objects and pagecache:
-	echo 3 > /proc/sys/vm/drop_caches
-
-This is a non-destructive operation and will not free any dirty objects.
-To increase the number of objects freed by this operation, the user may run
-`sync' prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
-number of dirty objects on the system and create more candidates to be
-dropped.
-
-This file is not a means to control the growth of the various kernel caches
-(inodes, dentries, pagecache, etc.).  These objects are automatically
-reclaimed by the kernel when memory is needed elsewhere on the system.
-
-Use of this file can cause performance problems.  Since it discards cached
-objects, it may cost a significant amount of I/O and CPU to recreate the
-dropped objects, especially if they were under heavy use.  Because of this,
-use outside of a testing or debugging environment is not recommended.
-
-You may see informational messages in your kernel log when this file is
-used:
-
-	cat (1234): drop_caches: 3
-
-These are informational only.  They do not mean that anything is wrong
-with your system.  To disable them, echo 4 (bit 2) into drop_caches.
-
-==============================================================
-
-extfrag_threshold
-
-This parameter affects whether the kernel will compact memory or direct
-reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
-debugfs shows what the fragmentation index for each order is in each zone in
-the system. Values tending towards 0 imply allocations would fail due to lack
-of memory, values towards 1000 imply failures are due to fragmentation and -1
-implies that the allocation will succeed as long as watermarks are met.
-
-The kernel will not compact memory in a zone if the
-fragmentation index is <= extfrag_threshold. The default value is 500.
-
-==============================================================
-
-highmem_is_dirtyable
-
-Available only for systems with CONFIG_HIGHMEM enabled (32-bit systems).
-
-This parameter controls whether the high memory is considered for dirty
-writers throttling.  This is not the case by default which means that
-only the amount of memory directly visible/usable by the kernel can
-be dirtied. As a result, on systems with a large amount of memory and
-lowmem basically depleted, writers might be throttled too early and
-streaming writes can get very slow.
-
-Changing the value to non-zero allows more memory to be dirtied
-and thus allows writers to write more data which can be flushed to the
-storage more effectively. Note this also comes with a risk of premature
-OOM killer invocation because some writers (e.g. direct block device writes)
-can only use the low memory and they can fill it up with dirty data without
-any throttling.
-
-==============================================================
-
-hugetlb_shm_group
-
-hugetlb_shm_group contains the group ID that is allowed to create SysV
-shared memory segments using hugetlb pages.
-
-==============================================================
-
-laptop_mode
-
-laptop_mode is a knob that controls "laptop mode". All the things that are
-controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst.
-
-==============================================================
-
-legacy_va_layout
-
-If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
-will use the legacy (2.4) layout for all processes.
-
-==============================================================
-
-lowmem_reserve_ratio
-
-For some specialised workloads on highmem machines it is dangerous for
-the kernel to allow process memory to be allocated from the "lowmem"
-zone.  This is because that memory could then be pinned via the mlock()
-system call, or by unavailability of swapspace.
-
-And on large highmem machines this lack of reclaimable lowmem memory
-can be fatal.
-
-So the Linux page allocator has a mechanism which prevents allocations
-which _could_ use highmem from using too much lowmem.  This means that
-a certain amount of lowmem is defended from the possibility of being
-captured into pinned user memory.
-
-(The same argument applies to the old 16 megabyte ISA DMA region.  This
-mechanism will also defend that region from allocations which could use
-highmem or lowmem).
-
-The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is
-in defending these lower zones.
-
-If you have a machine which uses highmem or ISA DMA and your
-applications are using mlock(), or if you are running with no swap then
-you probably should change the lowmem_reserve_ratio setting.
-
-The lowmem_reserve_ratio is an array. You can see its values by reading this file.
--
-% cat /proc/sys/vm/lowmem_reserve_ratio
-256     256     32
--
-
-But these values are not used directly. The kernel calculates the number of
-protection pages for each zone from them. These are shown as an array of
-protection pages in /proc/zoneinfo, as follows (this example is from an x86-64
-box). Each zone has an array of protection pages like this.
-
--
-Node 0, zone      DMA
-  pages free     1355
-        min      3
-        low      3
-        high     4
-	:
-	:
-    numa_other   0
-        protection: (0, 2004, 2004, 2004)
-	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  pagesets
-    cpu: 0 pcp: 0
-        :
--
-These protections are added to the score used to judge whether this zone should
-be used for page allocation or should be reclaimed.
-
-In this example, if normal pages (index=2) are requested from this DMA zone and
-watermark[WMARK_HIGH] is used as the watermark, the kernel judges that this zone
-should not be used because pages_free (1355) is smaller than watermark +
-protection[2] (4 + 2004 = 2008). If this protection value were 0, this zone would
-be used for a normal page request. If the request is for the DMA zone (index=0),
-protection[0] (=0) is used.
-
-zone[i]'s protection[j] is calculated by the following expression.
-
-(i < j):
-  zone[i]->protection[j]
-  = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
-    / lowmem_reserve_ratio[i];
-(i = j):
-   (should not be protected) = 0;
-(i > j):
-   (not necessary, but appears as 0)
-
-The default values of lowmem_reserve_ratio[i] are
-    256 (if zone[i] means DMA or DMA32 zone)
-    32  (others).
-As the above expression shows, they are the reciprocals of the ratios.
-256 means 1/256. The number of protection pages becomes about 0.39% of the total
-managed pages of higher zones on the node.
-
-If you would like to protect more pages, smaller values are effective.
-The minimum value is 1 (1/1 -> 100%). A value less than 1 completely
-disables protection of the pages.
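-
-As a worked example of the expression above (the page count is a hypothetical
-figure chosen to match the /proc/zoneinfo excerpt), suppose the zones above
-the DMA zone on the node manage 513024 pages in total and
-lowmem_reserve_ratio[0] is 256:
-	protection = 513024 / 256 = 2004 pages
-Lowering the ratio to 32 for the same zones would give
-513024 / 32 = 16032 pages of protection.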
-
-==============================================================
-
-max_map_count:
-
-This file contains the maximum number of memory map areas a process
-may have. Memory map areas are used as a side-effect of calling
-malloc, directly by mmap, mprotect, and madvise, and also when loading
-shared libraries.
-
-While most applications need less than a thousand maps, certain
-programs, particularly malloc debuggers, may consume lots of them,
-e.g., up to one or two maps per allocation.
-
-The default value is 65536.
-
-=============================================================
-
-memory_failure_early_kill:
-
-Controls how to kill processes when an uncorrected memory error (typically
-a 2-bit error in a memory module) that cannot be handled by the kernel is
-detected in the background by hardware. In some cases (like the page
-still having a valid copy on disk) the kernel will handle the failure
-transparently without affecting any applications. But if there is
-no other up-to-date copy of the data, it will kill processes to prevent any
-data corruption from propagating.
-
-1: Kill all processes that have the corrupted and not reloadable page mapped
-as soon as the corruption is detected.  Note this is not supported
-for a few types of pages, like kernel internally allocated data or
-the swap cache, but works for the majority of user pages.
-
-0: Only unmap the corrupted page from all processes and only kill a process
-that tries to access it.
-
-The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can
-handle this if they want to.
-
-This is only active on architectures/platforms with advanced machine
-check handling and depends on the hardware capabilities.
-
-Applications can override this setting individually with the PR_MCE_KILL prctl.
-
-==============================================================
-
-memory_failure_recovery
-
-Enable memory failure recovery (when supported by the platform).
-
-1: Attempt recovery.
-
-0: Always panic on a memory failure.
-
-==============================================================
-
-min_free_kbytes:
-
-This is used to force the Linux VM to keep a minimum number
-of kilobytes free.  The VM uses this number to compute a
-watermark[WMARK_MIN] value for each lowmem zone in the system.
-Each lowmem zone gets a number of reserved free pages based
-proportionally on its size.
-
-Some minimal amount of memory is needed to satisfy PF_MEMALLOC
-allocations; if you set this to lower than 1024KB, your system will
-become subtly broken, and prone to deadlock under high loads.
-
-Setting this too high will OOM your machine instantly.
-
-=============================================================
-
-min_slab_ratio:
-
-This is available only on NUMA kernels.
-
-A percentage of the total pages in each zone.  On Zone reclaim
-(fallback from the local zone occurs) slabs will be reclaimed if more
-than this percentage of pages in a zone are reclaimable slab pages.
-This ensures that the slab growth stays under control even in NUMA
-systems that rarely perform global reclaim.
-
-The default is 5 percent.
-
-Note that slab reclaim is triggered in a per zone / node fashion.
-The process of reclaiming slab memory is currently not node specific
-and may not be fast.
-
-=============================================================
-
-min_unmapped_ratio:
-
-This is available only on NUMA kernels.
-
-This is a percentage of the total pages in each zone. Zone reclaim will
-only occur if more than this percentage of pages are in a state that
-zone_reclaim_mode allows to be reclaimed.
-
-If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
-against all file-backed unmapped pages including swapcache pages and tmpfs
-files. Otherwise, only unmapped pages backed by normal files but not tmpfs
-files and similar are considered.
-
-The default is 1 percent.
-
-==============================================================
-
-mmap_min_addr
-
-This file indicates the amount of address space which a user process will
-be restricted from mmapping.  Since kernel null dereference bugs could
-accidentally operate based on the information in the first couple of pages
-of memory, userspace processes should not be allowed to write to them.  By
-default this value is set to 0 and no protections will be enforced by the
-security module.  Setting this value to something like 64k will allow the
-vast majority of applications to work correctly and provide defense in depth
-against future potential kernel bugs.
-
-==============================================================
-
-mmap_rnd_bits:
-
-This value can be used to select the number of bits to use to
-determine the random offset to the base address of vma regions
-resulting from mmap allocations on architectures which support
-tuning address space randomization.  This value will be bounded
-by the architecture's minimum and maximum supported values.
-
-This value can be changed after boot using the
-/proc/sys/vm/mmap_rnd_bits tunable.
-
-==============================================================
-
-mmap_rnd_compat_bits:
-
-This value can be used to select the number of bits to use to
-determine the random offset to the base address of vma regions
-resulting from mmap allocations for applications run in
-compatibility mode on architectures which support tuning address
-space randomization.  This value will be bounded by the
-architecture's minimum and maximum supported values.
-
-This value can be changed after boot using the
-/proc/sys/vm/mmap_rnd_compat_bits tunable.
-
-==============================================================
-
-nr_hugepages
-
-Change the minimum size of the hugepage pool.
-
-See Documentation/admin-guide/mm/hugetlbpage.rst
-
-==============================================================
-
-nr_hugepages_mempolicy
-
-Change the size of the hugepage pool at run-time on a specific
-set of NUMA nodes.
-
-See Documentation/admin-guide/mm/hugetlbpage.rst
-
-==============================================================
-
-nr_overcommit_hugepages
-
-Change the maximum size of the hugepage pool. The maximum is
-nr_hugepages + nr_overcommit_hugepages.
-
-See Documentation/admin-guide/mm/hugetlbpage.rst
-
-==============================================================
-
-nr_trim_pages
-
-This is available only on NOMMU kernels.
-
-This value adjusts the excess page trimming behaviour of power-of-2 aligned
-NOMMU mmap allocations.
-
-A value of 0 disables trimming of allocations entirely, while a value of 1
-trims excess pages aggressively. Any value >= 1 acts as the watermark where
-trimming of allocations is initiated.
-
-The default value is 1.
-
-See Documentation/nommu-mmap.txt for more information.
-
-==============================================================
-
-numa_zonelist_order
-
-This sysctl is only for NUMA and it is deprecated. Anything but
-Node order will fail!
-
-'where the memory is allocated from' is controlled by zonelists.
-(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simplicity of
- explanation. You may be able to read ZONE_DMA as ZONE_DMA32...)
-
-In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows.
-ZONE_NORMAL -> ZONE_DMA
-This means that a memory allocation request for GFP_KERNEL will
-get memory from ZONE_DMA only when ZONE_NORMAL is not available.
-
-In the NUMA case, you can think of the following 2 types of order.
-Assume a 2-node NUMA system; below is the zonelist of Node(0)'s GFP_KERNEL.
-
-(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
-(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.
-
-Type(A) offers the best locality for processes on Node(0), but ZONE_DMA
-will be used before ZONE_NORMAL exhaustion. This increases the possibility of
-out-of-memory (OOM) of ZONE_DMA because ZONE_DMA tends to be small.
-
-Type(B) cannot offer the best locality but is more robust against OOM of
-the DMA zone.
-
-Type(A) is called "Node" order. Type(B) is "Zone" order.
-
-"Node order" orders the zonelists by node, then by zone within each node.
-Specify "[Nn]ode" for node order.
-
-"Zone Order" orders the zonelists by zone type, then by node within each
-zone.  Specify "[Zz]one" for zone order.
-
-Specify "[Dd]efault" to request automatic configuration.
-
-On 32-bit, the Normal zone needs to be preserved for allocations accessible
-by the kernel, so "zone" order will be selected.
-
-On 64-bit, devices that require DMA32/DMA are relatively rare, so "node"
-order will be selected.
-
-Default order is recommended unless this is causing problems for your
-system/application.
-
-==============================================================
-
-oom_dump_tasks
-
-Enables a system-wide task dump (excluding kernel threads) to be produced
-when the kernel performs an OOM-killing and includes such information as
-pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
-score, and name.  This is helpful to determine why the OOM killer was
-invoked, to identify the rogue task that caused it, and to determine why
-the OOM killer chose the task it did to kill.
-
-If this is set to zero, this information is suppressed.  On very
-large systems with thousands of tasks it may not be feasible to dump
-the memory state information for each one.  Such systems should not
-be forced to incur a performance penalty in OOM conditions when the
-information may not be desired.
-
-If this is set to non-zero, this information is shown whenever the
-OOM killer actually kills a memory-hogging task.
-
-The default value is 1 (enabled).
-
-==============================================================
-
-oom_kill_allocating_task
-
-This enables or disables killing the OOM-triggering task in
-out-of-memory situations.
-
-If this is set to zero, the OOM killer will scan through the entire
-tasklist and select a task based on heuristics to kill.  This normally
-selects a rogue memory-hogging task that frees up a large amount of
-memory when killed.
-
-If this is set to non-zero, the OOM killer simply kills the task that
-triggered the out-of-memory condition.  This avoids the expensive
-tasklist scan.
-
-If panic_on_oom is selected, it takes precedence over whatever value
-is used in oom_kill_allocating_task.
-
-The default value is 0.
-
-==============================================================
-
-overcommit_kbytes:
-
-When overcommit_memory is set to 2, the committed address space is not
-permitted to exceed swap plus this amount of physical RAM. See below.
-
-Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one
-of them may be specified at a time. Setting one disables the other (which
-then appears as 0 when read).
-
-==============================================================
-
-overcommit_memory:
-
-This value contains a flag that enables memory overcommitment.
-
-When this flag is 0, the kernel attempts to estimate the amount
-of free memory left when userspace requests more memory.
-
-When this flag is 1, the kernel pretends there is always enough
-memory until it actually runs out.
-
-When this flag is 2, the kernel uses a "never overcommit"
-policy that attempts to prevent any overcommit of memory.
-Note that user_reserve_kbytes affects this policy.
-
-This feature can be very useful because there are a lot of
-programs that malloc() huge amounts of memory "just-in-case"
-and don't use much of it.
-
-The default value is 0.
-
-See Documentation/vm/overcommit-accounting.rst and
-mm/util.c::__vm_enough_memory() for more information.
-
-==============================================================
-
-overcommit_ratio:
-
-When overcommit_memory is set to 2, the committed address
-space is not permitted to exceed swap plus this percentage
-of physical RAM.  See above.
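-
-As a rough worked example (the machine sizes are hypothetical, and the
-kernel's actual accounting has further adjustments, such as for huge pages),
-with 8GB of physical RAM, 2GB of swap and overcommit_ratio set to 50:
-	limit = 2GB + (8GB * 50 / 100) = 6GB
-The resulting limit can be compared with the CommitLimit field in
-/proc/meminfo.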
-
-==============================================================
-
-page-cluster
-
-page-cluster controls the number of pages up to which consecutive pages
-are read in from swap in a single attempt. This is the swap counterpart
-to page cache readahead.
-The mentioned consecutivity is not in terms of virtual/physical addresses,
-but consecutive on swap space - that means they were swapped out together.
-
-It is a logarithmic value - setting it to zero means "1 page", setting
-it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
-Zero disables swap readahead completely.
-
-The default value is three (eight pages at a time).  There may be some
-small benefits in tuning this to a different value if your workload is
-swap-intensive.
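-
-As an illustration, the number of pages read in one attempt is
-2^page-cluster:
-	page-cluster = 0  ->  2^0 = 1 page
-	page-cluster = 3  ->  2^3 = 8 pages (the default)
-	page-cluster = 5  ->  2^5 = 32 pages
-For instance, to try disabling swap readahead on a test system:
-	echo 0 > /proc/sys/vm/page-cluster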
-
-Lower values mean lower latencies for initial faults, but at the same time
-extra faults and I/O delays for following faults if they would have been part of
-the consecutive pages that readahead would have brought in.
-
-=============================================================
-
-panic_on_oom
-
-This enables or disables the panic on out-of-memory feature.
-
-If this is set to 0, the kernel will kill some rogue process using the
-OOM killer.  Usually, the OOM killer can kill rogue processes and the
-system will survive.
-
-If this is set to 1, the kernel panics when out-of-memory happens.
-However, if a process limits its node usage by mempolicy/cpusets,
-and those nodes run out of memory, one process
-may be killed by the OOM killer. No panic occurs in this case,
-because other nodes' memory may be free. This means the system as a whole
-may not be in a fatal state yet.
-
-If this is set to 2, the kernel panics compulsorily even in the
-above-mentioned case. Even if OOM happens under a memory cgroup, the whole
-system panics.
-
-The default value is 0.
-1 and 2 are for failover of clustering. Please select either
-according to your policy of failover.
-panic_on_oom=2+kdump gives you a very strong tool to investigate
-why OOM happens. You can get a snapshot.
-
-=============================================================
-
-percpu_pagelist_fraction
-
-This is the fraction of pages at most (high mark pcp->high) in each zone that
-are allocated for each per cpu page list.  The min value for this is 8.  It
-means that we don't allow more than 1/8th of pages in each zone to be
-allocated in any single per_cpu_pagelist.  This entry only changes the value
-of hot per cpu pagelists.  User can specify a number like 100 to allocate
-1/100th of each zone to each per cpu page list.
-
-The batch value of each per cpu pagelist is also updated as a result.  It is
-set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8).
-
-The initial value is zero.  The kernel does not use this value at boot time to set
-the high water marks for each per cpu page list.  If the user writes '0' to this
-sysctl, it will revert to this default behavior.
-
-==============================================================
-
-stat_interval
-
-The time interval between which vm statistics are updated.  The default
-is 1 second.
-
-==============================================================
-
-stat_refresh
-
-Any read or write (by root only) flushes all the per-cpu vm statistics
-into their global totals, for more accurate reports when testing,
-e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
-
-As a side-effect, it also checks for negative totals (elsewhere reported
-as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
-(At time of writing, a few stats are known sometimes to be found negative,
-with no ill effects: errors and warnings on these stats are suppressed.)
-
-==============================================================
-
-numa_stat
-
-This interface allows runtime configuration of numa statistics.
-
-When page allocation performance becomes a bottleneck and you can tolerate
-some possible tool breakage and decreased numa counter precision, you can
-do:
-	echo 0 > /proc/sys/vm/numa_stat
-
-When page allocation performance is not a bottleneck and you want all
-tooling to work, you can do:
-	echo 1 > /proc/sys/vm/numa_stat
-
-==============================================================
-
-swappiness
-
-This control is used to define how aggressively the kernel will swap
-memory pages.  Higher values will increase aggressiveness, lower values
-decrease the amount of swap.  A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
-
-The default value is 60.
-
-==============================================================
-
-unprivileged_userfaultfd
-
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with CAP_SYS_PTRACE capability).
-
-The default value is 1.
-
-==============================================================
-
-user_reserve_kbytes
-
-When overcommit_memory is set to 2, "never overcommit" mode, reserve
-min(3% of current process size, user_reserve_kbytes) of free memory.
-This is intended to prevent a user from starting a single memory hogging
-process, such that they cannot recover (kill the hog).
-
-user_reserve_kbytes defaults to min(3% of the current process size, 128MB).
-
-If this is reduced to zero, then the user will be allowed to allocate
-all free memory with a single process, minus admin_reserve_kbytes.
-Any subsequent attempts to execute a command will result in
-"fork: Cannot allocate memory".
-
-Changing this takes effect whenever an application requests memory.
-
-==============================================================
-
-vfs_cache_pressure
-------------------
-
-This percentage value controls the tendency of the kernel to reclaim
-the memory which is used for caching of directory and inode objects.
-
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
-
-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
-
-=============================================================
-
-watermark_boost_factor:
-
-This factor controls the level of reclaim when memory is being fragmented.
-It defines the percentage of the high watermark of a zone that will be
-reclaimed if pages of different mobility are being mixed within pageblocks.
-The intent is that compaction has less work to do in the future and to
-increase the success rate of future high-order allocations such as SLUB
-allocations, THP and hugetlbfs pages.
-
-To make it sensible with respect to the watermark_scale_factor
-parameter, the unit is in fractions of 10,000. The default value of
-15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
-watermark will be reclaimed in the event of a pageblock being mixed due
-to fragmentation. The level of reclaim is determined by the number of
-fragmentation events that occurred in the recent past. If this value is
-smaller than a pageblock then a pageblock's worth of pages will be reclaimed
-(e.g.  2MB on 64-bit x86). A boost factor of 0 will disable the feature.
-
-=============================================================
-
-watermark_scale_factor:
-
-This factor controls the aggressiveness of kswapd. It defines the
-amount of memory left in a node/system before kswapd is woken up and
-how much memory needs to be free before kswapd goes back to sleep.
-
-The unit is in fractions of 10,000. The default value of 10 means the
-distances between watermarks are 0.1% of the available memory in the
-node/system. The maximum value is 1000, or 10% of memory.
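-
-As a worked example (the node size is a hypothetical assumption), on a node
-with 16GB of available memory:
-	default (10):    16GB * 10 / 10000   = about 16MB between watermarks
-	maximum (1000):  16GB * 1000 / 10000 = about 1.6GB between watermarks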
-
-A high rate of threads entering direct reclaim (allocstall) or kswapd
-going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
-that the number of free pages kswapd maintains for latency reasons is
-too small for the allocation bursts occurring in the system. This knob
-can then be used to tune kswapd aggressiveness accordingly.
-
-==============================================================
-
-zone_reclaim_mode:
-
-Zone_reclaim_mode allows someone to set more or less aggressive approaches to
-reclaim memory when a zone runs out of memory. If it is set to zero then no
-zone reclaim occurs. Allocations will be satisfied from other zones / nodes
-in the system.
-
-This value is the bitwise OR of:
-
-1	= Zone reclaim on
-2	= Zone reclaim writes dirty pages out
-4	= Zone reclaim swaps pages
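-
-For example, to turn zone reclaim on and also let it write out dirty pages
-(values 1 and 2, so 1 | 2 = 3), one could write:
-	echo 3 > /proc/sys/vm/zone_reclaim_mode
-This is only an illustration of how the values combine, not a recommended
-setting.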
-
-zone_reclaim_mode is disabled by default.  For file servers or workloads
-that benefit from having their data cached, zone_reclaim_mode should be
-left disabled as the caching effect is likely to be more important than
-data locality.
-
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction.  The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
-
-Allowing zone reclaim to write out pages stops processes that are
-writing large amounts of data from dirtying pages on other nodes. Zone
-reclaim will write out dirty pages if a zone fills up and so effectively
-throttle the process. This may decrease the performance of a single process
-since it cannot use all of system memory to buffer the outgoing writes
-anymore but it preserve the memory on other nodes so that the performance
-of other processes running on other nodes will not be affected.
-
-Allowing regular swap effectively restricts allocations to the local
-node unless explicitly overridden by memory policies or cpuset
-configurations.
-
-============ End of Document =================================
diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index c6d94118fbcc..8ba656f37cd8 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -439,7 +439,7 @@ Compacting MLOCKED Pages
 
 The unevictable LRU can be scanned for compactable regions and the default
 behavior is to do so.  /proc/sys/vm/compact_unevictable_allowed controls
-this behavior (see Documentation/sysctl/vm.txt).  Once scanning of the
+this behavior (see Documentation/sysctl/vm.rst).  Once scanning of the
 unevictable LRU is enabled, the work of compaction is mostly handled by
 the page migration code and the same work flow as described in MIGRATING
 MLOCKED PAGES will apply.
diff --git a/kernel/panic.c b/kernel/panic.c
index 4d9f55bf7d38..e0ea74bbb41d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -372,7 +372,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 /**
  * print_tainted - return a string to represent the kernel taint state.
  *
- * For individual taint flag meanings, see Documentation/sysctl/kernel.txt
+ * For individual taint flag meanings, see Documentation/sysctl/kernel.rst
  *
  * The string is overwritten by the next call to print_tainted(),
  * but is always NULL terminated.
diff --git a/mm/swap.c b/mm/swap.c
index 607c48229a1d..83a2a15f4836 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -8,7 +8,7 @@
 /*
  * This file contains the default values for the operation of the
  * Linux VM subsystem. Fine-tuning documentation can be found in
- * Documentation/sysctl/vm.txt.
+ * Documentation/sysctl/vm.rst.
  * Started 18.12.91
  * Swap aging added 23.2.95, Stephen Tweedie.
  * Buffermem limits added 12.3.98, Rik van Riel.
-- 
cgit v1.2.3-55-g7522


From 898bd37a92063e46bc8d7b870781cecd66234f92 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 18 Apr 2019 19:45:00 -0300
Subject: docs: block: convert to ReST

Rename the block documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |    8 +-
 Documentation/block/bfq-iosched.rst             |  597 ++++++++++++
 Documentation/block/bfq-iosched.txt             |  583 -----------
 Documentation/block/biodoc.rst                  | 1168 +++++++++++++++++++++++
 Documentation/block/biodoc.txt                  | 1076 ---------------------
 Documentation/block/biovecs.rst                 |  146 +++
 Documentation/block/biovecs.txt                 |  144 ---
 Documentation/block/capability.rst              |   18 +
 Documentation/block/capability.txt              |   15 -
 Documentation/block/cmdline-partition.rst       |   53 +
 Documentation/block/cmdline-partition.txt       |   46 -
 Documentation/block/data-integrity.rst          |  291 ++++++
 Documentation/block/data-integrity.txt          |  281 ------
 Documentation/block/deadline-iosched.rst        |   72 ++
 Documentation/block/deadline-iosched.txt        |   75 --
 Documentation/block/index.rst                   |   25 +
 Documentation/block/ioprio.rst                  |  182 ++++
 Documentation/block/ioprio.txt                  |  183 ----
 Documentation/block/kyber-iosched.rst           |   15 +
 Documentation/block/kyber-iosched.txt           |   14 -
 Documentation/block/null_blk.rst                |  126 +++
 Documentation/block/null_blk.txt                |   99 --
 Documentation/block/pr.rst                      |  119 +++
 Documentation/block/pr.txt                      |  119 ---
 Documentation/block/queue-sysfs.rst             |  254 +++++
 Documentation/block/queue-sysfs.txt             |  253 -----
 Documentation/block/request.rst                 |   99 ++
 Documentation/block/request.txt                 |   88 --
 Documentation/block/stat.rst                    |   93 ++
 Documentation/block/stat.txt                    |   86 --
 Documentation/block/switching-sched.rst         |   39 +
 Documentation/block/switching-sched.txt         |   35 -
 Documentation/block/writeback_cache_control.rst |   86 ++
 Documentation/block/writeback_cache_control.txt |   86 --
 Documentation/blockdev/zram.rst                 |    2 +-
 MAINTAINERS                                     |    2 +-
 block/Kconfig                                   |    2 +-
 block/Kconfig.iosched                           |    2 +-
 block/bfq-iosched.c                             |    2 +-
 block/blk-integrity.c                           |    2 +-
 block/ioprio.c                                  |    2 +-
 block/mq-deadline.c                             |    2 +-
 block/partitions/cmdline.c                      |    2 +-
 43 files changed, 3396 insertions(+), 3196 deletions(-)
 create mode 100644 Documentation/block/bfq-iosched.rst
 delete mode 100644 Documentation/block/bfq-iosched.txt
 create mode 100644 Documentation/block/biodoc.rst
 delete mode 100644 Documentation/block/biodoc.txt
 create mode 100644 Documentation/block/biovecs.rst
 delete mode 100644 Documentation/block/biovecs.txt
 create mode 100644 Documentation/block/capability.rst
 delete mode 100644 Documentation/block/capability.txt
 create mode 100644 Documentation/block/cmdline-partition.rst
 delete mode 100644 Documentation/block/cmdline-partition.txt
 create mode 100644 Documentation/block/data-integrity.rst
 delete mode 100644 Documentation/block/data-integrity.txt
 create mode 100644 Documentation/block/deadline-iosched.rst
 delete mode 100644 Documentation/block/deadline-iosched.txt
 create mode 100644 Documentation/block/index.rst
 create mode 100644 Documentation/block/ioprio.rst
 delete mode 100644 Documentation/block/ioprio.txt
 create mode 100644 Documentation/block/kyber-iosched.rst
 delete mode 100644 Documentation/block/kyber-iosched.txt
 create mode 100644 Documentation/block/null_blk.rst
 delete mode 100644 Documentation/block/null_blk.txt
 create mode 100644 Documentation/block/pr.rst
 delete mode 100644 Documentation/block/pr.txt
 create mode 100644 Documentation/block/queue-sysfs.rst
 delete mode 100644 Documentation/block/queue-sysfs.txt
 create mode 100644 Documentation/block/request.rst
 delete mode 100644 Documentation/block/request.txt
 create mode 100644 Documentation/block/stat.rst
 delete mode 100644 Documentation/block/stat.txt
 create mode 100644 Documentation/block/switching-sched.rst
 delete mode 100644 Documentation/block/switching-sched.txt
 create mode 100644 Documentation/block/writeback_cache_control.rst
 delete mode 100644 Documentation/block/writeback_cache_control.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 01123f1de354..e8e28cac32a3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -430,7 +430,7 @@
 
 	blkdevparts=	Manual partition parsing of block device(s) for
 			embedded devices based on command line input.
-			See Documentation/block/cmdline-partition.txt
+			See Documentation/block/cmdline-partition.rst
 
 	boot_delay=	Milliseconds to delay each printk during boot.
 			Values larger than 10 seconds (10000) are changed to
@@ -1199,9 +1199,9 @@
 
 	elevator=	[IOSCHED]
 			Format: { "mq-deadline" | "kyber" | "bfq" }
-			See Documentation/block/deadline-iosched.txt,
-			Documentation/block/kyber-iosched.txt and
-			Documentation/block/bfq-iosched.txt for details.
+			See Documentation/block/deadline-iosched.rst,
+			Documentation/block/kyber-iosched.rst and
+			Documentation/block/bfq-iosched.rst for details.
 
 	elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
 			Specifies physical address of start of kernel core
diff --git a/Documentation/block/bfq-iosched.rst b/Documentation/block/bfq-iosched.rst
new file mode 100644
index 000000000000..2c13b2fc1888
--- /dev/null
+++ b/Documentation/block/bfq-iosched.rst
@@ -0,0 +1,597 @@
+==========================
+BFQ (Budget Fair Queueing)
+==========================
+
+BFQ is a proportional-share I/O scheduler, with some extra
+low-latency capabilities. In addition to cgroups support (blkio or io
+controllers), BFQ's main features are:
+
+- BFQ guarantees a high system and application responsiveness, and a
+  low latency for time-sensitive applications, such as audio or video
+  players;
+- BFQ distributes bandwidth, and not just time, among processes or
+  groups (switching back to time distribution when needed to keep
+  throughput high).
+
+In its default configuration, BFQ privileges latency over
+throughput. So, when needed for achieving a lower latency, BFQ builds
+schedules that may lead to a lower throughput. If your main or only
+goal, for a given device, is to achieve the maximum-possible
+throughput at all times, then do switch off all low-latency heuristics
+for that device, by setting low_latency to 0. See Section 3 for
+details on how to configure BFQ for the desired tradeoff between
+latency and throughput, or on how to maximize throughput.
+
+Like every I/O scheduler, BFQ adds some overhead to per-I/O-request
+processing. To give an idea of this overhead, the total,
+single-lock-protected, per-request processing time of BFQ---i.e., the
+sum of the execution times of the request insertion, dispatch and
+completion hooks---is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
+(dated CPU for notebooks; time measured with simple code
+instrumentation, and using the throughput-sync.sh script of the S
+suite [1], in performance-profiling mode). To put this result into
+context, the total, single-lock-protected, per-request execution time
+of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7
+us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).
+
+Scheduling overhead further limits the maximum IOPS that a CPU can
+process (already limited by the execution of the rest of the I/O
+stack). To give an idea of the limits with BFQ, on slow or average
+CPUs, here are, first, the limits of BFQ for three different CPUs, on,
+respectively, an average laptop, an old desktop, and a cheap embedded
+system, in case full hierarchical support is enabled (i.e.,
+CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
+set (Section 4-2):
+
+- Intel i7-4850HQ: 400 KIOPS
+- AMD A8-3850: 250 KIOPS
+- ARM Cortex-A53 Octa-core: 80 KIOPS
+
+If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
+support is enabled), then the sustainable throughput with BFQ
+decreases, because all blkio.bfq* statistics are created and updated
+(Section 4-2). For BFQ, this leads to the following maximum
+sustainable throughputs, on the same systems as above:
+
+- Intel i7-4850HQ: 310 KIOPS
+- AMD A8-3850: 200 KIOPS
+- ARM Cortex-A53 Octa-core: 56 KIOPS
+
+BFQ works for multi-queue devices too.
+
+.. The table of contents follows. Impatient readers can jump to Section 3.
+
+.. CONTENTS
+
+   1. When may BFQ be useful?
+    1-1 Personal systems
+    1-2 Server systems
+   2. How does BFQ work?
+   3. What are BFQ's tunables and how to properly configure BFQ?
+   4. BFQ group scheduling
+    4-1 Service guarantees provided
+    4-2 Interface
+
+1. When may BFQ be useful?
+==========================
+
+BFQ provides the following benefits on personal and server systems.
+
+1-1 Personal systems
+--------------------
+
+Low latency for interactive applications
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Regardless of the actual background workload, BFQ guarantees that, for
+interactive tasks, the storage device is virtually as responsive as if
+it was idle. For example, even if one or more of the following
+background workloads are being executed:
+
+- one or more large files are being read, written or copied,
+- a tree of source files is being compiled,
+- one or more virtual machines are performing I/O,
+- a software update is in progress,
+- indexing daemons are scanning filesystems and updating their
+  databases,
+
+starting an application or loading a file from within an application
+takes about the same time as if the storage device was idle. As a
+comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
+applications experience high latencies, or even become unresponsive
+until the background workload terminates (also on SSDs).
+
+Low latency for soft real-time applications
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Soft real-time applications, such as audio and video
+players/streamers, also enjoy a low latency and a low drop rate,
+regardless of the background I/O workload. As a consequence, these
+applications suffer almost no glitches due to the background workload.
+
+Higher speed for code-development tasks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If some additional workload happens to be executed in parallel, then
+BFQ executes the I/O-related components of typical code-development
+tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
+NOOP or DEADLINE.
+
+High throughput
+^^^^^^^^^^^^^^^
+
+On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
+up to 150% higher throughput than DEADLINE and NOOP, with all the
+sequential workloads considered in our tests. With random workloads,
+and with all the workloads on flash-based devices, BFQ achieves,
+instead, about the same throughput as the other schedulers.
+
+Strong fairness, bandwidth and delay guarantees
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+BFQ distributes the device throughput, and not just the device time,
+among I/O-bound applications in proportion to their weights, with any
+workload and regardless of the device parameters. From these bandwidth
+guarantees, it is possible to compute tight per-I/O-request delay
+guarantees by a simple formula. If not configured for strict service
+guarantees, BFQ switches to time-based resource sharing (only) for
+applications that would otherwise cause a throughput loss.
+
+1-2 Server systems
+------------------
+
+Most benefits for server systems follow from the same service
+properties as above. In particular, regardless of whether additional,
+possibly heavy workloads are being served, BFQ guarantees:
+
+* audio and video-streaming with zero or very low jitter and drop
+  rate;
+
+* fast retrieval of web pages and embedded objects;
+
+* real-time recording of data in live-dumping applications (e.g.,
+  packet logging);
+
+* responsiveness in local and remote access to a server.
+
+
+2. How does BFQ work?
+=====================
+
+BFQ is a proportional-share I/O scheduler, whose general structure,
+plus a lot of code, are borrowed from CFQ.
+
+- Each process doing I/O on a device is associated with a weight and a
+  `(bfq_)queue`.
+
+- BFQ grants exclusive access to the device, for a while, to one queue
+  (process) at a time, and implements this service model by
+  associating every queue with a budget, measured in number of
+  sectors.
+
+  - After a queue is granted access to the device, the budget of the
+    queue is decremented, on each request dispatch, by the size of the
+    request.
+
+  - The in-service queue is expired, i.e., its service is suspended,
+    only if one of the following events occurs: 1) the queue finishes
+    its budget, 2) the queue empties, 3) a "budget timeout" fires.
+
+    - The budget timeout prevents processes doing random I/O from
+      holding the device for too long and dramatically reducing
+      throughput.
+
+    - Actually, as in CFQ, a queue associated with a process issuing
+      sync requests may not be expired immediately when it empties.
+      Instead, BFQ may idle the device for a short time interval,
+      giving the process the chance to go on being served if it issues
+      a new request in time. Device idling typically boosts the
+      throughput on rotational devices and on non-queueing flash-based
+      devices, if processes do synchronous and sequential I/O. In
+      addition, under BFQ, device idling is also instrumental in
+      guaranteeing the desired throughput fraction to processes
+      issuing sync requests (see the description of the slice_idle
+      tunable in this document, or [1, 2], for more details).
+
+      - With respect to idling for service guarantees, if several
+	processes are competing for the device at the same time, but
+	all processes and groups have the same weight, then BFQ
+	guarantees the expected throughput distribution without ever
+	idling the device. Throughput is thus as high as possible in
+	this common scenario.
+
+      - On flash-based storage with internal queueing of commands
+        (typically NCQ), device idling happens to be always detrimental
+        for throughput. So, with these devices, BFQ performs idling
+        only when strictly needed for service guarantees, i.e., for
+        guaranteeing low latency or fairness. In these cases, overall
+        throughput may be sub-optimal. No solution currently exists to
+        provide both strong service guarantees and optimal throughput
+        on devices with internal queueing.
+
+  - If low-latency mode is enabled (default configuration), BFQ
+    executes some special heuristics to detect interactive and soft
+    real-time applications (e.g., video or audio players/streamers),
+    and to reduce their latency. The most important action taken to
+    achieve this goal is to give to the queues associated with these
+    applications more than their fair share of the device
+    throughput. For brevity, we simply call "weight-raising" the whole
+    set of actions taken by BFQ to privilege these queues. In
+    particular, BFQ provides a milder form of weight-raising for
+    interactive applications, and a stronger form for soft real-time
+    applications.
+
+  - BFQ automatically deactivates idling for queues born in a burst of
+    queue creations. In fact, these queues are usually associated with
+    the processes of applications and services that benefit mostly
+    from a high throughput. Examples are systemd during boot, or git
+    grep.
+
+  - Like CFQ, BFQ merges queues performing interleaved I/O, i.e.,
+    performing random I/O that becomes mostly sequential if
+    merged. Unlike CFQ, BFQ achieves this goal with a more
+    reactive mechanism, called Early Queue Merge (EQM). EQM is so
+    responsive in detecting interleaved I/O (cooperating processes),
+    that it enables BFQ to achieve a high throughput, by queue
+    merging, even for queues for which CFQ needs a different
+    mechanism, preemption, to get a high throughput. As such, EQM is a
+    unified mechanism to achieve a high throughput with interleaved
+    I/O.
+
+  - Queues are scheduled according to a variant of WF2Q+, named
+    B-WF2Q+, and implemented using an augmented rb-tree to preserve an
+    O(log N) overall complexity.  See [2] for more details. B-WF2Q+ is
+    also ready for hierarchical scheduling, details in Section 4.
+
+  - B-WF2Q+ guarantees a tight deviation with respect to an ideal,
+    perfectly fair, and smooth service. In particular, B-WF2Q+
+    guarantees that each queue receives a fraction of the device
+    throughput proportional to its weight, even if the throughput
+    fluctuates, and regardless of: the device parameters, the current
+    workload and the budgets assigned to the queue.
+
+  - The last, budget-independence, property (although probably
+    counterintuitive at first) is definitely beneficial, for
+    the following reasons:
+
+    - First, with any proportional-share scheduler, the maximum
+      deviation with respect to an ideal service is proportional to
+      the maximum budget (slice) assigned to queues. As a consequence,
+      BFQ can keep this deviation tight not only because of the
+      accurate service of B-WF2Q+, but also because BFQ *does not*
+      need to assign a larger budget to a queue to let the queue
+      receive a higher fraction of the device throughput.
+
+    - Second, BFQ is free to choose, for every process (queue), the
+      budget that best fits the needs of the process, or best
+      leverages the I/O pattern of the process. In particular, BFQ
+      updates queue budgets with a simple feedback-loop algorithm that
+      allows a high throughput to be achieved, while still providing
+      tight latency guarantees to time-sensitive applications. When
+      the in-service queue expires, this algorithm computes the next
+      budget of the queue so as to:
+
+      - Let large budgets be eventually assigned to the queues
+	associated with I/O-bound applications performing sequential
+	I/O: in fact, the longer these applications are served once
+	they get access to the device, the higher the throughput is.
+
+      - Let small budgets be eventually assigned to the queues
+	associated with time-sensitive applications (which typically
+	perform sporadic and short I/O), because, the smaller the
+	budget assigned to a queue waiting for service is, the sooner
+	B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).
+
+- If several processes are competing for the device at the same time,
+  but all processes and groups have the same weight, then BFQ
+  guarantees the expected throughput distribution without ever idling
+  the device. It uses preemption instead. Throughput is then much
+  higher in this common scenario.
+
+- ioprio classes are served in strict priority order, i.e.,
+  lower-priority queues are not served as long as there are
+  higher-priority queues.  Among queues in the same class, the
+  bandwidth is distributed in proportion to the weight of each
+  queue. A very thin extra bandwidth is however guaranteed to
+  the Idle class, to prevent it from starving.
+
+
+3. What are BFQ's tunables and how to properly configure BFQ?
+=============================================================
+
+Most BFQ tunables affect service guarantees (basically latency and
+fairness) and throughput. For full details on how to choose the
+desired tradeoff between service guarantees and throughput, see the
+parameters slice_idle, strict_guarantees and low_latency. For details
+on how to maximise throughput, see slice_idle, timeout_sync and
+max_budget. The other performance-related parameters have been
+inherited from, and have been preserved mostly for compatibility with
+CFQ. So far, no performance improvement has been reported after
+changing the latter parameters in BFQ.
+
+In particular, the tunables back_seek_max, back_seek_penalty,
+fifo_expire_async and fifo_expire_sync below are the same as in
+CFQ. Their description is just copied from that for CFQ. Some
+considerations in the description of slice_idle are copied from CFQ
+too.
+
+per-process ioprio and weight
+-----------------------------
+
+Unless the cgroups interface is used (see "4. BFQ group scheduling"),
+weights can be assigned to processes only indirectly, through I/O
+priorities, and according to the relation:
+weight = (IOPRIO_BE_NR - ioprio) * 10.
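+
+As a worked example, and assuming IOPRIO_BE_NR is 8 (its value in
+include/linux/ioprio.h at the time of writing), the relation gives::
+
+   ioprio 0 (highest best-effort priority): weight = (8 - 0) * 10 = 80
+   ioprio 4:                                weight = (8 - 4) * 10 = 40
+   ioprio 7 (lowest best-effort priority):  weight = (8 - 7) * 10 = 10
+
+So a process could, for instance, be moved to the highest weight with
+the ionice utility from util-linux (the PID below is a placeholder)::
+
+   # ionice -c 2 -n 0 -p 1234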
+
+Beware that, if low_latency is set, then BFQ automatically raises the
+weight of the queues associated with interactive and soft real-time
+applications. Unset this tunable if you need/want to control weights.
+
+slice_idle
+----------
+
+This parameter specifies how long BFQ should idle waiting for the next
+I/O request, when certain sync BFQ queues become empty. By default
+slice_idle is a non-zero value. Idling has a double purpose: boosting
+throughput and making sure that the desired throughput distribution is
+respected (see the description of how BFQ works, and, if needed, the
+papers referred to there).
+
+As for throughput, idling can be very helpful on highly seeky media
+like single spindle SATA/SAS disks where we can cut down on overall
+number of seeks and see improved throughput.
+
+Setting slice_idle to 0 will remove all the idling on queues and one
+should see an overall improved throughput on faster storage devices
+like multiple SATA/SAS disks in hardware RAID configuration, as well
+as flash-based storage with internal command queueing (and
+parallelism).
+
+So depending on storage and workload, it might be useful to set
+slice_idle=0.  In general for SATA/SAS disks and software RAID of
+SATA/SAS disks keeping slice_idle enabled should be useful. For any
+configurations where there are multiple spindles behind single LUN
+(Host based hardware RAID controller or for storage arrays), or with
+flash-based fast storage, setting slice_idle=0 might end up in better
+throughput and acceptable latencies.
+
+Idling is however necessary to have service guarantees enforced in
+case of differentiated weights or differentiated I/O-request lengths.
+To see why, suppose that a given BFQ queue A must get several I/O
+requests served for each request served for another queue B. Idling
+ensures that, if A makes a new I/O request slightly after becoming
+empty, then no request of B is dispatched in the middle, and thus A
+does not lose the possibility to get more than one request dispatched
+before the next request of B is dispatched. Note that idling
+guarantees the desired differentiated treatment of queues only in
+terms of I/O-request dispatches. To guarantee that the actual service
+order then corresponds to the dispatch order, the strict_guarantees
+tunable must be set too.
+
+There is an important flipside for idling: apart from the above cases
+where it is beneficial also for throughput, idling can severely impact
+throughput. One important case is random workload. Because of this
+issue, BFQ tends to avoid idling as much as possible, when it is not
+beneficial also for throughput (as detailed in Section 2). As a
+consequence of this behavior, and of further issues described for the
+strict_guarantees tunable, short-term service guarantees may be
+occasionally violated. And, in some cases, these guarantees may be
+more important than guaranteeing maximum throughput. For example, in
+video playing/streaming, a very low drop rate may be more important
+than maximum throughput. In these cases, consider setting the
+strict_guarantees parameter.
+
+slice_idle_us
+-------------
+
+Controls the same tuning parameter as slice_idle, but in microseconds.
+Either tunable can be used to set idling behavior.  Afterwards, the
+other tunable will reflect the newly set value in sysfs.
+
+strict_guarantees
+-----------------
+
+If this parameter is set (default: unset), then BFQ
+
+- always performs idling when the in-service queue becomes empty;
+
+- forces the device to serve one I/O request at a time, by dispatching a
+  new request only if there is no outstanding request.
+
+In the presence of differentiated weights or I/O-request sizes, both
+the above conditions are needed to guarantee that every BFQ queue
+receives its allotted share of the bandwidth. The first condition is
+needed for the reasons explained in the description of the slice_idle
+tunable.  The second condition is needed because all modern storage
+devices reorder internally-queued requests, which may trivially break
+the service guarantees enforced by the I/O scheduler.
+
+Setting strict_guarantees may evidently affect throughput.
+
+back_seek_max
+-------------
+
+This specifies, given in Kbytes, the maximum "distance" for backward seeking.
+The distance is the amount of space from the current head location to the
+sectors that are backward in terms of distance.
+
+This parameter allows the scheduler to anticipate requests in the "backward"
+direction and consider them as being the "next" if they are within this
+distance from the current head location.
+
+back_seek_penalty
+-----------------
+
+This parameter is used to compute the cost of backward seeking. If the
+backward distance of a request is just 1/back_seek_penalty from a "front"
+request, then the seeking cost of two requests is considered equivalent.
+
+So the scheduler will not bias toward one or the other request (otherwise
+the scheduler will bias toward the front request). Default value of
+back_seek_penalty is 2.
+
+fifo_expire_async
+-----------------
+
+This parameter is used to set the timeout of asynchronous requests. Default
+value of this is 248ms.
+
+fifo_expire_sync
+----------------
+
+This parameter is used to set the timeout of synchronous requests. Default
+value of this is 124ms. To favor synchronous requests over asynchronous
+ones, this value should be decreased relative to fifo_expire_async.
+
+low_latency
+-----------
+
+This parameter is used to enable/disable BFQ's low latency mode. By
+default, low latency mode is enabled. If enabled, interactive and soft
+real-time applications are privileged and experience a lower latency,
+as explained in more detail in the description of how BFQ works.
+
+DISABLE this mode if you need full control over bandwidth
+distribution. In fact, if it is enabled, then BFQ automatically
+increases the bandwidth share of privileged applications, as the main
+means to guarantee a lower latency to them.
+
+In addition, as already highlighted at the beginning of this document,
+DISABLE this mode if your only goal is to achieve a high throughput.
+In fact, privileging the I/O of some application over the rest may
+entail a lower throughput. To achieve the highest-possible throughput
+on a non-rotational device, setting slice_idle to 0 may be needed too
+(at the cost of giving up any strong guarantee on fairness and low
+latency).
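+
+As a sketch of such a throughput-oriented setup on a non-rotational
+device (sda is a placeholder device name, and the tunables are assumed
+to be exposed under the usual queue/iosched sysfs directory once BFQ is
+the active scheduler for the device)::
+
+   # echo bfq > /sys/block/sda/queue/scheduler
+   # echo 0 > /sys/block/sda/queue/iosched/low_latency
+   # echo 0 > /sys/block/sda/queue/iosched/slice_idle
+
+With these settings, no strong guarantee on fairness or low latency
+should be expected.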
+
+timeout_sync
+------------
+
+Maximum amount of device time that can be given to a task (queue) once
+it has been selected for service. On devices with costly seeks,
+increasing this time usually increases maximum throughput. On the
+other hand, increasing this time coarsens the granularity of the
+short-term bandwidth and latency guarantees, especially if the
+following parameter (max_budget) is set to zero.
+
+max_budget
+----------
+
+Maximum amount of service, measured in sectors, that can be provided
+to a BFQ queue once it is set in service (of course within the limits
+of the above timeout). According to what is said in the description of
+the algorithm, larger values increase the throughput in proportion to
+the percentage of sequential I/O requests issued. The price of larger
+values is that they coarsen the granularity of short-term bandwidth
+and latency guarantees.
+
+The default value is 0, which enables auto-tuning: BFQ sets max_budget
+to the maximum number of sectors that can be served during
+timeout_sync, according to the estimated peak rate.
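+
+As a rough illustration of auto-tuning, with purely hypothetical
+figures (an estimated peak rate of 200000 sectors/s and a timeout_sync
+of 125 ms), the resulting value would be on the order of::
+
+   max_budget = 200000 sectors/s * 0.125 s = 25000 sectors
+
+The exact computation performed by the kernel may differ; the example
+is only meant to show how the two quantities relate.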
+
+For specific devices, some users have occasionally reported reaching
+a higher throughput by setting max_budget explicitly, i.e., by
+setting max_budget to a higher value than 0. In particular, they have
+set max_budget to higher values than those to which BFQ would have set
+it with auto-tuning. An alternative way to achieve this goal is to
+just increase the value of timeout_sync, leaving max_budget equal to 0.
+
+weights
+-------
+
+Read-only parameter, used to show the weights of the currently active
+BFQ queues.
+
+
+4. Group scheduling with BFQ
+============================
+
+BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
+blkio and io. In particular, BFQ supports weight-based proportional
+share. To activate cgroups support, set CONFIG_BFQ_GROUP_IOSCHED.
+
+4-1 Service guarantees provided
+-------------------------------
+
+With BFQ, proportional share means true proportional share of the
+device bandwidth, according to group weights. For example, a group
+with weight 200 gets twice the bandwidth, and not just twice the time,
+of a group with weight 100.
+
+BFQ supports hierarchies (group trees) of any depth. Bandwidth is
+distributed among groups and processes in the expected way: for each
+group, the children of the group share the whole bandwidth of the
+group in proportion to their weights. In particular, this implies
+that, for each leaf group, every process of the group receives the
+same share of the whole group bandwidth, unless the ioprio of the
+process is modified.
+
+The resource-sharing guarantee for a group may partially or totally
+switch from bandwidth to time, if providing bandwidth guarantees to
+the group lowers the throughput too much. This switch occurs on a
+per-process basis: if a process of a leaf group causes throughput loss
+if served in such a way as to receive its share of the bandwidth, then
+BFQ switches back to just time-based proportional share for that
+process.
+
+4-2 Interface
+-------------
+
+To get proportional sharing of bandwidth with BFQ for a given device,
+BFQ must of course be the active scheduler for that device.
+
+Within each group directory, the names of the files associated with
+BFQ-specific cgroup parameters and stats begin with the "bfq."
+prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
+BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
+parameter to set the weight of a group with BFQ is blkio.bfq.weight
+or io.bfq.weight.
+
+As for cgroups-v1 (blkio controller), the exact set of stat files
+created, and kept up-to-date by bfq, depends on whether
+CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
+the stat files documented in
+Documentation/cgroup-v1/blkio-controller.rst. If, instead,
+CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files::
+
+  blkio.bfq.io_service_bytes
+  blkio.bfq.io_service_bytes_recursive
+  blkio.bfq.io_serviced
+  blkio.bfq.io_serviced_recursive
+
+The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
+throughput sustainable with bfq, because updating the blkio.bfq.*
+stats is rather costly, especially for some of the stats enabled by
+CONFIG_BFQ_CGROUP_DEBUG.
+
+Parameters to set
+-----------------
+
+For each group, there is only the following parameter to set.
+
+weight (namely blkio.bfq.weight or io.bfq.weight): the weight of the
+group inside its parent. Available values: 1..10000 (default 100). The
+linear mapping between ioprio and weights, described at the beginning
+of the tunable section, is still valid, but all weights higher than
+IOPRIO_BE_NR*10 are mapped to ioprio 0.
+
+Recall that, if low_latency is set, then BFQ automatically raises the
+weight of the queues associated with interactive and soft real-time
+applications. Unset this tunable if you need/want to control weights.
+
+
+[1]
+    P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
+    Scheduler", Proceedings of the First Workshop on Mobile System
+    Technologies (MST-2015), May 2015.
+
+    http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf
+
+[2]
+    P. Valente and M. Andreolini, "Improving Application
+    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
+    the 5th Annual International Systems and Storage Conference
+    (SYSTOR '12), June 2012.
+
+    Slightly extended version:
+
+    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf
+
+[3]
+    https://github.com/Algodev-github/S
diff --git a/Documentation/block/bfq-iosched.txt b/Documentation/block/bfq-iosched.txt
deleted file mode 100644
index bbd6eb5bbb07..000000000000
--- a/Documentation/block/bfq-iosched.txt
+++ /dev/null
@@ -1,583 +0,0 @@
-BFQ (Budget Fair Queueing)
-==========================
-
-BFQ is a proportional-share I/O scheduler, with some extra
-low-latency capabilities. In addition to cgroups support (blkio or io
-controllers), BFQ's main features are:
-- BFQ guarantees a high system and application responsiveness, and a
-  low latency for time-sensitive applications, such as audio or video
-  players;
-- BFQ distributes bandwidth, and not just time, among processes or
-  groups (switching back to time distribution when needed to keep
-  throughput high).
-
-In its default configuration, BFQ privileges latency over
-throughput. So, when needed for achieving a lower latency, BFQ builds
-schedules that may lead to a lower throughput. If your main or only
-goal, for a given device, is to achieve the maximum-possible
-throughput at all times, then do switch off all low-latency heuristics
-for that device, by setting low_latency to 0. See Section 3 for
-details on how to configure BFQ for the desired tradeoff between
-latency and throughput, or on how to maximize throughput.
-
-As every I/O scheduler, BFQ adds some overhead to per-I/O-request
-processing. To give an idea of this overhead, the total,
-single-lock-protected, per-request processing time of BFQ---i.e., the
-sum of the execution times of the request insertion, dispatch and
-completion hooks---is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
-(dated CPU for notebooks; time measured with simple code
-instrumentation, and using the throughput-sync.sh script of the S
-suite [1], in performance-profiling mode). To put this result into
-context, the total, single-lock-protected, per-request execution time
-of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7
-us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).
-
-Scheduling overhead further limits the maximum IOPS that a CPU can
-process (already limited by the execution of the rest of the I/O
-stack). To give an idea of the limits with BFQ, on slow or average
-CPUs, here are, first, the limits of BFQ for three different CPUs, on,
-respectively, an average laptop, an old desktop, and a cheap embedded
-system, in case full hierarchical support is enabled (i.e.,
-CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
-set (Section 4-2):
-- Intel i7-4850HQ: 400 KIOPS
-- AMD A8-3850: 250 KIOPS
-- ARM CortexTM-A53 Octa-core: 80 KIOPS
-
-If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
-support is enabled), then the sustainable throughput with BFQ
-decreases, because all blkio.bfq* statistics are created and updated
-(Section 4-2). For BFQ, this leads to the following maximum
-sustainable throughputs, on the same systems as above:
-- Intel i7-4850HQ: 310 KIOPS
-- AMD A8-3850: 200 KIOPS
-- ARM CortexTM-A53 Octa-core: 56 KIOPS
-
-BFQ works for multi-queue devices too.
-
-The table of contents follow. Impatients can just jump to Section 3.
-
-CONTENTS
-
-1. When may BFQ be useful?
- 1-1 Personal systems
- 1-2 Server systems
-2. How does BFQ work?
-3. What are BFQ's tunables and how to properly configure BFQ?
-4. BFQ group scheduling
- 4-1 Service guarantees provided
- 4-2 Interface
-
-1. When may BFQ be useful?
-==========================
-
-BFQ provides the following benefits on personal and server systems.
-
-1-1 Personal systems
---------------------
-
-Low latency for interactive applications
-
-Regardless of the actual background workload, BFQ guarantees that, for
-interactive tasks, the storage device is virtually as responsive as if
-it was idle. For example, even if one or more of the following
-background workloads are being executed:
-- one or more large files are being read, written or copied,
-- a tree of source files is being compiled,
-- one or more virtual machines are performing I/O,
-- a software update is in progress,
-- indexing daemons are scanning filesystems and updating their
-  databases,
-starting an application or loading a file from within an application
-takes about the same time as if the storage device was idle. As a
-comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
-applications experience high latencies, or even become unresponsive
-until the background workload terminates (also on SSDs).
-
-Low latency for soft real-time applications
-
-Also soft real-time applications, such as audio and video
-players/streamers, enjoy a low latency and a low drop rate, regardless
-of the background I/O workload. As a consequence, these applications
-do not suffer from almost any glitch due to the background workload.
-
-Higher speed for code-development tasks
-
-If some additional workload happens to be executed in parallel, then
-BFQ executes the I/O-related components of typical code-development
-tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
-NOOP or DEADLINE.
-
-High throughput
-
-On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
-up to 150% higher throughput than DEADLINE and NOOP, with all the
-sequential workloads considered in our tests. With random workloads,
-and with all the workloads on flash-based devices, BFQ achieves,
-instead, about the same throughput as the other schedulers.
-
-Strong fairness, bandwidth and delay guarantees
-
-BFQ distributes the device throughput, and not just the device time,
-among I/O-bound applications in proportion their weights, with any
-workload and regardless of the device parameters. From these bandwidth
-guarantees, it is possible to compute tight per-I/O-request delay
-guarantees by a simple formula. If not configured for strict service
-guarantees, BFQ switches to time-based resource sharing (only) for
-applications that would otherwise cause a throughput loss.
-
-1-2 Server systems
-------------------
-
-Most benefits for server systems follow from the same service
-properties as above. In particular, regardless of whether additional,
-possibly heavy workloads are being served, BFQ guarantees:
-
-. audio and video-streaming with zero or very low jitter and drop
-  rate;
-
-. fast retrieval of WEB pages and embedded objects;
-
-. real-time recording of data in live-dumping applications (e.g.,
-  packet logging);
-
-. responsiveness in local and remote access to a server.
-
-
-2. How does BFQ work?
-=====================
-
-BFQ is a proportional-share I/O scheduler, whose general structure,
-plus a lot of code, are borrowed from CFQ.
-
-- Each process doing I/O on a device is associated with a weight and a
-  (bfq_)queue.
-
-- BFQ grants exclusive access to the device, for a while, to one queue
-  (process) at a time, and implements this service model by
-  associating every queue with a budget, measured in number of
-  sectors.
-
-  - After a queue is granted access to the device, the budget of the
-    queue is decremented, on each request dispatch, by the size of the
-    request.
-
-  - The in-service queue is expired, i.e., its service is suspended,
-    only if one of the following events occurs: 1) the queue finishes
-    its budget, 2) the queue empties, 3) a "budget timeout" fires.
-
-    - The budget timeout prevents processes doing random I/O from
-      holding the device for too long and dramatically reducing
-      throughput.
-
-    - Actually, as in CFQ, a queue associated with a process issuing
-      sync requests may not be expired immediately when it empties. In
-      contrast, BFQ may idle the device for a short time interval,
-      giving the process the chance to go on being served if it issues
-      a new request in time. Device idling typically boosts the
-      throughput on rotational devices and on non-queueing flash-based
-      devices, if processes do synchronous and sequential I/O. In
-      addition, under BFQ, device idling is also instrumental in
-      guaranteeing the desired throughput fraction to processes
-      issuing sync requests (see the description of the slice_idle
-      tunable in this document, or [1, 2], for more details).
-
-      - With respect to idling for service guarantees, if several
-	processes are competing for the device at the same time, but
-	all processes and groups have the same weight, then BFQ
-	guarantees the expected throughput distribution without ever
-	idling the device. Throughput is thus as high as possible in
-	this common scenario.
-
-     - On flash-based storage with internal queueing of commands
-       (typically NCQ), device idling happens to be always detrimental
-       for throughput. So, with these devices, BFQ performs idling
-       only when strictly needed for service guarantees, i.e., for
-       guaranteeing low latency or fairness. In these cases, overall
-       throughput may be sub-optimal. No solution currently exists to
-       provide both strong service guarantees and optimal throughput
-       on devices with internal queueing.
-
-  - If low-latency mode is enabled (default configuration), BFQ
-    executes some special heuristics to detect interactive and soft
-    real-time applications (e.g., video or audio players/streamers),
-    and to reduce their latency. The most important action taken to
-    achieve this goal is to give to the queues associated with these
-    applications more than their fair share of the device
-    throughput. For brevity, we call just "weight-raising" the whole
-    sets of actions taken by BFQ to privilege these queues. In
-    particular, BFQ provides a milder form of weight-raising for
-    interactive applications, and a stronger form for soft real-time
-    applications.
-
-  - BFQ automatically deactivates idling for queues born in a burst of
-    queue creations. In fact, these queues are usually associated with
-    the processes of applications and services that benefit mostly
-    from a high throughput. Examples are systemd during boot, or git
-    grep.
-
-  - As CFQ, BFQ merges queues performing interleaved I/O, i.e.,
-    performing random I/O that becomes mostly sequential if
-    merged. Differently from CFQ, BFQ achieves this goal with a more
-    reactive mechanism, called Early Queue Merge (EQM). EQM is so
-    responsive in detecting interleaved I/O (cooperating processes),
-    that it enables BFQ to achieve a high throughput, by queue
-    merging, even for queues for which CFQ needs a different
-    mechanism, preemption, to get a high throughput. As such EQM is a
-    unified mechanism to achieve a high throughput with interleaved
-    I/O.
-
-  - Queues are scheduled according to a variant of WF2Q+, named
-    B-WF2Q+, and implemented using an augmented rb-tree to preserve an
-    O(log N) overall complexity.  See [2] for more details. B-WF2Q+ is
-    also ready for hierarchical scheduling, details in Section 4.
-
-  - B-WF2Q+ guarantees a tight deviation with respect to an ideal,
-    perfectly fair, and smooth service. In particular, B-WF2Q+
-    guarantees that each queue receives a fraction of the device
-    throughput proportional to its weight, even if the throughput
-    fluctuates, and regardless of: the device parameters, the current
-    workload and the budgets assigned to the queue.
-
-  - The last, budget-independence, property (although probably
-    counterintuitive in the first place) is definitely beneficial, for
-    the following reasons:
-
-    - First, with any proportional-share scheduler, the maximum
-      deviation with respect to an ideal service is proportional to
-      the maximum budget (slice) assigned to queues. As a consequence,
-      BFQ can keep this deviation tight not only because of the
-      accurate service of B-WF2Q+, but also because BFQ *does not*
-      need to assign a larger budget to a queue to let the queue
-      receive a higher fraction of the device throughput.
-
-    - Second, BFQ is free to choose, for every process (queue), the
-      budget that best fits the needs of the process, or best
-      leverages the I/O pattern of the process. In particular, BFQ
-      updates queue budgets with a simple feedback-loop algorithm that
-      allows a high throughput to be achieved, while still providing
-      tight latency guarantees to time-sensitive applications. When
-      the in-service queue expires, this algorithm computes the next
-      budget of the queue so as to:
-
-      - Let large budgets be eventually assigned to the queues
-	associated with I/O-bound applications performing sequential
-	I/O: in fact, the longer these applications are served once
-	got access to the device, the higher the throughput is.
-
-      - Let small budgets be eventually assigned to the queues
-	associated with time-sensitive applications (which typically
-	perform sporadic and short I/O), because, the smaller the
-	budget assigned to a queue waiting for service is, the sooner
-	B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).
-
-- If several processes are competing for the device at the same time,
-  but all processes and groups have the same weight, then BFQ
-  guarantees the expected throughput distribution without ever idling
-  the device. It uses preemption instead. Throughput is then much
-  higher in this common scenario.
-
-- ioprio classes are served in strict priority order, i.e.,
-  lower-priority queues are not served as long as there are
-  higher-priority queues.  Among queues in the same class, the
-  bandwidth is distributed in proportion to the weight of each
-  queue. A very thin extra bandwidth is however guaranteed to
-  the Idle class, to prevent it from starving.
-
-
-3. What are BFQ's tunables and how to properly configure BFQ?
-=============================================================
-
-Most BFQ tunables affect service guarantees (basically latency and
-fairness) and throughput. For full details on how to choose the
-desired tradeoff between service guarantees and throughput, see the
-parameters slice_idle, strict_guarantees and low_latency. For details
-on how to maximise throughput, see slice_idle, timeout_sync and
-max_budget. The other performance-related parameters have been
-inherited from, and have been preserved mostly for compatibility with
-CFQ. So far, no performance improvement has been reported after
-changing the latter parameters in BFQ.
-
-In particular, the tunables back_seek-max, back_seek_penalty,
-fifo_expire_async and fifo_expire_sync below are the same as in
-CFQ. Their description is just copied from that for CFQ. Some
-considerations in the description of slice_idle are copied from CFQ
-too.
-
-per-process ioprio and weight
------------------------------
-
-Unless the cgroups interface is used (see "4. BFQ group scheduling"),
-weights can be assigned to processes only indirectly, through I/O
-priorities, and according to the relation:
-weight = (IOPRIO_BE_NR - ioprio) * 10.
-
-Beware that, if low-latency is set, then BFQ automatically raises the
-weight of the queues associated with interactive and soft real-time
-applications. Unset this tunable if you need/want to control weights.
-
-slice_idle
-----------
-
-This parameter specifies how long BFQ should idle for next I/O
-request, when certain sync BFQ queues become empty. By default
-slice_idle is a non-zero value. Idling has a double purpose: boosting
-throughput and making sure that the desired throughput distribution is
-respected (see the description of how BFQ works, and, if needed, the
-papers referred there).
-
-As for throughput, idling can be very helpful on highly seeky media
-like single spindle SATA/SAS disks where we can cut down on overall
-number of seeks and see improved throughput.
-
-Setting slice_idle to 0 will remove all the idling on queues and one
-should see an overall improved throughput on faster storage devices
-like multiple SATA/SAS disks in hardware RAID configuration, as well
-as flash-based storage with internal command queueing (and
-parallelism).
-
-So depending on storage and workload, it might be useful to set
-slice_idle=0.  In general for SATA/SAS disks and software RAID of
-SATA/SAS disks keeping slice_idle enabled should be useful. For any
-configurations where there are multiple spindles behind single LUN
-(Host based hardware RAID controller or for storage arrays), or with
-flash-based fast storage, setting slice_idle=0 might end up in better
-throughput and acceptable latencies.
-
-Idling is however necessary to have service guarantees enforced in
-case of differentiated weights or differentiated I/O-request lengths.
-To see why, suppose that a given BFQ queue A must get several I/O
-requests served for each request served for another queue B. Idling
-ensures that, if A makes a new I/O request slightly after becoming
-empty, then no request of B is dispatched in the middle, and thus A
-does not lose the possibility to get more than one request dispatched
-before the next request of B is dispatched. Note that idling
-guarantees the desired differentiated treatment of queues only in
-terms of I/O-request dispatches. To guarantee that the actual service
-order then corresponds to the dispatch order, the strict_guarantees
-tunable must be set too.
-
-There is an important flipside for idling: apart from the above cases
-where it is beneficial also for throughput, idling can severely impact
-throughput. One important case is random workload. Because of this
-issue, BFQ tends to avoid idling as much as possible, when it is not
-beneficial also for throughput (as detailed in Section 2). As a
-consequence of this behavior, and of further issues described for the
-strict_guarantees tunable, short-term service guarantees may be
-occasionally violated. And, in some cases, these guarantees may be
-more important than guaranteeing maximum throughput. For example, in
-video playing/streaming, a very low drop rate may be more important
-than maximum throughput. In these cases, consider setting the
-strict_guarantees parameter.
-
-slice_idle_us
--------------
-
-Controls the same tuning parameter as slice_idle, but in microseconds.
-Either tunable can be used to set idling behavior.  Afterwards, the
-other tunable will reflect the newly set value in sysfs.
-
-strict_guarantees
------------------
-
-If this parameter is set (default: unset), then BFQ
-
-- always performs idling when the in-service queue becomes empty;
-
-- forces the device to serve one I/O request at a time, by dispatching a
-  new request only if there is no outstanding request.
-
-In the presence of differentiated weights or I/O-request sizes, both
-the above conditions are needed to guarantee that every BFQ queue
-receives its allotted share of the bandwidth. The first condition is
-needed for the reasons explained in the description of the slice_idle
-tunable.  The second condition is needed because all modern storage
-devices reorder internally-queued requests, which may trivially break
-the service guarantees enforced by the I/O scheduler.
-
-Setting strict_guarantees may evidently affect throughput.
-
-back_seek_max
--------------
-
-This specifies, given in Kbytes, the maximum "distance" for backward seeking.
-The distance is the amount of space from the current head location to the
-sectors that are backward in terms of distance.
-
-This parameter allows the scheduler to anticipate requests in the "backward"
-direction and consider them as being the "next" if they are within this
-distance from the current head location.
-
-back_seek_penalty
------------------
-
-This parameter is used to compute the cost of backward seeking. If the
-backward distance of request is just 1/back_seek_penalty from a "front"
-request, then the seeking cost of two requests is considered equivalent.
-
-So scheduler will not bias toward one or the other request (otherwise scheduler
-will bias toward front request). Default value of back_seek_penalty is 2.
-
-fifo_expire_async
------------------
-
-This parameter is used to set the timeout of asynchronous requests. Default
-value of this is 248ms.
-
-fifo_expire_sync
-----------------
-
-This parameter is used to set the timeout of synchronous requests. Default
-value of this is 124ms. In case to favor synchronous requests over asynchronous
-one, this value should be decreased relative to fifo_expire_async.
-
-low_latency
------------
-
-This parameter is used to enable/disable BFQ's low latency mode. By
-default, low latency mode is enabled. If enabled, interactive and soft
-real-time applications are privileged and experience a lower latency,
-as explained in more detail in the description of how BFQ works.
-
-DISABLE this mode if you need full control on bandwidth
-distribution. In fact, if it is enabled, then BFQ automatically
-increases the bandwidth share of privileged applications, as the main
-means to guarantee a lower latency to them.
-
-In addition, as already highlighted at the beginning of this document,
-DISABLE this mode if your only goal is to achieve a high throughput.
-In fact, privileging the I/O of some application over the rest may
-entail a lower throughput. To achieve the highest-possible throughput
-on a non-rotational device, setting slice_idle to 0 may be needed too
-(at the cost of giving up any strong guarantee on fairness and low
-latency).
-
-timeout_sync
-------------
-
-Maximum amount of device time that can be given to a task (queue) once
-it has been selected for service. On devices with costly seeks,
-increasing this time usually increases maximum throughput. On the
-opposite end, increasing this time coarsens the granularity of the
-short-term bandwidth and latency guarantees, especially if the
-following parameter is set to zero.
-
-max_budget
-----------
-
-Maximum amount of service, measured in sectors, that can be provided
-to a BFQ queue once it is set in service (of course within the limits
-of the above timeout). According to what said in the description of
-the algorithm, larger values increase the throughput in proportion to
-the percentage of sequential I/O requests issued. The price of larger
-values is that they coarsen the granularity of short-term bandwidth
-and latency guarantees.
-
-The default value is 0, which enables auto-tuning: BFQ sets max_budget
-to the maximum number of sectors that can be served during
-timeout_sync, according to the estimated peak rate.
-
-For specific devices, some users have occasionally reported to have
-reached a higher throughput by setting max_budget explicitly, i.e., by
-setting max_budget to a higher value than 0. In particular, they have
-set max_budget to higher values than those to which BFQ would have set
-it with auto-tuning. An alternative way to achieve this goal is to
-just increase the value of timeout_sync, leaving max_budget equal to 0.
-
-weights
--------
-
-Read-only parameter, used to show the weights of the currently active
-BFQ queues.
-
-
-4. Group scheduling with BFQ
-============================
-
-BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
-blkio and io. In particular, BFQ supports weight-based proportional
-share. To activate cgroups support, set BFQ_GROUP_IOSCHED.
-
-4-1 Service guarantees provided
--------------------------------
-
-With BFQ, proportional share means true proportional share of the
-device bandwidth, according to group weights. For example, a group
-with weight 200 gets twice the bandwidth, and not just twice the time,
-of a group with weight 100.
-
-BFQ supports hierarchies (group trees) of any depth. Bandwidth is
-distributed among groups and processes in the expected way: for each
-group, the children of the group share the whole bandwidth of the
-group in proportion to their weights. In particular, this implies
-that, for each leaf group, every process of the group receives the
-same share of the whole group bandwidth, unless the ioprio of the
-process is modified.
-
-The resource-sharing guarantee for a group may partially or totally
-switch from bandwidth to time, if providing bandwidth guarantees to
-the group lowers the throughput too much. This switch occurs on a
-per-process basis: if a process of a leaf group causes throughput loss
-if served in such a way to receive its share of the bandwidth, then
-BFQ switches back to just time-based proportional share for that
-process.
-
-4-2 Interface
--------------
-
-To get proportional sharing of bandwidth with BFQ for a given device,
-BFQ must of course be the active scheduler for that device.
-
-Within each group directory, the names of the files associated with
-BFQ-specific cgroup parameters and stats begin with the "bfq."
-prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
-BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
-parameter to set the weight of a group with BFQ is blkio.bfq.weight
-or io.bfq.weight.
-
-As for cgroups-v1 (blkio controller), the exact set of stat files
-created, and kept up-to-date by bfq, depends on whether
-CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
-the stat files documented in
-Documentation/cgroup-v1/blkio-controller.rst. If, instead,
-CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files
-blkio.bfq.io_service_bytes
-blkio.bfq.io_service_bytes_recursive
-blkio.bfq.io_serviced
-blkio.bfq.io_serviced_recursive
-
-The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
-throughput sustainable with bfq, because updating the blkio.bfq.*
-stats is rather costly, especially for some of the stats enabled by
-CONFIG_BFQ_CGROUP_DEBUG.
-
-Parameters to set
------------------
-
-For each group, there is only the following parameter to set.
-
-weight (namely blkio.bfq.weight or io.bfq-weight): the weight of the
-group inside its parent. Available values: 1..10000 (default 100). The
-linear mapping between ioprio and weights, described at the beginning
-of the tunable section, is still valid, but all weights higher than
-IOPRIO_BE_NR*10 are mapped to ioprio 0.
-
-Recall that, if low-latency is set, then BFQ automatically raises the
-weight of the queues associated with interactive and soft real-time
-applications. Unset this tunable if you need/want to control weights.
-
-
-[1] P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
-    Scheduler", Proceedings of the First Workshop on Mobile System
-    Technologies (MST-2015), May 2015.
-    http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf
-
-[2] P. Valente and M. Andreolini, "Improving Application
-    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
-    the 5th Annual International Systems and Storage Conference
-    (SYSTOR '12), June 2012.
-    Slightly extended version:
-    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-
-							results.pdf
-
-[3] https://github.com/Algodev-github/S
diff --git a/Documentation/block/biodoc.rst b/Documentation/block/biodoc.rst
new file mode 100644
index 000000000000..d6e30b680405
--- /dev/null
+++ b/Documentation/block/biodoc.rst
@@ -0,0 +1,1168 @@
+=====================================================
+Notes on the Generic Block Layer Rewrite in Linux 2.5
+=====================================================
+
+.. note::
+
+	It seems that there is a lot of outdated stuff here. This seems
+	to be written somewhat as a task list. Yet, eventually, something
+	here might still be useful.
+
+Notes Written on Jan 15, 2002:
+	- Jens Axboe <jens.axboe@oracle.com>
+	- Suparna Bhattacharya <suparna@in.ibm.com>
+
+Last Updated May 2, 2002
+
+September 2003: Updated I/O Scheduler portions
+	- Nick Piggin <npiggin@kernel.dk>
+
+Introduction
+============
+
+These are some notes describing some aspects of the 2.5 block layer in the
+context of the bio rewrite. The idea is to bring out some of the key
+changes and a glimpse of the rationale behind those changes.
+
+Please mail corrections & suggestions to suparna@in.ibm.com.
+
+Credits
+=======
+
+2.5 bio rewrite:
+	- Jens Axboe <jens.axboe@oracle.com>
+
+Many aspects of the generic block layer redesign were driven by and evolved
+over discussions, prior patches and the collective experience of several
+people. See sections 8 and 9 for a list of some related references.
+
+The following people helped with review comments and inputs for this
+document:
+
+	- Christoph Hellwig <hch@infradead.org>
+	- Arjan van de Ven <arjanv@redhat.com>
+	- Randy Dunlap <rdunlap@xenotime.net>
+	- Andre Hedrick <andre@linux-ide.org>
+
+The following people helped with fixes/contributions to the bio patches
+while it was still work-in-progress:
+
+	- David S. Miller <davem@redhat.com>
+
+
+.. Description of Contents:
+
+   1. Scope for tuning of logic to various needs
+     1.1 Tuning based on device or low level driver capabilities
+	- Per-queue parameters
+	- Highmem I/O support
+	- I/O scheduler modularization
+     1.2 Tuning based on high level requirements/capabilities
+	1.2.1 Request Priority/Latency
+     1.3 Direct access/bypass to lower layers for diagnostics and special
+	 device operations
+	1.3.1 Pre-built commands
+   2. New flexible and generic but minimalist i/o structure or descriptor
+      (instead of using buffer heads at the i/o layer)
+     2.1 Requirements/Goals addressed
+     2.2 The bio struct in detail (multi-page io unit)
+     2.3 Changes in the request structure
+   3. Using bios
+     3.1 Setup/teardown (allocation, splitting)
+     3.2 Generic bio helper routines
+       3.2.1 Traversing segments and completion units in a request
+       3.2.2 Setting up DMA scatterlists
+       3.2.3 I/O completion
+       3.2.4 Implications for drivers that do not interpret bios (don't handle
+	  multiple segments)
+     3.3 I/O submission
+   4. The I/O scheduler
+   5. Scalability related changes
+     5.1 Granular locking: Removal of io_request_lock
+     5.2 Prepare for transition to 64 bit sector_t
+   6. Other Changes/Implications
+     6.1 Partition re-mapping handled by the generic block layer
+   7. A few tips on migration of older drivers
+   8. A list of prior/related/impacted patches/ideas
+   9. Other References/Discussion Threads
+
+
+Bio Notes
+=========
+
+Let us discuss the changes in the context of how some overall goals for the
+block layer are addressed.
+
+1. Scope for tuning the generic logic to satisfy various requirements
+=====================================================================
+
+The block layer design supports adaptable abstractions to handle common
+processing with the ability to tune the logic to an appropriate extent
+depending on the nature of the device and the requirements of the caller.
+One of the objectives of the rewrite was to increase the degree of tunability
+and to enable higher level code to utilize underlying device/driver
+capabilities to the maximum extent for better i/o performance. This is
+important especially in the light of ever improving hardware capabilities
+and application/middleware software designed to take advantage of these
+capabilities.
+
+1.1 Tuning based on low level device / driver capabilities
+----------------------------------------------------------
+
+Sophisticated devices with large built-in caches, intelligent i/o scheduling
+optimizations, high memory DMA support, etc. may find some of the
+generic processing an overhead, while for less capable devices the
+generic functionality is essential for performance or correctness reasons.
+Knowledge of some of the capabilities or parameters of the device should be
+used at the generic block layer to take the right decisions on
+behalf of the driver.
+
+How is this achieved?
+
+Tuning at a per-queue level:
+
+i. Per-queue limits/values exported to the generic layer by the driver
+
+Various parameters that the generic i/o scheduler logic uses are set at
+a per-queue level (e.g. maximum request size, maximum number of segments in
+a scatter-gather list, logical block size).
+
+Some parameters that were earlier available as global arrays indexed by
+major/minor are now directly associated with the queue. Some of these may
+move into the block device structure in the future. Some characteristics
+have been incorporated into a queue flags field rather than separate fields
+in themselves.  There are blk_queue_xxx functions to set the parameters,
+rather than updating the fields directly.
+
+Some new queue property settings:
+
+	blk_queue_bounce_limit(q, u64 dma_address)
+		Enable I/O to highmem pages, dma_address being the
+		limit. No highmem default.
+
+	blk_queue_max_sectors(q, max_sectors)
+		Sets two variables that limit the size of the request.
+
+		- The request queue's max_sectors, which is a soft size in
+		  units of 512 byte sectors, and could be dynamically varied
+		  by the core kernel.
+
+		- The request queue's max_hw_sectors, which is a hard limit
+		  and reflects the maximum size request a driver can handle
+		  in units of 512 byte sectors.
+
+		The default for both max_sectors and max_hw_sectors is
+		255. The upper limit of max_sectors is 1024.
+
+	blk_queue_max_phys_segments(q, max_segments)
+		Maximum physical segments you can handle in a request. 128
+		default (driver limit). (See 3.2.2)
+
+	blk_queue_max_hw_segments(q, max_segments)
+		Maximum dma segments the hardware can handle in a request. 128
+		default (host adapter limit, after dma remapping).
+		(See 3.2.2)
+
+	blk_queue_max_segment_size(q, max_seg_size)
+		Maximum size of a clustered segment, 64kB default.
+
+	blk_queue_logical_block_size(q, logical_block_size)
+		Lowest possible sector size that the hardware can operate
+		on, 512 bytes default.
+
+New queue flags:
+
+	QUEUE_FLAG_CLUSTER (see 3.2.2)
+	QUEUE_FLAG_QUEUED (see 3.2.4)
+
+
+ii. High-mem i/o capabilities are now considered the default
+
+The generic bounce buffer logic, present in 2.4, where the block layer would
+by default copyin/out i/o requests on high-memory buffers to low-memory buffers
+assuming that the driver wouldn't be able to handle it directly, has been
+changed in 2.5. The bounce logic is now applied only for memory ranges
+for which the device cannot handle i/o. A driver can specify this by
+setting the queue bounce limit for the request queue for the device
+(blk_queue_bounce_limit()). This avoids the inefficiencies of the copyin/out
+where a device is capable of handling high memory i/o.
+
+In order to enable high-memory i/o where the device is capable of supporting
+it, the pci dma mapping routines and associated data structures have now been
+modified to accomplish a direct page -> bus translation, without requiring
+a virtual address mapping (unlike the earlier scheme of virtual address
+-> bus translation). So this works uniformly for high-memory pages (which
+do not have a corresponding kernel virtual address space mapping) and
+low-memory pages.
+
+Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion
+on PCI high mem DMA aspects and mapping of scatter gather lists, and support
+for 64 bit PCI.
+
+Special handling is required only for cases where i/o needs to happen on
+pages at physical memory addresses beyond what the device can support. In these
+cases, a bounce bio representing a buffer from the supported memory range
+is used for performing the i/o with copyin/copyout as needed depending on
+the type of the operation.  For example, in case of a read operation, the
+data read has to be copied to the original buffer on i/o completion, so a
+callback routine is set up to do this, while for write, the data is copied
+from the original buffer to the bounce buffer prior to issuing the
+operation. Since an original buffer may be in a high memory area that's not
+mapped in kernel virtual addr, a kmap operation may be required for
+performing the copy, and special care may be needed in the completion path
+as it may not be in irq context. Special care is also required (by way of
+GFP flags) when allocating bounce buffers, to avoid certain highmem
+deadlock possibilities.
+
+It is also possible that a bounce buffer may be allocated from high-memory
+area that's not mapped in kernel virtual addr, but within the range that the
+device can use directly; so the bounce page may need to be kmapped during
+copy operations. [Note: This does not hold in the current implementation,
+though]
+
+There are some situations when pages from high memory may need to
+be kmapped, even if bounce buffers are not necessary. For example a device
+may need to abort DMA operations and revert to PIO for the transfer, in
+which case a virtual mapping of the page is required. For SCSI it is also
+done in some scenarios where the low level driver cannot be trusted to
+handle a single sg entry correctly. The driver is expected to perform the
+kmaps as needed on such occasions as appropriate. A driver could also use
+the blk_queue_bounce() routine on its own to bounce highmem i/o to low
+memory for specific requests if so desired.
+
+iii. The i/o scheduler algorithm itself can be replaced/set as appropriate
+
+As in 2.4, it is possible to plug in a brand new i/o scheduler for a particular
+queue or pick from (copy) existing generic schedulers and replace/override
+certain portions of it. The 2.5 rewrite provides improved modularization
+of the i/o scheduler. There are more pluggable callbacks, e.g. for init,
+add request, extract request, which makes it possible to abstract specific
+i/o scheduling algorithm aspects and details outside of the generic loop.
+It also makes it possible to completely hide the implementation details of
+the i/o scheduler from block drivers.
+
+I/O scheduler wrappers are to be used instead of accessing the queue directly.
+See section 4 (The I/O scheduler) for details.
+
+1.2 Tuning Based on High level code capabilities
+------------------------------------------------
+
+i. Application capabilities for raw i/o
+
+This comes from some of the high-performance database/middleware
+requirements where an application prefers to make its own i/o scheduling
+decisions based on an understanding of the access patterns and i/o
+characteristics.
+
+ii. High performance filesystems or other higher level kernel code's
+capabilities
+
+Kernel components like filesystems could also take their own i/o scheduling
+decisions for optimizing performance. Journalling filesystems may need
+some control over i/o ordering.
+
+What kind of support exists at the generic block layer for this?
+
+The flags and rw fields in the bio structure can be used for some tuning
+from above, e.g. indicating that an i/o is just a readahead request, or priority
+settings (currently unused). As far as user applications are concerned they
+would need an additional mechanism either via open flags or ioctls, or some
+other upper level mechanism to communicate such settings to block.
+
+1.2.1 Request Priority/Latency
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Todo/Under discussion::
+
+  Arjan's proposed request priority scheme allows higher levels some broad
+  control (high/med/low) over the priority  of an i/o request vs other pending
+  requests in the queue. For example it allows reads for bringing in an
+  executable page on demand to be given a higher priority over pending write
+  requests which haven't aged too much on the queue. Potentially this priority
+  could even be exposed to applications in some manner, providing higher level
+  tunability. Time based aging avoids starvation of lower priority
+  requests. Some bits in the bi_opf flags field in the bio structure are
+  intended to be used for this priority information.
+
+
+1.3 Direct Access to Low level Device/Driver Capabilities (Bypass mode)
+-----------------------------------------------------------------------
+
+(e.g. Diagnostics, Systems Management)
+
+There are situations where high-level code needs to have direct access to
+the low level device capabilities or requires the ability to issue commands
+to the device bypassing some of the intermediate i/o layers.
+These could, for example, be special control commands issued through ioctl
+interfaces, or could be raw read/write commands that stress the drive's
+capabilities for certain kinds of fitness tests. Having direct interfaces at
+multiple levels without having to pass through upper layers makes
+it possible to perform bottom up validation of the i/o path, layer by
+layer, starting from the media.
+
+The normal i/o submission interfaces, e.g. submit_bio, could be bypassed
+for specially crafted requests which such ioctl or diagnostics
+interfaces would typically use, and the elevator add_request routine
+can instead be used to directly insert such requests in the queue or preferably
+the blk_do_rq routine can be used to place the request on the queue and
+wait for completion. Alternatively, sometimes the caller might just
+invoke a lower level driver specific interface with the request as a
+parameter.
+
+If the request is a means for passing on special information associated with
+the command, then such information is associated with the request->special
+field (rather than misuse the request->buffer field which is meant for the
+request data buffer's virtual mapping).
+
+For passing request data, the caller must build up a bio descriptor
+representing the concerned memory buffer if the underlying driver interprets
+bio segments or uses the block layer end*request* functions for i/o
+completion. Alternatively one could directly use the request->buffer field to
+specify the virtual address of the buffer, if the driver expects buffer
+addresses passed in this way and ignores bio entries for the request type
+involved. In the latter case, the driver would modify and manage the
+request->buffer, request->sector and request->nr_sectors or
+request->current_nr_sectors fields itself rather than using the block layer
+end_request or end_that_request_first completion interfaces.
+(See 2.3 or Documentation/block/request.rst for a brief explanation of
+the request structure fields)
+
+::
+
+  [TBD: end_that_request_last should be usable even in this case;
+  Perhaps an end_that_direct_request_first routine could be implemented to make
+  handling direct requests easier for such drivers; Also for drivers that
+  expect bios, a helper function could be provided for setting up a bio
+  corresponding to a data buffer]
+
+  <JENS: I dont understand the above, why is end_that_request_first() not
+  usable? Or _last for that matter. I must be missing something>
+
+  <SUP: What I meant here was that if the request doesn't have a bio, then
+   end_that_request_first doesn't modify nr_sectors or current_nr_sectors,
+   and hence can't be used for advancing request state settings on the
+   completion of partial transfers. The driver has to modify these fields
+   directly by hand.
+   This is because end_that_request_first only iterates over the bio list,
+   and always returns 0 if there are none associated with the request.
+   _last works OK in this case, and is not a problem, as I mentioned earlier
+  >
+
+1.3.1 Pre-built Commands
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+A request can be created with a pre-built custom command to be sent directly
+to the device. The cmd block in the request structure has room for filling
+in the command bytes. (i.e. rq->cmd is now 16 bytes in size, and meant for
+command pre-building, and the type of the request is now indicated
+through rq->flags instead of via rq->cmd.)
+
+The request structure flags can be set up to indicate the type of request
+in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC:
+packet command issued via blk_do_rq, REQ_SPECIAL: special request).
+
+It can help to pre-build device commands for requests in advance.
+Drivers can now specify a request prepare function (q->prep_rq_fn) that the
+block layer would invoke to pre-build device commands for a given request,
+or perform other preparatory processing for the request. This routine is
+called by elv_next_request(), i.e. typically just before servicing a request.
+(The prepare function would not be called for requests that have RQF_DONTPREP
+enabled.)
+
+Aside:
+  Pre-building could possibly even be done early, i.e. before placing the
+  request on the queue, rather than construct the command on the fly in the
+  driver while servicing the request queue when it may affect latencies in
+  interrupt context or responsiveness in general. One way to add early
+  pre-building would be to do it whenever we fail to merge on a request.
+  Now REQ_NOMERGE is set in the request flags to skip this one in the future,
+  which means that it will not change before we feed it to the device. So
+  the pre-builder hook can be invoked there.
+
+
+2. Flexible and generic but minimalist i/o structure/descriptor
+===============================================================
+
+2.1 Reason for a new structure and requirements addressed
+---------------------------------------------------------
+
+Prior to 2.5, buffer heads were used as the unit of i/o at the generic block
+layer, and the low level request structure was associated with a chain of
+buffer heads for a contiguous i/o request. This led to certain inefficiencies
+when it came to large i/o requests and readv/writev style operations, as it
+forced such requests to be broken up into small chunks before being passed
+on to the generic block layer, only to be merged by the i/o scheduler
+when the underlying device was capable of handling the i/o in one shot.
+Also, using the buffer head as an i/o structure for i/os that didn't originate
+from the buffer cache unnecessarily added to the weight of the descriptors
+which were generated for each such chunk.
+
+The following were some of the goals and expectations considered in the
+redesign of the block i/o data structure in 2.5.
+
+1.  Should be appropriate as a descriptor for both raw and buffered i/o  -
+    avoid cache related fields which are irrelevant in the direct/page i/o path,
+    or filesystem block size alignment restrictions which may not be relevant
+    for raw i/o.
+2.  Ability to represent high-memory buffers (which do not have a virtual
+    address mapping in kernel address space).
+3.  Ability to represent large i/os w/o unnecessarily breaking them up (i.e.
+    greater than PAGE_SIZE chunks in one shot).
+4.  At the same time, ability to retain independent identity of i/os from
+    different sources or i/o units requiring individual completion (e.g. for
+    latency reasons)
+5.  Ability to represent an i/o involving multiple physical memory segments
+    (including non-page aligned page fragments, as specified via readv/writev)
+    without unnecessarily breaking it up, if the underlying device is capable of
+    handling it.
+6.  Preferably should be based on a memory descriptor structure that can be
+    passed around different types of subsystems or layers, maybe even
+    networking, without duplication or extra copies of data/descriptor fields
+    themselves in the process
+7.  Ability to handle the possibility of splits/merges as the structure passes
+    through layered drivers (lvm, md, evms), with minimal overhead.
+
+The solution was to define a new structure (bio) for the block layer,
+instead of using the buffer head structure (bh) directly, the idea being
+avoidance of some associated baggage and limitations. The bio structure
+is uniformly used for all i/o at the block layer; it forms a part of the
+bh structure for buffered i/o, and in the case of raw/direct i/o kiobufs are
+mapped to bio structures.
+
+2.2 The bio struct
+------------------
+
+The bio structure uses a vector representation pointing to an array of tuples
+of <page, offset, len> to describe the i/o buffer, and has various other
+fields describing i/o parameters and state that needs to be maintained for
+performing the i/o.
+
+Notice that this representation means that a bio has no virtual address
+mapping at all (unlike buffer heads).
+
+::
+
+  struct bio_vec {
+       struct page     *bv_page;
+       unsigned short  bv_len;
+       unsigned short  bv_offset;
+  };
+
+  /*
+   * main unit of I/O for the block layer and lower layers (ie drivers)
+   */
+  struct bio {
+       struct bio          *bi_next;    /* request queue link */
+       struct block_device *bi_bdev;	/* target device */
+       unsigned long       bi_flags;    /* status, command, etc */
+       unsigned long       bi_opf;       /* low bits: r/w, high: priority */
+
+       unsigned int	bi_vcnt;     /* how many bio_vecs */
+       struct bvec_iter	bi_iter;	/* current index into bio_vec array */
+
+       unsigned int	bi_size;     /* total size in bytes */
+       unsigned short	bi_hw_segments; /* segments after DMA remapping */
+       unsigned int	bi_max;	     /* max bio_vecs we can hold
+                                        used as index into pool */
+       struct bio_vec   *bi_io_vec;  /* the actual vec list */
+       bio_end_io_t	*bi_end_io;  /* bi_end_io (bio) */
+       atomic_t		bi_cnt;	     /* pin count: free when it hits zero */
+       void             *bi_private;
+  };
+
+With this multipage bio design:
+
+- Large i/os can be sent down in one go using a bio_vec list consisting
+  of an array of <page, offset, len> fragments (similar to the way fragments
+  are represented in the zero-copy network code)
+- Splitting of an i/o request across multiple devices (as in the case of
+  lvm or raid) is achieved by cloning the bio (where the clone points to
+  the same bi_io_vec array, but with the index and size accordingly modified)
+- A linked list of bios is used as before for unrelated merges [*]_ - this
+  avoids reallocs and makes independent completions easier to handle.
+- Code that traverses the req list can find all the segments of a bio
+  by using rq_for_each_segment.  This handles the fact that a request
+  has multiple bios, each of which can have multiple segments.
+- Drivers which can't process a large bio in one shot can use the bi_iter
+  field to keep track of the next bio_vec entry to process.
+  (e.g. a 1MB bio_vec needs to be handled in max 128kB chunks for IDE)
+  [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying
+  bv_offset and bv_len fields]
+
+.. [*]
+
+	unrelated merges -- a request ends up containing two or more bios that
+	didn't originate from the same place.
+
+bi_end_io() i/o callback gets called on i/o completion of the entire bio.
+
+At a lower level, drivers build a scatter gather list from the merged bios.
+The scatter gather list is in the form of an array of <page, offset, len>
+entries with their corresponding dma address mappings filled in at the
+appropriate time. As an optimization, contiguous physical pages can be
+covered by a single entry where <page> refers to the first page and <len>
+covers the range of pages (up to 16 contiguous pages could be covered this
+way). There is a helper routine (blk_rq_map_sg) which drivers can use to build
+the sg list.
+
+Note: Right now the only user of bios with more than one page is ll_rw_kio,
+which in turn means that only raw I/O uses it (direct i/o may not work
+right now). The intent however is to enable clustering of pages etc to
+become possible. The pagebuf abstraction layer from SGI also uses multi-page
+bios, but that is currently not included in the stock development kernels.
+The same is true of Andrew Morton's work-in-progress multipage bio writeout
+and readahead patches.
+
+2.3 Changes in the Request Structure
+------------------------------------
+
+The request structure is the structure that gets passed down to low level
+drivers. The block layer make_request function builds up a request structure,
+places it on the queue and invokes the driver's request_fn. The driver makes
+use of block layer helper routine elv_next_request to pull the next request
+off the queue. Control or diagnostic functions might bypass block and directly
+invoke underlying driver entry points passing in a specially constructed
+request structure.
+
+Only some relevant fields (mainly those which changed or may be referred
+to in some of the discussion here) are listed below, not necessarily in
+the order in which they occur in the structure (see include/linux/blkdev.h).
+Refer to Documentation/block/request.rst for details about all the request
+structure fields and a quick reference about the layers which are
+supposed to use or modify those fields::
+
+  struct request {
+	struct list_head queuelist;  /* Not meant to be directly accessed by
+					the driver.
+					Used by q->elv_next_request_fn
+					rq->queue is gone
+					*/
+	.
+	.
+	unsigned char cmd[16]; /* prebuilt command data block */
+	unsigned long flags;   /* also includes earlier rq->cmd settings */
+	.
+	.
+	sector_t sector; /* this field is now of type sector_t instead of int
+			    preparation for 64 bit sectors */
+	.
+	.
+
+	/* Number of scatter-gather DMA addr+len pairs after
+	 * physical address coalescing is performed.
+	 */
+	unsigned short nr_phys_segments;
+
+	/* Number of scatter-gather addr+len pairs after
+	 * physical and DMA remapping hardware coalescing is performed.
+	 * This is the number of scatter-gather entries the driver
+	 * will actually have to deal with after DMA mapping is done.
+	 */
+	unsigned short nr_hw_segments;
+
+	/* Various sector counts */
+	unsigned long nr_sectors;  /* no. of sectors left: driver modifiable */
+	unsigned long hard_nr_sectors;  /* block internal copy of above */
+	unsigned int current_nr_sectors; /* no. of sectors left in the
+					   current segment:driver modifiable */
+	unsigned long hard_cur_sectors; /* block internal copy of the above */
+	.
+	.
+	int tag;	/* command tag associated with request */
+	void *special;  /* same as before */
+	char *buffer;   /* valid only for low memory buffers up to
+			 current_nr_sectors */
+	.
+	.
+	struct bio *bio, *biotail;  /* bio list instead of bh */
+	struct request_list *rl;
+  };
+
+See the req_ops and req_flag_bits definitions for an explanation of the various
+flags available. Some bits are used by the block layer or i/o scheduler.
+
+The behaviour of the various sector counts is almost the same as before,
+except that since we have multi-segment bios, current_nr_sectors refers
+to the number of sectors in the current segment being processed which could
+be one of the many segments in the current bio (i.e. i/o completion unit).
+The nr_sectors value refers to the total number of sectors in the whole
+request that remain to be transferred (no change). The purpose of the
+hard_xxx values is for block to remember these counts every time it hands
+over the request to the driver. These values are updated by block on
+end_that_request_first, i.e. every time the driver completes a part of the
+transfer and invokes block end*request helpers to mark this. The
+driver should not modify these values. The block layer sets up the
+nr_sectors and current_nr_sectors fields (based on the corresponding
+hard_xxx values and the number of bytes transferred) and updates them on
+every transfer that invokes end_that_request_first. It does the same for the
+buffer, bio, bio->bi_iter fields too.
+
+The buffer field is just a virtual address mapping of the current segment
+of the i/o buffer in cases where the buffer resides in low-memory. For high
+memory i/o, this field is not valid and must not be used by drivers.
+
+Code that sets up its own request structures and passes them down to
+a driver needs to be careful about interoperation with the block layer helper
+functions which the driver uses. (Section 1.3)
+
+3. Using bios
+=============
+
+3.1 Setup/Teardown
+------------------
+
+There are routines for managing the allocation, reference counting, and
+freeing of bios (bio_alloc, bio_get, bio_put).
+
+This makes use of Ingo Molnar's mempool implementation, which enables
+subsystems like bio to maintain their own reserve memory pools for guaranteed
+deadlock-free allocations during extreme VM load. For example, the VM
+subsystem makes use of the block layer to write out dirty pages in order to be
+able to free up memory space, a case which needs careful handling. The
+allocation logic draws from the preallocated emergency reserve in situations
+where it cannot allocate through normal means. If the pool is empty and it
+can wait, then it would trigger action that would help free up memory or
+replenish the pool (without deadlocking) and wait for availability in the pool.
+If it is in IRQ context, and hence not in a position to do this, allocation
+could fail if the pool is empty. In general mempool always first tries to
+perform allocation without having to wait, even if it means digging into the
+pool as long as it is not less than 50% full.
+
+On a free, memory is released to the pool or directly freed depending on
+the current availability in the pool. The mempool interface lets the
+subsystem specify the routines to be used for normal alloc and free. In the
+case of bio, these routines make use of the standard slab allocator.
+
+The caller of bio_alloc is expected to take certain steps to avoid
+deadlocks, e.g. avoid trying to allocate more memory from the pool while
+already holding memory obtained from the pool.
+
+::
+
+  [TBD: This is a potential issue, though a rare possibility
+   in the bounce bio allocation that happens in the current code, since
+   it ends up allocating a second bio from the same pool while
+   holding the original bio ]
+
+Memory allocated from the pool should be released back within a limited
+amount of time (in the case of bio, that would be after the i/o is completed).
+This ensures that if part of the pool has been used up, some work (in this
+case i/o) must already be in progress and memory would be available when it
+is over. If allocating from multiple pools in the same code path, the order
+or hierarchy of allocation needs to be consistent, just the way one deals
+with multiple locks.
+
+The bio_alloc routine also needs to allocate the bio_vec_list (bvec_alloc())
+for a non-clone bio. There are 6 pools set up for different size biovecs,
+so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
+given size from these slabs.
+
+The bio_get() routine may be used to hold an extra reference on a bio prior
+to i/o submission, if the bio fields are likely to be accessed after the
+i/o is issued (since the bio may otherwise get freed in case i/o completion
+happens in the meantime).
+
+The bio_clone_fast() routine may be used to duplicate a bio, where the clone
+shares the bio_vec_list with the original bio (i.e. both point to the
+same bio_vec_list). This would typically be used for splitting i/o requests
+in lvm or md.
+
+3.2 Generic bio helper Routines
+-------------------------------
+
+3.2.1 Traversing segments and completion units in a request
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The macro rq_for_each_segment() should be used for traversing the bios
+in the request list (drivers should avoid directly trying to do it
+themselves). Using these helpers should also make it easier to cope
+with block changes in the future.
+
+::
+
+	struct req_iterator iter;
+	rq_for_each_segment(bio_vec, rq, iter)
+		/* bio_vec is now current segment */
+
+I/O completion callbacks are per-bio rather than per-segment, so drivers
+that traverse bio chains on completion need to keep that in mind. Drivers
+which don't make a distinction between segments and completion units would
+need to be reorganized to support multi-segment bios.
+
+3.2.2 Setting up DMA scatterlists
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The blk_rq_map_sg() helper routine would be used for setting up scatter
+gather lists from a request, so a driver need not do it on its own::
+
+	nr_segments = blk_rq_map_sg(q, rq, scatterlist);
+
+The helper routine provides a level of abstraction which makes it easier
+to modify the internals of request to scatterlist conversion down the line
+without breaking drivers. The blk_rq_map_sg routine takes care of several
+things like collapsing physically contiguous segments (if QUEUE_FLAG_CLUSTER
+is set) and correct segment accounting to avoid exceeding the limits which
+the i/o hardware can handle, based on various queue properties.
+
+- Prevents a clustered segment from crossing a 4GB mem boundary
+- Avoids building segments that would exceed the number of physical
+  memory segments that the driver can handle (phys_segments) and the
+  number that the underlying hardware can handle at once, accounting for
+  DMA remapping (hw_segments)  (i.e. IOMMU aware limits).
+
+Routines which the low level driver can use to set up the segment limits:
+
+blk_queue_max_hw_segments() : Sets an upper limit of the maximum number of
+hw data segments in a request (i.e. the maximum number of address/length
+pairs the host adapter can actually hand to the device at once)
+
+blk_queue_max_phys_segments() : Sets an upper limit on the maximum number
+of physical data segments in a request (i.e. the largest sized scatter list
+a driver could handle)
+
+3.2.3 I/O completion
+^^^^^^^^^^^^^^^^^^^^
+
+The existing generic block layer helper routines end_request,
+end_that_request_first and end_that_request_last can be used for i/o
+completion (and setting things up so the rest of the i/o or the next
+request can be kicked off) as before. With the introduction of multi-page
+bio support, end_that_request_first requires an additional argument indicating
+the number of sectors completed.
+
+3.2.4 Implications for drivers that do not interpret bios
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+(don't handle multiple segments)
+
+Drivers that do not interpret bios e.g. those which do not handle multiple
+segments and do not support i/o into high memory addresses (require bounce
+buffers) and expect only virtually mapped buffers, can access the rq->buffer
+field. As before the driver should use current_nr_sectors to determine the
+size of remaining data in the current segment (that is the maximum it can
+transfer in one go unless it interprets segments), and rely on the block layer
+end_request, or end_that_request_first/last to take care of all accounting
+and transparent mapping of the next bio segment when a segment boundary
+is crossed on completion of a transfer. (The end*request* functions should
+be used if and only if the request has come down from block/bio path, not for
+direct access requests which only specify rq->buffer without a valid rq->bio.)
+
+3.3 I/O Submission
+------------------
+
+The routine submit_bio() is used to submit a single io. Higher level i/o
+routines make use of this:
+
+(a) Buffered i/o:
+
+The routine submit_bh() invokes submit_bio() on a bio corresponding to the
+bh, allocating the bio if required. ll_rw_block() uses submit_bh() as before.
+
+(b) Kiobuf i/o (for raw/direct i/o):
+
+The ll_rw_kio() routine breaks up the kiobuf into page sized chunks and
+maps the array to one or more multi-page bios, issuing submit_bio() to
+perform the i/o on each of these.
+
+The embedded bh array in the kiobuf structure has been removed and no
+preallocation of bios is done for kiobufs. [The intent is to remove the
+blocks array as well, but it's currently in there to kludge around direct i/o.]
+Thus kiobuf allocation has switched back to using kmalloc rather than vmalloc.
+
+Todo/Observation:
+
+ A single kiobuf structure is assumed to correspond to a contiguous range
+ of data, so brw_kiovec() invokes ll_rw_kio for each kiobuf in a kiovec.
+ So right now it wouldn't work for direct i/o on non-contiguous blocks.
+ This is to be resolved.  The eventual direction is to replace kiobuf
+ by kvec's.
+
+ Badari Pulavarty has a patch to implement direct i/o correctly using
+ bio and kvec.
+
+
+(c) Page i/o:
+
+Todo/Under discussion:
+
+ Andrew Morton's multi-page bio patches attempt to issue multi-page
+ writeouts (and reads) from the page cache, by directly building up
+ large bios for submission completely bypassing the usage of buffer
+ heads. This work is still in progress.
+
+ Christoph Hellwig had some code that uses bios for page-io (rather than
+ bh). This isn't included in bio as yet. Christoph was also working on a
+ design for representing virtual/real extents as an entity and modifying
+ some of the address space ops interfaces to utilize this abstraction rather
+ than buffer_heads. (This is somewhat along the lines of the SGI XFS pagebuf
+ abstraction, but intended to be as lightweight as possible).
+
+(d) Direct access i/o:
+
+Direct access requests that do not contain bios would be submitted differently
+as discussed earlier in section 1.3.
+
+Aside:
+
+  Kvec i/o:
+
+  Ben LaHaise's aio code uses a slightly different structure instead
+  of kiobufs, called a kvec_cb. This contains an array of <page, offset, len>
+  tuples (very much like the networking code), together with a callback function
+  and data pointer. This is embedded into a brw_cb structure when passed
+  to brw_kvec_async().
+
+  Now it should be possible to directly map these kvecs to a bio. Just as while
+  cloning, in this case rather than PRE_BUILT bio_vecs, we set the bi_io_vec
+  array pointer to point to the veclet array in kvecs.
+
+  TBD: In order for this to work, some changes are needed in the way multi-page
+  bios are handled today. The values of the tuples in such a vector passed in
+  from higher level code should not be modified by the block layer in the course
+  of its request processing, since that would make it hard for the higher layer
+  to continue to use the vector descriptor (kvec) after i/o completes. Instead,
+  all such transient state should be maintained in the request structure
+  and passed on in some way to the endio completion routine.
+
+
+4. The I/O scheduler
+====================
+
+The I/O scheduler, a.k.a. the elevator, is implemented in two layers: a generic
+dispatch queue and specific I/O schedulers.  Unless stated otherwise, elevator is
+used to refer to both parts and I/O scheduler to specific I/O schedulers.
+
+The block layer implements the generic dispatch queue in `block/*.c`.
+The generic dispatch queue is responsible for requeueing, handling non-fs
+requests and all other subtleties.
+
+Specific I/O schedulers are responsible for ordering normal filesystem
+requests.  They can also choose to delay certain requests to improve
+throughput or for some other purpose.  As the plural form indicates, there are
+multiple I/O schedulers.  They can be built as modules but at least one should
+be built inside the kernel.  Each queue can choose a different one and can also
+change to another one dynamically.
+
+A block layer call to the i/o scheduler follows the convention elv_xxx(). This
+calls elevator_xxx_fn in the elevator switch (block/elevator.c). Oh, xxx
+and xxx might not match exactly, but use your imagination. If an elevator
+doesn't implement a function, the switch does nothing or some minimal
+housekeeping work.
+
+4.1. I/O scheduler API
+----------------------
+
+The functions an elevator may implement are: (* are mandatory)
+
+=============================== ================================================
+elevator_merge_fn		called to query requests for merge with a bio
+
+elevator_merge_req_fn		called when two requests get merged. The one
+				which gets merged into the other one will
+				never be seen by I/O scheduler again. IOW, after
+				being merged, the request is gone.
+
+elevator_merged_fn		called when a request in the scheduler has been
+				involved in a merge. It is used in the deadline
+				scheduler for example, to reposition the request
+				if its sorting order has changed.
+
+elevator_allow_merge_fn		called whenever the block layer determines
+				that a bio can be merged into an existing
+				request safely. The io scheduler may still
+				want to stop a merge at this point if it
+				results in some sort of conflict internally,
+				this hook allows it to do that. Note however
+				that two *requests* can still be merged at a
+				later time. Currently the io scheduler has no
+				way to prevent that. It can only learn about
+				the fact from elevator_merge_req_fn callback.
+
+elevator_dispatch_fn*		fills the dispatch queue with ready requests.
+				I/O schedulers are free to postpone requests by
+				not filling the dispatch queue unless @force
+				is non-zero.  Once dispatched, I/O schedulers
+				are not allowed to manipulate the requests -
+				they belong to generic dispatch queue.
+
+elevator_add_req_fn*		called to add a new request into the scheduler
+
+elevator_former_req_fn
+elevator_latter_req_fn		These return the request before or after the
+				one specified in disk sort order. Used by the
+				block layer to find merge possibilities.
+
+elevator_completed_req_fn	called when a request is completed.
+
+elevator_may_queue_fn		returns true if the scheduler wants to allow the
+				current context to queue a new request even if
+				it is over the queue limit. This must be used
+				very carefully!!
+
+elevator_set_req_fn
+elevator_put_req_fn		Must be used to allocate and free any elevator
+				specific storage for a request.
+
+elevator_activate_req_fn	Called when device driver first sees a request.
+				I/O schedulers can use this callback to
+				determine when actual execution of a request
+				starts.
+elevator_deactivate_req_fn	Called when device driver decides to delay
+				a request by requeueing it.
+
+elevator_init_fn*
+elevator_exit_fn		Allocate and free any elevator specific storage
+				for a queue.
+=============================== ================================================
+
+4.2 Request flows seen by I/O schedulers
+----------------------------------------
+
+All requests seen by I/O schedulers strictly follow one of the following three
+flows.
+
+ set_req_fn ->
+
+ i.   add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn ->
+      (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn
+ ii.  add_req_fn -> (merged_fn ->)* -> merge_req_fn
+ iii. [none]
+
+ -> put_req_fn
+
+4.3 I/O scheduler implementation
+--------------------------------
+
+The generic i/o scheduler algorithm attempts to sort/merge/batch requests for
+optimal disk scan and request servicing performance (based on generic
+principles and device capabilities), optimized for:
+
+i.   improved throughput
+ii.  improved latency
+iii. better utilization of h/w & CPU time
+
+Characteristics:
+
+i. Binary tree
+AS and deadline i/o schedulers use red black binary trees for disk position
+sorting and searching, and a fifo linked list for time-based searching. This
+gives good scalability and good availability of information. Requests are
+almost always dispatched in disk sort order, so a cache is kept of the next
+request in sort order to prevent binary tree lookups.
+
+This arrangement is not a generic block layer characteristic however, so
+elevators may implement queues as they please.
+
+ii. Merge hash
+AS and deadline use a hash table indexed by the last sector of a request. This
+enables merging code to quickly look up "back merge" candidates, even when
+multiple I/O streams are being performed at once on one disk.
+
+"Front merges", a new request being merged at the front of an existing request,
+are far less common than "back merges" due to the nature of most I/O patterns.
+Front merges are handled by the binary trees in AS and deadline schedulers.
+
+iii. Plugging the queue to batch requests in anticipation of opportunities for
+     merge/sort optimizations
+
+Plugging is an approach that the current i/o scheduling algorithm resorts to so
+that it collects up enough requests in the queue to be able to take
+advantage of the sorting/merging logic in the elevator. If the
+queue is empty when a request comes in, then it plugs the request queue
+(sort of like plugging the bath tub of a vessel to get fluid to build up)
+till it fills up with a few more requests, before starting to service
+the requests. This provides an opportunity to merge/sort the requests before
+passing them down to the device. There are various conditions when the queue is
+unplugged (to open up the flow again), either through a scheduled task or
+on demand. For example wait_on_buffer sets the unplugging going
+through sync_buffer() running blk_run_address_space(mapping). Or the caller
+can do it explicitly through blk_unplug(bdev). So in the read case,
+the queue gets explicitly unplugged as part of waiting for completion on that
+buffer.
+
+Aside:
+  This is kind of controversial territory, as it's not clear if plugging is
+  always the right thing to do. Devices typically have their own queues,
+  and allowing a big queue to build up in software, while letting the device be
+  idle for a while may not always make sense. The trick is to handle the fine
+  balance between when to plug and when to open up. Also now that we have
+  multi-page bios being queued in one shot, we may not need to wait to merge
+  a big request from the broken up pieces coming by.
+
+4.4 I/O contexts
+----------------
+
+I/O contexts provide a dynamically allocated per process data area. They may
+be used in I/O schedulers, and in the block layer (could be used for IO stats,
+priorities for example). See `*io_context` in block/ll_rw_blk.c, and as-iosched.c
+for an example of usage in an i/o scheduler.
+
+
+5. Scalability related changes
+==============================
+
+5.1 Granular Locking: io_request_lock replaced by a per-queue lock
+------------------------------------------------------------------
+
+The global io_request_lock has been removed as of 2.5, to avoid
+the scalability bottleneck it was causing, and has been replaced by more
+granular locking. The request queue structure has a pointer to the
+lock to be used for that queue. As a result, locking can now be
+per-queue, with a provision for sharing a lock across queues if
+necessary (e.g. the scsi layer sets the queue lock pointers to the
+corresponding adapter lock, which results in a per host locking
+granularity). The locking semantics are the same, i.e. locking is
+still imposed by the block layer, grabbing the lock before
+request_fn execution, which means that lots of older drivers
+should still be SMP safe. Drivers are free to drop the queue
+lock themselves, if required. Drivers that explicitly used the
+io_request_lock for serialization need to be modified accordingly.
+Usually it's as easy as adding a global lock::
+
+	static DEFINE_SPINLOCK(my_driver_lock);
+
+and passing the address of that lock to blk_init_queue().
+
+5.2 64 bit sector numbers (sector_t prepares for 64 bit support)
+----------------------------------------------------------------
+
+The sector number used in the bio structure has been changed to sector_t,
+which could be defined as 64 bit in preparation for 64 bit sector support.
+
+6. Other Changes/Implications
+=============================
+
+6.1 Partition re-mapping handled by the generic block layer
+-----------------------------------------------------------
+
+In 2.5 some of the gendisk/partition related code has been reorganized.
+Now the generic block layer performs partition-remapping early and thus
+provides drivers with a sector number relative to whole device, rather than
+having to take partition number into account in order to arrive at the true
+sector number. The routine blk_partition_remap() is invoked by
+generic_make_request even before invoking the queue specific make_request_fn,
+so the i/o scheduler also gets to operate on whole disk sector numbers. This
+should typically not require changes to block drivers; a driver just never gets
+to invoke its own partition sector offset calculations since all bios
+sent are offset from the beginning of the device.
+
+
+7. A Few Tips on Migration of older drivers
+===========================================
+
+Old-style drivers that just use CURRENT and ignore clustered requests
+may not need much change.  The generic layer will automatically handle
+clustered requests, multi-page bios, etc for the driver.
+
+For a low performance driver or hardware that is PIO driven or just doesn't
+support scatter-gather, changes should be minimal too.
+
+The following are some points to keep in mind when converting old drivers
+to bio.
+
+Drivers should use elv_next_request to pick up requests and are no longer
+supposed to handle looping directly over the request list.
+(struct request->queue has been removed)
+
+Now end_that_request_first takes an additional number_of_sectors argument.
+It used to always handle just the first buffer_head in a request; now
+it will loop and handle as many sectors (on a bio-segment granularity)
+as specified.
+
+Now bh->b_end_io is replaced by bio->bi_end_io, but most of the time the
+right thing to use is bio_endio(bio) instead.
+
+If the driver is dropping the io_request_lock from its request_fn strategy,
+then it just needs to replace that with q->queue_lock instead.
+
+As described in Sec 1.1, drivers can set max sector size, max segment size
+etc per queue now. Drivers that used to define their own merge functions
+to handle things like this can now just use the blk_queue_* functions at
+blk_init_queue time.
+
+Drivers no longer have to map a {partition, sector offset} into the
+correct absolute location; this is done by the block layer, so
+where a driver received a request ala this before::
+
+	rq->rq_dev = mk_kdev(3, 5);	/* /dev/hda5 */
+	rq->sector = 0;			/* first sector on hda5 */
+
+it will now see::
+
+	rq->rq_dev = mk_kdev(3, 0);	/* /dev/hda */
+	rq->sector = 123128;		/* offset from start of disk */
+
+As mentioned, there is no virtual mapping of a bio. For DMA, this is
+not a problem as the driver probably never will need a virtual mapping.
+Instead it needs a bus mapping (dma_map_page for a single segment or
+use dma_map_sg for scatter gather) to be able to ship it to the device. For
+PIO drivers (or drivers that need to revert to PIO transfer once in a
+while (IDE for example)), where the CPU is doing the actual data
+transfer, a virtual mapping is needed. If the driver supports highmem I/O
+(Sec 1.1, (ii)), it needs to use kmap_atomic or similar to temporarily map
+a bio into the virtual address space.
+
+
+8. Prior/Related/Impacted patches
+=================================
+
+8.1. Earlier kiobuf patches (sct/axboe/chait/hch/mkp)
+-----------------------------------------------------
+
+- orig kiobuf & raw i/o patches (now in 2.4 tree)
+- direct kiobuf based i/o to devices (no intermediate bh's)
+- page i/o using kiobuf
+- kiobuf splitting for lvm (mkp)
+- elevator support for kiobuf request merging (axboe)
+
+8.2. Zero-copy networking (Dave Miller)
+---------------------------------------
+
+8.3. SGI XFS - pagebuf patches - use of kiobufs
+-----------------------------------------------
+8.4. Multi-page pioent patch for bio (Christoph Hellwig)
+--------------------------------------------------------
+8.5. Direct i/o implementation (Andrea Arcangeli) since 2.4.10-pre11
+--------------------------------------------------------------------
+8.6. Async i/o implementation patch (Ben LaHaise)
+-------------------------------------------------
+8.7. EVMS layering design (IBM EVMS team)
+-----------------------------------------
+8.8. Larger page cache size patch (Ben LaHaise) and Large page size (Daniel Phillips)
+-------------------------------------------------------------------------------------
+
+    => larger contiguous physical memory buffers
+
+8.9. VM reservations patch (Ben LaHaise)
+----------------------------------------
+8.10. Write clustering patches ? (Marcelo/Quintela/Riel ?)
+----------------------------------------------------------
+8.11. Block device in page cache patch (Andrea Arcangeli) - now in 2.4.10+
+---------------------------------------------------------------------------
+8.12. Multiple block-size transfers for faster raw i/o (Shailabh Nagar, Badari)
+-------------------------------------------------------------------------------
+8.13  Priority based i/o scheduler - prepatches (Arjan van de Ven)
+------------------------------------------------------------------
+8.14  IDE Taskfile i/o patch (Andre Hedrick)
+--------------------------------------------
+8.15  Multi-page writeout and readahead patches (Andrew Morton)
+---------------------------------------------------------------
+8.16  Direct i/o patches for 2.5 using kvec and bio (Badari Pulavarty)
+-----------------------------------------------------------------------
+
+9. Other References
+===================
+
+9.1 The Splice I/O Model
+------------------------
+
+Larry McVoy (and subsequent discussions on lkml, and Linus' comments - Jan 2001)
+
+9.2 Discussions about kiobuf and bh design
+------------------------------------------
+
+On lkml between sct, linus, alan et al - Feb-March 2001 (many of the
+initial thoughts that led to bio were brought up in this discussion thread)
+
+9.3 Discussions on mempool on lkml - Dec 2001.
+----------------------------------------------
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
deleted file mode 100644
index 31c177663ed5..000000000000
--- a/Documentation/block/biodoc.txt
+++ /dev/null
@@ -1,1076 +0,0 @@
-	Notes on the Generic Block Layer Rewrite in Linux 2.5
-	=====================================================
-
-Notes Written on Jan 15, 2002:
-	Jens Axboe <jens.axboe@oracle.com>
-	Suparna Bhattacharya <suparna@in.ibm.com>
-
-Last Updated May 2, 2002
-September 2003: Updated I/O Scheduler portions
-	Nick Piggin <npiggin@kernel.dk>
-
-Introduction:
-
-These are some notes describing some aspects of the 2.5 block layer in the
-context of the bio rewrite. The idea is to bring out some of the key
-changes and a glimpse of the rationale behind those changes.
-
-Please mail corrections & suggestions to suparna@in.ibm.com.
-
-Credits:
----------
-
-2.5 bio rewrite:
-	Jens Axboe <jens.axboe@oracle.com>
-
-Many aspects of the generic block layer redesign were driven by and evolved
-over discussions, prior patches and the collective experience of several
-people. See sections 8 and 9 for a list of some related references.
-
-The following people helped with review comments and inputs for this
-document:
-	Christoph Hellwig <hch@infradead.org>
-	Arjan van de Ven <arjanv@redhat.com>
-	Randy Dunlap <rdunlap@xenotime.net>
-	Andre Hedrick <andre@linux-ide.org>
-
-The following people helped with fixes/contributions to the bio patches
-while it was still work-in-progress:
-	David S. Miller <davem@redhat.com>
-
-
-Description of Contents:
-------------------------
-
-1. Scope for tuning of logic to various needs
-  1.1 Tuning based on device or low level driver capabilities
-	- Per-queue parameters
-	- Highmem I/O support
-	- I/O scheduler modularization
-  1.2 Tuning based on high level requirements/capabilities
-	1.2.1 Request Priority/Latency
-  1.3 Direct access/bypass to lower layers for diagnostics and special
-      device operations
-	1.3.1 Pre-built commands
-2. New flexible and generic but minimalist i/o structure or descriptor
-   (instead of using buffer heads at the i/o layer)
-  2.1 Requirements/Goals addressed
-  2.2 The bio struct in detail (multi-page io unit)
-  2.3 Changes in the request structure
-3. Using bios
-  3.1 Setup/teardown (allocation, splitting)
-  3.2 Generic bio helper routines
-    3.2.1 Traversing segments and completion units in a request
-    3.2.2 Setting up DMA scatterlists
-    3.2.3 I/O completion
-    3.2.4 Implications for drivers that do not interpret bios (don't handle
- 	  multiple segments)
-  3.3 I/O submission
-4. The I/O scheduler
-5. Scalability related changes
-  5.1 Granular locking: Removal of io_request_lock
-  5.2 Prepare for transition to 64 bit sector_t
-6. Other Changes/Implications
-  6.1 Partition re-mapping handled by the generic block layer
-7. A few tips on migration of older drivers
-8. A list of prior/related/impacted patches/ideas
-9. Other References/Discussion Threads
-
----------------------------------------------------------------------------
-
-Bio Notes
---------
-
-Let us discuss the changes in the context of how some overall goals for the
-block layer are addressed.
-
-1. Scope for tuning the generic logic to satisfy various requirements
-
-The block layer design supports adaptable abstractions to handle common
-processing with the ability to tune the logic to an appropriate extent
-depending on the nature of the device and the requirements of the caller.
-One of the objectives of the rewrite was to increase the degree of tunability
-and to enable higher level code to utilize underlying device/driver
-capabilities to the maximum extent for better i/o performance. This is
-important especially in the light of ever improving hardware capabilities
-and application/middleware software designed to take advantage of these
-capabilities.
-
-1.1 Tuning based on low level device / driver capabilities
-
-Sophisticated devices with large built-in caches, intelligent i/o scheduling
-optimizations, high memory DMA support, etc may find some of the
-generic processing an overhead, while for less capable devices the
-generic functionality is essential for performance or correctness reasons.
-Knowledge of some of the capabilities or parameters of the device should be
-used at the generic block layer to take the right decisions on
-behalf of the driver.
-
-How is this achieved ?
-
-Tuning at a per-queue level:
-
-i. Per-queue limits/values exported to the generic layer by the driver
-
-Various parameters that the generic i/o scheduler logic uses are set at
-a per-queue level (e.g maximum request size, maximum number of segments in
-a scatter-gather list, logical block size)
-
-Some parameters that were earlier available as global arrays indexed by
-major/minor are now directly associated with the queue. Some of these may
-move into the block device structure in the future. Some characteristics
-have been incorporated into a queue flags field rather than separate fields
-in themselves.  There are blk_queue_xxx functions to set the parameters,
-rather than update the fields directly
-
-Some new queue property settings:
-
-	blk_queue_bounce_limit(q, u64 dma_address)
-		Enable I/O to highmem pages, dma_address being the
-		limit. No highmem default.
-
-	blk_queue_max_sectors(q, max_sectors)
-		Sets two variables that limit the size of the request.
-
-		- The request queue's max_sectors, which is a soft size in
-		units of 512 byte sectors, and could be dynamically varied
-		by the core kernel.
-
-		- The request queue's max_hw_sectors, which is a hard limit
-		and reflects the maximum size request a driver can handle
-		in units of 512 byte sectors.
-
-		The default for both max_sectors and max_hw_sectors is
-		255. The upper limit of max_sectors is 1024.
-
-	blk_queue_max_phys_segments(q, max_segments)
-		Maximum physical segments you can handle in a request. 128
-		default (driver limit). (See 3.2.2)
-
-	blk_queue_max_hw_segments(q, max_segments)
-		Maximum dma segments the hardware can handle in a request. 128
-		default (host adapter limit, after dma remapping).
-		(See 3.2.2)
-
-	blk_queue_max_segment_size(q, max_seg_size)
-		Maximum size of a clustered segment, 64kB default.
-
-	blk_queue_logical_block_size(q, logical_block_size)
-		Lowest possible sector size that the hardware can operate
-		on, 512 bytes default.
-
-New queue flags:
-
-	QUEUE_FLAG_CLUSTER (see 3.2.2)
-	QUEUE_FLAG_QUEUED (see 3.2.4)
-
-
-ii. High-mem i/o capabilities are now considered the default
-
-The generic bounce buffer logic, present in 2.4, where the block layer would
-by default copyin/out i/o requests on high-memory buffers to low-memory buffers
-assuming that the driver wouldn't be able to handle it directly, has been
-changed in 2.5. The bounce logic is now applied only for memory ranges
-for which the device cannot handle i/o. A driver can specify this by
-setting the queue bounce limit for the request queue for the device
-(blk_queue_bounce_limit()). This avoids the inefficiencies of the copyin/out
-where a device is capable of handling high memory i/o.
-
-In order to enable high-memory i/o where the device is capable of supporting
-it, the pci dma mapping routines and associated data structures have now been
-modified to accomplish a direct page -> bus translation, without requiring
-a virtual address mapping (unlike the earlier scheme of virtual address
--> bus translation). So this works uniformly for high-memory pages (which
-do not have a corresponding kernel virtual address space mapping) and
-low-memory pages.
-
-Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion
-on PCI high mem DMA aspects and mapping of scatter gather lists, and support
-for 64 bit PCI.
-
-Special handling is required only for cases where i/o needs to happen on
-pages at physical memory addresses beyond what the device can support. In these
-cases, a bounce bio representing a buffer from the supported memory range
-is used for performing the i/o with copyin/copyout as needed depending on
-the type of the operation.  For example, in case of a read operation, the
-data read has to be copied to the original buffer on i/o completion, so a
-callback routine is set up to do this, while for write, the data is copied
-from the original buffer to the bounce buffer prior to issuing the
-operation. Since an original buffer may be in a high memory area that's not
-mapped in kernel virtual addr, a kmap operation may be required for
-performing the copy, and special care may be needed in the completion path
-as it may not be in irq context. Special care is also required (by way of
-GFP flags) when allocating bounce buffers, to avoid certain highmem
-deadlock possibilities.
-
-It is also possible that a bounce buffer may be allocated from high-memory
-area that's not mapped in kernel virtual addr, but within the range that the
-device can use directly; so the bounce page may need to be kmapped during
-copy operations. [Note: This does not hold in the current implementation,
-though]
-
-There are some situations when pages from high memory may need to
-be kmapped, even if bounce buffers are not necessary. For example a device
-may need to abort DMA operations and revert to PIO for the transfer, in
-which case a virtual mapping of the page is required. For SCSI it is also
-done in some scenarios where the low level driver cannot be trusted to
-handle a single sg entry correctly. The driver is expected to perform the
-kmaps as needed on such occasions as appropriate. A driver could also use
-the blk_queue_bounce() routine on its own to bounce highmem i/o to low
-memory for specific requests if so desired.
-
-iii. The i/o scheduler algorithm itself can be replaced/set as appropriate
-
-As in 2.4, it is possible to plugin a brand new i/o scheduler for a particular
-queue or pick from (copy) existing generic schedulers and replace/override
-certain portions of it. The 2.5 rewrite provides improved modularization
-of the i/o scheduler. There are more pluggable callbacks, e.g for init,
-add request, extract request, which makes it possible to abstract specific
-i/o scheduling algorithm aspects and details outside of the generic loop.
-It also makes it possible to completely hide the implementation details of
-the i/o scheduler from block drivers.
-
-I/O scheduler wrappers are to be used instead of accessing the queue directly.
-See section 4. The I/O scheduler for details.
-
-1.2 Tuning Based on High level code capabilities
-
-i. Application capabilities for raw i/o
-
-This comes from some of the high-performance database/middleware
-requirements where an application prefers to make its own i/o scheduling
-decisions based on an understanding of the access patterns and i/o
-characteristics.
-
-ii. High performance filesystems or other higher level kernel code's
-capabilities
-
-Kernel components like filesystems could also take their own i/o scheduling
-decisions for optimizing performance. Journalling filesystems may need
-some control over i/o ordering.
-
-What kind of support exists at the generic block layer for this?
-
-The flags and rw fields in the bio structure can be used for some tuning
-from above e.g indicating that an i/o is just a readahead request, or priority
-settings (currently unused). As far as user applications are concerned they
-would need an additional mechanism either via open flags or ioctls, or some
-other upper level mechanism to communicate such settings to block.
-
-1.2.1 Request Priority/Latency
-
-Todo/Under discussion:
-Arjan's proposed request priority scheme allows higher levels some broad
-  control (high/med/low) over the priority  of an i/o request vs other pending
-  requests in the queue. For example it allows reads for bringing in an
-  executable page on demand to be given a higher priority over pending write
-  requests which haven't aged too much on the queue. Potentially this priority
-  could even be exposed to applications in some manner, providing higher level
-  tunability. Time based aging avoids starvation of lower priority
-  requests. Some bits in the bi_opf flags field in the bio structure are
-  intended to be used for this priority information.
-
-
-1.3 Direct Access to Low level Device/Driver Capabilities (Bypass mode)
-    (e.g Diagnostics, Systems Management)
-
-There are situations where high-level code needs to have direct access to
-the low level device capabilities or requires the ability to issue commands
-to the device bypassing some of the intermediate i/o layers.
-These could, for example, be special control commands issued through ioctl
-interfaces, or could be raw read/write commands that stress the drive's
-capabilities for certain kinds of fitness tests. Having direct interfaces at
-multiple levels without having to pass through upper layers makes
-it possible to perform bottom up validation of the i/o path, layer by
-layer, starting from the media.
-
-The normal i/o submission interfaces, e.g submit_bio, could be bypassed
-for specially crafted requests which such ioctl or diagnostics
-interfaces would typically use, and the elevator add_request routine
-can instead be used to directly insert such requests in the queue or preferably
-the blk_do_rq routine can be used to place the request on the queue and
-wait for completion. Alternatively, sometimes the caller might just
-invoke a lower level driver specific interface with the request as a
-parameter.
-
-If the request is a means for passing on special information associated with
-the command, then such information is associated with the request->special
-field (rather than misuse the request->buffer field which is meant for the
-request data buffer's virtual mapping).
-
-For passing request data, the caller must build up a bio descriptor
-representing the concerned memory buffer if the underlying driver interprets
-bio segments or uses the block layer end*request* functions for i/o
-completion. Alternatively one could directly use the request->buffer field to
-specify the virtual address of the buffer, if the driver expects buffer
-addresses passed in this way and ignores bio entries for the request type
-involved. In the latter case, the driver would modify and manage the
-request->buffer, request->sector and request->nr_sectors or
-request->current_nr_sectors fields itself rather than using the block layer
-end_request or end_that_request_first completion interfaces.
-(See 2.3 or Documentation/block/request.txt for a brief explanation of
-the request structure fields)
-
-[TBD: end_that_request_last should be usable even in this case;
-Perhaps an end_that_direct_request_first routine could be implemented to make
-handling direct requests easier for such drivers; Also for drivers that
-expect bios, a helper function could be provided for setting up a bio
-corresponding to a data buffer]
-
-<JENS: I dont understand the above, why is end_that_request_first() not
-usable? Or _last for that matter. I must be missing something>
-<SUP: What I meant here was that if the request doesn't have a bio, then
- end_that_request_first doesn't modify nr_sectors or current_nr_sectors,
- and hence can't be used for advancing request state settings on the
- completion of partial transfers. The driver has to modify these fields 
- directly by hand.
- This is because end_that_request_first only iterates over the bio list,
- and always returns 0 if there are none associated with the request.
- _last works OK in this case, and is not a problem, as I mentioned earlier
->
-
-1.3.1 Pre-built Commands
-
-A request can be created with a pre-built custom command to be sent directly
-to the device. The cmd block in the request structure has room for filling
-in the command bytes. (i.e rq->cmd is now 16 bytes in size, and meant for
-command pre-building, and the type of the request is now indicated
-through rq->flags instead of via rq->cmd)
-
-The request structure flags can be set up to indicate the type of request
-in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC:
-packet command issued via blk_do_rq, REQ_SPECIAL: special request).
-
-It can help to pre-build device commands for requests in advance.
-Drivers can now specify a request prepare function (q->prep_rq_fn) that the
-block layer would invoke to pre-build device commands for a given request,
-or perform other preparatory processing for the request. This routine is
-called by elv_next_request(), i.e. typically just before servicing a request.
-(The prepare function would not be called for requests that have RQF_DONTPREP
-enabled)
-
-Aside:
-  Pre-building could possibly even be done early, i.e before placing the
-  request on the queue, rather than construct the command on the fly in the
-  driver while servicing the request queue when it may affect latencies in
-  interrupt context or responsiveness in general. One way to add early
-  pre-building would be to do it whenever we fail to merge on a request.
-  Now REQ_NOMERGE is set in the request flags to skip this one in the future,
-  which means that it will not change before we feed it to the device. So
-  the pre-builder hook can be invoked there.
-
-
-2. Flexible and generic but minimalist i/o structure/descriptor.
-
-2.1 Reason for a new structure and requirements addressed
-
-Prior to 2.5, buffer heads were used as the unit of i/o at the generic block
-layer, and the low level request structure was associated with a chain of
-buffer heads for a contiguous i/o request. This led to certain inefficiencies
-when it came to large i/o requests and readv/writev style operations, as it
-forced such requests to be broken up into small chunks before being passed
-on to the generic block layer, only to be merged by the i/o scheduler
-when the underlying device was capable of handling the i/o in one shot.
-Also, using the buffer head as an i/o structure for i/os that didn't originate
-from the buffer cache unnecessarily added to the weight of the descriptors
-which were generated for each such chunk.
-
-The following were some of the goals and expectations considered in the
-redesign of the block i/o data structure in 2.5.
-
-i.  Should be appropriate as a descriptor for both raw and buffered i/o  -
-    avoid cache related fields which are irrelevant in the direct/page i/o path,
-    or filesystem block size alignment restrictions which may not be relevant
-    for raw i/o.
-ii. Ability to represent high-memory buffers (which do not have a virtual
-    address mapping in kernel address space).
-iii.Ability to represent large i/os w/o unnecessarily breaking them up (i.e
-    greater than PAGE_SIZE chunks in one shot)
-iv. At the same time, ability to retain independent identity of i/os from
-    different sources or i/o units requiring individual completion (e.g. for
-    latency reasons)
-v.  Ability to represent an i/o involving multiple physical memory segments
-    (including non-page aligned page fragments, as specified via readv/writev)
-    without unnecessarily breaking it up, if the underlying device is capable of
-    handling it.
-vi. Preferably should be based on a memory descriptor structure that can be
-    passed around different types of subsystems or layers, maybe even
-    networking, without duplication or extra copies of data/descriptor fields
-    themselves in the process
-vii.Ability to handle the possibility of splits/merges as the structure passes
-    through layered drivers (lvm, md, evms), with minimal overhead.
-
-The solution was to define a new structure (bio) for the block layer,
-instead of using the buffer head structure (bh) directly, the idea being
-avoidance of some associated baggage and limitations. The bio structure
-is uniformly used for all i/o at the block layer; it forms a part of the
-bh structure for buffered i/o, and in the case of raw/direct i/o kiobufs are
-mapped to bio structures.
-
-2.2 The bio struct
-
-The bio structure uses a vector representation pointing to an array of tuples
-of <page, offset, len> to describe the i/o buffer, and has various other
-fields describing i/o parameters and state that needs to be maintained for
-performing the i/o.
-
-Notice that this representation means that a bio has no virtual address
-mapping at all (unlike buffer heads).
-
-struct bio_vec {
-       struct page     *bv_page;
-       unsigned short  bv_len;
-       unsigned short  bv_offset;
-};
-
-/*
- * main unit of I/O for the block layer and lower layers (ie drivers)
- */
-struct bio {
-       struct bio          *bi_next;    /* request queue link */
-       struct block_device *bi_bdev;	/* target device */
-       unsigned long       bi_flags;    /* status, command, etc */
-       unsigned long       bi_opf;       /* low bits: r/w, high: priority */
-
-       unsigned int	bi_vcnt;     /* how many bio_vec's */
-       struct bvec_iter	bi_iter;	/* current index into bio_vec array */
-
-       unsigned int	bi_size;     /* total size in bytes */
-       unsigned short	bi_hw_segments; /* segments after DMA remapping */
-       unsigned int	bi_max;	     /* max bio_vecs we can hold
-                                        used as index into pool */
-       struct bio_vec   *bi_io_vec;  /* the actual vec list */
-       bio_end_io_t	*bi_end_io;  /* bi_end_io (bio) */
-       atomic_t		bi_cnt;	     /* pin count: free when it hits zero */
-       void             *bi_private;
-};
-
-With this multipage bio design:
-
-- Large i/os can be sent down in one go using a bio_vec list consisting
-  of an array of <page, offset, len> fragments (similar to the way fragments
-  are represented in the zero-copy network code)
-- Splitting of an i/o request across multiple devices (as in the case of
-  lvm or raid) is achieved by cloning the bio (where the clone points to
-  the same bi_io_vec array, but with the index and size accordingly modified)
-- A linked list of bios is used as before for unrelated merges (*) - this
-  avoids reallocs and makes independent completions easier to handle.
-- Code that traverses the req list can find all the segments of a bio
-  by using rq_for_each_segment.  This handles the fact that a request
-  has multiple bios, each of which can have multiple segments.
-- Drivers which can't process a large bio in one shot can use the bi_iter
-  field to keep track of the next bio_vec entry to process.
-  (e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE)
-  [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying
-   bi_offset and len fields]
-
-(*) unrelated merges -- a request ends up containing two or more bios that
-    didn't originate from the same place.
-
-bi_end_io() i/o callback gets called on i/o completion of the entire bio.
-
-At a lower level, drivers build a scatter gather list from the merged bios.
-The scatter gather list is in the form of an array of <page, offset, len>
-entries with their corresponding dma address mappings filled in at the
-appropriate time. As an optimization, contiguous physical pages can be
-covered by a single entry where <page> refers to the first page and <len>
-covers the range of pages (up to 16 contiguous pages could be covered this
-way). There is a helper routine (blk_rq_map_sg) which drivers can use to build
-the sg list.
-
-Note: Right now the only user of bios with more than one page is ll_rw_kio,
-which in turn means that only raw I/O uses it (direct i/o may not work
-right now). The intent however is to enable clustering of pages etc to
-become possible. The pagebuf abstraction layer from SGI also uses multi-page
-bios, but that is currently not included in the stock development kernels.
-The same is true of Andrew Morton's work-in-progress multipage bio writeout 
-and readahead patches.
-
-2.3 Changes in the Request Structure
-
-The request structure is the structure that gets passed down to low level
-drivers. The block layer make_request function builds up a request structure,
-places it on the queue and invokes the driver's request_fn. The driver makes
-use of block layer helper routine elv_next_request to pull the next request
-off the queue. Control or diagnostic functions might bypass block and directly
-invoke underlying driver entry points passing in a specially constructed
-request structure.
-
-Only some relevant fields (mainly those which changed or may be referred
-to in some of the discussion here) are listed below, not necessarily in
-the order in which they occur in the structure (see include/linux/blkdev.h)
-Refer to Documentation/block/request.txt for details about all the request
-structure fields and a quick reference about the layers which are
-supposed to use or modify those fields.
-
-struct request {
-	struct list_head queuelist;  /* Not meant to be directly accessed by
-					the driver.
-					Used by q->elv_next_request_fn
-					rq->queue is gone
-					*/
-	.
-	.
-	unsigned char cmd[16]; /* prebuilt command data block */
-	unsigned long flags;   /* also includes earlier rq->cmd settings */
-	.
-	.
-	sector_t sector; /* this field is now of type sector_t instead of int
-			    preparation for 64 bit sectors */
-	.
-	.
-
-	/* Number of scatter-gather DMA addr+len pairs after
-	 * physical address coalescing is performed.
-	 */
-	unsigned short nr_phys_segments;
-
-	/* Number of scatter-gather addr+len pairs after
-	 * physical and DMA remapping hardware coalescing is performed.
-	 * This is the number of scatter-gather entries the driver
-	 * will actually have to deal with after DMA mapping is done.
-	 */
-	unsigned short nr_hw_segments;
-
-	/* Various sector counts */
-	unsigned long nr_sectors;  /* no. of sectors left: driver modifiable */
-	unsigned long hard_nr_sectors;  /* block internal copy of above */
-	unsigned int current_nr_sectors; /* no. of sectors left in the
-					   current segment:driver modifiable */
-	unsigned long hard_cur_sectors; /* block internal copy of the above */
-	.
-	.
-	int tag;	/* command tag associated with request */
-	void *special;  /* same as before */
-	char *buffer;   /* valid only for low memory buffers up to
-			 current_nr_sectors */
-	.
-	.
-	struct bio *bio, *biotail;  /* bio list instead of bh */
-	struct request_list *rl;
-}
-	
-See the req_ops and req_flag_bits definitions for an explanation of the various
-flags available. Some bits are used by the block layer or i/o scheduler.
-	
-The behaviour of the various sector counts is almost the same as before,
-except that since we have multi-segment bios, current_nr_sectors refers
-to the number of sectors in the current segment being processed which could
-be one of the many segments in the current bio (i.e i/o completion unit).
-The nr_sectors value refers to the total number of sectors in the whole
-request that remain to be transferred (no change). The purpose of the
-hard_xxx values is for block to remember these counts every time it hands
-over the request to the driver. These values are updated by block on
-end_that_request_first, i.e. every time the driver completes a part of the
-transfer and invokes block end*request helpers to mark this. The
-driver should not modify these values. The block layer sets up the
-nr_sectors and current_nr_sectors fields (based on the corresponding
-hard_xxx values and the number of bytes transferred) and updates it on
-every transfer that invokes end_that_request_first. It does the same for the
-buffer, bio, bio->bi_iter fields too.
-
-The buffer field is just a virtual address mapping of the current segment
-of the i/o buffer in cases where the buffer resides in low-memory. For high
-memory i/o, this field is not valid and must not be used by drivers.
-
-Code that sets up its own request structures and passes them down to
-a driver needs to be careful about interoperation with the block layer helper
-functions which the driver uses. (Section 1.3)
-
-3. Using bios
-
-3.1 Setup/Teardown
-
-There are routines for managing the allocation, reference counting, and
-freeing of bios (bio_alloc, bio_get, bio_put).
-
-This makes use of Ingo Molnar's mempool implementation, which enables
-subsystems like bio to maintain their own reserve memory pools for guaranteed
-deadlock-free allocations during extreme VM load. For example, the VM
-subsystem makes use of the block layer to writeout dirty pages in order to be
-able to free up memory space, a case which needs careful handling. The
-allocation logic draws from the preallocated emergency reserve in situations
-where it cannot allocate through normal means. If the pool is empty and it
-can wait, then it would trigger action that would help free up memory or
-replenish the pool (without deadlocking) and wait for availability in the pool.
-If it is in IRQ context, and hence not in a position to do this, allocation
-could fail if the pool is empty. In general mempool always first tries to
-perform allocation without having to wait, even if it means digging into the
-pool as long as it is not less than 50% full.
-
-On a free, memory is released to the pool or directly freed depending on
-the current availability in the pool. The mempool interface lets the
-subsystem specify the routines to be used for normal alloc and free. In the
-case of bio, these routines make use of the standard slab allocator.
-
-The caller of bio_alloc is expected to take certain steps to avoid
-deadlocks, e.g. avoid trying to allocate more memory from the pool while
-already holding memory obtained from the pool.
-[TBD: This is a potential issue, though a rare possibility
- in the bounce bio allocation that happens in the current code, since
- it ends up allocating a second bio from the same pool while
- holding the original bio ]
-
-Memory allocated from the pool should be released back within a limited
-amount of time (in the case of bio, that would be after the i/o is completed).
-This ensures that if part of the pool has been used up, some work (in this
-case i/o) must already be in progress and memory would be available when it
-is over. If allocating from multiple pools in the same code path, the order
-or hierarchy of allocation needs to be consistent, just the way one deals
-with multiple locks.
-
-The bio_alloc routine also needs to allocate the bio_vec_list (bvec_alloc())
-for a non-clone bio. There are 6 pools set up for different size biovecs,
-so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
-given size from these slabs.
-
-The bio_get() routine may be used to hold an extra reference on a bio prior
-to i/o submission, if the bio fields are likely to be accessed after the
-i/o is issued (since the bio may otherwise get freed in case i/o completion
-happens in the meantime).
-
-The bio_clone_fast() routine may be used to duplicate a bio, where the clone
-shares the bio_vec_list with the original bio (i.e. both point to the
-same bio_vec_list). This would typically be used for splitting i/o requests
-in lvm or md.
-
-3.2 Generic bio helper Routines
-
-3.2.1 Traversing segments and completion units in a request
-
-The macro rq_for_each_segment() should be used for traversing the bios
-in the request list (drivers should avoid directly trying to do it
-themselves). Using these helpers should also make it easier to cope
-with block changes in the future.
-
-	struct req_iterator iter;
-	rq_for_each_segment(bio_vec, rq, iter)
-		/* bio_vec is now current segment */
-
-I/O completion callbacks are per-bio rather than per-segment, so drivers
-that traverse bio chains on completion need to keep that in mind. Drivers
-which don't make a distinction between segments and completion units would
-need to be reorganized to support multi-segment bios.
-
-3.2.2 Setting up DMA scatterlists
-
-The blk_rq_map_sg() helper routine would be used for setting up scatter
-gather lists from a request, so a driver need not do it on its own.
-
-	nr_segments = blk_rq_map_sg(q, rq, scatterlist);
-
-The helper routine provides a level of abstraction which makes it easier
-to modify the internals of request to scatterlist conversion down the line
-without breaking drivers. The blk_rq_map_sg routine takes care of several
-things like collapsing physically contiguous segments (if QUEUE_FLAG_CLUSTER
-is set) and correct segment accounting to avoid exceeding the limits which
-the i/o hardware can handle, based on various queue properties.
-
-- Prevents a clustered segment from crossing a 4GB mem boundary
-- Avoids building segments that would exceed the number of physical
-  memory segments that the driver can handle (phys_segments) and the
-  number that the underlying hardware can handle at once, accounting for
-  DMA remapping (hw_segments)  (i.e. IOMMU aware limits).
-
-Routines which the low level driver can use to set up the segment limits:
-
-blk_queue_max_hw_segments() : Sets an upper limit of the maximum number of
-hw data segments in a request (i.e. the maximum number of address/length
-pairs the host adapter can actually hand to the device at once)
-
-blk_queue_max_phys_segments() : Sets an upper limit on the maximum number
-of physical data segments in a request (i.e. the largest sized scatter list
-a driver could handle)
-
-3.2.3 I/O completion
-
-The existing generic block layer helper routines end_request,
-end_that_request_first and end_that_request_last can be used for i/o
-completion (and setting things up so the rest of the i/o or the next
-request can be kicked off) as before. With the introduction of multi-page
-bio support, end_that_request_first requires an additional argument indicating
-the number of sectors completed.
-
-3.2.4 Implications for drivers that do not interpret bios (don't handle
- multiple segments)
-
-Drivers that do not interpret bios e.g those which do not handle multiple
-segments and do not support i/o into high memory addresses (require bounce
-buffers) and expect only virtually mapped buffers, can access the rq->buffer
-field. As before the driver should use current_nr_sectors to determine the
-size of remaining data in the current segment (that is the maximum it can
-transfer in one go unless it interprets segments), and rely on the block layer
-end_request, or end_that_request_first/last to take care of all accounting
-and transparent mapping of the next bio segment when a segment boundary
-is crossed on completion of a transfer. (The end*request* functions should
-be used if and only if the request has come down from block/bio path, not for
-direct access requests which only specify rq->buffer without a valid rq->bio)
-
-3.3 I/O Submission
-
-The routine submit_bio() is used to submit a single io. Higher level i/o
-routines make use of this:
-
-(a) Buffered i/o:
-The routine submit_bh() invokes submit_bio() on a bio corresponding to the
-bh, allocating the bio if required. ll_rw_block() uses submit_bh() as before.
-
-(b) Kiobuf i/o (for raw/direct i/o):
-The ll_rw_kio() routine breaks up the kiobuf into page sized chunks and
-maps the array to one or more multi-page bios, issuing submit_bio() to
-perform the i/o on each of these.
-
-The embedded bh array in the kiobuf structure has been removed and no
-preallocation of bios is done for kiobufs. [The intent is to remove the
-blocks array as well, but it's currently in there to kludge around direct i/o.]
-Thus kiobuf allocation has switched back to using kmalloc rather than vmalloc.
-
-Todo/Observation:
-
- A single kiobuf structure is assumed to correspond to a contiguous range
- of data, so brw_kiovec() invokes ll_rw_kio for each kiobuf in a kiovec.
- So right now it wouldn't work for direct i/o on non-contiguous blocks.
- This is to be resolved.  The eventual direction is to replace kiobuf
- by kvec's.
-
- Badari Pulavarty has a patch to implement direct i/o correctly using
- bio and kvec.
-
-
-(c) Page i/o:
-Todo/Under discussion:
-
- Andrew Morton's multi-page bio patches attempt to issue multi-page
- writeouts (and reads) from the page cache, by directly building up
- large bios for submission completely bypassing the usage of buffer
- heads. This work is still in progress.
-
- Christoph Hellwig had some code that uses bios for page-io (rather than
- bh). This isn't included in bio as yet. Christoph was also working on a
- design for representing virtual/real extents as an entity and modifying
- some of the address space ops interfaces to utilize this abstraction rather
- than buffer_heads. (This is somewhat along the lines of the SGI XFS pagebuf
- abstraction, but intended to be as lightweight as possible).
-
-(d) Direct access i/o:
-Direct access requests that do not contain bios would be submitted differently
-as discussed earlier in section 1.3.
-
-Aside:
-
-  Kvec i/o:
-
-  Ben LaHaise's aio code uses a slightly different structure instead
-  of kiobufs, called a kvec_cb. This contains an array of <page, offset, len>
-  tuples (very much like the networking code), together with a callback function
-  and data pointer. This is embedded into a brw_cb structure when passed
-  to brw_kvec_async().
-
-  Now it should be possible to directly map these kvecs to a bio. Just as while
-  cloning, in this case rather than PRE_BUILT bio_vecs, we set the bi_io_vec
-  array pointer to point to the veclet array in kvecs.
-
-  TBD: In order for this to work, some changes are needed in the way multi-page
-  bios are handled today. The values of the tuples in such a vector passed in
-  from higher level code should not be modified by the block layer in the course
-  of its request processing, since that would make it hard for the higher layer
-  to continue to use the vector descriptor (kvec) after i/o completes. Instead,
-  all such transient state should either be maintained in the request structure,
-  and passed on in some way to the endio completion routine.
-
-
-4. The I/O scheduler
-I/O scheduler, a.k.a. elevator, is implemented in two layers.  Generic dispatch
-queue and specific I/O schedulers.  Unless stated otherwise, elevator is used
-to refer to both parts and I/O scheduler to specific I/O schedulers.
-
-Block layer implements generic dispatch queue in block/*.c.
-The generic dispatch queue is responsible for requeueing, handling non-fs
-requests and all other subtleties.
-
-Specific I/O schedulers are responsible for ordering normal filesystem
-requests.  They can also choose to delay certain requests to improve
-throughput or whatever purpose.  As the plural form indicates, there are
-multiple I/O schedulers.  They can be built as modules but at least one should
-be built inside the kernel.  Each queue can choose a different one and can also
-change to another one dynamically.
-
-A block layer call to the i/o scheduler follows the convention elv_xxx(). This
-calls elevator_xxx_fn in the elevator switch (block/elevator.c). Oh, xxx
-and xxx might not match exactly, but use your imagination. If an elevator
-doesn't implement a function, the switch does nothing or some minimal house
-keeping work.
-
-4.1. I/O scheduler API
-
-The functions an elevator may implement are: (* are mandatory)
-elevator_merge_fn		called to query requests for merge with a bio
-
-elevator_merge_req_fn		called when two requests get merged. the one
-				which gets merged into the other one will never
-				be seen by the I/O scheduler again. IOW, after
-				being merged, the request is gone.
-
-elevator_merged_fn		called when a request in the scheduler has been
-				involved in a merge. It is used in the deadline
-				scheduler for example, to reposition the request
-				if its sorting order has changed.
-
-elevator_allow_merge_fn		called whenever the block layer determines
-				that a bio can be merged into an existing
-				request safely. The io scheduler may still
-				want to stop a merge at this point if it
-				results in some sort of conflict internally,
-				this hook allows it to do that. Note however
-				that two *requests* can still be merged at later
-				time. Currently the io scheduler has no way to
-				prevent that. It can only learn about the fact
-				from elevator_merge_req_fn callback.
-
-elevator_dispatch_fn*		fills the dispatch queue with ready requests.
-				I/O schedulers are free to postpone requests by
-				not filling the dispatch queue unless @force
-				is non-zero.  Once dispatched, I/O schedulers
-				are not allowed to manipulate the requests -
-				they belong to generic dispatch queue.
-
-elevator_add_req_fn*		called to add a new request into the scheduler
-
-elevator_former_req_fn
-elevator_latter_req_fn		These return the request before or after the
-				one specified in disk sort order. Used by the
-				block layer to find merge possibilities.
-
-elevator_completed_req_fn	called when a request is completed.
-
-elevator_may_queue_fn		returns true if the scheduler wants to allow the
-				current context to queue a new request even if
-				it is over the queue limit. This must be used
-				very carefully!!
-
-elevator_set_req_fn
-elevator_put_req_fn		Must be used to allocate and free any elevator
-				specific storage for a request.
-
-elevator_activate_req_fn	Called when device driver first sees a request.
-				I/O schedulers can use this callback to
-				determine when actual execution of a request
-				starts.
-elevator_deactivate_req_fn	Called when device driver decides to delay
-				a request by requeueing it.
-
-elevator_init_fn*
-elevator_exit_fn		Allocate and free any elevator specific storage
-				for a queue.
-
-4.2 Request flows seen by I/O schedulers
-All requests seen by I/O schedulers strictly follow one of the following three
-flows.
-
- set_req_fn ->
-
- i.   add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn ->
-      (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn
- ii.  add_req_fn -> (merged_fn ->)* -> merge_req_fn
- iii. [none]
-
- -> put_req_fn
-
-4.3 I/O scheduler implementation
-The generic i/o scheduler algorithm attempts to sort/merge/batch requests for
-optimal disk scan and request servicing performance (based on generic
-principles and device capabilities), optimized for:
-i.   improved throughput
-ii.  improved latency
-iii. better utilization of h/w & CPU time
-
-Characteristics:
-
-i. Binary tree
-AS and deadline i/o schedulers use red black binary trees for disk position
-sorting and searching, and a fifo linked list for time-based searching. This
-gives good scalability and good availability of information. Requests are
-almost always dispatched in disk sort order, so a cache is kept of the next
-request in sort order to prevent binary tree lookups.
-
-This arrangement is not a generic block layer characteristic however, so
-elevators may implement queues as they please.
-
-ii. Merge hash
-AS and deadline use a hash table indexed by the last sector of a request. This
-enables merging code to quickly look up "back merge" candidates, even when
-multiple I/O streams are being performed at once on one disk.
-
-"Front merges", a new request being merged at the front of an existing request,
-are far less common than "back merges" due to the nature of most I/O patterns.
-Front merges are handled by the binary trees in AS and deadline schedulers.
-
-iii. Plugging the queue to batch requests in anticipation of opportunities for
-     merge/sort optimizations
-
-Plugging is an approach that the current i/o scheduling algorithm resorts to so
-that it collects up enough requests in the queue to be able to take
-advantage of the sorting/merging logic in the elevator. If the
-queue is empty when a request comes in, then it plugs the request queue
-(sort of like plugging the bath tub of a vessel to get fluid to build up)
-till it fills up with a few more requests, before starting to service
-the requests. This provides an opportunity to merge/sort the requests before
-passing them down to the device. There are various conditions when the queue is
-unplugged (to open up the flow again), either through a scheduled task or
-could be on demand. For example wait_on_buffer sets the unplugging going
-through sync_buffer() running blk_run_address_space(mapping). Or the caller
-can do it explicity through blk_unplug(bdev). So in the read case,
-the queue gets explicitly unplugged as part of waiting for completion on that
-buffer.
-
-Aside:
-  This is kind of controversial territory, as it's not clear if plugging is
-  always the right thing to do. Devices typically have their own queues,
-  and allowing a big queue to build up in software, while letting the device be
-  idle for a while may not always make sense. The trick is to handle the fine
-  balance between when to plug and when to open up. Also now that we have
-  multi-page bios being queued in one shot, we may not need to wait to merge
-  a big request from the broken up pieces coming by.
-
-4.4 I/O contexts
-I/O contexts provide a dynamically allocated per process data area. They may
-be used in I/O schedulers, and in the block layer (could be used for IO statis,
-priorities for example). See *io_context in block/ll_rw_blk.c, and as-iosched.c
-for an example of usage in an i/o scheduler.
-
-
-5. Scalability related changes
-
-5.1 Granular Locking: io_request_lock replaced by a per-queue lock
-
-The global io_request_lock has been removed as of 2.5, to avoid
-the scalability bottleneck it was causing, and has been replaced by more
-granular locking. The request queue structure has a pointer to the
-lock to be used for that queue. As a result, locking can now be
-per-queue, with a provision for sharing a lock across queues if
-necessary (e.g the scsi layer sets the queue lock pointers to the
-corresponding adapter lock, which results in a per host locking
-granularity). The locking semantics are the same, i.e. locking is
-still imposed by the block layer, grabbing the lock before
-request_fn execution which it means that lots of older drivers
-should still be SMP safe. Drivers are free to drop the queue
-lock themselves, if required. Drivers that explicitly used the
-io_request_lock for serialization need to be modified accordingly.
-Usually it's as easy as adding a global lock:
-
-	static DEFINE_SPINLOCK(my_driver_lock);
-
-and passing the address to that lock to blk_init_queue().
-
-5.2 64 bit sector numbers (sector_t prepares for 64 bit support)
-
-The sector number used in the bio structure has been changed to sector_t,
-which could be defined as 64 bit in preparation for 64 bit sector support.
-
-6. Other Changes/Implications
-
-6.1 Partition re-mapping handled by the generic block layer
-
-In 2.5 some of the gendisk/partition related code has been reorganized.
-Now the generic block layer performs partition-remapping early and thus
-provides drivers with a sector number relative to whole device, rather than
-having to take partition number into account in order to arrive at the true
-sector number. The routine blk_partition_remap() is invoked by
-generic_make_request even before invoking the queue specific make_request_fn,
-so the i/o scheduler also gets to operate on whole disk sector numbers. This
-should typically not require changes to block drivers, it just never gets
-to invoke its own partition sector offset calculations since all bios
-sent are offset from the beginning of the device.
-
-
-7. A Few Tips on Migration of older drivers
-
-Old-style drivers that just use CURRENT and ignores clustered requests,
-may not need much change.  The generic layer will automatically handle
-clustered requests, multi-page bios, etc for the driver.
-
-For a low performance driver or hardware that is PIO driven or just doesn't
-support scatter-gather changes should be minimal too.
-
-The following are some points to keep in mind when converting old drivers
-to bio.
-
-Drivers should use elv_next_request to pick up requests and are no longer
-supposed to handle looping directly over the request list.
-(struct request->queue has been removed)
-
-Now end_that_request_first takes an additional number_of_sectors argument.
-It used to handle always just the first buffer_head in a request, now
-it will loop and handle as many sectors (on a bio-segment granularity)
-as specified.
-
-Now bh->b_end_io is replaced by bio->bi_end_io, but most of the time the
-right thing to use is bio_endio(bio) instead.
-
-If the driver is dropping the io_request_lock from its request_fn strategy,
-then it just needs to replace that with q->queue_lock instead.
-
-As described in Sec 1.1, drivers can set max sector size, max segment size
-etc per queue now. Drivers that used to define their own merge functions i
-to handle things like this can now just use the blk_queue_* functions at
-blk_init_queue time.
-
-Drivers no longer have to map a {partition, sector offset} into the
-correct absolute location anymore, this is done by the block layer, so
-where a driver received a request ala this before:
-
-	rq->rq_dev = mk_kdev(3, 5);	/* /dev/hda5 */
-	rq->sector = 0;			/* first sector on hda5 */
-
-  it will now see
-
-	rq->rq_dev = mk_kdev(3, 0);	/* /dev/hda */
-	rq->sector = 123128;		/* offset from start of disk */
-
-As mentioned, there is no virtual mapping of a bio. For DMA, this is
-not a problem as the driver probably never will need a virtual mapping.
-Instead it needs a bus mapping (dma_map_page for a single segment or
-use dma_map_sg for scatter gather) to be able to ship it to the driver. For
-PIO drivers (or drivers that need to revert to PIO transfer once in a
-while (IDE for example)), where the CPU is doing the actual data
-transfer a virtual mapping is needed. If the driver supports highmem I/O,
-(Sec 1.1, (ii) ) it needs to use kmap_atomic or similar to temporarily map
-a bio into the virtual address space.
-
-
-8. Prior/Related/Impacted patches
-
-8.1. Earlier kiobuf patches (sct/axboe/chait/hch/mkp)
-- orig kiobuf & raw i/o patches (now in 2.4 tree)
-- direct kiobuf based i/o to devices (no intermediate bh's)
-- page i/o using kiobuf
-- kiobuf splitting for lvm (mkp)
-- elevator support for kiobuf request merging (axboe)
-8.2. Zero-copy networking (Dave Miller)
-8.3. SGI XFS - pagebuf patches - use of kiobufs
-8.4. Multi-page pioent patch for bio (Christoph Hellwig)
-8.5. Direct i/o implementation (Andrea Arcangeli) since 2.4.10-pre11
-8.6. Async i/o implementation patch (Ben LaHaise)
-8.7. EVMS layering design (IBM EVMS team)
-8.8. Larger page cache size patch (Ben LaHaise) and
-     Large page size (Daniel Phillips)
-    => larger contiguous physical memory buffers
-8.9. VM reservations patch (Ben LaHaise)
-8.10. Write clustering patches ? (Marcelo/Quintela/Riel ?)
-8.11. Block device in page cache patch (Andrea Archangeli) - now in 2.4.10+
-8.12. Multiple block-size transfers for faster raw i/o (Shailabh Nagar,
-      Badari)
-8.13  Priority based i/o scheduler - prepatches (Arjan van de Ven)
-8.14  IDE Taskfile i/o patch (Andre Hedrick)
-8.15  Multi-page writeout and readahead patches (Andrew Morton)
-8.16  Direct i/o patches for 2.5 using kvec and bio (Badari Pulavarthy)
-
-9. Other References:
-
-9.1 The Splice I/O Model - Larry McVoy (and subsequent discussions on lkml,
-and Linus' comments - Jan 2001)
-9.2 Discussions about kiobuf and bh design on lkml between sct, linus, alan
-et al - Feb-March 2001 (many of the initial thoughts that led to bio were
-brought up in this discussion thread)
-9.3 Discussions on mempool on lkml - Dec 2001.
-
diff --git a/Documentation/block/biovecs.rst b/Documentation/block/biovecs.rst
new file mode 100644
index 000000000000..86fa66c87172
--- /dev/null
+++ b/Documentation/block/biovecs.rst
@@ -0,0 +1,146 @@
+======================================
+Immutable biovecs and biovec iterators
+======================================
+
+Kent Overstreet <kmo@daterainc.com>
+
+As of 3.13, biovecs should never be modified after a bio has been submitted.
+Instead, we have a new struct bvec_iter which represents a range of a biovec -
+the iterator will be modified as the bio is completed, not the biovec.
+
+More specifically, old code that needed to partially complete a bio would
+update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
+ended up partway through a biovec, it would increment bv_offset and decrement
+bv_len by the number of bytes completed in that biovec.
+
+In the new scheme of things, everything that must be mutated in order to
+partially complete a bio is segregated into struct bvec_iter: bi_sector,
+bi_size and bi_idx have been moved there; and instead of modifying bv_offset
+and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
+bytes completed in the current bvec.
+
+There are a bunch of new helper macros for hiding the gory details - in
+particular, presenting the illusion of partially completed biovecs so that
+normal code doesn't have to deal with bi_bvec_done.
+
+ * Driver code should no longer refer to biovecs directly; we now have
+   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
+   constructed from the raw biovecs but taking into account bi_bvec_done and
+   bi_size.
+
+   bio_for_each_segment() has been updated to take a bvec_iter argument
+   instead of an integer (that corresponded to bi_idx); for a lot of code the
+   conversion just required changing the types of the arguments to
+   bio_for_each_segment().
+
+ * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
+   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
+   advances the bio integrity's iter if present.
+
+   There is a lower level advance function - bvec_iter_advance() - which takes
+   a pointer to a biovec, not a bio; this is used by the bio integrity code.
+
+What's all this get us?
+=======================
+
+Having a real iterator, and making biovecs immutable, has a number of
+advantages:
+
+ * Before, iterating over bios was very awkward when you weren't processing
+   exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c,
+   which copies the contents of one bio into another. Because the biovecs
+   wouldn't necessarily be the same size, the old code was tricky and convoluted -
+   it had to walk two different bios at the same time, keeping both bi_idx and
+   an offset into the current biovec for each.
+
+   The new code is much more straightforward - have a look. This sort of
+   pattern comes up in a lot of places; a lot of drivers were essentially open
+   coding bvec iterators before, and having a common implementation considerably
+   simplifies a lot of code.
+
+ * Before, any code that might need to use the biovec after the bio had been
+   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
+   it somewhere else if there was an error) had to save the entire bvec array
+   - again, this was being done in a fair number of places.
+
+ * Biovecs can be shared between multiple bios - a bvec iter can represent an
+   arbitrary range of an existing biovec, both starting and ending midway
+   through biovecs. This is what enables efficient splitting of arbitrary
+   bios. Note that this means we _only_ use bi_size to determine when we've
+   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
+   bi_size into account when constructing biovecs.
+
+ * Splitting bios is now much simpler. The old bio_split() didn't even work on
+   bios with more than a single bvec! Now, we can efficiently split arbitrary
+   size bios - because the new bio can share the old bio's biovec.
+
+   Care must be taken to ensure the biovec isn't freed while the split bio is
+   still using it, in case the original bio completes first, though. Using
+   bio_chain() when splitting bios helps with this.
+
+ * Submitting partially completed bios is now perfectly fine - this comes up
+   occasionally in stacking block drivers, and various code (e.g. md and
+   bcache) had some ugly workarounds for this.
+
+   It used to be the case that submitting a partially completed bio would work
+   fine to _most_ devices, but since accessing the raw bvec array was the
+   norm, not all drivers would respect bi_idx and those would break. Now,
+   since all drivers _must_ go through the bvec iterator - and have been
+   audited to make sure they are - submitting partially completed bios is
+   perfectly fine.
+
+Other implications:
+===================
+
+ * Almost all usage of bi_idx is now incorrect and has been removed; instead,
+   where previously you would have used bi_idx you'd now use a bvec_iter,
+   probably passing it to one of the helper macros.
+
+   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
+   now use bio_iter_iovec(), which takes a bvec_iter and returns a
+   literal struct bio_vec - constructed on the fly from the raw biovec but
+   taking into account bi_bvec_done (and bi_size).
+
+ * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
+   doesn't actually own the bio. The reason is twofold: firstly, it's not
+   actually needed for iterating over the bio anymore - we only use bi_size.
+   Secondly, when cloning a bio and reusing (a portion of) the original bio's
+   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
+   over all the biovecs in the new bio - which is silly as it's not needed.
+
+   So, don't use bi_vcnt anymore.
+
+ * The current interface allows the block layer to split bios as needed, so we
+   could eliminate a lot of complexity particularly in stacked drivers. Code
+   that creates bios can then create whatever size bios are convenient, and
+   more importantly stacked drivers don't have to deal with both their own bio
+   size limitations and the limitations of the underlying devices. Thus
+   there's no need to define ->merge_bvec_fn() callbacks for individual block
+   drivers.
+
+Usage of helpers:
+=================
+
+* The following helpers whose names have the suffix of ``_all`` can only be used
+  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
+  shouldn't use them because the bio may have been split before it reached the
+  driver.
+
+::
+
+	bio_for_each_segment_all()
+	bio_first_bvec_all()
+	bio_first_page_all()
+	bio_last_bvec_all()
+
+* The following helpers iterate over single-page segments. The passed 'struct
+  bio_vec' will contain a single-page IO vector during the iteration::
+
+	bio_for_each_segment()
+	bio_for_each_segment_all()
+
+* The following helpers iterate over multi-page bvecs. The passed 'struct
+  bio_vec' will contain a multi-page IO vector during the iteration::
+
+	bio_for_each_bvec()
+	rq_for_each_bvec()
diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
deleted file mode 100644
index ce6eccaf5df7..000000000000
--- a/Documentation/block/biovecs.txt
+++ /dev/null
@@ -1,144 +0,0 @@
-
-Immutable biovecs and biovec iterators:
-=======================================
-
-Kent Overstreet <kmo@daterainc.com>
-
-As of 3.13, biovecs should never be modified after a bio has been submitted.
-Instead, we have a new struct bvec_iter which represents a range of a biovec -
-the iterator will be modified as the bio is completed, not the biovec.
-
-More specifically, old code that needed to partially complete a bio would
-update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
-ended up partway through a biovec, it would increment bv_offset and decrement
-bv_len by the number of bytes completed in that biovec.
-
-In the new scheme of things, everything that must be mutated in order to
-partially complete a bio is segregated into struct bvec_iter: bi_sector,
-bi_size and bi_idx have been moved there; and instead of modifying bv_offset
-and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
-bytes completed in the current bvec.
-
-There are a bunch of new helper macros for hiding the gory details - in
-particular, presenting the illusion of partially completed biovecs so that
-normal code doesn't have to deal with bi_bvec_done.
-
- * Driver code should no longer refer to biovecs directly; we now have
-   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
-   constructed from the raw biovecs but taking into account bi_bvec_done and
-   bi_size.
-
-   bio_for_each_segment() has been updated to take a bvec_iter argument
-   instead of an integer (that corresponded to bi_idx); for a lot of code the
-   conversion just required changing the types of the arguments to
-   bio_for_each_segment().
-
- * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
-   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
-   advances the bio integrity's iter if present.
-
-   There is a lower level advance function - bvec_iter_advance() - which takes
-   a pointer to a biovec, not a bio; this is used by the bio integrity code.
-
-What's all this get us?
-=======================
-
-Having a real iterator, and making biovecs immutable, has a number of
-advantages:
-
- * Before, iterating over bios was very awkward when you weren't processing
-   exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c,
-   which copies the contents of one bio into another. Because the biovecs
-   wouldn't necessarily be the same size, the old code was tricky convoluted -
-   it had to walk two different bios at the same time, keeping both bi_idx and
-   and offset into the current biovec for each.
-
-   The new code is much more straightforward - have a look. This sort of
-   pattern comes up in a lot of places; a lot of drivers were essentially open
-   coding bvec iterators before, and having common implementation considerably
-   simplifies a lot of code.
-
- * Before, any code that might need to use the biovec after the bio had been
-   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
-   it somewhere else if there was an error) had to save the entire bvec array
-   - again, this was being done in a fair number of places.
-
- * Biovecs can be shared between multiple bios - a bvec iter can represent an
-   arbitrary range of an existing biovec, both starting and ending midway
-   through biovecs. This is what enables efficient splitting of arbitrary
-   bios. Note that this means we _only_ use bi_size to determine when we've
-   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
-   bi_size into account when constructing biovecs.
-
- * Splitting bios is now much simpler. The old bio_split() didn't even work on
-   bios with more than a single bvec! Now, we can efficiently split arbitrary
-   size bios - because the new bio can share the old bio's biovec.
-
-   Care must be taken to ensure the biovec isn't freed while the split bio is
-   still using it, in case the original bio completes first, though. Using
-   bio_chain() when splitting bios helps with this.
-
- * Submitting partially completed bios is now perfectly fine - this comes up
-   occasionally in stacking block drivers and various code (e.g. md and
-   bcache) had some ugly workarounds for this.
-
-   It used to be the case that submitting a partially completed bio would work
-   fine to _most_ devices, but since accessing the raw bvec array was the
-   norm, not all drivers would respect bi_idx and those would break. Now,
-   since all drivers _must_ go through the bvec iterator - and have been
-   audited to make sure they are - submitting partially completed bios is
-   perfectly fine.
-
-Other implications:
-===================
-
- * Almost all usage of bi_idx is now incorrect and has been removed; instead,
-   where previously you would have used bi_idx you'd now use a bvec_iter,
-   probably passing it to one of the helper macros.
-
-   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
-   now use bio_iter_iovec(), which takes a bvec_iter and returns a
-   literal struct bio_vec - constructed on the fly from the raw biovec but
-   taking into account bi_bvec_done (and bi_size).
-
- * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
-   doesn't actually own the bio. The reason is twofold: firstly, it's not
-   actually needed for iterating over the bio anymore - we only use bi_size.
-   Secondly, when cloning a bio and reusing (a portion of) the original bio's
-   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
-   over all the biovecs in the new bio - which is silly as it's not needed.
-
-   So, don't use bi_vcnt anymore.
-
- * The current interface allows the block layer to split bios as needed, so we
-   could eliminate a lot of complexity particularly in stacked drivers. Code
-   that creates bios can then create whatever size bios are convenient, and
-   more importantly stacked drivers don't have to deal with both their own bio
-   size limitations and the limitations of the underlying devices. Thus
-   there's no need to define ->merge_bvec_fn() callbacks for individual block
-   drivers.
-
-Usage of helpers:
-=================
-
-* The following helpers whose names have the suffix of "_all" can only be used
-on non-BIO_CLONED bio. They are usually used by filesystem code. Drivers
-shouldn't use them because the bio may have been split before it reached the
-driver.
-
-	bio_for_each_segment_all()
-	bio_first_bvec_all()
-	bio_first_page_all()
-	bio_last_bvec_all()
-
-* The following helpers iterate over single-page segment. The passed 'struct
-bio_vec' will contain a single-page IO vector during the iteration
-
-	bio_for_each_segment()
-	bio_for_each_segment_all()
-
-* The following helpers iterate over multi-page bvec. The passed 'struct
-bio_vec' will contain a multi-page IO vector during the iteration
-
-	bio_for_each_bvec()
-	rq_for_each_bvec()
diff --git a/Documentation/block/capability.rst b/Documentation/block/capability.rst
new file mode 100644
index 000000000000..2cf258d64bbe
--- /dev/null
+++ b/Documentation/block/capability.rst
@@ -0,0 +1,18 @@
+===============================
+Generic Block Device Capability
+===============================
+
+This file documents the sysfs file block/<disk>/capability.
+
+capability is a hex word indicating which capabilities a specific disk
+supports.  For more information on bits not listed here, see
+include/linux/genhd.h.
+
+GENHD_FL_MEDIA_CHANGE_NOTIFY
+----------------------------
+
+Value: 4
+
+When this bit is set, the disk supports Asynchronous Notification
+of media change events.  These events will be broadcast to user
+space via kernel uevent.
diff --git a/Documentation/block/capability.txt b/Documentation/block/capability.txt
deleted file mode 100644
index 2f1729424ef4..000000000000
--- a/Documentation/block/capability.txt
+++ /dev/null
@@ -1,15 +0,0 @@
-Generic Block Device Capability
-===============================================================================
-This file documents the sysfs file block/<disk>/capability
-
-capability is a hex word indicating which capabilities a specific disk
-supports.  For more information on bits not listed here, see
-include/linux/genhd.h
-
-Capability				Value
--------------------------------------------------------------------------------
-GENHD_FL_MEDIA_CHANGE_NOTIFY		4
-	When this bit is set, the disk supports Asynchronous Notification
-	of media change events.  These events will be broadcast to user
-	space via kernel uevent.
-
diff --git a/Documentation/block/cmdline-partition.rst b/Documentation/block/cmdline-partition.rst
new file mode 100644
index 000000000000..530bedff548a
--- /dev/null
+++ b/Documentation/block/cmdline-partition.rst
@@ -0,0 +1,53 @@
+==============================================
+Embedded device command line partition parsing
+==============================================
+
+The "blkdevparts" command line option adds support for reading the
+block device partition table from the kernel command line.
+
+It is typically used for fixed block (eMMC) embedded devices.
+It has no MBR, which saves storage space. The bootloader can be easily accessed
+by the absolute address of its data on the block device.
+Users can easily change the partitions.
+
+The format for the command line is just like mtdparts::
+
+  blkdevparts=<blkdev-def>[;<blkdev-def>]
+    <blkdev-def> := <blkdev-id>:<partdef>[,<partdef>]
+      <partdef> := <size>[@<offset>](part-name)
+
+<blkdev-id>
+    block device disk name. An embedded device uses a fixed block device.
+    Its disk name is also fixed, such as: mmcblk0, mmcblk1, mmcblk0boot0.
+
+<size>
+    partition size, in bytes, such as: 512, 1m, 1G.
+    size may contain an optional suffix of (upper or lower case):
+
+      K, M, G, T, P, E.
+
+    "-" is used to denote all remaining space.
+
+<offset>
+    partition start address, in bytes.
+    offset may contain an optional suffix of (upper or lower case):
+
+      K, M, G, T, P, E.
+
+(part-name)
+    partition name. The kernel sends a uevent with "PARTNAME". An application
+    can create a link to the block device partition with the name "PARTNAME".
+    User space applications can access the partition by its partition name.
+
+Example:
+
+    eMMC disk names are "mmcblk0" and "mmcblk0boot0".
+
+  bootargs::
+
+    'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)'
+
+  dmesg::
+
+    mmcblk0: p1(data0) p2(data1) p3()
+    mmcblk0boot0: p1(boot) p2(kernel)
diff --git a/Documentation/block/cmdline-partition.txt b/Documentation/block/cmdline-partition.txt
deleted file mode 100644
index 760a3f7c3ed4..000000000000
--- a/Documentation/block/cmdline-partition.txt
+++ /dev/null
@@ -1,46 +0,0 @@
-Embedded device command line partition parsing
-=====================================================================
-
-The "blkdevparts" command line option adds support for reading the
-block device partition table from the kernel command line.
-
-It is typically used for fixed block (eMMC) embedded devices.
-It has no MBR, so saves storage space. Bootloader can be easily accessed
-by absolute address of data on the block device.
-Users can easily change the partition.
-
-The format for the command line is just like mtdparts:
-
-blkdevparts=<blkdev-def>[;<blkdev-def>]
-  <blkdev-def> := <blkdev-id>:<partdef>[,<partdef>]
-    <partdef> := <size>[@<offset>](part-name)
-
-<blkdev-id>
-    block device disk name. Embedded device uses fixed block device.
-    Its disk name is also fixed, such as: mmcblk0, mmcblk1, mmcblk0boot0.
-
-<size>
-    partition size, in bytes, such as: 512, 1m, 1G.
-    size may contain an optional suffix of (upper or lower case):
-      K, M, G, T, P, E.
-    "-" is used to denote all remaining space.
-
-<offset>
-    partition start address, in bytes.
-    offset may contain an optional suffix of (upper or lower case):
-      K, M, G, T, P, E.
-
-(part-name)
-    partition name. Kernel sends uevent with "PARTNAME". Application can
-    create a link to block device partition with the name "PARTNAME".
-    User space application can access partition by partition name.
-
-Example:
-    eMMC disk names are "mmcblk0" and "mmcblk0boot0".
-
-  bootargs:
-    'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)'
-
-  dmesg:
-    mmcblk0: p1(data0) p2(data1) p3()
-    mmcblk0boot0: p1(boot) p2(kernel)
diff --git a/Documentation/block/data-integrity.rst b/Documentation/block/data-integrity.rst
new file mode 100644
index 000000000000..4f2452a95c43
--- /dev/null
+++ b/Documentation/block/data-integrity.rst
@@ -0,0 +1,291 @@
+==============
+Data Integrity
+==============
+
+1. Introduction
+===============
+
+Modern filesystems feature checksumming of data and metadata to
+protect against data corruption.  However, the detection of the
+corruption is done at read time which could potentially be months
+after the data was written.  At that point the original data that the
+application tried to write is most likely lost.
+
+The solution is to ensure that the disk is actually storing what the
+application meant it to.  Recent additions to both the SCSI family
+protocols (SBC Data Integrity Field, SCC protection proposal) as well
+as SATA/T13 (External Path Protection) try to remedy this by adding
+support for appending integrity metadata to an I/O.  The integrity
+metadata (or protection information in SCSI terminology) includes a
+checksum for each sector as well as an incrementing counter that
+ensures the individual sectors are written in the right order and,
+for some protection schemes, that the I/O is written to the right
+place on disk.
+
+Current storage controllers and devices implement various protective
+measures, for instance checksumming and scrubbing.  But these
+technologies are working in their own isolated domains or at best
+between adjacent nodes in the I/O path.  The interesting thing about
+DIF and the other integrity extensions is that the protection format
+is well defined and every node in the I/O path can verify the
+integrity of the I/O and reject it if corruption is detected.  This
+allows not only corruption prevention but also isolation of the point
+of failure.
+
+2. The Data Integrity Extensions
+================================
+
+As written, the protocol extensions only protect the path between
+controller and storage device.  However, many controllers actually
+allow the operating system to interact with the integrity metadata
+(IMD).  We have been working with several FC/SAS HBA vendors to enable
+the protection information to be transferred to and from their
+controllers.
+
+The SCSI Data Integrity Field works by appending 8 bytes of protection
+information to each sector.  The data + integrity metadata is stored
+in 520 byte sectors on disk.  Data + IMD are interleaved when
+transferred between the controller and target.  The T13 proposal is
+similar.
+
+Because it is highly inconvenient for operating systems to deal with
+520 (and 4104) byte sectors, we approached several HBA vendors and
+encouraged them to allow separation of the data and integrity metadata
+scatter-gather lists.
+
+The controller will interleave the buffers on write and split them on
+read.  This means that Linux can DMA the data buffers to and from
+host memory without changes to the page cache.
+
+Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
+is somewhat heavy to compute in software.  Benchmarks found that
+calculating this checksum had a significant impact on system
+performance for a number of workloads.  Some controllers allow a
+lighter-weight checksum to be used when interfacing with the operating
+system.  Emulex, for instance, supports the TCP/IP checksum instead.
+The IP checksum received from the OS is converted to the 16-bit CRC
+when writing and vice versa.  This allows the integrity metadata to be
+generated by Linux or the application at very low cost (comparable to
+software RAID5).
+
+The IP checksum is weaker than the CRC in terms of detecting bit
+errors.  However, the strength is really in the separation of the data
+buffers and the integrity metadata.  These two distinct buffers must
+match up for an I/O to complete.
+
+The separation of the data and integrity metadata buffers as well as
+the choice in checksums is referred to as the Data Integrity
+Extensions.  As these extensions are outside the scope of the protocol
+bodies (T10, T13), Oracle and its partners are trying to standardize
+them within the Storage Networking Industry Association.
+
+3. Kernel Changes
+=================
+
+The data integrity framework in Linux enables protection information
+to be pinned to I/Os and sent to/received from controllers that
+support it.
+
+The advantage to the integrity extensions in SCSI and SATA is that
+they enable us to protect the entire path from application to storage
+device.  However, at the same time this is also the biggest
+disadvantage. It means that the protection information must be in a
+format that can be understood by the disk.
+
+Generally Linux/POSIX applications are agnostic to the intricacies of
+the storage devices they are accessing.  The virtual filesystem switch
+and the block layer make things like hardware sector size and
+transport protocols completely transparent to the application.
+
+However, this level of detail is required when preparing the
+protection information to send to a disk.  Consequently, the very
+concept of an end-to-end protection scheme is a layering violation.
+It is completely unreasonable for an application to be aware whether
+it is accessing a SCSI or SATA disk.
+
+The data integrity support implemented in Linux attempts to hide this
+from the application.  As far as the application (and to some extent
+the kernel) is concerned, the integrity metadata is opaque information
+that's attached to the I/O.
+
+The current implementation allows the block layer to automatically
+generate the protection information for any I/O.  Eventually the
+intent is to move the integrity metadata calculation to userspace for
+user data.  Metadata and other I/O that originates within the kernel
+will still use the automatic generation interface.
+
+Some storage devices allow each hardware sector to be tagged with a
+16-bit value.  The owner of this tag space is the owner of the block
+device, i.e. the filesystem in most cases.  The filesystem can use
+this extra space to tag sectors as it sees fit.  Because the tag
+space is limited, the block interface allows tagging bigger chunks by
+way of interleaving.  This way, 8*16 bits of information can be
+attached to a typical 4KB filesystem block.
+
+This also means that applications such as fsck and mkfs will need
+access to manipulate the tags from user space.  A passthrough
+interface for this is being worked on.
+
+
+4. Block Layer Implementation Details
+=====================================
+
+4.1 Bio
+-------
+
+The data integrity patches add a new field to struct bio when
+CONFIG_BLK_DEV_INTEGRITY is enabled.  bio_integrity(bio) returns a
+pointer to a struct bip which contains the bio integrity payload.
+Essentially a bip is a trimmed down struct bio which holds a bio_vec
+containing the integrity metadata and the required housekeeping
+information (bvec pool, vector count, etc.).
+
+A kernel subsystem can enable data integrity protection on a bio by
+calling bio_integrity_alloc(bio).  This will allocate and attach the
+bip to the bio.
+
+Individual pages containing integrity metadata can subsequently be
+attached using bio_integrity_add_page().
+
+bio_free() will automatically free the bip.
+
+
+4.2 Block Device
+----------------
+
+Because the format of the protection data is tied to the physical
+disk, each block device has been extended with a block integrity
+profile (struct blk_integrity).  This optional profile is registered
+with the block layer using blk_integrity_register().
+
+The profile contains callback functions for generating and verifying
+the protection data, as well as getting and setting application tags.
+The profile also contains a few constants to aid in completing,
+merging and splitting the integrity metadata.
+
+Layered block devices will need to pick a profile that's appropriate
+for all subdevices.  blk_integrity_compare() can help with that.  DM
+and MD linear, RAID0 and RAID1 are currently supported.  RAID4/5/6
+will require extra work due to the application tag.
+
+
+5. Block Layer Integrity API
+============================
+
+5.1 Normal Filesystem
+---------------------
+
+    The normal filesystem is unaware that the underlying block device
+    is capable of sending/receiving integrity metadata.  The IMD will
+    be automatically generated by the block layer at submit_bio() time
+    in case of a WRITE.  A READ request will cause the I/O integrity
+    to be verified upon completion.
+
+    IMD generation and verification can be toggled using the::
+
+      /sys/block/<bdev>/integrity/write_generate
+
+    and::
+
+      /sys/block/<bdev>/integrity/read_verify
+
+    flags.
+
+
+5.2 Integrity-Aware Filesystem
+------------------------------
+
+    A filesystem that is integrity-aware can prepare I/Os with IMD
+    attached.  It can also use the application tag space if this is
+    supported by the block device.
+
+
+    `bool bio_integrity_prep(bio);`
+
+      To generate IMD for WRITE and to set up buffers for READ, the
+      filesystem must call bio_integrity_prep(bio).
+
+      Prior to calling this function, the bio data direction and start
+      sector must be set, and the bio should have all data pages
+      added.  It is up to the caller to ensure that the bio does not
+      change while I/O is in progress.
+      Complete bio with error if prepare failed for some reason.
+
+
+5.3 Passing Existing Integrity Metadata
+---------------------------------------
+
+    Filesystems that either generate their own integrity metadata or
+    are capable of transferring IMD from user space can use the
+    following calls:
+
+
+    `struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);`
+
+      Allocates the bio integrity payload and hangs it off of the bio.
+      nr_pages indicates how many pages of protection data need to be
+      stored in the integrity bio_vec list (similar to bio_alloc()).
+
+      The integrity payload will be freed at bio_free() time.
+
+
+    `int bio_integrity_add_page(bio, page, len, offset);`
+
+      Attaches a page containing integrity metadata to an existing
+      bio.  The bio must have an existing bip,
+      i.e. bio_integrity_alloc() must have been called.  For a WRITE,
+      the integrity metadata in the pages must be in a format
+      understood by the target device with the notable exception that
+      the sector numbers will be remapped as the request traverses the
+      I/O stack.  This implies that the pages added using this call
+      will be modified during I/O!  The first reference tag in the
+      integrity metadata must have a value of bip->bip_sector.
+
+      Pages can be added using bio_integrity_add_page() as long as
+      there is room in the bip bio_vec array (nr_pages).
+
+      Upon completion of a READ operation, the attached pages will
+      contain the integrity metadata received from the storage device.
+      It is up to the receiver to process them and verify data
+      integrity upon completion.
+
+
+5.4 Registering A Block Device As Capable Of Exchanging Integrity Metadata
+--------------------------------------------------------------------------
+
+    To enable integrity exchange on a block device the gendisk must be
+    registered as capable:
+
+    `int blk_integrity_register(gendisk, blk_integrity);`
+
+      The blk_integrity struct is a template and should contain the
+      following::
+
+        static struct blk_integrity my_profile = {
+            .name                   = "STANDARDSBODY-TYPE-VARIANT-CSUM",
+            .generate_fn            = my_generate_fn,
+            .verify_fn              = my_verify_fn,
+            .tuple_size             = sizeof(struct my_tuple_size),
+            .tag_size               = <tag bytes per hw sector>,
+        };
+
+      'name' is a text string which will be visible in sysfs.  This is
+      part of the userland API so choose it carefully and never change
+      it.  The format is standards body-type-variant-checksum.
+      E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
+
+      'generate_fn' generates appropriate integrity metadata (for WRITE).
+
+      'verify_fn' verifies that the data buffer matches the integrity
+      metadata.
+
+      'tuple_size' must be set to match the size of the integrity
+      metadata per sector.  I.e. 8 for DIF and EPP.
+
+      'tag_size' must be set to identify how many bytes of tag space
+      are available per hardware sector.  For DIF this is either 2 or
+      0 depending on the value of the Control Mode Page ATO bit.
+
+----------------------------------------------------------------------
+
+2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/Documentation/block/data-integrity.txt b/Documentation/block/data-integrity.txt
deleted file mode 100644
index 934c44ea0c57..000000000000
--- a/Documentation/block/data-integrity.txt
+++ /dev/null
@@ -1,281 +0,0 @@
-----------------------------------------------------------------------
-1. INTRODUCTION
-
-Modern filesystems feature checksumming of data and metadata to
-protect against data corruption.  However, the detection of the
-corruption is done at read time which could potentially be months
-after the data was written.  At that point the original data that the
-application tried to write is most likely lost.
-
-The solution is to ensure that the disk is actually storing what the
-application meant it to.  Recent additions to both the SCSI family
-protocols (SBC Data Integrity Field, SCC protection proposal) as well
-as SATA/T13 (External Path Protection) try to remedy this by adding
-support for appending integrity metadata to an I/O.  The integrity
-metadata (or protection information in SCSI terminology) includes a
-checksum for each sector as well as an incrementing counter that
-ensures the individual sectors are written in the right order.  And
-for some protection schemes also that the I/O is written to the right
-place on disk.
-
-Current storage controllers and devices implement various protective
-measures, for instance checksumming and scrubbing.  But these
-technologies are working in their own isolated domains or at best
-between adjacent nodes in the I/O path.  The interesting thing about
-DIF and the other integrity extensions is that the protection format
-is well defined and every node in the I/O path can verify the
-integrity of the I/O and reject it if corruption is detected.  This
-allows not only corruption prevention but also isolation of the point
-of failure.
-
-----------------------------------------------------------------------
-2. THE DATA INTEGRITY EXTENSIONS
-
-As written, the protocol extensions only protect the path between
-controller and storage device.  However, many controllers actually
-allow the operating system to interact with the integrity metadata
-(IMD).  We have been working with several FC/SAS HBA vendors to enable
-the protection information to be transferred to and from their
-controllers.
-
-The SCSI Data Integrity Field works by appending 8 bytes of protection
-information to each sector.  The data + integrity metadata is stored
-in 520 byte sectors on disk.  Data + IMD are interleaved when
-transferred between the controller and target.  The T13 proposal is
-similar.
-
-Because it is highly inconvenient for operating systems to deal with
-520 (and 4104) byte sectors, we approached several HBA vendors and
-encouraged them to allow separation of the data and integrity metadata
-scatter-gather lists.
-
-The controller will interleave the buffers on write and split them on
-read.  This means that Linux can DMA the data buffers to and from
-host memory without changes to the page cache.
-
-Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
-is somewhat heavy to compute in software.  Benchmarks found that
-calculating this checksum had a significant impact on system
-performance for a number of workloads.  Some controllers allow a
-lighter-weight checksum to be used when interfacing with the operating
-system.  Emulex, for instance, supports the TCP/IP checksum instead.
-The IP checksum received from the OS is converted to the 16-bit CRC
-when writing and vice versa.  This allows the integrity metadata to be
-generated by Linux or the application at very low cost (comparable to
-software RAID5).
-
-The IP checksum is weaker than the CRC in terms of detecting bit
-errors.  However, the strength is really in the separation of the data
-buffers and the integrity metadata.  These two distinct buffers must
-match up for an I/O to complete.
-
-The separation of the data and integrity metadata buffers as well as
-the choice in checksums is referred to as the Data Integrity
-Extensions.  As these extensions are outside the scope of the protocol
-bodies (T10, T13), Oracle and its partners are trying to standardize
-them within the Storage Networking Industry Association.
-
-----------------------------------------------------------------------
-3. KERNEL CHANGES
-
-The data integrity framework in Linux enables protection information
-to be pinned to I/Os and sent to/received from controllers that
-support it.
-
-The advantage to the integrity extensions in SCSI and SATA is that
-they enable us to protect the entire path from application to storage
-device.  However, at the same time this is also the biggest
-disadvantage. It means that the protection information must be in a
-format that can be understood by the disk.
-
-Generally Linux/POSIX applications are agnostic to the intricacies of
-the storage devices they are accessing.  The virtual filesystem switch
-and the block layer make things like hardware sector size and
-transport protocols completely transparent to the application.
-
-However, this level of detail is required when preparing the
-protection information to send to a disk.  Consequently, the very
-concept of an end-to-end protection scheme is a layering violation.
-It is completely unreasonable for an application to be aware whether
-it is accessing a SCSI or SATA disk.
-
-The data integrity support implemented in Linux attempts to hide this
-from the application.  As far as the application (and to some extent
-the kernel) is concerned, the integrity metadata is opaque information
-that's attached to the I/O.
-
-The current implementation allows the block layer to automatically
-generate the protection information for any I/O.  Eventually the
-intent is to move the integrity metadata calculation to userspace for
-user data.  Metadata and other I/O that originates within the kernel
-will still use the automatic generation interface.
-
-Some storage devices allow each hardware sector to be tagged with a
-16-bit value.  The owner of this tag space is the owner of the block
-device.  I.e. the filesystem in most cases.  The filesystem can use
-this extra space to tag sectors as they see fit.  Because the tag
-space is limited, the block interface allows tagging bigger chunks by
-way of interleaving.  This way, 8*16 bits of information can be
-attached to a typical 4KB filesystem block.
-
-This also means that applications such as fsck and mkfs will need
-access to manipulate the tags from user space.  A passthrough
-interface for this is being worked on.
-
-
-----------------------------------------------------------------------
-4. BLOCK LAYER IMPLEMENTATION DETAILS
-
-4.1 BIO
-
-The data integrity patches add a new field to struct bio when
-CONFIG_BLK_DEV_INTEGRITY is enabled.  bio_integrity(bio) returns a
-pointer to a struct bip which contains the bio integrity payload.
-Essentially a bip is a trimmed down struct bio which holds a bio_vec
-containing the integrity metadata and the required housekeeping
-information (bvec pool, vector count, etc.)
-
-A kernel subsystem can enable data integrity protection on a bio by
-calling bio_integrity_alloc(bio).  This will allocate and attach the
-bip to the bio.
-
-Individual pages containing integrity metadata can subsequently be
-attached using bio_integrity_add_page().
-
-bio_free() will automatically free the bip.
-
-
-4.2 BLOCK DEVICE
-
-Because the format of the protection data is tied to the physical
-disk, each block device has been extended with a block integrity
-profile (struct blk_integrity).  This optional profile is registered
-with the block layer using blk_integrity_register().
-
-The profile contains callback functions for generating and verifying
-the protection data, as well as getting and setting application tags.
-The profile also contains a few constants to aid in completing,
-merging and splitting the integrity metadata.
-
-Layered block devices will need to pick a profile that's appropriate
-for all subdevices.  blk_integrity_compare() can help with that.  DM
-and MD linear, RAID0 and RAID1 are currently supported.  RAID4/5/6
-will require extra work due to the application tag.
-
-
-----------------------------------------------------------------------
-5.0 BLOCK LAYER INTEGRITY API
-
-5.1 NORMAL FILESYSTEM
-
-    The normal filesystem is unaware that the underlying block device
-    is capable of sending/receiving integrity metadata.  The IMD will
-    be automatically generated by the block layer at submit_bio() time
-    in case of a WRITE.  A READ request will cause the I/O integrity
-    to be verified upon completion.
-
-    IMD generation and verification can be toggled using the
-
-      /sys/block/<bdev>/integrity/write_generate
-
-    and
-
-      /sys/block/<bdev>/integrity/read_verify
-
-    flags.
-
-
-5.2 INTEGRITY-AWARE FILESYSTEM
-
-    A filesystem that is integrity-aware can prepare I/Os with IMD
-    attached.  It can also use the application tag space if this is
-    supported by the block device.
-
-
-    bool bio_integrity_prep(bio);
-
-      To generate IMD for WRITE and to set up buffers for READ, the
-      filesystem must call bio_integrity_prep(bio).
-
-      Prior to calling this function, the bio data direction and start
-      sector must be set, and the bio should have all data pages
-      added.  It is up to the caller to ensure that the bio does not
-      change while I/O is in progress.
-      Complete bio with error if prepare failed for some reson.
-
-
-5.3 PASSING EXISTING INTEGRITY METADATA
-
-    Filesystems that either generate their own integrity metadata or
-    are capable of transferring IMD from user space can use the
-    following calls:
-
-
-    struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
-
-      Allocates the bio integrity payload and hangs it off of the bio.
-      nr_pages indicate how many pages of protection data need to be
-      stored in the integrity bio_vec list (similar to bio_alloc()).
-
-      The integrity payload will be freed at bio_free() time.
-
-
-    int bio_integrity_add_page(bio, page, len, offset);
-
-      Attaches a page containing integrity metadata to an existing
-      bio.  The bio must have an existing bip,
-      i.e. bio_integrity_alloc() must have been called.  For a WRITE,
-      the integrity metadata in the pages must be in a format
-      understood by the target device with the notable exception that
-      the sector numbers will be remapped as the request traverses the
-      I/O stack.  This implies that the pages added using this call
-      will be modified during I/O!  The first reference tag in the
-      integrity metadata must have a value of bip->bip_sector.
-
-      Pages can be added using bio_integrity_add_page() as long as
-      there is room in the bip bio_vec array (nr_pages).
-
-      Upon completion of a READ operation, the attached pages will
-      contain the integrity metadata received from the storage device.
-      It is up to the receiver to process them and verify data
-      integrity upon completion.
-
-
-5.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
-    METADATA
-
-    To enable integrity exchange on a block device the gendisk must be
-    registered as capable:
-
-    int blk_integrity_register(gendisk, blk_integrity);
-
-      The blk_integrity struct is a template and should contain the
-      following:
-
-        static struct blk_integrity my_profile = {
-            .name                   = "STANDARDSBODY-TYPE-VARIANT-CSUM",
-            .generate_fn            = my_generate_fn,
-       	    .verify_fn              = my_verify_fn,
-	    .tuple_size             = sizeof(struct my_tuple_size),
-	    .tag_size               = <tag bytes per hw sector>,
-        };
-
-      'name' is a text string which will be visible in sysfs.  This is
-      part of the userland API so chose it carefully and never change
-      it.  The format is standards body-type-variant.
-      E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
-
-      'generate_fn' generates appropriate integrity metadata (for WRITE).
-
-      'verify_fn' verifies that the data buffer matches the integrity
-      metadata.
-
-      'tuple_size' must be set to match the size of the integrity
-      metadata per sector.  I.e. 8 for DIF and EPP.
-
-      'tag_size' must be set to identify how many bytes of tag space
-      are available per hardware sector.  For DIF this is either 2 or
-      0 depending on the value of the Control Mode Page ATO bit.
-
-----------------------------------------------------------------------
-2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/Documentation/block/deadline-iosched.rst b/Documentation/block/deadline-iosched.rst
new file mode 100644
index 000000000000..9f5c5a4c370e
--- /dev/null
+++ b/Documentation/block/deadline-iosched.rst
@@ -0,0 +1,72 @@
+==============================
+Deadline IO scheduler tunables
+==============================
+
+This little file attempts to document how the deadline io scheduler works.
+In particular, it will clarify the meaning of the exposed tunables that may be
+of interest to power users.
+
+Selecting IO schedulers
+-----------------------
+Refer to Documentation/block/switching-sched.rst for information on
+selecting an io scheduler on a per-device basis.
+
+------------------------------------------------------------------------------
+
+read_expire	(in ms)
+-----------------------
+
+The goal of the deadline io scheduler is to attempt to guarantee a start
+service time for a request. As we focus mainly on read latencies, this is
+tunable. When a read request first enters the io scheduler, it is assigned
+a deadline that is the current time + the read_expire value in units of
+milliseconds.
+
+
+write_expire	(in ms)
+-----------------------
+
+Similar to read_expire mentioned above, but for writes.
+
+
+fifo_batch	(number of requests)
+------------------------------------
+
+Requests are grouped into "batches" of a particular data direction (read or
+write) which are serviced in increasing sector order.  To limit extra seeking,
+deadline expiries are only checked between batches.  fifo_batch controls the
+maximum number of requests per batch.
+
+This parameter tunes the balance between per-request latency and aggregate
+throughput.  When low latency is the primary concern, smaller is better (where
+a value of 1 yields first-come first-served behaviour).  Increasing fifo_batch
+generally improves throughput, at the cost of latency variation.
+
+
+writes_starved	(number of dispatches)
+--------------------------------------
+
+When we have to move requests from the io scheduler queue to the block
+device dispatch queue, we always give a preference to reads. However, we
+don't want to starve writes indefinitely either. So writes_starved controls
+how many times we give preference to reads over writes. When that has been
+done writes_starved number of times, we dispatch some writes based on the
+same criteria as reads.
+
+
+front_merges	(bool)
+----------------------
+
+Sometimes it happens that a request enters the io scheduler that is contiguous
+with a request that is already on the queue. Either it fits in the back of that
+request, or it fits at the front. That is called either a back merge candidate
+or a front merge candidate. Due to the way files are typically laid out,
+back merges are much more common than front merges. For some workloads, you
+may even know that it is a waste of time to spend any time attempting to
+front merge requests. Setting front_merges to 0 disables this functionality.
+Front merges may still occur due to the cached last_merge hint, but since
+that comes at basically 0 cost we leave that on. We simply disable the
+rbtree front sector lookup when the io scheduler merge function is called.
+
+
+Nov 11 2002, Jens Axboe <jens.axboe@oracle.com>
diff --git a/Documentation/block/deadline-iosched.txt b/Documentation/block/deadline-iosched.txt
deleted file mode 100644
index 2d82c80322cb..000000000000
--- a/Documentation/block/deadline-iosched.txt
+++ /dev/null
@@ -1,75 +0,0 @@
-Deadline IO scheduler tunables
-==============================
-
-This little file attempts to document how the deadline io scheduler works.
-In particular, it will clarify the meaning of the exposed tunables that may be
-of interest to power users.
-
-Selecting IO schedulers
------------------------
-Refer to Documentation/block/switching-sched.txt for information on
-selecting an io scheduler on a per-device basis.
-
-
-********************************************************************************
-
-
-read_expire	(in ms)
------------
-
-The goal of the deadline io scheduler is to attempt to guarantee a start
-service time for a request. As we focus mainly on read latencies, this is
-tunable. When a read request first enters the io scheduler, it is assigned
-a deadline that is the current time + the read_expire value in units of
-milliseconds.
-
-
-write_expire	(in ms)
------------
-
-Similar to read_expire mentioned above, but for writes.
-
-
-fifo_batch	(number of requests)
-----------
-
-Requests are grouped into ``batches'' of a particular data direction (read or
-write) which are serviced in increasing sector order.  To limit extra seeking,
-deadline expiries are only checked between batches.  fifo_batch controls the
-maximum number of requests per batch.
-
-This parameter tunes the balance between per-request latency and aggregate
-throughput.  When low latency is the primary concern, smaller is better (where
-a value of 1 yields first-come first-served behaviour).  Increasing fifo_batch
-generally improves throughput, at the cost of latency variation.
-
-
-writes_starved	(number of dispatches)
---------------
-
-When we have to move requests from the io scheduler queue to the block
-device dispatch queue, we always give a preference to reads. However, we
-don't want to starve writes indefinitely either. So writes_starved controls
-how many times we give preference to reads over writes. When that has been
-done writes_starved number of times, we dispatch some writes based on the
-same criteria as reads.
-
-
-front_merges	(bool)
-------------
-
-Sometimes it happens that a request enters the io scheduler that is contiguous
-with a request that is already on the queue. Either it fits in the back of that
-request, or it fits at the front. That is called either a back merge candidate
-or a front merge candidate. Due to the way files are typically laid out,
-back merges are much more common than front merges. For some work loads, you
-may even know that it is a waste of time to spend any time attempting to
-front merge requests. Setting front_merges to 0 disables this functionality.
-Front merges may still occur due to the cached last_merge hint, but since
-that comes at basically 0 cost we leave that on. We simply disable the
-rbtree front sector lookup when the io scheduler merge function is called.
-
-
-Nov 11 2002, Jens Axboe <jens.axboe@oracle.com>
-
-
diff --git a/Documentation/block/index.rst b/Documentation/block/index.rst
new file mode 100644
index 000000000000..8cd226a0e86e
--- /dev/null
+++ b/Documentation/block/index.rst
@@ -0,0 +1,25 @@
+:orphan:
+
+=====
+Block
+=====
+
+.. toctree::
+   :maxdepth: 1
+
+   bfq-iosched
+   biodoc
+   biovecs
+   capability
+   cmdline-partition
+   data-integrity
+   deadline-iosched
+   ioprio
+   kyber-iosched
+   null_blk
+   pr
+   queue-sysfs
+   request
+   stat
+   switching-sched
+   writeback_cache_control
diff --git a/Documentation/block/ioprio.rst b/Documentation/block/ioprio.rst
new file mode 100644
index 000000000000..f72b0de65af7
--- /dev/null
+++ b/Documentation/block/ioprio.rst
@@ -0,0 +1,182 @@
+===================
+Block io priorities
+===================
+
+
+Intro
+-----
+
+With the introduction of cfq v3 (aka cfq-ts or time sliced cfq), basic io
+priorities are supported for reads on files.  This enables users to io nice
+processes or process groups, similar to what has been possible with cpu
+scheduling for ages.  This document mainly details the current possibilities
+with cfq; other io schedulers do not support io priorities thus far.
+
+Scheduling classes
+------------------
+
+CFQ implements three generic scheduling classes that determine how io is
+served for a process.
+
+IOPRIO_CLASS_RT: This is the realtime io class. This scheduling class is given
+higher priority than any other in the system; processes from this class are
+given first access to the disk every time. Thus it needs to be used with some
+care, as one io RT process can starve the entire system. Within the RT class,
+there are 8 levels of class data that determine exactly how much time this
+process needs the disk for on each service. In the future this might change
+to be more directly mappable to performance, by passing in a wanted data
+rate instead.
+
+IOPRIO_CLASS_BE: This is the best-effort scheduling class, which is the default
+for any process that hasn't set a specific io priority. The class data
+determines how much io bandwidth the process will get; it's directly mappable
+to the cpu nice levels, just more coarsely implemented. 0 is the highest
+BE prio level, 7 is the lowest. The mapping between cpu nice level and io
+nice level is determined as: io_nice = (cpu_nice + 20) / 5.
+
+IOPRIO_CLASS_IDLE: This is the idle scheduling class; processes running at this
+level only get io time when no one else needs the disk. The idle class has no
+class data, since it doesn't really apply here.
+
+Tools
+-----
+
+See below for a sample ionice tool. Usage::
+
+	# ionice -c<class> -n<level> -p<pid>
+
+If pid isn't given, the current process is assumed. IO priority settings
+are inherited on fork, so you can use ionice to start the process at a given
+level::
+
+	# ionice -c2 -n0 /bin/ls
+
+will run ls at the best-effort scheduling class at the highest priority.
+For a running process, you can give the pid instead::
+
+	# ionice -c1 -n2 -p100
+
+will change pid 100 to run at the realtime scheduling class, at priority 2.
+
+ionice.c tool::
+
+  #include <stdio.h>
+  #include <stdlib.h>
+  #include <errno.h>
+  #include <getopt.h>
+  #include <unistd.h>
+  #include <sys/ptrace.h>
+  #include <asm/unistd.h>
+
+  extern int sys_ioprio_set(int, int, int);
+  extern int sys_ioprio_get(int, int);
+
+  #if defined(__i386__)
+  #define __NR_ioprio_set		289
+  #define __NR_ioprio_get		290
+  #elif defined(__ppc__)
+  #define __NR_ioprio_set		273
+  #define __NR_ioprio_get		274
+  #elif defined(__x86_64__)
+  #define __NR_ioprio_set		251
+  #define __NR_ioprio_get		252
+  #elif defined(__ia64__)
+  #define __NR_ioprio_set		1274
+  #define __NR_ioprio_get		1275
+  #else
+  #error "Unsupported arch"
+  #endif
+
+  static inline int ioprio_set(int which, int who, int ioprio)
+  {
+	return syscall(__NR_ioprio_set, which, who, ioprio);
+  }
+
+  static inline int ioprio_get(int which, int who)
+  {
+	return syscall(__NR_ioprio_get, which, who);
+  }
+
+  enum {
+	IOPRIO_CLASS_NONE,
+	IOPRIO_CLASS_RT,
+	IOPRIO_CLASS_BE,
+	IOPRIO_CLASS_IDLE,
+  };
+
+  enum {
+	IOPRIO_WHO_PROCESS = 1,
+	IOPRIO_WHO_PGRP,
+	IOPRIO_WHO_USER,
+  };
+
+  #define IOPRIO_CLASS_SHIFT	13
+
+  const char *to_prio[] = { "none", "realtime", "best-effort", "idle", };
+
+  int main(int argc, char *argv[])
+  {
+	int ioprio = 4, set = 0, ioprio_class = IOPRIO_CLASS_BE;
+	int c, pid = 0;
+
+	while ((c = getopt(argc, argv, "+n:c:p:")) != EOF) {
+		switch (c) {
+		case 'n':
+			ioprio = strtol(optarg, NULL, 10);
+			set = 1;
+			break;
+		case 'c':
+			ioprio_class = strtol(optarg, NULL, 10);
+			set = 1;
+			break;
+		case 'p':
+			pid = strtol(optarg, NULL, 10);
+			break;
+		}
+	}
+
+	switch (ioprio_class) {
+		case IOPRIO_CLASS_NONE:
+			ioprio_class = IOPRIO_CLASS_BE;
+			break;
+		case IOPRIO_CLASS_RT:
+		case IOPRIO_CLASS_BE:
+			break;
+		case IOPRIO_CLASS_IDLE:
+			ioprio = 7;
+			break;
+		default:
+			printf("bad prio class %d\n", ioprio_class);
+			return 1;
+	}
+
+	if (!set) {
+		if (!pid && argv[optind])
+			pid = strtol(argv[optind], NULL, 10);
+
+		ioprio = ioprio_get(IOPRIO_WHO_PROCESS, pid);
+
+		printf("pid=%d, %d\n", pid, ioprio);
+
+		if (ioprio == -1)
+			perror("ioprio_get");
+		else {
+			ioprio_class = ioprio >> IOPRIO_CLASS_SHIFT;
+			ioprio = ioprio & 0xff;
+			printf("%s: prio %d\n", to_prio[ioprio_class], ioprio);
+		}
+	} else {
+		if (ioprio_set(IOPRIO_WHO_PROCESS, pid, ioprio | ioprio_class << IOPRIO_CLASS_SHIFT) == -1) {
+			perror("ioprio_set");
+			return 1;
+		}
+
+		if (argv[optind])
+			execvp(argv[optind], &argv[optind]);
+	}
+
+	return 0;
+  }
+
+
+March 11 2005, Jens Axboe <jens.axboe@oracle.com>
diff --git a/Documentation/block/ioprio.txt b/Documentation/block/ioprio.txt
deleted file mode 100644
index 8ed8c59380b4..000000000000
--- a/Documentation/block/ioprio.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-Block io priorities
-===================
-
-
-Intro
------
-
-With the introduction of cfq v3 (aka cfq-ts or time sliced cfq), basic io
-priorities are supported for reads on files.  This enables users to io nice
-processes or process groups, similar to what has been possible with cpu
-scheduling for ages.  This document mainly details the current possibilities
-with cfq; other io schedulers do not support io priorities thus far.
-
-Scheduling classes
-------------------
-
-CFQ implements three generic scheduling classes that determine how io is
-served for a process.
-
-IOPRIO_CLASS_RT: This is the realtime io class. This scheduling class is given
-higher priority than any other in the system, processes from this class are
-given first access to the disk every time. Thus it needs to be used with some
-care, one io RT process can starve the entire system. Within the RT class,
-there are 8 levels of class data that determine exactly how much time this
-process needs the disk for on each service. In the future this might change
-to be more directly mappable to performance, by passing in a wanted data
-rate instead.
-
-IOPRIO_CLASS_BE: This is the best-effort scheduling class, which is the default
-for any process that hasn't set a specific io priority. The class data
-determines how much io bandwidth the process will get, it's directly mappable
-to the cpu nice levels just more coarsely implemented. 0 is the highest
-BE prio level, 7 is the lowest. The mapping between cpu nice level and io
-nice level is determined as: io_nice = (cpu_nice + 20) / 5.
-
-IOPRIO_CLASS_IDLE: This is the idle scheduling class, processes running at this
-level only get io time when no one else needs the disk. The idle class has no
-class data, since it doesn't really apply here.
-
-Tools
------
-
-See below for a sample ionice tool. Usage:
-
-# ionice -c<class> -n<level> -p<pid>
-
-If pid isn't given, the current process is assumed. IO priority settings
-are inherited on fork, so you can use ionice to start the process at a given
-level:
-
-# ionice -c2 -n0 /bin/ls
-
-will run ls at the best-effort scheduling class at the highest priority.
-For a running process, you can give the pid instead:
-
-# ionice -c1 -n2 -p100
-
-will change pid 100 to run at the realtime scheduling class, at priority 2.
-
----> snip ionice.c tool <---
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <errno.h>
-#include <getopt.h>
-#include <unistd.h>
-#include <sys/ptrace.h>
-#include <asm/unistd.h>
-
-extern int sys_ioprio_set(int, int, int);
-extern int sys_ioprio_get(int, int);
-
-#if defined(__i386__)
-#define __NR_ioprio_set		289
-#define __NR_ioprio_get		290
-#elif defined(__ppc__)
-#define __NR_ioprio_set		273
-#define __NR_ioprio_get		274
-#elif defined(__x86_64__)
-#define __NR_ioprio_set		251
-#define __NR_ioprio_get		252
-#elif defined(__ia64__)
-#define __NR_ioprio_set		1274
-#define __NR_ioprio_get		1275
-#else
-#error "Unsupported arch"
-#endif
-
-static inline int ioprio_set(int which, int who, int ioprio)
-{
-	return syscall(__NR_ioprio_set, which, who, ioprio);
-}
-
-static inline int ioprio_get(int which, int who)
-{
-	return syscall(__NR_ioprio_get, which, who);
-}
-
-enum {
-	IOPRIO_CLASS_NONE,
-	IOPRIO_CLASS_RT,
-	IOPRIO_CLASS_BE,
-	IOPRIO_CLASS_IDLE,
-};
-
-enum {
-	IOPRIO_WHO_PROCESS = 1,
-	IOPRIO_WHO_PGRP,
-	IOPRIO_WHO_USER,
-};
-
-#define IOPRIO_CLASS_SHIFT	13
-
-const char *to_prio[] = { "none", "realtime", "best-effort", "idle", };
-
-int main(int argc, char *argv[])
-{
-	int ioprio = 4, set = 0, ioprio_class = IOPRIO_CLASS_BE;
-	int c, pid = 0;
-
-	while ((c = getopt(argc, argv, "+n:c:p:")) != EOF) {
-		switch (c) {
-		case 'n':
-			ioprio = strtol(optarg, NULL, 10);
-			set = 1;
-			break;
-		case 'c':
-			ioprio_class = strtol(optarg, NULL, 10);
-			set = 1;
-			break;
-		case 'p':
-			pid = strtol(optarg, NULL, 10);
-			break;
-		}
-	}
-
-	switch (ioprio_class) {
-		case IOPRIO_CLASS_NONE:
-			ioprio_class = IOPRIO_CLASS_BE;
-			break;
-		case IOPRIO_CLASS_RT:
-		case IOPRIO_CLASS_BE:
-			break;
-		case IOPRIO_CLASS_IDLE:
-			ioprio = 7;
-			break;
-		default:
-			printf("bad prio class %d\n", ioprio_class);
-			return 1;
-	}
-
-	if (!set) {
-		if (!pid && argv[optind])
-			pid = strtol(argv[optind], NULL, 10);
-
-		ioprio = ioprio_get(IOPRIO_WHO_PROCESS, pid);
-
-		printf("pid=%d, %d\n", pid, ioprio);
-
-		if (ioprio == -1)
-			perror("ioprio_get");
-		else {
-			ioprio_class = ioprio >> IOPRIO_CLASS_SHIFT;
-			ioprio = ioprio & 0xff;
-			printf("%s: prio %d\n", to_prio[ioprio_class], ioprio);
-		}
-	} else {
-		if (ioprio_set(IOPRIO_WHO_PROCESS, pid, ioprio | ioprio_class << IOPRIO_CLASS_SHIFT) == -1) {
-			perror("ioprio_set");
-			return 1;
-		}
-
-		if (argv[optind])
-			execvp(argv[optind], &argv[optind]);
-	}
-
-	return 0;
-}
-
----> snip ionice.c tool <---
-
-
-March 11 2005, Jens Axboe <jens.axboe@oracle.com>
diff --git a/Documentation/block/kyber-iosched.rst b/Documentation/block/kyber-iosched.rst
new file mode 100644
index 000000000000..3e164dd0617c
--- /dev/null
+++ b/Documentation/block/kyber-iosched.rst
@@ -0,0 +1,15 @@
+============================
+Kyber I/O scheduler tunables
+============================
+
+The only two tunables for the Kyber scheduler are the target latencies for
+reads and synchronous writes. Kyber will throttle requests in order to meet
+these target latencies.
+
+read_lat_nsec
+-------------
+Target latency for reads (in nanoseconds).
+
+write_lat_nsec
+--------------
+Target latency for synchronous writes (in nanoseconds).
diff --git a/Documentation/block/kyber-iosched.txt b/Documentation/block/kyber-iosched.txt
deleted file mode 100644
index e94feacd7edc..000000000000
--- a/Documentation/block/kyber-iosched.txt
+++ /dev/null
@@ -1,14 +0,0 @@
-Kyber I/O scheduler tunables
-===========================
-
-The only two tunables for the Kyber scheduler are the target latencies for
-reads and synchronous writes. Kyber will throttle requests in order to meet
-these target latencies.
-
-read_lat_nsec
--------------
-Target latency for reads (in nanoseconds).
-
-write_lat_nsec
---------------
-Target latency for synchronous writes (in nanoseconds).
diff --git a/Documentation/block/null_blk.rst b/Documentation/block/null_blk.rst
new file mode 100644
index 000000000000..31451d80783c
--- /dev/null
+++ b/Documentation/block/null_blk.rst
@@ -0,0 +1,126 @@
+========================
+Null block device driver
+========================
+
+1. Overview
+===========
+
+The null block device (/dev/nullb*) is used for benchmarking the various
+block-layer implementations. It emulates a block device of X gigabytes in size.
+The following instances are possible:
+
+  Single-queue block-layer
+
+    - Request-based.
+    - Single submission queue per device.
+    - Implements IO scheduling algorithms (CFQ, Deadline, noop).
+
+  Multi-queue block-layer
+
+    - Request-based.
+    - Configurable submission queues per device.
+
+  No block-layer (Known as bio-based)
+
+    - Bio-based. IO requests are submitted directly to the device driver.
+    - Directly accepts bio data structure and returns them.
+
+All of them have a completion queue for each core in the system.
+
+2. Module parameters applicable for all instances
+=================================================
+
+queue_mode=[0-2]: Default: 2-Multi-queue
+  Selects which block-layer the module should instantiate with.
+
+  =  ============
+  0  Bio-based
+  1  Single-queue
+  2  Multi-queue
+  =  ============
+
+home_node=[0--nr_nodes]: Default: NUMA_NO_NODE
+  Selects what CPU node the data structures are allocated from.
+
+gb=[Size in GB]: Default: 250GB
+  The size of the device reported to the system.
+
+bs=[Block size (in bytes)]: Default: 512 bytes
+  The block size reported to the system.
+
+nr_devices=[Number of devices]: Default: 1
+  Number of block devices instantiated. They are instantiated as /dev/nullb0,
+  etc.
+
+irqmode=[0-2]: Default: 1-Soft-irq
+  The completion mode used for completing IOs to the block-layer.
+
+  =  ===========================================================================
+  0  None.
+  1  Soft-irq. Uses IPI to complete IOs across CPU nodes. Simulates the overhead
+     when IOs are issued from a CPU node other than the home node the device is
+     connected to.
+  2  Timer: Waits a specific period (completion_nsec) for each IO before
+     completion.
+  =  ===========================================================================
+
+completion_nsec=[ns]: Default: 10,000ns
+  Combined with irqmode=2 (timer). The time each completion event must wait.
+
+submit_queues=[1..nr_cpus]:
+  The number of submission queues attached to the device driver. If unset, it
+  defaults to 1. For multi-queue, it is ignored when use_per_node_hctx module
+  parameter is 1.
+
+hw_queue_depth=[0..qdepth]: Default: 64
+  The hardware queue depth of the device.
+
+3. Multi-queue specific parameters
+==================================
+use_per_node_hctx=[0/1]: Default: 0
+
+  =  =====================================================================
+  0  The number of submit queues is set to the value of the submit_queues
+     parameter.
+  1  The multi-queue block layer is instantiated with a hardware dispatch
+     queue for each CPU node in the system.
+  =  =====================================================================
+
+no_sched=[0/1]: Default: 0
+
+  =  ======================================
+  0  nullb* use default blk-mq io scheduler
+  1  nullb* doesn't use io scheduler
+  =  ======================================
+
+blocking=[0/1]: Default: 0
+
+  =  ===============================================================
+  0  Register as a non-blocking blk-mq driver device.
+  1  Register as a blocking blk-mq driver device. null_blk will set
+     the BLK_MQ_F_BLOCKING flag, indicating that it sometimes/always
+     needs to block in its ->queue_rq() function.
+  =  ===============================================================
+
+shared_tags=[0/1]: Default: 0
+
+  =  ================================================================
+  0  Tag set is not shared.
+  1  Tag set shared between devices for blk-mq. Only makes sense with
+     nr_devices > 1, otherwise there's no tag set to share.
+  =  ================================================================
+
+zoned=[0/1]: Default: 0
+
+  =  ======================================================================
+  0  Block device is exposed as a random-access block device.
+  1  Block device is exposed as a host-managed zoned block device. Requires
+     CONFIG_BLK_DEV_ZONED.
+  =  ======================================================================
+
+zone_size=[MB]: Default: 256
+  Per zone size when exposed as a zoned block device. Must be a power of two.
+
+zone_nr_conv=[nr_conv]: Default: 0
+  The number of conventional zones to create when block device is zoned.  If
+  zone_nr_conv >= nr_zones, it will be reduced to nr_zones - 1.
diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt
deleted file mode 100644
index 41f0a3d33bbd..000000000000
--- a/Documentation/block/null_blk.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-Null block device driver
-================================================================================
-
-I. Overview
-
-The null block device (/dev/nullb*) is used for benchmarking the various
-block-layer implementations. It emulates a block device of X gigabytes in size.
-The following instances are possible:
-
-  Single-queue block-layer
-    - Request-based.
-    - Single submission queue per device.
-    - Implements IO scheduling algorithms (CFQ, Deadline, noop).
-  Multi-queue block-layer
-    - Request-based.
-    - Configurable submission queues per device.
-  No block-layer (Known as bio-based)
-    - Bio-based. IO requests are submitted directly to the device driver.
-    - Directly accepts bio data structure and returns them.
-
-All of them have a completion queue for each core in the system.
-
-II. Module parameters applicable for all instances:
-
-queue_mode=[0-2]: Default: 2-Multi-queue
-  Selects which block-layer the module should instantiate with.
-
-  0: Bio-based.
-  1: Single-queue.
-  2: Multi-queue.
-
-home_node=[0--nr_nodes]: Default: NUMA_NO_NODE
-  Selects what CPU node the data structures are allocated from.
-
-gb=[Size in GB]: Default: 250GB
-  The size of the device reported to the system.
-
-bs=[Block size (in bytes)]: Default: 512 bytes
-  The block size reported to the system.
-
-nr_devices=[Number of devices]: Default: 1
-  Number of block devices instantiated. They are instantiated as /dev/nullb0,
-  etc.
-
-irqmode=[0-2]: Default: 1-Soft-irq
-  The completion mode used for completing IOs to the block-layer.
-
-  0: None.
-  1: Soft-irq. Uses IPI to complete IOs across CPU nodes. Simulates the overhead
-     when IOs are issued from another CPU node than the home the device is
-     connected to.
-  2: Timer: Waits a specific period (completion_nsec) for each IO before
-     completion.
-
-completion_nsec=[ns]: Default: 10,000ns
-  Combined with irqmode=2 (timer). The time each completion event must wait.
-
-submit_queues=[1..nr_cpus]:
-  The number of submission queues attached to the device driver. If unset, it
-  defaults to 1. For multi-queue, it is ignored when use_per_node_hctx module
-  parameter is 1.
-
-hw_queue_depth=[0..qdepth]: Default: 64
-  The hardware queue depth of the device.
-
-III: Multi-queue specific parameters
-
-use_per_node_hctx=[0/1]: Default: 0
-  0: The number of submit queues are set to the value of the submit_queues
-     parameter.
-  1: The multi-queue block layer is instantiated with a hardware dispatch
-     queue for each CPU node in the system.
-
-no_sched=[0/1]: Default: 0
-  0: nullb* use default blk-mq io scheduler.
-  1: nullb* doesn't use io scheduler.
-
-blocking=[0/1]: Default: 0
-  0: Register as a non-blocking blk-mq driver device.
-  1: Register as a blocking blk-mq driver device, null_blk will set
-     the BLK_MQ_F_BLOCKING flag, indicating that it sometimes/always
-     needs to block in its ->queue_rq() function.
-
-shared_tags=[0/1]: Default: 0
-  0: Tag set is not shared.
-  1: Tag set shared between devices for blk-mq. Only makes sense with
-     nr_devices > 1, otherwise there's no tag set to share.
-
-zoned=[0/1]: Default: 0
-  0: Block device is exposed as a random-access block device.
-  1: Block device is exposed as a host-managed zoned block device. Requires
-     CONFIG_BLK_DEV_ZONED.
-
-zone_size=[MB]: Default: 256
-  Per zone size when exposed as a zoned block device. Must be a power of two.
-
-zone_nr_conv=[nr_conv]: Default: 0
-  The number of conventional zones to create when block device is zoned.  If
-  zone_nr_conv >= nr_zones, it will be reduced to nr_zones - 1.
diff --git a/Documentation/block/pr.rst b/Documentation/block/pr.rst
new file mode 100644
index 000000000000..30ea1c2e39eb
--- /dev/null
+++ b/Documentation/block/pr.rst
@@ -0,0 +1,119 @@
+===============================================
+Block layer support for Persistent Reservations
+===============================================
+
+The Linux kernel supports a user space interface for simplified
+Persistent Reservations which map to block devices that support
+these (like SCSI). Persistent Reservations allow restricting
+access to block devices to specific initiators in a shared storage
+setup.
+
+This document gives a general overview of the supported ioctl commands.
+For a more detailed reference please refer to the SCSI Primary
+Commands standard, specifically the section on Reservations and the
+"PERSISTENT RESERVE IN" and "PERSISTENT RESERVE OUT" commands.
+
+All implementations are expected to ensure the reservations survive
+a power loss and cover all connections in a multipath environment.
+These behaviors are optional in SPC but will be automatically applied
+by Linux.
+
+
+The following types of reservations are supported:
+--------------------------------------------------
+
+ - PR_WRITE_EXCLUSIVE
+	Only the initiator that owns the reservation can write to the
+	device.  Any initiator can read from the device.
+
+ - PR_EXCLUSIVE_ACCESS
+	Only the initiator that owns the reservation can access the
+	device.
+
+ - PR_WRITE_EXCLUSIVE_REG_ONLY
+	Only initiators with a registered key can write to the device.
+	Any initiator can read from the device.
+
+ - PR_EXCLUSIVE_ACCESS_REG_ONLY
+	Only initiators with a registered key can access the device.
+
+ - PR_WRITE_EXCLUSIVE_ALL_REGS
+
+	Only initiators with a registered key can write to the device.
+	Any initiator can read from the device.
+	All initiators with a registered key are considered reservation
+	holders.
+	Please reference the SPC spec on the meaning of a reservation
+	holder if you want to use this type.
+
+ - PR_EXCLUSIVE_ACCESS_ALL_REGS
+	Only initiators with a registered key can access the device.
+	All initiators with a registered key are considered reservation
+	holders.
+	Please reference the SPC spec on the meaning of a reservation
+	holder if you want to use this type.
+
+
+The following ioctls are supported:
+-----------------------------------
+
+1. IOC_PR_REGISTER
+^^^^^^^^^^^^^^^^^^
+
+This ioctl command registers a new reservation if the new_key argument
+is non-zero.  If no existing reservation exists, old_key must be zero;
+if an existing reservation should be replaced, old_key must contain
+the old reservation key.
+
+If the new_key argument is 0 it unregisters the existing reservation passed
+in old_key.
+
+
+2. IOC_PR_RESERVE
+^^^^^^^^^^^^^^^^^
+
+This ioctl command reserves the device and thus restricts access for other
+devices based on the type argument.  The key argument must be the existing
+reservation key for the device as acquired by the IOC_PR_REGISTER,
+IOC_PR_REGISTER_IGNORE, IOC_PR_PREEMPT or IOC_PR_PREEMPT_ABORT commands.
+
+
+3. IOC_PR_RELEASE
+^^^^^^^^^^^^^^^^^
+
+This ioctl command releases the reservation specified by key and flags
+and thus removes any access restriction implied by it.
+
+
+4. IOC_PR_PREEMPT
+^^^^^^^^^^^^^^^^^
+
+This ioctl command releases the existing reservation referred to by
+old_key and replaces it with a new reservation of the given type for the
+reservation key new_key.
+
+
+5. IOC_PR_PREEMPT_ABORT
+^^^^^^^^^^^^^^^^^^^^^^^
+
+This ioctl command works like IOC_PR_PREEMPT except that it also aborts
+any outstanding command sent over a connection identified by old_key.
+
+6. IOC_PR_CLEAR
+^^^^^^^^^^^^^^^
+
+This ioctl command unregisters both key and any other reservation key
+registered with the device and drops any existing reservation.
+
+
+Flags
+-----
+
+All the ioctls have a flag field.  Currently only one flag is supported:
+
+ - PR_FL_IGNORE_KEY
+	Ignore the existing reservation key.  This is commonly supported for
+	IOC_PR_REGISTER, and some implementations may support the flag for
+	IOC_PR_RESERVE.
+
+For all unknown flags the kernel will return -EOPNOTSUPP.
diff --git a/Documentation/block/pr.txt b/Documentation/block/pr.txt
deleted file mode 100644
index ac9b8e70e64b..000000000000
--- a/Documentation/block/pr.txt
+++ /dev/null
@@ -1,119 +0,0 @@
-
-Block layer support for Persistent Reservations
-===============================================
-
-The Linux kernel supports a user space interface for simplified
-Persistent Reservations which map to block devices that support
-these (like SCSI). Persistent Reservations allow restricting
-access to block devices to specific initiators in a shared storage
-setup.
-
-This document gives a general overview of the support ioctl commands.
-For a more detailed reference please refer the the SCSI Primary
-Commands standard, specifically the section on Reservations and the
-"PERSISTENT RESERVE IN" and "PERSISTENT RESERVE OUT" commands.
-
-All implementations are expected to ensure the reservations survive
-a power loss and cover all connections in a multi path environment.
-These behaviors are optional in SPC but will be automatically applied
-by Linux.
-
-
-The following types of reservations are supported:
---------------------------------------------------
-
- - PR_WRITE_EXCLUSIVE
-
-	Only the initiator that owns the reservation can write to the
-	device.  Any initiator can read from the device.
-
- - PR_EXCLUSIVE_ACCESS
-
-	Only the initiator that owns the reservation can access the
-	device.
-
- - PR_WRITE_EXCLUSIVE_REG_ONLY
-
-	Only initiators with a registered key can write to the device,
-	Any initiator can read from the device.
-
- - PR_EXCLUSIVE_ACCESS_REG_ONLY
-
-	Only initiators with a registered key can access the device.
-
- - PR_WRITE_EXCLUSIVE_ALL_REGS
-
-	Only initiators with a registered key can write to the device,
-	Any initiator can read from the device.
-	All initiators with a registered key are considered reservation
-	holders.
-	Please reference the SPC spec on the meaning of a reservation
-	holder if you want to use this type. 
-
- - PR_EXCLUSIVE_ACCESS_ALL_REGS
-
-	Only initiators with a registered key can access the device.
-	All initiators with a registered key are considered reservation
-	holders.
-	Please reference the SPC spec on the meaning of a reservation
-	holder if you want to use this type. 
-
-
-The following ioctl are supported:
-----------------------------------
-
-1. IOC_PR_REGISTER
-
-This ioctl command registers a new reservation if the new_key argument
-is non-null.  If no existing reservation exists old_key must be zero,
-if an existing reservation should be replaced old_key must contain
-the old reservation key.
-
-If the new_key argument is 0 it unregisters the existing reservation passed
-in old_key.
-
-
-2. IOC_PR_RESERVE
-
-This ioctl command reserves the device and thus restricts access for other
-devices based on the type argument.  The key argument must be the existing
-reservation key for the device as acquired by the IOC_PR_REGISTER,
-IOC_PR_REGISTER_IGNORE, IOC_PR_PREEMPT or IOC_PR_PREEMPT_ABORT commands.
-
-
-3. IOC_PR_RELEASE
-
-This ioctl command releases the reservation specified by key and flags
-and thus removes any access restriction implied by it.
-
-
-4. IOC_PR_PREEMPT
-
-This ioctl command releases the existing reservation referred to by
-old_key and replaces it with a new reservation of type for the
-reservation key new_key.
-
-
-5. IOC_PR_PREEMPT_ABORT
-
-This ioctl command works like IOC_PR_PREEMPT except that it also aborts
-any outstanding command sent over a connection identified by old_key.
-
-6. IOC_PR_CLEAR
-
-This ioctl command unregisters both key and any other reservation key
-registered with the device and drops any existing reservation.
-
-
-Flags
------
-
-All the ioctls have a flag field.  Currently only one flag is supported:
-
- - PR_FL_IGNORE_KEY
-
-	Ignore the existing reservation key.  This is commonly supported for
-	IOC_PR_REGISTER, and some implementation may support the flag for
-	IOC_PR_RESERVE.
-
-For all unknown flags the kernel will return -EOPNOTSUPP.
diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
new file mode 100644
index 000000000000..6a8513af9201
--- /dev/null
+++ b/Documentation/block/queue-sysfs.rst
@@ -0,0 +1,254 @@
+=================
+Queue sysfs files
+=================
+
+This text file will detail the queue files that are located in the sysfs tree
+for each block device. Note that stacked devices typically do not export
+any settings, since their queue merely functions as a remapping target.
+These files are the ones found in the /sys/block/xxx/queue/ directory.
+
+Files denoted with a RO postfix are read-only and the RW postfix means
+read-write.
+
+add_random (RW)
+---------------
+This file allows the disk entropy contribution to be turned off. The default
+value of this file is '1' (on).
+
+chunk_sectors (RO)
+------------------
+This has a different meaning depending on the type of the block device.
+For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
+of the RAID volume stripe segment. For a zoned block device, either host-aware
+or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
+of the device, with the possible exception of the last zone of the device which
+may be smaller.
+
+dax (RO)
+--------
+This file indicates whether the device supports Direct Access (DAX),
+used by CPU-addressable storage to bypass the pagecache.  It shows '1'
+if true, '0' if not.
+
+discard_granularity (RO)
+------------------------
+This shows the size of internal allocation of the device in bytes, if
+reported by the device. A value of '0' means the device does not support
+the discard functionality.
+
+discard_max_hw_bytes (RO)
+-------------------------
+Devices that support discard functionality may have internal limits on
+the number of bytes that can be trimmed or unmapped in a single operation.
+The discard_max_hw_bytes parameter is set by the device driver to the maximum
+number of bytes that can be discarded in a single operation. Discard
+requests issued to the device must not exceed this limit. A discard_max_hw_bytes
+value of 0 means that the device does not support discard functionality.
+
+discard_max_bytes (RW)
+----------------------
+While discard_max_hw_bytes is the hardware limit for the device, this
+setting is the software limit. Some devices exhibit large latencies when
+large discards are issued; setting this value lower will make Linux issue
+smaller discards and potentially help reduce latencies induced by large
+discard operations.
+
+discard_zeroes_data (RO)
+------------------------
+Obsolete. Always zero.
+
+fua (RO)
+--------
+Whether or not the block driver supports the FUA flag for write requests.
+FUA stands for Force Unit Access. If the FUA flag is set that means that
+write requests must bypass the volatile cache of the storage device.
+
+hw_sector_size (RO)
+-------------------
+This is the hardware sector size of the device, in bytes.
+
+io_poll (RW)
+------------
+When read, this file shows whether polling is enabled (1) or disabled
+(0).  Writing '0' to this file will disable polling for this device.
+Writing any non-zero value will enable this feature.
+
+io_poll_delay (RW)
+------------------
+If polling is enabled, this controls what kind of polling will be
+performed. It defaults to -1, which is classic polling. In this mode,
+the CPU will repeatedly ask for completions without giving up any time.
+If set to 0, a hybrid polling mode is used, where the kernel will attempt
+to make an educated guess at when the IO will complete. Based on this
+guess, the kernel will put the process issuing IO to sleep for an amount
+of time, before entering a classic poll loop. This mode might be a
+little slower than pure classic polling, but it will be more efficient.
+If set to a value larger than 0, the kernel will put the process issuing
+IO to sleep for this amount of microseconds before entering classic
+polling.
+
+io_timeout (RW)
+---------------
+io_timeout is the request timeout in milliseconds. If a request does not
+complete in this time then the block driver timeout handler is invoked.
+That timeout handler can decide to retry the request, to fail it or to start
+a device recovery strategy.
+
+iostats (RW)
+-------------
+This file is used to control (on/off) the iostats accounting of the
+disk.
+
+logical_block_size (RO)
+-----------------------
+This is the logical block size of the device, in bytes.
+
+max_discard_segments (RO)
+-------------------------
+The maximum number of DMA scatter/gather entries in a discard request.
+
+max_hw_sectors_kb (RO)
+----------------------
+This is the maximum number of kilobytes supported in a single data transfer.
+
+max_integrity_segments (RO)
+---------------------------
+Maximum number of elements in a DMA scatter/gather list with integrity
+data that will be submitted by the block layer core to the associated
+block driver.
+
+max_sectors_kb (RW)
+-------------------
+This is the maximum number of kilobytes that the block layer will allow
+for a filesystem request. Must be smaller than or equal to the maximum
+size allowed by the hardware.
+
+max_segments (RO)
+-----------------
+Maximum number of elements in a DMA scatter/gather list that is submitted
+to the associated block driver.
+
+max_segment_size (RO)
+---------------------
+Maximum size in bytes of a single element in a DMA scatter/gather list.
+
+minimum_io_size (RO)
+--------------------
+This is the smallest preferred IO size reported by the device.
+
+nomerges (RW)
+-------------
+This enables the user to disable the lookup logic involved with IO
+merging requests in the block layer. By default (0) all merges are
+enabled. When set to 1 only simple one-hit merges will be tried. When
+set to 2 no merge algorithms will be tried (including one-hit or more
+complex tree/hash lookups).
+
+nr_requests (RW)
+----------------
+This controls how many requests may be allocated in the block layer for
+read or write requests. Note that the total allocated number may be twice
+this amount, since it applies only to reads or writes (not the accumulated
+sum).
+
+To avoid priority inversion through request starvation, a request
+queue maintains a separate request pool for each cgroup when
+CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
+per-block-cgroup request pool.  IOW, if there are N block cgroups,
+each request queue may have up to N request pools, each independently
+regulated by nr_requests.
+
+nr_zones (RO)
+-------------
+For zoned block devices (zoned attribute indicating "host-managed" or
+"host-aware"), this indicates the total number of zones of the device.
+This is always 0 for regular block devices.
+
+optimal_io_size (RO)
+--------------------
+This is the optimal IO size reported by the device.
+
+physical_block_size (RO)
+------------------------
+This is the physical block size of the device, in bytes.
+
+read_ahead_kb (RW)
+------------------
+Maximum number of kilobytes to read-ahead for filesystems on this block
+device.
+
+rotational (RW)
+---------------
+This file is used to state whether the device is of rotational type or
+non-rotational type.
+
+rq_affinity (RW)
+----------------
+If this option is '1', the block layer will migrate request completions to the
+cpu "group" that originally submitted the request. For some workloads this
+provides a significant reduction in CPU cycles due to caching effects.
+
+For storage configurations that need to maximize distribution of completion
+processing, setting this option to '2' forces the completion to run on the
+requesting cpu (bypassing the "group" aggregation logic).
+
+scheduler (RW)
+--------------
+When read, this file will display the current and available IO schedulers
+for this block device. The currently active IO scheduler will be enclosed
+in [] brackets. Writing an IO scheduler name to this file will switch
+control of this block device to that new IO scheduler. Note that writing
+an IO scheduler name to this file will attempt to load that IO scheduler
+module, if it isn't already present in the system.
+
+write_cache (RW)
+----------------
+When read, this file will display whether the device has write back
+caching enabled or not. It will return "write back" for the former
+case, and "write through" for the latter. Writing to this file can
+change the kernel's view of the device, but it doesn't alter the
+device state. This means that it might not be safe to toggle the
+setting from "write back" to "write through", since that will also
+eliminate cache flushes issued by the kernel.
+
+write_same_max_bytes (RO)
+-------------------------
+This is the number of bytes the device can write in a single write-same
+command.  A value of '0' means write-same is not supported by this
+device.
+
+wbt_lat_usec (RW)
+-----------------
+If the device is registered for writeback throttling, then this file shows
+the target minimum read latency. If this latency is exceeded in a given
+window of time (see wb_window_usec), then the writeback throttling will start
+scaling back writes. Writing a value of '0' to this file disables the
+feature. Writing a value of '-1' to this file resets the value to the
+default setting.
+
+throttle_sample_time (RW)
+-------------------------
+This is the time window over which blk-throttle samples data, in milliseconds.
+blk-throttle makes decisions based on the samples. A lower time means cgroups
+have smoother throughput, but higher CPU overhead. This exists only when
+CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
+
+write_zeroes_max_bytes (RO)
+---------------------------
+For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
+bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
+is not supported.
+
+zoned (RO)
+----------
+This indicates if the device is a zoned block device and the zone model of the
+device if it is indeed zoned. The possible values indicated by zoned are
+"none" for regular block devices and "host-aware" or "host-managed" for zoned
+block devices. The characteristics of host-aware and host-managed zoned block
+devices are described in the ZBC (Zoned Block Commands) and ZAC
+(Zoned Device ATA Command Set) standards. These standards also define the
+"drive-managed" zone model. However, since drive-managed zoned block devices
+do not support zone commands, they will be treated as regular block devices
+and zoned will report "none".
+
+Jens Axboe <jens.axboe@oracle.com>, February 2009
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
deleted file mode 100644
index b40b5b7cebd9..000000000000
--- a/Documentation/block/queue-sysfs.txt
+++ /dev/null
@@ -1,253 +0,0 @@
-Queue sysfs files
-=================
-
-This text file will detail the queue files that are located in the sysfs tree
-for each block device. Note that stacked devices typically do not export
-any settings, since their queue merely functions are a remapping target.
-These files are the ones found in the /sys/block/xxx/queue/ directory.
-
-Files denoted with a RO postfix are readonly and the RW postfix means
-read-write.
-
-add_random (RW)
-----------------
-This file allows to turn off the disk entropy contribution. Default
-value of this file is '1'(on).
-
-chunk_sectors (RO)
-------------------
-This has different meaning depending on the type of the block device.
-For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
-of the RAID volume stripe segment. For a zoned block device, either host-aware
-or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
-of the device, with the eventual exception of the last zone of the device which
-may be smaller.
-
-dax (RO)
---------
-This file indicates whether the device supports Direct Access (DAX),
-used by CPU-addressable storage to bypass the pagecache.  It shows '1'
-if true, '0' if not.
-
-discard_granularity (RO)
------------------------
-This shows the size of internal allocation of the device in bytes, if
-reported by the device. A value of '0' means device does not support
-the discard functionality.
-
-discard_max_hw_bytes (RO)
-----------------------
-Devices that support discard functionality may have internal limits on
-the number of bytes that can be trimmed or unmapped in a single operation.
-The discard_max_bytes parameter is set by the device driver to the maximum
-number of bytes that can be discarded in a single operation. Discard
-requests issued to the device must not exceed this limit. A discard_max_bytes
-value of 0 means that the device does not support discard functionality.
-
-discard_max_bytes (RW)
-----------------------
-While discard_max_hw_bytes is the hardware limit for the device, this
-setting is the software limit. Some devices exhibit large latencies when
-large discards are issued, setting this value lower will make Linux issue
-smaller discards and potentially help reduce latencies induced by large
-discard operations.
-
-discard_zeroes_data (RO)
-------------------------
-Obsolete. Always zero.
-
-fua (RO)
---------
-Whether or not the block driver supports the FUA flag for write requests.
-FUA stands for Force Unit Access. If the FUA flag is set that means that
-write requests must bypass the volatile cache of the storage device.
-
-hw_sector_size (RO)
--------------------
-This is the hardware sector size of the device, in bytes.
-
-io_poll (RW)
-------------
-When read, this file shows whether polling is enabled (1) or disabled
-(0).  Writing '0' to this file will disable polling for this device.
-Writing any non-zero value will enable this feature.
-
-io_poll_delay (RW)
-------------------
-If polling is enabled, this controls what kind of polling will be
-performed. It defaults to -1, which is classic polling. In this mode,
-the CPU will repeatedly ask for completions without giving up any time.
-If set to 0, a hybrid polling mode is used, where the kernel will attempt
-to make an educated guess at when the IO will complete. Based on this
-guess, the kernel will put the process issuing IO to sleep for an amount
-of time, before entering a classic poll loop. This mode might be a
-little slower than pure classic polling, but it will be more efficient.
-If set to a value larger than 0, the kernel will put the process issuing
-IO to sleep for this amount of microseconds before entering classic
-polling.
-
-io_timeout (RW)
----------------
-io_timeout is the request timeout in milliseconds. If a request does not
-complete in this time then the block driver timeout handler is invoked.
-That timeout handler can decide to retry the request, to fail it or to start
-a device recovery strategy.
-
-iostats (RW)
--------------
-This file is used to control (on/off) the iostats accounting of the
-disk.
-
-logical_block_size (RO)
------------------------
-This is the logical block size of the device, in bytes.
-
-max_discard_segments (RO)
--------------------------
-The maximum number of DMA scatter/gather entries in a discard request.
-
-max_hw_sectors_kb (RO)
-----------------------
-This is the maximum number of kilobytes supported in a single data transfer.
-
-max_integrity_segments (RO)
----------------------------
-Maximum number of elements in a DMA scatter/gather list with integrity
-data that will be submitted by the block layer core to the associated
-block driver.
-
-max_sectors_kb (RW)
--------------------
-This is the maximum number of kilobytes that the block layer will allow
-for a filesystem request. Must be smaller than or equal to the maximum
-size allowed by the hardware.
-
-max_segments (RO)
------------------
-Maximum number of elements in a DMA scatter/gather list that is submitted
-to the associated block driver.
-
-max_segment_size (RO)
----------------------
-Maximum size in bytes of a single element in a DMA scatter/gather list.
-
-minimum_io_size (RO)
---------------------
-This is the smallest preferred IO size reported by the device.
-
-nomerges (RW)
--------------
-This enables the user to disable the lookup logic involved with IO
-merging requests in the block layer. By default (0) all merges are
-enabled. When set to 1 only simple one-hit merges will be tried. When
-set to 2 no merge algorithms will be tried (including one-hit or more
-complex tree/hash lookups).
-
-nr_requests (RW)
-----------------
-This controls how many requests may be allocated in the block layer for
-read or write requests. Note that the total allocated number may be twice
-this amount, since it applies only to reads or writes (not the accumulated
-sum).
-
-To avoid priority inversion through request starvation, a request
-queue maintains a separate request pool per each cgroup when
-CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
-per-block-cgroup request pool.  IOW, if there are N block cgroups,
-each request queue may have up to N request pools, each independently
-regulated by nr_requests.
-
-nr_zones (RO)
--------------
-For zoned block devices (zoned attribute indicating "host-managed" or
-"host-aware"), this indicates the total number of zones of the device.
-This is always 0 for regular block devices.
-
-optimal_io_size (RO)
---------------------
-This is the optimal IO size reported by the device.
-
-physical_block_size (RO)
-------------------------
-This is the physical block size of device, in bytes.
-
-read_ahead_kb (RW)
-------------------
-Maximum number of kilobytes to read-ahead for filesystems on this block
-device.
-
-rotational (RW)
----------------
-This file is used to stat if the device is of rotational type or
-non-rotational type.
-
-rq_affinity (RW)
-----------------
-If this option is '1', the block layer will migrate request completions to the
-cpu "group" that originally submitted the request. For some workloads this
-provides a significant reduction in CPU cycles due to caching effects.
-
-For storage configurations that need to maximize distribution of completion
-processing setting this option to '2' forces the completion to run on the
-requesting cpu (bypassing the "group" aggregation logic).
-
-scheduler (RW)
---------------
-When read, this file will display the current and available IO schedulers
-for this block device. The currently active IO scheduler will be enclosed
-in [] brackets. Writing an IO scheduler name to this file will switch
-control of this block device to that new IO scheduler. Note that writing
-an IO scheduler name to this file will attempt to load that IO scheduler
-module, if it isn't already present in the system.
-
-write_cache (RW)
-----------------
-When read, this file will display whether the device has write back
-caching enabled or not. It will return "write back" for the former
-case, and "write through" for the latter. Writing to this file can
-change the kernels view of the device, but it doesn't alter the
-device state. This means that it might not be safe to toggle the
-setting from "write back" to "write through", since that will also
-eliminate cache flushes issued by the kernel.
-
-write_same_max_bytes (RO)
--------------------------
-This is the number of bytes the device can write in a single write-same
-command.  A value of '0' means write-same is not supported by this
-device.
-
-wbt_lat_usec (RW)
------------------
-If the device is registered for writeback throttling, then this file shows
-the target minimum read latency. If this latency is exceeded in a given
-window of time (see wb_window_usec), then the writeback throttling will start
-scaling back writes. Writing a value of '0' to this file disables the
-feature. Writing a value of '-1' to this file resets the value to the
-default setting.
-
-throttle_sample_time (RW)
--------------------------
-This is the time window that blk-throttle samples data, in millisecond.
-blk-throttle makes decision based on the samplings. Lower time means cgroups
-have more smooth throughput, but higher CPU overhead. This exists only when
-CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
-
-write_zeroes_max_bytes (RO)
----------------------------
-For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
-bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
-is not supported.
-
-zoned (RO)
-----------
-This indicates if the device is a zoned block device and the zone model of the
-device if it is indeed zoned. The possible values indicated by zoned are
-"none" for regular block devices and "host-aware" or "host-managed" for zoned
-block devices. The characteristics of host-aware and host-managed zoned block
-devices are described in the ZBC (Zoned Block Commands) and ZAC
-(Zoned Device ATA Command Set) standards. These standards also define the
-"drive-managed" zone model. However, since drive-managed zoned block devices
-do not support zone commands, they will be treated as regular block devices
-and zoned will report "none".
-
-Jens Axboe <jens.axboe@oracle.com>, February 2009
diff --git a/Documentation/block/request.rst b/Documentation/block/request.rst
new file mode 100644
index 000000000000..747021e1ffdb
--- /dev/null
+++ b/Documentation/block/request.rst
@@ -0,0 +1,99 @@
+============================
+struct request documentation
+============================
+
+Jens Axboe <jens.axboe@oracle.com> 27/05/02
+
+
+.. FIXME:
+   No idea what this means - it seems to be just noise, so comment it out
+
+   1.0
+   Index
+
+   2.0 Struct request members classification
+
+       2.1 struct request members explanation
+
+   3.0
+
+
+   2.0
+
+
+
+Short explanation of request members
+====================================
+
+Classification flags:
+
+	=	====================
+	D	driver member
+	B	block layer member
+	I	I/O scheduler member
+	=	====================
+
+Unless an entry contains a D classification, a device driver must not access
+this member. Some members may contain D classifications, but should only be
+accessed through certain macros or functions (e.g. ->flags).
+
+<linux/blkdev.h>
+
+=============================== ======= =======================================
+Member				Flag	Comment
+=============================== ======= =======================================
+struct list_head queuelist	BI	Organization on various internal
+					queues
+
+``void *elevator_private``	I	I/O scheduler private data
+
+unsigned char cmd[16]		D	Driver can use this for setting up
+					a cdb before execution, see
+					blk_queue_prep_rq
+
+unsigned long flags		DBI	Contains info about data direction,
+					request type, etc.
+
+int rq_status			D	Request status bits
+
+kdev_t rq_dev			DBI	Target device
+
+int errors			DB	Error counts
+
+sector_t sector			DBI	Target location
+
+sector_t hard_sector		B	Used to keep sector sane
+
+unsigned long nr_sectors	DBI	Total number of sectors in request
+
+unsigned long hard_nr_sectors	B	Used to keep nr_sectors sane
+
+unsigned short nr_phys_segments	DB	Number of physical scatter gather
+					segments in a request
+
+unsigned short nr_hw_segments	DB	Number of hardware scatter gather
+					segments in a request
+
+unsigned int current_nr_sectors	DB	Number of sectors in first segment
+					of request
+
+unsigned int hard_cur_sectors	B	Used to keep current_nr_sectors sane
+
+int tag				DB	TCQ tag, if assigned
+
+``void *special``		D	Free to be used by driver
+
+``char *buffer``		D	Map of first segment, also see
+					section on bouncing SECTION
+
+``struct completion *waiting``	D	Can be used by driver to get signalled
+					on request completion
+
+``struct bio *bio``		DBI	First bio in request
+
+``struct bio *biotail``		DBI	Last bio in request
+
+``struct request_queue *q``	DB	Request queue this request belongs to
+
+``struct request_list *rl``	B	Request list this request came from
+=============================== ======= =======================================
diff --git a/Documentation/block/request.txt b/Documentation/block/request.txt
deleted file mode 100644
index 754e104ed369..000000000000
--- a/Documentation/block/request.txt
+++ /dev/null
@@ -1,88 +0,0 @@
-
-struct request documentation
-
-Jens Axboe <jens.axboe@oracle.com> 27/05/02
-
-1.0
-Index
-
-2.0 Struct request members classification
-
-	2.1 struct request members explanation
-
-3.0
-
-
-2.0
-Short explanation of request members
-
-Classification flags:
-
-	D	driver member
-	B	block layer member
-	I	I/O scheduler member
-
-Unless an entry contains a D classification, a device driver must not access
-this member. Some members may contain D classifications, but should only be
-access through certain macros or functions (eg ->flags).
-
-<linux/blkdev.h>
-
-2.1
-Member				Flag	Comment
-------				----	-------
-
-struct list_head queuelist	BI	Organization on various internal
-					queues
-
-void *elevator_private		I	I/O scheduler private data
-
-unsigned char cmd[16]		D	Driver can use this for setting up
-					a cdb before execution, see
-					blk_queue_prep_rq
-
-unsigned long flags		DBI	Contains info about data direction,
-					request type, etc.
-
-int rq_status			D	Request status bits
-
-kdev_t rq_dev			DBI	Target device
-
-int errors			DB	Error counts
-
-sector_t sector			DBI	Target location
-
-unsigned long hard_nr_sectors	B	Used to keep sector sane
-
-unsigned long nr_sectors	DBI	Total number of sectors in request
-
-unsigned long hard_nr_sectors	B	Used to keep nr_sectors sane
-
-unsigned short nr_phys_segments	DB	Number of physical scatter gather
-					segments in a request
-
-unsigned short nr_hw_segments	DB	Number of hardware scatter gather
-					segments in a request
-
-unsigned int current_nr_sectors	DB	Number of sectors in first segment
-					of request
-
-unsigned int hard_cur_sectors	B	Used to keep current_nr_sectors sane
-
-int tag				DB	TCQ tag, if assigned
-
-void *special			D	Free to be used by driver
-
-char *buffer			D	Map of first segment, also see
-					section on bouncing SECTION
-
-struct completion *waiting	D	Can be used by driver to get signalled
-					on request completion
-
-struct bio *bio			DBI	First bio in request
-
-struct bio *biotail		DBI	Last bio in request
-
-struct request_queue *q		DB	Request queue this request belongs to
-
-struct request_list *rl		B	Request list this request came from
diff --git a/Documentation/block/stat.rst b/Documentation/block/stat.rst
new file mode 100644
index 000000000000..9c07bc22b0bc
--- /dev/null
+++ b/Documentation/block/stat.rst
@@ -0,0 +1,93 @@
+===============================================
+Block layer statistics in /sys/block/<dev>/stat
+===============================================
+
+This file documents the contents of the /sys/block/<dev>/stat file.
+
+The stat file provides several statistics about the state of block
+device <dev>.
+
+Q.
+   Why are there multiple statistics in a single file?  Doesn't sysfs
+   normally contain a single value per file?
+
+A.
+   By having a single file, the kernel can guarantee that the statistics
+   represent a consistent snapshot of the state of the device.  If the
+   statistics were exported as multiple files containing one statistic
+   each, it would be impossible to guarantee that a set of readings
+   represent a single point in time.
+
+The stat file consists of a single line of text containing 15 decimal
+values separated by whitespace.  The fields are summarized in the
+following table, and described in more detail below.
+
+
+=============== ============= =================================================
+Name            units         description
+=============== ============= =================================================
+read I/Os       requests      number of read I/Os processed
+read merges     requests      number of read I/Os merged with in-queue I/O
+read sectors    sectors       number of sectors read
+read ticks      milliseconds  total wait time for read requests
+write I/Os      requests      number of write I/Os processed
+write merges    requests      number of write I/Os merged with in-queue I/O
+write sectors   sectors       number of sectors written
+write ticks     milliseconds  total wait time for write requests
+in_flight       requests      number of I/Os currently in flight
+io_ticks        milliseconds  total time this block device has been active
+time_in_queue   milliseconds  total wait time for all requests
+discard I/Os    requests      number of discard I/Os processed
+discard merges  requests      number of discard I/Os merged with in-queue I/O
+discard sectors sectors       number of sectors discarded
+discard ticks   milliseconds  total wait time for discard requests
+=============== ============= =================================================
+
+read I/Os, write I/Os, discard I/Os
+===================================
+
+These values increment when an I/O request completes.
+
+read merges, write merges, discard merges
+=========================================
+
+These values increment when an I/O request is merged with an
+already-queued I/O request.
+
+read sectors, write sectors, discard sectors
+============================================
+
+These values count the number of sectors read from, written to, or
+discarded from this block device.  The "sectors" in question are the
+standard UNIX 512-byte sectors, not any device- or filesystem-specific
+block size.  The counters are incremented when the I/O completes.
+
+read ticks, write ticks, discard ticks
+======================================
+
+These values count the number of milliseconds that I/O requests have
+waited on this block device.  If there are multiple I/O requests waiting,
+these values will increase at a rate greater than 1000/second; for
+example, if 60 read requests wait for an average of 30 ms, the read_ticks
+field will increase by 60*30 = 1800.
+
+in_flight
+=========
+
+This value counts the number of I/O requests that have been issued to
+the device driver but have not yet completed.  It does not include I/O
+requests that are in the queue but not yet issued to the device driver.
+
+io_ticks
+========
+
+This value counts the number of milliseconds during which the device has
+had I/O requests queued.
+
+time_in_queue
+=============
+
+This value counts the number of milliseconds that I/O requests have waited
+on this block device.  If there are multiple I/O requests waiting, this
+value will increase as the product of the number of milliseconds times the
+number of requests waiting (see "read ticks" above for an example).
diff --git a/Documentation/block/stat.txt b/Documentation/block/stat.txt
deleted file mode 100644
index 0aace9cc536c..000000000000
--- a/Documentation/block/stat.txt
+++ /dev/null
@@ -1,86 +0,0 @@
-Block layer statistics in /sys/block/<dev>/stat
-===============================================
-
-This file documents the contents of the /sys/block/<dev>/stat file.
-
-The stat file provides several statistics about the state of block
-device <dev>.
-
-Q. Why are there multiple statistics in a single file?  Doesn't sysfs
-   normally contain a single value per file?
-A. By having a single file, the kernel can guarantee that the statistics
-   represent a consistent snapshot of the state of the device.  If the
-   statistics were exported as multiple files containing one statistic
-   each, it would be impossible to guarantee that a set of readings
-   represent a single point in time.
-
-The stat file consists of a single line of text containing 11 decimal
-values separated by whitespace.  The fields are summarized in the
-following table, and described in more detail below.
-
-Name            units         description
-----            -----         -----------
-read I/Os       requests      number of read I/Os processed
-read merges     requests      number of read I/Os merged with in-queue I/O
-read sectors    sectors       number of sectors read
-read ticks      milliseconds  total wait time for read requests
-write I/Os      requests      number of write I/Os processed
-write merges    requests      number of write I/Os merged with in-queue I/O
-write sectors   sectors       number of sectors written
-write ticks     milliseconds  total wait time for write requests
-in_flight       requests      number of I/Os currently in flight
-io_ticks        milliseconds  total time this block device has been active
-time_in_queue   milliseconds  total wait time for all requests
-discard I/Os    requests      number of discard I/Os processed
-discard merges  requests      number of discard I/Os merged with in-queue I/O
-discard sectors sectors       number of sectors discarded
-discard ticks   milliseconds  total wait time for discard requests
-
-read I/Os, write I/Os, discard I/0s
-===================================
-
-These values increment when an I/O request completes.
-
-read merges, write merges, discard merges
-=========================================
-
-These values increment when an I/O request is merged with an
-already-queued I/O request.
-
-read sectors, write sectors, discard_sectors
-============================================
-
-These values count the number of sectors read from, written to, or
-discarded from this block device.  The "sectors" in question are the
-standard UNIX 512-byte sectors, not any device- or filesystem-specific
-block size.  The counters are incremented when the I/O completes.
-
-read ticks, write ticks, discard ticks
-======================================
-
-These values count the number of milliseconds that I/O requests have
-waited on this block device.  If there are multiple I/O requests waiting,
-these values will increase at a rate greater than 1000/second; for
-example, if 60 read requests wait for an average of 30 ms, the read_ticks
-field will increase by 60*30 = 1800.
-
-in_flight
-=========
-
-This value counts the number of I/O requests that have been issued to
-the device driver but have not yet completed.  It does not include I/O
-requests that are in the queue but not yet issued to the device driver.
-
-io_ticks
-========
-
-This value counts the number of milliseconds during which the device has
-had I/O requests queued.
-
-time_in_queue
-=============
-
-This value counts the number of milliseconds that I/O requests have waited
-on this block device.  If there are multiple I/O requests waiting, this
-value will increase as the product of the number of milliseconds times the
-number of requests waiting (see "read ticks" above for an example).
diff --git a/Documentation/block/switching-sched.rst b/Documentation/block/switching-sched.rst
new file mode 100644
index 000000000000..42042417380e
--- /dev/null
+++ b/Documentation/block/switching-sched.rst
@@ -0,0 +1,39 @@
+===================
+Switching Scheduler
+===================
+
+To choose IO schedulers at boot time, use the argument 'elevator=deadline'.
+'noop' and 'cfq' (the default) are also available. At present, IO schedulers
+can only be assigned globally at boot time.
+
+Each io queue has a set of io scheduler tunables associated with it. These
+tunables control how the io scheduler works. You can find these entries
+in::
+
+	/sys/block/<device>/queue/iosched
+
+assuming that you have sysfs mounted on /sys. If you don't have sysfs mounted,
+you can do so by typing::
+
+	# mount none /sys -t sysfs
+
+It is possible to change the IO scheduler for a given block device on
+the fly to select one of mq-deadline, none, bfq, or kyber schedulers -
+which can improve that device's throughput.
+
+To set a specific scheduler, simply do this::
+
+	echo SCHEDNAME > /sys/block/DEV/queue/scheduler
+
+where SCHEDNAME is the name of a defined IO scheduler, and DEV is the
+device name (hda, hdb, sga, or whatever you happen to have).
+
+The list of defined schedulers can be found by simply doing
+a "cat /sys/block/DEV/queue/scheduler" - the list of valid names
+will be displayed, with the currently selected scheduler in brackets::
+
+  # cat /sys/block/sda/queue/scheduler
+  [mq-deadline] kyber bfq none
+  # echo none >/sys/block/sda/queue/scheduler
+  # cat /sys/block/sda/queue/scheduler
+  [none] mq-deadline kyber bfq
diff --git a/Documentation/block/switching-sched.txt b/Documentation/block/switching-sched.txt
deleted file mode 100644
index 7977f6fb8b20..000000000000
--- a/Documentation/block/switching-sched.txt
+++ /dev/null
@@ -1,35 +0,0 @@
-To choose IO schedulers at boot time, use the argument 'elevator=deadline'.
-'noop' and 'cfq' (the default) are also available. IO schedulers are assigned
-globally at boot time only presently.
-
-Each io queue has a set of io scheduler tunables associated with it. These
-tunables control how the io scheduler works. You can find these entries
-in:
-
-/sys/block/<device>/queue/iosched
-
-assuming that you have sysfs mounted on /sys. If you don't have sysfs mounted,
-you can do so by typing:
-
-# mount none /sys -t sysfs
-
-It is possible to change the IO scheduler for a given block device on
-the fly to select one of mq-deadline, none, bfq, or kyber schedulers -
-which can improve that device's throughput.
-
-To set a specific scheduler, simply do this:
-
-echo SCHEDNAME > /sys/block/DEV/queue/scheduler
-
-where SCHEDNAME is the name of a defined IO scheduler, and DEV is the
-device name (hda, hdb, sga, or whatever you happen to have).
-
-The list of defined schedulers can be found by simply doing
-a "cat /sys/block/DEV/queue/scheduler" - the list of valid names
-will be displayed, with the currently selected scheduler in brackets:
-
-# cat /sys/block/sda/queue/scheduler
-[mq-deadline] kyber bfq none
-# echo none >/sys/block/sda/queue/scheduler
-# cat /sys/block/sda/queue/scheduler
-[none] mq-deadline kyber bfq
diff --git a/Documentation/block/writeback_cache_control.rst b/Documentation/block/writeback_cache_control.rst
new file mode 100644
index 000000000000..2c752c57c14c
--- /dev/null
+++ b/Documentation/block/writeback_cache_control.rst
@@ -0,0 +1,86 @@
+==========================================
+Explicit volatile write back cache control
+==========================================
+
+Introduction
+------------
+
+Many storage devices, especially in the consumer market, come with volatile
+write back caches.  That means the devices signal I/O completion to the
+operating system before data actually has hit the non-volatile storage.  This
+behavior obviously speeds up various workloads, but it means the operating
+system needs to force data out to the non-volatile storage when it performs
+a data integrity operation like fsync, sync or an unmount.
+
+The Linux block layer provides two simple mechanisms that let filesystems
+control the caching behavior of the storage device.  These mechanisms are
+a forced cache flush, and the Force Unit Access (FUA) flag for requests.
+
+
+Explicit cache flushes
+----------------------
+
+The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from
+the filesystem and will make sure the volatile cache of the storage device
+has been flushed before the actual I/O operation is started.  This explicitly
+guarantees that previously completed write requests are on non-volatile
+storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
+set on an otherwise empty bio structure, which causes only an explicit cache
+flush without any dependent I/O.  It is recommended to use
+the blkdev_issue_flush() helper for a pure cache flush.
+
+
+Forced Unit Access
+------------------
+
+The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
+filesystem and will make sure that I/O completion for this request is only
+signaled after the data has been committed to non-volatile storage.
+
+
+Implementation details for filesystems
+--------------------------------------
+
+Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
+worry if the underlying devices need any explicit cache flushing and how
+the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
+may both be set on a single bio.
+
+
+Implementation details for make_request_fn based block drivers
+--------------------------------------------------------------
+
+These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
+directly below the submit_bio interface.  For remapping drivers the REQ_FUA
+bits need to be propagated to underlying devices, and a global flush needs
+to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
+drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
+on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
+data can be completed successfully without doing any work.  Drivers for
+devices with volatile caches need to implement the support for these
+flags themselves without any help from the block layer.
+
+
+Implementation details for request_fn based block drivers
+---------------------------------------------------------
+
+For devices that do not support volatile write caches there is no driver
+support required, the block layer completes empty REQ_PREFLUSH requests before
+entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
+requests that have a payload.  For devices with volatile write caches the
+driver needs to tell the block layer that it supports flushing caches by
+doing::
+
+	blk_queue_write_cache(sdkp->disk->queue, true, false);
+
+and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
+REQ_PREFLUSH requests with a payload are automatically turned into a sequence
+of an empty REQ_OP_FLUSH request followed by the actual write by the block
+layer.  For devices that also support the FUA bit the block layer needs
+to be told to pass through the REQ_FUA bit using::
+
+	blk_queue_write_cache(sdkp->disk->queue, true, true);
+
+and the driver must handle write requests that have the REQ_FUA bit set
+in prep_fn/request_fn.  If the FUA bit is not natively supported the block
+layer turns it into an empty REQ_OP_FLUSH request after the actual write.
diff --git a/Documentation/block/writeback_cache_control.txt b/Documentation/block/writeback_cache_control.txt
deleted file mode 100644
index 8a6bdada5f6b..000000000000
--- a/Documentation/block/writeback_cache_control.txt
+++ /dev/null
@@ -1,86 +0,0 @@
-
-Explicit volatile write back cache control
-=====================================
-
-Introduction
-------------
-
-Many storage devices, especially in the consumer market, come with volatile
-write back caches.  That means the devices signal I/O completion to the
-operating system before data actually has hit the non-volatile storage.  This
-behavior obviously speeds up various workloads, but it means the operating
-system needs to force data out to the non-volatile storage when it performs
-a data integrity operation like fsync, sync or an unmount.
-
-The Linux block layer provides two simple mechanisms that let filesystems
-control the caching behavior of the storage device.  These mechanisms are
-a forced cache flush, and the Force Unit Access (FUA) flag for requests.
-
-
-Explicit cache flushes
-----------------------
-
-The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from
-the filesystem and will make sure the volatile cache of the storage device
-has been flushed before the actual I/O operation is started.  This explicitly
-guarantees that previously completed write requests are on non-volatile
-storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
-set on an otherwise empty bio structure, which causes only an explicit cache
-flush without any dependent I/O.  It is recommend to use
-the blkdev_issue_flush() helper for a pure cache flush.
-
-
-Forced Unit Access
------------------
-
-The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
-filesystem and will make sure that I/O completion for this request is only
-signaled after the data has been committed to non-volatile storage.
-
-
-Implementation details for filesystems
---------------------------------------
-
-Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
-worry if the underlying devices need any explicit cache flushing and how
-the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
-may both be set on a single bio.
-
-
-Implementation details for make_request_fn based block drivers
---------------------------------------------------------------
-
-These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
-directly below the submit_bio interface.  For remapping drivers the REQ_FUA
-bits need to be propagated to underlying devices, and a global flush needs
-to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
-drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
-on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
-data can be completed successfully without doing any work.  Drivers for
-devices with volatile caches need to implement the support for these
-flags themselves without any help from the block layer.
-
-
-Implementation details for request_fn based block drivers
---------------------------------------------------------------
-
-For devices that do not support volatile write caches there is no driver
-support required, the block layer completes empty REQ_PREFLUSH requests before
-entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
-requests that have a payload.  For devices with volatile write caches the
-driver needs to tell the block layer that it supports flushing caches by
-doing:
-
-	blk_queue_write_cache(sdkp->disk->queue, true, false);
-
-and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
-REQ_PREFLUSH requests with a payload are automatically turned into a sequence
-of an empty REQ_OP_FLUSH request followed by the actual write by the block
-layer.  For devices that also support the FUA bit the block layer needs
-to be told to pass through the REQ_FUA bit using:
-
-	blk_queue_write_cache(sdkp->disk->queue, true, true);
-
-and the driver must handle write requests that have the REQ_FUA bit set
-in prep_fn/request_fn.  If the FUA bit is not natively supported the block
-layer turns it into an empty REQ_OP_FLUSH request after the actual write.
diff --git a/Documentation/blockdev/zram.rst b/Documentation/blockdev/zram.rst
index 2111231c9c0f..6eccf13219ff 100644
--- a/Documentation/blockdev/zram.rst
+++ b/Documentation/blockdev/zram.rst
@@ -215,7 +215,7 @@ User space is advised to use the following files to read the device statistics.
 
 File /sys/block/zram<id>/stat
 
-Represents block layer statistics. Read Documentation/block/stat.txt for
+Represents block layer statistics. Read Documentation/block/stat.rst for
 details.
 
 File /sys/block/zram<id>/io_stat
diff --git a/MAINTAINERS b/MAINTAINERS
index 93e5ac1de255..4b9fd11466a2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2968,7 +2968,7 @@ M:	Jens Axboe <axboe@kernel.dk>
 L:	linux-block@vger.kernel.org
 S:	Maintained
 F:	block/bfq-*
-F:	Documentation/block/bfq-iosched.txt
+F:	Documentation/block/bfq-iosched.rst
 
 BFS FILE SYSTEM
 M:	"Tigran A. Aivazian" <aivazian.tigran@gmail.com>
diff --git a/block/Kconfig b/block/Kconfig
index 56cb1695cd87..b16b3e075d31 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -110,7 +110,7 @@ config BLK_CMDLINE_PARSER
 	which don't otherwise have any standardized method for listing the
 	partitions on a block device.
 
-	See Documentation/block/cmdline-partition.txt for more information.
+	See Documentation/block/cmdline-partition.rst for more information.
 
 config BLK_WBT
 	bool "Enable support for block device writeback throttling"
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 7a6b2f29a582..b89310a022ad 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -26,7 +26,7 @@ config IOSCHED_BFQ
 	regardless of the device parameters and with any workload. It
 	also guarantees a low latency to interactive and soft
 	real-time applications.  Details in
-	Documentation/block/bfq-iosched.txt
+	Documentation/block/bfq-iosched.rst
 
 config BFQ_GROUP_IOSCHED
        bool "BFQ hierarchical scheduling support"
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 50c9d2598500..72860325245a 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -17,7 +17,7 @@
  * low-latency capabilities. BFQ also supports full hierarchical
  * scheduling through cgroups. Next paragraphs provide an introduction
  * on BFQ inner workings. Details on BFQ benefits, usage and
- * limitations can be found in Documentation/block/bfq-iosched.txt.
+ * limitations can be found in Documentation/block/bfq-iosched.rst.
  *
  * BFQ is a proportional-share storage-I/O scheduling algorithm based
  * on the slice-by-slice service scheme of CFQ. But BFQ assigns
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 825c9c070458..ca39b4624cf8 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -383,7 +383,7 @@ static const struct blk_integrity_profile nop_profile = {
  * send/receive integrity metadata it must use this function to register
  * the capability with the block layer. The template is a blk_integrity
  * struct with values appropriate for the underlying hardware. See
- * Documentation/block/data-integrity.txt.
+ * Documentation/block/data-integrity.rst.
  */
 void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template)
 {
diff --git a/block/ioprio.c b/block/ioprio.c
index 2e0559f157c8..77bcab11dce5 100644
--- a/block/ioprio.c
+++ b/block/ioprio.c
@@ -17,7 +17,7 @@
  *
  * ioprio_set(PRIO_PROCESS, pid, prio);
  *
- * See also Documentation/block/ioprio.txt
+ * See also Documentation/block/ioprio.rst
  *
  */
 #include <linux/gfp.h>
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index b8a682b5a1bb..2a2a2e82832e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -25,7 +25,7 @@
 #include "blk-mq-sched.h"
 
 /*
- * See Documentation/block/deadline-iosched.txt
+ * See Documentation/block/deadline-iosched.rst
  */
 static const int read_expire = HZ / 2;  /* max time before a read is submitted. */
 static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
diff --git a/block/partitions/cmdline.c b/block/partitions/cmdline.c
index 60fb3df9897c..f1edd5452249 100644
--- a/block/partitions/cmdline.c
+++ b/block/partitions/cmdline.c
@@ -11,7 +11,7 @@
  *
  * The format for the command line is just like mtdparts.
  *
- * For further information, see "Documentation/block/cmdline-partition.txt"
+ * For further information, see "Documentation/block/cmdline-partition.rst"
  *
  */
 
-- 
cgit v1.2.3-55-g7522


From d5ccd65ab6272f21f442695b0022a4f553d818e5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 19 Apr 2019 19:01:18 -0300
Subject: docs: move gcc-plugins.txt to core-api and rename to .rst

The gcc-plugins.txt file is already a ReST file. Move it
to the core-api book while renaming it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
---
 Documentation/core-api/gcc-plugins.rst | 93 ++++++++++++++++++++++++++++++++++
 Documentation/core-api/index.rst       |  1 +
 Documentation/gcc-plugins.txt          | 93 ----------------------------------
 MAINTAINERS                            |  2 +-
 scripts/gcc-plugins/Kconfig            |  2 +-
 5 files changed, 96 insertions(+), 95 deletions(-)
 create mode 100644 Documentation/core-api/gcc-plugins.rst
 delete mode 100644 Documentation/gcc-plugins.txt

diff --git a/Documentation/core-api/gcc-plugins.rst b/Documentation/core-api/gcc-plugins.rst
new file mode 100644
index 000000000000..8502f24396fb
--- /dev/null
+++ b/Documentation/core-api/gcc-plugins.rst
@@ -0,0 +1,93 @@
+=========================
+GCC plugin infrastructure
+=========================
+
+
+Introduction
+============
+
+GCC plugins are loadable modules that provide extra features to the
+compiler [1]_. They are useful for runtime instrumentation and static analysis.
+We can analyse, change and add further code during compilation via
+callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_.
+
+The GCC plugin infrastructure of the kernel supports all gcc versions from
+4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a
+separate directory.
+Plugin source files have to be compilable by both a C and a C++ compiler as well
+because gcc versions 4.5 and 4.6 are compiled by a C compiler,
+gcc-4.7 can be compiled by a C or a C++ compiler,
+and versions 4.8+ can only be compiled by a C++ compiler.
+
+Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and
+powerpc architectures.
+
+This infrastructure was ported from grsecurity [6]_ and PaX [7]_.
+
+--
+
+.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html
+.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API
+.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
+.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html
+.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html
+.. [6] https://grsecurity.net/
+.. [7] https://pax.grsecurity.net/
+
+
+Files
+=====
+
+**$(src)/scripts/gcc-plugins**
+
+	This is the directory of the GCC plugins.
+
+**$(src)/scripts/gcc-plugins/gcc-common.h**
+
+	This is a compatibility header for GCC plugins.
+	It should be always included instead of individual gcc headers.
+
+**$(src)/scripts/gcc-plugin.sh**
+
+	This script checks the availability of the included headers in
+	gcc-common.h and chooses the proper host compiler to build the plugins
+	(gcc-4.7 can be built by either gcc or g++).
+
+**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h,
+$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h,
+$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h,
+$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h**
+
+	These headers automatically generate the registration structures for
+	GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions
+	from 4.5 to 6.0.
+	They should be preferred to creating the structures by hand.
+
+
+Usage
+=====
+
+You must install the gcc plugin headers for your gcc version,
+e.g., on Ubuntu for gcc-4.9::
+
+	apt-get install gcc-4.9-plugin-dev
+
+Enable a GCC plugin based feature in the kernel config::
+
+	CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
+
+To compile only the plugin(s)::
+
+	make gcc-plugins
+
+or just run the kernel make and compile the whole kernel with
+the cyclomatic complexity GCC plugin.
+
+
+4. How to add a new GCC plugin
+==============================
+
+The GCC plugins are in $(src)/scripts/gcc-plugins/. You can use a file or a directory
+here. It must be added to $(src)/scripts/gcc-plugins/Makefile,
+$(src)/scripts/Makefile.gcc-plugins and $(src)/arch/Kconfig.
+See the cyc_complexity_plugin.c (CONFIG_GCC_PLUGIN_CYC_COMPLEXITY) GCC plugin.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 322ac954b390..da0ed972d224 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -36,6 +36,7 @@ Core utilities
    memory-hotplug
    protection-keys
    ../RCU/index
+   gcc-plugins
 
 
 Interfaces for kernel debugging
diff --git a/Documentation/gcc-plugins.txt b/Documentation/gcc-plugins.txt
deleted file mode 100644
index 8502f24396fb..000000000000
--- a/Documentation/gcc-plugins.txt
+++ /dev/null
@@ -1,93 +0,0 @@
-=========================
-GCC plugin infrastructure
-=========================
-
-
-Introduction
-============
-
-GCC plugins are loadable modules that provide extra features to the
-compiler [1]_. They are useful for runtime instrumentation and static analysis.
-We can analyse, change and add further code during compilation via
-callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_.
-
-The GCC plugin infrastructure of the kernel supports all gcc versions from
-4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a
-separate directory.
-Plugin source files have to be compilable by both a C and a C++ compiler as well
-because gcc versions 4.5 and 4.6 are compiled by a C compiler,
-gcc-4.7 can be compiled by a C or a C++ compiler,
-and versions 4.8+ can only be compiled by a C++ compiler.
-
-Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and
-powerpc architectures.
-
-This infrastructure was ported from grsecurity [6]_ and PaX [7]_.
-
---
-
-.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html
-.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API
-.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
-.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html
-.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html
-.. [6] https://grsecurity.net/
-.. [7] https://pax.grsecurity.net/
-
-
-Files
-=====
-
-**$(src)/scripts/gcc-plugins**
-
-	This is the directory of the GCC plugins.
-
-**$(src)/scripts/gcc-plugins/gcc-common.h**
-
-	This is a compatibility header for GCC plugins.
-	It should be always included instead of individual gcc headers.
-
-**$(src)/scripts/gcc-plugin.sh**
-
-	This script checks the availability of the included headers in
-	gcc-common.h and chooses the proper host compiler to build the plugins
-	(gcc-4.7 can be built by either gcc or g++).
-
-**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h,
-$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h**
-
-	These headers automatically generate the registration structures for
-	GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions
-	from 4.5 to 6.0.
-	They should be preferred to creating the structures by hand.
-
-
-Usage
-=====
-
-You must install the gcc plugin headers for your gcc version,
-e.g., on Ubuntu for gcc-4.9::
-
-	apt-get install gcc-4.9-plugin-dev
-
-Enable a GCC plugin based feature in the kernel config::
-
-	CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
-
-To compile only the plugin(s)::
-
-	make gcc-plugins
-
-or just run the kernel make and compile the whole kernel with
-the cyclomatic complexity GCC plugin.
-
-
-4. How to add a new GCC plugin
-==============================
-
-The GCC plugins are in $(src)/scripts/gcc-plugins/. You can use a file or a directory
-here. It must be added to $(src)/scripts/gcc-plugins/Makefile,
-$(src)/scripts/Makefile.gcc-plugins and $(src)/arch/Kconfig.
-See the cyc_complexity_plugin.c (CONFIG_GCC_PLUGIN_CYC_COMPLEXITY) GCC plugin.
diff --git a/MAINTAINERS b/MAINTAINERS
index 4b9fd11466a2..db96cd4a229b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6655,7 +6655,7 @@ S:	Maintained
 F:	scripts/gcc-plugins/
 F:	scripts/gcc-plugin.sh
 F:	scripts/Makefile.gcc-plugins
-F:	Documentation/gcc-plugins.txt
+F:	Documentation/core-api/gcc-plugins.rst
 
 GASKET DRIVER FRAMEWORK
 M:	Rob Springer <rspringer@google.com>
diff --git a/scripts/gcc-plugins/Kconfig b/scripts/gcc-plugins/Kconfig
index e9c677a53c74..d33de0b9f4f5 100644
--- a/scripts/gcc-plugins/Kconfig
+++ b/scripts/gcc-plugins/Kconfig
@@ -23,7 +23,7 @@ config GCC_PLUGINS
 	  GCC plugins are loadable modules that provide extra features to the
 	  compiler. They are useful for runtime instrumentation and static analysis.
 
-	  See Documentation/gcc-plugins.txt for details.
+	  See Documentation/core-api/gcc-plugins.rst for details.
 
 menu "GCC plugins"
 	depends on GCC_PLUGINS
-- 
cgit v1.2.3-55-g7522


From 74684f8ff44e8b9cf85542762ec347b96bd92559 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Wed, 26 Jun 2019 12:24:01 -0300
Subject: docs: logo.txt: rename it to COPYING-logo

This file has nothing to do with the Kernel documentation. It
contains the copyright permissions for Tux at Documentation/logo.gif.

So, rename it accordingly.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/COPYING-logo | 13 +++++++++++++
 Documentation/logo.txt     | 13 -------------
 2 files changed, 13 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/COPYING-logo
 delete mode 100644 Documentation/logo.txt

diff --git a/Documentation/COPYING-logo b/Documentation/COPYING-logo
new file mode 100644
index 000000000000..296f0f7f67eb
--- /dev/null
+++ b/Documentation/COPYING-logo
@@ -0,0 +1,13 @@
+This is the full-colour version of the currently unofficial Linux logo
+("currently unofficial" just means that there has been no paperwork and
+that I have not really announced it yet).  It was created by Larry Ewing,
+and is freely usable as long as you acknowledge Larry as the original
+artist. 
+
+Note that there are black-and-white versions of this available that
+scale down to smaller sizes and are better for letterheads or whatever
+you want to use it for: for the full range of logos take a look at
+Larry's web-page:
+
+	http://www.isc.tamu.edu/~lewing/linux/
+
diff --git a/Documentation/logo.txt b/Documentation/logo.txt
deleted file mode 100644
index 296f0f7f67eb..000000000000
--- a/Documentation/logo.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-This is the full-colour version of the currently unofficial Linux logo
-("currently unofficial" just means that there has been no paperwork and
-that I have not really announced it yet).  It was created by Larry Ewing,
-and is freely usable as long as you acknowledge Larry as the original
-artist. 
-
-Note that there are black-and-white versions of this available that
-scale down to smaller sizes and are better for letterheads or whatever
-you want to use it for: for the full range of logos take a look at
-Larry's web-page:
-
-	http://www.isc.tamu.edu/~lewing/linux/
-
-- 
cgit v1.2.3-55-g7522


From d2bdd48a652bd0f7a5c78f3e418b4529fc469e1f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:03:23 -0300
Subject: docs: rapidio: add it to the driver API

This is actually a subsystem description, which contains both
kAPI and uAPI.

While it should ideally be split, let's place it at driver-api,
as most things are related to kAPI and driver-specific info.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/index.rst             |   1 +
 Documentation/admin-guide/rapidio.rst           | 107 +++++++
 Documentation/driver-api/index.rst              |   2 +-
 Documentation/driver-api/rapidio.rst            | 107 -------
 Documentation/driver-api/rapidio/index.rst      |  13 +
 Documentation/driver-api/rapidio/mport_cdev.rst | 110 +++++++
 Documentation/driver-api/rapidio/rapidio.rst    | 362 ++++++++++++++++++++++++
 Documentation/driver-api/rapidio/rio_cm.rst     | 135 +++++++++
 Documentation/driver-api/rapidio/sysfs.rst      |   7 +
 Documentation/driver-api/rapidio/tsi721.rst     | 112 ++++++++
 Documentation/rapidio/index.rst                 |  15 -
 Documentation/rapidio/mport_cdev.rst            | 110 -------
 Documentation/rapidio/rapidio.rst               | 362 ------------------------
 Documentation/rapidio/rio_cm.rst                | 135 ---------
 Documentation/rapidio/sysfs.rst                 |   7 -
 Documentation/rapidio/tsi721.rst                | 112 --------
 drivers/rapidio/Kconfig                         |   2 +-
 17 files changed, 849 insertions(+), 850 deletions(-)
 create mode 100644 Documentation/admin-guide/rapidio.rst
 delete mode 100644 Documentation/driver-api/rapidio.rst
 create mode 100644 Documentation/driver-api/rapidio/index.rst
 create mode 100644 Documentation/driver-api/rapidio/mport_cdev.rst
 create mode 100644 Documentation/driver-api/rapidio/rapidio.rst
 create mode 100644 Documentation/driver-api/rapidio/rio_cm.rst
 create mode 100644 Documentation/driver-api/rapidio/sysfs.rst
 create mode 100644 Documentation/driver-api/rapidio/tsi721.rst
 delete mode 100644 Documentation/rapidio/index.rst
 delete mode 100644 Documentation/rapidio/mport_cdev.rst
 delete mode 100644 Documentation/rapidio/rapidio.rst
 delete mode 100644 Documentation/rapidio/rio_cm.rst
 delete mode 100644 Documentation/rapidio/sysfs.rst
 delete mode 100644 Documentation/rapidio/tsi721.rst

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 24fbe0568eff..8853c95ef0d4 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -61,6 +61,7 @@ configure specific aspects of kernel behavior to your liking.
    parport
    md
    module-signing
+   rapidio
    sysrq
    unicode
    vga-softcursor
diff --git a/Documentation/admin-guide/rapidio.rst b/Documentation/admin-guide/rapidio.rst
new file mode 100644
index 000000000000..71ff658ab78e
--- /dev/null
+++ b/Documentation/admin-guide/rapidio.rst
@@ -0,0 +1,107 @@
+=======================
+RapidIO Subsystem Guide
+=======================
+
+:Author: Matt Porter
+
+Introduction
+============
+
+RapidIO is a high speed switched fabric interconnect with features aimed
+at the embedded market. RapidIO provides support for memory-mapped I/O
+as well as message-based transactions over the switched fabric network.
+RapidIO has a standardized discovery mechanism not unlike the PCI bus
+standard that allows simple detection of devices in a network.
+
+This documentation is provided for developers intending to support
+RapidIO on new architectures, write new drivers, or to understand the
+subsystem internals.
+
+Known Bugs and Limitations
+==========================
+
+Bugs
+----
+
+None. ;)
+
+Limitations
+-----------
+
+1. Access/management of RapidIO memory regions is not supported
+
+2. Multiple host enumeration is not supported
+
+RapidIO driver interface
+========================
+
+Drivers are provided a set of calls in order to interface with the
+subsystem to gather info on devices, request/map memory region
+resources, and manage mailboxes/doorbells.
+
+Functions
+---------
+
+.. kernel-doc:: include/linux/rio_drv.h
+   :internal:
+
+.. kernel-doc:: drivers/rapidio/rio-driver.c
+   :export:
+
+.. kernel-doc:: drivers/rapidio/rio.c
+   :export:
+
+Internals
+=========
+
+This chapter contains the autogenerated documentation of the RapidIO
+subsystem.
+
+Structures
+----------
+
+.. kernel-doc:: include/linux/rio.h
+   :internal:
+
+Enumeration and Discovery
+-------------------------
+
+.. kernel-doc:: drivers/rapidio/rio-scan.c
+   :internal:
+
+Driver functionality
+--------------------
+
+.. kernel-doc:: drivers/rapidio/rio.c
+   :internal:
+
+.. kernel-doc:: drivers/rapidio/rio-access.c
+   :internal:
+
+Device model support
+--------------------
+
+.. kernel-doc:: drivers/rapidio/rio-driver.c
+   :internal:
+
+PPC32 support
+-------------
+
+.. kernel-doc:: arch/powerpc/sysdev/fsl_rio.c
+   :internal:
+
+Credits
+=======
+
+The following people have contributed to the RapidIO subsystem directly
+or indirectly:
+
+1. Matt Porter\ mporter@kernel.crashing.org
+
+2. Randy Vinson\ rvinson@mvista.com
+
+3. Dan Malek\ dan@embeddedalley.com
+
+The following people have contributed to this document:
+
+1. Matt Porter\ mporter@kernel.crashing.org
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 6cd750a03ea0..d665cd9ab95f 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -45,7 +45,7 @@ available subsections can be seen below.
    miscellaneous
    mei/index
    w1
-   rapidio
+   rapidio/index
    s390-drivers
    vme
    80211/index
diff --git a/Documentation/driver-api/rapidio.rst b/Documentation/driver-api/rapidio.rst
deleted file mode 100644
index 71ff658ab78e..000000000000
--- a/Documentation/driver-api/rapidio.rst
+++ /dev/null
@@ -1,107 +0,0 @@
-=======================
-RapidIO Subsystem Guide
-=======================
-
-:Author: Matt Porter
-
-Introduction
-============
-
-RapidIO is a high speed switched fabric interconnect with features aimed
-at the embedded market. RapidIO provides support for memory-mapped I/O
-as well as message-based transactions over the switched fabric network.
-RapidIO has a standardized discovery mechanism not unlike the PCI bus
-standard that allows simple detection of devices in a network.
-
-This documentation is provided for developers intending to support
-RapidIO on new architectures, write new drivers, or to understand the
-subsystem internals.
-
-Known Bugs and Limitations
-==========================
-
-Bugs
-----
-
-None. ;)
-
-Limitations
------------
-
-1. Access/management of RapidIO memory regions is not supported
-
-2. Multiple host enumeration is not supported
-
-RapidIO driver interface
-========================
-
-Drivers are provided a set of calls in order to interface with the
-subsystem to gather info on devices, request/map memory region
-resources, and manage mailboxes/doorbells.
-
-Functions
----------
-
-.. kernel-doc:: include/linux/rio_drv.h
-   :internal:
-
-.. kernel-doc:: drivers/rapidio/rio-driver.c
-   :export:
-
-.. kernel-doc:: drivers/rapidio/rio.c
-   :export:
-
-Internals
-=========
-
-This chapter contains the autogenerated documentation of the RapidIO
-subsystem.
-
-Structures
-----------
-
-.. kernel-doc:: include/linux/rio.h
-   :internal:
-
-Enumeration and Discovery
--------------------------
-
-.. kernel-doc:: drivers/rapidio/rio-scan.c
-   :internal:
-
-Driver functionality
---------------------
-
-.. kernel-doc:: drivers/rapidio/rio.c
-   :internal:
-
-.. kernel-doc:: drivers/rapidio/rio-access.c
-   :internal:
-
-Device model support
---------------------
-
-.. kernel-doc:: drivers/rapidio/rio-driver.c
-   :internal:
-
-PPC32 support
--------------
-
-.. kernel-doc:: arch/powerpc/sysdev/fsl_rio.c
-   :internal:
-
-Credits
-=======
-
-The following people have contributed to the RapidIO subsystem directly
-or indirectly:
-
-1. Matt Porter\ mporter@kernel.crashing.org
-
-2. Randy Vinson\ rvinson@mvista.com
-
-3. Dan Malek\ dan@embeddedalley.com
-
-The following people have contributed to this document:
-
-1. Matt Porter\ mporter@kernel.crashing.org
diff --git a/Documentation/driver-api/rapidio/index.rst b/Documentation/driver-api/rapidio/index.rst
new file mode 100644
index 000000000000..4c5e51a05134
--- /dev/null
+++ b/Documentation/driver-api/rapidio/index.rst
@@ -0,0 +1,13 @@
+===========================
+The Linux RapidIO Subsystem
+===========================
+
+.. toctree::
+   :maxdepth: 1
+
+   rapidio
+   sysfs
+
+   tsi721
+   mport_cdev
+   rio_cm
diff --git a/Documentation/driver-api/rapidio/mport_cdev.rst b/Documentation/driver-api/rapidio/mport_cdev.rst
new file mode 100644
index 000000000000..df77a7f7be7d
--- /dev/null
+++ b/Documentation/driver-api/rapidio/mport_cdev.rst
@@ -0,0 +1,110 @@
+==================================================================
+RapidIO subsystem mport character device driver (rio_mport_cdev.c)
+==================================================================
+
+1. Overview
+===========
+
+This device driver is the result of collaboration within the RapidIO.org
+Software Task Group (STG) between Texas Instruments, Freescale,
+Prodrive Technologies, Nokia Networks, BAE and IDT.  Additional input was
+received from other members of RapidIO.org. The objective was to create a
+character mode driver interface which exposes the capabilities of RapidIO
+devices directly to applications, in a manner that allows the numerous and
+varied RapidIO implementations to interoperate.
+
+This driver (MPORT_CDEV) provides access to basic RapidIO subsystem operations
+for user-space applications. Most RapidIO operations are supported through
+'ioctl' system calls.
+
+When loaded, this device driver creates filesystem nodes named rio_mportX in the
+/dev directory for each registered RapidIO mport device. The 'X' in the node name
+matches the unique port ID assigned to each local mport device.
+
+Using the available set of ioctl commands, user-space applications can perform
+the following RapidIO bus and subsystem operations:
+
+- Reads and writes from/to configuration registers of mport devices
+  (RIO_MPORT_MAINT_READ_LOCAL/RIO_MPORT_MAINT_WRITE_LOCAL)
+- Reads and writes from/to configuration registers of remote RapidIO devices.
+  These operations are defined as RapidIO Maintenance reads/writes in RIO spec.
+  (RIO_MPORT_MAINT_READ_REMOTE/RIO_MPORT_MAINT_WRITE_REMOTE)
+- Set RapidIO Destination ID for mport devices (RIO_MPORT_MAINT_HDID_SET)
+- Set RapidIO Component Tag for mport devices (RIO_MPORT_MAINT_COMPTAG_SET)
+- Query logical index of mport devices (RIO_MPORT_MAINT_PORT_IDX_GET)
+- Query capabilities and RapidIO link configuration of mport devices
+  (RIO_MPORT_GET_PROPERTIES)
+- Enable/Disable reporting of RapidIO doorbell events to user-space applications
+  (RIO_ENABLE_DOORBELL_RANGE/RIO_DISABLE_DOORBELL_RANGE)
+- Enable/Disable reporting of RIO port-write events to user-space applications
+  (RIO_ENABLE_PORTWRITE_RANGE/RIO_DISABLE_PORTWRITE_RANGE)
+- Query/Control type of events reported through this driver: doorbells,
+  port-writes or both (RIO_SET_EVENT_MASK/RIO_GET_EVENT_MASK)
+- Configure/Map mport's outbound requests window(s) for specific size,
+  RapidIO destination ID, hopcount and request type
+  (RIO_MAP_OUTBOUND/RIO_UNMAP_OUTBOUND)
+- Configure/Map mport's inbound requests window(s) for specific size,
+  RapidIO base address and local memory base address
+  (RIO_MAP_INBOUND/RIO_UNMAP_INBOUND)
+- Allocate/Free contiguous DMA coherent memory buffer for DMA data transfers
+  to/from remote RapidIO devices (RIO_ALLOC_DMA/RIO_FREE_DMA)
+- Initiate DMA data transfers to/from remote RapidIO devices (RIO_TRANSFER).
+  Supports blocking, asynchronous and posted (a.k.a 'fire-and-forget') data
+  transfer modes.
+- Check/Wait for completion of asynchronous DMA data transfer
+  (RIO_WAIT_FOR_ASYNC)
+- Manage device objects supported by RapidIO subsystem (RIO_DEV_ADD/RIO_DEV_DEL).
+  This allows implementation of various RapidIO fabric enumeration algorithms
+  as user-space applications while using remaining functionality provided by
+  kernel RapidIO subsystem.
+
+2. Hardware Compatibility
+=========================
+
+This device driver uses standard interfaces defined by kernel RapidIO subsystem
+and therefore it can be used with any mport device driver registered by RapidIO
+subsystem with limitations set by available mport implementation.
+
+At this moment the most common limitation is availability of RapidIO-specific
+DMA engine framework for specific mport device. Users should verify available
+functionality of their platform when planning to use this driver:
+
+- IDT Tsi721 PCIe-to-RapidIO bridge device and its mport device driver are fully
+  compatible with this driver.
+- Freescale SoCs 'fsl_rio' mport driver does not implement RapidIO-specific
+  DMA engine support and therefore DMA data transfers through the mport_cdev
+  driver are not available.
+
+3. Module parameters
+====================
+
+- 'dma_timeout'
+      - DMA transfer completion timeout (in msec, default value 3000).
+        This parameter sets the maximum completion wait time for SYNC mode DMA
+        transfer requests and for RIO_WAIT_FOR_ASYNC ioctl requests.
+
+- 'dbg_level'
+      - This parameter controls the amount of debug information
+        generated by this device driver. This parameter is formed by a set of
+        bit masks that correspond to the specific functional blocks.
+        For mask definitions see 'drivers/rapidio/devices/rio_mport_cdev.c'.
+        This parameter can be changed dynamically.
+        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
+
+4. Known problems
+=================
+
+  None.
+
+5. User-space Applications and API
+==================================
+
+API library and applications that use this device driver are available from
+RapidIO.org.
+
+6. TODO List
+============
+
+- Add support for sending/receiving "raw" RapidIO messaging packets.
+- Add memory mapped DMA data transfers as an option when RapidIO-specific DMA
+  is not available.
diff --git a/Documentation/driver-api/rapidio/rapidio.rst b/Documentation/driver-api/rapidio/rapidio.rst
new file mode 100644
index 000000000000..fb8942d3ba85
--- /dev/null
+++ b/Documentation/driver-api/rapidio/rapidio.rst
@@ -0,0 +1,362 @@
+============
+Introduction
+============
+
+The RapidIO standard is a packet-based fabric interconnect standard designed for
+use in embedded systems. Development of the RapidIO standard is directed by the
+RapidIO Trade Association (RTA). The current version of the RapidIO specification
+is publicly available for download from the RTA web-site [1].
+
+This document describes the basics of the Linux RapidIO subsystem and provides
+information on its major components.
+
+1. Overview
+===========
+
+Because the RapidIO subsystem follows the Linux device model it is integrated
+into the kernel similarly to other buses by defining RapidIO-specific device and
+bus types and registering them within the device model.
+
+The Linux RapidIO subsystem is architecture independent and therefore defines
+architecture-specific interfaces that provide support for common RapidIO
+subsystem operations.
+
+2. Core Components
+==================
+
+A typical RapidIO network is a combination of endpoints and switches.
+Each of these components is represented in the subsystem by an associated data
+structure. The core logical components of the RapidIO subsystem are defined
+in include/linux/rio.h file.
+
+2.1 Master Port
+---------------
+
+A master port (or mport) is a RapidIO interface controller that is local to the
+processor executing the Linux code. A master port generates and receives RapidIO
+packets (transactions). In the RapidIO subsystem each master port is represented
+by a rio_mport data structure. This structure contains master port specific
+resources such as mailboxes and doorbells. The rio_mport also includes a unique
+host device ID that is valid when a master port is configured as an enumerating
+host.
+
+RapidIO master ports are serviced by subsystem specific mport device drivers
+that provide functionality defined for this subsystem. To provide a hardware
+independent interface for RapidIO subsystem operations, rio_mport structure
+includes rio_ops data structure which contains pointers to hardware specific
+implementations of RapidIO functions.
+
+2.2 Device
+----------
+
+A RapidIO device is any endpoint (other than mport) or switch in the network.
+All devices are presented in the RapidIO subsystem by corresponding rio_dev data
+structure. Devices form one global device list and per-network device lists
+(depending on number of available mports and networks).
+
+2.3 Switch
+----------
+
+A RapidIO switch is a special class of device that routes packets between its
+ports towards their final destination. The packet destination port within a
+switch is defined by an internal routing table. A switch is presented in the
+RapidIO subsystem by rio_dev data structure expanded by additional rio_switch
+data structure, which contains switch specific information such as copy of the
+routing table and pointers to switch specific functions.
+
+The RapidIO subsystem defines the format and initialization method for subsystem
+specific switch drivers that are designed to provide hardware-specific
+implementation of common switch management routines.
+
+2.4 Network
+-----------
+
+A RapidIO network is a combination of interconnected endpoint and switch devices.
+Each RapidIO network known to the system is represented by corresponding rio_net
+data structure. This structure includes lists of all devices and local master
+ports that form the same network. It also contains a pointer to the default
+master port that is used to communicate with devices within the network.
+
+2.5 Device Drivers
+------------------
+
+RapidIO device-specific drivers follow Linux Kernel Driver Model and are
+intended to support specific RapidIO devices attached to the RapidIO network.
+
+2.6 Subsystem Interfaces
+------------------------
+
+RapidIO interconnect specification defines features that may be used to provide
+one or more common service layers for all participating RapidIO devices. These
+common services may act separately from device-specific drivers or be used by
+device-specific drivers. An example of such a service provider is the RIONET
+driver, which implements an Ethernet-over-RapidIO interface. Because only one
+driver can be registered for a device, all common RapidIO services have to be
+registered as subsystem interfaces. This allows multiple common services to
+attach to the same device without blocking attachment of a device-specific driver.
+
+3. Subsystem Initialization
+===========================
+
+In order to initialize the RapidIO subsystem, a platform must initialize and
+register at least one master port within the RapidIO network. To register an
+mport with the subsystem, the controller driver's initialization code calls
+rio_register_mport() for each available master port.
+
+After all active master ports are registered with the RapidIO subsystem,
+an enumeration and/or discovery routine may be called automatically or
+by user-space command.
+
+RapidIO subsystem can be configured to be built as a statically linked or
+modular component of the kernel (see details below).
+
+4. Enumeration and Discovery
+============================
+
+4.1 Overview
+------------
+
+RapidIO subsystem configuration options allow users to build enumeration and
+discovery methods as statically linked components or loadable modules.
+An enumeration/discovery method implementation and available input parameters
+define how any given method can be attached to available RapidIO mports:
+simply to all available mports OR individually to the specified mport device.
+
+Depending on selected enumeration/discovery build configuration, there are
+several methods to initiate an enumeration and/or discovery process:
+
+  (a) Statically linked enumeration and discovery process can be started
+  automatically during kernel initialization time using corresponding module
+  parameters. This was the original method used since introduction of RapidIO
+  subsystem. Now this method relies on enumerator module parameter which is
+  'rio-scan.scan' for existing basic enumeration/discovery method.
+  When automatic start of enumeration/discovery is used a user has to ensure
+  that all discovering endpoints are started before the enumerating endpoint
+  and are waiting for enumeration to be completed.
+  Configuration option CONFIG_RAPIDIO_DISC_TIMEOUT defines time that discovering
+  endpoint waits for enumeration to be completed. If the specified timeout
+  expires the discovery process is terminated without obtaining RapidIO network
+  information. NOTE: a timed out discovery process may be restarted later using
+  a user-space command as it is described below (if the given endpoint was
+  enumerated successfully).
+
+  (b) Statically linked enumeration and discovery process can be started by
+  a command from user space. This initiation method provides more flexibility
+  for a system startup compared to the option (a) above. After all participating
+  endpoints have been successfully booted, an enumeration process shall be
+  started first by issuing a user-space command; after the enumeration is
+  completed, a discovery process can be started on all remaining endpoints.
+
+  (c) Modular enumeration and discovery process can be started by a command from
+  user space. After an enumeration/discovery module is loaded, a network scan
+  process can be started by issuing a user-space command.
+  Similar to the option (b) above, an enumerator has to be started first.
+
+  (d) Modular enumeration and discovery process can be started by a module
+  initialization routine. In this case an enumerating module shall be loaded
+  first.
+
+When a network scan process is started it calls an enumeration or discovery
+routine depending on the configured role of a master port: host or agent.
+
+Enumeration is performed by a master port if it is configured as a host port by
+assigning a host destination ID greater than or equal to zero. The host
+destination ID can be assigned to a master port using various methods depending
+on RapidIO subsystem build configuration:
+
+  (a) For a statically linked RapidIO subsystem core use command line parameter
+  "rapidio.hdid=" with a list of destination ID assignments in order of mport
+  device registration. For example, in a system with two RapidIO controllers
+  the command line parameter "rapidio.hdid=-1,7" will result in assignment of
+  the host destination ID=7 to the second RapidIO controller, while the first
+  one will be assigned destination ID=-1.
+
+  (b) If the RapidIO subsystem core is built as a loadable module, in addition
+  to the method shown above, the host destination ID(s) can be specified using
+  traditional methods of passing module parameter "hdid=" during its loading:
+
+  - from command line: "modprobe rapidio hdid=-1,7", or
+  - from modprobe configuration file using configuration command "options",
+    like in this example: "options rapidio hdid=-1,7". An example of modprobe
+    configuration file is provided in the section below.
+
+NOTES:
+  (i) if the "hdid=" parameter is omitted, all available mports will be assigned
+  destination ID = -1;
+
+  (ii) the "hdid=" parameter in systems with multiple mports can have
+  destination ID assignments omitted from the end of list (default = -1).
+
+If the host device ID for a specific master port is set to -1, the discovery
+process will be performed for it.
+
+The enumeration and discovery routines use RapidIO maintenance transactions
+to access the configuration space of devices.
+
+NOTE: If RapidIO switch-specific device drivers are built as loadable modules
+they must be loaded before the enumeration/discovery process starts.
+This requirement is caused by the fact that enumeration/discovery methods invoke
+vendor-specific callbacks at early stages.
+
+4.2 Automatic Start of Enumeration and Discovery
+------------------------------------------------
+
+Automatic enumeration/discovery start method is applicable only to built-in
+enumeration/discovery RapidIO configuration selection. To enable automatic
+enumeration/discovery start by the existing basic enumerator method, use the
+boot command line parameter "rio-scan.scan=1".
+
+This configuration requires synchronized start of all RapidIO endpoints that
+form a network which will be enumerated/discovered. Discovering endpoints have
+to be started before an enumeration starts to ensure that all RapidIO
+controllers have been initialized and are ready to be discovered. Configuration
+parameter CONFIG_RAPIDIO_DISC_TIMEOUT defines time (in seconds) which
+a discovering endpoint will wait for enumeration to be completed.
+
+When automatic enumeration/discovery start is selected, basic method's
+initialization routine calls rio_init_mports() to perform enumeration or
+discovery for all known mport devices.
+
+Depending on RapidIO network size and configuration this automatic
+enumeration/discovery start method may be difficult to use due to the
+requirement for synchronized start of all endpoints.
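+
+As an illustration only, the boot command lines for this method might look
+like the following when both the RapidIO core and the basic enumerator are
+statically linked and each node has a single mport. The host destination ID
+value 1 is an arbitrary choice for this sketch, and the '#' lines are
+annotations rather than part of the command line::
+
+  # Enumerating host: assign a host destination ID and start the scan
+  rapidio.hdid=1 rio-scan.scan=1
+
+  # Discovering endpoints: "rapidio.hdid=" is omitted, so it defaults to -1
+  rio-scan.scan=1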
+
+4.3 User-space Start of Enumeration and Discovery
+-------------------------------------------------
+
+User-space start of enumeration and discovery can be used with built-in and
+modular build configurations. For user-space controlled start RapidIO subsystem
+creates the sysfs write-only attribute file '/sys/bus/rapidio/scan'. To initiate
+an enumeration or discovery process on specific mport device, a user needs to
+write mport_ID (not RapidIO destination ID) into that file. The mport_ID is a
+sequential number (0 ... RIO_MAX_MPORTS) assigned during mport device
+registration. For example, on a machine with a single RapidIO controller, the
+mport_ID for that controller will always be 0.
+
+To initiate RapidIO enumeration/discovery on all available mports a user may
+write '-1' (or RIO_MPORT_ANY) into the scan attribute file.
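+
+As a minimal sketch of the above, assuming a root shell on a system with at
+least one registered mport, a scan could be triggered as follows::
+
+  # Start enumeration/discovery on the mport with mport_ID 0
+  echo 0 > /sys/bus/rapidio/scan
+
+  # Start enumeration/discovery on all available mports
+  echo -1 > /sys/bus/rapidio/scan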
+
+4.4 Basic Enumeration Method
+----------------------------
+
+This is the original enumeration/discovery method, which has been available
+since the first release of RapidIO subsystem code. The enumeration process is
+implemented according to the enumeration algorithm outlined in the RapidIO
+Interconnect Specification: Annex I [1].
+
+This method can be configured as a statically linked or loadable module.
+The method's single parameter "scan" allows the enumeration/discovery process
+to be triggered from the module initialization routine.
+
+This enumeration/discovery method can be started only once and does not support
+unloading if it is built as a module.
+
+The enumeration process traverses the network using a recursive depth-first
+algorithm. When a new device is found, the enumerator takes ownership of that
+device by writing into the Host Device ID Lock CSR. It does this to ensure that
+the enumerator has exclusive right to enumerate the device. If device ownership
+is successfully acquired, the enumerator allocates a new rio_dev structure and
+initializes it according to device capabilities.
+
+If the device is an endpoint, a unique device ID is assigned to it and its value
+is written into the device's Base Device ID CSR.
+
+If the device is a switch, the enumerator allocates an additional rio_switch
+structure to store switch specific information. Then the switch's vendor ID and
+device ID are queried against a table of known RapidIO switches. Each switch
+table entry contains a pointer to a switch-specific initialization routine that
+initializes pointers to the rest of switch specific operations, and performs
+hardware initialization if necessary. A RapidIO switch does not have a unique
+device ID; it relies on hopcount and routing for device ID of an attached
+endpoint if access to its configuration registers is required. If a switch (or
+chain of switches) does not have any endpoint (except enumerator) attached to
+it, a fake device ID will be assigned to configure a route to that switch.
+In the case of a chain of switches without endpoint, one fake device ID is used
+to configure a route through the entire chain and switches are differentiated by
+their hopcount value.
+
+For both endpoints and switches the enumerator writes a unique component tag
+into device's Component Tag CSR. That unique value is used by the error
+management notification mechanism to identify a device that is reporting an
+error management event.
+
+Enumeration beyond a switch is completed by iterating over each active egress
+port of that switch. For each active link, a route to a default device ID
+(0xFF for 8-bit systems and 0xFFFF for 16-bit systems) is temporarily written
+into the routing table. The algorithm recurs by calling itself with hopcount + 1
+and the default device ID in order to access the device on the active port.
+
+After the host has completed enumeration of the entire network it releases
+devices by clearing device ID locks (calls rio_clear_locks()). For each endpoint
+in the system, it sets the Discovered bit in the Port General Control CSR
+to indicate that enumeration is completed and agents are allowed to execute
+passive discovery of the network.
+
+The discovery process is performed by agents and is similar to the enumeration
+process that is described above. However, the discovery process is performed
+without changes to the existing routing because agents only gather information
+about RapidIO network structure and are building an internal map of discovered
+devices. This way each Linux-based component of the RapidIO subsystem has
+a complete view of the network. The discovery process can be performed
+simultaneously by several agents. After initializing its RapidIO master port
+each agent waits for enumeration completion by the host for the configured wait
+time period. If this wait time period expires before enumeration is completed,
+an agent skips RapidIO discovery and continues with remaining kernel
+initialization.
+
+4.5 Adding New Enumeration/Discovery Method
+-------------------------------------------
+
+RapidIO subsystem code organization allows addition of new enumeration/discovery
+methods as new configuration options without significant impact to the core
+RapidIO code.
+
+A new enumeration/discovery method has to be attached to one or more mport
+devices before an enumeration/discovery process can be started. Normally,
+method's module initialization routine calls rio_register_scan() to attach
+an enumerator to a specified mport device (or devices). The basic enumerator
+implementation demonstrates this process.
+
+4.6 Using Loadable RapidIO Switch Drivers
+-----------------------------------------
+
+In the case when RapidIO switch drivers are built as loadable modules a user
+must ensure that they are loaded before the enumeration/discovery starts.
+This process can be automated by specifying pre- or post- dependencies in the
+RapidIO-specific modprobe configuration file as shown in the example below.
+
+File /etc/modprobe.d/rapidio.conf::
+
+  # Configure RapidIO subsystem modules
+
+  # Set enumerator host destination ID (overrides kernel command line option)
+  options rapidio hdid=-1,2
+
+  # Load RapidIO switch drivers immediately after rapidio core module was loaded
+  softdep rapidio post: idt_gen2 idtcps tsi57x
+
+  # OR :
+
+  # Load RapidIO switch drivers just before rio-scan enumerator module is loaded
+  softdep rio-scan pre: idt_gen2 idtcps tsi57x
+
+  --------------------------
+
+NOTE:
+  In the example above, one of "softdep" commands must be removed or
+  commented out to keep required module loading sequence.
+
+5. References
+=============
+
+[1] RapidIO Trade Association. RapidIO Interconnect Specifications.
+    http://www.rapidio.org.
+
+[2] RapidIO TA. Technology Comparisons.
+    http://www.rapidio.org/education/technology_comparisons/
+
+[3] RapidIO support for Linux.
+    http://lwn.net/Articles/139118/
+
+[4] Matt Porter. RapidIO for Linux. Ottawa Linux Symposium, 2005
+    http://www.kernel.org/doc/ols/2005/ols2005v2-pages-43-56.pdf
diff --git a/Documentation/driver-api/rapidio/rio_cm.rst b/Documentation/driver-api/rapidio/rio_cm.rst
new file mode 100644
index 000000000000..5294430a7a74
--- /dev/null
+++ b/Documentation/driver-api/rapidio/rio_cm.rst
@@ -0,0 +1,135 @@
+==========================================================================
+RapidIO subsystem Channelized Messaging character device driver (rio_cm.c)
+==========================================================================
+
+
+1. Overview
+===========
+
+This device driver is the result of collaboration within the RapidIO.org
+Software Task Group (STG) between Texas Instruments, Prodrive Technologies,
+Nokia Networks, BAE and IDT.  Additional input was received from other members
+of RapidIO.org.
+
+The objective was to create a character mode driver interface which exposes
+messaging capabilities of RapidIO endpoint devices (mports) directly
+to applications, in a manner that allows the numerous and varied RapidIO
+implementations to interoperate.
+
+This driver (RIO_CM) provides to user-space applications shared access to
+RapidIO mailbox messaging resources.
+
+RapidIO specification (Part 2) defines that endpoint devices may have up to four
+messaging mailboxes in case of multi-packet message (up to 4KB) and
+up to 64 mailboxes if single-packet messages (up to 256 B) are used. In addition
+to protocol definition limitations, a particular hardware implementation can
+have reduced number of messaging mailboxes.  RapidIO aware applications must
+therefore share the messaging resources of a RapidIO endpoint.
+
+The main purpose of this device driver is to provide RapidIO mailbox messaging
+capability to a large number of user-space processes by introducing socket-like
+operations using a single messaging mailbox.  This allows applications to
+use the limited RapidIO messaging hardware resources efficiently.
+
+Most of the device driver's operations are supported through 'ioctl' system calls.
+
+When loaded, this device driver creates a single file system node named rio_cm
+in the /dev directory, common for all registered RapidIO mport devices.
+
+The following ioctl commands are available to user-space applications:
+
+- RIO_CM_MPORT_GET_LIST:
+    Returns to caller list of local mport devices that
+    support messaging operations (number of entries up to RIO_MAX_MPORTS).
+    Each list entry is combination of mport's index in the system and RapidIO
+    destination ID assigned to the port.
+- RIO_CM_EP_GET_LIST_SIZE:
+    Returns number of messaging capable remote endpoints
+    in a RapidIO network associated with the specified mport device.
+- RIO_CM_EP_GET_LIST:
+    Returns list of RapidIO destination IDs for messaging
+    capable remote endpoints (peers) available in a RapidIO network associated
+    with the specified mport device.
+- RIO_CM_CHAN_CREATE:
+    Creates RapidIO message exchange channel data structure
+    with channel ID assigned automatically or as requested by a caller.
+- RIO_CM_CHAN_BIND:
+    Binds the specified channel data structure to the specified
+    mport device.
+- RIO_CM_CHAN_LISTEN:
+    Enables listening for connection requests on the specified
+    channel.
+- RIO_CM_CHAN_ACCEPT:
+    Accepts a connection request from peer on the specified
+    channel. If wait timeout for this request is specified by a caller it is
+    a blocking call. If timeout is set to 0 this is a non-blocking call: the
+    ioctl handler checks for a pending connection request and, if one is not
+    available, exits with -EAGAIN error status immediately.
+- RIO_CM_CHAN_CONNECT:
+    Sends a connection request to a remote peer/channel.
+- RIO_CM_CHAN_SEND:
+    Sends a data message through the specified channel.
+    The handler for this request assumes that message buffer specified by
+    a caller includes the reserved space for a packet header required by
+    this driver.
+- RIO_CM_CHAN_RECEIVE:
+    Receives a data message through a connected channel.
+    If the channel does not have an incoming message ready to return this ioctl
+    handler will wait for new message until timeout specified by a caller
+    expires. If timeout value is set to 0, ioctl handler uses a default value
+    defined by MAX_SCHEDULE_TIMEOUT.
+- RIO_CM_CHAN_CLOSE:
+    Closes a specified channel and frees associated buffers.
+    If the specified channel is in the CONNECTED state, sends close notification
+    to the remote peer.
+
+The ioctl command codes and corresponding data structures intended for use by
+user-space applications are defined in 'include/uapi/linux/rio_cm_cdev.h'.
+
+2. Hardware Compatibility
+=========================
+
+This device driver uses standard interfaces defined by kernel RapidIO subsystem
+and therefore it can be used with any mport device driver registered by RapidIO
+subsystem with limitations set by available mport HW implementation of messaging
+mailboxes.
+
+3. Module parameters
+====================
+
+- 'dbg_level'
+      - This parameter controls the amount of debug information
+        generated by this device driver. This parameter is formed by a set of
+        bit masks that correspond to the specific functional blocks.
+        For mask definitions see 'drivers/rapidio/devices/rio_cm.c'.
+        This parameter can be changed dynamically.
+        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
+
+- 'cmbox'
+      - Number of RapidIO mailbox to use (default value is 1).
+        This parameter sets the messaging mailbox number that will be used
+        within the entire RapidIO network. It can be used when the default
+        mailbox is used by other device drivers or is not supported by some
+        nodes in the RapidIO network.
+
+- 'chstart'
+      - Start channel number for dynamic assignment. Default value is 256.
+        Excludes channel numbers below this parameter from dynamic
+        allocation to avoid conflicts with software components that use
+        reserved predefined channel numbers.
+
+4. Known problems
+=================
+
+  None.
+
+5. User-space Applications and API Library
+==========================================
+
+Messaging API library and applications that use this device driver are available
+from RapidIO.org.
+
+6. TODO List
+============
+
+- Add support for system notification messages (reserved channel 0).
diff --git a/Documentation/driver-api/rapidio/sysfs.rst b/Documentation/driver-api/rapidio/sysfs.rst
new file mode 100644
index 000000000000..540f72683496
--- /dev/null
+++ b/Documentation/driver-api/rapidio/sysfs.rst
@@ -0,0 +1,7 @@
+=============
+Sysfs entries
+=============
+
+The RapidIO sysfs files have moved to:
+Documentation/ABI/testing/sysfs-bus-rapidio and
+Documentation/ABI/testing/sysfs-class-rapidio
diff --git a/Documentation/driver-api/rapidio/tsi721.rst b/Documentation/driver-api/rapidio/tsi721.rst
new file mode 100644
index 000000000000..42aea438cd20
--- /dev/null
+++ b/Documentation/driver-api/rapidio/tsi721.rst
@@ -0,0 +1,112 @@
+=========================================================================
+RapidIO subsystem mport driver for IDT Tsi721 PCI Express-to-SRIO bridge.
+=========================================================================
+
+1. Overview
+===========
+
+This driver implements all currently defined RapidIO mport callback functions.
+It supports maintenance read and write operations, inbound and outbound RapidIO
+doorbells, inbound maintenance port-writes and RapidIO messaging.
+
+To generate SRIO maintenance transactions this driver uses one of Tsi721 DMA
+channels. This mechanism provides access to larger range of hop counts and
+destination IDs without need for changes in outbound window translation.
+
+RapidIO messaging support uses dedicated messaging channels for each mailbox.
+For inbound messages this driver uses destination ID matching to forward messages
+into the corresponding message queue. Messaging callbacks are implemented to be
+fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
+
+1. Module parameters:
+
+- 'dbg_level'
+      - This parameter controls the amount of debug information
+        generated by this device driver. This parameter is formed by a set of
+        bit masks that correspond to the specific functional blocks.
+        For mask definitions see 'drivers/rapidio/devices/tsi721.h'.
+        This parameter can be changed dynamically.
+        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
+
+- 'dma_desc_per_channel'
+      - This parameter defines number of hardware buffer
+        descriptors allocated for each registered Tsi721 DMA channel.
+        Its default value is 128.
+
+- 'dma_txqueue_sz'
+      - DMA transactions queue size. Defines number of pending
+        transaction requests that can be accepted by each DMA channel.
+        Default value is 16.
+
+- 'dma_sel'
+      - DMA channel selection mask. Bitmask that defines which hardware
+        DMA channels (0 ... 6) will be registered with DmaEngine core.
+        If bit is set to 1, the corresponding DMA channel will be registered.
+        DMA channels not selected by this mask will not be used by this device
+        driver. Default value is 0x7f (use all channels).
+
+- 'pcie_mrrs'
+      - override value for PCIe Maximum Read Request Size (MRRS).
+        This parameter gives an ability to override MRRS value set during PCIe
+        configuration process. Tsi721 supports read request sizes up to 4096B.
+        Value for this parameter must be set as defined by PCIe specification:
+        0 = 128B, 1 = 256B, 2 = 512B, 3 = 1024B, 4 = 2048B and 5 = 4096B.
+        Default value is '-1' (= keep platform setting).
+
+- 'mbox_sel'
+      - RIO messaging MBOX selection mask. This is a bitmask that defines
+        which messaging MBOXes are managed by this driver. Mask bits 0 - 3
+        correspond to MBOX0 - MBOX3. MBOX is under driver's control if the
+        corresponding bit is set to '1'. Default value is 0x0f (= all).
+
+2. Known problems
+=================
+
+  None.
+
+3. DMA Engine Support
+=====================
+
+Tsi721 mport driver supports DMA data transfers between local system memory and
+remote RapidIO devices. This functionality is implemented according to SLAVE
+mode API defined by common Linux kernel DMA Engine framework.
+
+Depending on system requirements RapidIO DMA operations can be included/excluded
+by setting CONFIG_RAPIDIO_DMA_ENGINE option. Tsi721 miniport driver uses seven
+out of eight available BDMA channels to support DMA data transfers.
+One BDMA channel is reserved for generation of maintenance read/write requests.
+
+If Tsi721 mport driver has been built with RAPIDIO_DMA_ENGINE support included,
+this driver will accept the following DMA-specific module parameter:
+
+  "dma_desc_per_channel"
+			 - defines number of hardware buffer descriptors used by
+                           each BDMA channel of Tsi721 (by default - 128).
+
+4. Version History
+
+  =====   ====================================================================
+  1.1.0   DMA operations re-worked to support data scatter/gather lists larger
+          than hardware buffer descriptors ring.
+  1.0.0   Initial driver release.
+  =====   ====================================================================
+
+5.  License
+===========
+
+  Copyright(c) 2011 Integrated Device Technology, Inc. All rights reserved.
+
+  This program is free software; you can redistribute it and/or modify it
+  under the terms of the GNU General Public License as published by the Free
+  Software Foundation; either version 2 of the License, or (at your option)
+  any later version.
+
+  This program is distributed in the hope that it will be useful, but WITHOUT
+  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+  more details.
+
+  You should have received a copy of the GNU General Public License along with
+  this program; if not, write to the Free Software Foundation, Inc.,
+  59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
diff --git a/Documentation/rapidio/index.rst b/Documentation/rapidio/index.rst
deleted file mode 100644
index ab7b5541b346..000000000000
--- a/Documentation/rapidio/index.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-:orphan:
-
-===========================
-The Linux RapidIO Subsystem
-===========================
-
-.. toctree::
-   :maxdepth: 1
-
-   rapidio
-   sysfs
-
-   tsi721
-   mport_cdev
-   rio_cm
diff --git a/Documentation/rapidio/mport_cdev.rst b/Documentation/rapidio/mport_cdev.rst
deleted file mode 100644
index df77a7f7be7d..000000000000
--- a/Documentation/rapidio/mport_cdev.rst
+++ /dev/null
@@ -1,110 +0,0 @@
-==================================================================
-RapidIO subsystem mport character device driver (rio_mport_cdev.c)
-==================================================================
-
-1. Overview
-===========
-
-This device driver is the result of collaboration within the RapidIO.org
-Software Task Group (STG) between Texas Instruments, Freescale,
-Prodrive Technologies, Nokia Networks, BAE and IDT.  Additional input was
-received from other members of RapidIO.org. The objective was to create a
-character mode driver interface which exposes the capabilities of RapidIO
-devices directly to applications, in a manner that allows the numerous and
-varied RapidIO implementations to interoperate.
-
-This driver (MPORT_CDEV) provides access to basic RapidIO subsystem operations
-for user-space applications. Most of RapidIO operations are supported through
-'ioctl' system calls.
-
-When loaded this device driver creates filesystem nodes named rio_mportX in /dev
-directory for each registered RapidIO mport device. 'X' in the node name matches
-to unique port ID assigned to each local mport device.
-
-Using available set of ioctl commands user-space applications can perform
-following RapidIO bus and subsystem operations:
-
-- Reads and writes from/to configuration registers of mport devices
-  (RIO_MPORT_MAINT_READ_LOCAL/RIO_MPORT_MAINT_WRITE_LOCAL)
-- Reads and writes from/to configuration registers of remote RapidIO devices.
-  This operations are defined as RapidIO Maintenance reads/writes in RIO spec.
-  (RIO_MPORT_MAINT_READ_REMOTE/RIO_MPORT_MAINT_WRITE_REMOTE)
-- Set RapidIO Destination ID for mport devices (RIO_MPORT_MAINT_HDID_SET)
-- Set RapidIO Component Tag for mport devices (RIO_MPORT_MAINT_COMPTAG_SET)
-- Query logical index of mport devices (RIO_MPORT_MAINT_PORT_IDX_GET)
-- Query capabilities and RapidIO link configuration of mport devices
-  (RIO_MPORT_GET_PROPERTIES)
-- Enable/Disable reporting of RapidIO doorbell events to user-space applications
-  (RIO_ENABLE_DOORBELL_RANGE/RIO_DISABLE_DOORBELL_RANGE)
-- Enable/Disable reporting of RIO port-write events to user-space applications
-  (RIO_ENABLE_PORTWRITE_RANGE/RIO_DISABLE_PORTWRITE_RANGE)
-- Query/Control type of events reported through this driver: doorbells,
-  port-writes or both (RIO_SET_EVENT_MASK/RIO_GET_EVENT_MASK)
-- Configure/Map mport's outbound requests window(s) for specific size,
-  RapidIO destination ID, hopcount and request type
-  (RIO_MAP_OUTBOUND/RIO_UNMAP_OUTBOUND)
-- Configure/Map mport's inbound requests window(s) for specific size,
-  RapidIO base address and local memory base address
-  (RIO_MAP_INBOUND/RIO_UNMAP_INBOUND)
-- Allocate/Free contiguous DMA coherent memory buffer for DMA data transfers
-  to/from remote RapidIO devices (RIO_ALLOC_DMA/RIO_FREE_DMA)
-- Initiate DMA data transfers to/from remote RapidIO devices (RIO_TRANSFER).
-  Supports blocking, asynchronous and posted (a.k.a 'fire-and-forget') data
-  transfer modes.
-- Check/Wait for completion of asynchronous DMA data transfer
-  (RIO_WAIT_FOR_ASYNC)
-- Manage device objects supported by RapidIO subsystem (RIO_DEV_ADD/RIO_DEV_DEL).
-  This allows implementation of various RapidIO fabric enumeration algorithms
-  as user-space applications while using remaining functionality provided by
-  kernel RapidIO subsystem.
-
-2. Hardware Compatibility
-=========================
-
-This device driver uses standard interfaces defined by kernel RapidIO subsystem
-and therefore it can be used with any mport device driver registered by RapidIO
-subsystem with limitations set by available mport implementation.
-
-At this moment the most common limitation is availability of RapidIO-specific
-DMA engine framework for specific mport device. Users should verify available
-functionality of their platform when planning to use this driver:
-
-- IDT Tsi721 PCIe-to-RapidIO bridge device and its mport device driver are fully
-  compatible with this driver.
-- Freescale SoCs 'fsl_rio' mport driver does not have implementation for RapidIO
-  specific DMA engine support and therefore DMA data transfers mport_cdev driver
-  are not available.
-
-3. Module parameters
-====================
-
-- 'dma_timeout'
-      - DMA transfer completion timeout (in msec, default value 3000).
-        This parameter set a maximum completion wait time for SYNC mode DMA
-        transfer requests and for RIO_WAIT_FOR_ASYNC ioctl requests.
-
-- 'dbg_level'
-      - This parameter allows to control amount of debug information
-        generated by this device driver. This parameter is formed by set of
-        bit masks that correspond to the specific functional blocks.
-        For mask definitions see 'drivers/rapidio/devices/rio_mport_cdev.c'
-        This parameter can be changed dynamically.
-        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
-
-4. Known problems
-=================
-
-  None.
-
-5. User-space Applications and API
-==================================
-
-API library and applications that use this device driver are available from
-RapidIO.org.
-
-6. TODO List
-============
-
-- Add support for sending/receiving "raw" RapidIO messaging packets.
-- Add memory mapped DMA data transfers as an option when RapidIO-specific DMA
-  is not available.
diff --git a/Documentation/rapidio/rapidio.rst b/Documentation/rapidio/rapidio.rst
deleted file mode 100644
index fb8942d3ba85..000000000000
--- a/Documentation/rapidio/rapidio.rst
+++ /dev/null
@@ -1,362 +0,0 @@
-============
-Introduction
-============
-
-The RapidIO standard is a packet-based fabric interconnect standard designed for
-use in embedded systems. Development of the RapidIO standard is directed by the
-RapidIO Trade Association (RTA). The current version of the RapidIO specification
-is publicly available for download from the RTA web-site [1].
-
-This document describes the basics of the Linux RapidIO subsystem and provides
-information on its major components.
-
-1 Overview
-==========
-
-Because the RapidIO subsystem follows the Linux device model it is integrated
-into the kernel similarly to other buses by defining RapidIO-specific device and
-bus types and registering them within the device model.
-
-The Linux RapidIO subsystem is architecture independent and therefore defines
-architecture-specific interfaces that provide support for common RapidIO
-subsystem operations.
-
-2. Core Components
-==================
-
-A typical RapidIO network is a combination of endpoints and switches.
-Each of these components is represented in the subsystem by an associated data
-structure. The core logical components of the RapidIO subsystem are defined
-in include/linux/rio.h file.
-
-2.1 Master Port
----------------
-
-A master port (or mport) is a RapidIO interface controller that is local to the
-processor executing the Linux code. A master port generates and receives RapidIO
-packets (transactions). In the RapidIO subsystem each master port is represented
-by a rio_mport data structure. This structure contains master port specific
-resources such as mailboxes and doorbells. The rio_mport also includes a unique
-host device ID that is valid when a master port is configured as an enumerating
-host.
-
-RapidIO master ports are serviced by subsystem specific mport device drivers
-that provide functionality defined for this subsystem. To provide a hardware
-independent interface for RapidIO subsystem operations, rio_mport structure
-includes rio_ops data structure which contains pointers to hardware specific
-implementations of RapidIO functions.
-
-2.2 Device
-----------
-
-A RapidIO device is any endpoint (other than mport) or switch in the network.
-All devices are presented in the RapidIO subsystem by corresponding rio_dev data
-structure. Devices form one global device list and per-network device lists
-(depending on number of available mports and networks).
-
-2.3 Switch
-----------
-
-A RapidIO switch is a special class of device that routes packets between its
-ports towards their final destination. The packet destination port within a
-switch is defined by an internal routing table. A switch is presented in the
-RapidIO subsystem by rio_dev data structure expanded by additional rio_switch
-data structure, which contains switch specific information such as copy of the
-routing table and pointers to switch specific functions.
-
-The RapidIO subsystem defines the format and initialization method for subsystem
-specific switch drivers that are designed to provide hardware-specific
-implementation of common switch management routines.
-
-2.4 Network
------------
-
-A RapidIO network is a combination of interconnected endpoint and switch devices.
-Each RapidIO network known to the system is represented by corresponding rio_net
-data structure. This structure includes lists of all devices and local master
-ports that form the same network. It also contains a pointer to the default
-master port that is used to communicate with devices within the network.
-
-2.5 Device Drivers
-------------------
-
-RapidIO device-specific drivers follow Linux Kernel Driver Model and are
-intended to support specific RapidIO devices attached to the RapidIO network.
-
-2.6 Subsystem Interfaces
-------------------------
-
-RapidIO interconnect specification defines features that may be used to provide
-one or more common service layers for all participating RapidIO devices. These
-common services may act separately from device-specific drivers or be used by
-device-specific drivers. Example of such service provider is the RIONET driver
-which implements Ethernet-over-RapidIO interface. Because only one driver can be
-registered for a device, all common RapidIO services have to be registered as
-subsystem interfaces. This allows to have multiple common services attached to
-the same device without blocking attachment of a device-specific driver.
-
-3. Subsystem Initialization
-===========================
-
-In order to initialize the RapidIO subsystem, a platform must initialize and
-register at least one master port within the RapidIO network. To register mport
-within the subsystem controller driver's initialization code calls function
-rio_register_mport() for each available master port.
-
-After all active master ports are registered with a RapidIO subsystem,
-an enumeration and/or discovery routine may be called automatically or
-by user-space command.
-
-RapidIO subsystem can be configured to be built as a statically linked or
-modular component of the kernel (see details below).
-
-4. Enumeration and Discovery
-============================
-
-4.1 Overview
-------------
-
-RapidIO subsystem configuration options allow users to build enumeration and
-discovery methods as statically linked components or loadable modules.
-An enumeration/discovery method implementation and available input parameters
-define how any given method can be attached to available RapidIO mports:
-simply to all available mports OR individually to the specified mport device.
-
-Depending on selected enumeration/discovery build configuration, there are
-several methods to initiate an enumeration and/or discovery process:
-
-  (a) Statically linked enumeration and discovery process can be started
-  automatically during kernel initialization time using corresponding module
-  parameters. This was the original method used since introduction of RapidIO
-  subsystem. Now this method relies on enumerator module parameter which is
-  'rio-scan.scan' for existing basic enumeration/discovery method.
-  When automatic start of enumeration/discovery is used a user has to ensure
-  that all discovering endpoints are started before the enumerating endpoint
-  and are waiting for enumeration to be completed.
-  Configuration option CONFIG_RAPIDIO_DISC_TIMEOUT defines time that discovering
-  endpoint waits for enumeration to be completed. If the specified timeout
-  expires the discovery process is terminated without obtaining RapidIO network
-  information. NOTE: a timed out discovery process may be restarted later using
-  a user-space command as it is described below (if the given endpoint was
-  enumerated successfully).
-
-  (b) Statically linked enumeration and discovery process can be started by
-  a command from user space. This initiation method provides more flexibility
-  for a system startup compared to the option (a) above. After all participating
-  endpoints have been successfully booted, an enumeration process shall be
-  started first by issuing a user-space command, after an enumeration is
-  completed a discovery process can be started on all remaining endpoints.
-
-  (c) Modular enumeration and discovery process can be started by a command from
-  user space. After an enumeration/discovery module is loaded, a network scan
-  process can be started by issuing a user-space command.
-  Similar to the option (b) above, an enumerator has to be started first.
-
-  (d) Modular enumeration and discovery process can be started by a module
-  initialization routine. In this case an enumerating module shall be loaded
-  first.
-
-When a network scan process is started it calls an enumeration or discovery
-routine depending on the configured role of a master port: host or agent.
-
-Enumeration is performed by a master port if it is configured as a host port by
-assigning a host destination ID greater than or equal to zero. The host
-destination ID can be assigned to a master port using various methods depending
-on RapidIO subsystem build configuration:
-
-  (a) For a statically linked RapidIO subsystem core use command line parameter
-  "rapidio.hdid=" with a list of destination ID assignments in order of mport
-  device registration. For example, in a system with two RapidIO controllers
-  the command line parameter "rapidio.hdid=-1,7" will result in assignment of
-  the host destination ID=7 to the second RapidIO controller, while the first
-  one will be assigned destination ID=-1.
-
-  (b) If the RapidIO subsystem core is built as a loadable module, in addition
-  to the method shown above, the host destination ID(s) can be specified using
-  traditional methods of passing module parameter "hdid=" during its loading:
-
-  - from command line: "modprobe rapidio hdid=-1,7", or
-  - from modprobe configuration file using configuration command "options",
-    like in this example: "options rapidio hdid=-1,7". An example of modprobe
-    configuration file is provided in the section below.
-
-NOTES:
-  (i) if "hdid=" parameter is omitted all available mport will be assigned
-  destination ID = -1;
-
-  (ii) the "hdid=" parameter in systems with multiple mports can have
-  destination ID assignments omitted from the end of list (default = -1).
-
-If the host device ID for a specific master port is set to -1, the discovery
-process will be performed for it.
-
-The enumeration and discovery routines use RapidIO maintenance transactions
-to access the configuration space of devices.
-
-NOTE: If RapidIO switch-specific device drivers are built as loadable modules
-they must be loaded before enumeration/discovery process starts.
-This requirement is cased by the fact that enumeration/discovery methods invoke
-vendor-specific callbacks on early stages.
-
-4.2 Automatic Start of Enumeration and Discovery
-------------------------------------------------
-
-Automatic enumeration/discovery start method is applicable only to built-in
-enumeration/discovery RapidIO configuration selection. To enable automatic
-enumeration/discovery start by existing basic enumerator method set use boot
-command line parameter "rio-scan.scan=1".
-
-This configuration requires synchronized start of all RapidIO endpoints that
-form a network which will be enumerated/discovered. Discovering endpoints have
-to be started before an enumeration starts to ensure that all RapidIO
-controllers have been initialized and are ready to be discovered. Configuration
-parameter CONFIG_RAPIDIO_DISC_TIMEOUT defines time (in seconds) which
-a discovering endpoint will wait for enumeration to be completed.
-
-When automatic enumeration/discovery start is selected, basic method's
-initialization routine calls rio_init_mports() to perform enumeration or
-discovery for all known mport devices.
-
-Depending on RapidIO network size and configuration this automatic
-enumeration/discovery start method may be difficult to use due to the
-requirement for synchronized start of all endpoints.
-
-4.3 User-space Start of Enumeration and Discovery
--------------------------------------------------
-
-User-space start of enumeration and discovery can be used with built-in and
-modular build configurations. For user-space controlled start RapidIO subsystem
-creates the sysfs write-only attribute file '/sys/bus/rapidio/scan'. To initiate
-an enumeration or discovery process on specific mport device, a user needs to
-write mport_ID (not RapidIO destination ID) into that file. The mport_ID is a
-sequential number (0 ... RIO_MAX_MPORTS) assigned during mport device
-registration. For example for machine with single RapidIO controller, mport_ID
-for that controller always will be 0.
-
-To initiate RapidIO enumeration/discovery on all available mports a user may
-write '-1' (or RIO_MPORT_ANY) into the scan attribute file.
-
-4.4 Basic Enumeration Method
-----------------------------
-
-This is an original enumeration/discovery method which is available since
-first release of RapidIO subsystem code. The enumeration process is
-implemented according to the enumeration algorithm outlined in the RapidIO
-Interconnect Specification: Annex I [1].
-
-This method can be configured as statically linked or loadable module.
-The method's single parameter "scan" allows to trigger the enumeration/discovery
-process from module initialization routine.
-
-This enumeration/discovery method can be started only once and does not support
-unloading if it is built as a module.
-
-The enumeration process traverses the network using a recursive depth-first
-algorithm. When a new device is found, the enumerator takes ownership of that
-device by writing into the Host Device ID Lock CSR. It does this to ensure that
-the enumerator has exclusive right to enumerate the device. If device ownership
-is successfully acquired, the enumerator allocates a new rio_dev structure and
-initializes it according to device capabilities.
-
-If the device is an endpoint, a unique device ID is assigned to it and its value
-is written into the device's Base Device ID CSR.
-
-If the device is a switch, the enumerator allocates an additional rio_switch
-structure to store switch specific information. Then the switch's vendor ID and
-device ID are queried against a table of known RapidIO switches. Each switch
-table entry contains a pointer to a switch-specific initialization routine that
-initializes pointers to the rest of switch specific operations, and performs
-hardware initialization if necessary. A RapidIO switch does not have a unique
-device ID; it relies on hopcount and routing for device ID of an attached
-endpoint if access to its configuration registers is required. If a switch (or
-chain of switches) does not have any endpoint (except enumerator) attached to
-it, a fake device ID will be assigned to configure a route to that switch.
-In the case of a chain of switches without endpoint, one fake device ID is used
-to configure a route through the entire chain and switches are differentiated by
-their hopcount value.
-
-For both endpoints and switches the enumerator writes a unique component tag
-into device's Component Tag CSR. That unique value is used by the error
-management notification mechanism to identify a device that is reporting an
-error management event.
-
-Enumeration beyond a switch is completed by iterating over each active egress
-port of that switch. For each active link, a route to a default device ID
-(0xFF for 8-bit systems and 0xFFFF for 16-bit systems) is temporarily written
-into the routing table. The algorithm recurs by calling itself with hopcount + 1
-and the default device ID in order to access the device on the active port.
-
-After the host has completed enumeration of the entire network it releases
-devices by clearing device ID locks (calls rio_clear_locks()). For each endpoint
-in the system, it sets the Discovered bit in the Port General Control CSR
-to indicate that enumeration is completed and agents are allowed to execute
-passive discovery of the network.
-
-The discovery process is performed by agents and is similar to the enumeration
-process that is described above. However, the discovery process is performed
-without changes to the existing routing because agents only gather information
-about RapidIO network structure and are building an internal map of discovered
-devices. This way each Linux-based component of the RapidIO subsystem has
-a complete view of the network. The discovery process can be performed
-simultaneously by several agents. After initializing its RapidIO master port
-each agent waits for enumeration completion by the host for the configured wait
-time period. If this wait time period expires before enumeration is completed,
-an agent skips RapidIO discovery and continues with remaining kernel
-initialization.
-
-4.5 Adding New Enumeration/Discovery Method
--------------------------------------------
-
-RapidIO subsystem code organization allows addition of new enumeration/discovery
-methods as new configuration options without significant impact to the core
-RapidIO code.
-
-A new enumeration/discovery method has to be attached to one or more mport
-devices before an enumeration/discovery process can be started. Normally,
-method's module initialization routine calls rio_register_scan() to attach
-an enumerator to a specified mport device (or devices). The basic enumerator
-implementation demonstrates this process.
-
-4.6 Using Loadable RapidIO Switch Drivers
------------------------------------------
-
-In the case when RapidIO switch drivers are built as loadable modules a user
-must ensure that they are loaded before the enumeration/discovery starts.
-This process can be automated by specifying pre- or post- dependencies in the
-RapidIO-specific modprobe configuration file as shown in the example below.
-
-File /etc/modprobe.d/rapidio.conf::
-
-  # Configure RapidIO subsystem modules
-
-  # Set enumerator host destination ID (overrides kernel command line option)
-  options rapidio hdid=-1,2
-
-  # Load RapidIO switch drivers immediately after rapidio core module was loaded
-  softdep rapidio post: idt_gen2 idtcps tsi57x
-
-  # OR :
-
-  # Load RapidIO switch drivers just before rio-scan enumerator module is loaded
-  softdep rio-scan pre: idt_gen2 idtcps tsi57x
-
-  --------------------------
-
-NOTE:
-  In the example above, one of "softdep" commands must be removed or
-  commented out to keep required module loading sequence.
-
-5. References
-=============
-
-[1] RapidIO Trade Association. RapidIO Interconnect Specifications.
-    http://www.rapidio.org.
-
-[2] Rapidio TA. Technology Comparisons.
-    http://www.rapidio.org/education/technology_comparisons/
-
-[3] RapidIO support for Linux.
-    http://lwn.net/Articles/139118/
-
-[4] Matt Porter. RapidIO for Linux. Ottawa Linux Symposium, 2005
-    http://www.kernel.org/doc/ols/2005/ols2005v2-pages-43-56.pdf
diff --git a/Documentation/rapidio/rio_cm.rst b/Documentation/rapidio/rio_cm.rst
deleted file mode 100644
index 5294430a7a74..000000000000
--- a/Documentation/rapidio/rio_cm.rst
+++ /dev/null
@@ -1,135 +0,0 @@
-==========================================================================
-RapidIO subsystem Channelized Messaging character device driver (rio_cm.c)
-==========================================================================
-
-
-1. Overview
-===========
-
-This device driver is the result of collaboration within the RapidIO.org
-Software Task Group (STG) between Texas Instruments, Prodrive Technologies,
-Nokia Networks, BAE and IDT.  Additional input was received from other members
-of RapidIO.org.
-
-The objective was to create a character mode driver interface which exposes
-messaging capabilities of RapidIO endpoint devices (mports) directly
-to applications, in a manner that allows the numerous and varied RapidIO
-implementations to interoperate.
-
-This driver (RIO_CM) provides to user-space applications shared access to
-RapidIO mailbox messaging resources.
-
-RapidIO specification (Part 2) defines that endpoint devices may have up to four
-messaging mailboxes in case of multi-packet message (up to 4KB) and
-up to 64 mailboxes if single-packet messages (up to 256 B) are used. In addition
-to protocol definition limitations, a particular hardware implementation can
-have reduced number of messaging mailboxes.  RapidIO aware applications must
-therefore share the messaging resources of a RapidIO endpoint.
-
-Main purpose of this device driver is to provide RapidIO mailbox messaging
-capability to large number of user-space processes by introducing socket-like
-operations using a single messaging mailbox.  This allows applications to
-use the limited RapidIO messaging hardware resources efficiently.
-
-Most of device driver's operations are supported through 'ioctl' system calls.
-
-When loaded this device driver creates a single file system node named rio_cm
-in /dev directory common for all registered RapidIO mport devices.
-
-Following ioctl commands are available to user-space applications:
-
-- RIO_CM_MPORT_GET_LIST:
-    Returns to caller list of local mport devices that
-    support messaging operations (number of entries up to RIO_MAX_MPORTS).
-    Each list entry is combination of mport's index in the system and RapidIO
-    destination ID assigned to the port.
-- RIO_CM_EP_GET_LIST_SIZE:
-    Returns number of messaging capable remote endpoints
-    in a RapidIO network associated with the specified mport device.
-- RIO_CM_EP_GET_LIST:
-    Returns list of RapidIO destination IDs for messaging
-    capable remote endpoints (peers) available in a RapidIO network associated
-    with the specified mport device.
-- RIO_CM_CHAN_CREATE:
-    Creates RapidIO message exchange channel data structure
-    with channel ID assigned automatically or as requested by a caller.
-- RIO_CM_CHAN_BIND:
-    Binds the specified channel data structure to the specified
-    mport device.
-- RIO_CM_CHAN_LISTEN:
-    Enables listening for connection requests on the specified
-    channel.
-- RIO_CM_CHAN_ACCEPT:
-    Accepts a connection request from peer on the specified
-    channel. If wait timeout for this request is specified by a caller it is
-    a blocking call. If timeout set to 0 this is non-blocking call - ioctl
-    handler checks for a pending connection request and if one is not available
-    exits with -EGAIN error status immediately.
-- RIO_CM_CHAN_CONNECT:
-    Sends a connection request to a remote peer/channel.
-- RIO_CM_CHAN_SEND:
-    Sends a data message through the specified channel.
-    The handler for this request assumes that message buffer specified by
-    a caller includes the reserved space for a packet header required by
-    this driver.
-- RIO_CM_CHAN_RECEIVE:
-    Receives a data message through a connected channel.
-    If the channel does not have an incoming message ready to return this ioctl
-    handler will wait for new message until timeout specified by a caller
-    expires. If timeout value is set to 0, ioctl handler uses a default value
-    defined by MAX_SCHEDULE_TIMEOUT.
-- RIO_CM_CHAN_CLOSE:
-    Closes a specified channel and frees associated buffers.
-    If the specified channel is in the CONNECTED state, sends close notification
-    to the remote peer.
-
-The ioctl command codes and corresponding data structures intended for use by
-user-space applications are defined in 'include/uapi/linux/rio_cm_cdev.h'.
-
-2. Hardware Compatibility
-=========================
-
-This device driver uses standard interfaces defined by kernel RapidIO subsystem
-and therefore it can be used with any mport device driver registered by RapidIO
-subsystem with limitations set by available mport HW implementation of messaging
-mailboxes.
-
-3. Module parameters
-====================
-
-- 'dbg_level'
-      - This parameter allows to control amount of debug information
-        generated by this device driver. This parameter is formed by set of
-        bit masks that correspond to the specific functional block.
-        For mask definitions see 'drivers/rapidio/devices/rio_cm.c'
-        This parameter can be changed dynamically.
-        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
-
-- 'cmbox'
-      - Number of RapidIO mailbox to use (default value is 1).
-        This parameter allows to set messaging mailbox number that will be used
-        within entire RapidIO network. It can be used when default mailbox is
-        used by other device drivers or is not supported by some nodes in the
-        RapidIO network.
-
-- 'chstart'
-      - Start channel number for dynamic assignment. Default value - 256.
-        Allows to exclude channel numbers below this parameter from dynamic
-        allocation to avoid conflicts with software components that use
-        reserved predefined channel numbers.
-
-4. Known problems
-=================
-
-  None.
-
-5. User-space Applications and API Library
-==========================================
-
-Messaging API library and applications that use this device driver are available
-from RapidIO.org.
-
-6. TODO List
-============
-
-- Add support for system notification messages (reserved channel 0).
diff --git a/Documentation/rapidio/sysfs.rst b/Documentation/rapidio/sysfs.rst
deleted file mode 100644
index 540f72683496..000000000000
--- a/Documentation/rapidio/sysfs.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-=============
-Sysfs entries
-=============
-
-The RapidIO sysfs files have moved to:
-Documentation/ABI/testing/sysfs-bus-rapidio and
-Documentation/ABI/testing/sysfs-class-rapidio
diff --git a/Documentation/rapidio/tsi721.rst b/Documentation/rapidio/tsi721.rst
deleted file mode 100644
index 42aea438cd20..000000000000
--- a/Documentation/rapidio/tsi721.rst
+++ /dev/null
@@ -1,112 +0,0 @@
-=========================================================================
-RapidIO subsystem mport driver for IDT Tsi721 PCI Express-to-SRIO bridge.
-=========================================================================
-
-1. Overview
-===========
-
-This driver implements all currently defined RapidIO mport callback functions.
-It supports maintenance read and write operations, inbound and outbound RapidIO
-doorbells, inbound maintenance port-writes and RapidIO messaging.
-
-To generate SRIO maintenance transactions this driver uses one of Tsi721 DMA
-channels. This mechanism provides access to larger range of hop counts and
-destination IDs without need for changes in outbound window translation.
-
-RapidIO messaging support uses dedicated messaging channels for each mailbox.
-For inbound messages this driver uses destination ID matching to forward messages
-into the corresponding message queue. Messaging callbacks are implemented to be
-fully compatible with RIONET driver (Ethernet over RapidIO messaging services).
-
-1. Module parameters:
-
-- 'dbg_level'
-      - This parameter allows to control amount of debug information
-        generated by this device driver. This parameter is formed by set of
-        This parameter can be changed bit masks that correspond to the specific
-        functional block.
-        For mask definitions see 'drivers/rapidio/devices/tsi721.h'
-        This parameter can be changed dynamically.
-        Use CONFIG_RAPIDIO_DEBUG=y to enable debug output at the top level.
-
-- 'dma_desc_per_channel'
-      - This parameter defines number of hardware buffer
-        descriptors allocated for each registered Tsi721 DMA channel.
-        Its default value is 128.
-
-- 'dma_txqueue_sz'
-      - DMA transactions queue size. Defines number of pending
-        transaction requests that can be accepted by each DMA channel.
-        Default value is 16.
-
-- 'dma_sel'
-      - DMA channel selection mask. Bitmask that defines which hardware
-        DMA channels (0 ... 6) will be registered with DmaEngine core.
-        If bit is set to 1, the corresponding DMA channel will be registered.
-        DMA channels not selected by this mask will not be used by this device
-        driver. Default value is 0x7f (use all channels).
-
-- 'pcie_mrrs'
-      - override value for PCIe Maximum Read Request Size (MRRS).
-        This parameter gives an ability to override MRRS value set during PCIe
-        configuration process. Tsi721 supports read request sizes up to 4096B.
-        Value for this parameter must be set as defined by PCIe specification:
-        0 = 128B, 1 = 256B, 2 = 512B, 3 = 1024B, 4 = 2048B and 5 = 4096B.
-        Default value is '-1' (= keep platform setting).
-
-- 'mbox_sel'
-      - RIO messaging MBOX selection mask. This is a bitmask that defines
-        messaging MBOXes are managed by this device driver. Mask bits 0 - 3
-        correspond to MBOX0 - MBOX3. MBOX is under driver's control if the
-        corresponding bit is set to '1'. Default value is 0x0f (= all).
-
-2. Known problems
-=================
-
-  None.
-
-3. DMA Engine Support
-=====================
-
-Tsi721 mport driver supports DMA data transfers between local system memory and
-remote RapidIO devices. This functionality is implemented according to SLAVE
-mode API defined by common Linux kernel DMA Engine framework.
-
-Depending on system requirements RapidIO DMA operations can be included/excluded
-by setting CONFIG_RAPIDIO_DMA_ENGINE option. Tsi721 miniport driver uses seven
-out of eight available BDMA channels to support DMA data transfers.
-One BDMA channel is reserved for generation of maintenance read/write requests.
-
-If Tsi721 mport driver have been built with RAPIDIO_DMA_ENGINE support included,
-this driver will accept DMA-specific module parameter:
-
-  "dma_desc_per_channel"
-			 - defines number of hardware buffer descriptors used by
-                           each BDMA channel of Tsi721 (by default - 128).
-
-4. Version History
-
-  =====   ====================================================================
-  1.1.0   DMA operations re-worked to support data scatter/gather lists larger
-          than hardware buffer descriptors ring.
-  1.0.0   Initial driver release.
-  =====   ====================================================================
-
-5.  License
-===========
-
-  Copyright(c) 2011 Integrated Device Technology, Inc. All rights reserved.
-
-  This program is free software; you can redistribute it and/or modify it
-  under the terms of the GNU General Public License as published by the Free
-  Software Foundation; either version 2 of the License, or (at your option)
-  any later version.
-
-  This program is distributed in the hope that it will be useful, but WITHOUT
-  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-  more details.
-
-  You should have received a copy of the GNU General Public License along with
-  this program; if not, write to the Free Software Foundation, Inc.,
-  59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
diff --git a/drivers/rapidio/Kconfig b/drivers/rapidio/Kconfig
index 467e8fa06904..677d1aff61b7 100644
--- a/drivers/rapidio/Kconfig
+++ b/drivers/rapidio/Kconfig
@@ -86,7 +86,7 @@ config RAPIDIO_CHMAN
 	  This option includes RapidIO channelized messaging driver which
 	  provides socket-like interface to allow sharing of single RapidIO
 	  messaging mailbox between multiple user-space applications.
-	  See "Documentation/rapidio/rio_cm.rst" for driver description.
+	  See "Documentation/driver-api/rapidio/rio_cm.rst" for driver description.
 
 config RAPIDIO_MPORT_CDEV
 	tristate "RapidIO /dev mport device driver"
-- 
cgit v1.2.3-55-g7522


From 59809fe88224db24432ad50e62fd8d5f0df738a1 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:06:08 -0300
Subject: docs: perf: move to the admin-guide

The perf infrastructure is used by userspace to track issues, and
a good part of what is described here relates to it.

So, add it to the admin-guide.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/index.rst              |  1 +
 Documentation/admin-guide/perf/arm-ccn.rst       | 61 ++++++++++++++++++++++++
 Documentation/admin-guide/perf/arm_dsu_pmu.rst   | 29 +++++++++++
 Documentation/admin-guide/perf/hisi-pmu.rst      | 60 +++++++++++++++++++++++
 Documentation/admin-guide/perf/index.rst         | 14 ++++++
 Documentation/admin-guide/perf/qcom_l2_pmu.rst   | 39 +++++++++++++++
 Documentation/admin-guide/perf/qcom_l3_pmu.rst   | 26 ++++++++++
 Documentation/admin-guide/perf/thunderx2-pmu.rst | 42 ++++++++++++++++
 Documentation/admin-guide/perf/xgene-pmu.rst     | 49 +++++++++++++++++++
 Documentation/perf/arm-ccn.rst                   | 61 ------------------------
 Documentation/perf/arm_dsu_pmu.rst               | 29 -----------
 Documentation/perf/hisi-pmu.rst                  | 60 -----------------------
 Documentation/perf/index.rst                     | 16 -------
 Documentation/perf/qcom_l2_pmu.rst               | 39 ---------------
 Documentation/perf/qcom_l3_pmu.rst               | 26 ----------
 Documentation/perf/thunderx2-pmu.rst             | 42 ----------------
 Documentation/perf/xgene-pmu.rst                 | 49 -------------------
 MAINTAINERS                                      |  4 +-
 drivers/perf/qcom_l3_pmu.c                       |  2 +-
 19 files changed, 324 insertions(+), 325 deletions(-)
 create mode 100644 Documentation/admin-guide/perf/arm-ccn.rst
 create mode 100644 Documentation/admin-guide/perf/arm_dsu_pmu.rst
 create mode 100644 Documentation/admin-guide/perf/hisi-pmu.rst
 create mode 100644 Documentation/admin-guide/perf/index.rst
 create mode 100644 Documentation/admin-guide/perf/qcom_l2_pmu.rst
 create mode 100644 Documentation/admin-guide/perf/qcom_l3_pmu.rst
 create mode 100644 Documentation/admin-guide/perf/thunderx2-pmu.rst
 create mode 100644 Documentation/admin-guide/perf/xgene-pmu.rst
 delete mode 100644 Documentation/perf/arm-ccn.rst
 delete mode 100644 Documentation/perf/arm_dsu_pmu.rst
 delete mode 100644 Documentation/perf/hisi-pmu.rst
 delete mode 100644 Documentation/perf/index.rst
 delete mode 100644 Documentation/perf/qcom_l2_pmu.rst
 delete mode 100644 Documentation/perf/qcom_l3_pmu.rst
 delete mode 100644 Documentation/perf/thunderx2-pmu.rst
 delete mode 100644 Documentation/perf/xgene-pmu.rst

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 8853c95ef0d4..f40c4b5a181b 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -38,6 +38,7 @@ problems and bugs in particular.
    ramoops
    dynamic-debug-howto
    init
+   perf/index
 
 This is the beginning of a section with information of interest to
 application developers.  Documents covering various aspects of the kernel
diff --git a/Documentation/admin-guide/perf/arm-ccn.rst b/Documentation/admin-guide/perf/arm-ccn.rst
new file mode 100644
index 000000000000..832b0c64023a
--- /dev/null
+++ b/Documentation/admin-guide/perf/arm-ccn.rst
@@ -0,0 +1,61 @@
+==========================
+ARM Cache Coherent Network
+==========================
+
+CCN-504 is a ring-bus interconnect consisting of 11 crosspoints
+(XPs), with each crosspoint supporting up to two device ports,
+so nodes (devices) 0 and 1 are connected to crosspoint 0,
+nodes 2 and 3 to crosspoint 1 etc.
+
+PMU (perf) driver
+-----------------
+
+The CCN driver registers a perf PMU driver, which provides
+description of available events and configuration options
+in sysfs, see /sys/bus/event_source/devices/ccn*.
+
+The "format" directory describes format of the config, config1
+and config2 fields of the perf_event_attr structure. The "events"
+directory provides configuration templates for all documented
+events, that can be used with perf tool. For example "xp_valid_flit"
+is an equivalent of "type=0x8,event=0x4". Other parameters must be
+explicitly specified.
+
+For events originating from device, "node" defines its index.
+
+Crosspoint PMU events require "xp" (index), "bus" (bus number)
+and "vc" (virtual channel ID).
+
+Crosspoint watchpoint-based events (special "event" value 0xfe)
+require "xp" and "vc" as above plus "port" (device port index),
+"dir" (transmit/receive direction), comparator values ("cmp_l"
+and "cmp_h") and "mask", being index of the comparator mask.
+
+Masks are defined separately from the event description
+(due to limited number of the config values) in the "cmp_mask"
+directory, with first 8 configurable by user and additional
+4 hardcoded for the most frequent use cases.
+
+Cycle counter is described by a "type" value 0xff and does
+not require any other settings.
+
+The driver also provides a "cpumask" sysfs attribute, which contains
+a single CPU ID, of the processor which will be used to handle all
+the CCN PMU events. It is recommended that the user space tools
+request the events on this processor (if not, the perf_event->cpu value
+will be overwritten anyway). In case of this processor being offlined,
+the events are migrated to another one and the attribute is updated.
+
+Example of perf tool use::
+
+  / # perf list | grep ccn
+    ccn/cycles/                                        [Kernel PMU event]
+  <...>
+    ccn/xp_valid_flit,xp=?,port=?,vc=?,dir=?/          [Kernel PMU event]
+  <...>
+
+  / # perf stat -a -e ccn/cycles/,ccn/xp_valid_flit,xp=1,port=0,vc=1,dir=1/ \
+                                                                         sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/arm_dsu_pmu.rst b/Documentation/admin-guide/perf/arm_dsu_pmu.rst
new file mode 100644
index 000000000000..7fd34db75d13
--- /dev/null
+++ b/Documentation/admin-guide/perf/arm_dsu_pmu.rst
@@ -0,0 +1,29 @@
+==================================
+ARM DynamIQ Shared Unit (DSU) PMU
+==================================
+
+ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
+control logic and external interfaces to form a multicore cluster. The PMU
+allows counting the various events related to the L3 cache, Snoop Control Unit
+etc, using 32bit independent counters. It also provides a 64bit cycle counter.
+
+The PMU can only be accessed via CPU system registers and is common to the
+cores connected to the same DSU. Like most of the other uncore PMUs, DSU
+PMU doesn't support process specific events and cannot be used in sampling mode.
+
+The DSU provides a bitmap for a subset of implemented events via hardware
+registers. There is no way for the driver to determine if the other events
+are available or not. Hence the driver exposes only those events advertised
+by the DSU, in "events" directory under::
+
+  /sys/bus/event_source/devices/arm_dsu_<N>/
+
+The user should refer to the TRM of the product to figure out the supported events
+and use the raw event code for the unlisted events.
+
+The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".
+
+
+Example usage::
+
+	perf stat -a -e arm_dsu_0/cycles/
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
new file mode 100644
index 000000000000..404a5c3d9d00
--- /dev/null
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -0,0 +1,60 @@
+======================================================
+HiSilicon SoC uncore Performance Monitoring Unit (PMU)
+======================================================
+
+The HiSilicon SoC chip includes various independent system device PMUs
+such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
+independent and have hardware logic to gather statistics and performance
+information.
+
+The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
+(CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is
+called Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
+two HHAs (0 - 1) and four DDRCs (0 - 3), respectively.
+
+HiSilicon SoC uncore PMU driver
+-------------------------------
+
+Each device PMU has separate registers for event counting, control and
+interrupt, and the PMU driver shall register perf PMU drivers like L3C,
+HHA and DDRC etc. The available events and configuration options shall
+be described in the sysfs, see:
+
+/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
+/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
+The "perf list" command shall list the available events from sysfs.
+
+Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
+name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>,
+where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
+the module.
+
+e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in
+SCCL ID #3.
+
+e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
+SCCL ID #1.
+
+The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
+ID used to count the uncore PMU event.
+
+Example usage of perf::
+
+  $# perf list
+  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
+  ------------------------------------------
+
+  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
+  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
+
+The current driver does not support sampling, so "perf record" is unsupported.
+Attaching to a task is also unsupported, as the events are all uncore.
+
+Note: Please contact the maintainer for a complete list of the events supported
+by the PMU devices in the SoC, and for details about them if needed.
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
new file mode 100644
index 000000000000..9d445451ea18
--- /dev/null
+++ b/Documentation/admin-guide/perf/index.rst
@@ -0,0 +1,14 @@
+===========================
+Performance monitor support
+===========================
+
+.. toctree::
+   :maxdepth: 1
+
+   hisi-pmu
+   qcom_l2_pmu
+   qcom_l3_pmu
+   arm-ccn
+   xgene-pmu
+   arm_dsu_pmu
+   thunderx2-pmu
diff --git a/Documentation/admin-guide/perf/qcom_l2_pmu.rst b/Documentation/admin-guide/perf/qcom_l2_pmu.rst
new file mode 100644
index 000000000000..c130178a4a55
--- /dev/null
+++ b/Documentation/admin-guide/perf/qcom_l2_pmu.rst
@@ -0,0 +1,39 @@
+=====================================================================
+Qualcomm Technologies Level-2 Cache Performance Monitoring Unit (PMU)
+=====================================================================
+
+This driver supports the L2 cache clusters found in Qualcomm Technologies
+Centriq SoCs. There are multiple physical L2 cache clusters, each with its
+own PMU. Each cluster has one or more CPUs associated with it.
+
+There is one logical L2 PMU exposed, which aggregates the results from
+the physical PMUs.
+
+The driver provides a description of its available events and configuration
+options in sysfs, see /sys/devices/l2cache_0.
+
+The "format" directory describes the format of the events.
+
+Events can be envisioned as a 2-dimensional array. Each column represents
+a group of events. There are 8 groups. Only one entry from each
+group can be in use at a time. If multiple events from the same group
+are specified, the conflicting events cannot be counted at the same time.
+
+Events are specified as 0xCCG, where CC is 2 hex digits specifying
+the code (array row) and G specifies the group (column) 0-7.
+
+In addition there is a cycle counter event specified by the value 0xFE
+which is outside the above scheme.
+
+The driver provides a "cpumask" sysfs attribute which contains a mask
+consisting of one CPU per cluster which will be used to handle all the PMU
+events on that cluster.
+
+Examples for use with perf::
+
+  perf stat -e l2cache_0/config=0x001/,l2cache_0/config=0x042/ -a sleep 1
+
+  perf stat -e l2cache_0/config=0xfe/ -C 2 sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/qcom_l3_pmu.rst b/Documentation/admin-guide/perf/qcom_l3_pmu.rst
new file mode 100644
index 000000000000..a3d014a46bfd
--- /dev/null
+++ b/Documentation/admin-guide/perf/qcom_l3_pmu.rst
@@ -0,0 +1,26 @@
+===========================================================================
+Qualcomm Datacenter Technologies L3 Cache Performance Monitoring Unit (PMU)
+===========================================================================
+
+This driver supports the L3 cache PMUs found in Qualcomm Datacenter Technologies
+Centriq SoCs. The L3 cache on these SoCs is composed of multiple slices, shared
+by all cores within a socket. Each slice is exposed as a separate uncore perf
+PMU with device name l3cache_<socket>_<instance>. User space is responsible
+for aggregating across slices.
+
+The driver provides a description of its available events and configuration
+options in sysfs, see /sys/devices/l3cache*. Given that these are uncore PMUs
+the driver also exposes a "cpumask" sysfs attribute which contains a mask
+consisting of one CPU per socket which will be used to handle all the PMU
+events on that socket.
+
+The hardware implements 32bit event counters and has a flat 8bit event space
+exposed via the "event" format attribute. In addition to the 32bit physical
+counters the driver supports virtual 64bit hardware counters by using hardware
+counter chaining. This feature is exposed via the "lc" (long counter) format
+flag. E.g.::
+
+  perf stat -e l3cache_0_0/read-miss,lc/
+
+Given that these are uncore PMUs the driver does not support sampling, therefore
+"perf record" will not work. Per-task perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/thunderx2-pmu.rst b/Documentation/admin-guide/perf/thunderx2-pmu.rst
new file mode 100644
index 000000000000..08e33675853a
--- /dev/null
+++ b/Documentation/admin-guide/perf/thunderx2-pmu.rst
@@ -0,0 +1,42 @@
+=============================================================
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+=============================================================
+
+The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
+PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
+
+The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
+Events are counted for the default channel (i.e. channel 0) and prorated
+to the total number of channels/tiles.
+
+The DMC and L3C support up to 4 counters. Counters are independently
+programmable and can be started and stopped individually. Each counter
+can be set to a different event. Counters are 32-bit and do not support
+an overflow interrupt; they are read every 2 seconds.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
+L3C devices.  Each PMU can be used to count up to 4 events
+simultaneously. The PMUs provide a description of their available events
+and configuration options under sysfs, see
+/sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
+
+The driver does not support sampling, therefore "perf record" will not
+work. Per-task perf sessions are also not supported.
+
+Examples::
+
+  # perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
+
+  # perf stat -a -e \
+  uncore_dmc_0/cnt_cycles/,\
+  uncore_dmc_0/data_transfers/,\
+  uncore_dmc_0/read_txns/,\
+  uncore_dmc_0/write_txns/ sleep 1
+
+  # perf stat -a -e \
+  uncore_l3c_0/read_request/,\
+  uncore_l3c_0/read_hit/,\
+  uncore_l3c_0/inv_request/,\
+  uncore_l3c_0/inv_hit/ sleep 1
diff --git a/Documentation/admin-guide/perf/xgene-pmu.rst b/Documentation/admin-guide/perf/xgene-pmu.rst
new file mode 100644
index 000000000000..644f8ed89152
--- /dev/null
+++ b/Documentation/admin-guide/perf/xgene-pmu.rst
@@ -0,0 +1,49 @@
+================================================
+APM X-Gene SoC Performance Monitoring Unit (PMU)
+================================================
+
+X-Gene SoC PMU consists of various independent system device PMUs such as
+L3 cache(s), I/O bridge(s), memory controller bridge(s) and memory
+controller(s). These PMU devices are loosely architected to follow the
+same model as the PMU for ARM cores. The PMUs share the same top level
+interrupt and status CSR region.
+
+PMU (perf) driver
+-----------------
+
+The xgene-pmu driver registers several perf PMU drivers. Each of the perf
+drivers provides a description of its available events and configuration options
+in sysfs, see /sys/devices/<l3cX/iobX/mcbX/mcX>/.
+
+The "format" directory describes format of the config (event ID),
+config1 (agent ID) fields of the perf_event_attr structure. The "events"
+directory provides configuration templates for all supported event types that
+can be used with perf tool. For example, "l3c0/bank-fifo-full/" is an
+equivalent of "l3c0/config=0x0b/".
+
+Most of the SoC PMUs have a specific list of agent IDs used for monitoring
+performance of a specific datapath. For example, agents of an L3 cache can be
+a specific CPU or an I/O bridge. Each PMU has a set of 2 registers capable of
+masking the agents from which the requests come. If the bit with
+the bit number corresponding to the agent is set, the event is counted only if
+it is caused by a request from that agent. Each agent ID bit is inversely mapped
+to a corresponding bit in "config1" field. By default, the event will be
+counted for all agent requests (config1 = 0x0). For all the supported agents of
+each PMU, please refer to APM X-Gene User Manual.
+
+Each perf driver also provides a "cpumask" sysfs attribute, which contains a
+single CPU ID of the processor which will be used to handle all the PMU events.
+
+Example for perf tool use::
+
+ / # perf list | grep -e l3c -e iob -e mcb -e mc
+   l3c0/ackq-full/                                    [Kernel PMU event]
+ <...>
+   mcb1/mcb-csw-stall/                                [Kernel PMU event]
+
+ / # perf stat -a -e l3c0/read-miss/,mcb1/csw-write-request/ sleep 1
+
+ / # perf stat -a -e l3c0/read-miss,config1=0xfffffffffffffffe/ sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/perf/arm-ccn.rst b/Documentation/perf/arm-ccn.rst
deleted file mode 100644
index 832b0c64023a..000000000000
--- a/Documentation/perf/arm-ccn.rst
+++ /dev/null
@@ -1,61 +0,0 @@
-==========================
-ARM Cache Coherent Network
-==========================
-
-CCN-504 is a ring-bus interconnect consisting of 11 crosspoints
-(XPs), with each crosspoint supporting up to two device ports,
-so nodes (devices) 0 and 1 are connected to crosspoint 0,
-nodes 2 and 3 to crosspoint 1 etc.
-
-PMU (perf) driver
------------------
-
-The CCN driver registers a perf PMU driver, which provides
-description of available events and configuration options
-in sysfs, see /sys/bus/event_source/devices/ccn*.
-
-The "format" directory describes format of the config, config1
-and config2 fields of the perf_event_attr structure. The "events"
-directory provides configuration templates for all documented
-events, that can be used with perf tool. For example "xp_valid_flit"
-is an equivalent of "type=0x8,event=0x4". Other parameters must be
-explicitly specified.
-
-For events originating from device, "node" defines its index.
-
-Crosspoint PMU events require "xp" (index), "bus" (bus number)
-and "vc" (virtual channel ID).
-
-Crosspoint watchpoint-based events (special "event" value 0xfe)
-require "xp" and "vc" as as above plus "port" (device port index),
-"dir" (transmit/receive direction), comparator values ("cmp_l"
-and "cmp_h") and "mask", being index of the comparator mask.
-
-Masks are defined separately from the event description
-(due to limited number of the config values) in the "cmp_mask"
-directory, with first 8 configurable by user and additional
-4 hardcoded for the most frequent use cases.
-
-Cycle counter is described by a "type" value 0xff and does
-not require any other settings.
-
-The driver also provides a "cpumask" sysfs attribute, which contains
-a single CPU ID, of the processor which will be used to handle all
-the CCN PMU events. It is recommended that the user space tools
-request the events on this processor (if not, the perf_event->cpu value
-will be overwritten anyway). In case of this processor being offlined,
-the events are migrated to another one and the attribute is updated.
-
-Example of perf tool use::
-
-  / # perf list | grep ccn
-    ccn/cycles/                                        [Kernel PMU event]
-  <...>
-    ccn/xp_valid_flit,xp=?,port=?,vc=?,dir=?/          [Kernel PMU event]
-  <...>
-
-  / # perf stat -a -e ccn/cycles/,ccn/xp_valid_flit,xp=1,port=0,vc=1,dir=1/ \
-                                                                         sleep 1
-
-The driver does not support sampling, therefore "perf record" will
-not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/perf/arm_dsu_pmu.rst b/Documentation/perf/arm_dsu_pmu.rst
deleted file mode 100644
index 7fd34db75d13..000000000000
--- a/Documentation/perf/arm_dsu_pmu.rst
+++ /dev/null
@@ -1,29 +0,0 @@
-==================================
-ARM DynamIQ Shared Unit (DSU) PMU
-==================================
-
-ARM DynamIQ Shared Unit integrates one or more cores with an L3 memory system,
-control logic and external interfaces to form a multicore cluster. The PMU
-allows counting the various events related to the L3 cache, Snoop Control Unit
-etc, using 32bit independent counters. It also provides a 64bit cycle counter.
-
-The PMU can only be accessed via CPU system registers and are common to the
-cores connected to the same DSU. Like most of the other uncore PMUs, DSU
-PMU doesn't support process specific events and cannot be used in sampling mode.
-
-The DSU provides a bitmap for a subset of implemented events via hardware
-registers. There is no way for the driver to determine if the other events
-are available or not. Hence the driver exposes only those events advertised
-by the DSU, in "events" directory under::
-
-  /sys/bus/event_sources/devices/arm_dsu_<N>/
-
-The user should refer to the TRM of the product to figure out the supported events
-and use the raw event code for the unlisted events.
-
-The driver also exposes the CPUs connected to the DSU instance in "associated_cpus".
-
-
-e.g usage::
-
-	perf stat -a -e arm_dsu_0/cycles/
diff --git a/Documentation/perf/hisi-pmu.rst b/Documentation/perf/hisi-pmu.rst
deleted file mode 100644
index 404a5c3d9d00..000000000000
--- a/Documentation/perf/hisi-pmu.rst
+++ /dev/null
@@ -1,60 +0,0 @@
-======================================================
-HiSilicon SoC uncore Performance Monitoring Unit (PMU)
-======================================================
-
-The HiSilicon SoC chip includes various independent system device PMUs
-such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
-independent and have hardware logic to gather statistics and performance
-information.
-
-The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
-(CCL) is made up of 4 cpu cores sharing one L3 cache; each CPU die is
-called Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
-two HHAs (0 - 1) and four DDRCs (0 - 3), respectively.
-
-HiSilicon SoC uncore PMU driver
--------------------------------
-
-Each device PMU has separate registers for event counting, control and
-interrupt, and the PMU driver shall register perf PMU drivers like L3C,
-HHA and DDRC etc. The available events and configuration options shall
-be described in the sysfs, see:
-
-/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
-/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
-The "perf list" command shall list the available events from sysfs.
-
-Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
-name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>.
-where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
-module.
-
-e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in
-SCCL ID #3.
-
-e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
-SCCL ID #1.
-
-The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
-ID used to count the uncore PMU event.
-
-Example usage of perf::
-
-  $# perf list
-  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
-  ------------------------------------------
-  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
-  ------------------------------------------
-  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
-  ------------------------------------------
-  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
-  ------------------------------------------
-
-  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
-  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
-
-The current driver does not support sampling. So "perf record" is unsupported.
-Also attach to a task is unsupported as the events are all uncore.
-
-Note: Please contact the maintainer for a complete list of events supported for
-the PMU devices in the SoC and its information if needed.
diff --git a/Documentation/perf/index.rst b/Documentation/perf/index.rst
deleted file mode 100644
index 4bf848e27f26..000000000000
--- a/Documentation/perf/index.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-:orphan:
-
-===========================
-Performance monitor support
-===========================
-
-.. toctree::
-   :maxdepth: 1
-
-   hisi-pmu
-   qcom_l2_pmu
-   qcom_l3_pmu
-   arm-ccn
-   xgene-pmu
-   arm_dsu_pmu
-   thunderx2-pmu
diff --git a/Documentation/perf/qcom_l2_pmu.rst b/Documentation/perf/qcom_l2_pmu.rst
deleted file mode 100644
index c130178a4a55..000000000000
--- a/Documentation/perf/qcom_l2_pmu.rst
+++ /dev/null
@@ -1,39 +0,0 @@
-=====================================================================
-Qualcomm Technologies Level-2 Cache Performance Monitoring Unit (PMU)
-=====================================================================
-
-This driver supports the L2 cache clusters found in Qualcomm Technologies
-Centriq SoCs. There are multiple physical L2 cache clusters, each with their
-own PMU. Each cluster has one or more CPUs associated with it.
-
-There is one logical L2 PMU exposed, which aggregates the results from
-the physical PMUs.
-
-The driver provides a description of its available events and configuration
-options in sysfs, see /sys/devices/l2cache_0.
-
-The "format" directory describes the format of the events.
-
-Events can be envisioned as a 2-dimensional array. Each column represents
-a group of events. There are 8 groups. Only one entry from each
-group can be in use at a time. If multiple events from the same group
-are specified, the conflicting events cannot be counted at the same time.
-
-Events are specified as 0xCCG, where CC is 2 hex digits specifying
-the code (array row) and G specifies the group (column) 0-7.
-
-In addition there is a cycle counter event specified by the value 0xFE
-which is outside the above scheme.
-
-The driver provides a "cpumask" sysfs attribute which contains a mask
-consisting of one CPU per cluster which will be used to handle all the PMU
-events on that cluster.
-
-Examples for use with perf::
-
-  perf stat -e l2cache_0/config=0x001/,l2cache_0/config=0x042/ -a sleep 1
-
-  perf stat -e l2cache_0/config=0xfe/ -C 2 sleep 1
-
-The driver does not support sampling, therefore "perf record" will
-not work. Per-task perf sessions are not supported.
diff --git a/Documentation/perf/qcom_l3_pmu.rst b/Documentation/perf/qcom_l3_pmu.rst
deleted file mode 100644
index a3d014a46bfd..000000000000
--- a/Documentation/perf/qcom_l3_pmu.rst
+++ /dev/null
@@ -1,26 +0,0 @@
-===========================================================================
-Qualcomm Datacenter Technologies L3 Cache Performance Monitoring Unit (PMU)
-===========================================================================
-
-This driver supports the L3 cache PMUs found in Qualcomm Datacenter Technologies
-Centriq SoCs. The L3 cache on these SOCs is composed of multiple slices, shared
-by all cores within a socket. Each slice is exposed as a separate uncore perf
-PMU with device name l3cache_<socket>_<instance>. User space is responsible
-for aggregating across slices.
-
-The driver provides a description of its available events and configuration
-options in sysfs, see /sys/devices/l3cache*. Given that these are uncore PMUs
-the driver also exposes a "cpumask" sysfs attribute which contains a mask
-consisting of one CPU per socket which will be used to handle all the PMU
-events on that socket.
-
-The hardware implements 32bit event counters and has a flat 8bit event space
-exposed via the "event" format attribute. In addition to the 32bit physical
-counters the driver supports virtual 64bit hardware counters by using hardware
-counter chaining. This feature is exposed via the "lc" (long counter) format
-flag. E.g.::
-
-  perf stat -e l3cache_0_0/read-miss,lc/
-
-Given that these are uncore PMUs the driver does not support sampling, therefore
-"perf record" will not work. Per-task perf sessions are not supported.
diff --git a/Documentation/perf/thunderx2-pmu.rst b/Documentation/perf/thunderx2-pmu.rst
deleted file mode 100644
index 08e33675853a..000000000000
--- a/Documentation/perf/thunderx2-pmu.rst
+++ /dev/null
@@ -1,42 +0,0 @@
-=============================================================
-Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
-=============================================================
-
-The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
-PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
-
-The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
-Events are counted for the default channel (i.e. channel 0) and prorated
-to the total number of channels/tiles.
-
-The DMC and L3C support up to 4 counters. Counters are independently
-programmable and can be started and stopped individually. Each counter
-can be set to a different event. Counters are 32-bit and do not support
-an overflow interrupt; they are read every 2 seconds.
-
-PMU UNCORE (perf) driver:
-
-The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
-L3C devices.  Each PMU can be used to count up to 4 events
-simultaneously. The PMUs provide a description of their available events
-and configuration options under sysfs, see
-/sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
-
-The driver does not support sampling, therefore "perf record" will not
-work. Per-task perf sessions are also not supported.
-
-Examples::
-
-  # perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
-
-  # perf stat -a -e \
-  uncore_dmc_0/cnt_cycles/,\
-  uncore_dmc_0/data_transfers/,\
-  uncore_dmc_0/read_txns/,\
-  uncore_dmc_0/write_txns/ sleep 1
-
-  # perf stat -a -e \
-  uncore_l3c_0/read_request/,\
-  uncore_l3c_0/read_hit/,\
-  uncore_l3c_0/inv_request/,\
-  uncore_l3c_0/inv_hit/ sleep 1
diff --git a/Documentation/perf/xgene-pmu.rst b/Documentation/perf/xgene-pmu.rst
deleted file mode 100644
index 644f8ed89152..000000000000
--- a/Documentation/perf/xgene-pmu.rst
+++ /dev/null
@@ -1,49 +0,0 @@
-================================================
-APM X-Gene SoC Performance Monitoring Unit (PMU)
-================================================
-
-X-Gene SoC PMU consists of various independent system device PMUs such as
-L3 cache(s), I/O bridge(s), memory controller bridge(s) and memory
-controller(s). These PMU devices are loosely architected to follow the
-same model as the PMU for ARM cores. The PMUs share the same top level
-interrupt and status CSR region.
-
-PMU (perf) driver
------------------
-
-The xgene-pmu driver registers several perf PMU drivers. Each of the perf
-driver provides description of its available events and configuration options
-in sysfs, see /sys/devices/<l3cX/iobX/mcbX/mcX>/.
-
-The "format" directory describes format of the config (event ID),
-config1 (agent ID) fields of the perf_event_attr structure. The "events"
-directory provides configuration templates for all supported event types that
-can be used with perf tool. For example, "l3c0/bank-fifo-full/" is an
-equivalent of "l3c0/config=0x0b/".
-
-Most of the SoC PMU has a specific list of agent ID used for monitoring
-performance of a specific datapath. For example, agents of a L3 cache can be
-a specific CPU or an I/O bridge. Each PMU has a set of 2 registers capable of
-masking the agents from which the request come from. If the bit with
-the bit number corresponding to the agent is set, the event is counted only if
-it is caused by a request from that agent. Each agent ID bit is inversely mapped
-to a corresponding bit in "config1" field. By default, the event will be
-counted for all agent requests (config1 = 0x0). For all the supported agents of
-each PMU, please refer to APM X-Gene User Manual.
-
-Each perf driver also provides a "cpumask" sysfs attribute, which contains a
-single CPU ID of the processor which will be used to handle all the PMU events.
-
-Example for perf tool use::
-
- / # perf list | grep -e l3c -e iob -e mcb -e mc
-   l3c0/ackq-full/                                    [Kernel PMU event]
- <...>
-   mcb1/mcb-csw-stall/                                [Kernel PMU event]
-
- / # perf stat -a -e l3c0/read-miss/,mcb1/csw-write-request/ sleep 1
-
- / # perf stat -a -e l3c0/read-miss,config1=0xfffffffffffffffe/ sleep 1
-
-The driver does not support sampling, therefore "perf record" will
-not work. Per-task (without "-a") perf sessions are not supported.
diff --git a/MAINTAINERS b/MAINTAINERS
index db96cd4a229b..b8ce346d5254 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1155,7 +1155,7 @@ APPLIED MICRO (APM) X-GENE SOC PMU
 M:	Khuong Dinh <khuong@os.amperecomputing.com>
 S:	Supported
 F:	drivers/perf/xgene_pmu.c
-F:	Documentation/perf/xgene-pmu.rst
+F:	Documentation/admin-guide/perf/xgene-pmu.rst
 F:	Documentation/devicetree/bindings/perf/apm-xgene-pmu.txt
 
 APTINA CAMERA SENSOR PLL
@@ -7262,7 +7262,7 @@ M:	Shaokun Zhang <zhangshaokun@hisilicon.com>
 W:	http://www.hisilicon.com
 S:	Supported
 F:	drivers/perf/hisilicon
-F:	Documentation/perf/hisi-pmu.rst
+F:	Documentation/admin-guide/perf/hisi-pmu.rst
 
 HISILICON ROCE DRIVER
 M:	Lijun Ou <oulijun@huawei.com>
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 90f88ce5192b..656e830798d9 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -8,7 +8,7 @@
  * the slices. User space needs to aggregate to individual counts to provide
  * a global picture.
  *
- * See Documentation/perf/qcom_l3_pmu.rst for more details.
+ * See Documentation/admin-guide/perf/qcom_l3_pmu.rst for more details.
  *
  * Copyright (c) 2015-2017, The Linux Foundation. All rights reserved.
  */
-- 
cgit v1.2.3-55-g7522


From ae4a05027e2f883fb5f822e48d67cacc26bf60e1 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:32:31 -0300
Subject: docs: nvdimm: add it to the driver-api book

The descriptions here are from Kernel driver's PoV.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/driver-api/index.rst           |   1 +
 Documentation/driver-api/nvdimm/btt.rst      | 285 +++++++++
 Documentation/driver-api/nvdimm/index.rst    |  10 +
 Documentation/driver-api/nvdimm/nvdimm.rst   | 887 +++++++++++++++++++++++++++
 Documentation/driver-api/nvdimm/security.rst | 143 +++++
 Documentation/nvdimm/btt.rst                 | 285 ---------
 Documentation/nvdimm/index.rst               |  12 -
 Documentation/nvdimm/nvdimm.rst              | 887 ---------------------------
 Documentation/nvdimm/security.rst            | 143 -----
 drivers/nvdimm/Kconfig                       |   2 +-
 10 files changed, 1327 insertions(+), 1328 deletions(-)
 create mode 100644 Documentation/driver-api/nvdimm/btt.rst
 create mode 100644 Documentation/driver-api/nvdimm/index.rst
 create mode 100644 Documentation/driver-api/nvdimm/nvdimm.rst
 create mode 100644 Documentation/driver-api/nvdimm/security.rst
 delete mode 100644 Documentation/nvdimm/btt.rst
 delete mode 100644 Documentation/nvdimm/index.rst
 delete mode 100644 Documentation/nvdimm/nvdimm.rst
 delete mode 100644 Documentation/nvdimm/security.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index d665cd9ab95f..410dd7110772 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -44,6 +44,7 @@ available subsections can be seen below.
    mtdnand
    miscellaneous
    mei/index
+   nvdimm/index
    w1
    rapidio/index
    s390-drivers
diff --git a/Documentation/driver-api/nvdimm/btt.rst b/Documentation/driver-api/nvdimm/btt.rst
new file mode 100644
index 000000000000..2d8269f834bd
--- /dev/null
+++ b/Documentation/driver-api/nvdimm/btt.rst
@@ -0,0 +1,285 @@
+=============================
+BTT - Block Translation Table
+=============================
+
+
+1. Introduction
+===============
+
+Persistent memory based storage is able to perform IO at byte (or more
+accurately, cache line) granularity. However, we often want to expose such
+storage as traditional block devices. The block drivers for persistent memory
+will do exactly this. However, they do not provide any atomicity guarantees.
+Traditional SSDs typically provide protection against torn sectors in hardware,
+using stored energy in capacitors to complete in-flight block writes, or perhaps
+in firmware. We don't have this luxury with persistent memory - if a write is in
+progress, and we experience a power failure, the block will contain a mix of old
+and new data. Applications may not be prepared to handle such a scenario.
+
+The Block Translation Table (BTT) provides atomic sector update semantics for
+persistent memory devices, so that applications that rely on sector writes not
+being torn can continue to do so. The BTT manifests itself as a stacked block
+device, and reserves a portion of the underlying storage for its metadata. At
+the heart of it, is an indirection table that re-maps all the blocks on the
+volume. It can be thought of as an extremely simple file system that only
+provides atomic sector updates.
+
+
+2. Static Layout
+================
+
+The underlying storage on which a BTT can be laid out is not limited in any way.
+The BTT, however, splits the available space into chunks of up to 512 GiB,
+called "Arenas".
+
+Each arena follows the same layout for its metadata, and all references in an
+arena are internal to it (with the exception of one field that points to the
+next arena). The following depicts the "On-disk" metadata layout::
+
+
+    Backing Store     +------->  Arena
+  +---------------+   |   +------------------+
+  |               |   |   | Arena info block |
+  |    Arena 0    +---+   |       4K         |
+  |     512G      |       +------------------+
+  |               |       |                  |
+  +---------------+       |                  |
+  |               |       |                  |
+  |    Arena 1    |       |   Data Blocks    |
+  |     512G      |       |                  |
+  |               |       |                  |
+  +---------------+       |                  |
+  |       .       |       |                  |
+  |       .       |       |                  |
+  |       .       |       |                  |
+  |               |       |                  |
+  |               |       |                  |
+  +---------------+       +------------------+
+                          |                  |
+                          |     BTT Map      |
+                          |                  |
+                          |                  |
+                          +------------------+
+                          |                  |
+                          |     BTT Flog     |
+                          |                  |
+                          +------------------+
+                          | Info block copy  |
+                          |       4K         |
+                          +------------------+
+
+
+3. Theory of Operation
+======================
+
+
+a. The BTT Map
+--------------
+
+The map is a simple lookup/indirection table that maps an LBA to an internal
+block. Each map entry is 32 bits. The two most significant bits are special
+flags, and the remaining form the internal block number.
+
+======== =============================================================
+Bit      Description
+======== =============================================================
+31 - 30	 Error and Zero flags - Used in the following way:
+
+	   == ==  ====================================================
+	   31 30  Description
+	   == ==  ====================================================
+	   0  0	  Initial state. Reads return zeroes; Premap = Postmap
+	   0  1	  Zero state: Reads return zeroes
+	   1  0	  Error state: Reads fail; Writes clear 'E' bit
+	   1  1	  Normal Block – has valid postmap
+	   == ==  ====================================================
+
+29 - 0	 Mappings to internal 'postmap' blocks
+======== =============================================================
+
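As an illustration of the bit layout in the table above, the following is a minimal Python sketch of decoding one 32-bit map entry. The function name, the state strings, and the choice to return the raw low 30 bits in the zero and error states are assumptions made for this example; the table does not say whether that field is meaningful in those states.

```python
ERROR_BIT = 1 << 31            # bit 31, the 'E' flag
ZERO_BIT = 1 << 30             # bit 30, the 'Z' flag
POSTMAP_MASK = (1 << 30) - 1   # bits 29..0, the postmap block number


def decode_map_entry(entry, premap_aba):
    """Return (state, postmap_aba) for a 32-bit BTT map entry.

    The state names are illustrative labels for the four rows of the table.
    """
    error = bool(entry & ERROR_BIT)
    zero = bool(entry & ZERO_BIT)
    postmap = entry & POSTMAP_MASK

    if error and zero:
        return "normal", postmap      # 1 1: valid postmap
    if error:
        return "error", postmap       # 1 0: reads fail
    if zero:
        return "zero", postmap        # 0 1: reads return zeroes
    # 0 0: initial state, premap == postmap (identity mapping)
    return "initial", premap_aba
```

For example, an entry of 0xC0000040 has both flag bits set and decodes to a normal block with postmap ABA 64.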
+
+Some of the terminology that will be subsequently used:
+
+============	================================================================
+External LBA	LBA as made visible to upper layers.
+ABA		Arena Block Address - Block offset/number within an arena
+Premap ABA	The block offset into an arena, which was decided upon by range
+		checking the External LBA
+Postmap ABA	The block number in the "Data Blocks" area obtained after
+		indirection from the map
+nfree		The number of free blocks that are maintained at any given time.
+		This is the number of concurrent writes that can happen to the
+		arena.
+============	================================================================
+
+
+For example, after adding a BTT, we surface a disk of 1024G. We get a read for
+the external LBA at 768G. This falls into the second arena, and of the 512G
+worth of blocks that this arena contributes, this block is at 256G. Thus, the
+premap ABA is 256G. We now refer to the map, and find out the mapping for block
+'X' (256G) points to block 'Y', say '64'. Thus the postmap ABA is 64.
+
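The range check in the example above can be sketched as follows. This is a simplified model, not the driver's code: `locate` and `arena_sizes` are hypothetical names, and each arena is assumed to contribute its full nominal size to the external address space, as the worked example does.

```python
GIB = 1 << 30


def locate(external_offset, arena_sizes):
    """Return (arena_index, offset_within_arena) for an external offset.

    arena_sizes lists the externally visible capacity of each arena,
    in the same unit as external_offset.
    """
    for index, size in enumerate(arena_sizes):
        if external_offset < size:
            return index, external_offset
        external_offset -= size
    raise ValueError("offset is beyond the end of the device")
```

With two 512G arenas, `locate(768 * GIB, [512 * GIB, 512 * GIB])` yields arena index 1 (the second arena) and an offset of 256G, matching the text. Dividing that offset by the sector size would give the premap ABA as a block number.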
+
+b. The BTT Flog
+---------------
+
+The BTT provides sector atomicity by making every write an "allocating write",
+i.e. every write goes to a "free" block. A running list of free blocks is
+maintained in the form of the BTT flog. 'Flog' is a combination of the words
+"free list" and "log". The flog contains 'nfree' entries, and an entry contains:
+
+========  =====================================================================
+lba       The premap ABA that is being written to
+old_map   The old postmap ABA - after 'this' write completes, this will be a
+	  free block.
+new_map   The new postmap ABA. The map will be updated to reflect this
+	  lba->postmap_aba mapping, but we log it here in case we have to
+	  recover.
+seq	  Sequence number to mark which of the 2 sections of this flog entry is
+	  valid/newest. It cycles between 01->10->11->01 (binary) under normal
+	  operation, with 00 indicating an uninitialized state.
+lba'	  alternate lba entry
+old_map'  alternate old postmap entry
+new_map'  alternate new postmap entry
+seq'	  alternate sequence number.
+========  =====================================================================
+
+Each of the above fields is 32-bit, making one entry 32 bytes. Entries are also
+padded to 64 bytes to avoid cache line sharing or aliasing. Flog updates are
+done such that for any entry being written, it:
+a. overwrites the 'old' section in the entry based on sequence numbers
+b. writes the 'new' section such that the sequence number is written last.
+
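The sequence-number cycle described above (01->10->11->01, with 00 meaning uninitialized) is enough to tell which of the two sections of a flog entry is newer. The sketch below shows one way to express that. The function names and the treatment of invalid combinations (equal non-zero values raise an error) are assumptions for illustration, not a description of the driver.

```python
def next_seq(seq):
    """Advance a flog sequence number: 1 -> 2 -> 3 -> 1 (0 is never produced)."""
    if seq not in (1, 2, 3):
        raise ValueError("sequence number must be 1, 2 or 3")
    return seq % 3 + 1


def newest_section(seq_a, seq_b):
    """Return 0 if the first section is newest, 1 if the second is,
    or None if both sections are uninitialized (seq == 0)."""
    if seq_a == 0 and seq_b == 0:
        return None
    if seq_b == 0:
        return 0
    if seq_a == 0:
        return 1
    if seq_a == seq_b:
        raise ValueError("both sections carry the same sequence number")
    # Of two distinct values in {1, 2, 3}, exactly one is the successor
    # of the other; the successor marks the newer section.
    return 1 if next_seq(seq_a) == seq_b else 0
```

Because the cycle has three states, the newer section can always be identified from the pair alone: for (3, 1) the second section is newer, since 3 wraps to 1.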
+
+c. The concept of lanes
+-----------------------
+
+While 'nfree' describes the number of IOs an arena can process
+concurrently, 'nlanes' is the number of IOs the BTT device as a whole can
+process::
+
+	nlanes = min(nfree, num_cpus)
+
+A lane number is obtained at the start of any IO, and is used for indexing into
+all the on-disk and in-memory data structures for the duration of the IO. If
+there are more CPUs than the max number of available lanes, then lanes are
+protected by spinlocks.
+
+
+d. In-memory data structure: Read Tracking Table (RTT)
+------------------------------------------------------
+
+Consider a case where we have two threads, one doing reads and the other,
+writes. We can hit a condition where the writer thread grabs a free block to do
+a new IO, but the (slow) reader thread is still reading from it. In other words,
+the reader consulted a map entry, and started reading the corresponding block. A
+writer started writing to the same external LBA, and finished the write updating
+the map for that external LBA to point to its new postmap ABA. At this point the
+internal, postmap block that the reader is (still) reading has been inserted
+into the list of free blocks. If another write comes in for the same LBA, it can
+grab this free block, and start writing to it, causing the reader to read
+incorrect data. To prevent this, we introduce the RTT.
+
+The RTT is a simple, per arena table with 'nfree' entries. Every reader inserts
+into rtt[lane_number], the postmap ABA it is reading, and clears it after the
+read is complete. Every writer thread, after grabbing a free block, checks the
+RTT for its presence. If the postmap free block is in the RTT, it waits till the
+reader clears the RTT entry, and only then starts writing to it.
+
+
+e. In-memory data structure: map locks
+--------------------------------------
+
+Consider a case where two writer threads are writing to the same LBA. There can
+be a race in the following sequence of steps::
+
+	free[lane] = map[premap_aba]
+	map[premap_aba] = postmap_aba
+
+Both threads can update their respective free[lane] with the same old, freed
+postmap_aba. This has made the layout inconsistent by losing a free entry, and
+at the same time, duplicating another free entry for two lanes.
+
+To solve this, we could have a single map lock (per arena) that has to be taken
+before performing the above sequence, but we feel that could be too contentious.
+Instead we use an array of (nfree) map_locks that is indexed by
+(premap_aba modulo nfree).
+
+
+f. Reconstruction from the Flog
+-------------------------------
+
+On startup, we analyze the BTT flog to create our list of free blocks. We walk
+through all the entries, and for each lane, of the set of two possible
+'sections', we always look at the most recent one only (based on the sequence
+number). The reconstruction rules/steps are simple:
+
+- Read map[log_entry.lba].
+- If log_entry.new matches the map entry, then log_entry.old is free.
+- If log_entry.new does not match the map entry, then log_entry.new is free.
+  (This case can only be caused by power-fails/unsafe shutdowns)
+
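The reconstruction rules above translate almost directly into code. The following sketch assumes each lane's flog entry has already been reduced to its most recent section and is represented as a dictionary with `lba`, `old_map` and `new_map` keys. That representation, and the plain dictionary standing in for the map, are assumptions for the example.

```python
def rebuild_free_list(flog_entries, block_map):
    """Rebuild the per-lane free list from the newest flog sections.

    flog_entries: one dict per lane with 'lba', 'old_map', 'new_map'.
    block_map:    mapping of premap ABA -> postmap ABA (flag bits stripped).
    """
    free = []
    for entry in flog_entries:
        if block_map[entry["lba"]] == entry["new_map"]:
            # The map update completed, so the old block is free.
            free.append(entry["old_map"])
        else:
            # The map was never updated (power failure or unsafe shutdown),
            # so the block that was being written is still free.
            free.append(entry["new_map"])
    return free
```

For instance, with a map of `{10: 64, 11: 5}`, a lane whose entry logs `10: 3 -> 64` frees block 3, while a lane whose entry logs `11: 5 -> 99` frees block 99 because the map still points at 5.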
+
+g. Summarizing - Read and Write flows
+-------------------------------------
+
+Read:
+
+1.  Convert external LBA to arena number + pre-map ABA
+2.  Get a lane (and take lane_lock)
+3.  Read map to get the entry for this pre-map ABA
+4.  Enter post-map ABA into RTT[lane]
+5.  If TRIM flag set in map, return zeroes, and end IO (go to step 8)
+6.  If ERROR flag set in map, end IO with EIO (go to step 8)
+7.  Read data from this block
+8.  Remove post-map ABA entry from RTT[lane]
+9.  Release lane (and lane_lock)
+
+Write:
+
+1.  Convert external LBA to Arena number + pre-map ABA
+2.  Get a lane (and take lane_lock)
+3.  Use lane to index into in-memory free list and obtain a new block, next flog
+    index, next sequence number
+4.  Scan the RTT to check if free block is present, and spin/wait if it is.
+5.  Write data to this free block
+6.  Read map to get the existing post-map ABA entry for this pre-map ABA
+7.  Write flog entry: [premap_aba / old postmap_aba / new postmap_aba / seq_num]
+8.  Write new post-map ABA into map.
+9.  Write old post-map entry into the free list
+10. Calculate next sequence number and write into the free list entry
+11. Release lane (and lane_lock)
+
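To make the two flows above concrete, here is a toy in-memory model that follows the numbered steps. It is only a sketch under several assumptions: the class and attribute names are invented, the caller chooses the lane, the Zero/Error flag checks (read steps 5 and 6) are omitted, nothing is persisted, and the map lock from section (e) is held around write steps 6 to 9.

```python
import threading


class ToyArena:
    """In-memory toy that mirrors the read and write steps listed above."""

    def __init__(self, nblocks, nfree):
        self.nfree = nfree
        self.map = list(range(nblocks))            # identity map to start
        self.data = [b""] * (nblocks + nfree)      # internal data blocks
        self.free = [nblocks + lane for lane in range(nfree)]
        self.seq = [1] * nfree                     # next sequence number per lane
        self.flog = [None] * nfree
        self.rtt = [None] * nfree
        self.lane_locks = [threading.Lock() for _ in range(nfree)]
        self.map_locks = [threading.Lock() for _ in range(nfree)]

    def read(self, lane, premap_aba):
        with self.lane_locks[lane]:                # steps 2 and 9
            postmap = self.map[premap_aba]         # step 3
            self.rtt[lane] = postmap               # step 4
            try:
                return self.data[postmap]          # step 7
            finally:
                self.rtt[lane] = None              # step 8

    def write(self, lane, premap_aba, payload):
        with self.lane_locks[lane]:                # steps 2 and 11
            new_block = self.free[lane]            # step 3
            while new_block in self.rtt:           # step 4: wait for readers
                pass
            self.data[new_block] = payload         # step 5
            with self.map_locks[premap_aba % self.nfree]:
                old_block = self.map[premap_aba]   # step 6
                self.flog[lane] = (premap_aba, old_block,
                                   new_block, self.seq[lane])  # step 7
                self.map[premap_aba] = new_block   # step 8
                self.free[lane] = old_block        # step 9
            self.seq[lane] = self.seq[lane] % 3 + 1  # step 10
```

In this model every internal block is, at all times, either referenced by exactly one map entry or held in exactly one lane's free slot, which is the invariant the error-handling section below relies on.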
+
+4. Error Handling
+=================
+
+An arena would be in an error state if any of the metadata is corrupted
+irrecoverably, either due to a bug or a media error. The following conditions
+indicate an error:
+
+- Info block checksum does not match (and recovering from the copy also fails)
+- All internal available blocks are not uniquely and entirely addressed by the
+  sum of mapped blocks and free blocks (from the BTT flog).
+- Rebuilding free list from the flog reveals missing/duplicate/impossible
+  entries
+- A map entry is out of bounds
+
+If any of these error conditions are encountered, the arena is put into a read
+only state using a flag in the info block.
+
+
+5. Usage
+========
+
+The BTT can be set up on any disk (namespace) exposed by the libnvdimm subsystem
+(pmem, or blk mode). The easiest way to set up such a namespace is using the
+'ndctl' utility [1]:
+
+For example, the ndctl command line to set up a BTT with a 4k sector size is::
+
+    ndctl create-namespace -f -e namespace0.0 -m sector -l 4k
+
+See ndctl create-namespace --help for more options.
+
+[1]: https://github.com/pmem/ndctl
diff --git a/Documentation/driver-api/nvdimm/index.rst b/Documentation/driver-api/nvdimm/index.rst
new file mode 100644
index 000000000000..19dc8ee371dc
--- /dev/null
+++ b/Documentation/driver-api/nvdimm/index.rst
@@ -0,0 +1,10 @@
+===================================
+Non-Volatile Memory Device (NVDIMM)
+===================================
+
+.. toctree::
+   :maxdepth: 1
+
+   nvdimm
+   btt
+   security
diff --git a/Documentation/driver-api/nvdimm/nvdimm.rst b/Documentation/driver-api/nvdimm/nvdimm.rst
new file mode 100644
index 000000000000..08f855cbb4e6
--- /dev/null
+++ b/Documentation/driver-api/nvdimm/nvdimm.rst
@@ -0,0 +1,887 @@
+===============================
+LIBNVDIMM: Non-Volatile Devices
+===============================
+
+libnvdimm - kernel / libndctl - userspace helper library
+
+linux-nvdimm@lists.01.org
+
+Version 13
+
+.. contents:
+
+	Glossary
+	Overview
+	    Supporting Documents
+	    Git Trees
+	LIBNVDIMM PMEM and BLK
+	Why BLK?
+	    PMEM vs BLK
+	        BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
+	Example NVDIMM Platform
+	LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
+	    LIBNDCTL: Context
+	        libndctl: instantiate a new library context example
+	    LIBNVDIMM/LIBNDCTL: Bus
+	        libnvdimm: control class device in /sys/class
+	        libnvdimm: bus
+	        libndctl: bus enumeration example
+	    LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
+	        libnvdimm: DIMM (NMEM)
+	        libndctl: DIMM enumeration example
+	    LIBNVDIMM/LIBNDCTL: Region
+	        libnvdimm: region
+	        libndctl: region enumeration example
+	        Why Not Encode the Region Type into the Region Name?
+	        How Do I Determine the Major Type of a Region?
+	    LIBNVDIMM/LIBNDCTL: Namespace
+	        libnvdimm: namespace
+	        libndctl: namespace enumeration example
+	        libndctl: namespace creation example
+	        Why the Term "namespace"?
+	    LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
+	        libnvdimm: btt layout
+	        libndctl: btt creation example
+	Summary LIBNDCTL Diagram
+
+
+Glossary
+========
+
+PMEM:
+  A system-physical-address range where writes are persistent.  A
+  block device composed of PMEM is capable of DAX.  A PMEM address range
+  may span an interleave of several DIMMs.
+
+BLK:
+  A set of one or more programmable memory mapped apertures provided
+  by a DIMM to access its media.  This indirection precludes the
+  performance benefit of interleaving, but enables DIMM-bounded failure
+  modes.
+
+DPA:
+  DIMM Physical Address, is a DIMM-relative offset.  With one DIMM in
+  the system there would be a 1:1 system-physical-address:DPA association.
+  Once more DIMMs are added a memory controller interleave must be
+  decoded to determine the DPA associated with a given
+  system-physical-address.  BLK capacity always has a 1:1 relationship
+  with a single-DIMM's DPA range.
+
+DAX:
+  File system extensions to bypass the page cache and block layer to
+  mmap persistent memory, from a PMEM block device, directly into a
+  process address space.
+
+DSM:
+  Device Specific Method: ACPI method to control a specific
+  device - in this case the firmware.
+
+DCR:
+  NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
+  It defines a vendor-id, device-id, and interface format for a given DIMM.
+
+BTT:
+  Block Translation Table: Persistent memory is byte addressable.
+  Existing software may have an expectation that the power-fail-atomicity
+  of writes is at least one sector, 512 bytes.  The BTT is an indirection
+  table with atomic update semantics to front a PMEM/BLK block device
+  driver and present arbitrary atomic sector sizes.
+
+LABEL:
+  Metadata stored on a DIMM device that partitions and identifies
+  (persistently names) storage between PMEM and BLK.  It also partitions
+  BLK storage to host BTTs with different parameters per BLK-partition.
+  Note that traditional partition tables, GPT/MBR, are layered on top of a
+  BLK or PMEM device.
+
+
+Overview
+========
+
+The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
+PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
+and BLK mode access.  These three modes of operation are described by
+the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6.  While the LIBNVDIMM
+implementation is generic and supports pre-NFIT platforms, it was guided
+by the superset of capabilities needed to support this ACPI 6 definition
+for NVDIMM resources.  The bulk of the kernel implementation is in place
+to handle the case where DPA accessible via PMEM is aliased with DPA
+accessible via BLK.  When that occurs a LABEL is needed to reserve DPA
+for exclusive access via one mode at a time.
+
+Supporting Documents
+--------------------
+
+ACPI 6:
+	http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
+NVDIMM Namespace:
+	http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
+DSM Interface Example:
+	http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+Driver Writer's Guide:
+	http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
+
+Git Trees
+---------
+
+LIBNVDIMM:
+	https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
+LIBNDCTL:
+	https://github.com/pmem/ndctl.git
+PMEM:
+	https://github.com/01org/prd
+
+
+LIBNVDIMM PMEM and BLK
+======================
+
+Prior to the arrival of the NFIT, non-volatile memory was described to a
+system in various ad-hoc ways.  Usually only the bare minimum was
+provided, namely, a single system-physical-address range where writes
+are expected to be durable after a system power loss.  Now, the NFIT
+specification standardizes not only the description of PMEM, but also
+BLK and platform message-passing entry points for control and
+configuration.
+
+For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
+device driver:
+
+    1. PMEM (nd_pmem.ko): Drives a system-physical-address range.  This
+       range is contiguous in system memory and may be interleaved (hardware
+       memory controller striped) across multiple DIMMs.  When interleaved the
+       platform may optionally provide details of which DIMMs are participating
+       in the interleave.
+
+       Note that while LIBNVDIMM describes system-physical-address ranges that may
+       alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
+       alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
+       distinction.  The different device-types are an implementation detail
+       that userspace can exploit to implement policies like "only interface
+       with address ranges from certain DIMMs".  It is worth noting that when
+       aliasing is present and a DIMM lacks a label, then no block device can
+       be created by default as userspace needs to do at least one allocation
+       of DPA to the PMEM range.  In contrast ND_NAMESPACE_IO ranges, once
+       registered, can be immediately attached to nd_pmem.
+
+    2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
+       defined apertures.  A set of apertures will access just one DIMM.
+       Multiple windows (apertures) allow multiple concurrent accesses, much like
+       tagged-command-queuing, and would likely be used by different threads or
+       different CPUs.
+
+       The NFIT specification defines a standard format for a BLK-aperture, but
+       the spec also allows for vendor specific layouts, and non-NFIT BLK
+       implementations may have other designs for BLK I/O.  For this reason
+       "nd_blk" calls back into platform-specific code to perform the I/O.
+
+       One such implementation is defined in the "Driver Writer's Guide" and "DSM
+       Interface Example".
+
+
+Why BLK?
+========
+
+While PMEM provides direct byte-addressable CPU-load/store access to
+NVDIMM storage, it does not provide the best system RAS (recovery,
+availability, and serviceability) model.  An access to a corrupted
+system-physical-address causes a CPU exception while an access
+to a corrupted address through a BLK-aperture causes that block window
+to raise an error status in a register.  The latter is more aligned with
+the standard error model that host-bus-adapter attached disks present.
+
+Also, if an administrator ever wants to replace a memory module, it is easier to
+service a system at DIMM module boundaries.  Compare this to PMEM where
+data could be interleaved in an opaque hardware specific manner across
+several DIMMs.
+
+PMEM vs BLK
+-----------
+
+BLK-apertures solve these RAS problems, but their presence is also the
+major contributing factor to the complexity of the ND subsystem.  They
+complicate the implementation because PMEM and BLK alias in DPA space.
+Any given DIMM's DPA-range may contribute to one or more
+system-physical-address sets of interleaved DIMMs, *and* may also be
+accessed in its entirety through its BLK-aperture.  Accessing a DPA
+through a system-physical-address while simultaneously accessing the
+same DPA through a BLK-aperture has undefined results.  For this reason,
+DIMMs with this dual interface configuration include a DSM function to
+store/retrieve a LABEL.  The LABEL effectively partitions the DPA-space
+into exclusive system-physical-address and BLK-aperture accessible
+regions.  For simplicity a DIMM is allowed a PMEM "region" per each
+interleave set in which it is a member.  The remaining DPA space can be
+carved into an arbitrary number of BLK devices with discontiguous
+extents.
+
+BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+One of the few
+reasons to allow multiple BLK namespaces per REGION is so that each
+BLK-namespace can be configured with a BTT with unique atomic sector
+sizes.  While a PMEM device can host a BTT the LABEL specification does
+not provide for a sector size to be specified for a PMEM namespace.
+
+This is due to the expectation that the primary usage model for PMEM is
+via DAX, and the BTT is incompatible with DAX.  However, for the cases
+where an application or filesystem still needs atomic sector update
+guarantees it can register a BTT on a PMEM device or partition.  See
+LIBNVDIMM/LIBNDCTL: Block Translation Table "btt".
+
+
+Example NVDIMM Platform
+=======================
+
+For the remainder of this document the following diagram will be
+referenced for any example sysfs layouts::
+
+
+                               (a)               (b)           DIMM   BLK-REGION
+            +-------------------+--------+--------+--------+
+  +------+  |       pm0.0       | blk2.0 | pm1.0  | blk2.1 |    0      region2
+  | imc0 +--+- - - region0- - - +--------+        +--------+
+  +--+---+  |       pm0.0       | blk3.0 | pm1.0  | blk3.1 |    1      region3
+     |      +-------------------+--------v        v--------+
+  +--+---+                               |                 |
+  | cpu0 |                                     region1
+  +--+---+                               |                 |
+     |      +----------------------------^        ^--------+
+  +--+---+  |           blk4.0           | pm1.0  | blk4.0 |    2      region4
+  | imc1 +--+----------------------------|        +--------+
+  +------+  |           blk5.0           | pm1.0  | blk5.0 |    3      region5
+            +----------------------------+--------+--------+
+
+In this platform we have four DIMMs and two memory controllers in one
+socket.  Each unique interface (BLK or PMEM) to DPA space is identified
+by a region device with a dynamically assigned id (REGION0 - REGION5).
+
+    1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
+       single PMEM namespace is created in the REGION0-SPA-range that spans most
+       of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
+       interleaved system-physical-address range is reclaimed as BLK-aperture
+       accessed space starting at DPA-offset (a) into each DIMM.  In that
+       reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
+       REGION3 where "blk2.0" and "blk3.0" are just human readable names that
+       could be set to any user-desired name in the LABEL.
+
+    2. In the last portion of DIMM0 and DIMM1 we have an interleaved
+       system-physical-address range, REGION1, that spans those two DIMMs as
+       well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
+       named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
+       each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
+       "blk5.0".
+
+    3. The portions of DIMM2 and DIMM3 that do not participate in the REGION1
+       interleaved system-physical-address range (i.e. the DPA addresses past
+       offset (b)) are also included in the "blk4.0" and "blk5.0" namespaces.
+       Note that this example shows that BLK-aperture namespaces don't need to
+       be contiguous in DPA-space.
+
+    This bus is provided by the kernel under the device
+    /sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
+    the nfit_test.ko module is loaded.  This tests not only LIBNVDIMM but the
+    acpi_nfit.ko driver as well.
+
+
+LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
+========================================================
+
+What follows is a description of the LIBNVDIMM sysfs layout and a
+corresponding object hierarchy diagram as viewed through the LIBNDCTL
+API.  The example sysfs paths and diagrams are relative to the Example
+NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
+test.
+
+LIBNDCTL: Context
+-----------------
+
+Every API call in the LIBNDCTL library requires a context that holds the
+logging parameters and other library instance state.  The library is
+based on the libabc template:
+
+	https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
+
+LIBNDCTL: instantiate a new library context example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+	struct ndctl_ctx *ctx;
+
+	if (ndctl_new(&ctx) == 0)
+		return ctx;
+	else
+		return NULL;
+
+LIBNVDIMM/LIBNDCTL: Bus
+-----------------------
+
+A bus has a 1:1 relationship with an NFIT.  The current expectation for
+ACPI based systems is that there is only ever one platform-global NFIT.
+That said, it is trivial to register multiple NFITs; the specification
+does not preclude it.  The infrastructure supports multiple busses and
+we use this capability to test multiple NFIT configurations in the unit
+test.
+
+LIBNVDIMM: control class device in /sys/class
+---------------------------------------------
+
+This character device accepts DSM messages to be passed to a DIMM
+identified by its NFIT handle::
+
+	/sys/class/nd/ndctl0
+	|-- dev
+	|-- device -> ../../../ndbus0
+	|-- subsystem -> ../../../../../../../class/nd
+
+
+
+LIBNVDIMM: bus
+--------------
+
+::
+
+	struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
+	       struct nvdimm_bus_descriptor *nfit_desc);
+
+::
+
+	/sys/devices/platform/nfit_test.0/ndbus0
+	|-- commands
+	|-- nd
+	|-- nfit
+	|-- nmem0
+	|-- nmem1
+	|-- nmem2
+	|-- nmem3
+	|-- power
+	|-- provider
+	|-- region0
+	|-- region1
+	|-- region2
+	|-- region3
+	|-- region4
+	|-- region5
+	|-- uevent
+	`-- wait_probe
+
+LIBNDCTL: bus enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Find the bus handle that describes the bus from Example NVDIMM Platform::
+
+	static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
+			const char *provider)
+	{
+		struct ndctl_bus *bus;
+
+		ndctl_bus_foreach(ctx, bus)
+			if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
+				return bus;
+
+		return NULL;
+	}
+
+	bus = get_bus_by_provider(ctx, "nfit_test.0");
+
+
+LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
+-------------------------------
+
+The DIMM device provides a character device for sending commands to
+hardware, and it is a container for LABELs.  If the DIMM is defined by
+NFIT then an optional 'nfit' attribute sub-directory is available to add
+NFIT-specifics.
+
+Note that the kernel device name for "DIMMs" is "nmemX".  The NFIT
+describes these devices via "Memory Device to System Physical Address
+Range Mapping Structure", and there is no requirement that they actually
+be physical DIMMs, so we use a more generic name.
+
+LIBNVDIMM: DIMM (NMEM)
+^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+	struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
+			const struct attribute_group **groups, unsigned long flags,
+			unsigned long *dsm_mask);
+
+::
+
+	/sys/devices/platform/nfit_test.0/ndbus0
+	|-- nmem0
+	|   |-- available_slots
+	|   |-- commands
+	|   |-- dev
+	|   |-- devtype
+	|   |-- driver -> ../../../../../bus/nd/drivers/nvdimm
+	|   |-- modalias
+	|   |-- nfit
+	|   |   |-- device
+	|   |   |-- format
+	|   |   |-- handle
+	|   |   |-- phys_id
+	|   |   |-- rev_id
+	|   |   |-- serial
+	|   |   `-- vendor
+	|   |-- state
+	|   |-- subsystem -> ../../../../../bus/nd
+	|   `-- uevent
+	|-- nmem1
+	[..]
+
+
+LIBNDCTL: DIMM enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Note, in this example we are assuming NFIT-defined DIMMs which are
+identified by an "nfit_handle", a 32-bit value where:
+
+   - Bit 3:0 DIMM number within the memory channel
+   - Bit 7:4 memory channel number
+   - Bit 11:8 memory controller ID
+   - Bit 15:12 socket ID (within scope of a Node controller if node
+     controller is present)
+   - Bit 27:16 Node Controller ID
+   - Bit 31:28 Reserved
+
+::
+
+	static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
+	       unsigned int handle)
+	{
+		struct ndctl_dimm *dimm;
+
+		ndctl_dimm_foreach(bus, dimm)
+			if (ndctl_dimm_get_handle(dimm) == handle)
+				return dimm;
+
+		return NULL;
+	}
+
+	#define DIMM_HANDLE(n, s, i, c, d) \
+		(((n & 0xfff) << 16) | ((s & 0xf) << 12) | ((i & 0xf) << 8) \
+		 | ((c & 0xf) << 4) | (d & 0xf))
+
+	dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));
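+
+The bit layout listed above can also be unpacked in the other direction.  The
+following is a minimal sketch and not part of LIBNDCTL: the structure and
+function names are assumptions made for illustration only::
+
+	#include <stdint.h>
+
+	/* hypothetical container for the fields of an "nfit_handle" */
+	struct nfit_handle_fields {
+		unsigned int node;	/* bits 27:16 */
+		unsigned int socket;	/* bits 15:12 */
+		unsigned int imc;	/* bits 11:8 */
+		unsigned int channel;	/* bits 7:4 */
+		unsigned int dimm;	/* bits 3:0 */
+	};
+
+	static struct nfit_handle_fields decode_nfit_handle(uint32_t handle)
+	{
+		struct nfit_handle_fields f;
+
+		/* bits 31:28 are reserved and ignored here */
+		f.node = (handle >> 16) & 0xfff;
+		f.socket = (handle >> 12) & 0xf;
+		f.imc = (handle >> 8) & 0xf;
+		f.channel = (handle >> 4) & 0xf;
+		f.dimm = handle & 0xf;
+		return f;
+	}
+
+Decoding the value produced by DIMM_HANDLE() should give back the same five
+arguments, which makes the pair convenient for logging a handle in readable
+form.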
+
+LIBNVDIMM/LIBNDCTL: Region
+--------------------------
+
+A generic REGION device is registered for each PMEM range or BLK-aperture
+set.  Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
+sets on the "nfit_test.0" bus.  The primary role of a region is to be a
+container of "mappings".  A mapping is a tuple of <DIMM,
+DPA-start-offset, length>.
+
+LIBNVDIMM provides a built-in driver for these REGION devices.  This driver
+is responsible for reconciling the aliased DPA mappings across all
+regions, parsing the LABEL, if present, and then emitting NAMESPACE
+devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
+nd_blk device driver to consume.
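+
+The mapping tuple and the aliasing condition that has to be reconciled can be
+pictured with a short sketch.  This is a simplification for illustration:
+"struct dpa_mapping" and dpa_mappings_alias() are hypothetical names, not the
+kernel's own mapping structure or a LIBNDCTL call, and the ranges are assumed
+to be non-empty and free of arithmetic overflow::
+
+	#include <stdbool.h>
+	#include <stdint.h>
+
+	/* hypothetical mirror of the <DIMM, DPA-start-offset, length> tuple */
+	struct dpa_mapping {
+		unsigned int dimm;	/* index of the backing DIMM */
+		uint64_t start;		/* DPA start offset in bytes */
+		uint64_t length;	/* extent length in bytes */
+	};
+
+	/*
+	 * Two mappings alias when they sit on the same DIMM and their
+	 * half-open ranges [start, start + length) intersect.
+	 */
+	static bool dpa_mappings_alias(const struct dpa_mapping *a,
+			const struct dpa_mapping *b)
+	{
+		if (a->dimm != b->dimm)
+			return false;
+		return a->start < b->start + b->length &&
+		       b->start < a->start + a->length;
+	}
+
+Ranges that merely touch, such as a PMEM extent ending exactly where a BLK
+extent begins, do not alias under this definition.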
+
+In addition to the generic attributes of "mapping"s, "interleave_ways"
+and "size" the REGION device also exports some convenience attributes.
+"nstype" indicates the integer type of namespace-device this region
+emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
+'add' event, "modalias" duplicates the MODALIAS variable stored by udev
+at the 'add' event, and finally, the optional "spa_index" is provided in
+the case where the region is defined by a SPA.
+
+LIBNVDIMM: region::
+
+	struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
+			struct nd_region_desc *ndr_desc);
+	struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
+			struct nd_region_desc *ndr_desc);
+
+::
+
+	/sys/devices/platform/nfit_test.0/ndbus0
+	|-- region0
+	|   |-- available_size
+	|   |-- btt0
+	|   |-- btt_seed
+	|   |-- devtype
+	|   |-- driver -> ../../../../../bus/nd/drivers/nd_region
+	|   |-- init_namespaces
+	|   |-- mapping0
+	|   |-- mapping1
+	|   |-- mappings
+	|   |-- modalias
+	|   |-- namespace0.0
+	|   |-- namespace_seed
+	|   |-- numa_node
+	|   |-- nfit
+	|   |   `-- spa_index
+	|   |-- nstype
+	|   |-- set_cookie
+	|   |-- size
+	|   |-- subsystem -> ../../../../../bus/nd
+	|   `-- uevent
+	|-- region1
+	[..]
+
+LIBNDCTL: region enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Sample region retrieval routines based on NFIT-unique data like
+"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
+BLK::
+
+	static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
+			unsigned int spa_index)
+	{
+		struct ndctl_region *region;
+
+		ndctl_region_foreach(bus, region) {
+			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
+				continue;
+			if (ndctl_region_get_spa_index(region) == spa_index)
+				return region;
+		}
+		return NULL;
+	}
+
+	static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
+			unsigned int handle)
+	{
+		struct ndctl_region *region;
+
+		ndctl_region_foreach(bus, region) {
+			struct ndctl_mapping *map;
+
+			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
+				continue;
+			ndctl_mapping_foreach(region, map) {
+				struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
+
+				if (ndctl_dimm_get_handle(dimm) == handle)
+					return region;
+			}
+		}
+		return NULL;
+	}
+
+
+Why Not Encode the Region Type into the Region Name?
+----------------------------------------------------
+
+At first glance it seems that, since NFIT defines just PMEM and BLK interface
+types, we should simply name REGION devices with something derived
+from those type names.  However, the ND subsystem explicitly keeps the
+REGION name generic and expects userspace to always consider the
+region-attributes for four reasons:
+
+    1. There are already more than two REGION and "namespace" types.  For
+       PMEM there are two subtypes.  As mentioned previously we have PMEM where
+       the constituent DIMM devices are known and anonymous PMEM.  For BLK
+       regions the NFIT specification already anticipates vendor specific
+       implementations.  The exact distinction of what a region contains is in
+       the region-attributes not the region-name or the region-devtype.
+
+    2. A region with zero child-namespaces is a possible configuration.  For
+       example, the NFIT allows for a DCR to be published without a
+       corresponding BLK-aperture.  This equates to a DIMM that can only accept
+       control/configuration messages, but no i/o through a descendant block
+       device.  Again, this "type" is advertised in the attributes ('mappings'
+       == 0) and the name does not tell you much.
+
+    3. What if a third major interface type arises in the future?  Outside
+       of vendor specific implementations, it's not difficult to envision a
+       third class of interface type beyond BLK and PMEM.  With a generic name
+       for the REGION level of the device-hierarchy old userspace
+       implementations can still make sense of new kernel advertised
+       region-types.  Userspace can always rely on the generic region
+       attributes like "mappings", "size", etc and the expected child devices
+       named "namespace".  This generic format of the device-model hierarchy
+       allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
+       future-proof.
+
+    4. There are more robust mechanisms for determining the major type of a
+       region than a device name.  See the next section, How Do I Determine the
+       Major Type of a Region?
+
+How Do I Determine the Major Type of a Region?
+----------------------------------------------
+
+Outside of the blanket recommendation of "use libndctl", or simply
+looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
+"nstype" integer attribute, here are some other options.
+
+1. module alias lookup
+^^^^^^^^^^^^^^^^^^^^^^
+
+    The whole point of region/namespace device type differentiation is to
+    decide which block-device driver will attach to a given LIBNVDIMM namespace.
+    One can simply use the modalias to look up the resulting module.  It's
+    important to note that this method is robust in the presence of a
+    vendor-specific driver down the road.  If a vendor-specific
+    implementation wants to supplant the standard nd_blk driver it can with
+    minimal impact to the rest of LIBNVDIMM.
+
+    In fact, a vendor may also want to have a vendor-specific region-driver
+    (outside of nd_region).  For example, if a vendor defined its own LABEL
+    format it would need its own region driver to parse that LABEL and emit
+    the resulting namespaces.  The output from module resolution is more
+    accurate than a region-name or region-devtype.
+
+2. udev
+^^^^^^^
+
+    The kernel "devtype" is registered in the udev database::
+
+	# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
+	P: /devices/platform/nfit_test.0/ndbus0/region0
+	E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
+	E: DEVTYPE=nd_pmem
+	E: MODALIAS=nd:t2
+	E: SUBSYSTEM=nd
+
+	# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
+	P: /devices/platform/nfit_test.0/ndbus0/region4
+	E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
+	E: DEVTYPE=nd_blk
+	E: MODALIAS=nd:t3
+	E: SUBSYSTEM=nd
+
+    ...and is available as a region attribute, but keep in mind that the
+    "devtype" does not indicate sub-type variations and scripts should
+    really be understanding the other attributes.
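+
+    A script or tool that only has the MODALIAS string can recover the integer
+    type from it.  The sketch below is an illustration with hypothetical
+    function names; the two type values it recognizes are taken solely from
+    the udev output shown above, and any other value is reported as unknown::
+
+	#include <stdio.h>
+
+	/* return N from a string of the form "nd:tN", or -1 on mismatch */
+	static int nd_modalias_to_type(const char *modalias)
+	{
+		int type;
+
+		if (sscanf(modalias, "nd:t%d", &type) != 1 || type < 0)
+			return -1;
+		return type;
+	}
+
+	/* map the types seen in the example output to their DEVTYPE */
+	static const char *nd_region_devtype(int type)
+	{
+		switch (type) {
+		case 2:
+			return "nd_pmem";
+		case 3:
+			return "nd_blk";
+		default:
+			return NULL;	/* not covered by the example output */
+		}
+	}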
+
+3. type specific attributes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    As it currently stands a BLK-aperture region will never have a
+    "nfit/spa_index" attribute, but neither will a non-NFIT PMEM region.  A
+    BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
+    that does not allow I/O.  A PMEM region with a "mappings" value of zero
+    is a simple system-physical-address range.
+
+
+LIBNVDIMM/LIBNDCTL: Namespace
+-----------------------------
+
+A REGION, after resolving DPA aliasing and LABEL specified boundaries,
+surfaces one or more "namespace" devices.  The arrival of a "namespace"
+device currently triggers either the nd_blk or nd_pmem driver to load
+and register a disk/block device.
+
+LIBNVDIMM: namespace
+^^^^^^^^^^^^^^^^^^^^
+
+Here is a sample layout from the three major types of NAMESPACE where
+namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
+attribute), namespace2.0 represents a BLK namespace (note it has a
+'sector_size' attribute), and namespace6.0 represents an anonymous
+PMEM namespace (note that it has no 'uuid' attribute because it does not
+support a LABEL)::
+
+	/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
+	|-- alt_name
+	|-- devtype
+	|-- dpa_extents
+	|-- force_raw
+	|-- modalias
+	|-- numa_node
+	|-- resource
+	|-- size
+	|-- subsystem -> ../../../../../../bus/nd
+	|-- type
+	|-- uevent
+	`-- uuid
+	/sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
+	|-- alt_name
+	|-- devtype
+	|-- dpa_extents
+	|-- force_raw
+	|-- modalias
+	|-- numa_node
+	|-- sector_size
+	|-- size
+	|-- subsystem -> ../../../../../../bus/nd
+	|-- type
+	|-- uevent
+	`-- uuid
+	/sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
+	|-- block
+	|   `-- pmem0
+	|-- devtype
+	|-- driver -> ../../../../../../bus/nd/drivers/pmem
+	|-- force_raw
+	|-- modalias
+	|-- numa_node
+	|-- resource
+	|-- size
+	|-- subsystem -> ../../../../../../bus/nd
+	|-- type
+	`-- uevent
+
+LIBNDCTL: namespace enumeration example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Namespaces are indexed relative to their parent region, example below.
+These indexes are mostly static from boot to boot, but the subsystem makes
+no guarantees in this regard.  For a static namespace identifier use its
+'uuid' attribute.
+
+::
+
+  static struct ndctl_namespace
+  *get_namespace_by_id(struct ndctl_region *region, unsigned int id)
+  {
+          struct ndctl_namespace *ndns;
+
+          ndctl_namespace_foreach(region, ndns)
+                  if (ndctl_namespace_get_id(ndns) == id)
+                          return ndns;
+
+          return NULL;
+  }
+
+LIBNDCTL: namespace creation example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Idle namespaces are automatically created by the kernel if a given
+region has enough available capacity to create a new namespace.
+Namespace instantiation involves finding an idle namespace and
+configuring it.  For the most part the setting of namespace attributes
+can occur in any order, the only constraint is that 'uuid' must be set
+before 'size'.  This enables the kernel to track DPA allocations
+internally with a static identifier::
+
+  static int configure_namespace(struct ndctl_region *region,
+                  struct ndctl_namespace *ndns,
+                  struct namespace_parameters *parameters)
+  {
+          char devname[50];
+
+          snprintf(devname, sizeof(devname), "namespace%d.%d",
+                          ndctl_region_get_id(region), parameters->id);
+
+          ndctl_namespace_set_alt_name(ndns, devname);
+          /* 'uuid' must be set prior to setting size! */
+          ndctl_namespace_set_uuid(ndns, parameters->uuid);
+          ndctl_namespace_set_size(ndns, parameters->size);
+          /* unlike pmem namespaces, blk namespaces have a sector size */
+          if (parameters->lbasize)
+                  ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
+          /* propagate the result of enabling the namespace */
+          return ndctl_namespace_enable(ndns);
+  }
+
+
+Why the Term "namespace"?
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    1. Why not "volume" for instance?  "volume" ran the risk of confusing
+       ND (libnvdimm subsystem) with a volume manager like device-mapper.
+
+    2. The term originated to describe the sub-devices that can be created
+       within an NVME controller (see the nvme specification:
+       http://www.nvmexpress.org/specifications/), and NFIT namespaces are
+       meant to parallel the capabilities and configurability of
+       NVME-namespaces.
+
+
+LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
+-------------------------------------------------
+
+A BTT (design document: http://pmem.io/2014/09/23/btt.html) is a stacked
+block device driver that fronts either the whole block device or a
+partition of a block device emitted by either a PMEM or BLK NAMESPACE.
+
+LIBNVDIMM: btt layout
+^^^^^^^^^^^^^^^^^^^^^
+
+Every region will start out with at least one BTT device which is the
+seed device.  To activate it set the "namespace", "uuid", and
+"sector_size" attributes and then bind the device to the nd_pmem or
+nd_blk driver depending on the region type::
+
+	/sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/
+	|-- namespace
+	|-- delete
+	|-- devtype
+	|-- modalias
+	|-- numa_node
+	|-- sector_size
+	|-- subsystem -> ../../../../../bus/nd
+	|-- uevent
+	`-- uuid
+
+LIBNDCTL: btt creation example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Similar to namespaces an idle BTT device is automatically created per
+region.  Each time this "seed" btt device is configured and enabled a new
+seed is created.  Creating a BTT configuration involves two steps:
+finding an idle BTT and assigning it to consume a PMEM or BLK namespace::
+
+	static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
+	{
+		struct ndctl_btt *btt;
+
+		ndctl_btt_foreach(region, btt)
+			if (!ndctl_btt_is_enabled(btt)
+					&& !ndctl_btt_is_configured(btt))
+				return btt;
+
+		return NULL;
+	}
+
+	static int configure_btt(struct ndctl_region *region,
+			struct btt_parameters *parameters)
+	{
+		struct ndctl_btt *btt = get_idle_btt(region);
+
+		/* no idle seed device is available in this region */
+		if (!btt)
+			return -ENXIO;
+
+		ndctl_btt_set_uuid(btt, parameters->uuid);
+		ndctl_btt_set_sector_size(btt, parameters->sector_size);
+		ndctl_btt_set_namespace(btt, parameters->ndns);
+		/* turn off raw mode device */
+		ndctl_namespace_disable(parameters->ndns);
+		/* turn on btt access and propagate the result */
+		return ndctl_btt_enable(btt);
+	}
+
+Once instantiated a new inactive btt seed device will appear underneath
+the region.
+
+Once a "namespace" is removed from a BTT that instance of the BTT device
+will be deleted or otherwise reset to default values.  This deletion is
+only at the device model level.  In order to destroy a BTT the "info
+block" needs to be destroyed.  Note, that to destroy a BTT the media
+needs to be written in raw mode.  By default, the kernel will autodetect
+the presence of a BTT and disable raw mode.  This autodetect behavior
+can be suppressed by enabling raw mode for the namespace via the
+ndctl_namespace_set_raw_mode() API.
+
+
+Summary LIBNDCTL Diagram
+------------------------
+
+For the given example above, here is the view of the objects as seen by the
+LIBNDCTL API::
+
+              +---+
+              |CTX|    +---------+   +--------------+  +---------------+
+              +-+-+  +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
+                |    | +---------+   +--------------+  +---------------+
+  +-------+     |    | +---------+   +--------------+  +---------------+
+  | DIMM0 <-+   |    +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
+  +-------+ |   |    | +---------+   +--------------+  +---------------+
+  | DIMM1 <-+ +-v--+ | +---------+   +--------------+  +---------------+
+  +-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6  "blk2.0" |
+  | DIMM2 <-+ +----+ | +---------+ | +--------------+  +----------------------+
+  +-------+ |        |             +-> NAMESPACE2.1 +--> ND5  "blk2.1" | BTT2 |
+  | DIMM3 <-+        |               +--------------+  +----------------------+
+  +-------+          | +---------+   +--------------+  +---------------+
+                     +-> REGION3 +-+-> NAMESPACE3.0 +--> ND4  "blk3.0" |
+                     | +---------+ | +--------------+  +----------------------+
+                     |             +-> NAMESPACE3.1 +--> ND3  "blk3.1" | BTT1 |
+                     |               +--------------+  +----------------------+
+                     | +---------+   +--------------+  +---------------+
+                     +-> REGION4 +---> NAMESPACE4.0 +--> ND2  "blk4.0" |
+                     | +---------+   +--------------+  +---------------+
+                     | +---------+   +--------------+  +----------------------+
+                     +-> REGION5 +---> NAMESPACE5.0 +--> ND1  "blk5.0" | BTT0 |
+                       +---------+   +--------------+  +---------------+------+
diff --git a/Documentation/driver-api/nvdimm/security.rst b/Documentation/driver-api/nvdimm/security.rst
new file mode 100644
index 000000000000..ad9dea099b34
--- /dev/null
+++ b/Documentation/driver-api/nvdimm/security.rst
@@ -0,0 +1,143 @@
+===============
+NVDIMM Security
+===============
+
+1. Introduction
+---------------
+
+With the introduction of Intel Device Specific Methods (DSM) v1.8
+specification [1], security DSMs are introduced. The spec added the following
+security DSMs: "get security state", "set passphrase", "disable passphrase",
+"unlock unit", "freeze lock", "secure erase", and "overwrite". A security_ops
+data structure has been added to struct dimm in order to support the security
+operations and generic APIs are exposed to allow vendor neutral operations.
+
+2. Sysfs Interface
+------------------
+The "security" sysfs attribute is provided in the nvdimm sysfs directory. For
+example::
+
+	/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/security
+
+The "show" method of that attribute will display the security state for
+that DIMM. The following states are available: disabled, unlocked, locked,
+frozen, and overwrite. If security is not supported, the sysfs attribute
+will not be visible.
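+
+A tool that reads the attribute has to map the reported string onto one of
+the states listed above.  The following is a minimal userspace sketch; the
+enum and function names are hypothetical, and it assumes the attribute
+reports just the state name, optionally followed by a newline::
+
+	#include <stddef.h>
+	#include <string.h>
+
+	enum nvdimm_sec_state {
+		SEC_UNKNOWN = -1,
+		SEC_DISABLED,
+		SEC_UNLOCKED,
+		SEC_LOCKED,
+		SEC_FROZEN,
+		SEC_OVERWRITE,
+	};
+
+	static enum nvdimm_sec_state parse_security_state(const char *buf)
+	{
+		/* same order as the enum values starting at SEC_DISABLED */
+		static const char *const names[] = {
+			"disabled", "unlocked", "locked", "frozen", "overwrite",
+		};
+		size_t len = strcspn(buf, "\n");	/* drop trailing newline */
+		size_t i;
+
+		for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
+			if (strlen(names[i]) == len &&
+			    strncmp(buf, names[i], len) == 0)
+				return (enum nvdimm_sec_state)i;
+		return SEC_UNKNOWN;
+	}
+
+Comparing the full length avoids treating a prefix such as "lock" as a
+valid state.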
+
+The "store" method takes several commands when the attribute is written to,
+in order to support some of the security functions:
+
+- update <old_keyid> <new_keyid> - enable or update passphrase.
+- disable <keyid> - disable enabled security and remove key.
+- freeze - freeze changing of security states.
+- erase <keyid> - delete existing user encryption key.
+- overwrite <keyid> - wipe the entire nvdimm.
+- master_update <keyid> <new_keyid> - enable or update master passphrase.
+- master_erase <keyid> - delete existing user encryption key.
+
+3. Key Management
+-----------------
+
+The key is associated to the payload by the DIMM id. For example::
+
+	# cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/nfit/id
+	8089-a2-1740-00000133
+
+The DIMM id would be provided along with the key payload (passphrase) to
+the kernel.
+
+The security keys are managed on the basis of a single key per DIMM. The
+key "passphrase" is expected to be 32 bytes long. This is similar to the ATA
+security specification [2]. A key is initially acquired via the request_key()
+kernel API call during nvdimm unlock. It is up to the user to make sure that
+all the keys are in the kernel user keyring for unlock.
+
+An nvdimm encrypted-key of format enc32 has the description format of::
+
+	nvdimm:<bus-provider-specific-unique-id>
+
+See file ``Documentation/security/keys/trusted-encrypted.rst`` for creating
+encrypted-keys of enc32 format. TPM usage with a master trusted key is
+preferred for sealing the encrypted-keys.
+
+4. Unlocking
+------------
+When the DIMMs are being enumerated by the kernel, the kernel will attempt to
+retrieve the key from the kernel user keyring. This is the only time
+a locked DIMM can be unlocked. Once unlocked, the DIMM will remain unlocked
+until reboot. Typically an entity (e.g. a shell script) will inject all the
+relevant encrypted-keys into the kernel user keyring during the initramfs phase.
+This provides the unlock function access to all the related keys that contain
+the passphrase for the respective nvdimms.  It is also recommended that the
+keys are injected before libnvdimm is loaded by modprobe.
+
+5. Update
+---------
+When doing an update, it is expected that the existing key is removed from
+the kernel user keyring and reinjected as a different (old) key. It's irrelevant
+what the key description is for the old key since we are only interested in the
+keyid when doing the update operation. It is also expected that the new key
+is injected with the description format described earlier in this
+document.  The update command written to the sysfs attribute will be with
+the format::
+
+	update <old keyid> <new keyid>
+
+If there is no old keyid because security is being enabled, then a 0 should be
+passed in.
+
+6. Freeze
+---------
+The freeze operation does not require any keys. The security config can be
+frozen by a user with root privilege.
+
+7. Disable
+----------
+The security disable command format is:
+disable <keyid>
+
+A key with the current passphrase payload that is tied to the nvdimm should be
+in the kernel user keyring.
+
+8. Secure Erase
+---------------
+The command format for doing a secure erase is:
+erase <keyid>
+
+A key with the current passphrase payload that is tied to the nvdimm should be
+in the kernel user keyring.
+
+9. Overwrite
+------------
+The command format for doing an overwrite is:
+overwrite <keyid>
+
+Overwrite can be done without a key if security is not enabled. A key serial
+of 0 can be passed in to indicate no key.
+
+The sysfs attribute "security" can be polled to wait on overwrite completion.
+Overwrite can last tens of minutes or more depending on nvdimm size.
+
+An encrypted-key with the current user passphrase that is tied to the nvdimm
+should be injected and its keyid should be passed in via sysfs.
+
+10. Master Update
+-----------------
+The command format for doing a master update is:
+master_update <old keyid> <new keyid>
+
+The operating mechanism for master update is identical to update except the
+master passphrase key is passed to the kernel. The master passphrase key
+is just another encrypted-key.
+
+This command is only available when security is disabled.
+
+11. Master Erase
+----------------
+The command format for doing a master erase is:
+master_erase <current keyid>
+
+This command has the same operating mechanism as erase except the master
+passphrase key is passed to the kernel. The master passphrase key is just
+another encrypted-key.
+
+This command is only available when the master security is enabled, indicated
+by the extended security status.
+
+[1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
+
+[2]: http://www.t13.org/documents/UploadedDocuments/docs2006/e05179r4-ACS-SecurityClarifications.pdf
diff --git a/Documentation/nvdimm/btt.rst b/Documentation/nvdimm/btt.rst
deleted file mode 100644
index 2d8269f834bd..000000000000
--- a/Documentation/nvdimm/btt.rst
+++ /dev/null
@@ -1,285 +0,0 @@
-=============================
-BTT - Block Translation Table
-=============================
-
-
-1. Introduction
-===============
-
-Persistent memory based storage is able to perform IO at byte (or more
-accurately, cache line) granularity. However, we often want to expose such
-storage as traditional block devices. The block drivers for persistent memory
-will do exactly this. However, they do not provide any atomicity guarantees.
-Traditional SSDs typically provide protection against torn sectors in hardware,
-using stored energy in capacitors to complete in-flight block writes, or perhaps
-in firmware. We don't have this luxury with persistent memory - if a write is in
-progress, and we experience a power failure, the block will contain a mix of old
-and new data. Applications may not be prepared to handle such a scenario.
-
-The Block Translation Table (BTT) provides atomic sector update semantics for
-persistent memory devices, so that applications that rely on sector writes not
-being torn can continue to do so. The BTT manifests itself as a stacked block
-device, and reserves a portion of the underlying storage for its metadata. At
-the heart of it is an indirection table that re-maps all the blocks on the
-volume. It can be thought of as an extremely simple file system that only
-provides atomic sector updates.
-
-
-2. Static Layout
-================
-
-The underlying storage on which a BTT can be laid out is not limited in any way.
-The BTT, however, splits the available space into chunks of up to 512 GiB,
-called "Arenas".
-
-Each arena follows the same layout for its metadata, and all references in an
-arena are internal to it (with the exception of one field that points to the
-next arena). The following depicts the "On-disk" metadata layout::
-
-
-    Backing Store     +------->  Arena
-  +---------------+   |   +------------------+
-  |               |   |   | Arena info block |
-  |    Arena 0    +---+   |       4K         |
-  |     512G      |       +------------------+
-  |               |       |                  |
-  +---------------+       |                  |
-  |               |       |                  |
-  |    Arena 1    |       |   Data Blocks    |
-  |     512G      |       |                  |
-  |               |       |                  |
-  +---------------+       |                  |
-  |       .       |       |                  |
-  |       .       |       |                  |
-  |       .       |       |                  |
-  |               |       |                  |
-  |               |       |                  |
-  +---------------+       +------------------+
-                          |                  |
-                          |     BTT Map      |
-                          |                  |
-                          |                  |
-                          +------------------+
-                          |                  |
-                          |     BTT Flog     |
-                          |                  |
-                          +------------------+
-                          | Info block copy  |
-                          |       4K         |
-                          +------------------+
-
-
-3. Theory of Operation
-======================
-
-
-a. The BTT Map
---------------
-
-The map is a simple lookup/indirection table that maps an LBA to an internal
-block. Each map entry is 32 bits. The two most significant bits are special
-flags, and the remaining form the internal block number.
-
-======== =============================================================
-Bit      Description
-======== =============================================================
-31 - 30	 Error and Zero flags - Used in the following way:
-
-	   == ==  ====================================================
-	   31 30  Description
-	   == ==  ====================================================
-	   0  0	  Initial state. Reads return zeroes; Premap = Postmap
-	   0  1	  Zero state: Reads return zeroes
-	   1  0	  Error state: Reads fail; Writes clear 'E' bit
-	   1  1	  Normal Block – has valid postmap
-	   == ==  ====================================================
-
-29 - 0	 Mappings to internal 'postmap' blocks
-======== =============================================================
-
-
-Some of the terminology that will be subsequently used:
-
-============	================================================================
-External LBA	LBA as made visible to upper layers.
-ABA		Arena Block Address - Block offset/number within an arena
-Premap ABA	The block offset into an arena, which was decided upon by range
-		checking the External LBA
-Postmap ABA	The block number in the "Data Blocks" area obtained after
-		indirection from the map
-nfree		The number of free blocks that are maintained at any given time.
-		This is the number of concurrent writes that can happen to the
-		arena.
-============	================================================================
-
-
-For example, after adding a BTT, we surface a disk of 1024G. We get a read for
-the external LBA at 768G. This falls into the second arena, and of the 512G
-worth of blocks that this arena contributes, this block is at 256G. Thus, the
-premap ABA is 256G. We now refer to the map, and find out the mapping for block
-'X' (256G) points to block 'Y', say '64'. Thus the postmap ABA is 64.
-
-
-b. The BTT Flog
----------------
-
-The BTT provides sector atomicity by making every write an "allocating write",
-i.e. every write goes to a "free" block. A running list of free blocks is
-maintained in the form of the BTT flog. 'Flog' is a combination of the words
-"free list" and "log". The flog contains 'nfree' entries, and an entry contains:
-
-========  =====================================================================
-lba       The premap ABA that is being written to
-old_map   The old postmap ABA - after 'this' write completes, this will be a
-	  free block.
-new_map   The new postmap ABA. The map will be updated to reflect this
-	  lba->postmap_aba mapping, but we log it here in case we have to
-	  recover.
-seq	  Sequence number to mark which of the 2 sections of this flog entry is
-	  valid/newest. It cycles between 01->10->11->01 (binary) under normal
-	  operation, with 00 indicating an uninitialized state.
-lba'	  alternate lba entry
-old_map'  alternate old postmap entry
-new_map'  alternate new postmap entry
-seq'	  alternate sequence number.
-========  =====================================================================
-
-Each of the above fields is 32-bit, making one entry 32 bytes. Entries are also
-padded to 64 bytes to avoid cache line sharing or aliasing. Flog updates are
-done such that for any entry being written, it:
-a. overwrites the 'old' section in the entry based on sequence numbers
-b. writes the 'new' section such that the sequence number is written last.
-
-
-c. The concept of lanes
------------------------
-
-While 'nfree' describes the number of concurrent IOs an arena can process,
-'nlanes' is the number of concurrent IOs the BTT device as a whole can
-process::
-
-	nlanes = min(nfree, num_cpus)
-
-A lane number is obtained at the start of any IO, and is used for indexing into
-all the on-disk and in-memory data structures for the duration of the IO. If
-there are more CPUs than the max number of available lanes, then lanes are
-protected by spinlocks.
-
-
-d. In-memory data structure: Read Tracking Table (RTT)
-------------------------------------------------------
-
-Consider a case where we have two threads, one doing reads and the other,
-writes. We can hit a condition where the writer thread grabs a free block to do
-a new IO, but the (slow) reader thread is still reading from it. In other words,
-the reader consulted a map entry, and started reading the corresponding block. A
-writer started writing to the same external LBA, and finished the write updating
-the map for that external LBA to point to its new postmap ABA. At this point the
-internal, postmap block that the reader is (still) reading has been inserted
-into the list of free blocks. If another write comes in for the same LBA, it can
-grab this free block, and start writing to it, causing the reader to read
-incorrect data. To prevent this, we introduce the RTT.
-
-The RTT is a simple, per arena table with 'nfree' entries. Every reader inserts
-into rtt[lane_number], the postmap ABA it is reading, and clears it after the
-read is complete. Every writer thread, after grabbing a free block, checks the
-RTT for its presence. If the postmap free block is in the RTT, it waits till the
-reader clears the RTT entry, and only then starts writing to it.
-
-
-e. In-memory data structure: map locks
---------------------------------------
-
-Consider a case where two writer threads are writing to the same LBA. There can
-be a race in the following sequence of steps::
-
-	free[lane] = map[premap_aba]
-	map[premap_aba] = postmap_aba
-
-Both threads can update their respective free[lane] with the same old, freed
-postmap_aba. This has made the layout inconsistent by losing a free entry, and
-at the same time, duplicating another free entry for two lanes.
-
-To solve this, we could have a single map lock (per arena) that has to be taken
-before performing the above sequence, but we feel that could be too contentious.
-Instead we use an array of (nfree) map_locks that is indexed by
-(premap_aba modulo nfree).
-
-
-f. Reconstruction from the Flog
--------------------------------
-
-On startup, we analyze the BTT flog to create our list of free blocks. We walk
-through all the entries, and for each lane, of the set of two possible
-'sections', we always look at the most recent one only (based on the sequence
-number). The reconstruction rules/steps are simple:
-
-- Read map[log_entry.lba].
-- If log_entry.new matches the map entry, then log_entry.old is free.
-- If log_entry.new does not match the map entry, then log_entry.new is free.
-  (This case can only be caused by power-fails/unsafe shutdowns)
-
-
-g. Summarizing - Read and Write flows
--------------------------------------
-
-Read:
-
-1.  Convert external LBA to arena number + pre-map ABA
-2.  Get a lane (and take lane_lock)
-3.  Read map to get the entry for this pre-map ABA
-4.  Enter post-map ABA into RTT[lane]
-5.  If TRIM flag set in map, return zeroes, and end IO (go to step 8)
-6.  If ERROR flag set in map, end IO with EIO (go to step 8)
-7.  Read data from this block
-8.  Remove post-map ABA entry from RTT[lane]
-9.  Release lane (and lane_lock)
-
-Write:
-
-1.  Convert external LBA to Arena number + pre-map ABA
-2.  Get a lane (and take lane_lock)
-3.  Use lane to index into in-memory free list and obtain a new block, next flog
-    index, next sequence number
-4.  Scan the RTT to check if free block is present, and spin/wait if it is.
-5.  Write data to this free block
-6.  Read map to get the existing post-map ABA entry for this pre-map ABA
-7.  Write flog entry: [premap_aba / old postmap_aba / new postmap_aba / seq_num]
-8.  Write new post-map ABA into map.
-9.  Write old post-map entry into the free list
-10. Calculate next sequence number and write into the free list entry
-11. Release lane (and lane_lock)
-
-
-4. Error Handling
-=================
-
-An arena would be in an error state if any of the metadata is corrupted
-irrecoverably, either due to a bug or a media error. The following conditions
-indicate an error:
-
-- Info block checksum does not match (and recovering from the copy also fails)
-- All internal available blocks are not uniquely and entirely addressed by the
-  sum of mapped blocks and free blocks (from the BTT flog).
-- Rebuilding free list from the flog reveals missing/duplicate/impossible
-  entries
-- A map entry is out of bounds
-
-If any of these error conditions are encountered, the arena is put into a read
-only state using a flag in the info block.
-
-
-5. Usage
-========
-
-The BTT can be set up on any disk (namespace) exposed by the libnvdimm subsystem
-(pmem, or blk mode). The easiest way to set up such a namespace is using the
-'ndctl' utility [1]:
-
-For example, the ndctl command line to setup a btt with a 4k sector size is::
-
-    ndctl create-namespace -f -e namespace0.0 -m sector -l 4k
-
-See ndctl create-namespace --help for more options.
-
-[1]: https://github.com/pmem/ndctl
diff --git a/Documentation/nvdimm/index.rst b/Documentation/nvdimm/index.rst
deleted file mode 100644
index 1a3402d3775e..000000000000
--- a/Documentation/nvdimm/index.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-:orphan:
-
-===================================
-Non-Volatile Memory Device (NVDIMM)
-===================================
-
-.. toctree::
-   :maxdepth: 1
-
-   nvdimm
-   btt
-   security
diff --git a/Documentation/nvdimm/nvdimm.rst b/Documentation/nvdimm/nvdimm.rst
deleted file mode 100644
index 08f855cbb4e6..000000000000
--- a/Documentation/nvdimm/nvdimm.rst
+++ /dev/null
@@ -1,887 +0,0 @@
-===============================
-LIBNVDIMM: Non-Volatile Devices
-===============================
-
-libnvdimm - kernel / libndctl - userspace helper library
-
-linux-nvdimm@lists.01.org
-
-Version 13
-
-.. contents:
-
-	Glossary
-	Overview
-	    Supporting Documents
-	    Git Trees
-	LIBNVDIMM PMEM and BLK
-	Why BLK?
-	    PMEM vs BLK
-	        BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
-	Example NVDIMM Platform
-	LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
-	    LIBNDCTL: Context
-	        libndctl: instantiate a new library context example
-	    LIBNVDIMM/LIBNDCTL: Bus
-	        libnvdimm: control class device in /sys/class
-	        libnvdimm: bus
-	        libndctl: bus enumeration example
-	    LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
-	        libnvdimm: DIMM (NMEM)
-	        libndctl: DIMM enumeration example
-	    LIBNVDIMM/LIBNDCTL: Region
-	        libnvdimm: region
-	        libndctl: region enumeration example
-	        Why Not Encode the Region Type into the Region Name?
-	        How Do I Determine the Major Type of a Region?
-	    LIBNVDIMM/LIBNDCTL: Namespace
-	        libnvdimm: namespace
-	        libndctl: namespace enumeration example
-	        libndctl: namespace creation example
-	        Why the Term "namespace"?
-	    LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
-	        libnvdimm: btt layout
-	        libndctl: btt creation example
-	Summary LIBNDCTL Diagram
-
-
-Glossary
-========
-
-PMEM:
-  A system-physical-address range where writes are persistent.  A
-  block device composed of PMEM is capable of DAX.  A PMEM address range
-  may span an interleave of several DIMMs.
-
-BLK:
-  A set of one or more programmable memory mapped apertures provided
-  by a DIMM to access its media.  This indirection precludes the
-  performance benefit of interleaving, but enables DIMM-bounded failure
-  modes.
-
-DPA:
-  DIMM Physical Address, is a DIMM-relative offset.  With one DIMM in
-  the system there would be a 1:1 system-physical-address:DPA association.
-  Once more DIMMs are added a memory controller interleave must be
-  decoded to determine the DPA associated with a given
-  system-physical-address.  BLK capacity always has a 1:1 relationship
-  with a single-DIMM's DPA range.
-
-DAX:
-  File system extensions to bypass the page cache and block layer to
-  mmap persistent memory, from a PMEM block device, directly into a
-  process address space.
-
-DSM:
-  Device Specific Method: ACPI method to control specific
-  device - in this case the firmware.
-
-DCR:
-  NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
-  It defines a vendor-id, device-id, and interface format for a given DIMM.
-
-BTT:
-  Block Translation Table: Persistent memory is byte addressable.
-  Existing software may have an expectation that the power-fail-atomicity
-  of writes is at least one sector, 512 bytes.  The BTT is an indirection
-  table with atomic update semantics to front a PMEM/BLK block device
-  driver and present arbitrary atomic sector sizes.
-
-LABEL:
-  Metadata stored on a DIMM device that partitions and identifies
-  (persistently names) storage between PMEM and BLK.  It also partitions
-  BLK storage to host BTTs with different parameters per BLK-partition.
-  Note that traditional partition tables, GPT/MBR, are layered on top of a
-  BLK or PMEM device.
-
-
-Overview
-========
-
-The LIBNVDIMM subsystem provides support for three types of NVDIMMs, namely,
-PMEM, BLK, and NVDIMM devices that can simultaneously support both PMEM
-and BLK mode access.  These three modes of operation are described by
-the "NVDIMM Firmware Interface Table" (NFIT) in ACPI 6.  While the LIBNVDIMM
-implementation is generic and supports pre-NFIT platforms, it was guided
-by the superset of capabilities needed to support this ACPI 6 definition
-for NVDIMM resources.  The bulk of the kernel implementation is in place
-to handle the case where DPA accessible via PMEM is aliased with DPA
-accessible via BLK.  When that occurs a LABEL is needed to reserve DPA
-for exclusive access via one mode at a time.
-
-Supporting Documents
---------------------
-
-ACPI 6:
-	http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
-NVDIMM Namespace:
-	http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
-DSM Interface Example:
-	http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
-Driver Writer's Guide:
-	http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
-
-Git Trees
----------
-
-LIBNVDIMM:
-	https://git.kernel.org/cgit/linux/kernel/git/djbw/nvdimm.git
-LIBNDCTL:
-	https://github.com/pmem/ndctl.git
-PMEM:
-	https://github.com/01org/prd
-
-
-LIBNVDIMM PMEM and BLK
-======================
-
-Prior to the arrival of the NFIT, non-volatile memory was described to a
-system in various ad-hoc ways.  Usually only the bare minimum was
-provided, namely, a single system-physical-address range where writes
-are expected to be durable after a system power loss.  Now, the NFIT
-specification standardizes not only the description of PMEM, but also
-BLK and platform message-passing entry points for control and
-configuration.
-
-For each NVDIMM access method (PMEM, BLK), LIBNVDIMM provides a block
-device driver:
-
-    1. PMEM (nd_pmem.ko): Drives a system-physical-address range.  This
-       range is contiguous in system memory and may be interleaved (hardware
-       memory controller striped) across multiple DIMMs.  When interleaved the
-       platform may optionally provide details of which DIMMs are participating
-       in the interleave.
-
-       Note that while LIBNVDIMM describes system-physical-address ranges that may
-       alias with BLK access as ND_NAMESPACE_PMEM ranges and those without
-       alias as ND_NAMESPACE_IO ranges, to the nd_pmem driver there is no
-       distinction.  The different device-types are an implementation detail
-       that userspace can exploit to implement policies like "only interface
-       with address ranges from certain DIMMs".  It is worth noting that when
-       aliasing is present and a DIMM lacks a label, then no block device can
-       be created by default as userspace needs to do at least one allocation
-       of DPA to the PMEM range.  In contrast ND_NAMESPACE_IO ranges, once
-       registered, can be immediately attached to nd_pmem.
-
-    2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
-       defined apertures.  A set of apertures will access just one DIMM.
-       Multiple windows (apertures) allow multiple concurrent accesses, much like
-       tagged-command-queuing, and would likely be used by different threads or
-       different CPUs.
-
-       The NFIT specification defines a standard format for a BLK-aperture, but
-       the spec also allows for vendor specific layouts, and non-NFIT BLK
-       implementations may have other designs for BLK I/O.  For this reason
-       "nd_blk" calls back into platform-specific code to perform the I/O.
-
-       One such implementation is defined in the "Driver Writer's Guide" and "DSM
-       Interface Example".
-
-
-Why BLK?
-========
-
-While PMEM provides direct byte-addressable CPU-load/store access to
-NVDIMM storage, it does not provide the best system RAS (reliability,
-availability, and serviceability) model.  An access to a corrupted
-system-physical-address causes a CPU exception while an access
-to a corrupted address through a BLK-aperture causes that block window
-to raise an error status in a register.  The latter is more aligned with
-the standard error model that host-bus-adapter attached disks present.
-
-Also, if an administrator ever wants to replace a memory it is easier to
-service a system at DIMM module boundaries.  Compare this to PMEM where
-data could be interleaved in an opaque hardware specific manner across
-several DIMMs.
-
-PMEM vs BLK
------------
-
-BLK-apertures solve these RAS problems, but their presence is also the
-major contributing factor to the complexity of the ND subsystem.  They
-complicate the implementation because PMEM and BLK alias in DPA space.
-Any given DIMM's DPA-range may contribute to one or more
-system-physical-address sets of interleaved DIMMs, *and* may also be
-accessed in its entirety through its BLK-aperture.  Accessing a DPA
-through a system-physical-address while simultaneously accessing the
-same DPA through a BLK-aperture has undefined results.  For this reason,
-DIMMs with this dual interface configuration include a DSM function to
-store/retrieve a LABEL.  The LABEL effectively partitions the DPA-space
-into exclusive system-physical-address and BLK-aperture accessible
-regions.  For simplicity a DIMM is allowed a PMEM "region" per each
-interleave set in which it is a member.  The remaining DPA space can be
-carved into an arbitrary number of BLK devices with discontiguous
-extents.
-
-BLK-REGIONs, PMEM-REGIONs, Atomic Sectors, and DAX
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-One of the few
-reasons to allow multiple BLK namespaces per REGION is so that each
-BLK-namespace can be configured with a BTT with unique atomic sector
-sizes.  While a PMEM device can host a BTT the LABEL specification does
-not provide for a sector size to be specified for a PMEM namespace.
-
-This is due to the expectation that the primary usage model for PMEM is
-via DAX, and the BTT is incompatible with DAX.  However, for the cases
-where an application or filesystem still needs atomic sector update
-guarantees it can register a BTT on a PMEM device or partition.  See
-LIBNVDIMM/NDCTL: Block Translation Table "btt"
-
-
-Example NVDIMM Platform
-=======================
-
-For the remainder of this document the following diagram will be
-referenced for any example sysfs layouts::
-
-
-                               (a)               (b)           DIMM   BLK-REGION
-            +-------------------+--------+--------+--------+
-  +------+  |       pm0.0       | blk2.0 | pm1.0  | blk2.1 |    0      region2
-  | imc0 +--+- - - region0- - - +--------+        +--------+
-  +--+---+  |       pm0.0       | blk3.0 | pm1.0  | blk3.1 |    1      region3
-     |      +-------------------+--------v        v--------+
-  +--+---+                               |                 |
-  | cpu0 |                                     region1
-  +--+---+                               |                 |
-     |      +----------------------------^        ^--------+
-  +--+---+  |           blk4.0           | pm1.0  | blk4.0 |    2      region4
-  | imc1 +--+----------------------------|        +--------+
-  +------+  |           blk5.0           | pm1.0  | blk5.0 |    3      region5
-            +----------------------------+--------+--------+
-
-In this platform we have four DIMMs and two memory controllers in one
-socket.  Each unique interface (BLK or PMEM) to DPA space is identified
-by a region device with a dynamically assigned id (REGION0 - REGION5).
-
-    1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
-       single PMEM namespace is created in the REGION0-SPA-range that spans most
-       of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
-       interleaved system-physical-address range is reclaimed as BLK-aperture
-       accessed space starting at DPA-offset (a) into each DIMM.  In that
-       reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
-       REGION3 where "blk2.0" and "blk3.0" are just human readable names that
-       could be set to any user-desired name in the LABEL.
-
-    2. In the last portion of DIMM0 and DIMM1 we have an interleaved
-       system-physical-address range, REGION1, that spans those two DIMMs as
-       well as DIMM2 and DIMM3.  Some of REGION1 is allocated to a PMEM namespace
-       named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
-       each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
-       "blk5.0".
-
-    3. The portion of DIMM2 and DIMM3 that does not participate in the REGION1
-       interleaved system-physical-address range (i.e. the DPA address past
-       offset (b)) is also included in the "blk4.0" and "blk5.0" namespaces.
-       Note, that this example shows that BLK-aperture namespaces don't need to
-       be contiguous in DPA-space.
-
-    This bus is provided by the kernel under the device
-    /sys/devices/platform/nfit_test.0 when CONFIG_NFIT_TEST is enabled and
-    the nfit_test.ko module is loaded.  This not only tests LIBNVDIMM but the
-    acpi_nfit.ko driver as well.
-
-
-LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
-========================================================
-
-What follows is a description of the LIBNVDIMM sysfs layout and a
-corresponding object hierarchy diagram as viewed through the LIBNDCTL
-API.  The example sysfs paths and diagrams are relative to the Example
-NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
-test.
-
-LIBNDCTL: Context
------------------
-
-Every API call in the LIBNDCTL library requires a context that holds the
-logging parameters and other library instance state.  The library is
-based on the libabc template:
-
-	https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
-
-LIBNDCTL: instantiate a new library context example
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-::
-
-	struct ndctl_ctx *ctx;
-
-	if (ndctl_new(&ctx) == 0)
-		return ctx;
-	else
-		return NULL;
-
-LIBNVDIMM/LIBNDCTL: Bus
------------------------
-
-A bus has a 1:1 relationship with an NFIT.  The current expectation for
-ACPI based systems is that there is only ever one platform-global NFIT.
-That said, it is trivial to register multiple NFITs; the specification
-does not preclude it.  The infrastructure supports multiple busses and
-we use this capability to test multiple NFIT configurations in the unit
-test.
-
-LIBNVDIMM: control class device in /sys/class
----------------------------------------------
-
-This character device accepts DSM messages to be passed to DIMM
-identified by its NFIT handle::
-
-	/sys/class/nd/ndctl0
-	|-- dev
-	|-- device -> ../../../ndbus0
-	|-- subsystem -> ../../../../../../../class/nd
-
-
-
-LIBNVDIMM: bus
---------------
-
-::
-
-	struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
-	       struct nvdimm_bus_descriptor *nfit_desc);
-
-::
-
-	/sys/devices/platform/nfit_test.0/ndbus0
-	|-- commands
-	|-- nd
-	|-- nfit
-	|-- nmem0
-	|-- nmem1
-	|-- nmem2
-	|-- nmem3
-	|-- power
-	|-- provider
-	|-- region0
-	|-- region1
-	|-- region2
-	|-- region3
-	|-- region4
-	|-- region5
-	|-- uevent
-	`-- wait_probe
-
-LIBNDCTL: bus enumeration example
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Find the bus handle that describes the bus from Example NVDIMM Platform::
-
-	static struct ndctl_bus *get_bus_by_provider(struct ndctl_ctx *ctx,
-			const char *provider)
-	{
-		struct ndctl_bus *bus;
-
-		ndctl_bus_foreach(ctx, bus)
-			if (strcmp(provider, ndctl_bus_get_provider(bus)) == 0)
-				return bus;
-
-		return NULL;
-	}
-
-	bus = get_bus_by_provider(ctx, "nfit_test.0");
-
-
-LIBNVDIMM/LIBNDCTL: DIMM (NMEM)
--------------------------------
-
-The DIMM device provides a character device for sending commands to
-hardware, and it is a container for LABELs.  If the DIMM is defined by
-NFIT then an optional 'nfit' attribute sub-directory is available to add
-NFIT-specifics.
-
-Note that the kernel device name for "DIMMs" is "nmemX".  The NFIT
-describes these devices via "Memory Device to System Physical Address
-Range Mapping Structure", and there is no requirement that they actually
-be physical DIMMs, so we use a more generic name.
-
-LIBNVDIMM: DIMM (NMEM)
-^^^^^^^^^^^^^^^^^^^^^^
-
-::
-
-	struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, void *provider_data,
-			const struct attribute_group **groups, unsigned long flags,
-			unsigned long *dsm_mask);
-
-::
-
-	/sys/devices/platform/nfit_test.0/ndbus0
-	|-- nmem0
-	|   |-- available_slots
-	|   |-- commands
-	|   |-- dev
-	|   |-- devtype
-	|   |-- driver -> ../../../../../bus/nd/drivers/nvdimm
-	|   |-- modalias
-	|   |-- nfit
-	|   |   |-- device
-	|   |   |-- format
-	|   |   |-- handle
-	|   |   |-- phys_id
-	|   |   |-- rev_id
-	|   |   |-- serial
-	|   |   `-- vendor
-	|   |-- state
-	|   |-- subsystem -> ../../../../../bus/nd
-	|   `-- uevent
-	|-- nmem1
-	[..]
-
-
-LIBNDCTL: DIMM enumeration example
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Note, in this example we are assuming NFIT-defined DIMMs which are
-identified by an "nfit_handle", a 32-bit value where:
-
-   - Bit 3:0 DIMM number within the memory channel
-   - Bit 7:4 memory channel number
-   - Bit 11:8 memory controller ID
-   - Bit 15:12 socket ID (within scope of a Node controller if node
-     controller is present)
-   - Bit 27:16 Node Controller ID
-   - Bit 31:28 Reserved
-
-::
-
-	static struct ndctl_dimm *get_dimm_by_handle(struct ndctl_bus *bus,
-	       unsigned int handle)
-	{
-		struct ndctl_dimm *dimm;
-
-		ndctl_dimm_foreach(bus, dimm)
-			if (ndctl_dimm_get_handle(dimm) == handle)
-				return dimm;
-
-		return NULL;
-	}
-
-	#define DIMM_HANDLE(n, s, i, c, d) \
-		(((n & 0xfff) << 16) | ((s & 0xf) << 12) | ((i & 0xf) << 8) \
-		 | ((c & 0xf) << 4) | (d & 0xf))
-
-	dimm = get_dimm_by_handle(bus, DIMM_HANDLE(0, 0, 0, 0, 0));
-
-LIBNVDIMM/LIBNDCTL: Region
---------------------------
-
-A generic REGION device is registered for each PMEM range or BLK-aperture
-set.  Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
-sets on the "nfit_test.0" bus.  The primary role of regions are to be a
-container of "mappings".  A mapping is a tuple of <DIMM,
-DPA-start-offset, length>.
-
-LIBNVDIMM provides a built-in driver for these REGION devices.  This driver
-is responsible for reconciling the aliased DPA mappings across all
-regions, parsing the LABEL, if present, and then emitting NAMESPACE
-devices with the resolved/exclusive DPA-boundaries for the nd_pmem or
-nd_blk device driver to consume.
-
-In addition to the generic attributes of "mapping"s, "interleave_ways"
-and "size" the REGION device also exports some convenience attributes.
-"nstype" indicates the integer type of namespace-device this region
-emits, "devtype" duplicates the DEVTYPE variable stored by udev at the
-'add' event, "modalias" duplicates the MODALIAS variable stored by udev
-at the 'add' event, and finally, the optional "spa_index" is provided in
-the case where the region is defined by a SPA.
-
-LIBNVDIMM: region::
-
-	struct nd_region *nvdimm_pmem_region_create(struct nvdimm_bus *nvdimm_bus,
-			struct nd_region_desc *ndr_desc);
-	struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus,
-			struct nd_region_desc *ndr_desc);
-
-::
-
-	/sys/devices/platform/nfit_test.0/ndbus0
-	|-- region0
-	|   |-- available_size
-	|   |-- btt0
-	|   |-- btt_seed
-	|   |-- devtype
-	|   |-- driver -> ../../../../../bus/nd/drivers/nd_region
-	|   |-- init_namespaces
-	|   |-- mapping0
-	|   |-- mapping1
-	|   |-- mappings
-	|   |-- modalias
-	|   |-- namespace0.0
-	|   |-- namespace_seed
-	|   |-- numa_node
-	|   |-- nfit
-	|   |   `-- spa_index
-	|   |-- nstype
-	|   |-- set_cookie
-	|   |-- size
-	|   |-- subsystem -> ../../../../../bus/nd
-	|   `-- uevent
-	|-- region1
-	[..]
-
-LIBNDCTL: region enumeration example
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Sample region retrieval routines based on NFIT-unique data like
-"spa_index" (interleave set id) for PMEM and "nfit_handle" (dimm id) for
-BLK::
-
-	static struct ndctl_region *get_pmem_region_by_spa_index(struct ndctl_bus *bus,
-			unsigned int spa_index)
-	{
-		struct ndctl_region *region;
-
-		ndctl_region_foreach(bus, region) {
-			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_PMEM)
-				continue;
-			if (ndctl_region_get_spa_index(region) == spa_index)
-				return region;
-		}
-		return NULL;
-	}
-
-	static struct ndctl_region *get_blk_region_by_dimm_handle(struct ndctl_bus *bus,
-			unsigned int handle)
-	{
-		struct ndctl_region *region;
-
-		ndctl_region_foreach(bus, region) {
-			struct ndctl_mapping *map;
-
-			if (ndctl_region_get_type(region) != ND_DEVICE_REGION_BLOCK)
-				continue;
-			ndctl_mapping_foreach(region, map) {
-				struct ndctl_dimm *dimm = ndctl_mapping_get_dimm(map);
-
-				if (ndctl_dimm_get_handle(dimm) == handle)
-					return region;
-			}
-		}
-		return NULL;
-	}
-
-
-Why Not Encode the Region Type into the Region Name?
-----------------------------------------------------
-
-At first glance it seems since NFIT defines just PMEM and BLK interface
-types that we should simply name REGION devices with something derived
-from those type names.  However, the ND subsystem explicitly keeps the
-REGION name generic and expects userspace to always consider the
-region-attributes for four reasons:
-
-    1. There are already more than two REGION and "namespace" types.  For
-       PMEM there are two subtypes.  As mentioned previously we have PMEM where
-       the constituent DIMM devices are known and anonymous PMEM.  For BLK
-       regions the NFIT specification already anticipates vendor specific
-       implementations.  The exact distinction of what a region contains is in
-       the region-attributes not the region-name or the region-devtype.
-
-    2. A region with zero child-namespaces is a possible configuration.  For
-       example, the NFIT allows for a DCR to be published without a
-       corresponding BLK-aperture.  This equates to a DIMM that can only accept
-       control/configuration messages, but no i/o through a descendant block
-       device.  Again, this "type" is advertised in the attributes ('mappings'
-       == 0) and the name does not tell you much.
-
-    3. What if a third major interface type arises in the future?  Outside
-       of vendor specific implementations, it's not difficult to envision a
-       third class of interface type beyond BLK and PMEM.  With a generic name
-       for the REGION level of the device-hierarchy old userspace
-       implementations can still make sense of new kernel advertised
-       region-types.  Userspace can always rely on the generic region
-       attributes like "mappings", "size", etc and the expected child devices
-       named "namespace".  This generic format of the device-model hierarchy
-       allows the LIBNVDIMM and LIBNDCTL implementations to be more uniform and
-       future-proof.
-
-    4. There are more robust mechanisms for determining the major type of a
-       region than a device name.  See the next section, How Do I Determine the
-       Major Type of a Region?
-
-How Do I Determine the Major Type of a Region?
-----------------------------------------------
-
-Outside of the blanket recommendation of "use libndctl", or simply
-looking at the kernel header (/usr/include/linux/ndctl.h) to decode the
-"nstype" integer attribute, here are some other options.
-
-1. module alias lookup
-^^^^^^^^^^^^^^^^^^^^^^
-
-    The whole point of region/namespace device type differentiation is to
-    decide which block-device driver will attach to a given LIBNVDIMM namespace.
-    One can simply use the modalias to lookup the resulting module.  It's
-    important to note that this method is robust in the presence of a
-    vendor-specific driver down the road.  If a vendor-specific
-    implementation wants to supplant the standard nd_blk driver it can with
-    minimal impact to the rest of LIBNVDIMM.
-
-    In fact, a vendor may also want to have a vendor-specific region-driver
-    (outside of nd_region).  For example, if a vendor defined its own LABEL
-    format it would need its own region driver to parse that LABEL and emit
-    the resulting namespaces.  The output from module resolution is more
-    accurate than a region-name or region-devtype.
-
-2. udev
-^^^^^^^
-
-    The kernel "devtype" is registered in the udev database::
-
-	# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region0
-	P: /devices/platform/nfit_test.0/ndbus0/region0
-	E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region0
-	E: DEVTYPE=nd_pmem
-	E: MODALIAS=nd:t2
-	E: SUBSYSTEM=nd
-
-	# udevadm info --path=/devices/platform/nfit_test.0/ndbus0/region4
-	P: /devices/platform/nfit_test.0/ndbus0/region4
-	E: DEVPATH=/devices/platform/nfit_test.0/ndbus0/region4
-	E: DEVTYPE=nd_blk
-	E: MODALIAS=nd:t3
-	E: SUBSYSTEM=nd
-
-    ...and is available as a region attribute, but keep in mind that the
-    "devtype" does not indicate sub-type variations and scripts should
-    really be understanding the other attributes.
-
-3. type specific attributes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-    As it currently stands a BLK-aperture region will never have a
-    "nfit/spa_index" attribute, but neither will a non-NFIT PMEM region.  A
-    BLK region with a "mappings" value of 0 is, as mentioned above, a DIMM
-    that does not allow I/O.  A PMEM region with a "mappings" value of zero
-    is a simple system-physical-address range.
-
-
-LIBNVDIMM/LIBNDCTL: Namespace
------------------------------
-
-A REGION, after resolving DPA aliasing and LABEL specified boundaries,
-surfaces one or more "namespace" devices.  The arrival of a "namespace"
-device currently triggers either the nd_blk or nd_pmem driver to load
-and register a disk/block device.
-
-LIBNVDIMM: namespace
-^^^^^^^^^^^^^^^^^^^^
-
-Here is a sample layout from the three major types of NAMESPACE where
-namespace0.0 represents DIMM-info-backed PMEM (note that it has a 'uuid'
-attribute), namespace2.0 represents a BLK namespace (note it has a
-'sector_size' attribute) that, and namespace6.0 represents an anonymous
-PMEM namespace (note that has no 'uuid' attribute due to not support a
-LABEL)::
-
-	/sys/devices/platform/nfit_test.0/ndbus0/region0/namespace0.0
-	|-- alt_name
-	|-- devtype
-	|-- dpa_extents
-	|-- force_raw
-	|-- modalias
-	|-- numa_node
-	|-- resource
-	|-- size
-	|-- subsystem -> ../../../../../../bus/nd
-	|-- type
-	|-- uevent
-	`-- uuid
-	/sys/devices/platform/nfit_test.0/ndbus0/region2/namespace2.0
-	|-- alt_name
-	|-- devtype
-	|-- dpa_extents
-	|-- force_raw
-	|-- modalias
-	|-- numa_node
-	|-- sector_size
-	|-- size
-	|-- subsystem -> ../../../../../../bus/nd
-	|-- type
-	|-- uevent
-	`-- uuid
-	/sys/devices/platform/nfit_test.1/ndbus1/region6/namespace6.0
-	|-- block
-	|   `-- pmem0
-	|-- devtype
-	|-- driver -> ../../../../../../bus/nd/drivers/pmem
-	|-- force_raw
-	|-- modalias
-	|-- numa_node
-	|-- resource
-	|-- size
-	|-- subsystem -> ../../../../../../bus/nd
-	|-- type
-	`-- uevent
-
-LIBNDCTL: namespace enumeration example
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Namespaces are indexed relative to their parent region, example below.
-These indexes are mostly static from boot to boot, but subsystem makes
-no guarantees in this regard.  For a static namespace identifier use its
-'uuid' attribute.
-
-::
-
-  static struct ndctl_namespace
-  *get_namespace_by_id(struct ndctl_region *region, unsigned int id)
-  {
-          struct ndctl_namespace *ndns;
-
-          ndctl_namespace_foreach(region, ndns)
-                  if (ndctl_namespace_get_id(ndns) == id)
-                          return ndns;
-
-          return NULL;
-  }
-
-LIBNDCTL: namespace creation example
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Idle namespaces are automatically created by the kernel if a given
-region has enough available capacity to create a new namespace.
-Namespace instantiation involves finding an idle namespace and
-configuring it.  For the most part the setting of namespace attributes
-can occur in any order, the only constraint is that 'uuid' must be set
-before 'size'.  This enables the kernel to track DPA allocations
-internally with a static identifier::
-
-  static int configure_namespace(struct ndctl_region *region,
-                  struct ndctl_namespace *ndns,
-                  struct namespace_parameters *parameters)
-  {
-          char devname[50];
-
-          snprintf(devname, sizeof(devname), "namespace%d.%d",
-                          ndctl_region_get_id(region), paramaters->id);
-
-          ndctl_namespace_set_alt_name(ndns, devname);
-          /* 'uuid' must be set prior to setting size! */
-          ndctl_namespace_set_uuid(ndns, paramaters->uuid);
-          ndctl_namespace_set_size(ndns, paramaters->size);
-          /* unlike pmem namespaces, blk namespaces have a sector size */
-          if (parameters->lbasize)
-                  ndctl_namespace_set_sector_size(ndns, parameters->lbasize);
-          ndctl_namespace_enable(ndns);
-  }
-
-
-Why the Term "namespace"?
-^^^^^^^^^^^^^^^^^^^^^^^^^
-
-    1. Why not "volume" for instance?  "volume" ran the risk of confusing
-       ND (libnvdimm subsystem) to a volume manager like device-mapper.
-
-    2. The term originated to describe the sub-devices that can be created
-       within a NVME controller (see the nvme specification:
-       http://www.nvmexpress.org/specifications/), and NFIT namespaces are
-       meant to parallel the capabilities and configurability of
-       NVME-namespaces.
-
-
-LIBNVDIMM/LIBNDCTL: Block Translation Table "btt"
--------------------------------------------------
-
-A BTT (design document: http://pmem.io/2014/09/23/btt.html) is a stacked
-block device driver that fronts either the whole block device or a
-partition of a block device emitted by either a PMEM or BLK NAMESPACE.
-
-LIBNVDIMM: btt layout
-^^^^^^^^^^^^^^^^^^^^^
-
-Every region will start out with at least one BTT device which is the
-seed device.  To activate it set the "namespace", "uuid", and
-"sector_size" attributes and then bind the device to the nd_pmem or
-nd_blk driver depending on the region type::
-
-	/sys/devices/platform/nfit_test.1/ndbus0/region0/btt0/
-	|-- namespace
-	|-- delete
-	|-- devtype
-	|-- modalias
-	|-- numa_node
-	|-- sector_size
-	|-- subsystem -> ../../../../../bus/nd
-	|-- uevent
-	`-- uuid
-
-LIBNDCTL: btt creation example
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Similar to namespaces an idle BTT device is automatically created per
-region.  Each time this "seed" btt device is configured and enabled a new
-seed is created.  Creating a BTT configuration involves two steps of
-finding and idle BTT and assigning it to consume a PMEM or BLK namespace::
-
-	static struct ndctl_btt *get_idle_btt(struct ndctl_region *region)
-	{
-		struct ndctl_btt *btt;
-
-		ndctl_btt_foreach(region, btt)
-			if (!ndctl_btt_is_enabled(btt)
-					&& !ndctl_btt_is_configured(btt))
-				return btt;
-
-		return NULL;
-	}
-
-	static int configure_btt(struct ndctl_region *region,
-			struct btt_parameters *parameters)
-	{
-		btt = get_idle_btt(region);
-
-		ndctl_btt_set_uuid(btt, parameters->uuid);
-		ndctl_btt_set_sector_size(btt, parameters->sector_size);
-		ndctl_btt_set_namespace(btt, parameters->ndns);
-		/* turn off raw mode device */
-		ndctl_namespace_disable(parameters->ndns);
-		/* turn on btt access */
-		ndctl_btt_enable(btt);
-	}
-
-Once instantiated a new inactive btt seed device will appear underneath
-the region.
-
-Once a "namespace" is removed from a BTT that instance of the BTT device
-will be deleted or otherwise reset to default values.  This deletion is
-only at the device model level.  In order to destroy a BTT the "info
-block" needs to be destroyed.  Note, that to destroy a BTT the media
-needs to be written in raw mode.  By default, the kernel will autodetect
-the presence of a BTT and disable raw mode.  This autodetect behavior
-can be suppressed by enabling raw mode for the namespace via the
-ndctl_namespace_set_raw_mode() API.
-
-
-Summary LIBNDCTL Diagram
-------------------------
-
-For the given example above, here is the view of the objects as seen by the
-LIBNDCTL API::
-
-              +---+
-              |CTX|    +---------+   +--------------+  +---------------+
-              +-+-+  +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
-                |    | +---------+   +--------------+  +---------------+
-  +-------+     |    | +---------+   +--------------+  +---------------+
-  | DIMM0 <-+   |    +-> REGION1 +---> NAMESPACE1.0 +--> PMEM6 "pm1.0" |
-  +-------+ |   |    | +---------+   +--------------+  +---------------+
-  | DIMM1 <-+ +-v--+ | +---------+   +--------------+  +---------------+
-  +-------+ +-+BUS0+---> REGION2 +-+-> NAMESPACE2.0 +--> ND6  "blk2.0" |
-  | DIMM2 <-+ +----+ | +---------+ | +--------------+  +----------------------+
-  +-------+ |        |             +-> NAMESPACE2.1 +--> ND5  "blk2.1" | BTT2 |
-  | DIMM3 <-+        |               +--------------+  +----------------------+
-  +-------+          | +---------+   +--------------+  +---------------+
-                     +-> REGION3 +-+-> NAMESPACE3.0 +--> ND4  "blk3.0" |
-                     | +---------+ | +--------------+  +----------------------+
-                     |             +-> NAMESPACE3.1 +--> ND3  "blk3.1" | BTT1 |
-                     |               +--------------+  +----------------------+
-                     | +---------+   +--------------+  +---------------+
-                     +-> REGION4 +---> NAMESPACE4.0 +--> ND2  "blk4.0" |
-                     | +---------+   +--------------+  +---------------+
-                     | +---------+   +--------------+  +----------------------+
-                     +-> REGION5 +---> NAMESPACE5.0 +--> ND1  "blk5.0" | BTT0 |
-                       +---------+   +--------------+  +---------------+------+
diff --git a/Documentation/nvdimm/security.rst b/Documentation/nvdimm/security.rst
deleted file mode 100644
index ad9dea099b34..000000000000
--- a/Documentation/nvdimm/security.rst
+++ /dev/null
@@ -1,143 +0,0 @@
-===============
-NVDIMM Security
-===============
-
-1. Introduction
----------------
-
-With the introduction of Intel Device Specific Methods (DSM) v1.8
-specification [1], security DSMs are introduced. The spec added the following
-security DSMs: "get security state", "set passphrase", "disable passphrase",
-"unlock unit", "freeze lock", "secure erase", and "overwrite". A security_ops
-data structure has been added to struct dimm in order to support the security
-operations and generic APIs are exposed to allow vendor neutral operations.
-
-2. Sysfs Interface
-------------------
-The "security" sysfs attribute is provided in the nvdimm sysfs directory. For
-example:
-/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/security
-
-The "show" attribute of that attribute will display the security state for
-that DIMM. The following states are available: disabled, unlocked, locked,
-frozen, and overwrite. If security is not supported, the sysfs attribute
-will not be visible.
-
-The "store" attribute takes several commands when it is being written to
-in order to support some of the security functionalities:
-update <old_keyid> <new_keyid> - enable or update passphrase.
-disable <keyid> - disable enabled security and remove key.
-freeze - freeze changing of security states.
-erase <keyid> - delete existing user encryption key.
-overwrite <keyid> - wipe the entire nvdimm.
-master_update <keyid> <new_keyid> - enable or update master passphrase.
-master_erase <keyid> - delete existing user encryption key.
-
-3. Key Management
------------------
-
-The key is associated to the payload by the DIMM id. For example:
-# cat /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus0/nmem0/nfit/id
-8089-a2-1740-00000133
-The DIMM id would be provided along with the key payload (passphrase) to
-the kernel.
-
-The security keys are managed on the basis of a single key per DIMM. The
-key "passphrase" is expected to be 32bytes long. This is similar to the ATA
-security specification [2]. A key is initially acquired via the request_key()
-kernel API call during nvdimm unlock. It is up to the user to make sure that
-all the keys are in the kernel user keyring for unlock.
-
-A nvdimm encrypted-key of format enc32 has the description format of:
-nvdimm:<bus-provider-specific-unique-id>
-
-See file ``Documentation/security/keys/trusted-encrypted.rst`` for creating
-encrypted-keys of enc32 format. TPM usage with a master trusted key is
-preferred for sealing the encrypted-keys.
-
-4. Unlocking
-------------
-When the DIMMs are being enumerated by the kernel, the kernel will attempt to
-retrieve the key from the kernel user keyring. This is the only time
-a locked DIMM can be unlocked. Once unlocked, the DIMM will remain unlocked
-until reboot. Typically an entity (i.e. shell script) will inject all the
-relevant encrypted-keys into the kernel user keyring during the initramfs phase.
-This provides the unlock function access to all the related keys that contain
-the passphrase for the respective nvdimms.  It is also recommended that the
-keys are injected before libnvdimm is loaded by modprobe.
-
-5. Update
----------
-When doing an update, it is expected that the existing key is removed from
-the kernel user keyring and reinjected as different (old) key. It's irrelevant
-what the key description is for the old key since we are only interested in the
-keyid when doing the update operation. It is also expected that the new key
-is injected with the description format described from earlier in this
-document.  The update command written to the sysfs attribute will be with
-the format:
-update <old keyid> <new keyid>
-
-If there is no old keyid due to a security enabling, then a 0 should be
-passed in.
-
-6. Freeze
----------
-The freeze operation does not require any keys. The security config can be
-frozen by a user with root privelege.
-
-7. Disable
-----------
-The security disable command format is:
-disable <keyid>
-
-An key with the current passphrase payload that is tied to the nvdimm should be
-in the kernel user keyring.
-
-8. Secure Erase
----------------
-The command format for doing a secure erase is:
-erase <keyid>
-
-An key with the current passphrase payload that is tied to the nvdimm should be
-in the kernel user keyring.
-
-9. Overwrite
-------------
-The command format for doing an overwrite is:
-overwrite <keyid>
-
-Overwrite can be done without a key if security is not enabled. A key serial
-of 0 can be passed in to indicate no key.
-
-The sysfs attribute "security" can be polled to wait on overwrite completion.
-Overwrite can last tens of minutes or more depending on nvdimm size.
-
-An encrypted-key with the current user passphrase that is tied to the nvdimm
-should be injected and its keyid should be passed in via sysfs.
-
-10. Master Update
------------------
-The command format for doing a master update is:
-update <old keyid> <new keyid>
-
-The operating mechanism for master update is identical to update except the
-master passphrase key is passed to the kernel. The master passphrase key
-is just another encrypted-key.
-
-This command is only available when security is disabled.
-
-11. Master Erase
-----------------
-The command format for doing a master erase is:
-master_erase <current keyid>
-
-This command has the same operating mechanism as erase except the master
-passphrase key is passed to the kernel. The master passphrase key is just
-another encrypted-key.
-
-This command is only available when the master security is enabled, indicated
-by the extended security status.
-
-[1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
-
-[2]: http://www.t13.org/documents/UploadedDocuments/docs2006/e05179r4-ACS-SecurityClarifications.pdf
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index e89c1c332407..a5fde15e91d3 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -33,7 +33,7 @@ config BLK_DEV_PMEM
 	  Documentation/admin-guide/kernel-parameters.rst).  This driver converts
 	  these persistent memory ranges into block devices that are
 	  capable of DAX (direct-access) file system mappings.  See
-	  Documentation/nvdimm/nvdimm.rst for more details.
+	  Documentation/driver-api/nvdimm/nvdimm.rst for more details.
 
 	  Say Y if you want to use an NVDIMM
 
-- 
cgit v1.2.3-55-g7522


From bf6b7a742e3f82b3132e149fb17761e84207f9f1 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:33:50 -0300
Subject: docs: namespace: move it to the admin-guide

As stated in the documentation, this is meant to help
users better understand namespaces.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/index.rst                |  1 +
 .../admin-guide/namespaces/compatibility-list.rst  | 43 ++++++++++++++++++++++
 Documentation/admin-guide/namespaces/index.rst     |  9 +++++
 .../admin-guide/namespaces/resource-control.rst    | 18 +++++++++
 Documentation/namespaces/compatibility-list.rst    | 43 ----------------------
 Documentation/namespaces/index.rst                 | 11 ------
 Documentation/namespaces/resource-control.rst      | 18 ---------
 7 files changed, 71 insertions(+), 72 deletions(-)
 create mode 100644 Documentation/admin-guide/namespaces/compatibility-list.rst
 create mode 100644 Documentation/admin-guide/namespaces/index.rst
 create mode 100644 Documentation/admin-guide/namespaces/resource-control.rst
 delete mode 100644 Documentation/namespaces/compatibility-list.rst
 delete mode 100644 Documentation/namespaces/index.rst
 delete mode 100644 Documentation/namespaces/resource-control.rst

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index f40c4b5a181b..abc2c4e83939 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -77,6 +77,7 @@ configure specific aspects of kernel behavior to your liking.
    thunderbolt
    LSM/index
    mm/index
+   namespaces/index
    perf-security
    acpi/index
 
diff --git a/Documentation/admin-guide/namespaces/compatibility-list.rst b/Documentation/admin-guide/namespaces/compatibility-list.rst
new file mode 100644
index 000000000000..318800b2a943
--- /dev/null
+++ b/Documentation/admin-guide/namespaces/compatibility-list.rst
@@ -0,0 +1,43 @@
+=============================
+Namespaces compatibility list
+=============================
+
+This document describes the problems users may run into
+when creating tasks that live in different namespaces.
+
+Here's the summary. This matrix shows the known problems that
+occur when tasks share some namespace (the columns) while living
+in different other namespaces (the rows):
+
+====	===	===	===	===	====	===
+-	UTS	IPC	VFS	PID	User	Net
+====	===	===	===	===	====	===
+UTS	 X
+IPC		 X	 1
+VFS			 X
+PID		 1	 1	 X
+User		 2	 2		 X
+Net						 X
+====	===	===	===	===	====	===
+
+1. Both the IPC and the PID namespaces provide IDs to address
+   objects inside the kernel, e.g. a semaphore with an IPCID or
+   a process group with a pid.
+
+   In both cases, tasks shouldn't try exposing this ID to some
+   other task living in a different namespace via a shared filesystem
+   or IPC shmem/message. The fact is that this ID is only valid
+   within the namespace it was obtained in and may refer to some
+   other object in another namespace.
+
+2. Intentionally, two equal user IDs in different user namespaces
+   should not be equal from the VFS point of view. In other
+   words, user 10 in one user namespace shouldn't have the same
+   access permissions to files belonging to user 10 in another
+   namespace.
+
+   The same is true for the IPC namespaces being shared - two users
+   from different user namespaces should not access the same IPC objects
+   even if they have equal UIDs.
+
+   But currently this is not so.
diff --git a/Documentation/admin-guide/namespaces/index.rst b/Documentation/admin-guide/namespaces/index.rst
new file mode 100644
index 000000000000..713ec4949fa7
--- /dev/null
+++ b/Documentation/admin-guide/namespaces/index.rst
@@ -0,0 +1,9 @@
+==========
+Namespaces
+==========
+
+.. toctree::
+   :maxdepth: 1
+
+   compatibility-list
+   resource-control
diff --git a/Documentation/admin-guide/namespaces/resource-control.rst b/Documentation/admin-guide/namespaces/resource-control.rst
new file mode 100644
index 000000000000..369556e00f0c
--- /dev/null
+++ b/Documentation/admin-guide/namespaces/resource-control.rst
@@ -0,0 +1,18 @@
+===========================
+Namespaces resource control
+===========================
+
+There are many kinds of objects in the kernel that don't have
+individual limits or that have limits that are ineffective when a set
+of processes is allowed to switch user ids.  With user namespaces
+enabled in a kernel for people who don't trust their users or their
+users' programs to play nice, this problem becomes more acute.
+
+Therefore it is recommended that memory control groups be enabled in
+kernels that enable user namespaces, and it is further recommended
+that userspace configure memory control groups to limit how much
+memory users they don't trust to play nice can use.
+
+Memory control groups can be configured by installing the libcgroup
+package (present on most distros), editing /etc/cgrules.conf and
+/etc/cgconfig.conf, and setting up libpam-cgroup.
diff --git a/Documentation/namespaces/compatibility-list.rst b/Documentation/namespaces/compatibility-list.rst
deleted file mode 100644
index 318800b2a943..000000000000
--- a/Documentation/namespaces/compatibility-list.rst
+++ /dev/null
@@ -1,43 +0,0 @@
-=============================
-Namespaces compatibility list
-=============================
-
-This document contains the information about the problems user
-may have when creating tasks living in different namespaces.
-
-Here's the summary. This matrix shows the known problems, that
-occur when tasks share some namespace (the columns) while living
-in different other namespaces (the rows):
-
-====	===	===	===	===	====	===
--	UTS	IPC	VFS	PID	User	Net
-====	===	===	===	===	====	===
-UTS	 X
-IPC		 X	 1
-VFS			 X
-PID		 1	 1	 X
-User		 2	 2		 X
-Net						 X
-====	===	===	===	===	====	===
-
-1. Both the IPC and the PID namespaces provide IDs to address
-   object inside the kernel. E.g. semaphore with IPCID or
-   process group with pid.
-
-   In both cases, tasks shouldn't try exposing this ID to some
-   other task living in a different namespace via a shared filesystem
-   or IPC shmem/message. The fact is that this ID is only valid
-   within the namespace it was obtained in and may refer to some
-   other object in another namespace.
-
-2. Intentionally, two equal user IDs in different user namespaces
-   should not be equal from the VFS point of view. In other
-   words, user 10 in one user namespace shouldn't have the same
-   access permissions to files, belonging to user 10 in another
-   namespace.
-
-   The same is true for the IPC namespaces being shared - two users
-   from different user namespaces should not access the same IPC objects
-   even having equal UIDs.
-
-   But currently this is not so.
diff --git a/Documentation/namespaces/index.rst b/Documentation/namespaces/index.rst
deleted file mode 100644
index bf40625dd11a..000000000000
--- a/Documentation/namespaces/index.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-:orphan:
-
-==========
-Namespaces
-==========
-
-.. toctree::
-   :maxdepth: 1
-
-   compatibility-list
-   resource-control
diff --git a/Documentation/namespaces/resource-control.rst b/Documentation/namespaces/resource-control.rst
deleted file mode 100644
index 369556e00f0c..000000000000
--- a/Documentation/namespaces/resource-control.rst
+++ /dev/null
@@ -1,18 +0,0 @@
-===========================
-Namespaces research control
-===========================
-
-There are a lot of kinds of objects in the kernel that don't have
-individual limits or that have limits that are ineffective when a set
-of processes is allowed to switch user ids.  With user namespaces
-enabled in a kernel for people who don't trust their users or their
-users programs to play nice this problems becomes more acute.
-
-Therefore it is recommended that memory control groups be enabled in
-kernels that enable user namespaces, and it is further recommended
-that userspace configure memory control groups to limit how much
-memory user's they don't trust to play nice can use.
-
-Memory control groups can be configured by installing the libcgroup
-package present on most distros editing /etc/cgrules.conf,
-/etc/cgconfig.conf and setting up libpam-cgroup.
-- 
cgit v1.2.3-55-g7522


From 43f6c0787c1781b951d686e8302377fcf85ccb8a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:40:16 -0300
Subject: docs: mtd: move it to the driver-api book

While I was tempted to move it to admin-guide, as some docs
there are more userspace-facing, there are also some very technical
discussions about memory error correction code from the kernel
implementer's PoV. So, let's place it inside the driver-api
book.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/driver-api/index.rst         |   1 +
 Documentation/driver-api/mtd/index.rst     |  10 +
 Documentation/driver-api/mtd/intel-spi.rst |  90 ++++
 Documentation/driver-api/mtd/nand_ecc.rst  | 763 +++++++++++++++++++++++++++++
 Documentation/driver-api/mtd/spi-nor.rst   |  66 +++
 Documentation/mtd/index.rst                |  12 -
 Documentation/mtd/intel-spi.rst            |  90 ----
 Documentation/mtd/nand_ecc.rst             | 763 -----------------------------
 Documentation/mtd/spi-nor.rst              |  66 ---
 drivers/mtd/nand/raw/nand_ecc.c            |   2 +-
 10 files changed, 931 insertions(+), 932 deletions(-)
 create mode 100644 Documentation/driver-api/mtd/index.rst
 create mode 100644 Documentation/driver-api/mtd/intel-spi.rst
 create mode 100644 Documentation/driver-api/mtd/nand_ecc.rst
 create mode 100644 Documentation/driver-api/mtd/spi-nor.rst
 delete mode 100644 Documentation/mtd/index.rst
 delete mode 100644 Documentation/mtd/intel-spi.rst
 delete mode 100644 Documentation/mtd/nand_ecc.rst
 delete mode 100644 Documentation/mtd/spi-nor.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 410dd7110772..7ecc65093493 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -44,6 +44,7 @@ available subsections can be seen below.
    mtdnand
    miscellaneous
    mei/index
+   mtd/index
    nvdimm/index
    w1
    rapidio/index
diff --git a/Documentation/driver-api/mtd/index.rst b/Documentation/driver-api/mtd/index.rst
new file mode 100644
index 000000000000..2e0e7cc4055e
--- /dev/null
+++ b/Documentation/driver-api/mtd/index.rst
@@ -0,0 +1,10 @@
+==============================
+Memory Technology Device (MTD)
+==============================
+
+.. toctree::
+   :maxdepth: 1
+
+   intel-spi
+   nand_ecc
+   spi-nor
diff --git a/Documentation/driver-api/mtd/intel-spi.rst b/Documentation/driver-api/mtd/intel-spi.rst
new file mode 100644
index 000000000000..0e6d9cd5388d
--- /dev/null
+++ b/Documentation/driver-api/mtd/intel-spi.rst
@@ -0,0 +1,90 @@
+==============================
+Upgrading BIOS using intel-spi
+==============================
+
+Many Intel CPUs like Baytrail and Braswell include an SPI serial flash host
+controller, and the attached flash is used to hold the BIOS and other
+platform-specific data. Since the contents of the SPI serial flash are
+crucial for the machine to function, the flash is typically protected by
+different hardware protection mechanisms to avoid accidental (or deliberate)
+overwriting of the content.
+
+Not all manufacturers protect the SPI serial flash, mainly because leaving
+it unprotected allows upgrading the BIOS image directly from an OS.
+
+The intel-spi driver makes it possible to read and write the SPI serial
+flash, if certain protection bits are not set and locked. If it finds
+any of them set, the whole MTD device is made read-only to prevent
+partial overwrites. By default the driver exposes the SPI serial flash
+contents as read-only, but this can be changed from the kernel command line
+by passing "intel-spi.writeable=1".
+
+Please keep in mind that overwriting the BIOS image on SPI serial flash
+might render the machine unbootable and require special equipment like
+Dediprog to revive it. You have been warned!
+
+The steps below show how to upgrade the MinnowBoard MAX BIOS directly from
+Linux.
+
+ 1) Download and extract the latest MinnowBoard MAX BIOS SPI image
+    [1]. At the time of writing the latest image is v92.
+
+ 2) Install mtd-utils package [2]. We need this in order to erase the SPI
+    serial flash. Distros like Debian and Fedora have this prepackaged with
+    name "mtd-utils".
+
+ 3) Add "intel-spi.writeable=1" to the kernel command line and reboot
+    the board (you can also reload the driver passing "writeable=1" as
+    module parameter to modprobe).
+
+ 4) Once the board is up and running again, find the right MTD partition
+    (it is named as "BIOS")::
+
+	# cat /proc/mtd
+	dev:    size   erasesize  name
+	mtd0: 00800000 00001000 "BIOS"
+
+    So here it will be /dev/mtd0 but it may vary.
+
+ 5) Make backup of the existing image first::
+
+	# dd if=/dev/mtd0ro of=bios.bak
+	16384+0 records in
+	16384+0 records out
+	8388608 bytes (8.4 MB) copied, 10.0269 s, 837 kB/s
+
+ 6) Verify the backup::
+
+	# sha1sum /dev/mtd0ro bios.bak
+	fdbb011920572ca6c991377c4b418a0502668b73  /dev/mtd0ro
+	fdbb011920572ca6c991377c4b418a0502668b73  bios.bak
+
+    The SHA1 sums must match. Otherwise do not continue any further!
+
+ 7) Erase the SPI serial flash. After this step, do not reboot the
+    board! Otherwise it will not start anymore::
+
+	# flash_erase /dev/mtd0 0 0
+	Erasing 4 Kibyte @ 7ff000 -- 100 % complete
+
+ 8) Once completed without errors you can write the new BIOS image::
+
+	# dd if=MNW2MAX1.X64.0092.R01.1605221712.bin of=/dev/mtd0
+
+ 9) Verify that the new content of the SPI serial flash matches the new
+    BIOS image::
+
+	# sha1sum /dev/mtd0ro MNW2MAX1.X64.0092.R01.1605221712.bin
+	9b4df9e4be2057fceec3a5529ec3d950836c87a2  /dev/mtd0ro
+	9b4df9e4be2057fceec3a5529ec3d950836c87a2 MNW2MAX1.X64.0092.R01.1605221712.bin
+
+    The SHA1 sums should match.
+
+ 10) Now you can reboot your board and observe the new BIOS starting up
+     properly.
+
+References
+----------
+
+[1] https://firmware.intel.com/sites/default/files/MinnowBoard%2EMAX_%2EX64%2E92%2ER01%2Ezip
+
+[2] http://www.linux-mtd.infradead.org/
diff --git a/Documentation/driver-api/mtd/nand_ecc.rst b/Documentation/driver-api/mtd/nand_ecc.rst
new file mode 100644
index 000000000000..e8d3c53a5056
--- /dev/null
+++ b/Documentation/driver-api/mtd/nand_ecc.rst
@@ -0,0 +1,763 @@
+==========================
+NAND Error-correction Code
+==========================
+
+Introduction
+============
+
+Having looked at the linux mtd/nand driver, and more specifically at
+nand_ecc.c, I felt there was room for optimisation. I bashed the code for a
+few hours, performing tricks like table lookup, removing superfluous code etc.
+After that the speed was increased by 35-40%.
+Still I was not too happy as I felt there was additional room for improvement.
+
+Bad! I was hooked.
+I decided to annotate my steps in this file. Perhaps it is useful to someone
+or someone learns something from it.
+
+
+The problem
+===========
+
+NAND flash (at least SLC one) typically has sectors of 256 bytes.
+However NAND flash is not extremely reliable so some error detection
+(and sometimes correction) is needed.
+
+This is done by means of a Hamming code. I'll try to explain it in
+layman's terms (and apologies to all the pros in the field in case I do
+not use the right terminology, my coding theory class was almost 30
+years ago, and I must admit it was not one of my favourites).
+
+As I said before the ecc calculation is performed on sectors of 256
+bytes. This is done by calculating several parity bits over the rows and
+columns. The parity used is even parity, which means that the parity bit = 1
+if the data over which the parity is calculated contains an odd number of
+1 bits and the parity bit = 0 if it contains an even number of 1 bits. So
+the total number of 1 bits in the data over which the parity is calculated
+plus the parity bit is even. (see wikipedia if you can't follow this).
+Parity is often calculated by means of an exclusive or operation,
+sometimes also referred to as xor. In C the operator for xor is ^.
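+
+As a minimal sketch of that idea (the function name ``parity8`` is
+hypothetical and not part of nand_ecc.c), the even-parity bit of a single
+byte can be computed by xoring its eight bits together:
+
```c
#include <stdint.h>

/*
 * Hypothetical helper: even-parity bit of one byte.
 * Returns 1 when v has an odd number of 1 bits and 0 otherwise, so that
 * v together with the parity bit always holds an even number of 1 bits.
 */
unsigned int parity8(uint8_t v)
{
    unsigned int p = 0;
    int bit;

    for (bit = 0; bit < 8; bit++)
        p ^= (v >> bit) & 1u;   /* xor in one data bit at a time */
    return p;
}
```
+
+For example 0x03 has two 1 bits, so its parity bit is 0, while 0x07 has
+three 1 bits, so its parity bit is 1.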
+
+Back to ecc.
+Let's give a small figure:
+
+=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
+byte   0:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp4 ... rp14
+byte   1:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp2 rp4 ... rp14
+byte   2:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp4 ... rp14
+byte   3:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp4 ... rp14
+byte   4:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp5 ... rp14
+...
+byte 254:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp5 ... rp15
+byte 255:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp5 ... rp15
+           cp1  cp0  cp1  cp0  cp1  cp0  cp1  cp0
+           cp3  cp3  cp2  cp2  cp3  cp3  cp2  cp2
+           cp5  cp5  cp5  cp5  cp4  cp4  cp4  cp4
+=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
+
+This figure represents a sector of 256 bytes.
+cp is my abbreviation for column parity, rp for row parity.
+
+Let's start by explaining column parity.
+
+- cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
+
+  So the sum of all bit0, bit2, bit4 and bit6 values + cp0 itself is even.
+
+- Similarly, cp1 is the parity over all bit1, bit3, bit5 and bit7.
+
+- cp2 is the parity over bit0, bit1, bit4 and bit5
+- cp3 is the parity over bit2, bit3, bit6 and bit7.
+- cp4 is the parity over bit0, bit1, bit2 and bit3.
+- cp5 is the parity over bit4, bit5, bit6 and bit7.
+
+Note that each of cp0 .. cp5 is exactly one bit.
+
+Row parity actually works almost the same.
+
+- rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254)
+- rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255)
+- rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ...
+  (so handle two bytes, then skip 2 bytes).
+- rp3 covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...)
+- for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc.
+
+  so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...
+- and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, ...
+
+The story now becomes quite boring. I guess you get the idea.
+
+- rp6 covers 8 bytes then skips 8 etc
+- rp7 skips 8 bytes then covers 8 etc
+- rp8 covers 16 bytes then skips 16 etc
+- rp9 skips 16 bytes then covers 16 etc
+- rp10 covers 32 bytes then skips 32 etc
+- rp11 skips 32 bytes then covers 32 etc
+- rp12 covers 64 bytes then skips 64 etc
+- rp13 skips 64 bytes then covers 64 etc
+- rp14 covers 128 bytes then skips 128
+- rp15 skips 128 bytes then covers 128
+
+In the end the parity bits are grouped together in three bytes as
+follows:
+
+=====  ===== ===== ===== ===== ===== ===== ===== =====
+ECC    Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
+=====  ===== ===== ===== ===== ===== ===== ===== =====
+ECC 0   rp07  rp06  rp05  rp04  rp03  rp02  rp01  rp00
+ECC 1   rp15  rp14  rp13  rp12  rp11  rp10  rp09  rp08
+ECC 2   cp5   cp4   cp3   cp2   cp1   cp0      1     1
+=====  ===== ===== ===== ===== ===== ===== ===== =====
+
+I discovered after writing this that ST application note AN1823
+(http://www.st.com/stonline/) gives a much
+nicer picture (but they use the term line parity where I use row parity).
+Oh well, I'm graphically challenged, so suffer with me for a moment :-)
+
+And I could not reuse the ST picture anyway for copyright reasons.
+
+
+Attempt 0
+=========
+
+Implementing the parity calculation is pretty simple.
+In C pseudocode::
+
+  for (i = 0; i < 256; i++)
+  {
+    if (i & 0x01)
+       rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
+    else
+       rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;
+    if (i & 0x02)
+       rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;
+    else
+       rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2;
+    if (i & 0x04)
+      rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5;
+    else
+      rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4;
+    if (i & 0x08)
+      rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7;
+    else
+      rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6;
+    if (i & 0x10)
+      rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9;
+    else
+      rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8;
+    if (i & 0x20)
+      rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;
+    else
+      rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;
+    if (i & 0x40)
+      rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;
+    else
+      rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12;
+    if (i & 0x80)
+      rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15;
+    else
+      rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14;
+    cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0;
+    cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1;
+    cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2;
+    cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3;
+    cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4;
+    cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5;
+  }
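+
+The pseudocode above leaves bit0 .. bit7 undefined. A runnable variant
+might look like the sketch below. It is only an illustration: the name
+``ecc_bitwise`` is hypothetical, and the packing into three bytes and the
+final inversion are assumptions taken from the ECC table above and from
+the code of attempt 1, not from the driver itself:
+
```c
#include <stdint.h>

/*
 * Bit-by-bit reference sketch: every bit of the 256-byte sector is xored
 * into each row and column parity it belongs to.
 */
void ecc_bitwise(const uint8_t *buf, uint8_t *code)
{
    unsigned int rp[16] = { 0 };
    unsigned int cp[6] = { 0 };
    unsigned int i, b, k;

    for (i = 0; i < 256; i++) {
        for (b = 0; b < 8; b++) {
            unsigned int bit = (buf[i] >> b) & 1u;

            /* bit k of the byte index selects rp(2k+1) or rp(2k) */
            for (k = 0; k < 8; k++)
                rp[2 * k + ((i >> k) & 1u)] ^= bit;
            /* bit k of the bit position selects cp(2k+1) or cp(2k) */
            for (k = 0; k < 3; k++)
                cp[2 * k + ((b >> k) & 1u)] ^= bit;
        }
    }

    code[0] = code[1] = code[2] = 0;
    for (k = 0; k < 8; k++) {
        code[0] |= (uint8_t)(rp[k] << k);       /* rp0..rp7  */
        code[1] |= (uint8_t)(rp[k + 8] << k);   /* rp8..rp15 */
    }
    for (k = 0; k < 6; k++)
        code[2] |= (uint8_t)(cp[k] << (k + 2)); /* cp0..cp5 in bits 2..7 */

    /* invert, so that an erased (all 0xff) sector gives ff ff ff */
    code[0] = (uint8_t)~code[0];
    code[1] = (uint8_t)~code[1];
    code[2] = (uint8_t)~code[2];
}
```
+
+Such a slow version is mainly useful as a reference to compare the
+optimised attempts against.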
+
+
+Analysis 0
+==========
+
+C does have bitwise operators but not really operators to do the above
+efficiently (and most hardware has no such instructions either).
+Therefore without implementing this it was clear that the code above was
+not going to bring me a Nobel prize :-)
+
+Fortunately the exclusive or operation is commutative and associative, so
+we can combine the values in any order. So instead of calculating all the
+bits individually, let us try to rearrange things.
+For the column parity this is easy. We can just xor the bytes and in the
+end filter out the relevant bits. This is pretty nice as it will bring
+all cp calculation out of the for loop.
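+
+A minimal sketch of this column parity trick, assuming the bit layout of
+the ECC 2 byte shown earlier (the names ``odd_ones`` and ``column_parity``
+are hypothetical, and the result is left uninverted):
+
```c
#include <stdint.h>

/* Hypothetical helper: 1 if v has an odd number of 1 bits, else 0. */
static unsigned int odd_ones(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return v & 1u;
}

/*
 * Xor all bytes first, then filter the relevant bits with masks.
 * Returns cp5..cp0 in bits 7..2; bits 1..0 are left at zero.
 */
uint8_t column_parity(const uint8_t *buf, unsigned int len)
{
    uint8_t par = 0;
    unsigned int i;

    for (i = 0; i < len; i++)
        par ^= buf[i];

    return (uint8_t)((odd_ones(par & 0xf0) << 7) |   /* cp5: bit4..bit7 */
                     (odd_ones(par & 0x0f) << 6) |   /* cp4: bit0..bit3 */
                     (odd_ones(par & 0xcc) << 5) |   /* cp3 */
                     (odd_ones(par & 0x33) << 4) |   /* cp2 */
                     (odd_ones(par & 0xaa) << 3) |   /* cp1: odd bits */
                     (odd_ones(par & 0x55) << 2));   /* cp0: even bits */
}
```
+
+The loop now only contains a single xor per byte; all six column parities
+are derived afterwards from the one accumulated byte.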
+
+Similarly we can first xor the bytes for the various rows.
+This leads to:
+
+
+Attempt 1
+=========
+
+::
+
+  const char parity[256] = {
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
+      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
+  };
+
+  void ecc1(const unsigned char *buf, unsigned char *code)
+  {
+      int i;
+      const unsigned char *bp = buf;
+      unsigned char cur;
+      unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
+      unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
+      unsigned char par;
+
+      par = 0;
+      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
+      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
+      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
+      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
+
+      for (i = 0; i < 256; i++)
+      {
+          cur = *bp++;
+          par ^= cur;
+          if (i & 0x01) rp1 ^= cur; else rp0 ^= cur;
+          if (i & 0x02) rp3 ^= cur; else rp2 ^= cur;
+          if (i & 0x04) rp5 ^= cur; else rp4 ^= cur;
+          if (i & 0x08) rp7 ^= cur; else rp6 ^= cur;
+          if (i & 0x10) rp9 ^= cur; else rp8 ^= cur;
+          if (i & 0x20) rp11 ^= cur; else rp10 ^= cur;
+          if (i & 0x40) rp13 ^= cur; else rp12 ^= cur;
+          if (i & 0x80) rp15 ^= cur; else rp14 ^= cur;
+      }
+      code[0] =
+          (parity[rp7] << 7) |
+          (parity[rp6] << 6) |
+          (parity[rp5] << 5) |
+          (parity[rp4] << 4) |
+          (parity[rp3] << 3) |
+          (parity[rp2] << 2) |
+          (parity[rp1] << 1) |
+          (parity[rp0]);
+      code[1] =
+          (parity[rp15] << 7) |
+          (parity[rp14] << 6) |
+          (parity[rp13] << 5) |
+          (parity[rp12] << 4) |
+          (parity[rp11] << 3) |
+          (parity[rp10] << 2) |
+          (parity[rp9]  << 1) |
+          (parity[rp8]);
+      code[2] =
+          (parity[par & 0xf0] << 7) |
+          (parity[par & 0x0f] << 6) |
+          (parity[par & 0xcc] << 5) |
+          (parity[par & 0x33] << 4) |
+          (parity[par & 0xaa] << 3) |
+          (parity[par & 0x55] << 2);
+      code[0] = ~code[0];
+      code[1] = ~code[1];
+      code[2] = ~code[2];
+  }
+
+Still pretty straightforward. The last three invert statements are there to
+give a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash
+all data is 0xff, so the checksum then matches.
+
+I also introduced the parity lookup. I expected this to be the fastest
+way to calculate the parity, but I will investigate alternatives later
+on.
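+
+The parity table does not have to be typed in by hand. As a sketch (the
+names ``parity_tab`` and ``build_parity_table`` are hypothetical), it can
+be generated at start-up, because the parity of v is the parity of v >> 1
+xored with the lowest bit of v:
+
```c
#include <stdint.h>

/* Hypothetical run-time equivalent of the parity[] array above. */
uint8_t parity_tab[256];

void build_parity_table(void)
{
    unsigned int v;

    parity_tab[0] = 0;
    for (v = 1; v < 256; v++)
        parity_tab[v] = (uint8_t)(parity_tab[v >> 1] ^ (v & 1u));
}
```
+
+Whether a generated table or a constant one is preferable is a design
+choice; a constant table can live in read-only memory.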
+
+
+Analysis 1
+==========
+
+The code works, but is not terribly efficient. On my system it took
+almost 4 times as much time as the linux driver code. But hey, if it was
+*that* easy this would have been done long before.
+No pain, no gain.
+
+Fortunately there is plenty of room for improvement.
+
+In step 1 we moved from bit-wise calculation to byte-wise calculation.
+However in C we can also use the unsigned long data type (assumed here to
+be 32 bits wide; on LP64 systems it is 64 bits) and virtually
+every modern microprocessor supports 32 bit operations, so why not try
+to write our code in such a way that we process data in 32 bit chunks?
+
+Of course this means some modification as the row parity is byte by
+byte. A quick analysis:
+for the column parity we use the par variable. When extending to 32 bits
+we can in the end easily calculate rp0 and rp1 from it
+(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0
+respectively, from MSB to LSB).
+Also rp2 and rp3 can be easily retrieved from par as rp3 covers the
+two most significant bytes and rp2 covers the two least significant bytes.
+
+Note that of course now the loop is executed only 64 times (256/4).
+And note that care must be taken with respect to byte ordering. The way
+bytes are ordered in a long is machine dependent, and might affect us.
+Anyway, if there is an issue: this code is developed on x86 (to be
+precise: a DELL PC with a D920 Intel CPU).
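+
+One way to sidestep the byte ordering question is sketched below. It
+assembles each 32 bit word explicitly, so that the byte at the lowest
+address always ends up in the least significant position, as it does on
+x86. The helper name ``load_le32`` is hypothetical and this is not what
+the attempts in this document do; they simply cast the buffer pointer:
+
```c
#include <stdint.h>

/*
 * Hypothetical helper: read 4 bytes as a 32 bit word with p[0] in the
 * least significant byte, independent of the host byte order.
 */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```
+
+With this ordering the bytes of a word contribute to rp1, rp0, rp1, rp0
+from MSB to LSB, as described above. The explicit shifts may cost some
+speed compared to a plain aligned load.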
+
+And of course the performance might depend on alignment, but I expect
+that the I/O buffers in the nand driver are aligned properly (and
+otherwise that should be fixed to get maximum performance).
+
+Let's give it a try...
+
+
+Attempt 2
+=========
+
+::
+
+  extern const char parity[256];
+
+  void ecc2(const unsigned char *buf, unsigned char *code)
+  {
+      int i;
+      const unsigned long *bp = (const unsigned long *)buf;
+      unsigned long cur;
+      unsigned long rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
+      unsigned long rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
+      unsigned long par;
+
+      par = 0;
+      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
+      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
+      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
+      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
+
+      for (i = 0; i < 64; i++)
+      {
+          cur = *bp++;
+          par ^= cur;
+          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
+          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
+          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
+          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
+          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
+          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
+      }
+      /*
+         we need to adapt the code generation for the fact that rp vars are now
+         long; also the column parity calculation needs to be changed.
+         we'll bring rp4 to 15 back to single byte entities by shifting and
+         xoring
+      */
+      rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff;
+      rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff;
+      rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff;
+      rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff;
+      rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff;
+      rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff;
+      rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff;
+      rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff;
+      rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff;
+      rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff;
+      rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff;
+      rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff;
+      rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff;
+      rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff;
+      par ^= (par >> 16);
+      rp1 = (par >> 8); rp1 &= 0xff;
+      rp0 = (par & 0xff);
+      par ^= (par >> 8); par &= 0xff;
+
+      code[0] =
+          (parity[rp7] << 7) |
+          (parity[rp6] << 6) |
+          (parity[rp5] << 5) |
+          (parity[rp4] << 4) |
+          (parity[rp3] << 3) |
+          (parity[rp2] << 2) |
+          (parity[rp1] << 1) |
+          (parity[rp0]);
+      code[1] =
+          (parity[rp15] << 7) |
+          (parity[rp14] << 6) |
+          (parity[rp13] << 5) |
+          (parity[rp12] << 4) |
+          (parity[rp11] << 3) |
+          (parity[rp10] << 2) |
+          (parity[rp9]  << 1) |
+          (parity[rp8]);
+      code[2] =
+          (parity[par & 0xf0] << 7) |
+          (parity[par & 0x0f] << 6) |
+          (parity[par & 0xcc] << 5) |
+          (parity[par & 0x33] << 4) |
+          (parity[par & 0xaa] << 3) |
+          (parity[par & 0x55] << 2);
+      code[0] = ~code[0];
+      code[1] = ~code[1];
+      code[2] = ~code[2];
+  }
+
+The parity array is not shown any more. Note also that for these
+examples I kinda deviated from my regular programming style by allowing
+multiple statements on a line, not using { } in then and else blocks
+with only a single statement and by using operators like ^=.
+
+
+Analysis 2
+==========
+
+The code (of course) works, and hurray: we are a little bit faster than
+the linux driver code (about 15%). But wait, don't cheer too quickly.
+There is more to be gained.
+If we look at e.g. rp14 and rp15 we see that we either xor our data with
+rp14 or with rp15. However we also have par which goes over all data.
+This means there is no need to calculate rp14 as it can be calculated from
+rp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15;
+(or if desired we can avoid calculating rp15 and calculate it from
+rp14).  That is why some places refer to inverse parity.
+Of course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.
+Effectively this means we can eliminate the else clause from the if
+statements. Also we can optimise the calculation in the end a little bit
+by going from long to byte first. Actually we can even avoid the table
+lookups.
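+
+The relation par = rp14 ^ rp15 can be sketched in isolation as follows.
+The function name ``split_parity`` and its parameters are hypothetical;
+``mask`` plays the role of the 0x01 .. 0x20 constants tested against the
+loop counter:
+
```c
#include <stdint.h>

/*
 * Accumulate the xor of all words (par) and of only those words whose
 * index has the mask bit set (the "odd" half). The "even" half is then
 * derived as par ^ odd instead of being accumulated in an else clause.
 */
void split_parity(const uint32_t *words, unsigned int nwords,
                  unsigned int mask, uint32_t *even, uint32_t *odd)
{
    uint32_t par = 0, acc_odd = 0;
    unsigned int i;

    for (i = 0; i < nwords; i++) {
        par ^= words[i];
        if (i & mask)
            acc_odd ^= words[i];
    }
    *odd = acc_odd;
    *even = par ^ acc_odd;
}
```
+
+This removes the else branch from the loop. Whether that is actually
+faster has to be measured, as the next attempt shows.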
+
+Attempt 3
+=========
+
+Replaced::
+
+          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
+          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
+          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
+          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
+          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
+          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
+
+with::
+
+          if (i & 0x01) rp5 ^= cur;
+          if (i & 0x02) rp7 ^= cur;
+          if (i & 0x04) rp9 ^= cur;
+          if (i & 0x08) rp11 ^= cur;
+          if (i & 0x10) rp13 ^= cur;
+          if (i & 0x20) rp15 ^= cur;
+
+and outside the loop added::
+
+          rp4  = par ^ rp5;
+          rp6  = par ^ rp7;
+          rp8  = par ^ rp9;
+          rp10  = par ^ rp11;
+          rp12  = par ^ rp13;
+          rp14  = par ^ rp15;
+
+And after that the code takes about 30% more time, although the number of
+statements is reduced. This is also reflected in the assembly code.
+
+
+Analysis 3
+==========
+
+Very weird. Guess it has to do with caching or instruction parallelism
+or so. I also tried on an eeePC (Celeron, clocked at 900 MHz). Interesting
+observation was that this one is only 30% slower (according to time)
+executing the code than my 3 GHz D920 processor.
+
+Well, it was expected not to be easy so maybe instead move to a
+different track: let's move back to the code from attempt 2 and do some
+loop unrolling. This will eliminate a few if statements. I'll try
+different amounts of unrolling to see what works best.
+
+
+Attempt 4
+=========
+
+Unrolled the loop 1, 2, 3 and 4 times.
+For 4 the code starts with::
+
+    for (i = 0; i < 4; i++)
+    {
+        cur = *bp++;
+        par ^= cur;
+        rp4 ^= cur;
+        rp6 ^= cur;
+        rp8 ^= cur;
+        rp10 ^= cur;
+        if (i & 0x1) rp13 ^= cur; else rp12 ^= cur;
+        if (i & 0x2) rp15 ^= cur; else rp14 ^= cur;
+        cur = *bp++;
+        par ^= cur;
+        rp5 ^= cur;
+        rp6 ^= cur;
+        ...
+
+
+Analysis 4
+==========
+
+Unrolling once gains about 15%
+
+Unrolling twice keeps the gain at about 15%
+
+Unrolling three times gives a gain of 30% compared to attempt 2.
+
+Unrolling four times gives a marginal improvement compared to unrolling
+three times.
+
+I decided to proceed with a four time unrolled loop anyway. It was my gut
+feeling that in the next steps I would obtain additional gain from it.
+
+The next step was triggered by the fact that par contains the xor of all
+bytes and rp4 and rp5 each contain the xor of half of the bytes.
+So in effect par = rp4 ^ rp5. But as xor is commutative we can also say
+that rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can
+eliminate rp5 (or rp4, but I already foresaw another optimisation).
+The same holds for rp6/7, rp8/9, rp10/11, rp12/13 and rp14/15.
+
+
+Attempt 5
+=========
+
+Effectively, all odd-numbered rp assignments in the loop were removed.
+This included the else clause of the if statements.
+Of course after the loop we need to correct things by adding code like::
+
+    rp5 = par ^ rp4;
+
+Also the initial assignments (rp5 = 0; etc) could be removed.
+Along the line I also removed the initialisation of rp0/1/2/3.
+
+
+Analysis 5
+==========
+
+Measurements showed this was a good move. The run-time roughly halved
+compared with attempt 4 with 4 times unrolled, and we only require 1/3rd
+of the processor time compared to the current code in the linux kernel.
+
+However, still I thought there was more. I didn't like all the if
+statements. Why not keep a running parity and only keep the last if
+statement? Time for yet another version!
+
+
+Attempt 6
+=========
+
+The code within the for loop was changed to::
+
+    for (i = 0; i < 4; i++)
+    {
+        cur = *bp++; tmppar  = cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
+
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
+
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp8 ^= cur;
+
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur;
+
+        par ^= tmppar;
+        if ((i & 0x1) == 0) rp12 ^= tmppar;
+        if ((i & 0x2) == 0) rp14 ^= tmppar;
+    }
+
+As you can see tmppar is used to accumulate the parity within a for
+iteration. In the last 3 statements it is added to par and, if needed,
+to rp12 and rp14.
+
+While making the changes I also found that I could exploit that tmppar
+contains the running parity for this iteration. So instead of having:
+rp4 ^= cur; rp6 ^= cur;
+I removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on the next
+statement. A similar change was done for rp8 and rp10.
+
+
+Analysis 6
+==========
+
+Measuring this code again showed big gain. When executing the original
+linux code 1 million times, this took about 1 second on my system.
+(using time to measure the performance). After this iteration I was back
+to 0.075 sec. Actually I had to decide to start measuring over 10
+million iterations in order not to lose too much accuracy. This one
+definitely seemed to be the jackpot!
+
+There is a little bit more room for improvement though. There are three
+places with statements::
+
+	rp4 ^= cur; rp6 ^= cur;
+
+It seems more efficient to also maintain a variable rp4_6 in the for
+loop. This eliminates 3 statements per loop. Of course after the loop we
+need to correct by adding::
+
+	rp4 ^= rp4_6;
+	rp6 ^= rp4_6;
+
+Furthermore there are 4 sequential assignments to rp8. This can be
+encoded slightly more efficiently by saving tmppar before those 4 lines
+and later do rp8 = rp8 ^ tmppar ^ notrp8;
+(where notrp8 is the value of tmppar before those 4 lines).
+Again a use of the commutative property of xor.
+Time for a new test!
+
+
+Attempt 7
+=========
+
+The new code now looks like::
+
+    for (i = 0; i < 4; i++)
+    {
+        cur = *bp++; tmppar  = cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
+
+        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
+
+        notrp8 = tmppar;
+        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur;
+        rp8 = rp8 ^ tmppar ^ notrp8;
+
+        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
+        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
+        cur = *bp++; tmppar ^= cur;
+
+        par ^= tmppar;
+        if ((i & 0x1) == 0) rp12 ^= tmppar;
+        if ((i & 0x2) == 0) rp14 ^= tmppar;
+    }
+    rp4 ^= rp4_6;
+    rp6 ^= rp4_6;
+
+
+Not a big change, but every penny counts :-)
+
+
+Analysis 7
+==========
+
+Actually this made things worse. Not very much, but I don't want to move
+into the wrong direction. Maybe something to investigate later. Could
+have to do with caching again.
+
+Guess that is what there is to win within the loop. Maybe unrolling one
+more time will help. I'll keep the optimisations from 7 for now.
+
+
+Attempt 8
+=========
+
+Unrolled the loop one more time.
+
+
+Analysis 8
+==========
+
+This makes things worse. Let's stick with attempt 6 and continue from there.
+Although it seems that the code within the loop cannot be optimised
+further there is still room to optimize the generation of the ecc codes.
+We can simply calculate the total parity. If this is 0 then rp4 = rp5
+etc. If the parity is 1, then rp4 = !rp5;
+
+But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
+in the result byte and then do something like::
+
+    code[0] |= (code[0] << 1);
+
+Let's test this.
+
+
+Attempt 9
+=========
+
+Changed the code but again this slightly degrades performance. Tried all
+kinds of other things, like having dedicated parity arrays to avoid the
+shift after parity[rp7] << 7; No gain.
+Changed the lookup using the parity array to use shift operators, e.g.
+replacing parity[rp7] << 7 with::
+
+	rp7 ^= (rp7 << 4);
+	rp7 ^= (rp7 << 2);
+	rp7 ^= (rp7 << 1);
+	rp7 &= 0x80;
+
+No gain.
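+
+The shift sequence folds the parity of all eight bits into bit 7. As a
+self-contained sketch (the function name ``parity_to_bit7`` is
+hypothetical; the casts only keep the intermediate values within a byte):
+
```c
#include <stdint.h>

/*
 * Returns 0x80 if v has an odd number of 1 bits and 0 otherwise,
 * i.e. the same value as parity[v] << 7 without a table lookup.
 */
uint8_t parity_to_bit7(uint8_t v)
{
    v ^= (uint8_t)(v << 4);   /* bits 7..4 now hold bit(n) ^ bit(n-4) */
    v ^= (uint8_t)(v << 2);   /* bits 7..6 hold the xor of four bits each */
    v ^= (uint8_t)(v << 1);   /* bit 7 holds the xor of all eight bits */
    return v & 0x80;
}
```
+
+The result is already in the position needed for the top bit of the ecc
+byte, so no further shift is required.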
+
+The only marginal change was inverting the parity bits, so we can remove
+the last three invert statements.
+
+Ah well, pity this does not deliver more. Then again 10 million
+iterations using the linux driver code takes between 13 and 13.5
+seconds, whereas my code now takes about 0.73 seconds for those 10
+million iterations. So basically I've improved the performance by a
+factor 18 on my system. Not that bad. Of course on different hardware
+you will get different results. No warranties!
+
+But of course there is no such thing as a free lunch. The codesize almost
+tripled (from 562 bytes to 1434 bytes). Then again, it is not that much.
+
+
+Correcting errors
+=================
+
+For correcting errors I again used the ST application note as a starter,
+but I also peeked at the existing code.
+
+The algorithm itself is pretty straightforward. Just xor the given and
+the calculated ecc. If all bytes are 0 there is no problem. If exactly 11
+bits are 1 we have one correctable bit error. If exactly 1 bit is 1, we
+have an error in the given ecc code.
+
+It proved to be fastest to do some table lookups. The performance gain
+from this is about a factor of 2 on my system when a repair had to
+be done, and 1% or so if no repair had to be done.
+
+Code size increased from 330 bytes to 686 bytes for this function.
+(gcc 4.2, -O3)
+
+
+Conclusion
+==========
+
+The gain when calculating the ecc is tremendous. On my development hardware
+a speedup of a factor of 18 for ecc calculation was achieved. In a test on an
+embedded system with a MIPS core a factor of 7 was obtained.
+
+In a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
+of 5 (big endian mode, gcc 4.1.2, -O3).
+
+For correction not much gain could be obtained (as bitflips are rare). Then
+again there are also far fewer cycles spent there.
+
+It seems there is not much more gain possible in this, at least when
+programmed in C. Of course it might be possible to squeeze something more
+out of it with an assembler program, but due to pipeline behaviour etc.
+this is very tricky (at least for Intel hardware).
+
+Author: Frans Meulenbroeks
+
+Copyright (C) 2008 Koninklijke Philips Electronics NV.
diff --git a/Documentation/driver-api/mtd/spi-nor.rst b/Documentation/driver-api/mtd/spi-nor.rst
new file mode 100644
index 000000000000..f5333e3bf486
--- /dev/null
+++ b/Documentation/driver-api/mtd/spi-nor.rst
@@ -0,0 +1,66 @@
+=================
+SPI NOR framework
+=================
+
+Part I - Why do we need this framework?
+---------------------------------------
+
+SPI bus controllers (drivers/spi/) only deal with streams of bytes; the bus
+controller operates agnostic of the specific device attached. However, some
+controllers (such as Freescale's QuadSPI controller) cannot easily handle
+arbitrary streams of bytes, but rather are designed specifically for SPI NOR.
+
+In particular, Freescale's QuadSPI controller must know the NOR commands to
+find the right LUT sequence. Unfortunately, the SPI subsystem has no notion of
+opcodes, addresses, or data payloads; a SPI controller simply knows to send or
+receive bytes (Tx and Rx). Therefore, we must define a new layering scheme under
+which the controller driver is aware of the opcodes, addressing, and other
+details of the SPI NOR protocol.
+
+Part II - How does the framework work?
+--------------------------------------
+
+This framework just adds a new layer between the MTD and the SPI bus driver.
+With this new layer, the SPI NOR controller driver does not depend on the
+m25p80 code anymore.
+
+Before this framework, the layer is like::
+
+                   MTD
+         ------------------------
+                  m25p80
+         ------------------------
+	       SPI bus driver
+         ------------------------
+	        SPI NOR chip
+
+   After this framework, the layer is like:
+                   MTD
+         ------------------------
+              SPI NOR framework
+         ------------------------
+                  m25p80
+         ------------------------
+	       SPI bus driver
+         ------------------------
+	       SPI NOR chip
+
+  With the SPI NOR controller driver (Freescale QuadSPI), it looks like:
+                   MTD
+         ------------------------
+              SPI NOR framework
+         ------------------------
+                fsl-quadSPI
+         ------------------------
+	       SPI NOR chip
+
+Part III - How can drivers use the framework?
+---------------------------------------------
+
+The main API is spi_nor_scan(). Before calling it, a driver should
+initialize the necessary fields of spi_nor{}. Please see
+drivers/mtd/spi-nor/spi-nor.c for details. Please also refer to fsl-quadspi.c
+when you want to write a new driver for a SPI NOR controller.
+Another API is spi_nor_restore(), which restores the state of the SPI
+flash chip, such as its addressing mode. Call it whenever the driver is
+detached from the device or the system is rebooted.
diff --git a/Documentation/mtd/index.rst b/Documentation/mtd/index.rst
deleted file mode 100644
index 4fdae418ac97..000000000000
--- a/Documentation/mtd/index.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-:orphan:
-
-==============================
-Memory Technology Device (MTD)
-==============================
-
-.. toctree::
-   :maxdepth: 1
-
-   intel-spi
-   nand_ecc
-   spi-nor
diff --git a/Documentation/mtd/intel-spi.rst b/Documentation/mtd/intel-spi.rst
deleted file mode 100644
index 0e6d9cd5388d..000000000000
--- a/Documentation/mtd/intel-spi.rst
+++ /dev/null
@@ -1,90 +0,0 @@
-==============================
-Upgrading BIOS using intel-spi
-==============================
-
-Many Intel CPUs like Baytrail and Braswell include SPI serial flash host
-controller which is used to hold BIOS and other platform specific data.
-Since contents of the SPI serial flash is crucial for machine to function,
-it is typically protected by different hardware protection mechanisms to
-avoid accidental (or on purpose) overwrite of the content.
-
-Not all manufacturers protect the SPI serial flash, mainly because it
-allows upgrading the BIOS image directly from an OS.
-
-The intel-spi driver makes it possible to read and write the SPI serial
-flash, if certain protection bits are not set and locked. If it finds
-any of them set, the whole MTD device is made read-only to prevent
-partial overwrites. By default the driver exposes SPI serial flash
-contents as read-only but it can be changed from kernel command line,
-passing "intel-spi.writeable=1".
-
-Please keep in mind that overwriting the BIOS image on SPI serial flash
-might render the machine unbootable and requires special equipment like
-Dediprog to revive. You have been warned!
-
-Below are the steps how to upgrade MinnowBoard MAX BIOS directly from
-Linux.
-
- 1) Download and extract the latest Minnowboard MAX BIOS SPI image
-    [1]. At the time writing this the latest image is v92.
-
- 2) Install mtd-utils package [2]. We need this in order to erase the SPI
-    serial flash. Distros like Debian and Fedora have this prepackaged with
-    name "mtd-utils".
-
- 3) Add "intel-spi.writeable=1" to the kernel command line and reboot
-    the board (you can also reload the driver passing "writeable=1" as
-    module parameter to modprobe).
-
- 4) Once the board is up and running again, find the right MTD partition
-    (it is named as "BIOS")::
-
-	# cat /proc/mtd
-	dev:    size   erasesize  name
-	mtd0: 00800000 00001000 "BIOS"
-
-    So here it will be /dev/mtd0 but it may vary.
-
- 5) Make backup of the existing image first::
-
-	# dd if=/dev/mtd0ro of=bios.bak
-	16384+0 records in
-	16384+0 records out
-	8388608 bytes (8.4 MB) copied, 10.0269 s, 837 kB/s
-
- 6) Verify the backup:
-
-	# sha1sum /dev/mtd0ro bios.bak
-	fdbb011920572ca6c991377c4b418a0502668b73  /dev/mtd0ro
-	fdbb011920572ca6c991377c4b418a0502668b73  bios.bak
-
-    The SHA1 sums must match. Otherwise do not continue any further!
-
- 7) Erase the SPI serial flash. After this step, do not reboot the
-    board! Otherwise it will not start anymore::
-
-	# flash_erase /dev/mtd0 0 0
-	Erasing 4 Kibyte @ 7ff000 -- 100 % complete
-
- 8) Once completed without errors you can write the new BIOS image:
-
-    # dd if=MNW2MAX1.X64.0092.R01.1605221712.bin of=/dev/mtd0
-
- 9) Verify that the new content of the SPI serial flash matches the new
-    BIOS image::
-
-	# sha1sum /dev/mtd0ro MNW2MAX1.X64.0092.R01.1605221712.bin
-	9b4df9e4be2057fceec3a5529ec3d950836c87a2  /dev/mtd0ro
-	9b4df9e4be2057fceec3a5529ec3d950836c87a2 MNW2MAX1.X64.0092.R01.1605221712.bin
-
-    The SHA1 sums should match.
-
- 10) Now you can reboot your board and observe the new BIOS starting up
-     properly.
-
-References
-----------
-
-[1] https://firmware.intel.com/sites/default/files/MinnowBoard%2EMAX_%2EX64%2E92%2ER01%2Ezip
-
-[2] http://www.linux-mtd.infradead.org/
diff --git a/Documentation/mtd/nand_ecc.rst b/Documentation/mtd/nand_ecc.rst
deleted file mode 100644
index e8d3c53a5056..000000000000
--- a/Documentation/mtd/nand_ecc.rst
+++ /dev/null
@@ -1,763 +0,0 @@
-==========================
-NAND Error-correction Code
-==========================
-
-Introduction
-============
-
-Having looked at the linux mtd/nand driver and more specific at nand_ecc.c
-I felt there was room for optimisation. I bashed the code for a few hours
-performing tricks like table lookup removing superfluous code etc.
-After that the speed was increased by 35-40%.
-Still I was not too happy as I felt there was additional room for improvement.
-
-Bad! I was hooked.
-I decided to annotate my steps in this file. Perhaps it is useful to someone
-or someone learns something from it.
-
-
-The problem
-===========
-
-NAND flash (at least SLC one) typically has sectors of 256 bytes.
-However NAND flash is not extremely reliable so some error detection
-(and sometimes correction) is needed.
-
-This is done by means of a Hamming code. I'll try to explain it in
-laymans terms (and apologies to all the pro's in the field in case I do
-not use the right terminology, my coding theory class was almost 30
-years ago, and I must admit it was not one of my favourites).
-
-As I said before the ecc calculation is performed on sectors of 256
-bytes. This is done by calculating several parity bits over the rows and
-columns. The parity used is even parity which means that the parity bit = 1
-if the data over which the parity is calculated is 1 and the parity bit = 0
-if the data over which the parity is calculated is 0. So the total
-number of bits over the data over which the parity is calculated + the
-parity bit is even. (see wikipedia if you can't follow this).
-Parity is often calculated by means of an exclusive or operation,
-sometimes also referred to as xor. In C the operator for xor is ^
-
-Back to ecc.
-Let's give a small figure:
-
-=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
-byte   0:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp4 ... rp14
-byte   1:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp2 rp4 ... rp14
-byte   2:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp4 ... rp14
-byte   3:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp4 ... rp14
-byte   4:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp2 rp5 ... rp14
-...
-byte 254:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp0 rp3 rp5 ... rp15
-byte 255:  bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0   rp1 rp3 rp5 ... rp15
-           cp1  cp0  cp1  cp0  cp1  cp0  cp1  cp0
-           cp3  cp3  cp2  cp2  cp3  cp3  cp2  cp2
-           cp5  cp5  cp5  cp5  cp4  cp4  cp4  cp4
-=========  ==== ==== ==== ==== ==== ==== ==== ====   === === === === ====
-
-This figure represents a sector of 256 bytes.
-cp is my abbreviation for column parity, rp for row parity.
-
-Let's start to explain column parity.
-
-- cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
-
-  so the sum of all bit0, bit2, bit4 and bit6 values + cp0 itself is even.
-
-Similarly cp1 is the sum of all bit1, bit3, bit5 and bit7.
-
-- cp2 is the parity over bit0, bit1, bit4 and bit5
-- cp3 is the parity over bit2, bit3, bit6 and bit7.
-- cp4 is the parity over bit0, bit1, bit2 and bit3.
-- cp5 is the parity over bit4, bit5, bit6 and bit7.
-
-Note that each of cp0 .. cp5 is exactly one bit.
-
-Row parity actually works almost the same.
-
-- rp0 is the parity of all even bytes (0, 2, 4, 6, ... 252, 254)
-- rp1 is the parity of all odd bytes (1, 3, 5, 7, ..., 253, 255)
-- rp2 is the parity of all bytes 0, 1, 4, 5, 8, 9, ...
-  (so handle two bytes, then skip 2 bytes).
-- rp3 is covers the half rp2 does not cover (bytes 2, 3, 6, 7, 10, 11, ...)
-- for rp4 the rule is cover 4 bytes, skip 4 bytes, cover 4 bytes, skip 4 etc.
-
-  so rp4 calculates parity over bytes 0, 1, 2, 3, 8, 9, 10, 11, 16, ...)
-- and rp5 covers the other half, so bytes 4, 5, 6, 7, 12, 13, 14, 15, 20, ..
-
-The story now becomes quite boring. I guess you get the idea.
-
-- rp6 covers 8 bytes then skips 8 etc
-- rp7 skips 8 bytes then covers 8 etc
-- rp8 covers 16 bytes then skips 16 etc
-- rp9 skips 16 bytes then covers 16 etc
-- rp10 covers 32 bytes then skips 32 etc
-- rp11 skips 32 bytes then covers 32 etc
-- rp12 covers 64 bytes then skips 64 etc
-- rp13 skips 64 bytes then covers 64 etc
-- rp14 covers 128 bytes then skips 128
-- rp15 skips 128 bytes then covers 128
-
-In the end the parity bits are grouped together in three bytes as
-follows:
-
-=====  ===== ===== ===== ===== ===== ===== ===== =====
-ECC    Bit 7 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1 Bit 0
-=====  ===== ===== ===== ===== ===== ===== ===== =====
-ECC 0   rp07  rp06  rp05  rp04  rp03  rp02  rp01  rp00
-ECC 1   rp15  rp14  rp13  rp12  rp11  rp10  rp09  rp08
-ECC 2   cp5   cp4   cp3   cp2   cp1   cp0      1     1
-=====  ===== ===== ===== ===== ===== ===== ===== =====
-
-I detected after writing this that ST application note AN1823
-(http://www.st.com/stonline/) gives a much
-nicer picture.(but they use line parity as term where I use row parity)
-Oh well, I'm graphically challenged, so suffer with me for a moment :-)
-
-And I could not reuse the ST picture anyway for copyright reasons.
-
-
-Attempt 0
-=========
-
-Implementing the parity calculation is pretty simple.
-In C pseudocode::
-
-  for (i = 0; i < 256; i++)
-  {
-    if (i & 0x01)
-       rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
-    else
-       rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;
-    if (i & 0x02)
-       rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;
-    else
-       rp2 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp2;
-    if (i & 0x04)
-      rp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp5;
-    else
-      rp4 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp4;
-    if (i & 0x08)
-      rp7 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp7;
-    else
-      rp6 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp6;
-    if (i & 0x10)
-      rp9 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp9;
-    else
-      rp8 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp8;
-    if (i & 0x20)
-      rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;
-    else
-      rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;
-    if (i & 0x40)
-      rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;
-    else
-      rp12 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp12;
-    if (i & 0x80)
-      rp15 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp15;
-    else
-      rp14 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp14;
-    cp0 = bit6 ^ bit4 ^ bit2 ^ bit0 ^ cp0;
-    cp1 = bit7 ^ bit5 ^ bit3 ^ bit1 ^ cp1;
-    cp2 = bit5 ^ bit4 ^ bit1 ^ bit0 ^ cp2;
-    cp3 = bit7 ^ bit6 ^ bit3 ^ bit2 ^ cp3
-    cp4 = bit3 ^ bit2 ^ bit1 ^ bit0 ^ cp4
-    cp5 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ cp5
-  }
-
-
-Analysis 0
-==========
-
-C does have bitwise operators but not really operators to do the above
-efficiently (and most hardware has no such instructions either).
-Therefore without implementing this it was clear that the code above was
-not going to bring me a Nobel prize :-)
-
-Fortunately the exclusive or operation is commutative, so we can combine
-the values in any order. So instead of calculating all the bits
-individually, let us try to rearrange things.
-For the column parity this is easy. We can just xor the bytes and in the
-end filter out the relevant bits. This is pretty nice as it will bring
-all cp calculation out of the for loop.
-
-Similarly we can first xor the bytes for the various rows.
-This leads to:
-
-
-Attempt 1
-=========
-
-::
-
-  const char parity[256] = {
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1,
-      0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0
-  };
-
-  void ecc1(const unsigned char *buf, unsigned char *code)
-  {
-      int i;
-      const unsigned char *bp = buf;
-      unsigned char cur;
-      unsigned char rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
-      unsigned char rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
-      unsigned char par;
-
-      par = 0;
-      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
-      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
-      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
-      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
-
-      for (i = 0; i < 256; i++)
-      {
-          cur = *bp++;
-          par ^= cur;
-          if (i & 0x01) rp1 ^= cur; else rp0 ^= cur;
-          if (i & 0x02) rp3 ^= cur; else rp2 ^= cur;
-          if (i & 0x04) rp5 ^= cur; else rp4 ^= cur;
-          if (i & 0x08) rp7 ^= cur; else rp6 ^= cur;
-          if (i & 0x10) rp9 ^= cur; else rp8 ^= cur;
-          if (i & 0x20) rp11 ^= cur; else rp10 ^= cur;
-          if (i & 0x40) rp13 ^= cur; else rp12 ^= cur;
-          if (i & 0x80) rp15 ^= cur; else rp14 ^= cur;
-      }
-      code[0] =
-          (parity[rp7] << 7) |
-          (parity[rp6] << 6) |
-          (parity[rp5] << 5) |
-          (parity[rp4] << 4) |
-          (parity[rp3] << 3) |
-          (parity[rp2] << 2) |
-          (parity[rp1] << 1) |
-          (parity[rp0]);
-      code[1] =
-          (parity[rp15] << 7) |
-          (parity[rp14] << 6) |
-          (parity[rp13] << 5) |
-          (parity[rp12] << 4) |
-          (parity[rp11] << 3) |
-          (parity[rp10] << 2) |
-          (parity[rp9]  << 1) |
-          (parity[rp8]);
-      code[2] =
-          (parity[par & 0xf0] << 7) |
-          (parity[par & 0x0f] << 6) |
-          (parity[par & 0xcc] << 5) |
-          (parity[par & 0x33] << 4) |
-          (parity[par & 0xaa] << 3) |
-          (parity[par & 0x55] << 2);
-      code[0] = ~code[0];
-      code[1] = ~code[1];
-      code[2] = ~code[2];
-  }
-
-Still pretty straightforward. The last three invert statements are there to
-give a checksum of 0xff 0xff 0xff for an empty flash. In an empty flash
-all data is 0xff, so the checksum then matches.
-
-I also introduced the parity lookup. I expected this to be the fastest
-way to calculate the parity, but I will investigate alternatives later
-on.
-
-
-Analysis 1
-==========
-
-The code works, but is not terribly efficient. On my system it took
-almost 4 times as much time as the linux driver code. But hey, if it was
-*that* easy this would have been done long before.
-No pain. no gain.
-
-Fortunately there is plenty of room for improvement.
-
-In step 1 we moved from bit-wise calculation to byte-wise calculation.
-However in C we can also use the unsigned long data type and virtually
-every modern microprocessor supports 32 bit operations, so why not try
-to write our code in such a way that we process data in 32 bit chunks.
-
-Of course this means some modification as the row parity is byte by
-byte. A quick analysis:
-for the column parity we use the par variable. When extending to 32 bits
-we can in the end easily calculate rp0 and rp1 from it.
-(because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0
-respectively, from MSB to LSB)
-also rp2 and rp3 can be easily retrieved from par as rp3 covers the
-first two MSBs and rp2 covers the last two LSBs.
-
-Note that of course now the loop is executed only 64 times (256/4).
-And note that care must taken wrt byte ordering. The way bytes are
-ordered in a long is machine dependent, and might affect us.
-Anyway, if there is an issue: this code is developed on x86 (to be
-precise: a DELL PC with a D920 Intel CPU)
-
-And of course the performance might depend on alignment, but I expect
-that the I/O buffers in the nand driver are aligned properly (and
-otherwise that should be fixed to get maximum performance).
-
-Let's give it a try...
-
-
-Attempt 2
-=========
-
-::
-
-  extern const char parity[256];
-
-  void ecc2(const unsigned char *buf, unsigned char *code)
-  {
-      int i;
-      const unsigned long *bp = (unsigned long *)buf;
-      unsigned long cur;
-      unsigned long rp0, rp1, rp2, rp3, rp4, rp5, rp6, rp7;
-      unsigned long rp8, rp9, rp10, rp11, rp12, rp13, rp14, rp15;
-      unsigned long par;
-
-      par = 0;
-      rp0 = 0; rp1 = 0; rp2 = 0; rp3 = 0;
-      rp4 = 0; rp5 = 0; rp6 = 0; rp7 = 0;
-      rp8 = 0; rp9 = 0; rp10 = 0; rp11 = 0;
-      rp12 = 0; rp13 = 0; rp14 = 0; rp15 = 0;
-
-      for (i = 0; i < 64; i++)
-      {
-          cur = *bp++;
-          par ^= cur;
-          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
-          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
-          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
-          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
-          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
-          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
-      }
-      /*
-         we need to adapt the code generation for the fact that rp vars are now
-         long; also the column parity calculation needs to be changed.
-         we'll bring rp4 to 15 back to single byte entities by shifting and
-         xoring
-      */
-      rp4 ^= (rp4 >> 16); rp4 ^= (rp4 >> 8); rp4 &= 0xff;
-      rp5 ^= (rp5 >> 16); rp5 ^= (rp5 >> 8); rp5 &= 0xff;
-      rp6 ^= (rp6 >> 16); rp6 ^= (rp6 >> 8); rp6 &= 0xff;
-      rp7 ^= (rp7 >> 16); rp7 ^= (rp7 >> 8); rp7 &= 0xff;
-      rp8 ^= (rp8 >> 16); rp8 ^= (rp8 >> 8); rp8 &= 0xff;
-      rp9 ^= (rp9 >> 16); rp9 ^= (rp9 >> 8); rp9 &= 0xff;
-      rp10 ^= (rp10 >> 16); rp10 ^= (rp10 >> 8); rp10 &= 0xff;
-      rp11 ^= (rp11 >> 16); rp11 ^= (rp11 >> 8); rp11 &= 0xff;
-      rp12 ^= (rp12 >> 16); rp12 ^= (rp12 >> 8); rp12 &= 0xff;
-      rp13 ^= (rp13 >> 16); rp13 ^= (rp13 >> 8); rp13 &= 0xff;
-      rp14 ^= (rp14 >> 16); rp14 ^= (rp14 >> 8); rp14 &= 0xff;
-      rp15 ^= (rp15 >> 16); rp15 ^= (rp15 >> 8); rp15 &= 0xff;
-      rp3 = (par >> 16); rp3 ^= (rp3 >> 8); rp3 &= 0xff;
-      rp2 = par & 0xffff; rp2 ^= (rp2 >> 8); rp2 &= 0xff;
-      par ^= (par >> 16);
-      rp1 = (par >> 8); rp1 &= 0xff;
-      rp0 = (par & 0xff);
-      par ^= (par >> 8); par &= 0xff;
-
-      code[0] =
-          (parity[rp7] << 7) |
-          (parity[rp6] << 6) |
-          (parity[rp5] << 5) |
-          (parity[rp4] << 4) |
-          (parity[rp3] << 3) |
-          (parity[rp2] << 2) |
-          (parity[rp1] << 1) |
-          (parity[rp0]);
-      code[1] =
-          (parity[rp15] << 7) |
-          (parity[rp14] << 6) |
-          (parity[rp13] << 5) |
-          (parity[rp12] << 4) |
-          (parity[rp11] << 3) |
-          (parity[rp10] << 2) |
-          (parity[rp9]  << 1) |
-          (parity[rp8]);
-      code[2] =
-          (parity[par & 0xf0] << 7) |
-          (parity[par & 0x0f] << 6) |
-          (parity[par & 0xcc] << 5) |
-          (parity[par & 0x33] << 4) |
-          (parity[par & 0xaa] << 3) |
-          (parity[par & 0x55] << 2);
-      code[0] = ~code[0];
-      code[1] = ~code[1];
-      code[2] = ~code[2];
-  }
-
-The parity array is not shown any more. Note also that for these
-examples I kinda deviated from my regular programming style by allowing
-multiple statements on a line, not using { } in then and else blocks
-with only a single statement and by using operators like ^=
-
-
-Analysis 2
-==========
-
-The code (of course) works, and hurray: we are a little bit faster than
-the linux driver code (about 15%). But wait, don't cheer too quickly.
-There is more to be gained.
-If we look at e.g. rp14 and rp15 we see that we either xor our data with
-rp14 or with rp15. However we also have par which goes over all data.
-This means there is no need to calculate rp14 as it can be calculated from
-rp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15;
-(or if desired we can avoid calculating rp15 and calculate it from
-rp14).  That is why some places refer to inverse parity.
-Of course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.
-Effectively this means we can eliminate the else clause from the if
-statements. Also we can optimise the calculation in the end a little bit
-by going from long to byte first. Actually we can even avoid the table
-lookups
-
-Attempt 3
-=========
-
-Odd replaced::
-
-          if (i & 0x01) rp5 ^= cur; else rp4 ^= cur;
-          if (i & 0x02) rp7 ^= cur; else rp6 ^= cur;
-          if (i & 0x04) rp9 ^= cur; else rp8 ^= cur;
-          if (i & 0x08) rp11 ^= cur; else rp10 ^= cur;
-          if (i & 0x10) rp13 ^= cur; else rp12 ^= cur;
-          if (i & 0x20) rp15 ^= cur; else rp14 ^= cur;
-
-with::
-
-          if (i & 0x01) rp5 ^= cur;
-          if (i & 0x02) rp7 ^= cur;
-          if (i & 0x04) rp9 ^= cur;
-          if (i & 0x08) rp11 ^= cur;
-          if (i & 0x10) rp13 ^= cur;
-          if (i & 0x20) rp15 ^= cur;
-
-and outside the loop added::
-
-          rp4  = par ^ rp5;
-          rp6  = par ^ rp7;
-          rp8  = par ^ rp9;
-          rp10  = par ^ rp11;
-          rp12  = par ^ rp13;
-          rp14  = par ^ rp15;
-
-And after that the code takes about 30% more time, although the number of
-statements is reduced. This is also reflected in the assembly code.
-
-
-Analysis 3
-==========
-
-Very weird. Guess it has to do with caching or instruction parallellism
-or so. I also tried on an eeePC (Celeron, clocked at 900 Mhz). Interesting
-observation was that this one is only 30% slower (according to time)
-executing the code as my 3Ghz D920 processor.
-
-Well, it was expected not to be easy so maybe instead move to a
-different track: let's move back to the code from attempt2 and do some
-loop unrolling. This will eliminate a few if statements. I'll try
-different amounts of unrolling to see what works best.
-
-
-Attempt 4
-=========
-
-Unrolled the loop 1, 2, 3 and 4 times.
-For 4 the code starts with::
-
-    for (i = 0; i < 4; i++)
-    {
-        cur = *bp++;
-        par ^= cur;
-        rp4 ^= cur;
-        rp6 ^= cur;
-        rp8 ^= cur;
-        rp10 ^= cur;
-        if (i & 0x1) rp13 ^= cur; else rp12 ^= cur;
-        if (i & 0x2) rp15 ^= cur; else rp14 ^= cur;
-        cur = *bp++;
-        par ^= cur;
-        rp5 ^= cur;
-        rp6 ^= cur;
-        ...
-
-
-Analysis 4
-==========
-
-Unrolling once gains about 15%
-
-Unrolling twice keeps the gain at about 15%
-
-Unrolling three times gives a gain of 30% compared to attempt 2.
-
-Unrolling four times gives a marginal improvement compared to unrolling
-three times.
-
-I decided to proceed with a four time unrolled loop anyway. It was my gut
-feeling that in the next steps I would obtain additional gain from it.
-
-The next step was triggered by the fact that par contains the xor of all
-bytes and rp4 and rp5 each contain the xor of half of the bytes.
-So in effect par = rp4 ^ rp5. But as xor is commutative we can also say
-that rp5 = par ^ rp4. So no need to keep both rp4 and rp5 around. We can
-eliminate rp5 (or rp4, but I already foresaw another optimisation).
-The same holds for rp6/7, rp8/9, rp10/11 rp12/13 and rp14/15.
-
-
-Attempt 5
-=========
-
-Effectively so all odd digit rp assignments in the loop were removed.
-This included the else clause of the if statements.
-Of course after the loop we need to correct things by adding code like::
-
-    rp5 = par ^ rp4;
-
-Also the initial assignments (rp5 = 0; etc) could be removed.
-Along the line I also removed the initialisation of rp0/1/2/3.
-
-
-Analysis 5
-==========
-
-Measurements showed this was a good move. The run-time roughly halved
-compared with attempt 4 with 4 times unrolled, and we only require 1/3rd
-of the processor time compared to the current code in the linux kernel.
-
-However, still I thought there was more. I didn't like all the if
-statements. Why not keep a running parity and only keep the last if
-statement. Time for yet another version!
-
-
-Attempt 6
-=========
-
-THe code within the for loop was changed to::
-
-    for (i = 0; i < 4; i++)
-    {
-        cur = *bp++; tmppar  = cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
-
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
-
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp8 ^= cur;
-
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur;
-
-        par ^= tmppar;
-        if ((i & 0x1) == 0) rp12 ^= tmppar;
-        if ((i & 0x2) == 0) rp14 ^= tmppar;
-    }
-
-As you can see tmppar is used to accumulate the parity within a for
-iteration. In the last 3 statements is added to par and, if needed,
-to rp12 and rp14.
-
-While making the changes I also found that I could exploit that tmppar
-contains the running parity for this iteration. So instead of having:
-rp4 ^= cur; rp6 ^= cur;
-I removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on next
-statement. A similar change was done for rp8 and rp10
-
-
-Analysis 6
-==========
-
-Measuring this code again showed big gain. When executing the original
-linux code 1 million times, this took about 1 second on my system.
-(using time to measure the performance). After this iteration I was back
-to 0.075 sec. Actually I had to decide to start measuring over 10
-million iterations in order not to lose too much accuracy. This one
-definitely seemed to be the jackpot!
-
-There is a little bit more room for improvement though. There are three
-places with statements::
-
-	rp4 ^= cur; rp6 ^= cur;
-
-It seems more efficient to also maintain a variable rp4_6 in the while
-loop; This eliminates 3 statements per loop. Of course after the loop we
-need to correct by adding::
-
-	rp4 ^= rp4_6;
-	rp6 ^= rp4_6
-
-Furthermore there are 4 sequential assignments to rp8. This can be
-encoded slightly more efficiently by saving tmppar before those 4 lines
-and later do rp8 = rp8 ^ tmppar ^ notrp8;
-(where notrp8 is the value of rp8 before those 4 lines).
-Again a use of the commutative property of xor.
-Time for a new test!
-
-
-Attempt 7
-=========
-
-The new code now looks like::
-
-    for (i = 0; i < 4; i++)
-    {
-        cur = *bp++; tmppar  = cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= tmppar;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp8 ^= tmppar;
-
-        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
-
-        notrp8 = tmppar;
-        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur;
-        rp8 = rp8 ^ tmppar ^ notrp8;
-
-        cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp6 ^= cur;
-        cur = *bp++; tmppar ^= cur; rp4 ^= cur;
-        cur = *bp++; tmppar ^= cur;
-
-        par ^= tmppar;
-        if ((i & 0x1) == 0) rp12 ^= tmppar;
-        if ((i & 0x2) == 0) rp14 ^= tmppar;
-    }
-    rp4 ^= rp4_6;
-    rp6 ^= rp4_6;
-
-
-Not a big change, but every penny counts :-)
-
-
-Analysis 7
-==========
-
-Actually this made things worse. Not very much, but I don't want to move
-into the wrong direction. Maybe something to investigate later. Could
-have to do with caching again.
-
-Guess that is what there is to win within the loop. Maybe unrolling one
-more time will help. I'll keep the optimisations from 7 for now.
-
-
-Attempt 8
-=========
-
-Unrolled the loop one more time.
-
-
-Analysis 8
-==========
-
-This makes things worse. Let's stick with attempt 6 and continue from there.
-Although it seems that the code within the loop cannot be optimised
-further there is still room to optimize the generation of the ecc codes.
-We can simply calculate the total parity. If this is 0 then rp4 = rp5
-etc. If the parity is 1, then rp4 = !rp5;
-
-But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
-in the result byte and then do something like::
-
-    code[0] |= (code[0] << 1);
-
-Lets test this.
-
-
-Attempt 9
-=========
-
-Changed the code but again this slightly degrades performance. Tried all
-kind of other things, like having dedicated parity arrays to avoid the
-shift after parity[rp7] << 7; No gain.
-Change the lookup using the parity array by using shift operators (e.g.
-replace parity[rp7] << 7 with::
-
-	rp7 ^= (rp7 << 4);
-	rp7 ^= (rp7 << 2);
-	rp7 ^= (rp7 << 1);
-	rp7 &= 0x80;
-
-No gain.
-
-The only marginal change was inverting the parity bits, so we can remove
-the last three invert statements.
-
-Ah well, pity this does not deliver more. Then again 10 million
-iterations using the linux driver code takes between 13 and 13.5
-seconds, whereas my code now takes about 0.73 seconds for those 10
-million iterations. So basically I've improved the performance by a
-factor 18 on my system. Not that bad. Of course on different hardware
-you will get different results. No warranties!
-
-But of course there is no such thing as a free lunch. The codesize almost
-tripled (from 562 bytes to 1434 bytes). Then again, it is not that much.
-
-
-Correcting errors
-=================
-
-For correcting errors I again used the ST application note as a starter,
-but I also peeked at the existing code.
-
-The algorithm itself is pretty straightforward. Just xor the given and
-the calculated ecc. If all bytes are 0 there is no problem. If 11 bits
-are 1 we have one correctable bit error. If there is 1 bit 1, we have an
-error in the given ecc code.
-
-It proved to be fastest to do some table lookups. Performance gain
-introduced by this is about a factor 2 on my system when a repair had to
-be done, and 1% or so if no repair had to be done.
-
-Code size increased from 330 bytes to 686 bytes for this function.
-(gcc 4.2, -O3)
-
-
-Conclusion
-==========
-
-The gain when calculating the ecc is tremendous. Om my development hardware
-a speedup of a factor of 18 for ecc calculation was achieved. On a test on an
-embedded system with a MIPS core a factor 7 was obtained.
-
-On a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
-5 (big endian mode, gcc 4.1.2, -O3)
-
-For correction not much gain could be obtained (as bitflips are rare). Then
-again there are also much less cycles spent there.
-
-It seems there is not much more gain possible in this, at least when
-programmed in C. Of course it might be possible to squeeze something more
-out of it with an assembler program, but due to pipeline behaviour etc
-this is very tricky (at least for intel hw).
-
-Author: Frans Meulenbroeks
-
-Copyright (C) 2008 Koninklijke Philips Electronics NV.
diff --git a/Documentation/mtd/spi-nor.rst b/Documentation/mtd/spi-nor.rst
deleted file mode 100644
index f5333e3bf486..000000000000
--- a/Documentation/mtd/spi-nor.rst
+++ /dev/null
@@ -1,66 +0,0 @@
-=================
-SPI NOR framework
-=================
-
-Part I - Why do we need this framework?
----------------------------------------
-
-SPI bus controllers (drivers/spi/) only deal with streams of bytes; the bus
-controller operates agnostic of the specific device attached. However, some
-controllers (such as Freescale's QuadSPI controller) cannot easily handle
-arbitrary streams of bytes, but rather are designed specifically for SPI NOR.
-
-In particular, Freescale's QuadSPI controller must know the NOR commands to
-find the right LUT sequence. Unfortunately, the SPI subsystem has no notion of
-opcodes, addresses, or data payloads; a SPI controller simply knows to send or
-receive bytes (Tx and Rx). Therefore, we must define a new layering scheme under
-which the controller driver is aware of the opcodes, addressing, and other
-details of the SPI NOR protocol.
-
-Part II - How does the framework work?
---------------------------------------
-
-This framework just adds a new layer between the MTD and the SPI bus driver.
-With this new layer, the SPI NOR controller driver does not depend on the
-m25p80 code anymore.
-
-Before this framework, the layer is like::
-
-                   MTD
-         ------------------------
-                  m25p80
-         ------------------------
-	       SPI bus driver
-         ------------------------
-	        SPI NOR chip
-
-   After this framework, the layer is like:
-                   MTD
-         ------------------------
-              SPI NOR framework
-         ------------------------
-                  m25p80
-         ------------------------
-	       SPI bus driver
-         ------------------------
-	       SPI NOR chip
-
-  With the SPI NOR controller driver (Freescale QuadSPI), it looks like:
-                   MTD
-         ------------------------
-              SPI NOR framework
-         ------------------------
-                fsl-quadSPI
-         ------------------------
-	       SPI NOR chip
-
-Part III - How can drivers use the framework?
----------------------------------------------
-
-The main API is spi_nor_scan(). Before you call the hook, a driver should
-initialize the necessary fields for spi_nor{}. Please see
-drivers/mtd/spi-nor/spi-nor.c for detail. Please also refer to fsl-quadspi.c
-when you want to write a new driver for a SPI NOR controller.
-Another API is spi_nor_restore(), this is used to restore the status of SPI
-flash chip such as addressing mode. Call it whenever detach the driver from
-device or reboot the system.
diff --git a/drivers/mtd/nand/raw/nand_ecc.c b/drivers/mtd/nand/raw/nand_ecc.c
index f6a7808db818..09fdced659f5 100644
--- a/drivers/mtd/nand/raw/nand_ecc.c
+++ b/drivers/mtd/nand/raw/nand_ecc.c
@@ -11,7 +11,7 @@
  *   Thomas Gleixner (tglx@linutronix.de)
  *
  * Information on how this algorithm works and how it was developed
- * can be found in Documentation/mtd/nand_ecc.rst
+ * can be found in Documentation/driver-api/mtd/nand_ecc.rst
  */
 
 #include <linux/types.h>
-- 
cgit v1.2.3-55-g7522


From e253d2c551ce876a374d533fbcc9e8f31142dcad Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:46:30 -0300
Subject: docs: nfc: add it to the driver-api book

Most of the descriptions here are oriented to a Kernel developer.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/driver-api/index.rst         |   1 +
 Documentation/driver-api/nfc/index.rst     |   9 +
 Documentation/driver-api/nfc/nfc-hci.rst   | 311 +++++++++++++++++++++++++++++
 Documentation/driver-api/nfc/nfc-pn544.rst |  34 ++++
 Documentation/nfc/index.rst                |  11 -
 Documentation/nfc/nfc-hci.rst              | 311 -----------------------------
 Documentation/nfc/nfc-pn544.rst            |  34 ----
 7 files changed, 355 insertions(+), 356 deletions(-)
 create mode 100644 Documentation/driver-api/nfc/index.rst
 create mode 100644 Documentation/driver-api/nfc/nfc-hci.rst
 create mode 100644 Documentation/driver-api/nfc/nfc-pn544.rst
 delete mode 100644 Documentation/nfc/index.rst
 delete mode 100644 Documentation/nfc/nfc-hci.rst
 delete mode 100644 Documentation/nfc/nfc-pn544.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 7ecc65093493..d6bf4a37cefe 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -56,6 +56,7 @@ available subsections can be seen below.
    pinctl
    gpio/index
    misc_devices
+   nfc/index
    dmaengine/index
    slimbus
    soundwire/index
diff --git a/Documentation/driver-api/nfc/index.rst b/Documentation/driver-api/nfc/index.rst
new file mode 100644
index 000000000000..3afb2c0c2e3c
--- /dev/null
+++ b/Documentation/driver-api/nfc/index.rst
@@ -0,0 +1,9 @@
+========================
+Near Field Communication
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   nfc-hci
+   nfc-pn544
diff --git a/Documentation/driver-api/nfc/nfc-hci.rst b/Documentation/driver-api/nfc/nfc-hci.rst
new file mode 100644
index 000000000000..eb8a1a14e919
--- /dev/null
+++ b/Documentation/driver-api/nfc/nfc-hci.rst
@@ -0,0 +1,311 @@
+========================
+HCI backend for NFC Core
+========================
+
+- Author: Eric Lapuyade, Samuel Ortiz
+- Contact: eric.lapuyade@intel.com, samuel.ortiz@intel.com
+
+General
+-------
+
+The HCI layer implements much of the ETSI TS 102 622 V10.2.0 specification. It
+enables easy writing of HCI-based NFC drivers. The HCI layer runs as an NFC Core
+backend, implementing an abstract nfc device and translating NFC Core API
+to HCI commands and events.
+
+HCI
+---
+
+HCI registers as an nfc device with NFC Core. Requests coming from userspace are
+routed through netlink sockets to NFC Core and then to HCI. From this point,
+they are translated into a sequence of HCI commands sent to the HCI layer in the
+host controller (the chip). Commands can be executed synchronously (the sending
+context blocks waiting for response) or asynchronously (the response is returned
+from HCI Rx context).
+HCI events can also be received from the host controller. They will be handled
+and a translation will be forwarded to NFC Core as needed. There are hooks to
+let the HCI driver handle proprietary events or override standard behavior.
+HCI uses 2 execution contexts:
+
+- one for executing commands: nfc_hci_msg_tx_work(). Only one command
+  can be executing at any given moment.
+- one for dispatching received events and commands: nfc_hci_msg_rx_work().
+
+HCI Session initialization
+--------------------------
+
+The Session initialization is an HCI standard which must unfortunately
+support proprietary gates. This is the reason why the driver will pass a list
+of proprietary gates that must be part of the session. HCI will ensure all
+those gates have pipes connected when the hci device is set up.
+In case the chip supports pre-opened gates and pseudo-static pipes, the driver
+can pass that information to HCI core.
+
+HCI Gates and Pipes
+-------------------
+
+A gate defines the 'port' where some service can be found. In order to access
+a service, one must create a pipe to that gate and open it. In this
+implementation, pipes are totally hidden. The public API only knows gates.
+This is consistent with the driver's need to send commands to proprietary gates
+without knowing the pipes connected to them.
+
+Driver interface
+----------------
+
+A driver is generally written in two parts: the physical link management and
+the HCI management. This makes it easier to maintain a driver for a chip that
+can be connected using various phys (i2c, spi, ...).
+
+HCI Management
+--------------
+
+A driver would normally register itself with HCI and provide the following
+entry points::
+
+  struct nfc_hci_ops {
+	int (*open)(struct nfc_hci_dev *hdev);
+	void (*close)(struct nfc_hci_dev *hdev);
+	int (*hci_ready) (struct nfc_hci_dev *hdev);
+	int (*xmit) (struct nfc_hci_dev *hdev, struct sk_buff *skb);
+	int (*start_poll) (struct nfc_hci_dev *hdev,
+			   u32 im_protocols, u32 tm_protocols);
+	int (*dep_link_up)(struct nfc_hci_dev *hdev, struct nfc_target *target,
+			   u8 comm_mode, u8 *gb, size_t gb_len);
+	int (*dep_link_down)(struct nfc_hci_dev *hdev);
+	int (*target_from_gate) (struct nfc_hci_dev *hdev, u8 gate,
+				 struct nfc_target *target);
+	int (*complete_target_discovered) (struct nfc_hci_dev *hdev, u8 gate,
+					   struct nfc_target *target);
+	int (*im_transceive) (struct nfc_hci_dev *hdev,
+			      struct nfc_target *target, struct sk_buff *skb,
+			      data_exchange_cb_t cb, void *cb_context);
+	int (*tm_send)(struct nfc_hci_dev *hdev, struct sk_buff *skb);
+	int (*check_presence)(struct nfc_hci_dev *hdev,
+			      struct nfc_target *target);
+	int (*event_received)(struct nfc_hci_dev *hdev, u8 gate, u8 event,
+			      struct sk_buff *skb);
+  };
+
+- open() and close() shall turn the hardware on and off.
+- hci_ready() is an optional entry point that is called right after the hci
+  session has been set up. The driver can use it to do additional initialization
+  that must be performed using HCI commands.
+- xmit() shall simply write a frame to the physical link.
+- start_poll() is an optional entry point that shall set the hardware in polling
+  mode. This must be implemented only if the hardware uses proprietary gates or a
+  mechanism slightly different from the HCI standard.
+- dep_link_up() is called after a p2p target has been detected, to finish
+  the p2p connection setup with hardware parameters that need to be passed back
+  to nfc core.
+- dep_link_down() is called to bring the p2p link down.
+- target_from_gate() is an optional entry point to return the nfc protocols
+  corresponding to a proprietary gate.
+- complete_target_discovered() is an optional entry point to let the driver
+  perform additional proprietary processing necessary to auto activate the
+  discovered target.
+- im_transceive() must be implemented by the driver if proprietary HCI commands
+  are required to send data to the tag. Some tag types will require custom
+  commands, others can be written to using the standard HCI commands. The driver
+  can check the tag type and either do proprietary processing, or return 1 to ask
+  for standard processing. The data exchange command itself must be sent
+  asynchronously.
+- tm_send() is called to send data in the case of a p2p connection.
+- check_presence() is an optional entry point that will be called regularly
+  by the core to check that an activated tag is still in the field. If this is
+  not implemented, the core will not be able to push tag_lost events to the user
+  space.
+- event_received() is called to handle an event coming from the chip. The
+  driver can handle the event or return 1 to let HCI attempt standard processing.
+
+On the rx path, the driver is responsible for pushing incoming HCP frames to HCI
+using nfc_hci_recv_frame(). HCI will take care of re-aggregation and handling.
+This must be done from a context that can sleep.
+
+PHY Management
+--------------
+
+The physical link (i2c, ...) management is defined by the following structure::
+
+  struct nfc_phy_ops {
+	int (*write)(void *dev_id, struct sk_buff *skb);
+	int (*enable)(void *dev_id);
+	void (*disable)(void *dev_id);
+  };
+
+enable():
+	turn the phy on (power on), make it ready to transfer data
+disable():
+	turn the phy off
+write():
+	Send a data frame to the chip. Note that to enable higher
+	layers such as an llc to store the frame for re-emission, this
+	function must not alter the skb. It must also not return a positive
+	result (return 0 for success, negative for failure).
+
+Data coming from the chip shall be sent directly to nfc_hci_recv_frame().
+
+LLC
+---
+
+Communication between the CPU and the chip often requires some link layer
+protocol. Those are isolated as modules managed by the HCI layer. There are
+currently two modules: nop (raw transfer) and shdlc.
+A new llc must implement the following functions::
+
+  struct nfc_llc_ops {
+	void *(*init) (struct nfc_hci_dev *hdev, xmit_to_drv_t xmit_to_drv,
+		       rcv_to_hci_t rcv_to_hci, int tx_headroom,
+		       int tx_tailroom, int *rx_headroom, int *rx_tailroom,
+		       llc_failure_t llc_failure);
+	void (*deinit) (struct nfc_llc *llc);
+	int (*start) (struct nfc_llc *llc);
+	int (*stop) (struct nfc_llc *llc);
+	void (*rcv_from_drv) (struct nfc_llc *llc, struct sk_buff *skb);
+	int (*xmit_from_hci) (struct nfc_llc *llc, struct sk_buff *skb);
+  };
+
+init():
+	allocate and init your private storage
+deinit():
+	cleanup
+start():
+	establish the logical connection
+stop():
+	terminate the logical connection
+rcv_from_drv():
+	handle data coming from the chip, going to HCI
+xmit_from_hci():
+	handle data sent by HCI, going to the chip
+
+The llc must be registered with nfc before it can be used. Do that by
+calling::
+
+	nfc_llc_register(const char *name, struct nfc_llc_ops *ops);
+
+Again, note that the llc does not handle the physical link. It is thus very
+easy to mix any physical link with any llc for a given chip driver.
+
+Included Drivers
+----------------
+
+An HCI based driver for an NXP PN544, connected through I2C bus, and using
+shdlc is included.
+
+Execution Contexts
+------------------
+
+The execution contexts are the following:
+
+- IRQ handler (IRQH): fast, cannot sleep. Sends incoming frames to HCI, where
+  they go to the current llc. With shdlc, the frame is queued in its rx queue.
+
+- SHDLC State Machine worker (SMW)
+
+  Only when llc_shdlc is used: handles shdlc rx & tx queues.
+
+  Dispatches HCI cmd responses.
+
+- HCI Tx Cmd worker (MSGTXWQ)
+
+  Serializes execution of HCI commands.
+
+  Completes execution in case of response timeout.
+
+- HCI Rx worker (MSGRXWQ)
+
+  Dispatches incoming HCI commands or events.
+
+- Syscall context from a userspace call (SYSCALL)
+
+  Any entrypoint in HCI called from NFC Core
+
+Workflow executing an HCI command (using shdlc)
+-----------------------------------------------
+
+Executing an HCI command can easily be performed synchronously using the
+following API::
+
+  int nfc_hci_send_cmd (struct nfc_hci_dev *hdev, u8 gate, u8 cmd,
+			const u8 *param, size_t param_len, struct sk_buff **skb)
+
+The API must be invoked from a context that can sleep. Most of the time, this
+will be the syscall context. skb will return the result that was received in
+the response.
+
+Internally, execution is asynchronous. So all this API does is enqueue the HCI
+command, set up a local wait queue on the stack, and wait_event() for
+completion. The wait is not interruptible because it is guaranteed that the
+command will complete after some short timeout anyway.
+
+MSGTXWQ context will then be scheduled and invoke nfc_hci_msg_tx_work().
+This function will dequeue the next pending command and send its HCP fragments
+to the lower layer which happens to be shdlc. It will then start a timer to be
+able to complete the command with a timeout error if no response arrives.
+
+SMW context gets scheduled and invokes nfc_shdlc_sm_work(). This function
+handles shdlc framing in and out. It uses the driver xmit to send frames and
+receives incoming frames in an skb queue filled from the driver IRQ handler.
+SHDLC I(nformation) frame payloads are HCP fragments. They are aggregated to
+form complete HCI frames, which can be a response, command, or event.
+
+HCI Responses are dispatched immediately from this context to unblock
+waiting command execution. Response processing involves invoking the completion
+callback that was provided by nfc_hci_msg_tx_work() when it sent the command.
+The completion callback will then wake the syscall context.
+
+It is also possible to execute the command asynchronously using this API::
+
+  static int nfc_hci_execute_cmd_async(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
+				       const u8 *param, size_t param_len,
+				       data_exchange_cb_t cb, void *cb_context)
+
+The workflow is the same, except that the API call returns immediately, and
+the callback will be called with the result from the SMW context.
+
+Workflow receiving an HCI event or command
+------------------------------------------
+
+HCI commands or events are not dispatched from SMW context. Instead, they are
+queued to HCI rx_queue and will be dispatched from HCI rx worker
+context (MSGRXWQ). This is done this way to allow a cmd or event handler
+to also execute other commands (for example, handling the
+NFC_HCI_EVT_TARGET_DISCOVERED event from PN544 requires issuing an
+ANY_GET_PARAMETER to the reader A gate to get information on the target
+that was discovered).
+
+Typically, such an event will be propagated to NFC Core from MSGRXWQ context.
+
+Error management
+----------------
+
+Errors that occur synchronously with the execution of an NFC Core request are
+simply returned as the execution result of the request. These are easy.
+
+Errors that occur asynchronously (e.g. in a background protocol handling thread)
+must be reported such that upper layers don't stay ignorant that something
+went wrong below and know that expected events will probably never happen.
+Handling of these errors is done as follows:
+
+- driver (pn544) fails to deliver an incoming frame: it stores the error such
+  that any subsequent call to the driver will result in this error. Then it
+  calls the standard nfc_shdlc_recv_frame() with a NULL argument to report the
+  problem above. shdlc stores an EREMOTEIO sticky status, which will trigger
+  SMW to report above in turn.
+
+- SMW is basically a background thread to handle incoming and outgoing shdlc
+  frames. This thread will also check the shdlc sticky status and report to HCI
+  when it discovers it is not able to run anymore because of an unrecoverable
+  error that happened within shdlc or below. If the problem occurs during shdlc
+  connection, the error is reported through the connect completion.
+
+- HCI: if an internal HCI error happens (frame is lost), or a lower layer
+  reports an error to HCI, HCI will either complete the currently executing
+  command with that error, or notify NFC Core directly if no command is
+  executing.
+
+- NFC Core: when NFC Core is notified of an error from below and polling is
+  active, it will send a tag discovered event with an empty tag list to the user
+  space to let it know that the poll operation will never be able to detect a
+  tag. If polling is not active and the error was sticky, lower levels will
+  return it at next invocation.
diff --git a/Documentation/driver-api/nfc/nfc-pn544.rst b/Documentation/driver-api/nfc/nfc-pn544.rst
new file mode 100644
index 000000000000..6b2d8aae0c4e
--- /dev/null
+++ b/Documentation/driver-api/nfc/nfc-pn544.rst
@@ -0,0 +1,34 @@
+============================================================================
+Kernel driver for the NXP Semiconductors PN544 Near Field Communication chip
+============================================================================
+
+
+General
+-------
+
+The PN544 is an integrated transmission module for contactless
+communication. The driver goes under drivers/nfc/ and is compiled as a
+module named "pn544".
+
+Host interfaces: I2C, SPI and HSU; this driver currently supports only I2C.
+
+Protocols
+---------
+
+In the normal (HCI) mode and in the firmware update mode read and
+write functions behave a bit differently because the message formats
+or the protocols are different.
+
+In the normal (HCI) mode the protocol used is derived from the ETSI
+HCI specification. The firmware is updated using a specific protocol,
+which is different from HCI.
+
+HCI messages consist of an eight bit header and the message body. The
+header contains the message length. Maximum size for an HCI message is
+33. In HCI mode sent messages are tested for a correct
+checksum. Firmware update messages have the length in the second (MSB)
+and third (LSB) bytes of the message. The maximum FW message length is
+1024 bytes.
+
+For the ETSI HCI specification see
+http://www.etsi.org/WebSite/Technologies/ProtocolSpecification.aspx
diff --git a/Documentation/nfc/index.rst b/Documentation/nfc/index.rst
deleted file mode 100644
index 4f4947fce80d..000000000000
--- a/Documentation/nfc/index.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-:orphan:
-
-========================
-Near Field Communication
-========================
-
-.. toctree::
-   :maxdepth: 1
-
-   nfc-hci
-   nfc-pn544
diff --git a/Documentation/nfc/nfc-hci.rst b/Documentation/nfc/nfc-hci.rst
deleted file mode 100644
index eb8a1a14e919..000000000000
--- a/Documentation/nfc/nfc-hci.rst
+++ /dev/null
@@ -1,311 +0,0 @@
-========================
-HCI backend for NFC Core
-========================
-
-- Author: Eric Lapuyade, Samuel Ortiz
-- Contact: eric.lapuyade@intel.com, samuel.ortiz@intel.com
-
-General
--------
-
-The HCI layer implements much of the ETSI TS 102 622 V10.2.0 specification. It
-enables easy writing of HCI-based NFC drivers. The HCI layer runs as an NFC Core
-backend, implementing an abstract nfc device and translating NFC Core API
-to HCI commands and events.
-
-HCI
----
-
-HCI registers as an nfc device with NFC Core. Requests coming from userspace are
-routed through netlink sockets to NFC Core and then to HCI. From this point,
-they are translated in a sequence of HCI commands sent to the HCI layer in the
-host controller (the chip). Commands can be executed synchronously (the sending
-context blocks waiting for response) or asynchronously (the response is returned
-from HCI Rx context).
-HCI events can also be received from the host controller. They will be handled
-and a translation will be forwarded to NFC Core as needed. There are hooks to
-let the HCI driver handle proprietary events or override standard behavior.
-HCI uses 2 execution contexts:
-
-- one for executing commands : nfc_hci_msg_tx_work(). Only one command
-  can be executing at any given moment.
-- one for dispatching received events and commands : nfc_hci_msg_rx_work().
-
-HCI Session initialization
---------------------------
-
-The Session initialization is an HCI standard which must unfortunately
-support proprietary gates. This is the reason why the driver will pass a list
-of proprietary gates that must be part of the session. HCI will ensure all
-those gates have pipes connected when the hci device is set up.
-In case the chip supports pre-opened gates and pseudo-static pipes, the driver
-can pass that information to HCI core.
-
-HCI Gates and Pipes
--------------------
-
-A gate defines the 'port' where some service can be found. In order to access
-a service, one must create a pipe to that gate and open it. In this
-implementation, pipes are totally hidden. The public API only knows gates.
-This is consistent with the driver need to send commands to proprietary gates
-without knowing the pipe connected to it.
-
-Driver interface
-----------------
-
-A driver is generally written in two parts : the physical link management and
-the HCI management. This makes it easier to maintain a driver for a chip that
-can be connected using various phy (i2c, spi, ...)
-
-HCI Management
---------------
-
-A driver would normally register itself with HCI and provide the following
-entry points::
-
-  struct nfc_hci_ops {
-	int (*open)(struct nfc_hci_dev *hdev);
-	void (*close)(struct nfc_hci_dev *hdev);
-	int (*hci_ready) (struct nfc_hci_dev *hdev);
-	int (*xmit) (struct nfc_hci_dev *hdev, struct sk_buff *skb);
-	int (*start_poll) (struct nfc_hci_dev *hdev,
-			   u32 im_protocols, u32 tm_protocols);
-	int (*dep_link_up)(struct nfc_hci_dev *hdev, struct nfc_target *target,
-			   u8 comm_mode, u8 *gb, size_t gb_len);
-	int (*dep_link_down)(struct nfc_hci_dev *hdev);
-	int (*target_from_gate) (struct nfc_hci_dev *hdev, u8 gate,
-				 struct nfc_target *target);
-	int (*complete_target_discovered) (struct nfc_hci_dev *hdev, u8 gate,
-					   struct nfc_target *target);
-	int (*im_transceive) (struct nfc_hci_dev *hdev,
-			      struct nfc_target *target, struct sk_buff *skb,
-			      data_exchange_cb_t cb, void *cb_context);
-	int (*tm_send)(struct nfc_hci_dev *hdev, struct sk_buff *skb);
-	int (*check_presence)(struct nfc_hci_dev *hdev,
-			      struct nfc_target *target);
-	int (*event_received)(struct nfc_hci_dev *hdev, u8 gate, u8 event,
-			      struct sk_buff *skb);
-  };
-
-- open() and close() shall turn the hardware on and off.
-- hci_ready() is an optional entry point that is called right after the hci
-  session has been set up. The driver can use it to do additional initialization
-  that must be performed using HCI commands.
-- xmit() shall simply write a frame to the physical link.
-- start_poll() is an optional entrypoint that shall set the hardware in polling
-  mode. This must be implemented only if the hardware uses proprietary gates or a
-  mechanism slightly different from the HCI standard.
-- dep_link_up() is called after a p2p target has been detected, to finish
-  the p2p connection setup with hardware parameters that need to be passed back
-  to nfc core.
-- dep_link_down() is called to bring the p2p link down.
-- target_from_gate() is an optional entrypoint to return the nfc protocols
-  corresponding to a proprietary gate.
-- complete_target_discovered() is an optional entry point to let the driver
-  perform additional proprietary processing necessary to auto activate the
-  discovered target.
-- im_transceive() must be implemented by the driver if proprietary HCI commands
-  are required to send data to the tag. Some tag types will require custom
-  commands, others can be written to using the standard HCI commands. The driver
-  can check the tag type and either do proprietary processing, or return 1 to ask
-  for standard processing. The data exchange command itself must be sent
-  asynchronously.
-- tm_send() is called to send data in the case of a p2p connection
-- check_presence() is an optional entry point that will be called regularly
-  by the core to check that an activated tag is still in the field. If this is
-  not implemented, the core will not be able to push tag_lost events to the user
-  space
-- event_received() is called to handle an event coming from the chip. Driver
-  can handle the event or return 1 to let HCI attempt standard processing.
-
-On the rx path, the driver is responsible to push incoming HCP frames to HCI
-using nfc_hci_recv_frame(). HCI will take care of re-aggregation and handling
-This must be done from a context that can sleep.
-
-PHY Management
---------------
-
-The physical link (i2c, ...) management is defined by the following structure::
-
-  struct nfc_phy_ops {
-	int (*write)(void *dev_id, struct sk_buff *skb);
-	int (*enable)(void *dev_id);
-	void (*disable)(void *dev_id);
-  };
-
-enable():
-	turn the phy on (power on), make it ready to transfer data
-disable():
-	turn the phy off
-write():
-	Send a data frame to the chip. Note that to enable higher
-	layers such as an llc to store the frame for re-emission, this
-	function must not alter the skb. It must also not return a positive
-	result (return 0 for success, negative for failure).
-
-Data coming from the chip shall be sent directly to nfc_hci_recv_frame().
-
-LLC
----
-
-Communication between the CPU and the chip often requires some link layer
-protocol. Those are isolated as modules managed by the HCI layer. There are
-currently two modules : nop (raw transfert) and shdlc.
-A new llc must implement the following functions::
-
-  struct nfc_llc_ops {
-	void *(*init) (struct nfc_hci_dev *hdev, xmit_to_drv_t xmit_to_drv,
-		       rcv_to_hci_t rcv_to_hci, int tx_headroom,
-		       int tx_tailroom, int *rx_headroom, int *rx_tailroom,
-		       llc_failure_t llc_failure);
-	void (*deinit) (struct nfc_llc *llc);
-	int (*start) (struct nfc_llc *llc);
-	int (*stop) (struct nfc_llc *llc);
-	void (*rcv_from_drv) (struct nfc_llc *llc, struct sk_buff *skb);
-	int (*xmit_from_hci) (struct nfc_llc *llc, struct sk_buff *skb);
-  };
-
-init():
-	allocate and init your private storage
-deinit():
-	cleanup
-start():
-	establish the logical connection
-stop ():
-	terminate the logical connection
-rcv_from_drv():
-	handle data coming from the chip, going to HCI
-xmit_from_hci():
-	handle data sent by HCI, going to the chip
-
-The llc must be registered with nfc before it can be used. Do that by
-calling::
-
-	nfc_llc_register(const char *name, struct nfc_llc_ops *ops);
-
-Again, note that the llc does not handle the physical link. It is thus very
-easy to mix any physical link with any llc for a given chip driver.
-
-Included Drivers
-----------------
-
-An HCI based driver for an NXP PN544, connected through I2C bus, and using
-shdlc is included.
-
-Execution Contexts
-------------------
-
-The execution contexts are the following:
-- IRQ handler (IRQH):
-fast, cannot sleep. sends incoming frames to HCI where they are passed to
-the current llc. In case of shdlc, the frame is queued in shdlc rx queue.
-
-- SHDLC State Machine worker (SMW)
-
-  Only when llc_shdlc is used: handles shdlc rx & tx queues.
-
-  Dispatches HCI cmd responses.
-
-- HCI Tx Cmd worker (MSGTXWQ)
-
-  Serializes execution of HCI commands.
-
-  Completes execution in case of response timeout.
-
-- HCI Rx worker (MSGRXWQ)
-
-  Dispatches incoming HCI commands or events.
-
-- Syscall context from a userspace call (SYSCALL)
-
-  Any entrypoint in HCI called from NFC Core
-
-Workflow executing an HCI command (using shdlc)
------------------------------------------------
-
-Executing an HCI command can easily be performed synchronously using the
-following API::
-
-  int nfc_hci_send_cmd (struct nfc_hci_dev *hdev, u8 gate, u8 cmd,
-			const u8 *param, size_t param_len, struct sk_buff **skb)
-
-The API must be invoked from a context that can sleep. Most of the time, this
-will be the syscall context. skb will return the result that was received in
-the response.
-
-Internally, execution is asynchronous. So all this API does is to enqueue the
-HCI command, setup a local wait queue on stack, and wait_event() for completion.
-The wait is not interruptible because it is guaranteed that the command will
-complete after some short timeout anyway.
-
-MSGTXWQ context will then be scheduled and invoke nfc_hci_msg_tx_work().
-This function will dequeue the next pending command and send its HCP fragments
-to the lower layer which happens to be shdlc. It will then start a timer to be
-able to complete the command with a timeout error if no response arrive.
-
-SMW context gets scheduled and invokes nfc_shdlc_sm_work(). This function
-handles shdlc framing in and out. It uses the driver xmit to send frames and
-receives incoming frames in an skb queue filled from the driver IRQ handler.
-SHDLC I(nformation) frames payload are HCP fragments. They are aggregated to
-form complete HCI frames, which can be a response, command, or event.
-
-HCI Responses are dispatched immediately from this context to unblock
-waiting command execution. Response processing involves invoking the completion
-callback that was provided by nfc_hci_msg_tx_work() when it sent the command.
-The completion callback will then wake the syscall context.
-
-It is also possible to execute the command asynchronously using this API::
-
-  static int nfc_hci_execute_cmd_async(struct nfc_hci_dev *hdev, u8 pipe, u8 cmd,
-				       const u8 *param, size_t param_len,
-				       data_exchange_cb_t cb, void *cb_context)
-
-The workflow is the same, except that the API call returns immediately, and
-the callback will be called with the result from the SMW context.
-
-Workflow receiving an HCI event or command
-------------------------------------------
-
-HCI commands or events are not dispatched from SMW context. Instead, they are
-queued to HCI rx_queue and will be dispatched from HCI rx worker
-context (MSGRXWQ). This is done this way to allow a cmd or event handler
-to also execute other commands (for example, handling the
-NFC_HCI_EVT_TARGET_DISCOVERED event from PN544 requires to issue an
-ANY_GET_PARAMETER to the reader A gate to get information on the target
-that was discovered).
-
-Typically, such an event will be propagated to NFC Core from MSGRXWQ context.
-
-Error management
-----------------
-
-Errors that occur synchronously with the execution of an NFC Core request are
-simply returned as the execution result of the request. These are easy.
-
-Errors that occur asynchronously (e.g. in a background protocol handling thread)
-must be reported such that upper layers don't stay ignorant that something
-went wrong below and know that expected events will probably never happen.
-Handling of these errors is done as follows:
-
-- driver (pn544) fails to deliver an incoming frame: it stores the error such
-  that any subsequent call to the driver will result in this error. Then it
-  calls the standard nfc_shdlc_recv_frame() with a NULL argument to report the
-  problem above. shdlc stores a EREMOTEIO sticky status, which will trigger
-  SMW to report above in turn.
-
-- SMW is basically a background thread to handle incoming and outgoing shdlc
-  frames. This thread will also check the shdlc sticky status and report to HCI
-  when it discovers it is not able to run anymore because of an unrecoverable
-  error that happened within shdlc or below. If the problem occurs during shdlc
-  connection, the error is reported through the connect completion.
-
-- HCI: if an internal HCI error happens (frame is lost), or HCI is reported an
-  error from a lower layer, HCI will either complete the currently executing
-  command with that error, or notify NFC Core directly if no command is
-  executing.
-
-- NFC Core: when NFC Core is notified of an error from below and polling is
-  active, it will send a tag discovered event with an empty tag list to the user
-  space to let it know that the poll operation will never be able to detect a
-  tag. If polling is not active and the error was sticky, lower levels will
-  return it at next invocation.
diff --git a/Documentation/nfc/nfc-pn544.rst b/Documentation/nfc/nfc-pn544.rst
deleted file mode 100644
index 6b2d8aae0c4e..000000000000
--- a/Documentation/nfc/nfc-pn544.rst
+++ /dev/null
@@ -1,34 +0,0 @@
-============================================================================
-Kernel driver for the NXP Semiconductors PN544 Near Field Communication chip
-============================================================================
-
-
-General
--------
-
-The PN544 is an integrated transmission module for contactless
-communication. The driver goes under drives/nfc/ and is compiled as a
-module named "pn544".
-
-Host Interfaces: I2C, SPI and HSU, this driver supports currently only I2C.
-
-Protocols
----------
-
-In the normal (HCI) mode and in the firmware update mode read and
-write functions behave a bit differently because the message formats
-or the protocols are different.
-
-In the normal (HCI) mode the protocol used is derived from the ETSI
-HCI specification. The firmware is updated using a specific protocol,
-which is different from HCI.
-
-HCI messages consist of an eight bit header and the message body. The
-header contains the message length. Maximum size for an HCI message is
-33. In HCI mode sent messages are tested for a correct
-checksum. Firmware update messages have the length in the second (MSB)
-and third (LSB) bytes of the message. The maximum FW message length is
-1024 bytes.
-
-For the ETSI HCI specification see
-http://www.etsi.org/WebSite/Technologies/ProtocolSpecification.aspx
-- 
cgit v1.2.3-55-g7522


From 19024c09c243c5107f738286459a0dd85697b089 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:48:15 -0300
Subject: docs: mmc: move it to the driver-api

Most of the stuff here is related to the kAPI.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/driver-api/index.rst             |  1 +
 Documentation/driver-api/mmc/index.rst         | 11 +++
 Documentation/driver-api/mmc/mmc-async-req.rst | 98 ++++++++++++++++++++++++++
 Documentation/driver-api/mmc/mmc-dev-attrs.rst | 91 ++++++++++++++++++++++++
 Documentation/driver-api/mmc/mmc-dev-parts.rst | 41 +++++++++++
 Documentation/driver-api/mmc/mmc-tools.rst     | 37 ++++++++++
 Documentation/mmc/index.rst                    | 13 ----
 Documentation/mmc/mmc-async-req.rst            | 98 --------------------------
 Documentation/mmc/mmc-dev-attrs.rst            | 91 ------------------------
 Documentation/mmc/mmc-dev-parts.rst            | 41 -----------
 Documentation/mmc/mmc-tools.rst                | 37 ----------
 11 files changed, 279 insertions(+), 280 deletions(-)
 create mode 100644 Documentation/driver-api/mmc/index.rst
 create mode 100644 Documentation/driver-api/mmc/mmc-async-req.rst
 create mode 100644 Documentation/driver-api/mmc/mmc-dev-attrs.rst
 create mode 100644 Documentation/driver-api/mmc/mmc-dev-parts.rst
 create mode 100644 Documentation/driver-api/mmc/mmc-tools.rst
 delete mode 100644 Documentation/mmc/index.rst
 delete mode 100644 Documentation/mmc/mmc-async-req.rst
 delete mode 100644 Documentation/mmc/mmc-dev-attrs.rst
 delete mode 100644 Documentation/mmc/mmc-dev-parts.rst
 delete mode 100644 Documentation/mmc/mmc-tools.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index d6bf4a37cefe..25f85d3021aa 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -45,6 +45,7 @@ available subsections can be seen below.
    miscellaneous
    mei/index
    mtd/index
+   mmc/index
    nvdimm/index
    w1
    rapidio/index
diff --git a/Documentation/driver-api/mmc/index.rst b/Documentation/driver-api/mmc/index.rst
new file mode 100644
index 000000000000..9aaf64951a8c
--- /dev/null
+++ b/Documentation/driver-api/mmc/index.rst
@@ -0,0 +1,11 @@
+========================
+MMC/SD/SDIO card support
+========================
+
+.. toctree::
+   :maxdepth: 1
+
+   mmc-dev-attrs
+   mmc-dev-parts
+   mmc-async-req
+   mmc-tools
diff --git a/Documentation/driver-api/mmc/mmc-async-req.rst b/Documentation/driver-api/mmc/mmc-async-req.rst
new file mode 100644
index 000000000000..0f7197c9c3b5
--- /dev/null
+++ b/Documentation/driver-api/mmc/mmc-async-req.rst
@@ -0,0 +1,98 @@
+========================
+MMC Asynchronous Request
+========================
+
+Rationale
+=========
+
+How significant is the cache maintenance overhead?
+
+It depends. Fast eMMC and multiple cache levels with speculative cache
+pre-fetch make the cache overhead relatively significant. If the DMA
+preparations for the next request are done in parallel with the current
+transfer, the DMA preparation overhead would not affect the MMC performance.
+
+The intention of non-blocking (asynchronous) MMC requests is to minimize the
+time between when an MMC request ends and another MMC request begins.
+
+Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg and
+dma_unmap_sg are processing. Using non-blocking MMC requests makes it
+possible to prepare the caches for next job in parallel with an active
+MMC request.
+
+MMC block driver
+================
+
+The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.
+
+The increase in throughput is proportional to the time it takes to
+prepare (major part of preparations are dma_map_sg() and dma_unmap_sg())
+a request and how fast the memory is. The faster the MMC/SD is, the
+more significant the prepare request time becomes. Roughly the expected
+performance gain is 5% for large writes and 10% for large reads on an L2 cache
+platform. In power save mode, when clocks run at a lower frequency, the DMA
+preparation may cost even more. As long as these slower preparations are run
+in parallel with the transfer, performance won't be affected.
+
+Details on measurements from IOZone and mmc_test
+================================================
+
+https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
+
+MMC core API extension
+======================
+
+There is one new public function mmc_start_req().
+
+It starts a new MMC command request for a host. The function isn't
+truly non-blocking. If there is an ongoing async request it waits
+for completion of that request and starts the new one and returns. It
+doesn't wait for the new request to complete. If there is no ongoing
+request it starts the new request and returns immediately.
+
+MMC host extensions
+===================
+
+There are two optional members in the mmc_host_ops -- pre_req() and
+post_req() -- that the host driver may implement in order to move work
+to before and after the actual mmc_host_ops.request() function is called.
+
+In the DMA case pre_req() may do dma_map_sg() and prepare the DMA
+descriptor, and post_req() runs the dma_unmap_sg().
+
+Optimize for the first request
+==============================
+
+The first request in a series of requests can't be prepared in parallel
+with the previous transfer, since there is no previous request.
+
+The argument is_first_req in pre_req() indicates that there is no previous
+request. The host driver may optimize for this scenario to minimize
+the performance loss. A way to optimize for this is to split the current
+request in two chunks, prepare the first chunk and start the request,
+and finally prepare the second chunk and start the transfer.
+
+Pseudocode to handle is_first_req scenario with minimal prepare overhead::
+
+  if (is_first_req && req->size > threshold) {
+     /* start MMC transfer for the complete transfer size */
+     mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
+
+     /*
+      * Begin to prepare DMA while cmd is being processed by MMC. The
+      * first chunk should take the same time to prepare as the "MMC
+      * process command time". If prepare time exceeds MMC cmd time the
+      * transfer is delayed; guesstimate max 4k as first chunk size.
+      */
+     prepare_1st_chunk_for_dma(req);
+     /* flush pending desc to the DMAC (dmaengine.h) */
+     dma_issue_pending(req->dma_desc);
+
+     prepare_2nd_chunk_for_dma(req);
+     /*
+      * The second issue_pending should be called before MMC runs out
+      * of the first chunk. If the MMC runs out of the first data chunk
+      * before this call, the transfer is delayed.
+      */
+     dma_issue_pending(req->dma_desc);
+  }
diff --git a/Documentation/driver-api/mmc/mmc-dev-attrs.rst b/Documentation/driver-api/mmc/mmc-dev-attrs.rst
new file mode 100644
index 000000000000..4f44b1b730d6
--- /dev/null
+++ b/Documentation/driver-api/mmc/mmc-dev-attrs.rst
@@ -0,0 +1,91 @@
+==================================
+SD and MMC Block Device Attributes
+==================================
+
+These attributes are defined for the block devices associated with the
+SD or MMC device.
+
+The following attributes are read/write.
+
+	========		===============================================
+	force_ro		Enforce read-only access even if write protect switch is off.
+	========		===============================================
+
+SD and MMC Device Attributes
+============================
+
+All attributes are read-only.
+
+	======================	===============================================
+	cid			Card Identification Register
+	csd			Card Specific Data Register
+	scr			SD Card Configuration Register (SD only)
+	date			Manufacturing Date (from CID Register)
+	fwrev			Firmware/Product Revision (from CID Register)
+				(SD and MMCv1 only)
+	hwrev			Hardware/Product Revision (from CID Register)
+				(SD and MMCv1 only)
+	manfid			Manufacturer ID (from CID Register)
+	name			Product Name (from CID Register)
+	oemid			OEM/Application ID (from CID Register)
+	prv			Product Revision (from CID Register)
+				(SD and MMCv4 only)
+	serial			Product Serial Number (from CID Register)
+	erase_size		Erase group size
+	preferred_erase_size	Preferred erase size
+	raw_rpmb_size_mult	RPMB partition size
+	rel_sectors		Reliable write sector count
+	ocr 			Operation Conditions Register
+	dsr			Driver Stage Register
+	cmdq_en			Command Queue enabled:
+
+					1 => enabled, 0 => not enabled
+	======================	===============================================
+
+Note on Erase Size and Preferred Erase Size:
+
+	"erase_size" is the minimum size, in bytes, of an erase
+	operation.  For MMC, "erase_size" is the erase group size
+	reported by the card.  Note that "erase_size" does not apply
+	to trim or secure trim operations where the minimum size is
+	always one 512 byte sector.  For SD, "erase_size" is 512
+	if the card is block-addressed, 0 otherwise.
+
+	SD/MMC cards can erase an arbitrarily large area up to and
+	including the whole card.  When erasing a large area it may
+	be desirable to do it in smaller chunks for three reasons:
+
+	     1. A single erase command will make all other I/O on
+		the card wait.  This is not a problem if the whole card
+		is being erased, but erasing one partition will make
+		I/O for another partition on the same card wait for the
+		duration of the erase - which could be several
+		minutes.
+	     2. To be able to inform the user of erase progress.
+	     3. The erase timeout becomes too large to be very
+		useful.  Because the erase timeout contains a margin
+		which is multiplied by the size of the erase area,
+		the value can end up being several minutes for large
+		areas.
+
+	"erase_size" is not the most efficient unit to erase
+	(especially for SD where it is just one sector),
+	hence "preferred_erase_size" provides a good chunk
+	size for erasing large areas.
+
+	For MMC, "preferred_erase_size" is the high-capacity
+	erase size if a card specifies one, otherwise it is
+	based on the capacity of the card.
+
+	For SD, "preferred_erase_size" is the allocation unit
+	size specified by the card.
+
+	"preferred_erase_size" is in bytes.
+
+Note on raw_rpmb_size_mult:
+
+	"raw_rpmb_size_mult" is a count of 128kB blocks.
+
+	RPMB size in bytes is calculated by using the following equation:
+
+		RPMB partition size = 128kB x raw_rpmb_size_mult
diff --git a/Documentation/driver-api/mmc/mmc-dev-parts.rst b/Documentation/driver-api/mmc/mmc-dev-parts.rst
new file mode 100644
index 000000000000..995922f1f744
--- /dev/null
+++ b/Documentation/driver-api/mmc/mmc-dev-parts.rst
@@ -0,0 +1,41 @@
+============================
+SD and MMC Device Partitions
+============================
+
+Device partitions are additional logical block devices present on the
+SD/MMC device.
+
+As of this writing, MMC boot partitions are supported and exposed as
+/dev/mmcblkXboot0 and /dev/mmcblkXboot1, where X is the index of the
+parent /dev/mmcblkX.
+
+MMC Boot Partitions
+===================
+
+Read and write access is provided to the two MMC boot partitions. Due to
+the sensitive nature of the boot partition contents, which often store
+a bootloader or bootloader configuration tables crucial to booting the
+platform, write access is disabled by default to reduce the chance of
+accidental bricking.
+
+To enable write access to /dev/mmcblkXbootY, disable the forced read-only
+access with::
+
+	echo 0 > /sys/block/mmcblkXbootY/force_ro
+
+To re-enable read-only access::
+
+	echo 1 > /sys/block/mmcblkXbootY/force_ro
+
+The boot partitions can also be locked read only until the next power on,
+with::
+
+	echo 1 > /sys/block/mmcblkXbootY/ro_lock_until_next_power_on
+
+This is a feature of the card and not of the kernel. If the card does
+not support boot partition locking, the file will not exist. If the
+feature has been disabled on the card, the file will be read-only.
+
+The boot partitions can also be locked permanently, but this feature is
+not accessible through sysfs in order to avoid accidental or malicious
+bricking.
diff --git a/Documentation/driver-api/mmc/mmc-tools.rst b/Documentation/driver-api/mmc/mmc-tools.rst
new file mode 100644
index 000000000000..54406093768b
--- /dev/null
+++ b/Documentation/driver-api/mmc/mmc-tools.rst
@@ -0,0 +1,37 @@
+======================
+MMC tools introduction
+======================
+
+There is an MMC test tool called mmc-utils, which is maintained by Chris Ball.
+You can find it in the public git repository below:
+
+	http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc-utils.git/
+
+Functions
+=========
+
+The mmc-utils tools can do the following:
+
+ - Print and parse extcsd data.
+ - Determine the eMMC writeprotect status.
+ - Set the eMMC writeprotect status.
+ - Set the eMMC data sector size to 4KB by disabling emulation.
+ - Create general purpose partition.
+ - Enable the enhanced user area.
+ - Enable write reliability per partition.
+ - Print the response to STATUS_SEND (CMD13).
+ - Enable the boot partition.
+ - Set Boot Bus Conditions.
+ - Enable the eMMC BKOPS feature.
+ - Permanently enable the eMMC H/W Reset feature.
+ - Permanently disable the eMMC H/W Reset feature.
+ - Send Sanitize command.
+ - Program authentication key for the device.
+ - Read the counter value of the rpmb device to stdout.
+ - Read from rpmb device to output.
+ - Write to rpmb device from data file.
+ - Enable the eMMC cache feature.
+ - Disable the eMMC cache feature.
+ - Print and parse CID data.
+ - Print and parse CSD data.
+ - Print and parse SCR data.
diff --git a/Documentation/mmc/index.rst b/Documentation/mmc/index.rst
deleted file mode 100644
index 3305478ddadb..000000000000
--- a/Documentation/mmc/index.rst
+++ /dev/null
@@ -1,13 +0,0 @@
-:orphan:
-
-========================
-MMC/SD/SDIO card support
-========================
-
-.. toctree::
-   :maxdepth: 1
-
-   mmc-dev-attrs
-   mmc-dev-parts
-   mmc-async-req
-   mmc-tools
diff --git a/Documentation/mmc/mmc-async-req.rst b/Documentation/mmc/mmc-async-req.rst
deleted file mode 100644
index 0f7197c9c3b5..000000000000
--- a/Documentation/mmc/mmc-async-req.rst
+++ /dev/null
@@ -1,98 +0,0 @@
-========================
-MMC Asynchronous Request
-========================
-
-Rationale
-=========
-
-How significant is the cache maintenance overhead?
-
-It depends. Fast eMMC and multiple cache levels with speculative cache
-pre-fetch makes the cache overhead relatively significant. If the DMA
-preparations for the next request are done in parallel with the current
-transfer, the DMA preparation overhead would not affect the MMC performance.
-
-The intention of non-blocking (asynchronous) MMC requests is to minimize the
-time between when an MMC request ends and another MMC request begins.
-
-Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg and
-dma_unmap_sg are processing. Using non-blocking MMC requests makes it
-possible to prepare the caches for next job in parallel with an active
-MMC request.
-
-MMC block driver
-================
-
-The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.
-
-The increase in throughput is proportional to the time it takes to
-prepare (major part of preparations are dma_map_sg() and dma_unmap_sg())
-a request and how fast the memory is. The faster the MMC/SD is the
-more significant the prepare request time becomes. Roughly the expected
-performance gain is 5% for large writes and 10% on large reads on a L2 cache
-platform. In power save mode, when clocks run on a lower frequency, the DMA
-preparation may cost even more. As long as these slower preparations are run
-in parallel with the transfer performance won't be affected.
-
-Details on measurements from IOZone and mmc_test
-================================================
-
-https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
-
-MMC core API extension
-======================
-
-There is one new public function mmc_start_req().
-
-It starts a new MMC command request for a host. The function isn't
-truly non-blocking. If there is an ongoing async request it waits
-for completion of that request and starts the new one and returns. It
-doesn't wait for the new request to complete. If there is no ongoing
-request it starts the new request and returns immediately.
-
-MMC host extensions
-===================
-
-There are two optional members in the mmc_host_ops -- pre_req() and
-post_req() -- that the host driver may implement in order to move work
-to before and after the actual mmc_host_ops.request() function is called.
-
-In the DMA case pre_req() may do dma_map_sg() and prepare the DMA
-descriptor, and post_req() runs the dma_unmap_sg().
-
-Optimize for the first request
-==============================
-
-The first request in a series of requests can't be prepared in parallel
-with the previous transfer, since there is no previous request.
-
-The argument is_first_req in pre_req() indicates that there is no previous
-request. The host driver may optimize for this scenario to minimize
-the performance loss. A way to optimize for this is to split the current
-request in two chunks, prepare the first chunk and start the request,
-and finally prepare the second chunk and start the transfer.
-
-Pseudocode to handle is_first_req scenario with minimal prepare overhead::
-
-  if (is_first_req && req->size > threshold)
-     /* start MMC transfer for the complete transfer size */
-     mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
-
-     /*
-      * Begin to prepare DMA while cmd is being processed by MMC.
-      * The first chunk of the request should take the same time
-      * to prepare as the "MMC process command time".
-      * If prepare time exceeds MMC cmd time
-      * the transfer is delayed, guesstimate max 4k as first chunk size.
-      */
-      prepare_1st_chunk_for_dma(req);
-      /* flush pending desc to the DMAC (dmaengine.h) */
-      dma_issue_pending(req->dma_desc);
-
-      prepare_2nd_chunk_for_dma(req);
-      /*
-       * The second issue_pending should be called before MMC runs out
-       * of the first chunk. If the MMC runs out of the first data chunk
-       * before this call, the transfer is delayed.
-       */
-      dma_issue_pending(req->dma_desc);
diff --git a/Documentation/mmc/mmc-dev-attrs.rst b/Documentation/mmc/mmc-dev-attrs.rst
deleted file mode 100644
index 4f44b1b730d6..000000000000
--- a/Documentation/mmc/mmc-dev-attrs.rst
+++ /dev/null
@@ -1,91 +0,0 @@
-==================================
-SD and MMC Block Device Attributes
-==================================
-
-These attributes are defined for the block devices associated with the
-SD or MMC device.
-
-The following attributes are read/write.
-
-	========		===============================================
-	force_ro		Enforce read-only access even if write protect 					switch is off.
-	========		===============================================
-
-SD and MMC Device Attributes
-============================
-
-All attributes are read-only.
-
-	======================	===============================================
-	cid			Card Identification Register
-	csd			Card Specific Data Register
-	scr			SD Card Configuration Register (SD only)
-	date			Manufacturing Date (from CID Register)
-	fwrev			Firmware/Product Revision (from CID Register)
-				(SD and MMCv1 only)
-	hwrev			Hardware/Product Revision (from CID Register)
-				(SD and MMCv1 only)
-	manfid			Manufacturer ID (from CID Register)
-	name			Product Name (from CID Register)
-	oemid			OEM/Application ID (from CID Register)
-	prv			Product Revision (from CID Register)
-				(SD and MMCv4 only)
-	serial			Product Serial Number (from CID Register)
-	erase_size		Erase group size
-	preferred_erase_size	Preferred erase size
-	raw_rpmb_size_mult	RPMB partition size
-	rel_sectors		Reliable write sector count
-	ocr 			Operation Conditions Register
-	dsr			Driver Stage Register
-	cmdq_en			Command Queue enabled:
-
-					1 => enabled, 0 => not enabled
-	======================	===============================================
-
-Note on Erase Size and Preferred Erase Size:
-
-	"erase_size" is the  minimum size, in bytes, of an erase
-	operation.  For MMC, "erase_size" is the erase group size
-	reported by the card.  Note that "erase_size" does not apply
-	to trim or secure trim operations where the minimum size is
-	always one 512 byte sector.  For SD, "erase_size" is 512
-	if the card is block-addressed, 0 otherwise.
-
-	SD/MMC cards can erase an arbitrarily large area up to and
-	including the whole card.  When erasing a large area it may
-	be desirable to do it in smaller chunks for three reasons:
-
-	     1. A single erase command will make all other I/O on
-		the card wait.  This is not a problem if the whole card
-		is being erased, but erasing one partition will make
-		I/O for another partition on the same card wait for the
-		duration of the erase - which could be a several
-		minutes.
-	     2. To be able to inform the user of erase progress.
-	     3. The erase timeout becomes too large to be very
-		useful.  Because the erase timeout contains a margin
-		which is multiplied by the size of the erase area,
-		the value can end up being several minutes for large
-		areas.
-
-	"erase_size" is not the most efficient unit to erase
-	(especially for SD where it is just one sector),
-	hence "preferred_erase_size" provides a good chunk
-	size for erasing large areas.
-
-	For MMC, "preferred_erase_size" is the high-capacity
-	erase size if a card specifies one, otherwise it is
-	based on the capacity of the card.
-
-	For SD, "preferred_erase_size" is the allocation unit
-	size specified by the card.
-
-	"preferred_erase_size" is in bytes.
-
-Note on raw_rpmb_size_mult:
-
-	"raw_rpmb_size_mult" is a multiple of 128kB block.
-
-	RPMB size in byte is calculated by using the following equation:
-
-		RPMB partition size = 128kB x raw_rpmb_size_mult
diff --git a/Documentation/mmc/mmc-dev-parts.rst b/Documentation/mmc/mmc-dev-parts.rst
deleted file mode 100644
index 995922f1f744..000000000000
--- a/Documentation/mmc/mmc-dev-parts.rst
+++ /dev/null
@@ -1,41 +0,0 @@
-============================
-SD and MMC Device Partitions
-============================
-
-Device partitions are additional logical block devices present on the
-SD/MMC device.
-
-As of this writing, MMC boot partitions as supported and exposed as
-/dev/mmcblkXboot0 and /dev/mmcblkXboot1, where X is the index of the
-parent /dev/mmcblkX.
-
-MMC Boot Partitions
-===================
-
-Read and write access is provided to the two MMC boot partitions. Due to
-the sensitive nature of the boot partition contents, which often store
-a bootloader or bootloader configuration tables crucial to booting the
-platform, write access is disabled by default to reduce the chance of
-accidental bricking.
-
-To enable write access to /dev/mmcblkXbootY, disable the forced read-only
-access with::
-
-	echo 0 > /sys/block/mmcblkXbootY/force_ro
-
-To re-enable read-only access::
-
-	echo 1 > /sys/block/mmcblkXbootY/force_ro
-
-The boot partitions can also be locked read only until the next power on,
-with::
-
-	echo 1 > /sys/block/mmcblkXbootY/ro_lock_until_next_power_on
-
-This is a feature of the card and not of the kernel. If the card does
-not support boot partition locking, the file will not exist. If the
-feature has been disabled on the card, the file will be read-only.
-
-The boot partitions can also be locked permanently, but this feature is
-not accessible through sysfs in order to avoid accidental or malicious
-bricking.
diff --git a/Documentation/mmc/mmc-tools.rst b/Documentation/mmc/mmc-tools.rst
deleted file mode 100644
index 54406093768b..000000000000
--- a/Documentation/mmc/mmc-tools.rst
+++ /dev/null
@@ -1,37 +0,0 @@
-======================
-MMC tools introduction
-======================
-
-There is one MMC test tools called mmc-utils, which is maintained by Chris Ball,
-you can find it at the below public git repository:
-
-	http://git.kernel.org/cgit/linux/kernel/git/cjb/mmc-utils.git/
-
-Functions
-=========
-
-The mmc-utils tools can do the following:
-
- - Print and parse extcsd data.
- - Determine the eMMC writeprotect status.
- - Set the eMMC writeprotect status.
- - Set the eMMC data sector size to 4KB by disabling emulation.
- - Create general purpose partition.
- - Enable the enhanced user area.
- - Enable write reliability per partition.
- - Print the response to STATUS_SEND (CMD13).
- - Enable the boot partition.
- - Set Boot Bus Conditions.
- - Enable the eMMC BKOPS feature.
- - Permanently enable the eMMC H/W Reset feature.
- - Permanently disable the eMMC H/W Reset feature.
- - Send Sanitize command.
- - Program authentication key for the device.
- - Counter value for the rpmb device will be read to stdout.
- - Read from rpmb device to output.
- - Write to rpmb device from data file.
- - Enable the eMMC cache feature.
- - Disable the eMMC cache feature.
- - Print and parse CID data.
- - Print and parse CSD data.
- - Print and parse SCR data.
-- 
cgit v1.2.3-55-g7522


From c0b11a50aee643ac40ded5dbcd48189ee0926ee4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:50:07 -0300
Subject: docs: md: move it to the driver-api book

The docs there were meant to be read by a Kernel developer.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/driver-api/index.rst          |   1 +
 Documentation/driver-api/md/index.rst       |  10 +
 Documentation/driver-api/md/md-cluster.rst  | 385 ++++++++++++++++++++++++++++
 Documentation/driver-api/md/raid5-cache.rst | 111 ++++++++
 Documentation/driver-api/md/raid5-ppl.rst   |  47 ++++
 Documentation/md/index.rst                  |  12 -
 Documentation/md/md-cluster.rst             | 385 ----------------------------
 Documentation/md/raid5-cache.rst            | 111 --------
 Documentation/md/raid5-ppl.rst              |  47 ----
 9 files changed, 554 insertions(+), 555 deletions(-)
 create mode 100644 Documentation/driver-api/md/index.rst
 create mode 100644 Documentation/driver-api/md/md-cluster.rst
 create mode 100644 Documentation/driver-api/md/raid5-cache.rst
 create mode 100644 Documentation/driver-api/md/raid5-ppl.rst
 delete mode 100644 Documentation/md/index.rst
 delete mode 100644 Documentation/md/md-cluster.rst
 delete mode 100644 Documentation/md/raid5-cache.rst
 delete mode 100644 Documentation/md/raid5-ppl.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 25f85d3021aa..b5179bf2ada2 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -56,6 +56,7 @@ available subsections can be seen below.
    firmware/index
    pinctl
    gpio/index
+   md/index
    misc_devices
    nfc/index
    dmaengine/index
diff --git a/Documentation/driver-api/md/index.rst b/Documentation/driver-api/md/index.rst
new file mode 100644
index 000000000000..205080891a1a
--- /dev/null
+++ b/Documentation/driver-api/md/index.rst
@@ -0,0 +1,10 @@
+====
+RAID
+====
+
+.. toctree::
+   :maxdepth: 1
+
+   md-cluster
+   raid5-cache
+   raid5-ppl
diff --git a/Documentation/driver-api/md/md-cluster.rst b/Documentation/driver-api/md/md-cluster.rst
new file mode 100644
index 000000000000..96eb52cec7eb
--- /dev/null
+++ b/Documentation/driver-api/md/md-cluster.rst
@@ -0,0 +1,385 @@
+==========
+MD Cluster
+==========
+
+The cluster MD is a shared-device RAID for a cluster. It supports
+two levels: raid1 and raid10 (limited support).
+
+
+1. On-disk format
+=================
+
+Separate write-intent-bitmaps are used for each cluster node.
+The bitmaps record all writes that may have been started on that node,
+and may not yet have finished. The on-disk layout is::
+
+  0                    4k                     8k                    12k
+  -------------------------------------------------------------------
+  | idle                | md super            | bm super [0] + bits |
+  | bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
+  | bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
+  | bm bits [3, contd]  |                     |                     |
+
+During "normal" functioning we assume the filesystem ensures that only
+one node writes to any given block at a time, so a write request will
+
+ - set the appropriate bit (if not already set)
+ - commit the write to all mirrors
+ - schedule the bit to be cleared after a timeout.
+
+Reads are just handled normally. It is up to the filesystem to ensure
+one node doesn't read from a location where another node (or the same
+node) is writing.
+
+
+2. DLM Locks for management
+===========================
+
+There are three groups of locks for managing the device:
+
+2.1 Bitmap lock resource (bm_lockres)
+-------------------------------------
+
+ The bm_lockres protects individual node bitmaps. They are named in
+ the form bitmap000 for node 1, bitmap001 for node 2 and so on. When a
+ node joins the cluster, it acquires the lock in PW mode and it stays
+ so during the lifetime the node is part of the cluster. The lock
+ resource number is based on the slot number returned by the DLM
+ subsystem. Since DLM starts node count from one and bitmap slots
+ start from zero, one is subtracted from the DLM slot number to arrive
+ at the bitmap slot number.
+
+ The LVB of the bitmap lock for a particular node records the range
+ of sectors that are being re-synced by that node.  No other
+ node may write to those sectors.  This is used when a new node
+ joins the cluster.
+
+2.2 Message passing locks
+-------------------------
+
+ Each node has to communicate with other nodes when starting or ending
+ resync, and for metadata superblock updates.  This communication is
+ managed through three locks: "token", "message", and "ack", together
+ with the Lock Value Block (LVB) of the "message" lock.
+
+2.3 new-device management
+-------------------------
+
+ A single lock: "no-new-dev" is used to co-ordinate the addition of
+ new devices - this must be synchronized across the array.
+ Normally all nodes hold a concurrent-read lock on this device.
+
+3. Communication
+================
+
+ Messages can be broadcast to all nodes, and the sender waits for all
+ other nodes to acknowledge the message before proceeding.  Only one
+ message can be processed at a time.
+
+3.1 Message Types
+-----------------
+
+ There are six types of messages which are passed:
+
+3.1.1 METADATA_UPDATED
+^^^^^^^^^^^^^^^^^^^^^^
+
+   informs other nodes that the metadata has
+   been updated, and the node must re-read the md superblock. This is
+   performed synchronously. It is primarily used to signal device
+   failure.
+
+3.1.2 RESYNCING
+^^^^^^^^^^^^^^^
+   informs other nodes that a resync is initiated or
+   ended so that each node may suspend or resume the region.  Each
+   RESYNCING message identifies a range of the devices that the
+   sending node is about to resync. This overrides any previous
+   notification from that node: only one range can be resynced at a
+   time per-node.
+
+3.1.3 NEWDISK
+^^^^^^^^^^^^^
+
+   informs other nodes that a device is being added to
+   the array. Message contains an identifier for that device.  See
+   below for further details.
+
+3.1.4 REMOVE
+^^^^^^^^^^^^
+
+   A failed or spare device is being removed from the
+   array. The slot-number of the device is included in the message.
+
+ 3.1.5 RE_ADD:
+
+   A failed device is being re-activated - the assumption
+   is that it has been determined to be working again.
+
+ 3.1.6 BITMAP_NEEDS_SYNC:
+
+   If a node is stopped locally but the bitmap
+   isn't clean, then another node is informed to take the ownership of
+   resync.
+
+3.2 Communication mechanism
+---------------------------
+
+ The DLM LVB is used to communicate between nodes of the cluster. There
+ are three resources used for the purpose:
+
+3.2.1 token
+^^^^^^^^^^^
+   The resource which protects the entire communication
+   system. The node having the token resource is allowed to
+   communicate.
+
+3.2.2 message
+^^^^^^^^^^^^^
+   The lock resource which carries the data to communicate.
+
+3.2.3 ack
+^^^^^^^^^
+
+   The resource whose acquisition means the message has been
+   acknowledged by all nodes in the cluster. The BAST of the resource
+   is used to inform the receiving node that a node wants to
+   communicate.
+
+The algorithm is:
+
+ 1. receive status - all nodes have concurrent-reader lock on "ack"::
+
+	sender                         receiver                 receiver
+	"ack":CR                       "ack":CR                 "ack":CR
+
+ 2. sender gets EX on "token",
+    sender gets EX on "message"::
+
+	sender                        receiver                 receiver
+	"token":EX                    "ack":CR                 "ack":CR
+	"message":EX
+	"ack":CR
+
+    Sender checks that it still needs to send a message. Messages
+    received or other events that happened while waiting for the
+    "token" may have made this message inappropriate or redundant.
+
+ 3. sender writes LVB
+
+    sender down-converts "message" from EX to CW
+
+    sender tries to get EX on "ack"
+
+    ::
+
+      [ wait until all receivers have *processed* the "message" ]
+
+                                       [ triggered by bast of "ack" ]
+                                       receiver get CR on "message"
+                                       receiver read LVB
+                                       receiver processes the message
+                                       [ wait finish ]
+                                       receiver releases "ack"
+                                       receiver tries to get PR on "message"
+
+     sender                         receiver                  receiver
+     "token":EX                     "message":CR              "message":CR
+     "message":CW
+     "ack":EX
+
+ 4. triggered by grant of EX on "ack" (indicating all receivers
+    have processed message)
+
+    sender down-converts "ack" from EX to CR
+
+    sender releases "message"
+
+    sender releases "token"
+
+    ::
+
+                                 receiver upconvert to PR on "message"
+                                 receiver get CR of "ack"
+                                 receiver release "message"
+
+     sender                      receiver                   receiver
+     "ack":CR                    "ack":CR                   "ack":CR
+
+
+4. Handling Failures
+====================
+
+4.1 Node Failure
+----------------
+
+ When a node fails, the DLM informs the cluster with the slot
+ number. The node starts a cluster recovery thread. The cluster
+ recovery thread:
+
+	- acquires the bitmap<number> lock of the failed node
+	- opens the bitmap
+	- reads the bitmap of the failed node
+	- copies the set bitmap to local node
+	- cleans the bitmap of the failed node
+	- releases bitmap<number> lock of the failed node
+	- initiates resync of the bitmap on the current node.
+	  md_check_recovery is invoked within recover_bitmaps,
+	  then md_check_recovery -> metadata_update_start/finish
+	  locks the communication via lock_comm. This means that
+	  while one node is resyncing, it blocks all other nodes
+	  from writing anywhere on the array.
+
+ The resync process is the regular md resync. However, in a clustered
+ environment when a resync is performed, it needs to tell other nodes
+ of the areas which are suspended. Before a resync starts, the node
+ sends out RESYNCING with the (lo,hi) range of the area which needs to
+ be suspended. Each node maintains a suspend_list, which contains the
+ list of ranges which are currently suspended. On receiving RESYNCING,
+ the node adds the range to the suspend_list. Similarly, when the node
+ performing resync finishes, it sends RESYNCING with an empty range to
+ other nodes and other nodes remove the corresponding entry from the
+ suspend_list.
+
+ A helper function, ->area_resyncing(), can be used to check if a
+ particular I/O range should be suspended or not.
+
+4.2 Device Failure
+------------------
+
+ Device failures are handled and communicated with the metadata update
+ routine.  When a node detects a device failure it does not allow
+ any further writes to that device until the failure has been
+ acknowledged by all other nodes.
+
+5. Adding a new Device
+======================
+
+ For adding a new device, it is necessary that all nodes "see" the new
+ device to be added. For this, the following algorithm is used:
+
+   1.  Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
+       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
+   2.  Node 1 sends a NEWDISK message with uuid and slot number
+   3.  Other nodes issue kobject_uevent_env with uuid and slot number
+       (Steps 4,5 could be a udev rule)
+   4.  In userspace, the node searches for the disk, perhaps
+       using blkid -t SUB_UUID=""
+   5.  Other nodes issue either of the following depending on whether
+       the disk was found:
+       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
+       disc.number set to slot number)
+       ioctl(CLUSTERED_DISK_NACK)
+   6.  Other nodes drop lock on "no-new-dev" (CR) if device is found
+   7.  Node 1 attempts EX lock on "no-new-dev"
+   8.  If node 1 gets the lock, it sends METADATA_UPDATED after
+       unmarking the disk as SpareLocal
+   9.  If not (get "no-new-dev" lock), it fails the operation and sends
+       METADATA_UPDATED.
+   10. Other nodes get the information whether a disk is added or not
+       by the following METADATA_UPDATED.
+
+6. Module interface
+===================
+
+ There are 17 call-backs which the md core can make to the cluster
+ module.  Understanding these can give a good overview of the whole
+ process.
+
+6.1 join(nodes) and leave()
+---------------------------
+
+ These are called when an array is started with a clustered bitmap,
+ and when the array is stopped.  join() ensures the cluster is
+ available and initializes the various resources.
+ Only the first 'nodes' nodes in the cluster can use the array.
+
+6.2 slot_number()
+-----------------
+
+ Reports the slot number advised by the cluster infrastructure.
+ Range is from 0 to nodes-1.
+
+6.3 resync_info_update()
+------------------------
+
+ This updates the resync range that is stored in the bitmap lock.
+ The starting point is updated as the resync progresses.  The
+ end point is always the end of the array.
+ It does *not* send a RESYNCING message.
+
+6.4 resync_start(), resync_finish()
+-----------------------------------
+
+ These are called when resync/recovery/reshape starts or stops.
+ They update the resyncing range in the bitmap lock and also
+ send a RESYNCING message.  resync_start reports the whole
+ array as resyncing, resync_finish reports none of it.
+
+ resync_finish() also sends a BITMAP_NEEDS_SYNC message which
+ allows some other node to take over.
+
+6.5 metadata_update_start(), metadata_update_finish(), metadata_update_cancel()
+-------------------------------------------------------------------------------
+
+ metadata_update_start is used to get exclusive access to
+ the metadata.  If a change is still needed once that access is
+ gained, metadata_update_finish() will send a METADATA_UPDATED
+ message to all other nodes, otherwise metadata_update_cancel()
+ can be used to release the lock.
+
+6.6 area_resyncing()
+--------------------
+
+ This combines two elements of functionality.
+
+ Firstly, it will check if any node is currently resyncing
+ anything in a given range of sectors.  If any resync is found,
+ then the caller will avoid writing or read-balancing in that
+ range.
+
+ Secondly, while node recovery is happening it reports that
+ all areas are resyncing for READ requests.  This avoids races
+ between the cluster-filesystem and the cluster-RAID handling
+ a node failure.
+
+6.7 add_new_disk_start(), add_new_disk_finish(), new_disk_ack()
+---------------------------------------------------------------
+
+ These are used to manage the new-disk protocol described above.
+ When a new device is added, add_new_disk_start() is called before
+ it is bound to the array and, if that succeeds, add_new_disk_finish()
+ is called once the device is fully added.
+
+ When a device is added in acknowledgement to a previous
+ request, or when the device is declared "unavailable",
+ new_disk_ack() is called.
+
+6.8 remove_disk()
+-----------------
+
+ This is called when a spare or failed device is removed from
+ the array.  It causes a REMOVE message to be sent to other nodes.
+
+6.9 gather_bitmaps()
+--------------------
+
+ This sends a RE_ADD message to all other nodes and then
+ gathers bitmap information from all bitmaps.  This combined
+ bitmap is then used to recover the re-added device.
+
+6.10 lock_all_bitmaps() and unlock_all_bitmaps()
+------------------------------------------------
+
+ These are called when changing the bitmap to none. If a node plans
+ to clear the cluster raid's bitmap, it needs to make sure no other
+ nodes are using the raid, which is achieved by locking all bitmap
+ locks within the cluster. Those locks are then unlocked
+ accordingly.
+
+7. Unsupported features
+=======================
+
+There are some things which are not yet supported by cluster MD.
+
+- change array_sectors.
diff --git a/Documentation/driver-api/md/raid5-cache.rst b/Documentation/driver-api/md/raid5-cache.rst
new file mode 100644
index 000000000000..d7a15f44a7c3
--- /dev/null
+++ b/Documentation/driver-api/md/raid5-cache.rst
@@ -0,0 +1,111 @@
+================
+RAID 4/5/6 cache
+================
+
+RAID 4/5/6 can include an extra disk for data cache besides the normal RAID
+disks. The role of RAID disks isn't changed with the cache disk. The cache disk
+caches data to the RAID disks. The cache can be in write-through (supported
+since 4.4) or write-back mode (supported since 4.10). mdadm (supported since
+3.4) has a new option '--write-journal' to create an array with a cache. Please
+refer to the mdadm manual for details. By default (RAID array starts), the cache
+is in write-through mode. A user can switch it to write-back mode by::
+
+	echo "write-back" > /sys/block/md0/md/journal_mode
+
+And switch it back to write-through mode by::
+
+	echo "write-through" > /sys/block/md0/md/journal_mode
+
+In both modes, all writes to the array will hit cache disk first. This means
+the cache disk must be fast and sustainable.
+
+write-through mode
+==================
+
+This mode mainly fixes the 'write hole' issue. For a RAID 4/5/6 array, an
+unclean shutdown can leave data in some stripes in an inconsistent state, e.g.
+data and parity don't match. The reason is that a stripe write involves several
+RAID disks and it's possible the writes haven't hit all RAID disks before the
+unclean shutdown. We call an array degraded if it has inconsistent data. MD
+tries to resync the array to bring it back to normal state. But before the
+resync completes, any system crash will expose the chance of real data
+corruption in the RAID array. This problem is called 'write hole'.
+
+The write-through cache will cache all data on cache disk first. After the data
+is safe on the cache disk, the data will be flushed onto RAID disks. The
+two-step write will guarantee MD can recover correct data after unclean
+shutdown even if the array is degraded. Thus the cache can close the 'write hole'.
+
+In write-through mode, MD reports IO completion to upper layer (usually
+filesystems) after the data is safe on RAID disks, so cache disk failure
+doesn't cause data loss. Of course cache disk failure means the array is
+exposed to 'write hole' again.
+
+In write-through mode, the cache disk isn't required to be big. Several
+hundred megabytes are enough.
+
+write-back mode
+===============
+
+write-back mode fixes the 'write hole' issue too, since all write data is
+cached on cache disk. But the main goal of 'write-back' cache is to speed up
+writes. If a write crosses all RAID disks of a stripe, we call it full-stripe
+write. For non-full-stripe writes, MD must read old data before the new parity
+can be calculated. These synchronous reads hurt write throughput. Some writes
+which are sequential but not dispatched at the same time will suffer from this
+overhead too. Write-back cache will aggregate the data and flush the data to
+RAID disks only after the data becomes a full stripe write. This will
+completely avoid the overhead, so it's very helpful for some workloads. A
+typical workload which does sequential write followed by fsync is an example.
+
+In write-back mode, MD reports IO completion to upper layer (usually
+filesystems) right after the data hits cache disk. The data is flushed to raid
+disks later after specific conditions are met. So cache disk failure will cause
+data loss.
+
+In write-back mode, MD also caches data in memory. The memory cache includes
+the same data stored on cache disk, so a power loss doesn't cause data loss.
+The memory cache size has performance impact for the array. It's recommended
+the size is big. A user can configure the size by::
+
+	echo "2048" > /sys/block/md0/md/stripe_cache_size
+
+A cache disk that is too small will make the write aggregation less efficient
+in this mode, depending on the workload. It's recommended to use a cache disk
+of at least several gigabytes in write-back mode.
+
+The implementation
+==================
+
+The write-through and write-back cache use the same disk format. The cache disk
+is organized as a simple write log. The log consists of 'meta data' and 'data'
+pairs. The meta data describes the data. It also includes checksum and sequence
+ID for recovery identification. Data can be IO data and parity data. Data is
+checksummed too. The checksum is stored in the meta data ahead of the data. The
+checksum is an optimization because MD can write meta and data freely without
+worrying about the order. The MD superblock has a field pointing to the valid
+meta data of the log head.
+
+The log implementation is pretty straightforward. The difficult part is the
+order in which MD writes data to cache disk and RAID disks. Specifically, in
+write-through mode, MD calculates parity for IO data, writes both IO data and
+parity to the log, writes the data and parity to RAID disks after the data and
+parity are settled in the log, and finally the IO is finished. Read just reads
+from raid disks as usual.
+
+In write-back mode, MD writes IO data to the log and reports IO completion. The
+data is also fully cached in memory at that time, which means read must query
+memory cache. If some conditions are met, MD will flush the data to RAID disks.
+MD will calculate parity for the data and write parity into the log. After this
+is finished, MD will write both data and parity into RAID disks, then MD can
+release the memory cache. The flush conditions could be: the stripe becomes a
+full stripe write, free cache disk space is low, or free in-kernel memory cache
+space is low.
+
+After an unclean shutdown, MD does recovery. MD reads all meta data and data
+from the log. The sequence ID and checksum will help us detect corrupted meta
+data and data. If MD finds a stripe with data and valid parities (1 parity for
+raid4/5 and 2 for raid6), MD will write the data and parities to RAID disks. If
+parities are incomplete, they are discarded. If part of the data is corrupted,
+it is discarded too. MD then loads valid data and writes it to RAID disks in
+the normal way.
diff --git a/Documentation/driver-api/md/raid5-ppl.rst b/Documentation/driver-api/md/raid5-ppl.rst
new file mode 100644
index 000000000000..357e5515bc55
--- /dev/null
+++ b/Documentation/driver-api/md/raid5-ppl.rst
@@ -0,0 +1,47 @@
+==================
+Partial Parity Log
+==================
+
+Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
+addressed by PPL is that after a dirty shutdown, parity of a particular stripe
+may become inconsistent with data on other member disks. If the array is also
+in degraded state, there is no way to recalculate parity, because one of the
+disks is missing. This can lead to silent data corruption when rebuilding the
+array or using it as degraded - data calculated from parity for array blocks
+that have not been touched by a write request during the unclean shutdown can
+be incorrect. Such a condition is known as the RAID5 Write Hole. Because of
+this, md by default does not allow starting a dirty degraded array.
+
+Partial parity for a write operation is the XOR of stripe data chunks not
+modified by this write. It is just enough data needed for recovering from the
+write hole. XORing partial parity with the modified chunks produces parity for
+the stripe, consistent with its state before the write operation, regardless of
+which chunk writes have completed. If one of the unmodified data disks of
+this stripe is missing, this updated parity can be used to recover its
+contents. PPL recovery is also performed when starting an array after an
+unclean shutdown and all disks are available, eliminating the need to resync
+the array. Because of this, using write-intent bitmap and PPL together is not
+supported.
+
+When handling a write request PPL writes partial parity before new data and
+parity are dispatched to disks. PPL is a distributed log - it is stored on
+array member drives in the metadata area, on the parity drive of a particular
+stripe.  It does not require a dedicated journaling drive. Write performance is
+reduced by up to 30%-40% but it scales with the number of drives in the array
+and the journaling drive does not become a bottleneck or a single point of
+failure.
+
+Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
+not a true journal. It does not protect from losing in-flight data, only from
+silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
+performed for this stripe (parity is not updated). So it is possible to have
+arbitrary data in the written part of a stripe if that disk is lost. In such
+a case the behavior is the same as in plain raid5.
+
+PPL is available for md version-1 metadata and external (specifically IMSM)
+metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.
+
+PPL is limited to a maximum of 64 disks in the array. This keeps the data
+structures and implementation simple. RAID5 arrays with so many disks are
+unlikely due to the high risk of multiple disk failures, so this restriction
+should not be a real-life limitation.
diff --git a/Documentation/md/index.rst b/Documentation/md/index.rst
deleted file mode 100644
index c4db34ed327d..000000000000
--- a/Documentation/md/index.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-:orphan:
-
-====
-RAID
-====
-
-.. toctree::
-   :maxdepth: 1
-
-   md-cluster
-   raid5-cache
-   raid5-ppl
diff --git a/Documentation/md/md-cluster.rst b/Documentation/md/md-cluster.rst
deleted file mode 100644
index 96eb52cec7eb..000000000000
--- a/Documentation/md/md-cluster.rst
+++ /dev/null
@@ -1,385 +0,0 @@
-==========
-MD Cluster
-==========
-
-The cluster MD is a shared-device RAID for a cluster, it supports
-two levels: raid1 and raid10 (limited support).
-
-
-1. On-disk format
-=================
-
-Separate write-intent-bitmaps are used for each cluster node.
-The bitmaps record all writes that may have been started on that node,
-and may not yet have finished. The on-disk layout is::
-
-  0                    4k                     8k                    12k
-  -------------------------------------------------------------------
-  | idle                | md super            | bm super [0] + bits |
-  | bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
-  | bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
-  | bm bits [3, contd]  |                     |                     |
-
-During "normal" functioning we assume the filesystem ensures that only
-one node writes to any given block at a time, so a write request will
-
- - set the appropriate bit (if not already set)
- - commit the write to all mirrors
- - schedule the bit to be cleared after a timeout.
-
-Reads are just handled normally. It is up to the filesystem to ensure
-one node doesn't read from a location where another node (or the same
-node) is writing.
-
-
-2. DLM Locks for management
-===========================
-
-There are three groups of locks for managing the device:
-
-2.1 Bitmap lock resource (bm_lockres)
--------------------------------------
-
- The bm_lockres protects individual node bitmaps. They are named in
- the form bitmap000 for node 1, bitmap001 for node 2 and so on. When a
- node joins the cluster, it acquires the lock in PW mode and it stays
- so during the lifetime the node is part of the cluster. The lock
- resource number is based on the slot number returned by the DLM
- subsystem. Since DLM starts node count from one and bitmap slots
- start from zero, one is subtracted from the DLM slot number to arrive
- at the bitmap slot number.
-
- The LVB of the bitmap lock for a particular node records the range
- of sectors that are being re-synced by that node.  No other
- node may write to those sectors.  This is used when a new nodes
- joins the cluster.
-
-2.2 Message passing locks
--------------------------
-
- Each node has to communicate with other nodes when starting or ending
- resync, and for metadata superblock updates.  This communication is
- managed through three locks: "token", "message", and "ack", together
- with the Lock Value Block (LVB) of one of the "message" lock.
-
-2.3 new-device management
--------------------------
-
- A single lock: "no-new-dev" is used to co-ordinate the addition of
- new devices - this must be synchronized across the array.
- Normally all nodes hold a concurrent-read lock on this device.
-
-3. Communication
-================
-
- Messages can be broadcast to all nodes, and the sender waits for all
- other nodes to acknowledge the message before proceeding.  Only one
- message can be processed at a time.
-
-3.1 Message Types
------------------
-
- There are six types of messages which are passed:
-
-3.1.1 METADATA_UPDATED
-^^^^^^^^^^^^^^^^^^^^^^
-
-   informs other nodes that the metadata has
-   been updated, and the node must re-read the md superblock. This is
-   performed synchronously. It is primarily used to signal device
-   failure.
-
-3.1.2 RESYNCING
-^^^^^^^^^^^^^^^
-   informs other nodes that a resync is initiated or
-   ended so that each node may suspend or resume the region.  Each
-   RESYNCING message identifies a range of the devices that the
-   sending node is about to resync. This overrides any previous
-   notification from that node: only one ranged can be resynced at a
-   time per-node.
-
-3.1.3 NEWDISK
-^^^^^^^^^^^^^
-
-   informs other nodes that a device is being added to
-   the array. Message contains an identifier for that device.  See
-   below for further details.
-
-3.1.4 REMOVE
-^^^^^^^^^^^^
-
-   A failed or spare device is being removed from the
-   array. The slot-number of the device is included in the message.
-
- 3.1.5 RE_ADD:
-
-   A failed device is being re-activated - the assumption
-   is that it has been determined to be working again.
-
- 3.1.6 BITMAP_NEEDS_SYNC:
-
-   If a node is stopped locally but the bitmap
-   isn't clean, then another node is informed to take the ownership of
-   resync.
-
-3.2 Communication mechanism
----------------------------
-
- The DLM LVB is used to communicate within nodes of the cluster. There
- are three resources used for the purpose:
-
-3.2.1 token
-^^^^^^^^^^^
-   The resource which protects the entire communication
-   system. The node having the token resource is allowed to
-   communicate.
-
-3.2.2 message
-^^^^^^^^^^^^^
-   The lock resource which carries the data to communicate.
-
-3.2.3 ack
-^^^^^^^^^
-
-   The resource, acquiring which means the message has been
-   acknowledged by all nodes in the cluster. The BAST of the resource
-   is used to inform the receiving node that a node wants to
-   communicate.
-
-The algorithm is:
-
- 1. receive status - all nodes have concurrent-reader lock on "ack"::
-
-	sender                         receiver                 receiver
-	"ack":CR                       "ack":CR                 "ack":CR
-
- 2. sender get EX on "token",
-    sender get EX on "message"::
-
-	sender                        receiver                 receiver
-	"token":EX                    "ack":CR                 "ack":CR
-	"message":EX
-	"ack":CR
-
-    Sender checks that it still needs to send a message. Messages
-    received or other events that happened while waiting for the
-    "token" may have made this message inappropriate or redundant.
-
- 3. sender writes LVB
-
-    sender down-convert "message" from EX to CW
-
-    sender try to get EX of "ack"
-
-    ::
-
-      [ wait until all receivers have *processed* the "message" ]
-
-                                       [ triggered by bast of "ack" ]
-                                       receiver get CR on "message"
-                                       receiver read LVB
-                                       receiver processes the message
-                                       [ wait finish ]
-                                       receiver releases "ack"
-                                       receiver tries to get PR on "message"
-
-     sender                         receiver                  receiver
-     "token":EX                     "message":CR              "message":CR
-     "message":CW
-     "ack":EX
-
- 4. triggered by grant of EX on "ack" (indicating all receivers
-    have processed message)
-
-    sender down-converts "ack" from EX to CR
-
-    sender releases "message"
-
-    sender releases "token"
-
-    ::
-
-                                 receiver upconvert to PR on "message"
-                                 receiver get CR of "ack"
-                                 receiver release "message"
-
-     sender                      receiver                   receiver
-     "ack":CR                    "ack":CR                   "ack":CR
-
-
-4. Handling Failures
-====================
-
-4.1 Node Failure
-----------------
-
- When a node fails, the DLM informs the cluster with the slot
- number. The node starts a cluster recovery thread. The cluster
- recovery thread:
-
-	- acquires the bitmap<number> lock of the failed node
-	- opens the bitmap
-	- reads the bitmap of the failed node
-	- copies the set bitmap to local node
-	- cleans the bitmap of the failed node
-	- releases bitmap<number> lock of the failed node
-	- initiates resync of the bitmap on the current node
-	  md_check_recovery is invoked within recover_bitmaps,
-	  then md_check_recovery -> metadata_update_start/finish,
-	  it will lock the communication by lock_comm.
-	  Which means when one node is resyncing it blocks all
-	  other nodes from writing anywhere on the array.
-
- The resync process is the regular md resync. However, in a clustered
- environment when a resync is performed, it needs to tell other nodes
- of the areas which are suspended. Before a resync starts, the node
- send out RESYNCING with the (lo,hi) range of the area which needs to
- be suspended. Each node maintains a suspend_list, which contains the
- list of ranges which are currently suspended. On receiving RESYNCING,
- the node adds the range to the suspend_list. Similarly, when the node
- performing resync finishes, it sends RESYNCING with an empty range to
- other nodes and other nodes remove the corresponding entry from the
- suspend_list.
-
- A helper function, ->area_resyncing() can be used to check if a
- particular I/O range should be suspended or not.
-
-4.2 Device Failure
-==================
-
- Device failures are handled and communicated with the metadata update
- routine.  When a node detects a device failure it does not allow
- any further writes to that device until the failure has been
- acknowledged by all other nodes.
-
-5. Adding a new Device
-----------------------
-
- For adding a new device, it is necessary that all nodes "see" the new
- device to be added. For this, the following algorithm is used:
-
-   1.  Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
-       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
-   2.  Node 1 sends a NEWDISK message with uuid and slot number
-   3.  Other nodes issue kobject_uevent_env with uuid and slot number
-       (Steps 4,5 could be a udev rule)
-   4.  In userspace, the node searches for the disk, perhaps
-       using blkid -t SUB_UUID=""
-   5.  Other nodes issue either of the following depending on whether
-       the disk was found:
-       ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
-       disc.number set to slot number)
-       ioctl(CLUSTERED_DISK_NACK)
-   6.  Other nodes drop lock on "no-new-devs" (CR) if device is found
-   7.  Node 1 attempts EX lock on "no-new-dev"
-   8.  If node 1 gets the lock, it sends METADATA_UPDATED after
-       unmarking the disk as SpareLocal
-   9.  If not (get "no-new-dev" lock), it fails the operation and sends
-       METADATA_UPDATED.
-   10. Other nodes get the information whether a disk is added or not
-       by the following METADATA_UPDATED.
-
-6. Module interface
-===================
-
- There are 17 call-backs which the md core can make to the cluster
- module.  Understanding these can give a good overview of the whole
- process.
-
-6.1 join(nodes) and leave()
----------------------------
-
- These are called when an array is started with a clustered bitmap,
- and when the array is stopped.  join() ensures the cluster is
- available and initializes the various resources.
- Only the first 'nodes' nodes in the cluster can use the array.
-
-6.2 slot_number()
------------------
-
- Reports the slot number advised by the cluster infrastructure.
- Range is from 0 to nodes-1.
-
-6.3 resync_info_update()
-------------------------
-
- This updates the resync range that is stored in the bitmap lock.
- The starting point is updated as the resync progresses.  The
- end point is always the end of the array.
- It does *not* send a RESYNCING message.
-
-6.4 resync_start(), resync_finish()
------------------------------------
-
- These are called when resync/recovery/reshape starts or stops.
- They update the resyncing range in the bitmap lock and also
- send a RESYNCING message.  resync_start reports the whole
- array as resyncing, resync_finish reports none of it.
-
- resync_finish() also sends a BITMAP_NEEDS_SYNC message which
- allows some other node to take over.
-
-6.5 metadata_update_start(), metadata_update_finish(), metadata_update_cancel()
--------------------------------------------------------------------------------
-
- metadata_update_start is used to get exclusive access to
- the metadata.  If a change is still needed once that access is
- gained, metadata_update_finish() will send a METADATA_UPDATE
- message to all other nodes, otherwise metadata_update_cancel()
- can be used to release the lock.
-
-6.6 area_resyncing()
---------------------
-
- This combines two elements of functionality.
-
- Firstly, it will check if any node is currently resyncing
- anything in a given range of sectors.  If any resync is found,
- then the caller will avoid writing or read-balancing in that
- range.
-
- Secondly, while node recovery is happening it reports that
- all areas are resyncing for READ requests.  This avoids races
- between the cluster-filesystem and the cluster-RAID handling
- a node failure.
-
-6.7 add_new_disk_start(), add_new_disk_finish(), new_disk_ack()
----------------------------------------------------------------
-
- These are used to manage the new-disk protocol described above.
- When a new device is added, add_new_disk_start() is called before
- it is bound to the array and, if that succeeds, add_new_disk_finish()
- is called the device is fully added.
-
- When a device is added in acknowledgement to a previous
- request, or when the device is declared "unavailable",
- new_disk_ack() is called.
-
-6.8 remove_disk()
------------------
-
- This is called when a spare or failed device is removed from
- the array.  It causes a REMOVE message to be send to other nodes.
-
-6.9 gather_bitmaps()
---------------------
-
- This sends a RE_ADD message to all other nodes and then
- gathers bitmap information from all bitmaps.  This combined
- bitmap is then used to recovery the re-added device.
-
-6.10 lock_all_bitmaps() and unlock_all_bitmaps()
-------------------------------------------------
-
- These are called when change bitmap to none. If a node plans
- to clear the cluster raid's bitmap, it need to make sure no other
- nodes are using the raid which is achieved by lock all bitmap
- locks within the cluster, and also those locks are unlocked
- accordingly.
-
-7. Unsupported features
-=======================
-
-There are somethings which are not supported by cluster MD yet.
-
-- change array_sectors.
diff --git a/Documentation/md/raid5-cache.rst b/Documentation/md/raid5-cache.rst
deleted file mode 100644
index d7a15f44a7c3..000000000000
--- a/Documentation/md/raid5-cache.rst
+++ /dev/null
@@ -1,111 +0,0 @@
-================
-RAID 4/5/6 cache
-================
-
-Raid 4/5/6 could include an extra disk for data cache besides normal RAID
-disks. The role of RAID disks isn't changed with the cache disk. The cache disk
-caches data to the RAID disks. The cache can be in write-through (supported
-since 4.4) or write-back mode (supported since 4.10). mdadm (supported since
-3.4) has a new option '--write-journal' to create array with cache. Please
-refer to mdadm manual for details. By default (RAID array starts), the cache is
-in write-through mode. A user can switch it to write-back mode by::
-
-	echo "write-back" > /sys/block/md0/md/journal_mode
-
-And switch it back to write-through mode by::
-
-	echo "write-through" > /sys/block/md0/md/journal_mode
-
-In both modes, all writes to the array will hit cache disk first. This means
-the cache disk must be fast and sustainable.
-
-write-through mode
-==================
-
-This mode mainly fixes the 'write hole' issue. For RAID 4/5/6 array, an unclean
-shutdown can cause data in some stripes to not be in consistent state, eg, data
-and parity don't match. The reason is that a stripe write involves several RAID
-disks and it's possible the writes don't hit all RAID disks yet before the
-unclean shutdown. We call an array degraded if it has inconsistent data. MD
-tries to resync the array to bring it back to normal state. But before the
-resync completes, any system crash will expose the chance of real data
-corruption in the RAID array. This problem is called 'write hole'.
-
-The write-through cache will cache all data on cache disk first. After the data
-is safe on the cache disk, the data will be flushed onto RAID disks. The
-two-step write will guarantee MD can recover correct data after unclean
-shutdown even the array is degraded. Thus the cache can close the 'write hole'.
-
-In write-through mode, MD reports IO completion to upper layer (usually
-filesystems) after the data is safe on RAID disks, so cache disk failure
-doesn't cause data loss. Of course cache disk failure means the array is
-exposed to 'write hole' again.
-
-In write-through mode, the cache disk isn't required to be big. Several
-hundreds megabytes are enough.
-
-write-back mode
-===============
-
-write-back mode fixes the 'write hole' issue too, since all write data is
-cached on cache disk. But the main goal of 'write-back' cache is to speed up
-write. If a write crosses all RAID disks of a stripe, we call it full-stripe
-write. For non-full-stripe writes, MD must read old data before the new parity
-can be calculated. These synchronous reads hurt write throughput. Some writes
-which are sequential but not dispatched in the same time will suffer from this
-overhead too. Write-back cache will aggregate the data and flush the data to
-RAID disks only after the data becomes a full stripe write. This will
-completely avoid the overhead, so it's very helpful for some workloads. A
-typical workload which does sequential write followed by fsync is an example.
-
-In write-back mode, MD reports IO completion to upper layer (usually
-filesystems) right after the data hits cache disk. The data is flushed to raid
-disks later after specific conditions met. So cache disk failure will cause
-data loss.
-
-In write-back mode, MD also caches data in memory. The memory cache includes
-the same data stored on cache disk, so a power loss doesn't cause data loss.
-The memory cache size has performance impact for the array. It's recommended
-the size is big. A user can configure the size by::
-
-	echo "2048" > /sys/block/md0/md/stripe_cache_size
-
-Too small cache disk will make the write aggregation less efficient in this
-mode depending on the workloads. It's recommended to use a cache disk with at
-least several gigabytes size in write-back mode.
-
-The implementation
-==================
-
-The write-through and write-back cache use the same disk format. The cache disk
-is organized as a simple write log. The log consists of 'meta data' and 'data'
-pairs. The meta data describes the data. It also includes checksum and sequence
-ID for recovery identification. Data can be IO data and parity data. Data is
-checksumed too. The checksum is stored in the meta data ahead of the data. The
-checksum is an optimization because MD can write meta and data freely without
-worry about the order. MD superblock has a field pointed to the valid meta data
-of log head.
-
-The log implementation is pretty straightforward. The difficult part is the
-order in which MD writes data to cache disk and RAID disks. Specifically, in
-write-through mode, MD calculates parity for IO data, writes both IO data and
-parity to the log, writes the data and parity to RAID disks after the data and
-parity is settled down in log and finally the IO is finished. Read just reads
-from raid disks as usual.
-
-In write-back mode, MD writes IO data to the log and reports IO completion. The
-data is also fully cached in memory at that time, which means read must query
-memory cache. If some conditions are met, MD will flush the data to RAID disks.
-MD will calculate parity for the data and write parity into the log. After this
-is finished, MD will write both data and parity into RAID disks, then MD can
-release the memory cache. The flush conditions could be stripe becomes a full
-stripe write, free cache disk space is low or free in-kernel memory cache space
-is low.
-
-After an unclean shutdown, MD does recovery. MD reads all meta data and data
-from the log. The sequence ID and checksum will help us detect corrupted meta
-data and data. If MD finds a stripe with data and valid parities (1 parity for
-raid4/5 and 2 for raid6), MD will write the data and parities to RAID disks. If
-parities are incompleted, they are discarded. If part of data is corrupted,
-they are discarded too. MD then loads valid data and writes them to RAID disks
-in normal way.
diff --git a/Documentation/md/raid5-ppl.rst b/Documentation/md/raid5-ppl.rst
deleted file mode 100644
index 357e5515bc55..000000000000
--- a/Documentation/md/raid5-ppl.rst
+++ /dev/null
@@ -1,47 +0,0 @@
-==================
-Partial Parity Log
-==================
-
-Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
-addressed by PPL is that after a dirty shutdown, parity of a particular stripe
-may become inconsistent with data on other member disks. If the array is also
-in degraded state, there is no way to recalculate parity, because one of the
-disks is missing. This can lead to silent data corruption when rebuilding the
-array or using it is as degraded - data calculated from parity for array blocks
-that have not been touched by a write request during the unclean shutdown can
-be incorrect. Such condition is known as the RAID5 Write Hole. Because of
-this, md by default does not allow starting a dirty degraded array.
-
-Partial parity for a write operation is the XOR of stripe data chunks not
-modified by this write. It is just enough data needed for recovering from the
-write hole. XORing partial parity with the modified chunks produces parity for
-the stripe, consistent with its state before the write operation, regardless of
-which chunk writes have completed. If one of the not modified data disks of
-this stripe is missing, this updated parity can be used to recover its
-contents. PPL recovery is also performed when starting an array after an
-unclean shutdown and all disks are available, eliminating the need to resync
-the array. Because of this, using write-intent bitmap and PPL together is not
-supported.
-
-When handling a write request PPL writes partial parity before new data and
-parity are dispatched to disks. PPL is a distributed log - it is stored on
-array member drives in the metadata area, on the parity drive of a particular
-stripe.  It does not require a dedicated journaling drive. Write performance is
-reduced by up to 30%-40% but it scales with the number of drives in the array
-and the journaling drive does not become a bottleneck or a single point of
-failure.
-
-Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
-not a true journal. It does not protect from losing in-flight data, only from
-silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
-performed for this stripe (parity is not updated). So it is possible to have
-arbitrary data in the written part of a stripe if that disk is lost. In such
-case the behavior is the same as in plain raid5.
-
-PPL is available for md version-1 metadata and external (specifically IMSM)
-metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.
-
-There is a limitation of maximum 64 disks in the array for PPL. It allows to
-keep data structures and implementation simple. RAID5 arrays with so many disks
-are not likely due to high risk of multiple disks failure. Such restriction
-should not be a real life limitation.
-- 
cgit v1.2.3-55-g7522


From 09fdc957ad0d0ee83c00cd1e0c3a605047f63bf7 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 16:51:34 -0300
Subject: docs: leds: add it to the driver-api book

The contents of the leds driver docs are messy: they mix lots of
admin-guide material with kernel-internal material, just like other
driver subsystems.

I'm opting to keep the dir in the same place and just add
a link to it. This makes it clearer that it requires changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/index.rst      | 1 +
 Documentation/leds/index.rst | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/index.rst b/Documentation/index.rst
index c6934d90363c..c4f9610b6167 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -91,6 +91,7 @@ needed).
 
    driver-api/index
    core-api/index
+   leds/index
    media/index
    networking/index
    input/index
diff --git a/Documentation/leds/index.rst b/Documentation/leds/index.rst
index 9885f7c1b75d..060f4e485897 100644
--- a/Documentation/leds/index.rst
+++ b/Documentation/leds/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ====
 LEDs
-- 
cgit v1.2.3-55-g7522


From 616b81db2fa757f48895242ea6aaf3c1a1ad22f4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 17:13:24 -0300
Subject: docs: ioctl: add it to the uAPI guide

While 100% of its contents are userspace material, let's keep the dir
in the same place, as this is a well-known location.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/index.rst              | 1 +
 Documentation/ioctl/index.rst        | 2 +-
 Documentation/ioctl/ioctl-number.rst | 2 --
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/index.rst b/Documentation/index.rst
index c4f9610b6167..864daf8805a4 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -56,6 +56,7 @@ the kernel interface as seen by application developers.
    :maxdepth: 2
 
    userspace-api/index
+   ioctl/index
 
 
 Introduction to kernel development
diff --git a/Documentation/ioctl/index.rst b/Documentation/ioctl/index.rst
index 1a6f437566e3..0f0a857f6615 100644
--- a/Documentation/ioctl/index.rst
+++ b/Documentation/ioctl/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ======
 IOCTLs
diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst
index fcf9623a599f..7f8dcae7a230 100644
--- a/Documentation/ioctl/ioctl-number.rst
+++ b/Documentation/ioctl/ioctl-number.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 =============
 Ioctl Numbers
 =============
-- 
cgit v1.2.3-55-g7522


From 9b1f44028ff2e051816517781153e10a2d748dc3 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 17:15:10 -0300
Subject: docs: interconnect.rst: add it to the driver-api guide

This is intended for a kernel hacker audience.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Georgi Djakov <georgi.djakov@linaro.org>
---
 Documentation/driver-api/index.rst          |  1 +
 Documentation/driver-api/interconnect.rst   | 93 ++++++++++++++++++++++++++++
 Documentation/interconnect/interconnect.rst | 95 -----------------------------
 MAINTAINERS                                 |  2 +-
 4 files changed, 95 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/driver-api/interconnect.rst
 delete mode 100644 Documentation/interconnect/interconnect.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index b5179bf2ada2..baa77a666e46 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -36,6 +36,7 @@ available subsections can be seen below.
    i2c
    ipmb
    i3c/index
+   interconnect
    hsi
    edac
    scsi
diff --git a/Documentation/driver-api/interconnect.rst b/Documentation/driver-api/interconnect.rst
new file mode 100644
index 000000000000..c3e004893796
--- /dev/null
+++ b/Documentation/driver-api/interconnect.rst
@@ -0,0 +1,93 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+GENERIC SYSTEM INTERCONNECT SUBSYSTEM
+=====================================
+
+Introduction
+------------
+
+This framework is designed to provide a standard kernel interface to control
+the settings of the interconnects on an SoC. These settings can be throughput,
+latency and priority between multiple interconnected devices or functional
+blocks. This can be controlled dynamically in order to save power or provide
+maximum performance.
+
+The interconnect bus is hardware with configurable parameters, which can be
+set on a data path according to the requests received from various drivers.
+An example of interconnect buses are the interconnects between various
+components or functional blocks in chipsets. There can be multiple interconnects
+on an SoC that can be multi-tiered.
+
+Below is a simplified diagram of a real-world SoC interconnect bus topology.
+
+::
+
+ +----------------+    +----------------+
+ | HW Accelerator |--->|      M NoC     |<---------------+
+ +----------------+    +----------------+                |
+                         |      |                    +------------+
+  +-----+  +-------------+      V       +------+     |            |
+  | DDR |  |                +--------+  | PCIe |     |            |
+  +-----+  |                | Slaves |  +------+     |            |
+    ^ ^    |                +--------+     |         |   C NoC    |
+    | |    V                               V         |            |
+ +------------------+   +------------------------+   |            |   +-----+
+ |                  |-->|                        |-->|            |-->| CPU |
+ |                  |-->|                        |<--|            |   +-----+
+ |     Mem NoC      |   |         S NoC          |   +------------+
+ |                  |<--|                        |---------+    |
+ |                  |<--|                        |<------+ |    |   +--------+
+ +------------------+   +------------------------+       | |    +-->| Slaves |
+   ^  ^    ^    ^          ^                             | |        +--------+
+   |  |    |    |          |                             | V
+ +------+  |  +-----+   +-----+  +---------+   +----------------+   +--------+
+ | CPUs |  |  | GPU |   | DSP |  | Masters |-->|       P NoC    |-->| Slaves |
+ +------+  |  +-----+   +-----+  +---------+   +----------------+   +--------+
+           |
+       +-------+
+       | Modem |
+       +-------+
+
+Terminology
+-----------
+
+Interconnect provider is the software definition of the interconnect hardware.
+The interconnect providers on the above diagram are M NoC, S NoC, C NoC, P NoC
+and Mem NoC.
+
+Interconnect node is the software definition of the interconnect hardware
+port. Each interconnect provider consists of multiple interconnect nodes,
+which are connected to other SoC components including other interconnect
+providers. The point on the diagram where the CPUs connect to the memory is
+called an interconnect node, which belongs to the Mem NoC interconnect provider.
+
+Interconnect endpoints are the first or the last element of the path. Every
+endpoint is a node, but not every node is an endpoint.
+
+Interconnect path is everything between two endpoints including all the nodes
+that have to be traversed to reach from a source to destination node. It may
+include multiple master-slave pairs across several interconnect providers.
+
+Interconnect consumers are the entities which make use of the data paths exposed
+by the providers. The consumers send requests to providers requesting various
+throughput, latency and priority. Usually the consumers are device drivers, that
+send request based on their needs. An example for a consumer is a video decoder
+that supports various formats and image sizes.
+
+Interconnect providers
+----------------------
+
+Interconnect provider is an entity that implements methods to initialize and
+configure interconnect bus hardware. The interconnect provider drivers should
+be registered with the interconnect provider core.
+
+.. kernel-doc:: include/linux/interconnect-provider.h
+
+Interconnect consumers
+----------------------
+
+Interconnect consumers are the clients which use the interconnect APIs to
+get paths between endpoints and set their bandwidth/latency/QoS requirements
+for these interconnect paths.  These interfaces are not currently
+documented.
diff --git a/Documentation/interconnect/interconnect.rst b/Documentation/interconnect/interconnect.rst
deleted file mode 100644
index 56e331dab70e..000000000000
--- a/Documentation/interconnect/interconnect.rst
+++ /dev/null
@@ -1,95 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-:orphan:
-
-=====================================
-GENERIC SYSTEM INTERCONNECT SUBSYSTEM
-=====================================
-
-Introduction
-------------
-
-This framework is designed to provide a standard kernel interface to control
-the settings of the interconnects on an SoC. These settings can be throughput,
-latency and priority between multiple interconnected devices or functional
-blocks. This can be controlled dynamically in order to save power or provide
-maximum performance.
-
-The interconnect bus is hardware with configurable parameters, which can be
-set on a data path according to the requests received from various drivers.
-An example of interconnect buses are the interconnects between various
-components or functional blocks in chipsets. There can be multiple interconnects
-on an SoC that can be multi-tiered.
-
-Below is a simplified diagram of a real-world SoC interconnect bus topology.
-
-::
-
- +----------------+    +----------------+
- | HW Accelerator |--->|      M NoC     |<---------------+
- +----------------+    +----------------+                |
-                         |      |                    +------------+
-  +-----+  +-------------+      V       +------+     |            |
-  | DDR |  |                +--------+  | PCIe |     |            |
-  +-----+  |                | Slaves |  +------+     |            |
-    ^ ^    |                +--------+     |         |   C NoC    |
-    | |    V                               V         |            |
- +------------------+   +------------------------+   |            |   +-----+
- |                  |-->|                        |-->|            |-->| CPU |
- |                  |-->|                        |<--|            |   +-----+
- |     Mem NoC      |   |         S NoC          |   +------------+
- |                  |<--|                        |---------+    |
- |                  |<--|                        |<------+ |    |   +--------+
- +------------------+   +------------------------+       | |    +-->| Slaves |
-   ^  ^    ^    ^          ^                             | |        +--------+
-   |  |    |    |          |                             | V
- +------+  |  +-----+   +-----+  +---------+   +----------------+   +--------+
- | CPUs |  |  | GPU |   | DSP |  | Masters |-->|       P NoC    |-->| Slaves |
- +------+  |  +-----+   +-----+  +---------+   +----------------+   +--------+
-           |
-       +-------+
-       | Modem |
-       +-------+
-
-Terminology
------------
-
-Interconnect provider is the software definition of the interconnect hardware.
-The interconnect providers on the above diagram are M NoC, S NoC, C NoC, P NoC
-and Mem NoC.
-
-Interconnect node is the software definition of the interconnect hardware
-port. Each interconnect provider consists of multiple interconnect nodes,
-which are connected to other SoC components including other interconnect
-providers. The point on the diagram where the CPUs connect to the memory is
-called an interconnect node, which belongs to the Mem NoC interconnect provider.
-
-Interconnect endpoints are the first or the last element of the path. Every
-endpoint is a node, but not every node is an endpoint.
-
-Interconnect path is everything between two endpoints including all the nodes
-that have to be traversed to reach from a source to destination node. It may
-include multiple master-slave pairs across several interconnect providers.
-
-Interconnect consumers are the entities which make use of the data paths exposed
-by the providers. The consumers send requests to providers requesting various
-throughput, latency and priority. Usually the consumers are device drivers, that
-send request based on their needs. An example for a consumer is a video decoder
-that supports various formats and image sizes.
-
-Interconnect providers
-----------------------
-
-Interconnect provider is an entity that implements methods to initialize and
-configure interconnect bus hardware. The interconnect provider drivers should
-be registered with the interconnect provider core.
-
-.. kernel-doc:: include/linux/interconnect-provider.h
-
-Interconnect consumers
-----------------------
-
-Interconnect consumers are the clients which use the interconnect APIs to
-get paths between endpoints and set their bandwidth/latency/QoS requirements
-for these interconnect paths.  These interfaces are not currently
-documented.
diff --git a/MAINTAINERS b/MAINTAINERS
index b8ce346d5254..49e9a58f4799 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8326,7 +8326,7 @@ INTERCONNECT API
 M:	Georgi Djakov <georgi.djakov@linaro.org>
 L:	linux-pm@vger.kernel.org
 S:	Maintained
-F:	Documentation/interconnect/
+F:	Documentation/driver-api/interconnect.rst
 F:	Documentation/devicetree/bindings/interconnect/
 F:	drivers/interconnect/
 F:	include/dt-bindings/interconnect/
-- 
cgit v1.2.3-55-g7522


From 159a5e78bdcabb1f87ee5536182a99a307ae0bac Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 22 Apr 2019 16:10:26 -0300
Subject: docs: add arch doc directories to the index

Now that several arch documents were converted to ReST,
add their indexes to Documentation/index.rst and remove the
:orphan:  from them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/arm/index.rst    | 2 --
 Documentation/arm64/index.rst  | 2 --
 Documentation/ia64/index.rst   | 2 --
 Documentation/index.rst        | 9 +++++++++
 Documentation/m68k/index.rst   | 2 +-
 Documentation/riscv/index.rst  | 2 --
 Documentation/s390/index.rst   | 2 --
 Documentation/sparc/index.rst  | 2 --
 Documentation/xtensa/index.rst | 2 +-
 9 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/Documentation/arm/index.rst b/Documentation/arm/index.rst
index bd316d1a1802..9c2f781f4685 100644
--- a/Documentation/arm/index.rst
+++ b/Documentation/arm/index.rst
@@ -1,5 +1,3 @@
-﻿:orphan:
-
 ================
 ARM Architecture
 ================
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 018b7836ecb7..96b696ba4e6c 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ==================
 ARM64 Architecture
 ==================
diff --git a/Documentation/ia64/index.rst b/Documentation/ia64/index.rst
index a3e3052ad6e2..ef99475f672b 100644
--- a/Documentation/ia64/index.rst
+++ b/Documentation/ia64/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ==================
 IA-64 Architecture
 ==================
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 864daf8805a4..a322c8721d13 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -117,7 +117,16 @@ implementation.
    :maxdepth: 2
 
    sh/index
+   arm/index
+   arm64/index
+   ia64/index
+   m68k/index
+   riscv/index
+   s390/index
+   sh/index
+   sparc/index
    x86/index
+   xtensa/index
 
 Filesystem Documentation
 ------------------------
diff --git a/Documentation/m68k/index.rst b/Documentation/m68k/index.rst
index f3273ec075c3..3a5ba7fe1703 100644
--- a/Documentation/m68k/index.rst
+++ b/Documentation/m68k/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 =================
 m68k Architecture
diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst
index c4b906d9b5a7..e3ca0922a8c2 100644
--- a/Documentation/riscv/index.rst
+++ b/Documentation/riscv/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ===================
 RISC-V architecture
 ===================
diff --git a/Documentation/s390/index.rst b/Documentation/s390/index.rst
index 1a914da2a07b..4602312909d3 100644
--- a/Documentation/s390/index.rst
+++ b/Documentation/s390/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 =================
 s390 Architecture
 =================
diff --git a/Documentation/sparc/index.rst b/Documentation/sparc/index.rst
index 91f7d6643dd5..71cff621f243 100644
--- a/Documentation/sparc/index.rst
+++ b/Documentation/sparc/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ==================
 Sparc Architecture
 ==================
diff --git a/Documentation/xtensa/index.rst b/Documentation/xtensa/index.rst
index 5a24e365e35f..52fa04eb39a3 100644
--- a/Documentation/xtensa/index.rst
+++ b/Documentation/xtensa/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ===================
 Xtensa Architecture
-- 
cgit v1.2.3-55-g7522


From 6cf2a73cb2bc422a03984b285a63632c27f8c4e4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 12:40:23 -0300
Subject: docs: device-mapper: move it to the admin-guide

The DM support describes lots of aspects related to mapped
disk partitions from the userspace PoV.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 .../admin-guide/device-mapper/cache-policies.rst   | 131 +++++++
 Documentation/admin-guide/device-mapper/cache.rst  | 337 ++++++++++++++++
 Documentation/admin-guide/device-mapper/delay.rst  |  31 ++
 .../admin-guide/device-mapper/dm-crypt.rst         | 173 +++++++++
 .../admin-guide/device-mapper/dm-dust.txt          | 272 +++++++++++++
 .../admin-guide/device-mapper/dm-flakey.rst        |  74 ++++
 .../admin-guide/device-mapper/dm-init.rst          | 125 ++++++
 .../admin-guide/device-mapper/dm-integrity.rst     | 259 +++++++++++++
 Documentation/admin-guide/device-mapper/dm-io.rst  |  75 ++++
 Documentation/admin-guide/device-mapper/dm-log.rst |  57 +++
 .../admin-guide/device-mapper/dm-queue-length.rst  |  48 +++
 .../admin-guide/device-mapper/dm-raid.rst          | 419 ++++++++++++++++++++
 .../admin-guide/device-mapper/dm-service-time.rst  | 101 +++++
 .../admin-guide/device-mapper/dm-uevent.rst        | 110 ++++++
 .../admin-guide/device-mapper/dm-zoned.rst         | 146 +++++++
 Documentation/admin-guide/device-mapper/era.rst    | 116 ++++++
 Documentation/admin-guide/device-mapper/index.rst  |  42 ++
 Documentation/admin-guide/device-mapper/kcopyd.rst |  47 +++
 Documentation/admin-guide/device-mapper/linear.rst |  63 +++
 .../admin-guide/device-mapper/log-writes.rst       | 145 +++++++
 .../admin-guide/device-mapper/persistent-data.rst  |  88 +++++
 .../admin-guide/device-mapper/snapshot.rst         | 196 ++++++++++
 .../admin-guide/device-mapper/statistics.rst       | 225 +++++++++++
 .../admin-guide/device-mapper/striped.rst          |  61 +++
 Documentation/admin-guide/device-mapper/switch.rst | 141 +++++++
 .../device-mapper/thin-provisioning.rst            | 427 +++++++++++++++++++++
 .../admin-guide/device-mapper/unstriped.rst        | 135 +++++++
 Documentation/admin-guide/device-mapper/verity.rst | 229 +++++++++++
 .../admin-guide/device-mapper/writecache.rst       |  79 ++++
 Documentation/admin-guide/device-mapper/zero.rst   |  37 ++
 Documentation/admin-guide/index.rst                |   1 +
 Documentation/device-mapper/cache-policies.rst     | 131 -------
 Documentation/device-mapper/cache.rst              | 337 ----------------
 Documentation/device-mapper/delay.rst              |  31 --
 Documentation/device-mapper/dm-crypt.rst           | 173 ---------
 Documentation/device-mapper/dm-dust.txt            | 272 -------------
 Documentation/device-mapper/dm-flakey.rst          |  74 ----
 Documentation/device-mapper/dm-init.rst            | 125 ------
 Documentation/device-mapper/dm-integrity.rst       | 259 -------------
 Documentation/device-mapper/dm-io.rst              |  75 ----
 Documentation/device-mapper/dm-log.rst             |  57 ---
 Documentation/device-mapper/dm-queue-length.rst    |  48 ---
 Documentation/device-mapper/dm-raid.rst            | 419 --------------------
 Documentation/device-mapper/dm-service-time.rst    | 101 -----
 Documentation/device-mapper/dm-uevent.rst          | 110 ------
 Documentation/device-mapper/dm-zoned.rst           | 146 -------
 Documentation/device-mapper/era.rst                | 116 ------
 Documentation/device-mapper/index.rst              |  44 ---
 Documentation/device-mapper/kcopyd.rst             |  47 ---
 Documentation/device-mapper/linear.rst             |  63 ---
 Documentation/device-mapper/log-writes.rst         | 145 -------
 Documentation/device-mapper/persistent-data.rst    |  88 -----
 Documentation/device-mapper/snapshot.rst           | 196 ----------
 Documentation/device-mapper/statistics.rst         | 225 -----------
 Documentation/device-mapper/striped.rst            |  61 ---
 Documentation/device-mapper/switch.rst             | 141 -------
 Documentation/device-mapper/thin-provisioning.rst  | 427 ---------------------
 Documentation/device-mapper/unstriped.rst          | 135 -------
 Documentation/device-mapper/verity.rst             | 229 -----------
 Documentation/device-mapper/writecache.rst         |  79 ----
 Documentation/device-mapper/zero.rst               |  37 --
 MAINTAINERS                                        |   2 +-
 drivers/md/Kconfig                                 |   2 +-
 drivers/md/dm-init.c                               |   2 +-
 drivers/md/dm-raid.c                               |   2 +-
 65 files changed, 4394 insertions(+), 4395 deletions(-)
 create mode 100644 Documentation/admin-guide/device-mapper/cache-policies.rst
 create mode 100644 Documentation/admin-guide/device-mapper/cache.rst
 create mode 100644 Documentation/admin-guide/device-mapper/delay.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-crypt.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-dust.txt
 create mode 100644 Documentation/admin-guide/device-mapper/dm-flakey.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-init.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-integrity.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-io.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-log.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-queue-length.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-raid.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-service-time.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-uevent.rst
 create mode 100644 Documentation/admin-guide/device-mapper/dm-zoned.rst
 create mode 100644 Documentation/admin-guide/device-mapper/era.rst
 create mode 100644 Documentation/admin-guide/device-mapper/index.rst
 create mode 100644 Documentation/admin-guide/device-mapper/kcopyd.rst
 create mode 100644 Documentation/admin-guide/device-mapper/linear.rst
 create mode 100644 Documentation/admin-guide/device-mapper/log-writes.rst
 create mode 100644 Documentation/admin-guide/device-mapper/persistent-data.rst
 create mode 100644 Documentation/admin-guide/device-mapper/snapshot.rst
 create mode 100644 Documentation/admin-guide/device-mapper/statistics.rst
 create mode 100644 Documentation/admin-guide/device-mapper/striped.rst
 create mode 100644 Documentation/admin-guide/device-mapper/switch.rst
 create mode 100644 Documentation/admin-guide/device-mapper/thin-provisioning.rst
 create mode 100644 Documentation/admin-guide/device-mapper/unstriped.rst
 create mode 100644 Documentation/admin-guide/device-mapper/verity.rst
 create mode 100644 Documentation/admin-guide/device-mapper/writecache.rst
 create mode 100644 Documentation/admin-guide/device-mapper/zero.rst
 delete mode 100644 Documentation/device-mapper/cache-policies.rst
 delete mode 100644 Documentation/device-mapper/cache.rst
 delete mode 100644 Documentation/device-mapper/delay.rst
 delete mode 100644 Documentation/device-mapper/dm-crypt.rst
 delete mode 100644 Documentation/device-mapper/dm-dust.txt
 delete mode 100644 Documentation/device-mapper/dm-flakey.rst
 delete mode 100644 Documentation/device-mapper/dm-init.rst
 delete mode 100644 Documentation/device-mapper/dm-integrity.rst
 delete mode 100644 Documentation/device-mapper/dm-io.rst
 delete mode 100644 Documentation/device-mapper/dm-log.rst
 delete mode 100644 Documentation/device-mapper/dm-queue-length.rst
 delete mode 100644 Documentation/device-mapper/dm-raid.rst
 delete mode 100644 Documentation/device-mapper/dm-service-time.rst
 delete mode 100644 Documentation/device-mapper/dm-uevent.rst
 delete mode 100644 Documentation/device-mapper/dm-zoned.rst
 delete mode 100644 Documentation/device-mapper/era.rst
 delete mode 100644 Documentation/device-mapper/index.rst
 delete mode 100644 Documentation/device-mapper/kcopyd.rst
 delete mode 100644 Documentation/device-mapper/linear.rst
 delete mode 100644 Documentation/device-mapper/log-writes.rst
 delete mode 100644 Documentation/device-mapper/persistent-data.rst
 delete mode 100644 Documentation/device-mapper/snapshot.rst
 delete mode 100644 Documentation/device-mapper/statistics.rst
 delete mode 100644 Documentation/device-mapper/striped.rst
 delete mode 100644 Documentation/device-mapper/switch.rst
 delete mode 100644 Documentation/device-mapper/thin-provisioning.rst
 delete mode 100644 Documentation/device-mapper/unstriped.rst
 delete mode 100644 Documentation/device-mapper/verity.rst
 delete mode 100644 Documentation/device-mapper/writecache.rst
 delete mode 100644 Documentation/device-mapper/zero.rst

diff --git a/Documentation/admin-guide/device-mapper/cache-policies.rst b/Documentation/admin-guide/device-mapper/cache-policies.rst
new file mode 100644
index 000000000000..b17fe352fc41
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/cache-policies.rst
@@ -0,0 +1,131 @@
+=============================
+Guidance for writing policies
+=============================
+
+Try to keep transactionality out of it.  The core is careful to
+avoid asking about anything that is migrating.  This is a pain, but
+makes it easier to write the policies.
+
+Mappings are loaded into the policy at construction time.
+
+Every bio that is mapped by the target is referred to the policy.
+The policy can return a simple HIT or MISS or issue a migration.
+
+Currently there's no way for the policy to issue background work,
+e.g. to start writing back dirty blocks that are going to be evicted
+soon.
+
+Because we map bios rather than requests, it's easy for the policy
+to get fooled by many small bios.  For this reason the core target
+issues periodic ticks to the policy.  It's suggested that the policy
+doesn't update state (e.g. hit counts) for a block more than once
+per tick.  The core generates ticks by watching bios complete, as a
+way of estimating when the io scheduler has let the ios run.
+
+
+Overview of supplied cache replacement policies
+===============================================
+
+multiqueue (mq)
+---------------
+
+This policy is now an alias for smq (see below).
+
+The following tunables are accepted, but have no effect::
+
+	'sequential_threshold <#nr_sequential_ios>'
+	'random_threshold <#nr_random_ios>'
+	'read_promote_adjustment <value>'
+	'write_promote_adjustment <value>'
+	'discard_promote_adjustment <value>'
+
+Stochastic multiqueue (smq)
+---------------------------
+
+This policy is the default.
+
+The stochastic multi-queue (smq) policy addresses some of the problems
+with the multiqueue (mq) policy.
+
+The smq policy (vs mq) offers the promise of less memory utilization,
+improved performance and increased adaptability in the face of changing
+workloads.  smq also does not have any cumbersome tuning knobs.
+
+Users may switch from "mq" to "smq" simply by appropriately reloading a
+DM table that is using the cache target.  Doing so will cause all of the
+mq policy's hints to be dropped.  Also, performance of the cache may
+degrade slightly until smq recalculates the origin device's hotspots
+that should be cached.
+
+Memory usage
+^^^^^^^^^^^^
+
+The mq policy used a lot of memory: 88 bytes per cache block on a
+64-bit machine.
+
+smq uses 28-bit indexes to implement its data structures rather than
+pointers.  It avoids storing an explicit hit count for each block.  It
+has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
+the entries (each hotspot block covers a larger area than a single
+cache block).
+
+All this means smq uses ~25 bytes per cache block.  Still a lot of
+memory, but a substantial improvement nonetheless.
+
+Level balancing
+^^^^^^^^^^^^^^^
+
+mq placed entries in different levels of the multiqueue structures
+based on their hit count (~ln(hit count)).  This meant the bottom
+levels generally had the most entries, and the top ones had very
+few.  Having unbalanced levels like this reduced the efficacy of the
+multiqueue.
+
+smq does not maintain a hit count; instead it swaps hit entries with
+the least recently used entry from the level above.  The overall
+ordering is a side effect of this stochastic process.  With this
+scheme we can decide how many entries occupy each multiqueue level,
+resulting in better promotion/demotion decisions.
+
+Adaptability:
+The mq policy maintained a hit count for each cache block.  For a
+different block to get promoted to the cache, its hit count had to
+exceed the lowest currently in the cache.  This meant it could take a
+long time for the cache to adapt between varying IO patterns.
+
+smq doesn't maintain hit counts, so a lot of this problem just goes
+away.  In addition it tracks performance of the hotspot queue, which
+is used to decide which blocks to promote.  If the hotspot queue is
+performing badly then it starts moving entries more quickly between
+levels.  This lets it adapt to new IO patterns very quickly.
+
+Performance
+^^^^^^^^^^^
+
+Testing smq shows substantially better performance than mq.
+
+cleaner
+-------
+
+The cleaner writes back all dirty blocks in a cache to decommission it.
+
+Examples
+========
+
+The syntax for a table is::
+
+	cache <metadata dev> <cache dev> <origin dev> <block size>
+	<#feature_args> [<feature arg>]*
+	<policy> <#policy_args> [<policy arg>]*
+
+The syntax to send a message using the dmsetup command is::
+
+	dmsetup message <mapped device> 0 sequential_threshold 1024
+	dmsetup message <mapped device> 0 random_threshold 8
+
+Using dmsetup::
+
+	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
+	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
+	creates a 128GB large mapped device named 'blah' with the
+	sequential threshold set to 1024 and the random_threshold set to 8.
diff --git a/Documentation/admin-guide/device-mapper/cache.rst b/Documentation/admin-guide/device-mapper/cache.rst
new file mode 100644
index 000000000000..f15e5254d05b
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/cache.rst
@@ -0,0 +1,337 @@
+=====
+Cache
+=====
+
+Introduction
+============
+
+dm-cache is a device mapper target written by Joe Thornber, Heinz
+Mauelshagen, and Mike Snitzer.
+
+It aims to improve performance of a block device (eg, a spindle) by
+dynamically migrating some of its data to a faster, smaller device
+(eg, an SSD).
+
+This device-mapper solution allows us to insert this caching at
+different levels of the dm stack, for instance above the data device for
+a thin-provisioning pool.  Caching solutions that are integrated more
+closely with the virtual memory system should give better performance.
+
+The target reuses the metadata library used in the thin-provisioning
+library.
+
+The decision as to what data to migrate and when is left to a plug-in
+policy module.  Several of these have been written as we experiment,
+and we hope other people will contribute others for specific io
+scenarios (eg. a vm image server).
+
+Glossary
+========
+
+  Migration
+	       Movement of the primary copy of a logical block from one
+	       device to the other.
+  Promotion
+	       Migration from slow device to fast device.
+  Demotion
+	       Migration from fast device to slow device.
+
+The origin device always contains a copy of the logical block, which
+may be out of date or kept in sync with the copy on the cache device
+(depending on policy).
+
+Design
+======
+
+Sub-devices
+-----------
+
+The target is constructed by passing three devices to it (along with
+other parameters detailed later):
+
+1. An origin device - the big, slow one.
+
+2. A cache device - the small, fast one.
+
+3. A small metadata device - records which blocks are in the cache,
+   which are dirty, and extra hints for use by the policy object.
+   This information could be put on the cache device, but having it
+   separate allows the volume manager to configure it differently,
+   e.g. as a mirror for extra robustness.  This metadata device may only
+   be used by a single cache device.
+
+Fixed block size
+----------------
+
+The origin is divided up into blocks of a fixed size.  This block size
+is configurable when you first create the cache.  Typically we've been
+using block sizes of 256KB - 1024KB.  The block size must be between 64
+sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).
+
+Having a fixed block size simplifies the target a lot.  But it is
+something of a compromise.  For instance, a small part of a block may be
+getting hit a lot, yet the whole block will be promoted to the cache.
+So large block sizes are bad because they waste cache space.  And small
+block sizes are bad because they increase the amount of metadata (both
+in core and on disk).
+
+Cache operating modes
+---------------------
+
+The cache has three operating modes: writeback, writethrough and
+passthrough.
+
+If writeback, the default, is selected then a write to a block that is
+cached will go only to the cache and the block will be marked dirty in
+the metadata.
+
+If writethrough is selected then a write to a cached block will not
+complete until it has hit both the origin and cache devices.  Clean
+blocks should remain clean.
+
+If passthrough is selected, useful when the cache contents are not known
+to be coherent with the origin device, then all reads are served from
+the origin device (all reads miss the cache) and all writes are
+forwarded to the origin device; additionally, write hits cause cache
+block invalidates.  To enable passthrough mode the cache must be clean.
+Passthrough mode allows a cache device to be activated without having to
+worry about coherency.  Coherency that exists is maintained, although
+the cache will gradually cool as writes take place.  If the coherency of
+the cache can later be verified, or established through use of the
+"invalidate_cblocks" message, the cache device can be transitioned to
+writethrough or writeback mode while still warm.  Otherwise, the cache
+contents can be discarded prior to transitioning to the desired
+operating mode.
+
+A simple cleaner policy is provided, which will clean (write back) all
+dirty blocks in a cache.  Useful for decommissioning a cache or when
+shrinking a cache.  Shrinking the cache's fast device requires all cache
+blocks, in the area of the cache being removed, to be clean.  If the
+area being removed from the cache still contains dirty blocks the resize
+will fail.  Care must be taken to never reduce the volume used for the
+cache's fast device until the cache is clean.  This is of particular
+importance if writeback mode is used.  Writethrough and passthrough
+modes already maintain a clean cache.  Future support to partially clean
+the cache, above a specified threshold, will allow for keeping the cache
+warm and in writeback mode during resize.
+
+Migration throttling
+--------------------
+
+Migrating data between the origin and cache device uses bandwidth.
+The user can set a throttle to prevent more than a certain amount of
+migration occurring at any one time.  Currently we're not taking any
+account of normal io traffic going to the devices.  More work needs
+doing here to avoid migrating during those peak io moments.
+
+For the time being, a message "migration_threshold <#sectors>"
+can be used to set the maximum number of sectors being migrated,
+the default being 2048 sectors (1MB).
+
+Updating on-disk metadata
+-------------------------
+
+On-disk metadata is committed every time a FLUSH or FUA bio is written.
+If no such requests are made then commits will occur every second.  This
+means the cache behaves like a physical disk that has a volatile write
+cache.  If power is lost you may lose some recent writes.  The metadata
+should always be consistent in spite of any crash.
+
+The 'dirty' state for a cache block changes far too frequently for us
+to keep updating it on the fly.  So we treat it as a hint.  In normal
+operation it will be written when the dm device is suspended.  If the
+system crashes all cache blocks will be assumed dirty when restarted.
+
+Per-block policy hints
+----------------------
+
+Policy plug-ins can store a chunk of data per cache block.  It's up to
+the policy how big this chunk is, but it should be kept small.  Like the
+dirty flags this data is lost if there's a crash so a safe fallback
+value should always be possible.
+
+Policy hints affect performance, not correctness.
+
+Policy messaging
+----------------
+
+Policies will have different tunables, specific to each one, so we
+need a generic way of getting and setting these.  Device-mapper
+messages are used.  Refer to cache-policies.rst.
+
+Discard bitset resolution
+-------------------------
+
+We can avoid copying data during migration if we know the block has
+been discarded.  A prime example of this is when mkfs discards the
+whole block device.  We store a bitset tracking the discard state of
+blocks.  However, we allow this bitset to have a different block size
+from the cache blocks.  This is because we need to track the discard
+state for all of the origin device (compare with the dirty bitset
+which is just for the smaller cache device).
+
+Target interface
+================
+
+Constructor
+-----------
+
+  ::
+
+   cache <metadata dev> <cache dev> <origin dev> <block size>
+         <#feature args> [<feature arg>]*
+         <policy> <#policy args> [<policy arg>]*
+
+ ================ =======================================================
+ metadata dev     fast device holding the persistent metadata
+ cache dev	  fast device holding cached data blocks
+ origin dev	  slow device holding original data blocks
+ block size       cache unit size in sectors
+
+ #feature args    number of feature arguments passed
+ feature args     writethrough or passthrough (The default is writeback.)
+
+ policy           the replacement policy to use
+ #policy args     an even number of arguments corresponding to
+                  key/value pairs passed to the policy
+ policy args      key/value pairs passed to the policy
+		  E.g. 'sequential_threshold 1024'
+		  See cache-policies.rst for details.
+ ================ =======================================================
+
+Optional feature arguments are:
+
+
+   ==================== ========================================================
+   writethrough		write through caching that prohibits cache block
+			content from being different from origin block content.
+			Without this argument, the default behaviour is to write
+			back cache block contents later for performance reasons,
+			so they may differ from the corresponding origin blocks.
+
+   passthrough		a degraded mode useful for various cache coherency
+			situations (e.g., rolling back snapshots of
+			underlying storage).	 Reads and writes always go to
+			the origin.	If a write goes to a cached origin
+			block, then the cache block is invalidated.
+			To enable passthrough mode the cache must be clean.
+
+   metadata2		use version 2 of the metadata.  This stores the dirty
+			bits in a separate btree, which improves speed of
+			shutting down the cache.
+
+   no_discard_passdown	disable passing down discards from the cache
+			to the origin's data device.
+   ==================== ========================================================
+
+A policy called 'default' is always registered.  This is an alias for
+the policy we currently think is giving best all round performance.
+
+As the default policy could vary between kernels, if you are relying on
+the characteristics of a specific policy, always request it by name.
+
+Status
+------
+
+::
+
+  <metadata block size> <#used metadata blocks>/<#total metadata blocks>
+  <cache block size> <#used cache blocks>/<#total cache blocks>
+  <#read hits> <#read misses> <#write hits> <#write misses>
+  <#demotions> <#promotions> <#dirty> <#features> <features>*
+  <#core args> <core args>* <policy name> <#policy args> <policy args>*
+  <cache metadata mode> <needs_check>
+
+
+========================= =====================================================
+metadata block size	  Fixed block size for each metadata block in
+			  sectors
+#used metadata blocks	  Number of metadata blocks used
+#total metadata blocks	  Total number of metadata blocks
+cache block size	  Configurable block size for the cache device
+			  in sectors
+#used cache blocks	  Number of blocks resident in the cache
+#total cache blocks	  Total number of cache blocks
+#read hits		  Number of times a READ bio has been mapped
+			  to the cache
+#read misses		  Number of times a READ bio has been mapped
+			  to the origin
+#write hits		  Number of times a WRITE bio has been mapped
+			  to the cache
+#write misses		  Number of times a WRITE bio has been
+			  mapped to the origin
+#demotions		  Number of times a block has been removed
+			  from the cache
+#promotions		  Number of times a block has been moved to
+			  the cache
+#dirty			  Number of blocks in the cache that differ
+			  from the origin
+#feature args		  Number of feature args to follow
+feature args		  'writethrough' (optional)
+#core args		  Number of core arguments (must be even)
+core args		  Key/value pairs for tuning the core
+			  e.g. migration_threshold
+policy name		  Name of the policy
+#policy args		  Number of policy arguments to follow (must be even)
+policy args		  Key/value pairs e.g. sequential_threshold
+cache metadata mode       ro if read-only, rw if read-write
+
+			  In serious cases where even a read-only mode is
+			  deemed unsafe no further I/O will be permitted and
+			  the status will just contain the string 'Fail'.
+			  The userspace recovery tools should then be used.
+needs_check		  'needs_check' if set, '-' if not set
+			  A metadata operation has failed, resulting in the
+			  needs_check flag being set in the metadata's
+			  superblock.  The metadata device must be
+			  deactivated and checked/repaired before the
+			  cache can be made fully operational again.
+			  '-' indicates	needs_check is not set.
+========================= =====================================================
+
+Messages
+--------
+
+Policies will have different tunables, specific to each one, so we
+need a generic way of getting and setting these.  Device-mapper
+messages are used.  (A sysfs interface would also be possible.)
+
+The message format is::
+
+   <key> <value>
+
+E.g.::
+
+   dmsetup message my_cache 0 sequential_threshold 1024
+
+
+Invalidation is removing an entry from the cache without writing it
+back.  Cache blocks can be invalidated via the invalidate_cblocks
+message, which takes an arbitrary number of cblock ranges.  Each cblock
+range's end value is "one past the end", meaning 5-10 expresses a range
+of values from 5 to 9.  Each cblock must be expressed as a decimal
+value; in the future a variant message that takes cblock ranges
+expressed in hexadecimal may be needed to better support efficient
+invalidation of larger caches.  The cache must be in passthrough mode
+when invalidate_cblocks is used::
+
+   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
+
+E.g.::
+
+   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
+
+Examples
+========
+
+The test suite can be found here:
+
+https://github.com/jthornber/device-mapper-test-suite
+
+::
+
+  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+	  /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
+  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+	  /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
+	  mq 4 sequential_threshold 1024 random_threshold 8'
diff --git a/Documentation/admin-guide/device-mapper/delay.rst b/Documentation/admin-guide/device-mapper/delay.rst
new file mode 100644
index 000000000000..917ba8c33359
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/delay.rst
@@ -0,0 +1,31 @@
+========
+dm-delay
+========
+
+Device-Mapper's "delay" target delays reads and/or writes
+and maps them to different devices.
+
+Parameters::
+
+    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>
+			       [<flush_device> <flush_offset> <flush_delay>]]
+
+With separate write parameters, the first set is only used for reads.
+Offsets are specified in sectors.
+Delays are specified in milliseconds.
+
+Example scripts
+===============
+
+::
+
+	#!/bin/sh
+	# Create device delaying rw operation for 500ms
+	echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
+
+::
+
+	#!/bin/sh
+	# Create device delaying only write operation for 500ms and
+	# splitting reads and writes to different devices $1 $2
+	echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
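As a rough illustration of how the table lines in the two scripts above are assembled, the following Python sketch builds the same strings. The helper name `delay_table`, the device paths and the sector count are hypothetical; the optional flush parameters are left out.

```python
def delay_table(num_sectors, device, delay_ms,
                write_device=None, write_delay_ms=None,
                offset=0, write_offset=0):
    """Build a dm-delay table line (offsets in sectors, delays in ms)."""
    fields = [0, num_sectors, "delay", device, offset, delay_ms]
    if write_device is not None:
        # With separate write parameters, the first set applies to reads only.
        fields += [write_device, write_offset, write_delay_ms]
    return " ".join(str(field) for field in fields)


# Hypothetical devices and size, mirroring the two example scripts.
print(delay_table(2048, "/dev/sdb", 500))
print(delay_table(2048, "/dev/sdb", 0, "/dev/sdc", 500))
```

The resulting string would be piped to `dmsetup create delayed`, as in the shell scripts.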
diff --git a/Documentation/admin-guide/device-mapper/dm-crypt.rst b/Documentation/admin-guide/device-mapper/dm-crypt.rst
new file mode 100644
index 000000000000..8f4a3f889d43
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst
@@ -0,0 +1,173 @@
+========
+dm-crypt
+========
+
+Device-Mapper's "crypt" target provides transparent encryption of block devices
+using the kernel crypto API.
+
+For a more detailed description of supported parameters see:
+https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
+
+Parameters::
+
+	      <cipher> <key> <iv_offset> <device path> \
+	      <offset> [<#opt_params> <opt_params>]
+
+<cipher>
+    Encryption cipher, encryption mode and Initial Vector (IV) generator.
+
+    The cipher specifications format is::
+
+       cipher[:keycount]-chainmode-ivmode[:ivopts]
+
+    Examples::
+
+       aes-cbc-essiv:sha256
+       aes-xts-plain64
+       serpent-xts-plain64
+
+    The cipher format also supports direct specification in kernel crypto API
+    format (selected by the capi: prefix). The IV specification is the same
+    as for the first format type.
+    This format is mainly used for specification of authenticated modes.
+
+    The crypto API cipher specifications format is::
+
+        capi:cipher_api_spec-ivmode[:ivopts]
+
+    Examples::
+
+        capi:cbc(aes)-essiv:sha256
+        capi:xts(aes)-plain64
+
+    Examples of authenticated modes::
+
+        capi:gcm(aes)-random
+        capi:authenc(hmac(sha256),xts(aes))-random
+        capi:rfc7539(chacha20,poly1305)-random
+
+    /proc/crypto contains a list of currently loaded crypto modes.
+
+<key>
+    Key used for encryption. It is encoded either as a hexadecimal number
+    or it can be passed as <key_string> prefixed with a single colon
+    character (':') for keys residing in the kernel keyring service.
+    You can only use key sizes that are valid for the selected cipher
+    in combination with the selected iv mode.
+    Note that for some iv modes the key string can contain additional
+    keys (for example IV seed) so the key contains more parts concatenated
+    into a single string.
+
+<key_string>
+    The kernel keyring key is identified by a string in the following format:
+    <key_size>:<key_type>:<key_description>.
+
+<key_size>
+    The encryption key size in bytes. The kernel key payload size must match
+    the value passed in <key_size>.
+
+<key_type>
+    Either 'logon' or 'user' kernel key type.
+
+<key_description>
+    The kernel keyring key description that the crypt target should look for
+    when loading a key of <key_type>.
+
+<keycount>
+    Multi-key compatibility mode. You can define <keycount> keys and
+    then sectors are encrypted according to their offsets (sector 0 uses key0;
+    sector 1 uses key1 etc.).  <keycount> must be a power of two.
+
+<iv_offset>
+    The IV offset is a sector count that is added to the sector number
+    before creating the IV.
+
+<device path>
+    This is the device that is going to be used as backend and contains the
+    encrypted data.  You can specify it as a path like /dev/xxx or a device
+    number <major>:<minor>.
+
+<offset>
+    Starting sector within the device where the encrypted data begins.
+
+<#opt_params>
+    Number of optional parameters. If there are no optional parameters,
+    the optional parameters section can be skipped or #opt_params can be zero.
+    Otherwise #opt_params is the number of following arguments.
+
+    Example of optional parameters section:
+        3 allow_discards same_cpu_crypt submit_from_crypt_cpus
+
+allow_discards
+    Block discard requests (a.k.a. TRIM) are passed through the crypt device.
+    The default is to ignore discard requests.
+
+    WARNING: Assess the specific security risks carefully before enabling this
+    option.  For example, allowing discards on encrypted devices may lead to
+    the leak of information about the ciphertext device (filesystem type,
+    used space etc.) if the discarded blocks can be located easily on the
+    device later.
+
+same_cpu_crypt
+    Perform encryption using the same CPU that IO was submitted on.
+    The default is to use an unbound workqueue so that encryption work
+    is automatically balanced between available CPUs.
+
+submit_from_crypt_cpus
+    Disable offloading writes to a separate thread after encryption.
+    There are some situations where offloading write bios from the
+    encryption threads to a single thread degrades performance
+    significantly.  The default is to offload write bios to the same
+    thread because it benefits CFQ to have writes submitted using the
+    same context.
+
+integrity:<bytes>:<type>
+    The device requires additional <bytes> metadata per-sector stored
+    in a per-bio integrity structure. This metadata must be provided
+    by the underlying dm-integrity target.
+
+    The <type> can be "none" if metadata is used only for persistent IV.
+
+    For Authenticated Encryption with Additional Data (AEAD)
+    the <type> is "aead". An AEAD mode additionally calculates and verifies
+    integrity for the encrypted device. The additional space is then
+    used for storing authentication tag (and persistent IV if needed).
+
+sector_size:<bytes>
+    Use <bytes> as the encryption unit instead of 512-byte sectors.
+    This option can be in the range 512 - 4096 bytes and must be a power of two.
+    The virtual device will announce this size as its minimal IO and logical sector size.
+
+iv_large_sectors
+   IV generators will use the sector number counted in <sector_size> units
+   instead of the default 512-byte sectors.
+
+   For example, if <sector_size> is 4096 bytes, the plain64 IV for the second
+   sector will be 8 (without the flag) and 1 if iv_large_sectors is present.
+   The <iv_offset> must be a multiple of <sector_size> (in 512-byte units)
+   if this flag is specified.
+
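The iv_large_sectors example can be checked with a short Python sketch. It only computes the sector number fed to the IV generator; the function name `iv_sector_number` is hypothetical, and the order of operations (add <iv_offset>, then rescale) is an assumption consistent with the requirement that <iv_offset> be a multiple of <sector_size>.

```python
def iv_sector_number(sector_512, sector_size=512,
                     iv_large_sectors=False, iv_offset=0):
    """Return the sector number used as input to the IV generator.

    sector_512 is the start of the encryption unit, counted in
    512-byte sectors from the start of the mapped device.
    """
    if sector_size not in (512, 1024, 2048, 4096):
        raise ValueError("sector_size must be a power of two in 512..4096")
    number = sector_512 + iv_offset  # iv_offset is a count of 512-byte sectors
    if iv_large_sectors:
        number //= sector_size // 512  # count in <sector_size> units instead
    return number


# The second 4096-byte sector starts at 512-byte sector 8.
print(iv_sector_number(8, sector_size=4096))                         # 8
print(iv_sector_number(8, sector_size=4096, iv_large_sectors=True))  # 1
```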
+Example scripts
+===============
+LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
+encryption with dm-crypt using the 'cryptsetup' utility, see
+https://gitlab.com/cryptsetup/cryptsetup
+
+::
+
+	#!/bin/sh
+	# Create a crypt device using dmsetup
+	dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
+
+::
+
+	#!/bin/sh
+	# Create a crypt device using dmsetup when encryption key is stored in keyring service
+	dmsetup create crypt2 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
+
+::
+
+	#!/bin/sh
+	# Create a crypt device using cryptsetup and LUKS header with default cipher
+	cryptsetup luksFormat $1
+	cryptsetup luksOpen $1 crypt1
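The keyring form of the <key> field used in the crypt2 example can be split into its <key_size>, <key_type> and <key_description> parts as sketched below. The names `KeyringKey` and `parse_key_field` are hypothetical; this illustrates the documented string format and is not the kernel's parser.

```python
from typing import NamedTuple


class KeyringKey(NamedTuple):
    size: int          # <key_size>, in bytes
    key_type: str      # <key_type>: 'logon' or 'user'
    description: str   # <key_description>, may itself contain ':'


def parse_key_field(key):
    """Parse ':<key_size>:<key_type>:<key_description>'."""
    if not key.startswith(":"):
        raise ValueError("no leading ':'; the key is presumably hexadecimal")
    size_text, key_type, description = key[1:].split(":", 2)
    if key_type not in ("logon", "user"):
        raise ValueError(f"unsupported key type: {key_type!r}")
    return KeyringKey(int(size_text, 10), key_type, description)


print(parse_key_field(":32:logon:my_prefix:my_key"))
```

Splitting at most twice keeps the colon inside the description ("my_prefix:my_key") intact.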
diff --git a/Documentation/admin-guide/device-mapper/dm-dust.txt b/Documentation/admin-guide/device-mapper/dm-dust.txt
new file mode 100644
index 000000000000..954d402a1f6a
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-dust.txt
@@ -0,0 +1,272 @@
+dm-dust
+=======
+
+This target emulates the behavior of bad sectors at arbitrary
+locations, and the ability to enable the emulation of the failures
+at an arbitrary time.
+
+This target behaves similarly to a linear target.  At a given time,
+the user can send a message to the target to start failing read
+requests on specific blocks (to emulate the behavior of a hard disk
+drive with bad sectors).
+
+When the failure behavior is enabled (i.e.: when the output of
+"dmsetup status" displays "fail_read_on_bad_block"), reads of blocks
+in the "bad block list" will fail with EIO ("Input/output error").
+
+Writes of blocks in the "bad block list" will result in the following:
+
+1. Remove the block from the "bad block list".
+2. Successfully complete the write.
+
+This emulates the "remapped sector" behavior of a drive with bad
+sectors.
+
+Normally, a drive that is encountering bad sectors will most likely
+encounter more bad sectors, at an unknown time or location.
+With dm-dust, the user can use the "addbadblock" and "removebadblock"
+messages to add arbitrary bad blocks at new locations, and the
+"enable" and "disable" messages to modulate the state of whether the
+configured "bad blocks" will be treated as bad, or bypassed.
+This allows the pre-writing of test data and metadata prior to
+simulating a "failure" event where bad sectors start to appear.
+
+Table parameters:
+-----------------
+<device_path> <offset> <blksz>
+
+Mandatory parameters:
+    <device_path>: path to the block device.
+    <offset>: offset to data area from start of device_path
+    <blksz>: block size in bytes
+	     (minimum 512, maximum 1073741824, must be a power of 2)
+
+Usage instructions:
+-------------------
+
+First, find the size (in 512-byte sectors) of the device to be used:
+
+$ sudo blockdev --getsz /dev/vdb1
+33552384
+
+Create the dm-dust device:
+(For a device with a block size of 512 bytes)
+$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 512'
+
+(For a device with a block size of 4096 bytes)
+$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 4096'
+
+Check the status of the read behavior ("bypass" indicates that all I/O
+will be passed through to the underlying device):
+$ sudo dmsetup status dust1
+0 33552384 dust 252:17 bypass
+
+$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=128 iflag=direct
+128+0 records in
+128+0 records out
+
+$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
+128+0 records in
+128+0 records out
+
+Adding and removing bad blocks:
+-------------------------------
+
+At any time (i.e.: whether the device has the "bad block" emulation
+enabled or disabled), bad blocks may be added or removed from the
+device via the "addbadblock" and "removebadblock" messages:
+
+$ sudo dmsetup message dust1 0 addbadblock 60
+kernel: device-mapper: dust: badblock added at block 60
+
+$ sudo dmsetup message dust1 0 addbadblock 67
+kernel: device-mapper: dust: badblock added at block 67
+
+$ sudo dmsetup message dust1 0 addbadblock 72
+kernel: device-mapper: dust: badblock added at block 72
+
+These bad blocks will be stored in the "bad block list".
+While the device is in "bypass" mode, reads and writes will succeed:
+
+$ sudo dmsetup status dust1
+0 33552384 dust 252:17 bypass
+
+Enabling block read failures:
+-----------------------------
+
+To enable the "fail read on bad block" behavior, send the "enable" message:
+
+$ sudo dmsetup message dust1 0 enable
+kernel: device-mapper: dust: enabling read failures on bad sectors
+
+$ sudo dmsetup status dust1
+0 33552384 dust 252:17 fail_read_on_bad_block
+
+With the device in "fail read on bad block" mode, attempting to read a
+block will encounter an "Input/output error":
+
+$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=1 skip=67 iflag=direct
+dd: error reading '/dev/mapper/dust1': Input/output error
+0+0 records in
+0+0 records out
+0 bytes copied, 0.00040651 s, 0.0 kB/s
+
+...and writing to the bad blocks will remove the blocks from the list,
+therefore emulating the "remap" behavior of hard disk drives:
+
+$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
+128+0 records in
+128+0 records out
+
+kernel: device-mapper: dust: block 60 removed from badblocklist by write
+kernel: device-mapper: dust: block 67 removed from badblocklist by write
+kernel: device-mapper: dust: block 72 removed from badblocklist by write
+kernel: device-mapper: dust: block 87 removed from badblocklist by write
+
+Bad block add/remove error handling:
+------------------------------------
+
+Attempting to add a bad block that already exists in the list will
+result in an "Invalid argument" error, as well as a helpful message:
+
+$ sudo dmsetup message dust1 0 addbadblock 88
+device-mapper: message ioctl on dust1  failed: Invalid argument
+kernel: device-mapper: dust: block 88 already in badblocklist
+
+Attempting to remove a bad block that doesn't exist in the list will
+result in an "Invalid argument" error, as well as a helpful message:
+
+$ sudo dmsetup message dust1 0 removebadblock 87
+device-mapper: message ioctl on dust1  failed: Invalid argument
+kernel: device-mapper: dust: block 87 not found in badblocklist
+
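The behaviour described so far can be summarised in a toy Python model. This is a sketch for reasoning about test scenarios, not the kernel implementation; the class name `DustModel` is hypothetical, and leaving the list untouched on writes while in "bypass" mode is an assumption, since the text above only shows writes removing blocks after "enable".

```python
import errno


class DustModel:
    """Toy model of the dm-dust bad block list."""

    def __init__(self):
        self.bad_blocks = set()
        self.enabled = False  # False corresponds to the "bypass" status

    def status(self):
        return "fail_read_on_bad_block" if self.enabled else "bypass"

    def addbadblock(self, block):
        if block in self.bad_blocks:
            raise OSError(errno.EINVAL, f"block {block} already in badblocklist")
        self.bad_blocks.add(block)

    def removebadblock(self, block):
        if block not in self.bad_blocks:
            raise OSError(errno.EINVAL, f"block {block} not found in badblocklist")
        self.bad_blocks.remove(block)

    def read(self, block):
        # Reads fail only when the emulation is enabled and the block is listed.
        if self.enabled and block in self.bad_blocks:
            raise OSError(errno.EIO, "Input/output error")

    def write(self, block):
        # Writes always succeed; a write to a listed block "remaps" it.
        # Assumption: the list is only modified while failures are enabled.
        if self.enabled:
            self.bad_blocks.discard(block)
```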
+Counting the number of bad blocks in the bad block list:
+--------------------------------------------------------
+
+To count the number of bad blocks configured in the device, run the
+following message command:
+
+$ sudo dmsetup message dust1 0 countbadblocks
+
+A message will print with the number of bad blocks currently
+configured on the device:
+
+kernel: device-mapper: dust: countbadblocks: 895 badblock(s) found
+
+Querying for specific bad blocks:
+---------------------------------
+
+To find out if a specific block is in the bad block list, run the
+following message command:
+
+$ sudo dmsetup message dust1 0 queryblock 72
+
+The following message will print if the block is in the list:
+device-mapper: dust: queryblock: block 72 found in badblocklist
+
+The following message will print if the block is not in the list:
+device-mapper: dust: queryblock: block 72 not found in badblocklist
+
+The "queryblock" message command will work in both the "enabled"
+and "disabled" modes, allowing the verification of whether a block
+will be treated as "bad" without having to issue I/O to the device,
+or having to "enable" the bad block emulation.
+
+Clearing the bad block list:
+----------------------------
+
+To clear the bad block list (without needing to individually run
+a "removebadblock" message command for every block), run the
+following message command:
+
+$ sudo dmsetup message dust1 0 clearbadblocks
+
+After clearing the bad block list, the following message will appear:
+
+kernel: device-mapper: dust: clearbadblocks: badblocks cleared
+
+If there were no bad blocks to clear, the following message will
+appear:
+
+kernel: device-mapper: dust: clearbadblocks: no badblocks found
+
+Message commands list:
+----------------------
+
+Below is a list of the messages that can be sent to a dust device:
+
+Operations on blocks (requires a <blknum> argument):
+
+addbadblock <blknum>
+queryblock <blknum>
+removebadblock <blknum>
+
+...where <blknum> is a block number within range of the device
+  (corresponding to the block size of the device.)
+
+Single argument message commands:
+
+countbadblocks
+clearbadblocks
+disable
+enable
+quiet
+
+Device removal:
+---------------
+
+When finished, remove the device via the "dmsetup remove" command:
+
+$ sudo dmsetup remove dust1
+
+Quiet mode:
+-----------
+
+On test runs with many bad blocks, it may be desirable to avoid
+excessive logging (from bad blocks added, removed, or "remapped").
+This can be done by enabling "quiet mode" via the following message:
+
+$ sudo dmsetup message dust1 0 quiet
+
+This will suppress log messages from add / remove / removed by write
+operations.  Log messages from "countbadblocks" or "queryblock"
+message commands will still print in quiet mode.
+
+The status of quiet mode can be seen by running "dmsetup status":
+
+$ sudo dmsetup status dust1
+0 33552384 dust 252:17 fail_read_on_bad_block quiet
+
+To disable quiet mode, send the "quiet" message again:
+
+$ sudo dmsetup message dust1 0 quiet
+
+$ sudo dmsetup status dust1
+0 33552384 dust 252:17 fail_read_on_bad_block verbose
+
+(The presence of "verbose" indicates normal logging.)
+
+"Why not...?"
+-------------
+
+scsi_debug has a "medium error" mode that can fail reads on one
+specified sector (sector 0x1234, hardcoded in the source code), but
+it uses RAM for the persistent storage, which drastically decreases
+the potential device size.
+
+dm-flakey fails all I/O from all block locations at a specified time
+frequency, and not a given point in time.
+
+When a bad sector occurs on a hard disk drive, reads to that sector
+are failed by the device, usually resulting in an error code of EIO
+("I/O error") or ENODATA ("No data available").  However, a write to
+the sector may succeed, and result in the sector becoming readable
+after the device controller no longer experiences errors reading the
+sector (or after a reallocation of the sector).  However, there may
+be bad sectors that occur on the device in the future, in a different,
+unpredictable location.
+
+This target seeks to provide a device that can exhibit the behavior
+of a bad sector at a known sector location, at a known time, based
+on a large storage device (at least tens of gigabytes, not occupying
+system memory).
diff --git a/Documentation/admin-guide/device-mapper/dm-flakey.rst b/Documentation/admin-guide/device-mapper/dm-flakey.rst
new file mode 100644
index 000000000000..86138735879d
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-flakey.rst
@@ -0,0 +1,74 @@
+=========
+dm-flakey
+=========
+
+This target is the same as the linear target except that it exhibits
+unreliable behaviour periodically.  It's been found useful in simulating
+failing devices for testing purposes.
+
+Starting from the time the table is loaded, the device is available for
+<up interval> seconds, then exhibits unreliable behaviour for <down
+interval> seconds, and then this cycle repeats.
+
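The up/down cycle can be expressed as a simple modulo calculation. The Python sketch below is illustrative only; the function name `is_unreliable` is hypothetical and time is measured in whole seconds since the table was loaded.

```python
def is_unreliable(seconds_since_load, up_interval, down_interval):
    """Return True while the device is in its <down interval>."""
    period = up_interval + down_interval
    if period <= 0:
        raise ValueError("up and down intervals must not both be zero")
    # Position within the current cycle: first "up", then "down".
    return seconds_since_load % period >= up_interval


# With <up interval> = 10 and <down interval> = 5:
for t in (0, 9, 10, 14, 15):
    print(t, is_unreliable(t, 10, 5))
```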
+Consider also using this in combination with the dm-delay target,
+which can delay reads and writes and/or send them to different
+underlying devices.
+
+Table parameters
+----------------
+
+::
+
+  <dev path> <offset> <up interval> <down interval> \
+    [<num_features> [<feature arguments>]]
+
+Mandatory parameters:
+
+    <dev path>:
+        Full pathname to the underlying block-device, or a
+        "major:minor" device-number.
+    <offset>:
+        Starting sector within the device.
+    <up interval>:
+        Number of seconds device is available.
+    <down interval>:
+        Number of seconds device returns errors.
+
+Optional feature parameters:
+
+  If no feature parameters are present, during the periods of
+  unreliability, all I/O returns errors.
+
+  drop_writes:
+	All write I/O is silently ignored.
+	Read I/O is handled correctly.
+
+  error_writes:
+	All write I/O is failed with an error signalled.
+	Read I/O is handled correctly.
+
+  corrupt_bio_byte <Nth_byte> <direction> <value> <flags>:
+	During <down interval>, replace <Nth_byte> of the data of
+	each matching bio with <value>.
+
+    <Nth_byte>:
+	The offset of the byte to replace.
+	Counting starts at 1, to replace the first byte.
+    <direction>:
+	Either 'r' to corrupt reads or 'w' to corrupt writes.
+	'w' is incompatible with drop_writes.
+    <value>:
+	The value (from 0-255) to write.
+    <flags>:
+	Perform the replacement only if bio->bi_opf has all the
+	selected flags set.
+
+Examples:
+
+Replaces the 32nd byte of READ bios with the value 1::
+
+  corrupt_bio_byte 32 r 1 0
+
+Replaces the 224th byte of REQ_META (=32) bios with the value 0::
+
+  corrupt_bio_byte 224 w 0 32
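To make the 1-based <Nth_byte> and the <flags> test concrete, here is a small Python sketch operating on an in-memory buffer. The helper names are hypothetical, <direction> is not modelled, and leaving buffers shorter than <Nth_byte> unchanged is an assumption of this sketch.

```python
def matches_flags(bi_opf, flags):
    """True if bi_opf has all the selected flags set."""
    return (bi_opf & flags) == flags


def corrupt_byte(data, nth_byte, value):
    """Return a copy of data with its Nth byte (counting from 1) replaced."""
    if nth_byte < 1:
        raise ValueError("Nth_byte counts from 1")
    if not 0 <= value <= 255:
        raise ValueError("value must be in 0-255")
    buf = bytearray(data)
    if nth_byte <= len(buf):  # assumption: shorter buffers are left alone
        buf[nth_byte - 1] = value
    return bytes(buf)


REQ_META = 32  # value quoted in the example above

payload = bytes(64)
if matches_flags(0, 0):  # flags == 0 matches every bio
    payload = corrupt_byte(payload, 32, 1)
print(payload[31])  # 1
```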
diff --git a/Documentation/admin-guide/device-mapper/dm-init.rst b/Documentation/admin-guide/device-mapper/dm-init.rst
new file mode 100644
index 000000000000..e5242ff17e9b
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-init.rst
@@ -0,0 +1,125 @@
+================================
+Early creation of mapped devices
+================================
+
+It is possible to configure a device-mapper device to act as the root device for
+your system in two ways.
+
+The first is to build an initial ramdisk which boots to a minimal userspace
+which configures the device, then pivot_root(8) into it.
+
+The second is to create one or more device-mappers using the module parameter
+"dm-mod.create=" through the kernel boot command line argument.
+
+The format is specified as a string of data separated by commas and optionally
+semi-colons, where:
+
+ - a comma is used to separate fields like name, uuid, flags and table
+   (specifies one device)
+ - a semi-colon is used to separate devices.
+
+So the format will look like this::
+
+ dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
+
+Where::
+
+	<name>		::= The device name.
+	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
+	<minor>		::= The device minor number | ""
+	<flags>		::= "ro" | "rw"
+	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
+	<target_type>	::= "verity" | "linear" | ... (see list below)
+
+The dm line should be equivalent to the one used by the dmsetup tool with the
+`--concise` argument.
+
+Target types
+============
+
+Not all target types are available as there are serious risks in allowing
+activation of certain DM targets without first using userspace tools to check
+the validity of associated metadata.
+
+======================= =======================================================
+`cache`			constrained, userspace should verify cache device
+`crypt`			allowed
+`delay`			allowed
+`era`			constrained, userspace should verify metadata device
+`flakey`		constrained, meant for test
+`linear`		allowed
+`log-writes`		constrained, userspace should verify metadata device
+`mirror`		constrained, userspace should verify main/mirror device
+`raid`			constrained, userspace should verify metadata device
+`snapshot`		constrained, userspace should verify src/dst device
+`snapshot-origin`	allowed
+`snapshot-merge`	constrained, userspace should verify src/dst device
+`striped`		allowed
+`switch`		constrained, userspace should verify dev path
+`thin`			constrained, requires dm target message from userspace
+`thin-pool`		constrained, requires dm target message from userspace
+`verity`		allowed
+`writecache`		constrained, userspace should verify cache device
+`zero`			constrained, not meant for rootfs
+======================= =======================================================
+
+If the target is not listed above, it is constrained by default (not tested).
+
+Examples
+========
+An example of booting to a linear array made up of user-mode linux block
+devices::
+
+  dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
+
+This will boot to a rw dm-linear target of 8192 sectors split across two block
+devices identified by their major:minor numbers.  After boot, udev will rename
+this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
+
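The comma/semi-colon format can be sketched with a small Python helper that assembles the value of dm-mod.create= from structured input. The function name `dm_mod_create` is hypothetical. The sketch emits no whitespace after the commas, unlike the example above; whether both spellings are treated identically is not verified here.

```python
def dm_mod_create(devices):
    """Build the dm-mod.create= value.

    devices is a list of (name, uuid, minor, flags, tables) tuples, where
    uuid may be "" and minor may be None, and tables is a list of
    "<start_sector> <num_sectors> <target_type> <target_args>" strings.
    """
    encoded = []
    for name, uuid, minor, flags, tables in devices:
        if flags not in ("ro", "rw"):
            raise ValueError(f"flags must be 'ro' or 'rw', got {flags!r}")
        fields = [name, uuid, "" if minor is None else str(minor), flags]
        encoded.append(",".join(fields + list(tables)))  # commas separate fields
    return ";".join(encoded)  # semi-colons separate devices


print(dm_mod_create([
    ("lroot", "", None, "rw",
     ["0 4096 linear 98:16 0", "4096 4096 linear 98:32 0"]),
]))
```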
+An example of multiple device-mappers, with the dm-mod.create="..." contents
+is shown here split on multiple lines for readability::
+
+  dm-linear,,1,rw,
+    0 32768 linear 8:1 0,
+    32768 1024000 linear 8:2 0;
+  dm-verity,,3,ro,
+    0 1638400 verity 1 /dev/sdc1 /dev/sdc2 4096 4096 204800 1 sha256
+    ac87db56303c9c1da433d7209b5a6ef3e4779df141200cbd7c157dcb8dd89c42
+    5ebfe87f7df3235b80a117ebc4078e44f55045487ad4a96581d1adb564615b51
+
+Other examples (per target):
+
+"crypt"::
+
+  dm-crypt,,8,ro,
+    0 1048576 crypt aes-xts-plain64
+    babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
+    /dev/sda 0 1 allow_discards
+
+"delay"::
+
+  dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500
+
+"linear"::
+
+  dm-linear,,,rw,
+    0 32768 linear /dev/sda1 0,
+    32768 1024000 linear /dev/sda2 0,
+    1056768 204800 linear /dev/sda3 0,
+    1261568 512000 linear /dev/sda4 0
+
+"snapshot-origin"::
+
+  dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2
+
+"striped"::
+
+  dm-striped,,4,ro,0 1638400 striped 4 4096
+  /dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0
+
+"verity"::
+
+  dm-verity,,4,ro,
+    0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
+    fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
+    51934789604d1b92399c52e7cb149d1b3a1b74bbbcb103b2a0aaacbed5c08584
diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst
new file mode 100644
index 000000000000..a30aa91b5fbe
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -0,0 +1,259 @@
+============
+dm-integrity
+============
+
+The dm-integrity target emulates a block device that has additional
+per-sector tags that can be used for storing integrity information.
+
+A general problem with storing integrity tags with every sector is that
+writing the sector and the integrity tag must be atomic - i.e. in case of
+crash, either both sector and integrity tag or none of them is written.
+
+To guarantee write atomicity, the dm-integrity target uses a journal. It
+writes sector data and integrity tags into the journal, commits the journal
+and then copies the data and integrity tags to their respective locations.
+
+The dm-integrity target can be used with the dm-crypt target - in this
+situation the dm-crypt target creates the integrity data and passes them
+to the dm-integrity target via bio_integrity_payload attached to the bio.
+In this mode, the dm-crypt and dm-integrity targets provide authenticated
+disk encryption - if the attacker modifies the encrypted device, an I/O
+error is returned instead of random data.
+
+The dm-integrity target can also be used as a standalone target; in this
+mode it calculates and verifies the integrity tag internally. In this
+mode, the dm-integrity target can be used to detect silent data
+corruption on the disk or in the I/O path.
+
+There's an alternate mode of operation where dm-integrity uses a bitmap
+instead of a journal. If a bit in the bitmap is 1, the corresponding
+region's data and integrity tags are not synchronized - if the machine
+crashes, the unsynchronized regions will be recalculated. The bitmap mode
+is faster than the journal mode, because we don't have to write the data
+twice, but it is also less reliable, because if data corruption happens
+when the machine crashes, it may not be detected.
+
+When loading the target for the first time, the kernel driver will format
+the device. But it will only format the device if the superblock contains
+zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
+target can't be loaded.
+
+To use the target for the first time:
+
+1. overwrite the superblock with zeroes
+2. load the dm-integrity target with one-sector size, the kernel driver
+   will format the device
+3. unload the dm-integrity target
+4. read the "provided_data_sectors" value from the superblock
+5. load the dm-integrity target with the target size
+   "provided_data_sectors"
+6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
+   with the size "provided_data_sectors"
+
+
+Target arguments:
+
+1. the underlying block device
+
+2. the number of reserved sectors at the beginning of the device - the
+   dm-integrity target won't read or write these sectors
+
+3. the size of the integrity tag (if "-" is used, the size is taken from
+   the internal-hash algorithm)
+
+4. mode:
+
+	D - direct writes (without journal)
+		in this mode, journaling is
+		not used and data sectors and integrity tags are written
+		separately. In case of crash, it is possible that the data
+		and integrity tag don't match.
+	J - journaled writes
+		data and integrity tags are written to the
+		journal and atomicity is guaranteed. In case of crash,
+		either both data and tag or none of them are written. The
+		journaled mode halves write throughput because the
+		data have to be written twice.
+	B - bitmap mode - data and metadata are written without any
+		synchronization, the driver maintains a bitmap of dirty
+		regions where data and metadata don't match. This mode can
+		only be used with internal hash.
+	R - recovery mode - in this mode, journal is not replayed,
+		checksums are not checked and writes to the device are not
+		allowed. This mode is useful for data recovery if the
+		device cannot be activated in any of the other standard
+		modes.
+
+5. the number of additional arguments
+
+Additional arguments:
+
+journal_sectors:number
+	The size of the journal. This argument is used only when formatting the
+	device. If the device is already formatted, the value from the
+	superblock is used.
+
+interleave_sectors:number
+	The number of interleaved sectors. This value is rounded down to
+	a power of two. If the device is already formatted, the value from
+	the superblock is used.
+
+meta_device:device
+	Don't interleave the data and metadata on one device. Use a
+	separate device for metadata.
+
+buffer_sectors:number
+	The number of sectors in one buffer. The value is rounded down to
+	a power of two.
+
+	The tag area is accessed using buffers; the buffer size is
+	configurable. A larger buffer size means that the I/O size will
+	be larger, but fewer I/Os may be issued.
+
+journal_watermark:number
+	The journal watermark as a percentage. When the size of the journal
+	exceeds this watermark, the thread that flushes the journal will
+	be started.
+
+commit_time:number
+	Commit time in milliseconds. When this time passes, the journal is
+	written. The journal is also written immediately if the FLUSH
+	request is received.
+
+internal_hash:algorithm(:key)	(the key is optional)
+	Use internal hash or crc.
+	When this argument is used, the dm-integrity target won't accept
+	integrity tags from the upper target, but it will automatically
+	generate and verify the integrity tags.
+
+	You can use a crc algorithm (such as crc32), then the integrity target
+	will protect the data against accidental corruption.
+	You can also use a hmac algorithm (for example
+	"hmac(sha256):0123456789abcdef"), in this mode it will provide
+	cryptographic authentication of the data without encryption.
+
+	When this argument is not used, the integrity tags are accepted
+	from an upper layer target, such as dm-crypt. The upper layer
+	target should check the validity of the integrity tags.
+
+recalculate
+	Recalculate the integrity tags automatically. It is only valid
+	when using internal hash.
+
+journal_crypt:algorithm(:key)	(the key is optional)
+	Encrypt the journal using given algorithm to make sure that the
+	attacker can't read the journal. You can use a block cipher here
+	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
+	"salsa20", "ctr(aes)" or "ecb(arc4)").
+
+	The journal contains a history of the last writes to the block device;
+	an attacker reading the journal could see the last sector numbers
+	that were written. From the sector numbers, the attacker can infer
+	the size of files that were written. To protect against this
+	situation, you can encrypt the journal.
+
+journal_mac:algorithm(:key)	(the key is optional)
+	Protect sector numbers in the journal from accidental or malicious
+	modification. To protect against accidental modification, use a
+	crc algorithm; to protect against malicious modification, use a
+	hmac algorithm with a key.
+
+	This option is not needed when using internal-hash because in this
+	mode, the integrity of journal entries is checked when replaying
+	the journal. Thus, modified sector number would be detected at
+	this stage.
+
+block_size:number
+	The size of a data block in bytes.  The larger the block size the
+	less overhead there is for per-block integrity metadata.
+	Supported values are 512, 1024, 2048 and 4096 bytes.  If not
+	specified the default block size is 512 bytes.
+
+sectors_per_bit:number
+	In the bitmap mode, this parameter specifies the number of
+	512-byte sectors that corresponds to one bitmap bit.
+
+bitmap_flush_interval:number
+	The bitmap flush interval in milliseconds. The metadata buffers
+	are synchronized when this interval expires.
+
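The interleave_sectors and buffer_sectors arguments above are both described as being rounded down to a power of two. A minimal Python sketch of that rounding follows; the helper name `round_down_pow2` is hypothetical and is not taken from the driver.

```python
def round_down_pow2(number):
    """Round a positive integer down to the nearest power of two."""
    if number < 1:
        raise ValueError("number must be positive")
    # bit_length() - 1 is the index of the highest set bit.
    return 1 << (number.bit_length() - 1)


print(round_down_pow2(1000))  # 512
print(round_down_pow2(4096))  # 4096
```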
+
+The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
+be changed when reloading the target (load an inactive table and swap the
+tables with suspend and resume). The other arguments should not be changed
+when reloading the target because the layout of disk data depends on them
+and the reloaded target would be non-functional.
+
+
+The layout of the formatted block device:
+
+* reserved sectors
+    (they are not used by this target, they can be used for
+    storing LUKS metadata or for other purposes), the size of the reserved
+    area is specified in the target arguments
+
+* superblock (4kiB)
+	* magic string - identifies that the device was formatted
+	* version
+	* log2(interleave sectors)
+	* integrity tag size
+	* the number of journal sections
+	* provided data sectors - the number of sectors that this target
+	  provides (i.e. the size of the device minus the size of all
+	  metadata and padding). The user of this target should not send
+	  bios that access data beyond the "provided data sectors" limit.
+	* flags
+	    SB_FLAG_HAVE_JOURNAL_MAC
+		- a flag is set if journal_mac is used
+	    SB_FLAG_RECALCULATING
+		- recalculating is in progress
+	    SB_FLAG_DIRTY_BITMAP
+		- journal area contains the bitmap of dirty
+		  blocks
+	* log2(sectors per block)
+	* a position where recalculating finished
+* journal
+	The journal is divided into sections, each section contains:
+
+	* metadata area (4kiB), it contains journal entries
+
+	  - every journal entry contains:
+
+		* logical sector (specifies where the data and tag should
+		  be written)
+		* last 8 bytes of data
+		* integrity tag (the size is specified in the superblock)
+
+	  - every metadata sector ends with
+
+		* mac (8-bytes), all the macs in 8 metadata sectors form a
+		  64-byte value. It is used to store hmac of sector
+		  numbers in the journal section, to protect against a
+		  possibility that the attacker tampers with sector
+		  numbers in the journal.
+		* commit id
+
+	* data area (the size is variable; it depends on how many journal
+	  entries fit into the metadata area)
+
+	    - every sector in the data area contains:
+
+		* data (504 bytes of data, the last 8 bytes are stored in
+		  the journal entry)
+		* commit id
+
+	To test if the whole journal section was written correctly, every
+	512-byte sector of the journal ends with an 8-byte commit id. If the
+	commit id matches on all sectors in a journal section, then it is
+	assumed that the section was written correctly. If the commit id
+	doesn't match, the section was written partially and it should not
+	be replayed.
+
+* one or more runs of interleaved tags and data.
+    Each run contains:
+
+	* tag area - it contains integrity tags. There is one tag for each
+	  sector in the data area
+	* data area - it contains data sectors. The number of data sectors
+	  in one run must be a power of two. log2 of this value is stored
+	  in the superblock.
diff --git a/Documentation/admin-guide/device-mapper/dm-io.rst b/Documentation/admin-guide/device-mapper/dm-io.rst
new file mode 100644
index 000000000000..d2492917a1f5
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-io.rst
@@ -0,0 +1,75 @@
+=====
+dm-io
+=====
+
+Dm-io provides synchronous and asynchronous I/O services. There are three
+types of I/O services available, and each type has a sync and an async
+version.
+
+The user must set up an io_region structure to describe the desired location
+of the I/O. Each io_region indicates a block-device along with the starting
+sector and size of the region::
+
+   struct io_region {
+      struct block_device *bdev;
+      sector_t sector;
+      sector_t count;
+   };
+
+Dm-io can read from one io_region or write to one or more io_regions. Writes
+to multiple regions are specified by an array of io_region structures.
+
+The first I/O service type takes a list of memory pages as the data buffer for
+the I/O, along with an offset into the first page::
+
+   struct page_list {
+      struct page_list *next;
+      struct page *page;
+   };
+
+   int dm_io_sync(unsigned int num_regions, struct io_region *where, int rw,
+                  struct page_list *pl, unsigned int offset,
+                  unsigned long *error_bits);
+   int dm_io_async(unsigned int num_regions, struct io_region *where, int rw,
+                   struct page_list *pl, unsigned int offset,
+                   io_notify_fn fn, void *context);
+
+The second I/O service type takes an array of bio vectors as the data buffer
+for the I/O. This service can be handy if the caller has a pre-assembled bio,
+but wants to direct different portions of the bio to different devices::
+
+   int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where,
+                       int rw, struct bio_vec *bvec,
+                       unsigned long *error_bits);
+   int dm_io_async_bvec(unsigned int num_regions, struct io_region *where,
+                        int rw, struct bio_vec *bvec,
+                        io_notify_fn fn, void *context);
+
+The third I/O service type takes a pointer to a vmalloc'd memory buffer as the
+data buffer for the I/O. This service can be handy if the caller needs to do
+I/O to a large region but doesn't want to allocate a large number of individual
+memory pages::
+
+   int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw,
+                     void *data, unsigned long *error_bits);
+   int dm_io_async_vm(unsigned int num_regions, struct io_region *where, int rw,
+                      void *data, io_notify_fn fn, void *context);
+
+Callers of the asynchronous I/O services must include the name of a completion
+callback routine and a pointer to some context data for the I/O::
+
+   typedef void (*io_notify_fn)(unsigned long error, void *context);
+
+The "error" parameter in this callback, as well as the `*error_bits` parameter in
+all of the synchronous versions, is a bitset (instead of a simple error value).
+In the case of a write-I/O to multiple regions, this bitset allows dm-io to
+indicate success or failure on each individual region.
+
+Before using any of the dm-io services, the user should call dm_io_get()
+and specify the number of pages they expect to perform I/O on concurrently.
+Dm-io will attempt to resize its mempool to make sure enough pages are
+always available in order to avoid unnecessary waiting while performing I/O.
+
+When the user is finished using the dm-io services, they should call
+dm_io_put() and specify the same number of pages that were given on the
+dm_io_get() call.
diff --git a/Documentation/admin-guide/device-mapper/dm-log.rst b/Documentation/admin-guide/device-mapper/dm-log.rst
new file mode 100644
index 000000000000..ba4fce39bc27
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-log.rst
@@ -0,0 +1,57 @@
+=====================
+Device-Mapper Logging
+=====================
+The device-mapper logging code is used by some of the device-mapper
+RAID targets to track regions of the disk that are not consistent.
+A region (or portion of the address space) of the disk may be
+inconsistent because a RAID stripe is currently being operated on or
+a machine died while the region was being altered.  In the case of
+mirrors, a region would be considered dirty/inconsistent while you
+are writing to it because the writes need to be replicated for all
+the legs of the mirror and may not reach the legs at the same time.
+Once all writes are complete, the region is considered clean again.
+
+There is a generic logging interface that the device-mapper RAID
+implementations use to perform logging operations (see
+dm_dirty_log_type in include/linux/dm-dirty-log.h).  Several
+logging implementations are available and provide different
+capabilities.  The list includes:
+
+==============	==============================================================
+Type		Files
+==============	==============================================================
+disk		drivers/md/dm-log.c
+core		drivers/md/dm-log.c
+userspace	drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
+==============	==============================================================
+
+The "disk" log type
+-------------------
+This log implementation commits the log state to disk.  This way, the
+logging state survives reboots/crashes.
+
+The "core" log type
+-------------------
+This log implementation keeps the log state in memory.  The log state
+will not survive a reboot or crash, but there may be a small boost in
+performance.  This method can also be used if no storage device is
+available for storing log state.
+
+The "userspace" log type
+------------------------
+This log type simply provides a way to export the log API to userspace,
+so log implementations can be done there.  This is done by forwarding most
+logging requests to userspace, where a daemon receives and processes the
+request.
+
+The structures used for communication between kernel and userspace are
+located in include/linux/dm-log-userspace.h.  Due to the frequency,
+diversity, and 2-way communication nature of the exchanges between
+kernel and userspace, 'connector' is used as the interface for
+communication.
+
+There are currently two userspace log implementations that leverage this
+framework - "clustered-disk" and "clustered-core".  These implementations
+provide a cluster-coherent log for shared-storage.  Device-mapper mirroring
+can be used in a shared-storage environment when the cluster log implementations
+are employed.
diff --git a/Documentation/admin-guide/device-mapper/dm-queue-length.rst b/Documentation/admin-guide/device-mapper/dm-queue-length.rst
new file mode 100644
index 000000000000..d8e381c1cb02
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-queue-length.rst
@@ -0,0 +1,48 @@
+===============
+dm-queue-length
+===============
+
+dm-queue-length is a path selector module for device-mapper targets,
+which selects the path with the fewest in-flight I/Os.
+The path selector name is 'queue-length'.
+
+Table parameters for each path: [<repeat_count>]
+
+::
+
+	<repeat_count>: The number of I/Os to dispatch using the selected
+			path before switching to the next path.
+			If not given, the internal default is used. To check
+			the default value, see the activated table.
+
+Status for each path: <status> <fail-count> <in-flight>
+
+::
+
+	<status>: 'A' if the path is active, 'F' if the path is failed.
+	<fail-count>: The number of path failures.
+	<in-flight>: The number of in-flight I/Os on the path.
+
+
+Algorithm
+=========
+
+dm-queue-length increments/decrements 'in-flight' when an I/O is
+dispatched/completed respectively.
+dm-queue-length selects a path with the minimum 'in-flight'.
+
+
+Examples
+========
+Example with 2 paths (sda and sdb) and repeat_count == 128.
+
+::
+
+  # echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
+    | dmsetup create test
+  #
+  # dmsetup table
+  test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
+  #
+  # dmsetup status
+  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
diff --git a/Documentation/admin-guide/device-mapper/dm-raid.rst b/Documentation/admin-guide/device-mapper/dm-raid.rst
new file mode 100644
index 000000000000..2fe255b130fb
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-raid.rst
@@ -0,0 +1,419 @@
+=======
+dm-raid
+=======
+
+The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
+It allows the MD RAID drivers to be accessed using a device-mapper
+interface.
+
+
+Mapping Table Interface
+-----------------------
+The target is named "raid" and it accepts the following parameters::
+
+  <raid_type> <#raid_params> <raid_params> \
+    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]
+
+<raid_type>:
+
+  ============= ===============================================================
+  raid0		RAID0 striping (no resilience)
+  raid1		RAID1 mirroring
+  raid4		RAID4 with dedicated last parity disk
+  raid5_n 	RAID5 with dedicated last parity disk supporting takeover
+		Same as raid4
+
+		- Transitory layout
+  raid5_la	RAID5 left asymmetric
+
+		- rotating parity 0 with data continuation
+  raid5_ra	RAID5 right asymmetric
+
+		- rotating parity N with data continuation
+  raid5_ls	RAID5 left symmetric
+
+		- rotating parity 0 with data restart
+  raid5_rs 	RAID5 right symmetric
+
+		- rotating parity N with data restart
+  raid6_zr	RAID6 zero restart
+
+		- rotating parity zero (left-to-right) with data restart
+  raid6_nr	RAID6 N restart
+
+		- rotating parity N (right-to-left) with data restart
+  raid6_nc	RAID6 N continue
+
+		- rotating parity N (right-to-left) with data continuation
+  raid6_n_6	RAID6 with dedicated parity disks
+
+		- parity and Q-syndrome on the last 2 disks;
+		  layout for takeover from/to raid4/raid5_n
+  raid6_la_6	Same as "raid5_la" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_la from/to raid6
+  raid6_ra_6	Same as "raid5_ra" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_ra from/to raid6
+  raid6_ls_6	Same as "raid5_ls" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_ls from/to raid6
+  raid6_rs_6	Same as "raid5_rs" plus dedicated last Q-syndrome disk
+
+		- layout for takeover from raid5_rs from/to raid6
+  raid10        Various RAID10 inspired algorithms chosen by additional params
+		(see raid10_format and raid10_copies below)
+
+		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
+		- RAID1E: Integrated Adjacent Stripe Mirroring
+		- RAID1E: Integrated Offset Stripe Mirroring
+		- and other similar RAID10 variants
+  ============= ===============================================================
+
+  Reference: Chapter 4 of
+  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
+
+<#raid_params>: The number of parameters that follow.
+
+<raid_params> consists of
+
+    Mandatory parameters:
+        <chunk_size>:
+		      Chunk size in sectors.  This parameter is often known as
+		      "stripe size".  It is the only mandatory parameter and
+		      is placed first.
+
+    followed by optional parameters (in any order):
+	[sync|nosync]
+		Force or prevent RAID initialization.
+
+	[rebuild <idx>]
+		Rebuild drive number 'idx' (first drive is 0).
+
+	[daemon_sleep <ms>]
+		Interval between runs of the bitmap daemon that
+		clear bits.  A longer interval means less bitmap I/O but
+		resyncing after a failure is likely to take longer.
+
+	[min_recovery_rate <kB/sec/disk>]
+		Throttle RAID initialization
+	[max_recovery_rate <kB/sec/disk>]
+		Throttle RAID initialization
+	[write_mostly <idx>]
+		Mark drive index 'idx' write-mostly.
+	[max_write_behind <sectors>]
+		See '--write-behind=' (man mdadm)
+	[stripe_cache <sectors>]
+		Stripe cache size (RAID 4/5/6 only)
+	[region_size <sectors>]
+		The region_size multiplied by the number of regions is the
+		logical size of the array.  The bitmap records the device
+		synchronisation state for each region.
+
+        [raid10_copies   <# copies>], [raid10_format   <near|far|offset>]
+		These two options are used to alter the default layout of
+		a RAID10 configuration.  The number of copies can be
+		specified, but the default is 2.  There are also three
+		variations to how the copies are laid down - the default
+		is "near".  Near copies are what most people think of with
+		respect to mirroring.  If these options are left unspecified,
+		or 'raid10_copies 2' and/or 'raid10_format near' are given,
+		then the layouts for 2, 3 and 4 devices	are:
+
+		========	 ==========	   ==============
+		2 drives         3 drives          4 drives
+		========	 ==========	   ==============
+		A1  A1           A1  A1  A2        A1  A1  A2  A2
+		A2  A2           A2  A3  A3        A3  A3  A4  A4
+		A3  A3           A4  A4  A5        A5  A5  A6  A6
+		A4  A4           A5  A6  A6        A7  A7  A8  A8
+		..  ..           ..  ..  ..        ..  ..  ..  ..
+		========	 ==========	   ==============
+
+		The 2-device layout is equivalent to 2-way RAID1.  The 4-device
+		layout is what a traditional RAID10 would look like.  The
+		3-device layout is what might be called a 'RAID1E - Integrated
+		Adjacent Stripe Mirroring'.
+
+		If 'raid10_copies 2' and 'raid10_format far', then the layouts
+		for 2, 3 and 4 devices are:
+
+		========	     ============	  ===================
+		2 drives             3 drives             4 drives
+		========	     ============	  ===================
+		A1  A2               A1   A2   A3         A1   A2   A3   A4
+		A3  A4               A4   A5   A6         A5   A6   A7   A8
+		A5  A6               A7   A8   A9         A9   A10  A11  A12
+		..  ..               ..   ..   ..         ..   ..   ..   ..
+		A2  A1               A3   A1   A2         A2   A1   A4   A3
+		A4  A3               A6   A4   A5         A6   A5   A8   A7
+		A6  A5               A9   A7   A8         A10  A9   A12  A11
+		..  ..               ..   ..   ..         ..   ..   ..   ..
+		========	     ============	  ===================
+
+		If 'raid10_copies 2' and 'raid10_format offset', then the
+		layouts for 2, 3 and 4 devices are:
+
+		========       ==========         ================
+		2 drives       3 drives           4 drives
+		========       ==========         ================
+		A1  A2         A1  A2  A3         A1  A2  A3  A4
+		A2  A1         A3  A1  A2         A2  A1  A4  A3
+		A3  A4         A4  A5  A6         A5  A6  A7  A8
+		A4  A3         A6  A4  A5         A6  A5  A8  A7
+		A5  A6         A7  A8  A9         A9  A10 A11 A12
+		A6  A5         A9  A7  A8         A10 A9  A12 A11
+		..  ..         ..  ..  ..         ..  ..  ..  ..
+		========       ==========         ================
+
+		Here we see layouts closely akin to 'RAID1E - Integrated
+		Offset Stripe Mirroring'.
+
+        [delta_disks <N>]
+		The delta_disks option value (-251 < N < +251) triggers
+		device removal (negative value) or device addition (positive
+		value) to any reshape supporting raid levels 4/5/6 and 10.
+		RAID levels 4/5/6 allow for addition of devices (metadata
+		and data device tuple), raid10_near and raid10_offset only
+		allow for device addition. raid10_far does not support any
+		reshaping at all.
+		A minimum number of devices has to be kept to enforce resilience,
+		which is 3 devices for raid4/5 and 4 devices for raid6.
+
+        [data_offset <sectors>]
+		This option value defines the offset into each data device
+		where the data starts. This is used to provide out-of-place
+		reshaping space to avoid writing over data while
+		changing the layout of stripes, hence an interruption/crash
+		may happen at any time without the risk of losing data.
+		E.g. when adding devices to an existing raid set during
+		forward reshaping, the out-of-place space will be allocated
+		at the beginning of each raid device. The kernel raid4/5/6/10
+		MD personalities supporting such device addition will read the data from
+		the existing first stripes (those with smaller number of stripes)
+		starting at data_offset to fill up a new stripe with the larger
+		number of stripes, calculate the redundancy blocks (parity/Q-syndrome)
+		and write that new stripe to offset 0. Same will be applied to all
+		N-1 other new stripes. This out-of-place scheme is used to change
+		the RAID type (i.e. the allocation algorithm) as well, e.g.
+		changing from raid5_ls to raid5_n.
+
+	[journal_dev <dev>]
+		This option adds a journal device to raid4/5/6 raid sets and
+		uses it to close the 'write hole' caused by the non-atomic updates
+		to the component devices which can cause data loss during recovery.
+		The journal device is used as writethrough thus causing writes to
+		be throttled versus non-journaled raid4/5/6 sets.
+		Takeover/reshape is not possible with a raid4/5/6 journal device;
+		it has to be deconfigured before requesting these.
+
+	[journal_mode <mode>]
+		This option sets the caching mode on journaled raid4/5/6 raid sets
+		(see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
+		If 'writeback' is selected the journal device has to be resilient
+		and must not suffer from the 'write hole' problem itself (e.g. use
+		raid1 or raid10) to avoid a single point of failure.
+
+<#raid_devs>: The number of devices composing the array.
+	Each device consists of two entries.  The first is the device
+	containing the metadata (if any); the second is the one containing the
+	data. A maximum of 64 metadata/data device entries are supported
+	up to target version 1.8.0.
+	1.9.0 supports up to 253 which is enforced by the used MD kernel runtime.
+
+	If a drive has failed or is missing at creation time, a '-' can be
+	given for both the metadata and data drives for a given position.
+
+
+Example Tables
+--------------
+
+::
+
+  # RAID4 - 4 data drives, 1 parity (no metadata devices)
+  # No metadata devices specified to hold superblock/bitmap info
+  # Chunk size of 1MiB
+  # (Lines separated for easy reading)
+
+  0 1960893648 raid \
+          raid4 1 2048 \
+          5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
+
+  # RAID4 - 4 data drives, 1 parity (with metadata devices)
+  # Chunk size of 1MiB, force RAID initialization,
+  #       min recovery rate at 20 kiB/sec/disk
+
+  0 1960893648 raid \
+          raid4 4 2048 sync min_recovery_rate 20 \
+          5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
+
+
+Status Output
+-------------
+'dmsetup table' displays the table used to construct the mapping.
+The optional parameters are always printed in the order listed
+above with "sync" or "nosync" always output ahead of the other
+arguments, regardless of the order used when originally loading the table.
+Arguments that can be repeated are ordered by value.
+
+
+'dmsetup status' yields information on the state and health of the array.
+The output is as follows (normally a single line, but expanded here for
+clarity)::
+
+  1: <s> <l> raid \
+  2:      <raid_type> <#devices> <health_chars> \
+  3:      <sync_ratio> <sync_action> <mismatch_cnt>
+
+Line 1 is the standard output produced by device-mapper.
+
+Lines 2 & 3 are produced by the raid target and are best explained by example::
+
+        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
+
+Here we can see the RAID type is raid4, there are 5 devices - all of
+which are 'A'live, and the array is 2/490221568 complete with its initial
+recovery.  Here is a fuller description of the individual fields:
+
+	=============== =========================================================
+	<raid_type>     Same as the <raid_type> used to create the array.
+	<health_chars>  One char for each device, indicating:
+
+			- 'A' = alive and in-sync
+			- 'a' = alive but not in-sync
+			- 'D' = dead/failed.
+	<sync_ratio>    The ratio indicating how much of the array has undergone
+			the process described by 'sync_action'.  If the
+			'sync_action' is "check" or "repair", then the process
+			of "resync" or "recover" can be considered complete.
+	<sync_action>   One of the following possible states:
+
+			idle
+				- No synchronization action is being performed.
+			frozen
+				- The current action has been halted.
+			resync
+				- Array is undergoing its initial synchronization
+				  or is resynchronizing after an unclean shutdown
+				  (possibly aided by a bitmap).
+			recover
+				- A device in the array is being rebuilt or
+				  replaced.
+			check
+				- A user-initiated full check of the array is
+				  being performed.  All blocks are read and
+				  checked for consistency.  The number of
+				  discrepancies found is recorded in
+				  <mismatch_cnt>.  No changes are made to the
+				  array by this action.
+			repair
+				- The same as "check", but discrepancies are
+				  corrected.
+			reshape
+				- The array is undergoing a reshape.
+	<mismatch_cnt>  The number of discrepancies found between mirror copies
+			in RAID1/10 or wrong parity values found in RAID4/5/6.
+			This value is valid only after a "check" of the array
+			is performed.  A healthy array has a 'mismatch_cnt' of 0.
+	<data_offset>   The current data offset to the start of the user data on
+			each component device of a raid set (see the respective
+			raid parameter to support out-of-place reshaping).
+	<journal_char>	- 'A' - active write-through journal device.
+			- 'a' - active write-back journal device.
+			- 'D' - dead journal device.
+			- '-' - no journal device.
+	=============== =========================================================
+
+
+Message Interface
+-----------------
+The dm-raid target will accept certain actions through the 'message' interface.
+('man dmsetup' for more information on the message interface.)  These actions
+include:
+
+	========= ================================================
+	"idle"    Halt the current sync action.
+	"frozen"  Freeze the current sync action.
+	"resync"  Initiate/continue a resync.
+	"recover" Initiate/continue a recover process.
+	"check"   Initiate a check (i.e. a "scrub") of the array.
+	"repair"  Initiate a repair of the array.
+	========= ================================================
+
+
+Discard Support
+---------------
+The implementation of discard support among hardware vendors varies.
+When a block is discarded, some storage devices will return zeroes when
+the block is read.  These devices set the 'discard_zeroes_data'
+attribute.  Other devices will return random data.  Confusingly, some
+devices that advertise 'discard_zeroes_data' will not reliably return
+zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
+from a number of devices to calculate parity blocks and (for performance
+reasons) relies on 'discard_zeroes_data' being reliable, it is important
+that the devices be consistent.  Blocks may be discarded in the middle
+of a RAID 4/5/6 stripe and if subsequent read results are not
+consistent, the parity blocks may be calculated differently at any time,
+making the parity blocks useless for redundancy.  It is important to
+understand how your hardware behaves with discards if you are going to
+enable discards with RAID 4/5/6.
+
+Since the behavior of storage devices is unreliable in this respect,
+even when reporting 'discard_zeroes_data', by default RAID 4/5/6
+discard support is disabled -- this ensures data integrity at the
+expense of losing some performance.
+
+Storage devices that properly support 'discard_zeroes_data' are
+increasingly whitelisted in the kernel and can thus be trusted.
+
+For trusted devices, the following dm-raid module parameter can be set
+to safely enable discard support for RAID 4/5/6:
+
+    'devices_handle_discard_safely'
+
+
+Version History
+---------------
+
+::
+
+ 1.0.0	Initial version.  Support for RAID 4/5/6
+ 1.1.0	Added support for RAID 1
+ 1.2.0	Handle creation of arrays that contain failed devices.
+ 1.3.0	Added support for RAID 10
+ 1.3.1	Allow device replacement/rebuild for RAID 10
+ 1.3.2	Fix/improve redundancy checking for RAID10
+ 1.4.0	Non-functional change.  Removes arg from mapping function.
+ 1.4.1	RAID10 fix redundancy validation checks (commit 55ebbb5).
+ 1.4.2	Add RAID10 "far" and "offset" algorithm support.
+ 1.5.0	Add message interface to allow manipulation of the sync_action.
+	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
+ 1.5.1	Add ability to restore transiently failed devices on resume.
+ 1.5.2	'mismatch_cnt' is zero unless [last_]sync_action is "check".
+ 1.6.0	Add discard support (and devices_handle_discard_safely module param).
+ 1.7.0	Add support for MD RAID0 mappings.
+ 1.8.0	Explicitly check for compatible flags in the superblock metadata
+	and refuse to start the raid set if any are set by a newer
+	target version, thus avoiding data corruption on a raid set
+	with a reshape in progress.
+ 1.9.0	Add support for RAID level takeover/reshape/region size
+	and set size reduction.
+ 1.9.1	Fix activation of existing RAID 4/10 mapped devices
+ 1.9.2	Don't emit '- -' on the status table line in case the constructor
+	fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
+	'D' on the status line.  If '- -' is passed into the constructor, emit
+	'- -' on the table line and '-' as the status line health character.
+ 1.10.0	Add support for raid4/5/6 journal device
+ 1.10.1	Fix data corruption on reshape request
+ 1.11.0	Fix table line argument order
+	(wrong raid10_copies/raid10_format sequence)
+ 1.11.1	Add raid4/5/6 journal write-back support via journal_mode option
+ 1.12.1	Fix for MD deadlock between mddev_suspend() and md_write_start() available
+ 1.13.0	Fix dev_health status at end of "recover" (was 'a', now 'A')
+ 1.13.1	Fix deadlock caused by early md_stop_writes().  Also fix size and
+	state races.
+ 1.13.2	Fix raid redundancy validation and avoid keeping raid set frozen
+ 1.14.0	Fix reshape race on small devices.  Fix stripe adding reshape
+	deadlock/potential data corruption.  Update superblock when
+	specific devices are requested via rebuild.  Fix RAID leg
+	rebuild errors.
diff --git a/Documentation/admin-guide/device-mapper/dm-service-time.rst b/Documentation/admin-guide/device-mapper/dm-service-time.rst
new file mode 100644
index 000000000000..facf277fc13c
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-service-time.rst
@@ -0,0 +1,101 @@
+===============
+dm-service-time
+===============
+
+dm-service-time is a path selector module for device-mapper targets,
+which selects a path with the shortest estimated service time for
+the incoming I/O.
+
+The service time for each path is estimated by dividing the total size
+of in-flight I/Os on a path by the performance value of the path.
+The performance value is a relative throughput value among all paths
+in a path-group, and it can be specified as a table argument.
+
+The path selector name is 'service-time'.
+
+Table parameters for each path:
+
+    [<repeat_count> [<relative_throughput>]]
+	<repeat_count>:
+			The number of I/Os to dispatch using the selected
+			path before switching to the next path.
+			If not given, the internal default is used.  To check
+			the default value, see the activated table.
+	<relative_throughput>:
+			The relative throughput value of the path
+			among all paths in the path-group.
+			The valid range is 0-100.
+			If not given, minimum value '1' is used.
+			If '0' is given, the path isn't selected while
+			other paths having a positive value are available.
+
+Status for each path:
+
+    <status> <fail-count> <in-flight-size> <relative_throughput>
+	<status>:
+		'A' if the path is active, 'F' if the path is failed.
+	<fail-count>:
+		The number of path failures.
+	<in-flight-size>:
+		The size of in-flight I/Os on the path.
+	<relative_throughput>:
+		The relative throughput value of the path
+		among all paths in the path-group.
+
+
+Algorithm
+=========
+
+dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
+dispatched and subtracts when completed.
+Basically, dm-service-time selects a path having minimum service time
+which is calculated by::
+
+	('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
+
+However, some optimizations below are used to reduce the calculation
+as much as possible.
+
+	1. If the paths have the same 'relative_throughput', skip
+	   the division and just compare the 'in-flight-size'.
+
+	2. If the paths have the same 'in-flight-size', skip the division
+	   and just compare the 'relative_throughput'.
+
+	3. If some paths have non-zero 'relative_throughput' and others
+	   have zero 'relative_throughput', ignore those paths with zero
+	   'relative_throughput'.
+
+If such optimizations can't be applied, calculate service time, and
+compare service time.
+If calculated service time is equal, the path having maximum
+'relative_throughput' may be better.  So compare 'relative_throughput'
+then.
+
+
+Examples
+========
+If 2 paths (sda and sdb) are used with repeat_count == 128, and sda
+has an average throughput of 1GB/s while sdb has 4GB/s, the
+'relative_throughput' value may be '1' for sda and '4' for sdb::
+
+  # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
+    | dmsetup create test
+  #
+  # dmsetup table
+  test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
+  #
+  # dmsetup status
+  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
+
+
+Values of '2' for sda and '8' for sdb would be equally valid::
+
+  # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" | \
+    dmsetup create test
+  #
+  # dmsetup table
+  test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
+  #
+  # dmsetup status
+  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
diff --git a/Documentation/admin-guide/device-mapper/dm-uevent.rst b/Documentation/admin-guide/device-mapper/dm-uevent.rst
new file mode 100644
index 000000000000..4a8ee8d069c9
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-uevent.rst
@@ -0,0 +1,110 @@
+====================
+device-mapper uevent
+====================
+
+The device-mapper uevent code adds the capability to device-mapper to create
+and send kobject uevents (uevents).  Previously device-mapper events were only
+available through the ioctl interface.  The advantage of the uevents interface
+is that the event contains environment attributes providing increased context
+for the event, avoiding the need to query the state of the device-mapper device
+after the event is received.
+
+There are two functions currently for device-mapper events.  The first function
+listed creates the event and the second function sends the event(s)::
+
+  void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
+                      const char *path, unsigned nr_valid_paths)
+
+  void dm_send_uevents(struct list_head *events, struct kobject *kobj)
+
+
+The variables added to the uevent environment are:
+
+Variable Name: DM_TARGET
+------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description:
+:Value: Name of device-mapper target that generated the event.
+
+Variable Name: DM_ACTION
+------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description:
+:Value: Device-mapper specific action that caused the uevent action.
+	PATH_FAILED - A path has failed;
+	PATH_REINSTATED - A path has been reinstated.
+
+Variable Name: DM_SEQNUM
+------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: unsigned integer
+:Description: A sequence number for this specific device-mapper device.
+:Value: Valid unsigned integer range.
+
+Variable Name: DM_PATH
+----------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description: Major and minor number of the path device pertaining to this
+	      event.
+:Value: Path name in the form of "Major:Minor"
+
+Variable Name: DM_NR_VALID_PATHS
+--------------------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: unsigned integer
+:Description:
+:Value: Valid unsigned integer range.
+
+Variable Name: DM_NAME
+----------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description: Name of the device-mapper device.
+:Value: Name
+
+Variable Name: DM_UUID
+----------------------
+:Uevent Action(s): KOBJ_CHANGE
+:Type: string
+:Description: UUID of the device-mapper device.
+:Value: UUID. (Empty string if there isn't one.)
+
+An example of the uevents generated, as captured by udevmonitor, is shown
+below:
+
+1.) Path failure::
+
+	UEVENT[1192521009.711215] change@/block/dm-3
+	ACTION=change
+	DEVPATH=/block/dm-3
+	SUBSYSTEM=block
+	DM_TARGET=multipath
+	DM_ACTION=PATH_FAILED
+	DM_SEQNUM=1
+	DM_PATH=8:32
+	DM_NR_VALID_PATHS=0
+	DM_NAME=mpath2
+	DM_UUID=mpath-35333333000002328
+	MINOR=3
+	MAJOR=253
+	SEQNUM=1130
+
+2.) Path reinstate::
+
+	UEVENT[1192521132.989927] change@/block/dm-3
+	ACTION=change
+	DEVPATH=/block/dm-3
+	SUBSYSTEM=block
+	DM_TARGET=multipath
+	DM_ACTION=PATH_REINSTATED
+	DM_SEQNUM=2
+	DM_PATH=8:32
+	DM_NR_VALID_PATHS=1
+	DM_NAME=mpath2
+	DM_UUID=mpath-35333333000002328
+	MINOR=3
+	MAJOR=253
+	SEQNUM=1131
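+
+A consumer of these events mostly needs to split a captured record into its
+KEY=VALUE pairs. The following Python sketch shows one way to do that for the
+text format shown above; parse_uevent() and all_paths_down() are hypothetical
+helper names, and the sample text is the path failure record from this
+document.
+

```python
def parse_uevent(text):
    """Split one captured record into its header line and KEY=VALUE pairs."""
    header = None
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("UEVENT[") or "=" not in line:
            header = line
            continue
        key, value = line.split("=", 1)
        env[key] = value
    return header, env


def all_paths_down(env):
    """True if the event reports a path failure that left no valid paths."""
    return (env.get("DM_ACTION") == "PATH_FAILED"
            and int(env.get("DM_NR_VALID_PATHS", "1")) == 0)


SAMPLE = """\
UEVENT[1192521009.711215] change@/block/dm-3
ACTION=change
DEVPATH=/block/dm-3
SUBSYSTEM=block
DM_TARGET=multipath
DM_ACTION=PATH_FAILED
DM_SEQNUM=1
DM_PATH=8:32
DM_NR_VALID_PATHS=0
DM_NAME=mpath2
DM_UUID=mpath-35333333000002328
MINOR=3
MAJOR=253
SEQNUM=1130
"""

header, env = parse_uevent(SAMPLE)
```

+
+Because DM_PATH and DM_NR_VALID_PATHS arrive with the event, such a consumer
+can react without querying the device-mapper device again.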
diff --git a/Documentation/admin-guide/device-mapper/dm-zoned.rst b/Documentation/admin-guide/device-mapper/dm-zoned.rst
new file mode 100644
index 000000000000..07f56ebc1730
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-zoned.rst
@@ -0,0 +1,146 @@
+========
+dm-zoned
+========
+
+The dm-zoned device mapper target exposes a zoned block device (ZBC and
+ZAC compliant devices) as a regular block device without any write
+pattern constraints. In effect, it implements a drive-managed zoned
+block device which hides from the user (a file system or an application
+doing raw block device accesses) the sequential write constraints of
+host-managed zoned block devices and can mitigate the potential
+device-side performance degradation due to excessive random writes on
+host-aware zoned block devices.
+
+For a more detailed description of the zoned block device models and
+their constraints see (for SCSI devices):
+
+http://www.t10.org/drafts.htm#ZBC_Family
+
+and (for ATA devices):
+
+http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
+
+The dm-zoned implementation is simple and minimizes system overhead (CPU
+and memory usage as well as storage capacity loss). For a 10TB
+host-managed disk with 256 MB zones, dm-zoned memory usage per disk
+instance is at most 4.5 MB and as little as 5 zones will be used
+internally for storing metadata and performing reclaim operations.
+
+dm-zoned target devices are formatted and checked using the dmzadm
+utility available at:
+
+https://github.com/hgst/dm-zoned-tools
+
+Algorithm
+=========
+
+dm-zoned implements an on-disk buffering scheme to handle non-sequential
+write accesses to the sequential zones of a zoned block device.
+Conventional zones are used for caching as well as for storing internal
+metadata.
+
+The zones of the device are separated into 2 types:
+
+1) Metadata zones: these are conventional zones used to store metadata.
+Metadata zones are not reported as useable capacity to the user.
+
+2) Data zones: all remaining zones, the vast majority of which will be
+sequential zones used exclusively to store user data. The conventional
+zones of the device may be used also for buffering user random writes.
+Data in these zones may be directly mapped to the conventional zone, but
+later moved to a sequential zone so that the conventional zone can be
+reused for buffering incoming random writes.
+
+dm-zoned exposes a logical device with a sector size of 4096 bytes,
+irrespective of the physical sector size of the backend zoned block
+device being used. This allows reducing the amount of metadata needed to
+manage valid blocks (blocks written).
+
+The on-disk metadata format is as follows:
+
+1) The first block of the first conventional zone found contains the
+super block which describes the on disk amount and position of metadata
+blocks.
+
+2) Following the super block, a set of blocks is used to describe the
+mapping of the logical device blocks. The mapping is done per chunk of
+blocks, with the chunk size equal to the zone size of the zoned block
+device. The mapping table is indexed by chunk number and each mapping
+entry indicates the zone number of the device storing the chunk of data.
+Each mapping entry may also indicate the zone number of a conventional
+zone used to buffer random modifications to the data zone.
+
+3) A set of blocks used to store bitmaps indicating the validity of
+blocks in the data zones follows the mapping table. A valid block is
+defined as a block that was written and not discarded. For a buffered
+data chunk, a block is always valid only in the data zone mapping the
+chunk or in the buffer zone of the chunk.
+
+For a logical chunk mapped to a conventional zone, all write operations
+are processed by directly writing to the zone. If the mapping zone is a
+sequential zone, the write operation is processed directly only if the
+write offset within the logical chunk is equal to the write pointer
+offset within the sequential data zone (i.e. the write operation is
+aligned on the zone write pointer). Otherwise, write operations are
+processed indirectly using a buffer zone. In that case, an unused
+conventional zone is allocated and assigned to the chunk being
+accessed. Writing a block to the buffer zone of a chunk will
+automatically invalidate the same block in the sequential zone mapping
+the chunk. If all blocks of the sequential zone become invalid, the zone
+is freed and the chunk buffer zone becomes the primary zone mapping the
+chunk, resulting in native random write performance similar to a regular
+block device.
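+
+The write handling rule described above reduces to a small decision, sketched
+here in Python. The function name, the string labels and the use of block
+offsets are assumptions made for illustration and do not mirror the driver
+code.
+

```python
def write_target(zone_type, chunk_offset, write_pointer):
    """Return "direct" or "buffered" for a write into a mapped chunk.

    zone_type is "conventional" or "sequential". chunk_offset is the
    write offset within the logical chunk and write_pointer is the
    write pointer offset within the sequential data zone.
    """
    if zone_type == "conventional":
        # Conventional zones accept writes at any offset.
        return "direct"
    if chunk_offset == write_pointer:
        # Aligned on the zone write pointer: sequential write is legal.
        return "direct"
    # Unaligned write to a sequential zone goes through a buffer zone.
    return "buffered"
```
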
+
+Read operations are processed according to the block validity
+information provided by the bitmaps. Valid blocks are read either from
+the sequential zone mapping a chunk, or if the chunk is buffered, from
+the buffer zone assigned. If the accessed chunk has no mapping, or the
+accessed blocks are invalid, the read buffer is zeroed and the read
+operation terminated.
+
+After some time, the limited number of conventional zones available may
+be exhausted (all used to map chunks or buffer sequential zones) and
+unaligned writes to unbuffered chunks become impossible. To avoid this
+situation, a reclaim process regularly scans used conventional zones and
+tries to reclaim the least recently used zones by copying the valid
+blocks of the buffer zone to a free sequential zone. Once the copy
+completes, the chunk mapping is updated to point to the sequential zone
+and the buffer zone freed for reuse.
+
+Metadata Protection
+===================
+
+To protect metadata against corruption in case of sudden power loss or
+system crash, 2 sets of metadata zones are used. One set, the primary
+set, is used as the main metadata region, while the secondary set is
+used as a staging area. Modified metadata is first written to the
+secondary set and validated by updating the super block in the secondary
+set; a generation counter is used to indicate that this set contains the
+newest metadata. Once this operation completes, in-place metadata
+block updates can be done in the primary metadata set. This ensures that
+one of the sets is always consistent (all modifications committed or none
+at all). Flush operations are used as a commit point. Upon reception of
+a flush request, metadata modification activity is temporarily blocked
+(for both incoming BIO processing and reclaim process) and all dirty
+metadata blocks are staged and updated. Normal operation is then
+resumed. Flushing metadata thus only temporarily delays write and
+discard requests. Read requests can be processed concurrently while
+metadata flush is being executed.
+
+Usage
+=====
+
+A zoned block device must first be formatted using the dmzadm tool. This
+will analyze the device zone configuration, determine where to place the
+metadata sets on the device and initialize the metadata sets.
+
+Ex::
+
+	dmzadm --format /dev/sdxx
+
+For a formatted device, the target can be created normally with the
+dmsetup utility. The only parameter that dm-zoned requires is the
+underlying zoned block device name. Ex::
+
+	echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
+	dmsetup create dmz-`basename ${dev}`
diff --git a/Documentation/admin-guide/device-mapper/era.rst b/Documentation/admin-guide/device-mapper/era.rst
new file mode 100644
index 000000000000..90dd5c670b9f
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/era.rst
@@ -0,0 +1,116 @@
+======
+dm-era
+======
+
+Introduction
+============
+
+dm-era is a target that behaves similarly to the linear target.  In
+addition it keeps track of which blocks were written within a user
+defined period of time called an 'era'.  Each era target instance
+maintains the current era as a monotonically increasing 32-bit
+counter.
+
+Use cases include tracking changed blocks for backup software, and
+partially invalidating the contents of a cache to restore cache
+coherency after rolling back a vendor snapshot.
+
+Constructor
+===========
+
+era <metadata dev> <origin dev> <block size>
+
+ ================ ======================================================
+ metadata dev     fast device holding the persistent metadata
+ origin dev	  device holding data blocks that may change
+ block size       block size of origin data device, granularity that is
+		  tracked by the target
+ ================ ======================================================
+
+Messages
+========
+
+None of the dm messages take any arguments.
+
+checkpoint
+----------
+
+Possibly move to a new era.  You shouldn't assume the era has
+incremented.  After sending this message, you should check the
+current era via the status line.
+
+take_metadata_snap
+------------------
+
+Create a clone of the metadata, to allow a userland process to read it.
+
+drop_metadata_snap
+------------------
+
+Drop the metadata snapshot.
+
+Status
+======
+
+<metadata block size> <#used metadata blocks>/<#total metadata blocks>
+<current era> <held metadata root | '-'>
+
+========================= ==============================================
+metadata block size	  Fixed block size for each metadata block in
+			  sectors
+#used metadata blocks	  Number of metadata blocks used
+#total metadata blocks	  Total number of metadata blocks
+current era		  The current era
+held metadata root	  The location, in blocks, of the metadata root
+			  that has been 'held' for userspace read
+			  access. '-' indicates there is no held root
+========================= ==============================================
+
+Detailed use case
+=================
+
+The scenario of invalidating a cache when rolling back a vendor
+snapshot was the primary use case when developing this target:
+
+Taking a vendor snapshot
+------------------------
+
+- Send a checkpoint message to the era target
+- Make a note of the current era in its status line
+- Take vendor snapshot (the era and snapshot should be forever
+  associated now).
+
+Rolling back to a vendor snapshot
+---------------------------------
+
+- Cache enters passthrough mode (see: dm-cache's docs in cache.rst)
+- Rollback vendor storage
+- Take metadata snapshot
+- Ascertain which blocks have been written since the snapshot was taken
+  by checking each block's era
+- Invalidate those blocks in the caching software
+- Cache returns to writeback/writethrough mode
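+
+The step that determines which blocks to invalidate can be sketched as
+follows. This Python fragment assumes the per-block era information has
+already been extracted from the metadata snapshot into a dictionary; the
+function name, the dictionary layout and the inclusive comparison (chosen here
+to err on the side of invalidating too much) are assumptions of the sketch.
+

```python
def blocks_to_invalidate(block_eras, snapshot_era):
    """Return the sorted block numbers written in or after snapshot_era.

    block_eras maps a block number to the era of its most recent write.
    snapshot_era is the era noted when the vendor snapshot was taken.
    """
    return sorted(block for block, era in block_eras.items()
                  if era >= snapshot_era)
```

+
+For example, with a snapshot taken in era 7, a block last written in era 6 is
+left in the cache while blocks last written in era 7 or 8 are invalidated.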
+
+Memory usage
+============
+
+The target uses a bitset to record writes in the current era.  It also
+has a spare bitset ready for switching over to a new era.  Other than
+that it uses a few 4k blocks for updating metadata::
+
+   (4 * nr_blocks) bytes + buffers
+
+Resilience
+==========
+
+Metadata is updated on disk before a write to a previously unwritten
+block is performed.  As such dm-era should not be affected by a hard
+crash such as power failure.
+
+Userland tools
+==============
+
+Userland tools are found in the increasingly poorly named
+thin-provisioning-tools project:
+
+    https://github.com/jthornber/thin-provisioning-tools
diff --git a/Documentation/admin-guide/device-mapper/index.rst b/Documentation/admin-guide/device-mapper/index.rst
new file mode 100644
index 000000000000..c77c58b8f67b
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/index.rst
@@ -0,0 +1,42 @@
+=============
+Device Mapper
+=============
+
+.. toctree::
+    :maxdepth: 1
+
+    cache-policies
+    cache
+    delay
+    dm-crypt
+    dm-flakey
+    dm-init
+    dm-integrity
+    dm-io
+    dm-log
+    dm-queue-length
+    dm-raid
+    dm-service-time
+    dm-uevent
+    dm-zoned
+    era
+    kcopyd
+    linear
+    log-writes
+    persistent-data
+    snapshot
+    statistics
+    striped
+    switch
+    thin-provisioning
+    unstriped
+    verity
+    writecache
+    zero
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/admin-guide/device-mapper/kcopyd.rst b/Documentation/admin-guide/device-mapper/kcopyd.rst
new file mode 100644
index 000000000000..7651d395127f
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/kcopyd.rst
@@ -0,0 +1,47 @@
+======
+kcopyd
+======
+
+Kcopyd provides the ability to copy a range of sectors from one block-device
+to one or more other block-devices, with an asynchronous completion
+notification. It is used by dm-snapshot and dm-mirror.
+
+Users of kcopyd must first create a client and indicate how many memory pages
+to set aside for their copy jobs. This is done with a call to
+kcopyd_client_create()::
+
+   int kcopyd_client_create(unsigned int num_pages,
+                            struct kcopyd_client **result);
+
+To start a copy job, the user must set up io_region structures to describe
+the source and destinations of the copy. Each io_region indicates a
+block-device along with the starting sector and size of the region. The source
+of the copy is given as one io_region structure, and the destinations of the
+copy are given as an array of io_region structures::
+
+   struct io_region {
+      struct block_device *bdev;
+      sector_t sector;
+      sector_t count;
+   };
+
+To start the copy, the user calls kcopyd_copy(), passing in the client
+pointer, pointers to the source and destination io_regions, the name of a
+completion callback routine, and a pointer to some context data for the copy::
+
+   int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
+                   unsigned int num_dests, struct io_region *dests,
+                   unsigned int flags, kcopyd_notify_fn fn, void *context);
+
+   typedef void (*kcopyd_notify_fn)(int read_err, unsigned int write_err,
+				    void *context);
+
+When the copy completes, kcopyd will call the user's completion routine,
+passing back the user's context pointer. It will also indicate if a read or
+write error occurred during the copy.
+
+When a user is done with all their copy jobs, they should call
+kcopyd_client_destroy() to delete the kcopyd client, which will release the
+associated memory pages::
+
+   void kcopyd_client_destroy(struct kcopyd_client *kc);
diff --git a/Documentation/admin-guide/device-mapper/linear.rst b/Documentation/admin-guide/device-mapper/linear.rst
new file mode 100644
index 000000000000..9d17fc6e64a9
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/linear.rst
@@ -0,0 +1,63 @@
+=========
+dm-linear
+=========
+
+Device-Mapper's "linear" target maps a linear range of the Device-Mapper
+device onto a linear range of another device.  This is the basic building
+block of logical volume managers.
+
+Parameters: <dev path> <offset>
+    <dev path>:
+	Full pathname to the underlying block-device, or a
+        "major:minor" device-number.
+    <offset>:
+	Starting sector within the device.
+
+
+Example scripts
+===============
+
+::
+
+  #!/bin/sh
+  # Create an identity mapping for a device
+  echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
+
+::
+
+  #!/bin/sh
+  # Join 2 devices together
+  size1=`blockdev --getsz $1`
+  size2=`blockdev --getsz $2`
+  echo "0 $size1 linear $1 0
+  $size1 $size2 linear $2 0" | dmsetup create joined
+
+::
+
+  #!/usr/bin/perl -w
+  # Split a device into 4M chunks and then join them together in reverse order.
+
+  my $name = "reverse";
+  my $extent_size = 4 * 1024 * 2;
+  my $dev = $ARGV[0];
+  my $table = "";
+  my $count = 0;
+
+  if (!defined($dev)) {
+          die("Please specify a device.\n");
+  }
+
+  my $dev_size = `blockdev --getsz $dev`;
+  my $extents = int($dev_size / $extent_size) -
+                (($dev_size % $extent_size) ? 1 : 0);
+
+  while ($extents > 0) {
+          my $this_start = $count * $extent_size;
+          $extents--;
+          $count++;
+          my $this_offset = $extents * $extent_size;
+
+          $table .= "$this_start $extent_size linear $dev $this_offset\n";
+  }
+
+  `echo \"$table\" | dmsetup create $name`;
diff --git a/Documentation/admin-guide/device-mapper/log-writes.rst b/Documentation/admin-guide/device-mapper/log-writes.rst
new file mode 100644
index 000000000000..23141f2ffb7c
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/log-writes.rst
@@ -0,0 +1,145 @@
+=============
+dm-log-writes
+=============
+
+This target takes 2 devices, one to pass all IO to normally, and one to log all
+of the write operations to.  This is intended for file system developers wishing
+to verify the integrity of metadata or data as the file system is written to.
+There is a log_write_entry written for every WRITE request and the target is
+able to take arbitrary data from userspace to insert into the log.  The data
+that is in the WRITE requests is copied into the log to make the replay happen
+exactly as it happened originally.
+
+Log Ordering
+============
+
+We log things in order of completion once we are sure the write is no longer in
+cache.  This means that normal WRITE requests are not actually logged until the
+next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
+the log in a way that correlates to what is on disk and not what is in cache,
+to make it easier to detect improper waiting/flushing.
+
+This works by attaching all WRITE requests to a list once the write completes.
+Once we see a REQ_PREFLUSH request we splice this list onto the request and once
+the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
+completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
+simulate the worst case scenario with regard to power failures.  Consider the
+following example (W means write, C means complete):
+
+	W1,W2,W3,C3,C2,Wflush,C1,Cflush
+
+The log would show the following:
+
+	W3,W2,flush,W1....
+
+Again, this is to simulate what is actually on disk.  It allows us to detect
+cases where a power failure at a particular point in time would create an
+inconsistent file system.
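+
+The ordering rule can be reproduced with a small simulation. The Python
+sketch below is an illustration only: it assumes at most one flush is in
+flight at a time, and the event names simply follow the notation of the
+example above.
+

```python
def simulate(events):
    """Return (log, pending) after processing a list of W/C events."""
    pending = []   # completed writes waiting for the next flush
    attached = []  # writes spliced onto the in-flight flush
    log = []
    for ev in events:
        if ev == "Wflush":
            # Flush issued: take over every write completed so far.
            attached, pending = pending, []
        elif ev == "Cflush":
            # Flush completed: log its writes, then the flush itself.
            log.extend(attached)
            log.append("flush")
            attached = []
        elif ev.startswith("C"):
            # Write completed: queue it in completion order.
            pending.append("W" + ev[1:])
        # A plain "W<n>" issue event needs no bookkeeping here.
    return log, pending


log, pending = simulate("W1,W2,W3,C3,C2,Wflush,C1,Cflush".split(","))
```

+
+Here log holds W3, W2 and flush in that order, while W1 remains pending until
+a later flush, which matches the log shown above.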
+
+Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
+they complete as those requests will obviously bypass the device cache.
+
+Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
+have all the DISCARD requests, and then the WRITE requests and then the FLUSH
+request.  Consider the following example:
+
+	WRITE block 1, DISCARD block 1, FLUSH
+
+If we logged DISCARD when it completed, the replay would look like this:
+
+	DISCARD 1, WRITE 1, FLUSH
+
+which isn't quite what happened and wouldn't be caught during the log replay.
+
+Target interface
+================
+
+i) Constructor
+
+   log-writes <dev_path> <log_dev_path>
+
+   ============= ==============================================
+   dev_path	 Device that all of the IO will go to normally.
+   log_dev_path  Device where the log entries are written to.
+   ============= ==============================================
+
+ii) Status
+
+    <#logged entries> <highest allocated sector>
+
+    =========================== ========================
+    #logged entries	        Number of logged entries
+    highest allocated sector    Highest allocated sector
+    =========================== ========================
+
+iii) Messages
+
+    mark <description>
+
+	You can use a dmsetup message to set an arbitrary mark in a log.
+	For example, say you want to fsck a file system after every
+	write, but first you need to replay up to the mkfs to make sure
+	you are fsck'ing something reasonable.  You would do something
+	like this::
+
+	  mkfs.btrfs -f /dev/mapper/log
+	  dmsetup message log 0 mark mkfs
+	  <run test>
+
+	This would allow you to replay the log up to the mkfs mark and
+	then replay from that point on doing the fsck check in the
+	interval that you want.
+
+	Every log has a mark at the end labeled "dm-log-writes-end".
+
+Userspace component
+===================
+
+There is a userspace tool that will replay the log for you in various ways.
+It can be found here: https://github.com/josefbacik/log-writes
+
+Example usage
+=============
+
+Say you want to test fsync on your file system.  You would do something like
+this::
+
+  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
+  dmsetup create log --table "$TABLE"
+  mkfs.btrfs -f /dev/mapper/log
+  dmsetup message log 0 mark mkfs
+
+  mount /dev/mapper/log /mnt/btrfs-test
+  <some test that does fsync at the end>
+  dmsetup message log 0 mark fsync
+  md5sum /mnt/btrfs-test/foo
+  umount /mnt/btrfs-test
+
+  dmsetup remove log
+  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
+  mount /dev/sdb /mnt/btrfs-test
+  md5sum /mnt/btrfs-test/foo
+  <verify md5sum's are correct>
+
+Another option is to do a complicated file system operation and verify the file
+system is consistent during the entire operation.  You could do this with::
+
+  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
+  dmsetup create log --table "$TABLE"
+  mkfs.btrfs -f /dev/mapper/log
+  dmsetup message log 0 mark mkfs
+
+  mount /dev/mapper/log /mnt/btrfs-test
+  <fsstress to dirty the fs>
+  btrfs filesystem balance /mnt/btrfs-test
+  umount /mnt/btrfs-test
+  dmsetup remove log
+
+  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
+  btrfsck /dev/sdb
+  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
+	--fsck "btrfsck /dev/sdb" --check fua
+
+And that will replay the log until it sees a FUA request, run the fsck command
+and if the fsck passes it will replay to the next FUA, until it is completed or
+the fsck command exits abnormally.
diff --git a/Documentation/admin-guide/device-mapper/persistent-data.rst b/Documentation/admin-guide/device-mapper/persistent-data.rst
new file mode 100644
index 000000000000..2065c3c5a091
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/persistent-data.rst
@@ -0,0 +1,88 @@
+===============
+Persistent data
+===============
+
+Introduction
+============
+
+The more-sophisticated device-mapper targets require complex metadata
+that is managed in kernel.  In late 2010 we were seeing that various
+different targets were rolling their own data structures, for example:
+
+- Mikulas Patocka's multisnap implementation
+- Heinz Mauelshagen's thin provisioning target
+- Another btree-based caching target posted to dm-devel
+- Another multi-snapshot target based on a design of Daniel Phillips
+
+Maintaining these data structures takes a lot of work, so if possible
+we'd like to reduce the number.
+
+The persistent-data library is an attempt to provide a re-usable
+framework for people who want to store metadata in device-mapper
+targets.  It's currently used by the thin-provisioning target and an
+upcoming hierarchical storage target.
+
+Overview
+========
+
+The main documentation is in the header files which can all be found
+under drivers/md/persistent-data.
+
+The block manager
+-----------------
+
+dm-block-manager.[hc]
+
+This provides access to the data on disk in fixed-size blocks.  There
+is a read/write locking interface to prevent concurrent accesses, and
+keep data that is being used in the cache.
+
+Clients of persistent-data are unlikely to use this directly.
+
+The transaction manager
+-----------------------
+
+dm-transaction-manager.[hc]
+
+This restricts access to blocks and enforces copy-on-write semantics.
+The only way you can get hold of a writable block through the
+transaction manager is by shadowing an existing block (i.e. doing
+copy-on-write) or allocating a fresh one.  Shadowing is elided within
+the same transaction so performance is reasonable.  The commit method
+ensures that all data is flushed before it writes the superblock.
+On power failure your metadata will be as it was when last committed.
+
+The Space Maps
+--------------
+
+dm-space-map.h
+dm-space-map-metadata.[hc]
+dm-space-map-disk.[hc]
+
+On-disk data structures that keep track of reference counts of blocks.
+Also acts as the allocator of new blocks.  Currently two
+implementations: a simpler one for managing blocks on a different
+device (e.g. thinly-provisioned data blocks); and one for managing
+the metadata space.  The latter is complicated by the need to store
+its own data within the space it's managing.
+
+The data structures
+-------------------
+
+dm-btree.[hc]
+dm-btree-remove.c
+dm-btree-spine.c
+dm-btree-internal.h
+
+Currently there is only one data structure, a hierarchical btree.
+There are plans to add more.  For example, something with an
+array-like interface would see a lot of use.
+
+The btree is 'hierarchical' in that you can define it to be composed
+of nested btrees, and take multiple keys.  For example, the
+thin-provisioning target uses a btree with two levels of nesting.
+The first maps a device id to a mapping tree, and that in turn maps a
+virtual block to a physical block.
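+
+The two-level lookup used by the thin-provisioning target can be pictured
+with nested dictionaries. In this Python sketch plain dicts stand in for the
+btrees, and lookup() and the sample values are hypothetical; the real
+interface is described in the header files under drivers/md/persistent-data.
+

```python
def lookup(top_level, dev_id, virtual_block):
    """Resolve (dev_id, virtual_block) to a physical block, or None."""
    # First level: device id -> mapping tree for that device.
    mapping_tree = top_level.get(dev_id)
    if mapping_tree is None:
        return None
    # Second level: virtual block -> physical block.
    return mapping_tree.get(virtual_block)


# Two thin devices, each with its own mapping tree (made-up values).
top_level = {
    1: {0: 4096, 1: 4097},
    2: {0: 8192},
}
```

+
+Each level consumes one key, which is how nesting lets a lookup use several
+keys.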
+
+Values stored in the btrees can have arbitrary size.  Keys are always
+64 bits, although nesting allows you to use multiple keys.
diff --git a/Documentation/admin-guide/device-mapper/snapshot.rst b/Documentation/admin-guide/device-mapper/snapshot.rst
new file mode 100644
index 000000000000..ccdd8b587a74
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/snapshot.rst
@@ -0,0 +1,196 @@
+==============================
+Device-mapper snapshot support
+==============================
+
+Device-mapper allows you, without massive data copying:
+
+-  To create snapshots of any block device i.e. mountable, saved states of
+   the block device which are also writable without interfering with the
+   original content;
+-  To create device "forks", i.e. multiple different versions of the
+   same data stream.
+-  To merge a snapshot of a block device back into the snapshot's origin
+   device.
+
+In the first two cases, dm copies only the chunks of data that get
+changed and uses a separate copy-on-write (COW) block device for
+storage.
+
+For snapshot merge the contents of the COW storage are merged back into
+the origin device.
+
+
+There are three dm targets available:
+snapshot, snapshot-origin, and snapshot-merge.
+
+-  snapshot-origin <origin>
+
+which will normally have one or more snapshots based on it.
+Reads will be mapped directly to the backing device. For each write, the
+original data will be saved in the <COW device> of each snapshot to keep
+its visible content unchanged, at least until the <COW device> fills up.
+
+
+-  snapshot <origin> <COW device> <persistent?> <chunksize>
+   [<# feature args> [<arg>]*]
+
+A snapshot of the <origin> block device is created. Changed chunks of
+<chunksize> sectors will be stored on the <COW device>.  Writes will
+only go to the <COW device>.  Reads will come from the <COW device> or
+from <origin> for unchanged data.  <COW device> will often be
+smaller than the origin and if it fills up the snapshot will become
+useless and be disabled, returning errors.  So it is important to monitor
+the amount of free space and expand the <COW device> before it fills up.
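+
+The read and write rules of the snapshot and snapshot-origin targets can be
+modelled at chunk granularity. The Python sketch below is a toy model under
+stated assumptions: a list stands in for the origin device, a dictionary for
+the <COW device>, and the Snapshot class and write_origin() function are
+hypothetical names. It ignores persistence and the COW device filling up.
+

```python
class Snapshot:
    def __init__(self, origin):
        self.origin = origin  # list of chunk contents, shared with the origin
        self.cow = {}         # chunk index -> preserved or rewritten chunk

    def read(self, chunk):
        # Changed chunks come from the COW store, the rest from the origin.
        return self.cow.get(chunk, self.origin[chunk])

    def write(self, chunk, data):
        # Writes to the snapshot only ever go to the COW store.
        self.cow[chunk] = data


def write_origin(origin, snapshots, chunk, data):
    """Write to the origin, saving the old chunk in each snapshot first."""
    for snap in snapshots:
        # Preserve the original data unless the snapshot already has
        # its own copy of this chunk.
        snap.cow.setdefault(chunk, origin[chunk])
    origin[chunk] = data
```

+
+Only chunks that change on either side end up in the COW store, which is why
+the <COW device> can usually be smaller than the origin.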
+
+<persistent?> is P (Persistent) or N (Not persistent - will not survive
+after reboot).  O (Overflow) can be added as a persistent store option
+to allow userspace to advertise its support for seeing "Overflow" in the
+snapshot status.  So supported store types are "P", "PO" and "N".
+
+The difference between persistent and transient is that transient
+snapshots require less metadata to be saved on disk - it can be kept in
+memory by the kernel.
+
+When loading or unloading the snapshot target, the corresponding
+snapshot-origin or snapshot-merge target must be suspended. A failure to
+suspend the origin target could result in data corruption.
+
+Optional features:
+
+   discard_zeroes_cow - a discard issued to the snapshot device that
+   maps to entire chunks will zero the corresponding exception(s) in
+   the snapshot's exception store.
+
+   discard_passdown_origin - a discard to the snapshot device is passed
+   down to the snapshot-origin's underlying device.  This doesn't cause
+   copy-out to the snapshot exception store because the snapshot-origin
+   target is bypassed.
+
+   The discard_passdown_origin feature depends on the discard_zeroes_cow
+   feature being enabled.
+
+
+-  snapshot-merge <origin> <COW device> <persistent> <chunksize>
+   [<# feature args> [<arg>]*]
+
+takes the same table arguments as the snapshot target except it only
+works with persistent snapshots.  This target assumes the role of the
+"snapshot-origin" target and must not be loaded if the "snapshot-origin"
+is still present for <origin>.
+
+Creates a merging snapshot that takes control of the changed chunks
+stored in the <COW device> of an existing snapshot, through a handover
+procedure, and merges these chunks back into the <origin>.  Once merging
+has started (in the background) the <origin> may be opened and the merge
+will continue while I/O is flowing to it.  Changes to the <origin> are
+deferred until the merging snapshot's corresponding chunk(s) have been
+merged.  Once merging has started the snapshot device, associated with
+the "snapshot" target, will return -EIO when accessed.
+
+
+How snapshot is used by LVM2
+============================
+When you create the first LVM2 snapshot of a volume, four dm devices are used:
+
+1) a device containing the original mapping table of the source volume;
+2) a device used as the <COW device>;
+3) a "snapshot" device, combining #1 and #2, which is the visible snapshot
+   volume;
+4) the "original" volume (which uses the device number used by the original
+   source volume), whose table is replaced by a "snapshot-origin" mapping
+   from device #1.
+
+A fixed naming scheme is used, so with the following commands::
+
+  lvcreate -L 1G -n base volumeGroup
+  lvcreate -L 100M --snapshot -n snap volumeGroup/base
+
+we'll have this situation (with volumes in above order)::
+
+  # dmsetup table|grep volumeGroup
+
+  volumeGroup-base-real: 0 2097152 linear 8:19 384
+  volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
+  volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
+  volumeGroup-base: 0 2097152 snapshot-origin 254:11
+
+  # ls -lL /dev/mapper/volumeGroup-*
+  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
+  brw-------  1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
+  brw-------  1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
+  brw-------  1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
+
+
+How snapshot-merge is used by LVM2
+==================================
+A merging snapshot assumes the role of the "snapshot-origin" while
+merging.  As such the "snapshot-origin" is replaced with
+"snapshot-merge".  The "-real" device is not changed and the "-cow"
+device is renamed to <origin name>-cow to aid LVM2's cleanup of the
+merging snapshot after it completes.  The "snapshot" that hands over its
+COW device to the "snapshot-merge" is deactivated (unless using lvchange
+--refresh); but if it is left active it will simply return I/O errors.
+
+A snapshot will merge into its origin with the following command::
+
+  lvconvert --merge volumeGroup/snap
+
+we'll now have this situation::
+
+  # dmsetup table|grep volumeGroup
+
+  volumeGroup-base-real: 0 2097152 linear 8:19 384
+  volumeGroup-base-cow: 0 204800 linear 8:19 2097536
+  volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
+
+  # ls -lL /dev/mapper/volumeGroup-*
+  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
+  brw-------  1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
+  brw-------  1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
+
+
+How to determine when merging is complete
+=========================================
+The snapshot-merge and snapshot status lines end with::
+
+  <sectors_allocated>/<total_sectors> <metadata_sectors>
+
+Both <sectors_allocated> and <total_sectors> include both data and metadata.
+During merging, the number of sectors allocated gets smaller and
+smaller.  Merging has finished when the number of sectors holding data
+is zero, in other words <sectors_allocated> == <metadata_sectors>.
+
+Here is a practical example (using a hybrid of lvm and dmsetup commands)::
+
+  # lvs
+    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
+    base    volumeGroup owi-a- 4.00g
+    snap    volumeGroup swi-a- 1.00g base  18.97
+
+  # dmsetup status volumeGroup-snap
+  0 8388608 snapshot 397896/2097152 1560
+                                    ^^^^ metadata sectors
+
+  # lvconvert --merge -b volumeGroup/snap
+    Merging of volume snap started.
+
+  # lvs volumeGroup/snap
+    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
+    base    volumeGroup Owi-a- 4.00g          17.23
+
+  # dmsetup status volumeGroup-base
+  0 8388608 snapshot-merge 281688/2097152 1104
+
+  # dmsetup status volumeGroup-base
+  0 8388608 snapshot-merge 180480/2097152 712
+
+  # dmsetup status volumeGroup-base
+  0 8388608 snapshot-merge 16/2097152 16
+
+Merging has finished.
+
+::
+
+  # lvs
+    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
+    base    volumeGroup owi-a- 4.00g
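+
+The completion test described above can be sketched in a few lines of
+Python.  This is only an illustration: the helper names are hypothetical,
+and the parser assumes a status line of the form shown in the example
+(it does not handle any other status string the target might report).
+
```python
def parse_snapshot_status(line):
    """Split a 'dmsetup status' line for a snapshot or snapshot-merge target.

    Assumed layout (taken from the example above):
    <start> <length> <target> <sectors_allocated>/<total_sectors> <metadata_sectors>
    """
    fields = line.split()
    target = fields[2]
    allocated, total = (int(value) for value in fields[3].split("/"))
    metadata = int(fields[4])
    return target, allocated, total, metadata


def merge_finished(line):
    """Return True when no sectors hold data, i.e. allocated == metadata."""
    _, allocated, _, metadata = parse_snapshot_status(line)
    return allocated == metadata
```
+
+With the status lines from the example, merge_finished() is false for
+"281688/2097152 1104" and true for "16/2097152 16".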
diff --git a/Documentation/admin-guide/device-mapper/statistics.rst b/Documentation/admin-guide/device-mapper/statistics.rst
new file mode 100644
index 000000000000..3d80a9f850cc
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/statistics.rst
@@ -0,0 +1,225 @@
+=============
+DM statistics
+=============
+
+Device Mapper supports the collection of I/O statistics on user-defined
+regions of a DM device.  If no regions are defined, no statistics are
+collected, so there is no performance impact.  Only bio-based DM
+devices are currently supported.
+
+Each user-defined region specifies a starting sector, length and step.
+Individual statistics will be collected for each step-sized area within
+the range specified.
+
+The I/O statistics counters for each step-sized area of a region are
+in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see:
+Documentation/iostats.txt).  In addition, two extra counters (12 and 13)
+are provided: total time spent reading and total time spent writing.
+When the histogram argument is used, a 14th parameter is reported that
+represents the histogram of latencies.  All these counters may be
+accessed by sending the @stats_print message to the appropriate DM
+device via dmsetup.
+
+The reported times are in milliseconds and the granularity depends on
+the kernel ticks.  When the option precise_timestamps is used, the
+reported times are in nanoseconds.
+
+Each region has a corresponding unique identifier, which we call a
+region_id, that is assigned when the region is created.  The region_id
+must be supplied when querying statistics about the region, deleting the
+region, etc.  Unique region_ids enable multiple userspace programs to
+request and process statistics for the same DM device without stepping
+on each other's data.
+
+The creation of DM statistics will allocate memory via kmalloc or
+fall back to using vmalloc space.  At most, 1/4 of the overall system
+memory may be allocated by DM statistics.  The admin can see how much
+memory is used by reading::
+
+	/sys/module/dm_mod/parameters/stats_current_allocated_bytes
+
+Messages
+========
+
+    @stats_create <range> <step> [<number_of_optional_arguments> <optional_arguments>...] [<program_id> [<aux_data>]]
+	Create a new region and return the region_id.
+
+	<range>
+	  "-"
+		whole device
+	  "<start_sector>+<length>"
+		a range of <length> 512-byte sectors
+		starting with <start_sector>.
+
+	<step>
+	  "<area_size>"
+		the range is subdivided into areas each containing
+		<area_size> sectors.
+	  "/<number_of_areas>"
+		the range is subdivided into the specified
+		number of areas.
+
+	<number_of_optional_arguments>
+	  The number of optional arguments
+
+	<optional_arguments>
+	  The following optional arguments are supported:
+
+	  precise_timestamps
+		use precise timer with nanosecond resolution
+		instead of the "jiffies" variable.  When this argument is
+		used, the resulting times are in nanoseconds instead of
+		milliseconds.  Precise timestamps are a little bit slower
+		to obtain than jiffies-based timestamps.
+	  histogram:n1,n2,n3,n4,...
+		collect histogram of latencies.  The
+		numbers n1, n2, etc are times that represent the boundaries
+		of the histogram.  If precise_timestamps is not used, the
+		times are in milliseconds, otherwise they are in
+		nanoseconds.  For each range, the kernel will report the
+		number of requests that completed within this range. For
+		example, if we use "histogram:10,20,30", the kernel will
+		report four numbers a:b:c:d. a is the number of requests
+		that took 0-10 ms to complete, b is the number of requests
+		that took 10-20 ms to complete, c is the number of requests
+		that took 20-30 ms to complete and d is the number of
+		requests that took more than 30 ms to complete.
+
+	<program_id>
+	  An optional parameter.  A name that uniquely identifies
+	  the userspace owner of the range.  This groups ranges together
+	  so that userspace programs can identify the ranges they
+	  created and ignore those created by others.
+	  The kernel returns this string back in the output of
+	  @stats_list message, but it doesn't use it for anything else.
+	  If we omit the number of optional arguments, program id must not
+	  be a number, otherwise it would be interpreted as the number of
+	  optional arguments.
+
+	<aux_data>
+	  An optional parameter.  A word that provides auxiliary data
+	  that is useful to the client program that created the range.
+	  The kernel returns this string back in the output of
+	  @stats_list message, but it doesn't use this value for anything.
+
+    @stats_delete <region_id>
+	Delete the region with the specified id.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+    @stats_clear <region_id>
+	Clear all the counters except the in-flight i/o counters.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+    @stats_list [<program_id>]
+	List all regions registered with @stats_create.
+
+	<program_id>
+	  An optional parameter.
+	  If this parameter is specified, only matching regions
+	  are returned.
+	  If it is not specified, all regions are returned.
+
+	Output format:
+	  <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
+	        precise_timestamps histogram:n1,n2,n3,...
+
+	The strings "precise_timestamps" and "histogram" are printed only
+	if they were specified when creating the region.
+
+    @stats_print <region_id> [<starting_line> <number_of_lines>]
+	Print counters for each step-sized area of a region.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+	<starting_line>
+	  The index of the starting line in the output.
+	  If omitted, all lines are returned.
+
+	<number_of_lines>
+	  The number of lines to include in the output.
+	  If omitted, all lines are returned.
+
+	Output format for each step-sized area of a region:
+
+	  <start_sector>+<length>
+		counters
+
+	  The first 11 counters have the same meaning as
+	  `/sys/block/*/stat` or `/proc/diskstats`.
+
+	  Please refer to Documentation/iostats.txt for details.
+
+	  1. the number of reads completed
+	  2. the number of reads merged
+	  3. the number of sectors read
+	  4. the number of milliseconds spent reading
+	  5. the number of writes completed
+	  6. the number of writes merged
+	  7. the number of sectors written
+	  8. the number of milliseconds spent writing
+	  9. the number of I/Os currently in progress
+	  10. the number of milliseconds spent doing I/Os
+	  11. the weighted number of milliseconds spent doing I/Os
+
+	  Additional counters:
+
+	  12. the total time spent reading in milliseconds
+	  13. the total time spent writing in milliseconds
+
+    @stats_print_clear <region_id> [<starting_line> <number_of_lines>]
+	Atomically print and then clear all the counters except the
+	in-flight i/o counters.  Useful when the client consuming the
+	statistics does not want to lose any statistics (those updated
+	between printing and clearing).
+
+	<region_id>
+	  region_id returned from @stats_create
+
+	<starting_line>
+	  The index of the starting line in the output.
+	  If omitted, all lines are printed and then cleared.
+
+	<number_of_lines>
+	  The number of lines to process.
+	  If omitted, all lines are printed and then cleared.
+
+    @stats_set_aux <region_id> <aux_data>
+	Store auxiliary data aux_data for the specified region.
+
+	<region_id>
+	  region_id returned from @stats_create
+
+	<aux_data>
+	  The string that identifies data which is useful to the client
+	  program that created the range.  The kernel returns this
+	  string back in the output of @stats_list message, but it
+	  doesn't use this value for anything.
+
+Examples
+========
+
+Subdivide the DM device 'vol' into 100 pieces and start collecting
+statistics on them::
+
+  dmsetup message vol 0 @stats_create - /100
+
+Set the auxiliary data string to "foo bar baz" (the escape for each
+space must also be escaped, otherwise the shell will consume them)::
+
+  dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
+
+List the statistics::
+
+  dmsetup message vol 0 @stats_list
+
+Print the statistics::
+
+  dmsetup message vol 0 @stats_print 0
+
+Delete the statistics::
+
+  dmsetup message vol 0 @stats_delete 0
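+
+How the histogram boundaries divide latencies into buckets can be
+illustrated with a short Python sketch.  It is not the kernel's
+implementation: the function name is hypothetical, and placing a latency
+that is exactly equal to a boundary into the higher bucket is an
+assumption made for this example.
+
```python
from bisect import bisect_right


def histogram_counts(latencies, boundaries):
    """Count latencies per bucket for sorted boundaries n1, n2, ...

    Returns len(boundaries) + 1 counts: below n1, n1..n2, ..., and
    above the last boundary.
    """
    counts = [0] * (len(boundaries) + 1)
    for latency in latencies:
        # Assumption: a latency equal to a boundary lands in the higher bucket.
        counts[bisect_right(boundaries, latency)] += 1
    return counts


# "histogram:10,20,30" applied to six hypothetical request latencies (ms).
counts = histogram_counts([3, 7, 12, 25, 31, 100], [10, 20, 30])
print(":".join(str(count) for count in counts))
```
+
+For these six made-up latencies the script prints 2:1:1:2, which has the
+same a:b:c:d shape as the histogram reported by the kernel.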
diff --git a/Documentation/admin-guide/device-mapper/striped.rst b/Documentation/admin-guide/device-mapper/striped.rst
new file mode 100644
index 000000000000..e9a8da192ae1
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/striped.rst
@@ -0,0 +1,61 @@
+=========
+dm-stripe
+=========
+
+Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0)
+device across one or more underlying devices. Data is written in "chunks",
+with consecutive chunks rotating among the underlying devices. This can
+potentially provide improved I/O throughput by utilizing several physical
+devices in parallel.
+
+Parameters: <num devs> <chunk size> [<dev path> <offset>]+
+    <num devs>:
+	Number of underlying devices.
+    <chunk size>:
+	Size of each chunk of data. Must be at least as
+	large as the system's PAGE_SIZE.
+    <dev path>:
+	Full pathname to the underlying block-device, or a
+	"major:minor" device-number.
+    <offset>:
+	Starting sector within the device.
+
+One or more underlying devices can be specified. The striped device size must
+be a multiple of the chunk size multiplied by the number of underlying devices.
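+
+The rotation of chunks among devices, and the size rule above, can be
+sketched in Python.  This is a conventional RAID-0 mapping written for
+illustration, not the target's code; the function names are hypothetical
+and all values are assumed to be in 512-byte sectors.
+
```python
def map_sector(sector, num_devs, chunk_size, offsets=None):
    """Map a sector of the striped device to (device index, device sector).

    Consecutive chunks rotate among the devices: chunk 0 goes to device 0,
    chunk 1 to device 1, and so on, wrapping around after num_devs chunks.
    """
    chunk = sector // chunk_size
    within_chunk = sector % chunk_size
    dev_index = chunk % num_devs
    dev_sector = (chunk // num_devs) * chunk_size + within_chunk
    if offsets is not None:
        # <offset> is the starting sector within each underlying device.
        dev_sector += offsets[dev_index]
    return dev_index, dev_sector


def usable_size(min_dev_size, num_devs, chunk_size):
    """Largest striped size that is a multiple of chunk_size * num_devs."""
    total = min_dev_size * num_devs
    return total - total % (chunk_size * num_devs)
```
+
+With two devices and 256-sector chunks, map_sector(256, 2, 256) returns
+(1, 0) and map_sector(512, 2, 256) returns (0, 256).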
+
+
+Example scripts
+===============
+
+::
+
+  #!/usr/bin/perl -w
+  # Create a striped device across any number of underlying devices. The device
+  # will be called "stripe_dev" and have a chunk-size of 128k.
+
+  my $chunk_size = 128 * 2;
+  my $dev_name = "stripe_dev";
+  my $num_devs = @ARGV;
+  my @devs = @ARGV;
+  my ($min_dev_size, $stripe_dev_size, $i);
+
+  if (!$num_devs) {
+          die("Specify at least one device\n");
+  }
+
+  $min_dev_size = `blockdev --getsz $devs[0]`;
+  for ($i = 1; $i < $num_devs; $i++) {
+          my $this_size = `blockdev --getsz $devs[$i]`;
+          $min_dev_size = ($min_dev_size < $this_size) ?
+                          $min_dev_size : $this_size;
+  }
+
+  $stripe_dev_size = $min_dev_size * $num_devs;
+  $stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs);
+
+  $table = "0 $stripe_dev_size striped $num_devs $chunk_size";
+  for ($i = 0; $i < $num_devs; $i++) {
+          $table .= " $devs[$i] 0";
+  }
+
+  `echo $table | dmsetup create $dev_name`;
diff --git a/Documentation/admin-guide/device-mapper/switch.rst b/Documentation/admin-guide/device-mapper/switch.rst
new file mode 100644
index 000000000000..7dde06be1a4f
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/switch.rst
@@ -0,0 +1,141 @@
+=========
+dm-switch
+=========
+
+The device-mapper switch target creates a device that supports an
+arbitrary mapping of fixed-size regions of I/O across a fixed set of
+paths.  The path used for any specific region can be switched
+dynamically by sending the target a message.
+
+It maps I/O to underlying block devices efficiently when there is a large
+number of fixed-sized address regions but there is no simple pattern
+that would allow for a compact representation of the mapping such as
+dm-stripe.
+
+Background
+----------
+
+Dell EqualLogic and some other iSCSI storage arrays use a distributed
+frameless architecture.  In this architecture, the storage group
+consists of a number of distinct storage arrays ("members") each having
+independent controllers, disk storage and network adapters.  When a LUN
+is created it is spread across multiple members.  The details of the
+spreading are hidden from initiators connected to this storage system.
+The storage group exposes a single target discovery portal, no matter
+how many members are being used.  When iSCSI sessions are created, each
+session is connected to an eth port on a single member.  Data to a LUN
+can be sent on any iSCSI session, and if the blocks being accessed are
+stored on another member the I/O will be forwarded as required.  This
+forwarding is invisible to the initiator.  The storage layout is also
+dynamic, and the blocks stored on disk may be moved from member to
+member as needed to balance the load.
+
+This architecture simplifies the management and configuration of both
+the storage group and initiators.  In a multipathing configuration, it
+is possible to set up multiple iSCSI sessions to use multiple network
+interfaces on both the host and target to take advantage of the
+increased network bandwidth.  An initiator could use a simple round
+robin algorithm to send I/O across all paths and let the storage array
+members forward it as necessary, but there is a performance advantage to
+sending data directly to the correct member.
+
+A device-mapper table already lets you map different regions of a
+device onto different targets.  However in this architecture the LUN is
+spread with an address region size on the order of 10s of MBs, which
+means the resulting table could have more than a million entries and
+consume far too much memory.
+
+Using this device-mapper switch target we can now build a two-layer
+device hierarchy:
+
+    Upper Tier - Determine which array member the I/O should be sent to.
+    Lower Tier - Load balance amongst paths to a particular member.
+
+The lower tier consists of a single dm multipath device for each member.
+Each of these multipath devices contains the set of paths directly to
+the array member in one priority group, and leverages existing path
+selectors to load balance amongst these paths.  We also build a
+non-preferred priority group containing paths to other array members for
+failover reasons.
+
+The upper tier consists of a single dm-switch device.  This device uses
+a bitmap to look up the location of the I/O and choose the appropriate
+lower tier device to route the I/O.  By using a bitmap we are able to
+use 4 bits for each address range in a 16 member group (which is very
+large for us).  This is a much denser representation than the dm table
+b-tree can achieve.
+
+Construction Parameters
+=======================
+
+    <num_paths> <region_size> <num_optional_args> [<optional_args>...] [<dev_path> <offset>]+
+	<num_paths>
+	    The number of paths across which to distribute the I/O.
+
+	<region_size>
+	    The number of 512-byte sectors in a region. Each region can be redirected
+	    to any of the available paths.
+
+	<num_optional_args>
+	    The number of optional arguments. Currently, no optional arguments
+	    are supported and so this must be zero.
+
+	<dev_path>
+	    The block device that represents a specific path to the device.
+
+	<offset>
+	    The offset of the start of data on the specific <dev_path> (in units
+	    of 512-byte sectors). This number is added to the sector number when
+	    forwarding the request to the specific path. Typically it is zero.
+
+Messages
+========
+
+set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...
+
+Modify the region table by specifying which regions are redirected to
+which paths.
+
+<index>
+    The region number (region size was specified in constructor parameters).
+    If index is omitted, the next region (previous index + 1) is used.
+    Expressed in hexadecimal (WITHOUT any prefix like 0x).
+
+<path_nr>
+    The path number in the range 0 ... (<num_paths> - 1).
+    Expressed in hexadecimal (WITHOUT any prefix like 0x).
+
+R<n>,<m>
+    This parameter allows repetitive patterns to be loaded quickly. <n> and <m>
+    are hexadecimal numbers. The last <n> mappings are repeated in the next <m>
+    slots.
+
+Status
+======
+
+No status line is reported.
+
+Example
+=======
+
+Assume that you have volumes vg1/switch0, vg1/switch1 and vg1/switch2,
+all of the same size.
+
+Create a switch device with 64kB region size::
+
+    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
+	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"
+
+Set mappings for the first 7 entries to point to devices switch0, switch1,
+switch2, switch0, switch1, switch2, switch1::
+
+    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1
+
+Set repetitive mapping. This command::
+
+    dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
+
+is equivalent to::
+
+    dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
+	:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
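+
+The expansion of the R<n>,<m> shorthand can be checked with a small
+Python sketch.  The function name is hypothetical and the parser is a
+simplification: it requires the first mapping to carry an explicit
+index and assumes the mappings preceding R are consecutive.
+
```python
def expand_region_mappings(args):
    """Expand set_region_mappings arguments into (region, path) pairs.

    All numbers are hexadecimal without a 0x prefix.
    """
    mappings = []
    next_index = None
    for token in args.split():
        if token.startswith("R"):
            n_str, m_str = token[1:].split(",")
            n, m = int(n_str, 16), int(m_str, 16)
            if next_index is None or not 1 <= n <= len(mappings):
                raise ValueError("R needs at least <n> earlier mappings")
            # Repeat the paths of the last n mappings over the next m slots.
            pattern = [path for _, path in mappings[-n:]]
            for i in range(m):
                mappings.append((next_index, pattern[i % n]))
                next_index += 1
        else:
            index_str, path_str = token.split(":")
            if index_str:
                next_index = int(index_str, 16)
            elif next_index is None:
                raise ValueError("first mapping must give a region index")
            mappings.append((next_index, int(path_str, 16)))
            next_index += 1
    return mappings
```
+
+Under these assumptions, expanding "1000:1 :2 R2,10" gives the same 18
+mappings as the long form above, covering regions 0x1000 to 0x1011.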
diff --git a/Documentation/admin-guide/device-mapper/thin-provisioning.rst b/Documentation/admin-guide/device-mapper/thin-provisioning.rst
new file mode 100644
index 000000000000..bafebf79da4b
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/thin-provisioning.rst
@@ -0,0 +1,427 @@
+=================
+Thin provisioning
+=================
+
+Introduction
+============
+
+This document describes a collection of device-mapper targets that
+between them implement thin-provisioning and snapshots.
+
+The main highlight of this implementation, compared to the previous
+implementation of snapshots, is that it allows many virtual devices to
+be stored on the same data volume.  This simplifies administration and
+allows the sharing of data between volumes, thus reducing disk usage.
+
+Another significant feature is support for an arbitrary depth of
+recursive snapshots (snapshots of snapshots of snapshots ...).  The
+previous implementation of snapshots did this by chaining together
+lookup tables, and so performance was O(depth).  This new
+implementation uses a single data structure to avoid this degradation
+with depth.  Fragmentation may still be an issue, however, in some
+scenarios.
+
+Metadata is stored on a separate device from data, giving the
+administrator some freedom, for example to:
+
+- Improve metadata resilience by storing metadata on a mirrored volume
+  but data on a non-mirrored one.
+
+- Improve performance by storing the metadata on SSD.
+
+Status
+======
+
+These targets are considered safe for production use.  But different use
+cases will have different performance characteristics, for example due
+to fragmentation of the data volume.
+
+If you find this software is not performing as expected please mail
+dm-devel@redhat.com with details and we'll try our best to improve
+things for you.
+
+Userspace tools for checking and repairing the metadata have been fully
+developed and are available as 'thin_check' and 'thin_repair'.  The name
+of the package that provides these utilities varies by distribution (on
+a Red Hat distribution it is named 'device-mapper-persistent-data').
+
+Cookbook
+========
+
+This section describes some quick recipes for using thin provisioning.
+They use the dmsetup program to control the device-mapper driver
+directly.  End users will be advised to use a higher-level volume
+manager such as LVM2 once support has been added.
+
+Pool device
+-----------
+
+The pool device ties together the metadata volume and the data volume.
+It maps I/O linearly to the data volume and updates the metadata via
+two mechanisms:
+
+- Function calls from the thin targets
+
+- Device-mapper 'messages' from userspace which control the creation of new
+  virtual devices amongst other things.
+
+Setting up a fresh pool device
+------------------------------
+
+Setting up a pool device requires a valid metadata device, and a
+data device.  If you do not have an existing metadata device you can
+make one by zeroing the first 4k to indicate empty metadata::
+
+    dd if=/dev/zero of=$metadata_dev bs=4096 count=1
+
+The amount of metadata you need will vary according to how many blocks
+are shared between thin devices (i.e. through snapshots).  If you have
+less sharing than average you'll need a larger-than-average metadata device.
+
+As a guide, we suggest you calculate the number of bytes to use in the
+metadata device as 48 * $data_dev_size / $data_block_size but round it up
+to 2MB if the answer is smaller.  If you're creating large numbers of
+snapshots which are recording large amounts of change, you may find you
+need to increase this.
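+
+The sizing guideline above can be written as a small Python helper.
+This is a sketch only: the function name is hypothetical, "2MB" is read
+as 2 MiB, and both sizes are assumed to be given in the same unit
+(512-byte sectors in the example values).
+
```python
MIN_METADATA_BYTES = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB


def suggested_metadata_bytes(data_dev_size, data_block_size):
    """Apply 48 * data_dev_size / data_block_size with the 2MB floor.

    Both arguments must use the same unit (512-byte sectors here).
    """
    estimate = -(-48 * data_dev_size // data_block_size)  # round up
    return max(estimate, MIN_METADATA_BYTES)
```
+
+For a data device of 20971520 sectors, a block size of 128 sectors gives
+7864320 bytes, while a block size of 1024 sectors gives 983040 bytes,
+which is rounded up to the 2MB minimum.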
+
+The largest size supported is 16GB.  If the device is larger,
+a warning will be issued and the excess space will not be used.
+
+Reloading a pool table
+----------------------
+
+You may reload a pool's table; indeed, this is how the pool is resized
+if it runs out of space.  (N.B. While specifying a different metadata
+device when reloading is not forbidden at the moment, things will go
+wrong if it does not route I/O to exactly the same on-disk location as
+previously.)
+
+Using an existing pool device
+-----------------------------
+
+::
+
+    dmsetup create pool \
+	--table "0 20971520 thin-pool $metadata_dev $data_dev \
+		 $data_block_size $low_water_mark"
+
+$data_block_size gives the smallest unit of disk space that can be
+allocated at a time expressed in units of 512-byte sectors.
+$data_block_size must be between 128 (64KB) and 2097152 (1GB) and a
+multiple of 128 (64KB).  $data_block_size cannot be changed after the
+thin-pool is created.  People primarily interested in thin provisioning
+may want to use a value such as 1024 (512KB).  People doing lots of
+snapshotting may want a smaller value such as 128 (64KB).  If you are
+not zeroing newly-allocated data, a larger $data_block_size in the
+region of 256000 (128MB) is suggested.
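+
+The constraints on $data_block_size can be checked before a table is
+loaded.  The following Python sketch uses hypothetical helper names and
+encodes only the limits stated above.
+
```python
MIN_BLOCK_SECTORS = 128        # 64KB
MAX_BLOCK_SECTORS = 2097152    # 1GB
SECTOR_BYTES = 512


def is_valid_data_block_size(sectors):
    """Check the range and the multiple-of-128 rule for $data_block_size."""
    in_range = MIN_BLOCK_SECTORS <= sectors <= MAX_BLOCK_SECTORS
    return in_range and sectors % MIN_BLOCK_SECTORS == 0


def block_size_bytes(sectors):
    """Convert a block size in 512-byte sectors to bytes."""
    return sectors * SECTOR_BYTES
```
+
+The values suggested above (128, 1024 and 256000 sectors) all pass this
+check, while 64 and 192 do not.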
+
+$low_water_mark is expressed in blocks of size $data_block_size.  If
+free space on the data device drops below this level then a dm event
+will be triggered which a userspace daemon should catch allowing it to
+extend the pool device.  Only one such event will be sent.
+
+No special event is triggered if a just-resumed device's free space is below
+the low water mark. However, resuming a device always triggers an
+event; a userspace daemon should verify that free space exceeds the low
+water mark when handling this event.
+
+A low water mark for the metadata device is maintained in the kernel and
+will trigger a dm event if free space on the metadata device drops below
+it.
+
+Updating on-disk metadata
+-------------------------
+
+On-disk metadata is committed every time a FLUSH or FUA bio is written.
+If no such requests are made then commits will occur every second.  This
+means the thin-provisioning target behaves like a physical disk that has
+a volatile write cache.  If power is lost you may lose some recent
+writes.  The metadata should always be consistent in spite of any crash.
+
+If data space is exhausted the pool will either error or queue IO
+according to the configuration (see: error_if_no_space).  If metadata
+space is exhausted or a metadata operation fails: the pool will error IO
+until the pool is taken offline and repair is performed to 1) fix any
+potential inconsistencies and 2) clear the flag that imposes repair.
+Once the pool's metadata device is repaired it may be resized, which
+will allow the pool to return to normal operation.  Note that if a pool
+is flagged as needing repair, the pool's data and metadata devices
+cannot be resized until repair is performed.  It should also be noted
+that when the pool's metadata space is exhausted the current metadata
+transaction is aborted.  Given that the pool will cache IO whose
+completion may have already been acknowledged to upper IO layers
+(e.g. filesystem) it is strongly suggested that consistency checks
+(e.g. fsck) be performed on those layers when repair of the pool is
+required.
+
+Thin provisioning
+-----------------
+
+i) Creating a new thinly-provisioned volume.
+
+  To create a new thinly-provisioned volume you must send a message to an
+  active pool device, /dev/mapper/pool in this example::
+
+    dmsetup message /dev/mapper/pool 0 "create_thin 0"
+
+  Here '0' is an identifier for the volume, a 24-bit number.  It's up
+  to the caller to allocate and manage these identifiers.  If the
+  identifier is already in use, the message will fail with -EEXIST.
+
+ii) Using a thinly-provisioned volume.
+
+  Thinly-provisioned volumes are activated using the 'thin' target::
+
+    dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"
+
+  The last parameter is the identifier for the thinp device.
+
+Internal snapshots
+------------------
+
+i) Creating an internal snapshot.
+
+  Snapshots are created with another message to the pool.
+
+  N.B.  If the origin device that you wish to snapshot is active, you
+  must suspend it before creating the snapshot to avoid corruption.
+  This is NOT enforced at the moment, so please be careful!
+
+  ::
+
+    dmsetup suspend /dev/mapper/thin
+    dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
+    dmsetup resume /dev/mapper/thin
+
+  Here '1' is the identifier for the volume, a 24-bit number.  '0' is the
+  identifier for the origin device.
+
+ii) Using an internal snapshot.
+
+  Once created, the user doesn't have to worry about any connection
+  between the origin and the snapshot.  Indeed the snapshot is no
+  different from any other thinly-provisioned device and can be
+  snapshotted itself via the same method.  It's perfectly legal to
+  have only one of them active, and there's no ordering requirement on
+  activating or removing them both.  (This differs from conventional
+  device-mapper snapshots.)
+
+  Activate it exactly the same way as any other thinly-provisioned volume::
+
+    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
+
+External snapshots
+------------------
+
+You can use an external **read only** device as an origin for a
+thinly-provisioned volume.  Any read to an unprovisioned area of the
+thin device will be passed through to the origin.  Writes trigger
+the allocation of new blocks as usual.
+
+One use case for this is VM hosts that want to run guests on
+thinly-provisioned volumes but have the base image on another device
+(possibly shared between many VMs).
+
+You must not write to the origin device if you use this technique!
+Of course, you may write to the thin device and take internal snapshots
+of the thin volume.
+
+i) Creating a snapshot of an external device
+
+  This is the same as creating a thin device.
+  You don't mention the origin at this stage.
+
+  ::
+
+    dmsetup message /dev/mapper/pool 0 "create_thin 0"
+
+ii) Using a snapshot of an external device.
+
+  Append an extra parameter to the thin target specifying the origin::
+
+    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
+
+  N.B. All descendants (internal snapshots) of this snapshot require the
+  same extra origin parameter.
+
+Deactivation
+------------
+
+All devices using a pool must be deactivated before the pool itself
+can be.
+
+::
+
+    dmsetup remove thin
+    dmsetup remove snap
+    dmsetup remove pool
+
+Reference
+=========
+
+'thin-pool' target
+------------------
+
+i) Constructor
+
+    ::
+
+      thin-pool <metadata dev> <data dev> <data block size (sectors)> \
+	        <low water mark (blocks)> [<number of feature args> [<arg>]*]
+
+    Optional feature arguments:
+
+      skip_block_zeroing:
+	Skip the zeroing of newly-provisioned blocks.
+
+      ignore_discard:
+	Disable discard support.
+
+      no_discard_passdown:
+	Don't pass discards down to the underlying
+	data device, but just remove the mapping.
+
+      read_only:
+	Don't allow any changes to be made to the pool
+	metadata.  This mode is only available after the
+	thin-pool has been created and first used in full
+	read/write mode.  It cannot be specified on initial
+	thin-pool creation.
+
+      error_if_no_space:
+	Error IOs, instead of queueing, if no space.
+
+    Data block size must be between 64KB (128 sectors) and 1GB
+    (2097152 sectors) inclusive.
+
+
+ii) Status
+
+    ::
+
+      <transaction id> <used metadata blocks>/<total metadata blocks>
+      <used data blocks>/<total data blocks> <held metadata root>
+      ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
+      needs_check|- metadata_low_watermark
+
+    transaction id:
+	A 64-bit number used by userspace to help synchronise with metadata
+	from volume managers.
+
+    used data blocks / total data blocks
+	If the number of free blocks drops below the pool's low water mark a
+	dm event will be sent to userspace.  This event is edge-triggered and
+	it will occur only once after each resume so volume manager writers
+	should register for the event and then check the target's status.
+
+    held metadata root:
+	The location, in blocks, of the metadata root that has been
+	'held' for userspace read access.  '-' indicates there is no
+	held root.
+
+    discard_passdown|no_discard_passdown
+	Whether or not discards are actually being passed down to the
+	underlying device.  Even if passdown is enabled when the table is
+	loaded, it can be disabled if the underlying device doesn't support it.
+
+    ro|rw|out_of_data_space
+	If the pool encounters certain types of device failures it will
+	drop into a read-only metadata mode in which no changes to
+	the pool metadata (like allocating new blocks) are permitted.
+
+	In serious cases where even a read-only mode is deemed unsafe
+	no further I/O will be permitted and the status will just
+	contain the string 'Fail'.  The userspace recovery tools
+	should then be used.
+
+    error_if_no_space|queue_if_no_space
+	If the pool runs out of data or metadata space, the pool will
+	either queue or error the IO destined to the data device.  The
+	default is to queue the IO until more space is added or the
+	'no_space_timeout' expires.  The 'no_space_timeout' dm-thin-pool
+	module parameter can be used to change this timeout -- it
+	defaults to 60 seconds but may be disabled using a value of 0.
+
+    needs_check
+	A metadata operation has failed, resulting in the needs_check
+	flag being set in the metadata's superblock.  The metadata
+	device must be deactivated and checked/repaired before the
+	thin-pool can be made fully operational again.  '-' indicates
+	needs_check is not set.
+
+    metadata_low_watermark:
+	Value of metadata low watermark in blocks.  The kernel sets this
+	value internally but userspace needs to know this value to
+	determine if an event was caused by crossing this threshold.
+
+iii) Messages
+
+    create_thin <dev id>
+	Create a new thinly-provisioned device.
+	<dev id> is an arbitrary unique 24-bit identifier chosen by
+	the caller.
+
+    create_snap <dev id> <origin id>
+	Create a new snapshot of another thinly-provisioned device.
+	<dev id> is an arbitrary unique 24-bit identifier chosen by
+	the caller.
+	<origin id> is the identifier of the thinly-provisioned device
+	of which the new device will be a snapshot.
+
+    delete <dev id>
+	Deletes a thin device.  Irreversible.
+
+    set_transaction_id <current id> <new id>
+	Userland volume managers, such as LVM, need a way to
+	synchronise their external metadata with the internal metadata of the
+	pool target.  The thin-pool target offers to store an
+	arbitrary 64-bit transaction id and return it on the target's
+	status line.  To avoid races you must provide what you think
+	the current transaction id is when you change it with this
+	compare-and-swap message.
+
+    reserve_metadata_snap
+        Reserve a copy of the data mapping btree for use by userland.
+        This allows userland to inspect the mappings as they were when
+        this message was executed.  Use the pool's status command to
+        get the root block associated with the metadata snapshot.
+
+    release_metadata_snap
+        Release a previously reserved copy of the data mapping btree.
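+
+For illustration, these messages are sent with "dmsetup message", in the same
+way as the create_thin example earlier.  The pool path and the transaction
+ids below are placeholders::
+
+    # compare-and-swap: succeeds only if the current transaction id is 0
+    dmsetup message /dev/mapper/pool 0 "set_transaction_id 0 1"
+
+    dmsetup message /dev/mapper/pool 0 "reserve_metadata_snap"
+    dmsetup status /dev/mapper/pool     # <held metadata root> is now set
+    dmsetup message /dev/mapper/pool 0 "release_metadata_snap"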
+
+'thin' target
+-------------
+
+i) Constructor
+
+    ::
+
+        thin <pool dev> <dev id> [<external origin dev>]
+
+    pool dev:
+	the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
+
+    dev id:
+	the internal device identifier of the device to be
+	activated.
+
+    external origin dev:
+	an optional block device outside the pool to be treated as a
+	read-only snapshot origin: reads to unprovisioned areas of the
+	thin target will be mapped to this device.
+
+The pool doesn't store any size against the thin devices.  If you
+load a thin target that is smaller than you've been using previously,
+then you'll have no access to blocks mapped beyond the end.  If you
+load a target that is bigger than before, then extra blocks will be
+provisioned as and when needed.
+
+ii) Status
+
+    <nr mapped sectors> <highest mapped sector>
+	If the pool has encountered device errors and failed, the status
+	will just contain the string 'Fail'.  The userspace recovery
+	tools should then be used.
+
+    In the case where <nr mapped sectors> is 0, there is no highest
+    mapped sector and the value of <highest mapped sector> is unspecified.
diff --git a/Documentation/admin-guide/device-mapper/unstriped.rst b/Documentation/admin-guide/device-mapper/unstriped.rst
new file mode 100644
index 000000000000..0a8d3eb3f072
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/unstriped.rst
@@ -0,0 +1,135 @@
+================================
+Device-mapper "unstriped" target
+================================
+
+Introduction
+============
+
+The device-mapper "unstriped" target provides a transparent mechanism to
+unstripe a device-mapper "striped" target to access the underlying disks
+without having to touch the true backing block-device.  It can also be
+used to unstripe a hardware RAID-0 to access backing disks.
+
+Parameters:
+<number of stripes> <chunk size> <stripe #> <dev_path> <offset>
+
+<number of stripes>
+        The number of stripes in the RAID 0.
+
+<chunk size>
+	The number of 512-byte sectors in each chunk.
+
+<stripe #>
+        The stripe number within the device that corresponds to the physical
+        drive you wish to unstripe.  This must be 0 indexed.
+
+<dev_path>
+	The block device you wish to unstripe.
+
+
+Why use this module?
+====================
+
+An example of undoing an existing dm-stripe
+-------------------------------------------
+
+This small bash script will set up 4 loop devices and use the existing
+striped target to combine the 4 devices into one.  It will then use
+the unstriped target on top of the striped device to access the
+individual backing loop devices.  We write data to the newly exposed
+unstriped devices and verify the data written matches the correct
+underlying device on the striped array::
+
+  #!/bin/bash
+
+  MEMBER_SIZE=$((128 * 1024 * 1024))
+  NUM=4
+  SEQ_END=$((${NUM}-1))
+  CHUNK=256
+  BS=4096
+
+  RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
+  DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
+  COUNT=$((${MEMBER_SIZE} / ${BS}))
+
+  for i in $(seq 0 ${SEQ_END}); do
+    dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
+    losetup /dev/loop${i} member-${i}
+    DM_PARMS+=" /dev/loop${i} 0"
+  done
+
+  echo $DM_PARMS | dmsetup create raid0
+  for i in $(seq 0 ${SEQ_END}); do
+    echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
+  done;
+
+  for i in $(seq 0 ${SEQ_END}); do
+    dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
+    diff /dev/mapper/set-${i} member-${i}
+  done;
+
+  for i in $(seq 0 ${SEQ_END}); do
+    dmsetup remove set-${i}
+  done
+
+  dmsetup remove raid0
+
+  for i in $(seq 0 ${SEQ_END}); do
+    losetup -d /dev/loop${i}
+    rm -f member-${i}
+  done
+
+Another example
+---------------
+
+Intel NVMe drives contain two cores on the physical device.
+Each core of the drive has segregated access to its LBA range.
+The current LBA model has a RAID 0 128k chunk on each core, resulting
+in a 256k stripe across the two cores::
+
+   Core 0:       Core 1:
+  __________    __________
+  | LBA 512|    | LBA 768|
+  | LBA 0  |    | LBA 256|
+  ----------    ----------
+
+The purpose of this unstriping is to provide better QoS in noisy
+neighbor environments. When two partitions are created on the
+aggregate drive without this unstriping, reads on one partition
+can affect writes on another partition.  This is because the partitions
+are striped across the two cores.  When we unstripe this hardware RAID 0
+and make partitions on each newly exposed device, the two partitions are
+physically separated.
+
+With the dm-unstriped target we're able to segregate an fio script that
+has read and write jobs that are independent of each other.  Compared to
+when we run the test on a combined drive with partitions, we were able
+to get a 92% reduction in read latency using this device mapper target.
+
+
+Example dmsetup usage
+=====================
+
+unstriped on top of Intel NVMe device that has 2 cores
+------------------------------------------------------
+
+::
+
+  dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
+  dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'
+
+There will now be two devices that expose Intel NVMe core 0 and 1
+respectively::
+
+  /dev/mapper/nvmset0
+  /dev/mapper/nvmset1
+
+unstriped on top of striped with 4 drives using 128K chunk size
+---------------------------------------------------------------
+
+::
+
+  dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
+  dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
+  dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
+  dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
diff --git a/Documentation/admin-guide/device-mapper/verity.rst b/Documentation/admin-guide/device-mapper/verity.rst
new file mode 100644
index 000000000000..a4d1c1476d72
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/verity.rst
@@ -0,0 +1,229 @@
+=========
+dm-verity
+=========
+
+Device-Mapper's "verity" target provides transparent integrity checking of
+block devices using a cryptographic digest provided by the kernel crypto API.
+This target is read-only.
+
+Construction Parameters
+=======================
+
+::
+
+    <version> <dev> <hash_dev>
+    <data_block_size> <hash_block_size>
+    <num_data_blocks> <hash_start_block>
+    <algorithm> <digest> <salt>
+    [<#opt_params> <opt_params>]
+
+<version>
+    This is the type of the on-disk hash format.
+
+    0 is the original format used in Chromium OS.
+      The salt is appended when hashing, digests are stored continuously and
+      the rest of the block is padded with zeroes.
+
+    1 is the current format that should be used for new devices.
+      The salt is prepended when hashing and each digest is
+      padded with zeroes to the next power of two.
+
+<dev>
+    This is the device containing data, the integrity of which needs to be
+    checked.  It may be specified as a path, like /dev/sdaX, or a device number,
+    <major>:<minor>.
+
+<hash_dev>
+    This is the device that supplies the hash tree data.  It may be
+    specified similarly to the device path and may be the same device.  If the
+    same device is used, <hash_start_block> should be outside the configured
+    dm-verity device.
+
+<data_block_size>
+    The block size on a data device in bytes.
+    Each block corresponds to one digest on the hash device.
+
+<hash_block_size>
+    The size of a hash block in bytes.
+
+<num_data_blocks>
+    The number of data blocks on the data device.  Additional blocks are
+    inaccessible.  You can place hashes on the same partition as the data; in
+    this case the hashes are placed after <num_data_blocks>.
+
+<hash_start_block>
+    This is the offset, in <hash_block_size>-blocks, from the start of hash_dev
+    to the root block of the hash tree.
+
+<algorithm>
+    The cryptographic hash algorithm used for this device.  This should
+    be the name of the algorithm, like "sha1".
+
+<digest>
+    The hexadecimal encoding of the cryptographic hash of the root hash block
+    and the salt.  This hash should be trusted as there is no other authenticity
+    beyond this point.
+
+<salt>
+    The hexadecimal encoding of the salt value.
+
+<#opt_params>
+    Number of optional parameters. If there are no optional parameters,
+    the optional parameters section can be skipped or #opt_params can be zero.
+    Otherwise #opt_params is the number of following arguments.
+
+    Example of optional parameters section:
+        1 ignore_corruption
+
+ignore_corruption
+    Log corrupted blocks, but allow read operations to proceed normally.
+
+restart_on_corruption
+    Restart the system when a corrupted block is discovered. This option is
+    not compatible with ignore_corruption and requires user space support to
+    avoid restart loops.
+
+ignore_zero_blocks
+    Do not verify blocks that are expected to contain zeroes and always return
+    zeroes instead. This may be useful if the partition contains unused blocks
+    that are not guaranteed to contain zeroes.
+
+use_fec_from_device <fec_dev>
+    Use forward error correction (FEC) to recover from corruption if hash
+    verification fails. Use encoding data from the specified device. This
+    may be the same device where data and hash blocks reside, in which case
+    fec_start must be outside data and hash areas.
+
+    If the encoding data covers additional metadata, it must be accessible
+    on the hash device after the hash blocks.
+
+    Note: block sizes for data and hash devices must match. Also, if the
+    verity <dev> is encrypted the <fec_dev> should be too.
+
+fec_roots <num>
+    Number of generator roots. This equals the number of parity bytes in
+    the encoding data. For example, in RS(M, N) encoding, the number of roots
+    is M-N.
+
+fec_blocks <num>
+    The number of encoding data blocks on the FEC device. The block size for
+    the FEC device is <data_block_size>.
+
+fec_start <offset>
+    This is the offset, in <data_block_size> blocks, from the start of the
+    FEC device to the beginning of the encoding data.
+
+check_at_most_once
+    Verify data blocks only the first time they are read from the data device,
+    rather than every time.  This reduces the overhead of dm-verity so that it
+    can be used on systems that are memory and/or CPU constrained.  However, it
+    provides a reduced level of security because only offline tampering of the
+    data device's content will be detected, not online tampering.
+
+    Hash blocks are still verified each time they are read from the hash device,
+    since verification of hash blocks is less performance critical than data
+    blocks, and a hash block will not be verified any more after all the data
+    blocks it covers have been verified anyway.
+
+Theory of operation
+===================
+
+dm-verity is meant to be set up as part of a verified boot path.  This
+may be anything ranging from a boot using tboot or trustedgrub to just
+booting from a known-good device (like a USB drive or CD).
+
+When a dm-verity device is configured, it is expected that the caller
+has been authenticated in some way (cryptographic signatures, etc).
+After instantiation, all hashes will be verified on-demand during
+disk access.  If they cannot be verified up to the root node of the
+tree, the root hash, then the I/O will fail.  This should detect
+tampering with any data on the device and the hash data.
+
+Cryptographic hashes are used to assert the integrity of the device on a
+per-block basis. This allows for a lightweight hash computation on first read
+into the page cache. Block hashes are stored linearly, aligned to the nearest
+block size.
+
+If forward error correction (FEC) support is enabled any recovery of
+corrupted data will be verified using the cryptographic hash of the
+corresponding data. This is why combining error correction with
+integrity checking is essential.
+
+Hash Tree
+---------
+
+Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
+of some data block on disk is calculated. If it is an intermediary node,
+the hash of a number of child nodes is calculated.
+
+Each entry in the tree is a collection of neighboring nodes that fit in one
+block.  The number is determined based on block_size and the size of the
+selected cryptographic digest algorithm.  The hashes are linearly-ordered in
+this entry and any unaligned trailing space is ignored but included when
+calculating the parent node.
+
+The tree looks something like:
+
+	alg = sha256, num_blocks = 32768, block_size = 4096
+
+::
+
+                                 [   root    ]
+                                /    . . .    \
+                     [entry_0]                 [entry_1]
+                    /  . . .  \                 . . .   \
+         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
+           / ... \             /   . . .  \             /           \
+     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
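+
+As a rough check of the figure above, the number of hash blocks per level can
+be worked out with shell arithmetic.  This is only an illustrative sketch: the
+variable names are made up here, and it assumes 32-byte sha256 digests packed
+into 4096-byte hash blocks (128 digests per block)::
+
+  block_size=4096; digest_size=32; n=32768    # n starts as the data block count
+  per_block=$((block_size / digest_size))     # digests that fit in one hash block
+  while [ "$n" -gt 1 ]; do
+    n=$(( (n + per_block - 1) / per_block ))  # hash blocks needed at this level
+    echo "$n"
+  done
+
+With these values the loop prints 256, 2 and 1: 256 leaf-level hash blocks
+(entry_0_0 to entry_1_127), two blocks above them (entry_0 and entry_1), and
+the root.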
+
+
+On-disk format
+==============
+
+The verity kernel code does not read the verity metadata on-disk header.
+It only reads the hash blocks which directly follow the header.
+It is expected that a user-space tool will verify the integrity of the
+verity header.
+
+Alternatively, the header can be omitted and the dmsetup parameters can
+be passed via the kernel command-line in a rooted chain of trust where
+the command-line is verified.
+
+Directly following the header (and with sector number padded to the next hash
+block boundary) are the hash blocks which are stored a depth at a time
+(starting from the root), sorted in order of increasing index.
+
+The full specification of kernel parameters and on-disk metadata format
+is available at the cryptsetup project's wiki page:
+
+  https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity
+
+Status
+======
+V (for Valid) is returned if every check performed so far was valid.
+If any check failed, C (for Corruption) is returned.
+
+Example
+=======
+Set up a device::
+
+  # dmsetup create vroot --readonly --table \
+    "0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
+    "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
+    "1234000000000000000000000000000000000000000000000000000000000000"
+
+A command line tool veritysetup is available to compute or verify
+the hash tree or activate the kernel device. This is available from
+the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/
+(as a libcryptsetup extension).
+
+Create hash on the device::
+
+  # veritysetup format /dev/sda1 /dev/sda2
+  ...
+  Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
+
+Activate the device::
+
+  # veritysetup create vroot /dev/sda1 /dev/sda2 \
+    4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
diff --git a/Documentation/admin-guide/device-mapper/writecache.rst b/Documentation/admin-guide/device-mapper/writecache.rst
new file mode 100644
index 000000000000..d3d7690f5e8d
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/writecache.rst
@@ -0,0 +1,79 @@
+=================
+Writecache target
+=================
+
+The writecache target caches writes on persistent memory or on SSD. It
+doesn't cache reads because reads are supposed to be cached in the page
+cache in normal RAM.
+
+When the device is constructed, the first sector should either be zeroed or
+contain a valid superblock from a previous invocation.
+
+Constructor parameters:
+
+1. type of the cache device - "p" or "s"
+
+	- p - persistent memory
+	- s - SSD
+2. the underlying device that will be cached
+3. the cache device
+4. block size (4096 is recommended; the maximum block size is the page
+   size)
+5. the number of optional parameters (the parameters with an argument
+   count as two)
+
+	start_sector n		(default: 0)
+		offset from the start of cache device in 512-byte sectors
+	high_watermark n	(default: 50)
+		start writeback when the number of used blocks reaches this
+		watermark
+	low_watermark x		(default: 45)
+		stop writeback when the number of used blocks drops below
+		this watermark
+	writeback_jobs n	(default: unlimited)
+		limit the number of blocks that are in flight during
+		writeback. Setting this value reduces writeback
+		throughput, but it may improve latency of read requests
+	autocommit_blocks n	(default: 64 for pmem, 65536 for ssd)
+		when the application writes this number of blocks without
+		issuing the FLUSH request, the blocks are automatically
+		committed
+	autocommit_time ms	(default: 1000)
+		autocommit time in milliseconds. The data is automatically
+		committed if this time passes and no FLUSH request is
+		received
+	fua			(by default on)
+		applicable only to persistent memory - use the FUA flag
+		when writing data from persistent memory back to the
+		underlying device
+	nofua
+		applicable only to persistent memory - don't use the FUA
+		flag when writing back data and send the FLUSH request
+		afterwards
+
+		- some underlying devices perform better with fua, some
+		  with nofua. The user should test it
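+
+As an illustration only (the device paths and the name "wc" are placeholders,
+not taken from this document), a table with no optional parameters for an
+SSD-backed cache with 4096-byte blocks might be created like this::
+
+	# origin: /dev/sdb, cache: /dev/sdc; the table length is the origin size
+	dmsetup create wc --table \
+	  "0 $(blockdev --getsz /dev/sdb) writecache s /dev/sdb /dev/sdc 4096 0"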
+
+Status:
+1. error indicator - 0 if there was no error, otherwise error number
+2. the number of blocks
+3. the number of free blocks
+4. the number of blocks under writeback
+
+Messages:
+	flush
+		flush the cache device. The message returns successfully
+		if the cache device was flushed without an error
+	flush_on_suspend
+		flush the cache device on next suspend. Use this message
+		when you are going to remove the cache device. The proper
+		sequence for removing the cache device is:
+
+		1. send the "flush_on_suspend" message
+		2. load an inactive table with a linear target that maps
+		   to the underlying device
+		3. suspend the device
+		4. ask for status and verify that there are no errors
+		5. resume the device, so that it will use the linear
+		   target
+		6. the cache device is now inactive and it can be deleted
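+
+The removal sequence above might look like this with dmsetup.  This is a
+sketch under assumptions: "wc" and /dev/sdb are placeholder names for the
+writecache device and its underlying device::
+
+	dmsetup message wc 0 flush_on_suspend
+	dmsetup load wc --table "0 $(blockdev --getsz /dev/sdb) linear /dev/sdb 0"
+	dmsetup suspend wc
+	dmsetup status wc      # the error indicator field should be 0
+	dmsetup resume wc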
diff --git a/Documentation/admin-guide/device-mapper/zero.rst b/Documentation/admin-guide/device-mapper/zero.rst
new file mode 100644
index 000000000000..11fb5cf4597c
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/zero.rst
@@ -0,0 +1,37 @@
+=======
+dm-zero
+=======
+
+Device-Mapper's "zero" target provides a block-device that always returns
+zero'd data on reads and silently drops writes. This is similar behavior to
+/dev/zero, but as a block-device instead of a character-device.
+
+Dm-zero has no target-specific parameters.
+
+One very interesting use of dm-zero is for creating "sparse" devices in
+conjunction with dm-snapshot. A sparse device reports a device-size larger
+than the amount of actual storage space available for that device. A user can
+write data anywhere within the sparse device and read it back like a normal
+device. Reads to previously unwritten areas will return a zero'd buffer. When
+enough data has been written to fill up the actual storage space, the sparse
+device is deactivated. This can be very useful for testing device and
+filesystem limitations.
+
+To create a sparse device, start by creating a dm-zero device that's the
+desired size of the sparse device. For this example, we'll assume a 10TB
+sparse device::
+
+  TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2`   # 10 TB in sectors
+  echo "0 $TEN_TERABYTES zero" | dmsetup create zero1
+
+Then create a snapshot of the zero device, using any available block-device as
+the COW device. The size of the COW device will determine the amount of real
+space available to the sparse device. For this example, we'll assume /dev/sdb1
+is an available 10GB partition::
+
+  echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \
+     dmsetup create sparse1
+
+This will create a 10TB sparse device called /dev/mapper/sparse1 that has
+10GB of actual storage space available. If more than 10GB of data is written
+to this device, it will start returning I/O errors.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index abc2c4e83939..64e97a969857 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -80,6 +80,7 @@ configure specific aspects of kernel behavior to your liking.
    namespaces/index
    perf-security
    acpi/index
+   device-mapper/index
 
 .. only::  subproject and html
 
diff --git a/Documentation/device-mapper/cache-policies.rst b/Documentation/device-mapper/cache-policies.rst
deleted file mode 100644
index b17fe352fc41..000000000000
--- a/Documentation/device-mapper/cache-policies.rst
+++ /dev/null
@@ -1,131 +0,0 @@
-=============================
-Guidance for writing policies
-=============================
-
-Try to keep transactionality out of it.  The core is careful to
-avoid asking about anything that is migrating.  This is a pain, but
-makes it easier to write the policies.
-
-Mappings are loaded into the policy at construction time.
-
-Every bio that is mapped by the target is referred to the policy.
-The policy can return a simple HIT or MISS or issue a migration.
-
-Currently there's no way for the policy to issue background work,
-e.g. to start writing back dirty blocks that are going to be evicted
-soon.
-
-Because we map bios, rather than requests it's easy for the policy
-to get fooled by many small bios.  For this reason the core target
-issues periodic ticks to the policy.  It's suggested that the policy
-doesn't update states (eg, hit counts) for a block more than once
-for each tick.  The core ticks by watching bios complete, and so
-trying to see when the io scheduler has let the ios run.
-
-
-Overview of supplied cache replacement policies
-===============================================
-
-multiqueue (mq)
----------------
-
-This policy is now an alias for smq (see below).
-
-The following tunables are accepted, but have no effect::
-
-	'sequential_threshold <#nr_sequential_ios>'
-	'random_threshold <#nr_random_ios>'
-	'read_promote_adjustment <value>'
-	'write_promote_adjustment <value>'
-	'discard_promote_adjustment <value>'
-
-Stochastic multiqueue (smq)
----------------------------
-
-This policy is the default.
-
-The stochastic multi-queue (smq) policy addresses some of the problems
-with the multiqueue (mq) policy.
-
-The smq policy (vs mq) offers the promise of less memory utilization,
-improved performance and increased adaptability in the face of changing
-workloads.  smq also does not have any cumbersome tuning knobs.
-
-Users may switch from "mq" to "smq" simply by appropriately reloading a
-DM table that is using the cache target.  Doing so will cause all of the
-mq policy's hints to be dropped.  Also, performance of the cache may
-degrade slightly until smq recalculates the origin device's hotspots
-that should be cached.
-
-Memory usage
-^^^^^^^^^^^^
-
-The mq policy used a lot of memory; 88 bytes per cache block on a 64
-bit machine.
-
-smq uses 28bit indexes to implement its data structures rather than
-pointers.  It avoids storing an explicit hit count for each block.  It
-has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
-the entries (each hotspot block covers a larger area than a single
-cache block).
-
-All this means smq uses ~25bytes per cache block.  Still a lot of
-memory, but a substantial improvement nontheless.
-
-Level balancing
-^^^^^^^^^^^^^^^
-
-mq placed entries in different levels of the multiqueue structures
-based on their hit count (~ln(hit count)).  This meant the bottom
-levels generally had the most entries, and the top ones had very
-few.  Having unbalanced levels like this reduced the efficacy of the
-multiqueue.
-
-smq does not maintain a hit count, instead it swaps hit entries with
-the least recently used entry from the level above.  The overall
-ordering being a side effect of this stochastic process.  With this
-scheme we can decide how many entries occupy each multiqueue level,
-resulting in better promotion/demotion decisions.
-
-Adaptability:
-The mq policy maintained a hit count for each cache block.  For a
-different block to get promoted to the cache its hit count has to
-exceed the lowest currently in the cache.  This meant it could take a
-long time for the cache to adapt between varying IO patterns.
-
-smq doesn't maintain hit counts, so a lot of this problem just goes
-away.  In addition it tracks performance of the hotspot queue, which
-is used to decide which blocks to promote.  If the hotspot queue is
-performing badly then it starts moving entries more quickly between
-levels.  This lets it adapt to new IO patterns very quickly.
-
-Performance
-^^^^^^^^^^^
-
-Testing smq shows substantially better performance than mq.
-
-cleaner
--------
-
-The cleaner writes back all dirty blocks in a cache to decommission it.
-
-Examples
-========
-
-The syntax for a table is::
-
-	cache <metadata dev> <cache dev> <origin dev> <block size>
-	<#feature_args> [<feature arg>]*
-	<policy> <#policy_args> [<policy arg>]*
-
-The syntax to send a message using the dmsetup command is::
-
-	dmsetup message <mapped device> 0 sequential_threshold 1024
-	dmsetup message <mapped device> 0 random_threshold 8
-
-Using dmsetup::
-
-	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
-	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
-	creates a 128GB large mapped device named 'blah' with the
-	sequential threshold set to 1024 and the random_threshold set to 8.
diff --git a/Documentation/device-mapper/cache.rst b/Documentation/device-mapper/cache.rst
deleted file mode 100644
index f15e5254d05b..000000000000
--- a/Documentation/device-mapper/cache.rst
+++ /dev/null
@@ -1,337 +0,0 @@
-=====
-Cache
-=====
-
-Introduction
-============
-
-dm-cache is a device mapper target written by Joe Thornber, Heinz
-Mauelshagen, and Mike Snitzer.
-
-It aims to improve performance of a block device (eg, a spindle) by
-dynamically migrating some of its data to a faster, smaller device
-(eg, an SSD).
-
-This device-mapper solution allows us to insert this caching at
-different levels of the dm stack, for instance above the data device for
-a thin-provisioning pool.  Caching solutions that are integrated more
-closely with the virtual memory system should give better performance.
-
-The target reuses the metadata library used in the thin-provisioning
-library.
-
-The decision as to what data to migrate and when is left to a plug-in
-policy module.  Several of these have been written as we experiment,
-and we hope other people will contribute others for specific io
-scenarios (eg. a vm image server).
-
-Glossary
-========
-
-  Migration
-	       Movement of the primary copy of a logical block from one
-	       device to the other.
-  Promotion
-	       Migration from slow device to fast device.
-  Demotion
-	       Migration from fast device to slow device.
-
-The origin device always contains a copy of the logical block, which
-may be out of date or kept in sync with the copy on the cache device
-(depending on policy).
-
-Design
-======
-
-Sub-devices
------------
-
-The target is constructed by passing three devices to it (along with
-other parameters detailed later):
-
-1. An origin device - the big, slow one.
-
-2. A cache device - the small, fast one.
-
-3. A small metadata device - records which blocks are in the cache,
-   which are dirty, and extra hints for use by the policy object.
-   This information could be put on the cache device, but having it
-   separate allows the volume manager to configure it differently,
-   e.g. as a mirror for extra robustness.  This metadata device may only
-   be used by a single cache device.
-
-Fixed block size
-----------------
-
-The origin is divided up into blocks of a fixed size.  This block size
-is configurable when you first create the cache.  Typically we've been
-using block sizes of 256KB - 1024KB.  The block size must be between 64
-sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).
-
-Having a fixed block size simplifies the target a lot.  But it is
-something of a compromise.  For instance, a small part of a block may be
-getting hit a lot, yet the whole block will be promoted to the cache.
-So large block sizes are bad because they waste cache space.  And small
-block sizes are bad because they increase the amount of metadata (both
-in core and on disk).
-
-Cache operating modes
----------------------
-
-The cache has three operating modes: writeback, writethrough and
-passthrough.
-
-If writeback, the default, is selected then a write to a block that is
-cached will go only to the cache and the block will be marked dirty in
-the metadata.
-
-If writethrough is selected then a write to a cached block will not
-complete until it has hit both the origin and cache devices.  Clean
-blocks should remain clean.
-
-If passthrough is selected, useful when the cache contents are not known
-to be coherent with the origin device, then all reads are served from
-the origin device (all reads miss the cache) and all writes are
-forwarded to the origin device; additionally, write hits cause cache
-block invalidates.  To enable passthrough mode the cache must be clean.
-Passthrough mode allows a cache device to be activated without having to
-worry about coherency.  Coherency that exists is maintained, although
-the cache will gradually cool as writes take place.  If the coherency of
-the cache can later be verified, or established through use of the
-"invalidate_cblocks" message, the cache device can be transitioned to
-writethrough or writeback mode while still warm.  Otherwise, the cache
-contents can be discarded prior to transitioning to the desired
-operating mode.
-
-A simple cleaner policy is provided, which will clean (write back) all
-dirty blocks in a cache.  This is useful when decommissioning or
-shrinking a cache.  Shrinking the cache's fast device requires all cache
-blocks, in the area of the cache being removed, to be clean.  If the
-area being removed from the cache still contains dirty blocks the resize
-will fail.  Care must be taken to never reduce the volume used for the
-cache's fast device until the cache is clean.  This is of particular
-importance if writeback mode is used.  Writethrough and passthrough
-modes already maintain a clean cache.  Future support to partially clean
-the cache, above a specified threshold, will allow for keeping the cache
-warm and in writeback mode during resize.
-
-Migration throttling
---------------------
-
-Migrating data between the origin and cache device uses bandwidth.
-The user can set a throttle to prevent more than a certain amount of
-migration occurring at any one time.  Currently we're not taking any
-account of normal io traffic going to the devices.  More work needs
-doing here to avoid migrating during those peak io moments.
-
-For the time being, a message "migration_threshold <#sectors>"
-can be used to set the maximum number of sectors being migrated,
-the default being 2048 sectors (1MB).
-
-Updating on-disk metadata
--------------------------
-
-On-disk metadata is committed every time a FLUSH or FUA bio is written.
-If no such requests are made then commits will occur every second.  This
-means the cache behaves like a physical disk that has a volatile write
-cache.  If power is lost you may lose some recent writes.  The metadata
-should always be consistent in spite of any crash.
-
-The 'dirty' state for a cache block changes far too frequently for us
-to keep updating it on the fly.  So we treat it as a hint.  In normal
-operation it will be written when the dm device is suspended.  If the
-system crashes all cache blocks will be assumed dirty when restarted.
-
-Per-block policy hints
-----------------------
-
-Policy plug-ins can store a chunk of data per cache block.  It's up to
-the policy how big this chunk is, but it should be kept small.  Like the
-dirty flags this data is lost if there's a crash so a safe fallback
-value should always be possible.
-
-Policy hints affect performance, not correctness.
-
-Policy messaging
-----------------
-
-Policies will have different tunables, specific to each one, so we
-need a generic way of getting and setting these.  Device-mapper
-messages are used.  Refer to cache-policies.txt.
-
-Discard bitset resolution
--------------------------
-
-We can avoid copying data during migration if we know the block has
-been discarded.  A prime example of this is when mkfs discards the
-whole block device.  We store a bitset tracking the discard state of
-blocks.  However, we allow this bitset to have a different block size
-from the cache blocks.  This is because we need to track the discard
-state for all of the origin device (compare with the dirty bitset
-which is just for the smaller cache device).
-
-Target interface
-================
-
-Constructor
------------
-
-  ::
-
-   cache <metadata dev> <cache dev> <origin dev> <block size>
-         <#feature args> [<feature arg>]*
-         <policy> <#policy args> [policy args]*
-
- ================ =======================================================
- metadata dev     fast device holding the persistent metadata
- cache dev	  fast device holding cached data blocks
- origin dev	  slow device holding original data blocks
- block size       cache unit size in sectors
-
- #feature args    number of feature arguments passed
- feature args     writethrough or passthrough (The default is writeback.)
-
- policy           the replacement policy to use
- #policy args     an even number of arguments corresponding to
-                  key/value pairs passed to the policy
- policy args      key/value pairs passed to the policy
-		  E.g. 'sequential_threshold 1024'
-		  See cache-policies.txt for details.
- ================ =======================================================
-
-Optional feature arguments are:
-
-
-   ==================== ========================================================
-   writethrough		write through caching that prohibits cache block
-			content from being different from origin block content.
-			Without this argument, the default behaviour is to write
-			back cache block contents later for performance reasons,
-			so they may differ from the corresponding origin blocks.
-
-   passthrough		a degraded mode useful for various cache coherency
-			situations (e.g., rolling back snapshots of
-			underlying storage).	 Reads and writes always go to
-			the origin.	If a write goes to a cached origin
-			block, then the cache block is invalidated.
-			To enable passthrough mode the cache must be clean.
-
-   metadata2		use version 2 of the metadata.  This stores the dirty
-			bits in a separate btree, which improves speed of
-			shutting down the cache.
-
-   no_discard_passdown	disable passing down discards from the cache
-			to the origin's data device.
-   ==================== ========================================================
-
-A policy called 'default' is always registered.  This is an alias for
-the policy we currently think is giving best all round performance.
-
-As the default policy could vary between kernels, if you are relying on
-the characteristics of a specific policy, always request it by name.
-
-Status
-------
-
-::
-
-  <metadata block size> <#used metadata blocks>/<#total metadata blocks>
-  <cache block size> <#used cache blocks>/<#total cache blocks>
-  <#read hits> <#read misses> <#write hits> <#write misses>
-  <#demotions> <#promotions> <#dirty> <#features> <features>*
-  <#core args> <core args>* <policy name> <#policy args> <policy args>*
-  <cache metadata mode>
-
-
-========================= =====================================================
-metadata block size	  Fixed block size for each metadata block in
-			  sectors
-#used metadata blocks	  Number of metadata blocks used
-#total metadata blocks	  Total number of metadata blocks
-cache block size	  Configurable block size for the cache device
-			  in sectors
-#used cache blocks	  Number of blocks resident in the cache
-#total cache blocks	  Total number of cache blocks
-#read hits		  Number of times a READ bio has been mapped
-			  to the cache
-#read misses		  Number of times a READ bio has been mapped
-			  to the origin
-#write hits		  Number of times a WRITE bio has been mapped
-			  to the cache
-#write misses		  Number of times a WRITE bio has been
-			  mapped to the origin
-#demotions		  Number of times a block has been removed
-			  from the cache
-#promotions		  Number of times a block has been moved to
-			  the cache
-#dirty			  Number of blocks in the cache that differ
-			  from the origin
-#feature args		  Number of feature args to follow
-feature args		  'writethrough' (optional)
-#core args		  Number of core arguments (must be even)
-core args		  Key/value pairs for tuning the core
-			  e.g. migration_threshold
-policy name		  Name of the policy
-#policy args		  Number of policy arguments to follow (must be even)
-policy args		  Key/value pairs e.g. sequential_threshold
-cache metadata mode       ro if read-only, rw if read-write
-
-			  In serious cases where even a read-only mode is
-			  deemed unsafe no further I/O will be permitted and
-			  the status will just contain the string 'Fail'.
-			  The userspace recovery tools should then be used.
-needs_check		  'needs_check' if set, '-' if not set
-			  A metadata operation has failed, resulting in the
-			  needs_check flag being set in the metadata's
-			  superblock.  The metadata device must be
-			  deactivated and checked/repaired before the
-			  cache can be made fully operational again.
-			  '-' indicates	needs_check is not set.
-========================= =====================================================
-
-Messages
---------
-
-Policies will have different tunables, specific to each one, so we
-need a generic way of getting and setting these.  Device-mapper
-messages are used.  (A sysfs interface would also be possible.)
-
-The message format is::
-
-   <key> <value>
-
-E.g.::
-
-   dmsetup message my_cache 0 sequential_threshold 1024
-
-
-Invalidation is removing an entry from the cache without writing it
-back.  Cache blocks can be invalidated via the invalidate_cblocks
-message, which takes an arbitrary number of cblock ranges.  Each cblock
-range's end value is "one past the end", meaning 5-10 expresses a range
-of values from 5 to 9.  Each cblock must be expressed as a decimal
-value; in the future a variant message that takes cblock ranges
-expressed in hexadecimal may be needed to better support efficient
-invalidation of larger caches.  The cache must be in passthrough mode
-when invalidate_cblocks is used::
-
-   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
-
-E.g.::
-
-   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
-
-Examples
-========
-
-The test suite can be found here:
-
-https://github.com/jthornber/device-mapper-test-suite
-
-::
-
-  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
-	  /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
-  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
-	  /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
-	  mq 4 sequential_threshold 1024 random_threshold 8'
diff --git a/Documentation/device-mapper/delay.rst b/Documentation/device-mapper/delay.rst
deleted file mode 100644
index 917ba8c33359..000000000000
--- a/Documentation/device-mapper/delay.rst
+++ /dev/null
@@ -1,31 +0,0 @@
-========
-dm-delay
-========
-
-Device-Mapper's "delay" target delays reads and/or writes
-and maps them to different devices.
-
-Parameters::
-
-    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>
-			       [<flush_device> <flush_offset> <flush_delay>]]
-
-With separate write parameters, the first set is only used for reads.
-Offsets are specified in sectors.
-Delays are specified in milliseconds.
-
-Example scripts
-===============
-
-::
-
-	#!/bin/sh
-	# Create device delaying rw operation for 500ms
-	echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
-
-::
-
-	#!/bin/sh
-	# Create device delaying only write operation for 500ms and
-	# splitting reads and writes to different devices $1 $2
-	echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
diff --git a/Documentation/device-mapper/dm-crypt.rst b/Documentation/device-mapper/dm-crypt.rst
deleted file mode 100644
index 8f4a3f889d43..000000000000
--- a/Documentation/device-mapper/dm-crypt.rst
+++ /dev/null
@@ -1,173 +0,0 @@
-========
-dm-crypt
-========
-
-Device-Mapper's "crypt" target provides transparent encryption of block devices
-using the kernel crypto API.
-
-For a more detailed description of supported parameters see:
-https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt
-
-Parameters::
-
-	      <cipher> <key> <iv_offset> <device path> \
-	      <offset> [<#opt_params> <opt_params>]
-
-<cipher>
-    Encryption cipher, encryption mode and Initial Vector (IV) generator.
-
-    The cipher specifications format is::
-
-       cipher[:keycount]-chainmode-ivmode[:ivopts]
-
-    Examples::
-
-       aes-cbc-essiv:sha256
-       aes-xts-plain64
-       serpent-xts-plain64
-
-    Cipher format also supports direct specification with kernel crypt API
-    format (selected by capi: prefix). The IV specification is the same
-    as for the first format type.
-    This format is mainly used for specification of authenticated modes.
-
-    The crypto API cipher specifications format is::
-
-        capi:cipher_api_spec-ivmode[:ivopts]
-
-    Examples::
-
-        capi:cbc(aes)-essiv:sha256
-        capi:xts(aes)-plain64
-
-    Examples of authenticated modes::
-
-        capi:gcm(aes)-random
-        capi:authenc(hmac(sha256),xts(aes))-random
-        capi:rfc7539(chacha20,poly1305)-random
-
-    The /proc/crypto file contains a list of currently loaded crypto modes.
-
-<key>
-    Key used for encryption. It is encoded either as a hexadecimal number
-    or it can be passed as <key_string> prefixed with a single colon
-    character (':') for keys residing in the kernel keyring service.
-    You can only use key sizes that are valid for the selected cipher
-    in combination with the selected iv mode.
-    Note that for some iv modes the key string can contain additional
-    keys (for example IV seed) so the key contains more parts concatenated
-    into a single string.
-
-<key_string>
-    The kernel keyring key is identified by a string in the following format:
-    <key_size>:<key_type>:<key_description>.
-
-<key_size>
-    The encryption key size in bytes. The kernel key payload size must match
-    the value passed in <key_size>.
-
-<key_type>
-    Either 'logon' or 'user' kernel key type.
-
-<key_description>
-    The description of the kernel keyring key that the crypt target should
-    look for when loading a key of <key_type>.
-
-<keycount>
-    Multi-key compatibility mode. You can define <keycount> keys and
-    then sectors are encrypted according to their offsets (sector 0 uses key0;
-    sector 1 uses key1 etc.).  <keycount> must be a power of two.
-
-<iv_offset>
-    The IV offset is a sector count that is added to the sector number
-    before creating the IV.
-
-<device path>
-    This is the device that is going to be used as backend and contains the
-    encrypted data.  You can specify it as a path like /dev/xxx or a device
-    number <major>:<minor>.
-
-<offset>
-    Starting sector within the device where the encrypted data begins.
-
-<#opt_params>
-    Number of optional parameters. If there are no optional parameters,
-    the optional parameters section can be skipped or #opt_params can be zero.
-    Otherwise #opt_params is the number of following arguments.
-
-    Example of optional parameters section:
-        3 allow_discards same_cpu_crypt submit_from_crypt_cpus
-
-allow_discards
-    Block discard requests (a.k.a. TRIM) are passed through the crypt device.
-    The default is to ignore discard requests.
-
-    WARNING: Assess the specific security risks carefully before enabling this
-    option.  For example, allowing discards on encrypted devices may lead to
-    the leak of information about the ciphertext device (filesystem type,
-    used space etc.) if the discarded blocks can be located easily on the
-    device later.
-
-same_cpu_crypt
-    Perform encryption using the same cpu that IO was submitted on.
-    The default is to use an unbound workqueue so that encryption work
-    is automatically balanced between available CPUs.
-
-submit_from_crypt_cpus
-    Disable offloading writes to a separate thread after encryption.
-    There are some situations where offloading write bios from the
-    encryption threads to a single thread degrades performance
-    significantly.  The default is to offload write bios to the same
-    thread because it benefits CFQ to have writes submitted using the
-    same context.
-
-integrity:<bytes>:<type>
-    The device requires additional <bytes> metadata per-sector stored
-    in per-bio integrity structure. This metadata must be provided
-    by underlying dm-integrity target.
-
-    The <type> can be "none" if metadata is used only for persistent IV.
-
-    For Authenticated Encryption with Additional Data (AEAD)
-    the <type> is "aead". An AEAD mode additionally calculates and verifies
-    integrity for the encrypted device. The additional space is then
-    used for storing authentication tag (and persistent IV if needed).
-
-sector_size:<bytes>
-    Use <bytes> as the encryption unit instead of 512-byte sectors.
-    This option must be in the range 512 - 4096 bytes and must be a power of two.
-    The virtual device will announce this size as its minimal IO and logical sector size.
-
-iv_large_sectors
-   IV generators will use the sector number counted in <sector_size> units
-   instead of the default 512-byte sectors.
-
-   For example, if <sector_size> is 4096 bytes, plain64 IV for the second
-   sector will be 8 (without flag) and 1 if iv_large_sectors is present.
-   The <iv_offset> must be a multiple of <sector_size> (in 512-byte units)
-   if this flag is specified.
-
-Example scripts
-===============
-LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
-encryption with dm-crypt using the 'cryptsetup' utility, see
-https://gitlab.com/cryptsetup/cryptsetup
-
-::
-
-	#!/bin/sh
-	# Create a crypt device using dmsetup
-	dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
-
-::
-
-	#!/bin/sh
-	# Create a crypt device using dmsetup when encryption key is stored in keyring service
-	dmsetup create crypt2 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
-
-::
-
-	#!/bin/sh
-	# Create a crypt device using cryptsetup and LUKS header with default cipher
-	cryptsetup luksFormat $1
-	cryptsetup luksOpen $1 crypt1
diff --git a/Documentation/device-mapper/dm-dust.txt b/Documentation/device-mapper/dm-dust.txt
deleted file mode 100644
index 954d402a1f6a..000000000000
--- a/Documentation/device-mapper/dm-dust.txt
+++ /dev/null
@@ -1,272 +0,0 @@
-dm-dust
-=======
-
-This target emulates the behavior of bad sectors at arbitrary
-locations, and provides the ability to enable the emulation of the
-failures at an arbitrary time.
-
-This target behaves similarly to a linear target.  At a given time,
-the user can send a message to the target to start failing read
-requests on specific blocks (to emulate the behavior of a hard disk
-drive with bad sectors).
-
-When the failure behavior is enabled (i.e.: when the output of
-"dmsetup status" displays "fail_read_on_bad_block"), reads of blocks
-in the "bad block list" will fail with EIO ("Input/output error").
-
-Writes of blocks in the "bad block list" will result in the following:
-
-1. Remove the block from the "bad block list".
-2. Successfully complete the write.
-
-This emulates the "remapped sector" behavior of a drive with bad
-sectors.
-
-Normally, a drive that is encountering bad sectors will most likely
-encounter more bad sectors, at an unknown time or location.
-With dm-dust, the user can use the "addbadblock" and "removebadblock"
-messages to add arbitrary bad blocks at new locations, and the
-"enable" and "disable" messages to modulate the state of whether the
-configured "bad blocks" will be treated as bad, or bypassed.
-This allows the pre-writing of test data and metadata prior to
-simulating a "failure" event where bad sectors start to appear.
-
-Table parameters:
------------------
-<device_path> <offset> <blksz>
-
-Mandatory parameters:
-    <device_path>: path to the block device.
-    <offset>: offset to data area from start of device_path
-    <blksz>: block size in bytes
-	     (minimum 512, maximum 1073741824, must be a power of 2)
-
-Usage instructions:
--------------------
-
-First, find the size (in 512-byte sectors) of the device to be used:
-
-$ sudo blockdev --getsz /dev/vdb1
-33552384
-
-Create the dm-dust device:
-(For a device with a block size of 512 bytes)
-$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 512'
-
-(For a device with a block size of 4096 bytes)
-$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 4096'
-
-Check the status of the read behavior ("bypass" indicates that all I/O
-will be passed through to the underlying device):
-$ sudo dmsetup status dust1
-0 33552384 dust 252:17 bypass
-
-$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=128 iflag=direct
-128+0 records in
-128+0 records out
-
-$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
-128+0 records in
-128+0 records out
-
-Adding and removing bad blocks:
--------------------------------
-
-At any time (i.e.: whether the device has the "bad block" emulation
-enabled or disabled), bad blocks may be added or removed from the
-device via the "addbadblock" and "removebadblock" messages:
-
-$ sudo dmsetup message dust1 0 addbadblock 60
-kernel: device-mapper: dust: badblock added at block 60
-
-$ sudo dmsetup message dust1 0 addbadblock 67
-kernel: device-mapper: dust: badblock added at block 67
-
-$ sudo dmsetup message dust1 0 addbadblock 72
-kernel: device-mapper: dust: badblock added at block 72
-
-These bad blocks will be stored in the "bad block list".
-While the device is in "bypass" mode, reads and writes will succeed:
-
-$ sudo dmsetup status dust1
-0 33552384 dust 252:17 bypass
-
-Enabling block read failures:
------------------------------
-
-To enable the "fail read on bad block" behavior, send the "enable" message:
-
-$ sudo dmsetup message dust1 0 enable
-kernel: device-mapper: dust: enabling read failures on bad sectors
-
-$ sudo dmsetup status dust1
-0 33552384 dust 252:17 fail_read_on_bad_block
-
-With the device in "fail read on bad block" mode, attempting to read a
-bad block will fail with an "Input/output error":
-
-$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=1 skip=67 iflag=direct
-dd: error reading '/dev/mapper/dust1': Input/output error
-0+0 records in
-0+0 records out
-0 bytes copied, 0.00040651 s, 0.0 kB/s
-
-...and writing to the bad blocks will remove the blocks from the list,
-therefore emulating the "remap" behavior of hard disk drives:
-
-$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
-128+0 records in
-128+0 records out
-
-kernel: device-mapper: dust: block 60 removed from badblocklist by write
-kernel: device-mapper: dust: block 67 removed from badblocklist by write
-kernel: device-mapper: dust: block 72 removed from badblocklist by write
-kernel: device-mapper: dust: block 87 removed from badblocklist by write
-
-Bad block add/remove error handling:
-------------------------------------
-
-Attempting to add a bad block that already exists in the list will
-result in an "Invalid argument" error, as well as a helpful message:
-
-$ sudo dmsetup message dust1 0 addbadblock 88
-device-mapper: message ioctl on dust1  failed: Invalid argument
-kernel: device-mapper: dust: block 88 already in badblocklist
-
-Attempting to remove a bad block that doesn't exist in the list will
-result in an "Invalid argument" error, as well as a helpful message:
-
-$ sudo dmsetup message dust1 0 removebadblock 87
-device-mapper: message ioctl on dust1  failed: Invalid argument
-kernel: device-mapper: dust: block 87 not found in badblocklist
-
-Counting the number of bad blocks in the bad block list:
---------------------------------------------------------
-
-To count the number of bad blocks configured in the device, run the
-following message command:
-
-$ sudo dmsetup message dust1 0 countbadblocks
-
-A message will print with the number of bad blocks currently
-configured on the device:
-
-kernel: device-mapper: dust: countbadblocks: 895 badblock(s) found
-
-Querying for specific bad blocks:
----------------------------------
-
-To find out if a specific block is in the bad block list, run the
-following message command:
-
-$ sudo dmsetup message dust1 0 queryblock 72
-
-The following message will print if the block is in the list:
-device-mapper: dust: queryblock: block 72 found in badblocklist
-
-The following message will print if the block is not in the list:
-device-mapper: dust: queryblock: block 72 not found in badblocklist
-
-The "queryblock" message command will work in both the "enabled"
-and "disabled" modes, allowing the verification of whether a block
-will be treated as "bad" without having to issue I/O to the device,
-or having to "enable" the bad block emulation.
-
-Clearing the bad block list:
-----------------------------
-
-To clear the bad block list (without needing to individually run
-a "removebadblock" message command for every block), run the
-following message command:
-
-$ sudo dmsetup message dust1 0 clearbadblocks
-
-After clearing the bad block list, the following message will appear:
-
-kernel: device-mapper: dust: clearbadblocks: badblocks cleared
-
-If there were no bad blocks to clear, the following message will
-appear:
-
-kernel: device-mapper: dust: clearbadblocks: no badblocks found
-
-Message commands list:
-----------------------
-
-Below is a list of the messages that can be sent to a dust device:
-
-Operations on blocks (requires a <blknum> argument):
-
-addbadblock <blknum>
-queryblock <blknum>
-removebadblock <blknum>
-
-...where <blknum> is a block number within range of the device
-  (corresponding to the block size of the device.)
-
-Single argument message commands:
-
-countbadblocks
-clearbadblocks
-disable
-enable
-quiet
-
-Device removal:
----------------
-
-When finished, remove the device via the "dmsetup remove" command:
-
-$ sudo dmsetup remove dust1
-
-Quiet mode:
------------
-
-On test runs with many bad blocks, it may be desirable to avoid
-excessive logging (from bad blocks added, removed, or "remapped").
-This can be done by enabling "quiet mode" via the following message:
-
-$ sudo dmsetup message dust1 0 quiet
-
-This will suppress log messages from add / remove / removed by write
-operations.  Log messages from "countbadblocks" or "queryblock"
-message commands will still print in quiet mode.
-
-The status of quiet mode can be seen by running "dmsetup status":
-
-$ sudo dmsetup status dust1
-0 33552384 dust 252:17 fail_read_on_bad_block quiet
-
-To disable quiet mode, send the "quiet" message again:
-
-$ sudo dmsetup message dust1 0 quiet
-
-$ sudo dmsetup status dust1
-0 33552384 dust 252:17 fail_read_on_bad_block verbose
-
-(The presence of "verbose" indicates normal logging.)
-
-"Why not...?"
--------------
-
-scsi_debug has a "medium error" mode that can fail reads on one
-specified sector (sector 0x1234, hardcoded in the source code), but
-it uses RAM for the persistent storage, which drastically decreases
-the potential device size.
-
-dm-flakey fails all I/O from all block locations at a specified time
-frequency, and not a given point in time.
-
-When a bad sector occurs on a hard disk drive, reads to that sector
-are failed by the device, usually resulting in an error code of EIO
-("I/O error") or ENODATA ("No data available").  However, a write to
-the sector may succeed, and result in the sector becoming readable
-after the device controller no longer experiences errors reading the
-sector (or after a reallocation of the sector).  However, there may
-be bad sectors that occur on the device in the future, in a different,
-unpredictable location.
-
-This target seeks to provide a device that can exhibit the behavior
-of a bad sector at a known sector location, at a known time, based
-on a large storage device (at least tens of gigabytes, not occupying
-system memory).
diff --git a/Documentation/device-mapper/dm-flakey.rst b/Documentation/device-mapper/dm-flakey.rst
deleted file mode 100644
index 86138735879d..000000000000
--- a/Documentation/device-mapper/dm-flakey.rst
+++ /dev/null
@@ -1,74 +0,0 @@
-=========
-dm-flakey
-=========
-
-This target is the same as the linear target except that it exhibits
-unreliable behaviour periodically.  It's been found useful in simulating
-failing devices for testing purposes.
-
-Starting from the time the table is loaded, the device is available for
-<up interval> seconds, then exhibits unreliable behaviour for <down
-interval> seconds, and then this cycle repeats.
-
-Also, consider using this in combination with the dm-delay target too,
-which can delay reads and writes and/or send them to different
-underlying devices.
-
-Table parameters
-----------------
-
-::
-
-  <dev path> <offset> <up interval> <down interval> \
-    [<num_features> [<feature arguments>]]
-
-Mandatory parameters:
-
-    <dev path>:
-        Full pathname to the underlying block-device, or a
-        "major:minor" device-number.
-    <offset>:
-        Starting sector within the device.
-    <up interval>:
-        Number of seconds device is available.
-    <down interval>:
-        Number of seconds device returns errors.
-
-Optional feature parameters:
-
-  If no feature parameters are present, during the periods of
-  unreliability, all I/O returns errors.
-
-  drop_writes:
-	All write I/O is silently ignored.
-	Read I/O is handled correctly.
-
-  error_writes:
-	All write I/O is failed with an error signalled.
-	Read I/O is handled correctly.
-
-  corrupt_bio_byte <Nth_byte> <direction> <value> <flags>:
-	During <down interval>, replace <Nth_byte> of the data of
-	each matching bio with <value>.
-
-    <Nth_byte>:
-	The offset of the byte to replace.
-	Counting starts at 1, to replace the first byte.
-    <direction>:
-	Either 'r' to corrupt reads or 'w' to corrupt writes.
-	'w' is incompatible with drop_writes.
-    <value>:
-	The value (from 0-255) to write.
-    <flags>:
-	Perform the replacement only if bio->bi_opf has all the
-	selected flags set.
-
-Examples:
-
-Replaces the 32nd byte of READ bios with the value 1::
-
-  corrupt_bio_byte 32 r 1 0
-
-Replaces the 224th byte of REQ_META (=32) bios with the value 0::
-
-  corrupt_bio_byte 224 w 0 32
diff --git a/Documentation/device-mapper/dm-init.rst b/Documentation/device-mapper/dm-init.rst
deleted file mode 100644
index e5242ff17e9b..000000000000
--- a/Documentation/device-mapper/dm-init.rst
+++ /dev/null
@@ -1,125 +0,0 @@
-================================
-Early creation of mapped devices
-================================
-
-It is possible to configure a device-mapper device to act as the root device for
-your system in two ways.
-
-The first is to build an initial ramdisk which boots to a minimal userspace
-which configures the device, then pivot_root(8) into it.
-
-The second is to create one or more device-mappers using the module parameter
-"dm-mod.create=" through the kernel boot command line argument.
-
-The format is specified as a string of data separated by commas and optionally
-semi-colons, where:
-
- - a comma is used to separate fields like name, uuid, flags and table
-   (specifies one device)
- - a semi-colon is used to separate devices.
-
-So the format will look like this::
-
- dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
-
-Where::
-
-	<name>		::= The device name.
-	<uuid>		::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
-	<minor>		::= The device minor number | ""
-	<flags>		::= "ro" | "rw"
-	<table>		::= <start_sector> <num_sectors> <target_type> <target_args>
-	<target_type>	::= "verity" | "linear" | ... (see list below)
-
-The dm line should be equivalent to the one used by the dmsetup tool with the
-`--concise` argument.
-
-Target types
-============
-
-Not all target types are available as there are serious risks in allowing
-activation of certain DM targets without first using userspace tools to check
-the validity of associated metadata.
-
-======================= =======================================================
-`cache`			constrained, userspace should verify cache device
-`crypt`			allowed
-`delay`			allowed
-`era`			constrained, userspace should verify metadata device
-`flakey`		constrained, meant for test
-`linear`		allowed
-`log-writes`		constrained, userspace should verify metadata device
-`mirror`		constrained, userspace should verify main/mirror device
-`raid`			constrained, userspace should verify metadata device
-`snapshot`		constrained, userspace should verify src/dst device
-`snapshot-origin`	allowed
-`snapshot-merge`	constrained, userspace should verify src/dst device
-`striped`		allowed
-`switch`		constrained, userspace should verify dev path
-`thin`			constrained, requires dm target message from userspace
-`thin-pool`		constrained, requires dm target message from userspace
-`verity`		allowed
-`writecache`		constrained, userspace should verify cache device
-`zero`			constrained, not meant for rootfs
-======================= =======================================================
-
-If the target is not listed above, it is constrained by default (not tested).
-
-Examples
-========
-An example of booting to a linear array made up of user-mode linux block
-devices::
-
-  dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
-
-This will boot to a rw dm-linear target of 8192 sectors split across two block
-devices identified by their major:minor numbers.  After boot, udev will rename
-this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.
-
-An example of multiple device-mappers, with the dm-mod.create="..." contents
-is shown here split on multiple lines for readability::
-
-  dm-linear,,1,rw,
-    0 32768 linear 8:1 0,
-    32768 1024000 linear 8:2 0;
-  dm-verity,,3,ro,
-    0 1638400 verity 1 /dev/sdc1 /dev/sdc2 4096 4096 204800 1 sha256
-    ac87db56303c9c1da433d7209b5a6ef3e4779df141200cbd7c157dcb8dd89c42
-    5ebfe87f7df3235b80a117ebc4078e44f55045487ad4a96581d1adb564615b51
-
-Other examples (per target):
-
-"crypt"::
-
-  dm-crypt,,8,ro,
-    0 1048576 crypt aes-xts-plain64
-    babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
-    /dev/sda 0 1 allow_discards
-
-"delay"::
-
-  dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500
-
-"linear"::
-
-  dm-linear,,,rw,
-    0 32768 linear /dev/sda1 0,
-    32768 1024000 linear /dev/sda2 0,
-    1056768 204800 linear /dev/sda3 0,
-    1261568 512000 linear /dev/sda4 0
-
-"snapshot-origin"::
-
-  dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2
-
-"striped"::
-
-  dm-striped,,4,ro,0 1638400 striped 4 4096
-  /dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0
-
-"verity"::
-
-  dm-verity,,4,ro,
-    0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
-    fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
-    51934789604d1b92399c52e7cb149d1b3a1b74bbbcb103b2a0aaacbed5c08584
diff --git a/Documentation/device-mapper/dm-integrity.rst b/Documentation/device-mapper/dm-integrity.rst
deleted file mode 100644
index a30aa91b5fbe..000000000000
--- a/Documentation/device-mapper/dm-integrity.rst
+++ /dev/null
@@ -1,259 +0,0 @@
-============
-dm-integrity
-============
-
-The dm-integrity target emulates a block device that has additional
-per-sector tags that can be used for storing integrity information.
-
-A general problem with storing integrity tags with every sector is that
-writing the sector and the integrity tag must be atomic - i.e. in case of a
-crash, either both the sector and the integrity tag are written or neither is.
-
-To guarantee write atomicity, the dm-integrity target uses a journal: it
-writes sector data and integrity tags into a journal, commits the journal
-and then copies the data and integrity tags to their respective locations.
-
-The dm-integrity target can be used with the dm-crypt target - in this
-situation the dm-crypt target creates the integrity data and passes them
-to the dm-integrity target via bio_integrity_payload attached to the bio.
-In this mode, the dm-crypt and dm-integrity targets provide authenticated
-disk encryption - if the attacker modifies the encrypted device, an I/O
-error is returned instead of random data.
-
-The dm-integrity target can also be used as a standalone target. In this
-mode it calculates and verifies the integrity tag internally, so it can
-be used to detect silent data corruption on the disk or in the I/O path.
-
-There's an alternate mode of operation where dm-integrity uses a bitmap
-instead of a journal. If a bit in the bitmap is 1, the corresponding
-region's data and integrity tags are not synchronized - if the machine
-crashes, the unsynchronized regions will be recalculated. The bitmap mode
-is faster than the journal mode, because we don't have to write the data
-twice, but it is also less reliable, because if data corruption happens
-when the machine crashes, it may not be detected.
-
-When loading the target for the first time, the kernel driver will format
-the device. But it will only format the device if the superblock contains
-zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
-target can't be loaded.
-
-To use the target for the first time:
-
-1. overwrite the superblock with zeroes
-2. load the dm-integrity target with one-sector size, the kernel driver
-   will format the device
-3. unload the dm-integrity target
-4. read the "provided_data_sectors" value from the superblock
-5. load the dm-integrity target with the target size
-   "provided_data_sectors"
-6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
-   with the size "provided_data_sectors"
-
-
-Target arguments:
-
-1. the underlying block device
-
-2. the number of reserved sectors at the beginning of the device - the
-   dm-integrity won't read or write these sectors
-
-3. the size of the integrity tag (if "-" is used, the size is taken from
-   the internal-hash algorithm)
-
-4. mode:
-
-	D - direct writes (without journal)
-		in this mode, journaling is
-		not used and data sectors and integrity tags are written
-		separately. In case of crash, it is possible that the data
-		and integrity tag don't match.
-	J - journaled writes
-		data and integrity tags are written to the
-		journal and atomicity is guaranteed. In case of crash,
-		either both data and tag or neither of them is written. The
-		journaled mode roughly halves write throughput because the
-		data has to be written twice.
-	B - bitmap mode - data and metadata are written without any
-		synchronization, the driver maintains a bitmap of dirty
-		regions where data and metadata don't match. This mode can
-		only be used with internal hash.
-	R - recovery mode - in this mode, journal is not replayed,
-		checksums are not checked and writes to the device are not
-		allowed. This mode is useful for data recovery if the
-		device cannot be activated in any of the other standard
-		modes.
-
-5. the number of additional arguments
-
-Additional arguments:
-
-journal_sectors:number
-	The size of journal, this argument is used only if formatting the
-	device. If the device is already formatted, the value from the
-	superblock is used.
-
-interleave_sectors:number
-	The number of interleaved sectors. This value is rounded down to
-	a power of two. If the device is already formatted, the value from
-	the superblock is used.
-
-meta_device:device
-	Don't interleave the data and metadata on one device. Use a
-	separate device for metadata.
-
-buffer_sectors:number
-	The number of sectors in one buffer. The value is rounded down to
-	a power of two.
-
-	The tag area is accessed using buffers, the buffer size is
-	configurable. A larger buffer size means that the I/O size will
-	be larger, but fewer I/Os may be issued.
-
-journal_watermark:number
-	The journal watermark as a percentage. When the size of the journal
-	exceeds this watermark, the thread that flushes the journal will
-	be started.
-
-commit_time:number
-	Commit time in milliseconds. When this time passes, the journal is
-	written. The journal is also written immediately if the FLUSH
-	request is received.
-
-internal_hash:algorithm(:key)	(the key is optional)
-	Use internal hash or crc.
-	When this argument is used, the dm-integrity target won't accept
-	integrity tags from the upper target, but it will automatically
-	generate and verify the integrity tags.
-
-	You can use a crc algorithm (such as crc32), then the integrity target
-	will protect the data against accidental corruption.
-	You can also use a hmac algorithm (for example
-	"hmac(sha256):0123456789abcdef"), in this mode it will provide
-	cryptographic authentication of the data without encryption.
-
-	When this argument is not used, the integrity tags are accepted
-	from an upper layer target, such as dm-crypt. The upper layer
-	target should check the validity of the integrity tags.
-
-recalculate
-	Recalculate the integrity tags automatically. It is only valid
-	when using internal hash.
-
-journal_crypt:algorithm(:key)	(the key is optional)
-	Encrypt the journal using given algorithm to make sure that the
-	attacker can't read the journal. You can use a block cipher here
-	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
-	"salsa20", "ctr(aes)" or "ecb(arc4)").
-
-	The journal contains history of last writes to the block device,
-	an attacker reading the journal could see the last sector numbers
-	that were written. From the sector numbers, the attacker can infer
-	the size of files that were written. To protect against this
-	situation, you can encrypt the journal.
-
-journal_mac:algorithm(:key)	(the key is optional)
-	Protect sector numbers in the journal from accidental or malicious
-	modification. To protect against accidental modification, use a
-	crc algorithm, to protect against malicious modification, use a
-	hmac algorithm with a key.
-
-	This option is not needed when using internal-hash because in this
-	mode, the integrity of journal entries is checked when replaying
-	the journal. Thus, modified sector number would be detected at
-	this stage.
-
-block_size:number
-	The size of a data block in bytes.  The larger the block size the
-	less overhead there is for per-block integrity metadata.
-	Supported values are 512, 1024, 2048 and 4096 bytes.  If not
-	specified the default block size is 512 bytes.
-
-sectors_per_bit:number
-	In the bitmap mode, this parameter specifies the number of
-	512-byte sectors that correspond to one bitmap bit.
-
-bitmap_flush_interval:number
-	The bitmap flush interval in milliseconds. The metadata buffers
-	are synchronized when this interval expires.
-
-
-The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
-be changed when reloading the target (load an inactive table and swap the
-tables with suspend and resume). The other arguments should not be changed
-when reloading the target because the layout of disk data depends on them
-and the reloaded target would be non-functional.
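-
-As an illustration of how the arguments above fit together, a table line for
-a standalone, journaled target might look as follows. This is a sketch only:
-the device path and the commit_time value are hypothetical,
-<provided_data_sectors> stands for the value read from the superblock, and
-the line is split for readability::
-
-  0 <provided_data_sectors> integrity /dev/sdb 0 - J 2 \
-    internal_hash:crc32 commit_time:5000
-
-Here no sectors are reserved, the tag size is taken from the internal hash
-("-"), and two additional arguments follow the mode.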
-
-
-The layout of the formatted block device:
-
-* reserved sectors
-    (they are not used by this target, they can be used for
-    storing LUKS metadata or for other purpose), the size of the reserved
-    area is specified in the target arguments
-
-* superblock (4kiB)
-	* magic string - identifies that the device was formatted
-	* version
-	* log2(interleave sectors)
-	* integrity tag size
-	* the number of journal sections
-	* provided data sectors - the number of sectors that this target
-	  provides (i.e. the size of the device minus the size of all
-	  metadata and padding). The user of this target should not send
-	  bios that access data beyond the "provided data sectors" limit.
-	* flags
-	    SB_FLAG_HAVE_JOURNAL_MAC
-		- a flag is set if journal_mac is used
-	    SB_FLAG_RECALCULATING
-		- recalculating is in progress
-	    SB_FLAG_DIRTY_BITMAP
-		- journal area contains the bitmap of dirty
-		  blocks
-	* log2(sectors per block)
-	* a position where recalculating finished
-* journal
-	The journal is divided into sections, each section contains:
-
-	* metadata area (4kiB), it contains journal entries
-
-	  - every journal entry contains:
-
-		* logical sector (specifies where the data and tag should
-		  be written)
-		* last 8 bytes of data
-		* integrity tag (the size is specified in the superblock)
-
-	  - every metadata sector ends with
-
-		* mac (8-bytes), all the macs in 8 metadata sectors form a
-		  64-byte value. It is used to store hmac of sector
-		  numbers in the journal section, to protect against a
-		  possibility that the attacker tampers with sector
-		  numbers in the journal.
-		* commit id
-
-	* data area (the size is variable; it depends on how many journal
-	  entries fit into the metadata area)
-
-	    - every sector in the data area contains:
-
-		* data (504 bytes of data, the last 8 bytes are stored in
-		  the journal entry)
-		* commit id
-
-	To test if the whole journal section was written correctly, every
-	512-byte sector of the journal ends with 8-byte commit id. If the
-	commit id matches on all sectors in a journal section, then it is
-	assumed that the section was written correctly. If the commit id
-	doesn't match, the section was written partially and it should not
-	be replayed.
-
-* one or more runs of interleaved tags and data.
-    Each run contains:
-
-	* tag area - it contains integrity tags. There is one tag for each
-	  sector in the data area
-	* data area - it contains data sectors. The number of data sectors
-	  in one run must be a power of two. log2 of this value is stored
-	  in the superblock.
diff --git a/Documentation/device-mapper/dm-io.rst b/Documentation/device-mapper/dm-io.rst
deleted file mode 100644
index d2492917a1f5..000000000000
--- a/Documentation/device-mapper/dm-io.rst
+++ /dev/null
@@ -1,75 +0,0 @@
-=====
-dm-io
-=====
-
-Dm-io provides synchronous and asynchronous I/O services. There are three
-types of I/O services available, and each type has a sync and an async
-version.
-
-The user must set up an io_region structure to describe the desired location
-of the I/O. Each io_region indicates a block-device along with the starting
-sector and size of the region::
-
-   struct io_region {
-      struct block_device *bdev;
-      sector_t sector;
-      sector_t count;
-   };
-
-Dm-io can read from one io_region or write to one or more io_regions. Writes
-to multiple regions are specified by an array of io_region structures.
-
-The first I/O service type takes a list of memory pages as the data buffer for
-the I/O, along with an offset into the first page::
-
-   struct page_list {
-      struct page_list *next;
-      struct page *page;
-   };
-
-   int dm_io_sync(unsigned int num_regions, struct io_region *where, int rw,
-                  struct page_list *pl, unsigned int offset,
-                  unsigned long *error_bits);
-   int dm_io_async(unsigned int num_regions, struct io_region *where, int rw,
-                   struct page_list *pl, unsigned int offset,
-                   io_notify_fn fn, void *context);
-
-The second I/O service type takes an array of bio vectors as the data buffer
-for the I/O. This service can be handy if the caller has a pre-assembled bio,
-but wants to direct different portions of the bio to different devices::
-
-   int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where,
-                       int rw, struct bio_vec *bvec,
-                       unsigned long *error_bits);
-   int dm_io_async_bvec(unsigned int num_regions, struct io_region *where,
-                        int rw, struct bio_vec *bvec,
-                        io_notify_fn fn, void *context);
-
-The third I/O service type takes a pointer to a vmalloc'd memory buffer as the
-data buffer for the I/O. This service can be handy if the caller needs to do
-I/O to a large region but doesn't want to allocate a large number of individual
-memory pages::
-
-   int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw,
-                     void *data, unsigned long *error_bits);
-   int dm_io_async_vm(unsigned int num_regions, struct io_region *where, int rw,
-                      void *data, io_notify_fn fn, void *context);
-
-Callers of the asynchronous I/O services must include the name of a completion
-callback routine and a pointer to some context data for the I/O::
-
-   typedef void (*io_notify_fn)(unsigned long error, void *context);
-
-The "error" parameter in this callback, as well as the `*error` parameter in
-all of the synchronous versions, is a bitset (instead of a simple error value).
-In the case of a write I/O to multiple regions, this bitset allows dm-io to
-indicate success or failure on each individual region.
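-
-A minimal sketch of decoding that bitset in a completion callback is shown
-below. It assumes that bit i corresponds to the i-th io_region passed in, and
-that the caller stored the region count in the context; the callback name and
-the context layout are hypothetical::
-
-   /* Hypothetical callback: report each region whose error bit is set. */
-   static void example_io_done(unsigned long error, void *context)
-   {
-      unsigned int num_regions = *(unsigned int *)context;
-      unsigned int i;
-
-      for (i = 0; i < num_regions; i++)
-         if (error & (1UL << i))
-            printk(KERN_ERR "dm-io: region %u failed\n", i);
-   }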
-
-Before using any of the dm-io services, the user should call dm_io_get()
-and specify the number of pages they expect to perform I/O on concurrently.
-Dm-io will attempt to resize its mempool to make sure enough pages are
-always available in order to avoid unnecessary waiting while performing I/O.
-
-When the user is finished using the dm-io services, they should call
-dm_io_put() and specify the same number of pages that were given on the
-dm_io_get() call.
diff --git a/Documentation/device-mapper/dm-log.rst b/Documentation/device-mapper/dm-log.rst
deleted file mode 100644
index ba4fce39bc27..000000000000
--- a/Documentation/device-mapper/dm-log.rst
+++ /dev/null
@@ -1,57 +0,0 @@
-=====================
-Device-Mapper Logging
-=====================
-The device-mapper logging code is used by some of the device-mapper
-RAID targets to track regions of the disk that are not consistent.
-A region (or portion of the address space) of the disk may be
-inconsistent because a RAID stripe is currently being operated on or
-a machine died while the region was being altered.  In the case of
-mirrors, a region would be considered dirty/inconsistent while you
-are writing to it because the writes need to be replicated for all
-the legs of the mirror and may not reach the legs at the same time.
-Once all writes are complete, the region is considered clean again.
-
-There is a generic logging interface that the device-mapper RAID
-implementations use to perform logging operations (see
-dm_dirty_log_type in include/linux/dm-dirty-log.h).  Various different
-logging implementations are available and provide different
-capabilities.  The list includes:
-
-==============	==============================================================
-Type		Files
-==============	==============================================================
-disk		drivers/md/dm-log.c
-core		drivers/md/dm-log.c
-userspace	drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
-==============	==============================================================
-
-The "disk" log type
--------------------
-This log implementation commits the log state to disk.  This way, the
-logging state survives reboots/crashes.
-
-The "core" log type
--------------------
-This log implementation keeps the log state in memory.  The log state
-will not survive a reboot or crash, but there may be a small boost in
-performance.  This method can also be used if no storage device is
-available for storing log state.
-
-The "userspace" log type
-------------------------
-This log type simply provides a way to export the log API to userspace,
-so log implementations can be done there.  This is done by forwarding most
-logging requests to userspace, where a daemon receives and processes the
-request.
-
-The structures used for communication between kernel and userspace are
-located in include/linux/dm-log-userspace.h.  Due to the frequency,
-diversity, and 2-way communication nature of the exchanges between
-kernel and userspace, 'connector' is used as the interface for
-communication.
-
-There are currently two userspace log implementations that leverage this
-framework - "clustered-disk" and "clustered-core".  These implementations
-provide a cluster-coherent log for shared-storage.  Device-mapper mirroring
-can be used in a shared-storage environment when the cluster log implementations
-are employed.
diff --git a/Documentation/device-mapper/dm-queue-length.rst b/Documentation/device-mapper/dm-queue-length.rst
deleted file mode 100644
index d8e381c1cb02..000000000000
--- a/Documentation/device-mapper/dm-queue-length.rst
+++ /dev/null
@@ -1,48 +0,0 @@
-===============
-dm-queue-length
-===============
-
-dm-queue-length is a path selector module for device-mapper targets,
-which selects a path with the least number of in-flight I/Os.
-The path selector name is 'queue-length'.
-
-Table parameters for each path: [<repeat_count>]
-
-::
-
-	<repeat_count>: The number of I/Os to dispatch using the selected
-			path before switching to the next path.
-			If not given, internal default is used. To check
-			the default value, see the activated table.
-
-Status for each path: <status> <fail-count> <in-flight>
-
-::
-
-	<status>: 'A' if the path is active, 'F' if the path is failed.
-	<fail-count>: The number of path failures.
-	<in-flight>: The number of in-flight I/Os on the path.
-
-
-Algorithm
-=========
-
-dm-queue-length increments/decrements 'in-flight' when an I/O is
-dispatched/completed respectively.
-dm-queue-length selects a path with the minimum 'in-flight'.
-
-
-Examples
-========
-The following example uses 2 paths (sda and sdb) with repeat_count == 128.
-
-::
-
-  # echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
-    | dmsetup create test
-  #
-  # dmsetup table
-  test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
-  #
-  # dmsetup status
-  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
diff --git a/Documentation/device-mapper/dm-raid.rst b/Documentation/device-mapper/dm-raid.rst
deleted file mode 100644
index 2fe255b130fb..000000000000
--- a/Documentation/device-mapper/dm-raid.rst
+++ /dev/null
@@ -1,419 +0,0 @@
-=======
-dm-raid
-=======
-
-The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
-It allows the MD RAID drivers to be accessed using a device-mapper
-interface.
-
-
-Mapping Table Interface
------------------------
-The target is named "raid" and it accepts the following parameters::
-
-  <raid_type> <#raid_params> <raid_params> \
-    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]
-
-<raid_type>:
-
-  ============= ===============================================================
-  raid0		RAID0 striping (no resilience)
-  raid1		RAID1 mirroring
-  raid4		RAID4 with dedicated last parity disk
-  raid5_n 	RAID5 with dedicated last parity disk supporting takeover
-		Same as raid4
-
-		- Transitory layout
-  raid5_la	RAID5 left asymmetric
-
-		- rotating parity 0 with data continuation
-  raid5_ra	RAID5 right asymmetric
-
-		- rotating parity N with data continuation
-  raid5_ls	RAID5 left symmetric
-
-		- rotating parity 0 with data restart
-  raid5_rs 	RAID5 right symmetric
-
-		- rotating parity N with data restart
-  raid6_zr	RAID6 zero restart
-
-		- rotating parity zero (left-to-right) with data restart
-  raid6_nr	RAID6 N restart
-
-		- rotating parity N (right-to-left) with data restart
-  raid6_nc	RAID6 N continue
-
-		- rotating parity N (right-to-left) with data continuation
-  raid6_n_6	RAID6 with dedicated parity disks
-
-		- parity and Q-syndrome on the last 2 disks;
-		  layout for takeover from/to raid4/raid5_n
-  raid6_la_6	Same as "raid5_la" plus dedicated last Q-syndrome disk
-
-		- layout for takeover from raid5_la from/to raid6
-  raid6_ra_6	Same as "raid5_ra" plus dedicated last Q-syndrome disk
-
-		- layout for takeover from raid5_ra from/to raid6
-  raid6_ls_6	Same as "raid5_ls" plus dedicated last Q-syndrome disk
-
-		- layout for takeover from raid5_ls from/to raid6
-  raid6_rs_6	Same as "raid5_rs" plus dedicated last Q-syndrome disk
-
-		- layout for takeover from raid5_rs from/to raid6
-  raid10        Various RAID10 inspired algorithms chosen by additional params
-		(see raid10_format and raid10_copies below)
-
-		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
-		- RAID1E: Integrated Adjacent Stripe Mirroring
-		- RAID1E: Integrated Offset Stripe Mirroring
-		- and other similar RAID10 variants
-  ============= ===============================================================
-
-  Reference: Chapter 4 of
-  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
-
-<#raid_params>: The number of parameters that follow.
-
-<raid_params> consists of
-
-    Mandatory parameters:
-        <chunk_size>:
-		      Chunk size in sectors.  This parameter is often known as
-		      "stripe size".  It is the only mandatory parameter and
-		      is placed first.
-
-    followed by optional parameters (in any order):
-	[sync|nosync]
-		Force or prevent RAID initialization.
-
-	[rebuild <idx>]
-		Rebuild drive number 'idx' (first drive is 0).
-
-	[daemon_sleep <ms>]
-		Interval between runs of the bitmap daemon that
-		clear bits.  A longer interval means less bitmap I/O but
-		resyncing after a failure is likely to take longer.
-
-	[min_recovery_rate <kB/sec/disk>]
-		Throttle RAID initialization
-	[max_recovery_rate <kB/sec/disk>]
-		Throttle RAID initialization
-	[write_mostly <idx>]
-		Mark drive index 'idx' write-mostly.
-	[max_write_behind <sectors>]
-		See '--write-behind=' (man mdadm)
-	[stripe_cache <sectors>]
-		Stripe cache size (RAID 4/5/6 only)
-	[region_size <sectors>]
-		The region_size multiplied by the number of regions is the
-		logical size of the array.  The bitmap records the device
-		synchronisation state for each region.
-
-        [raid10_copies   <# copies>], [raid10_format   <near|far|offset>]
-		These two options are used to alter the default layout of
-		a RAID10 configuration.  The number of copies can be
-		specified, but the default is 2.  There are also three
-		variations to how the copies are laid down - the default
-		is "near".  Near copies are what most people think of with
-		respect to mirroring.  If these options are left unspecified,
-		or 'raid10_copies 2' and/or 'raid10_format near' are given,
-		then the layouts for 2, 3 and 4 devices	are:
-
-		========	 ==========	   ==============
-		2 drives         3 drives          4 drives
-		========	 ==========	   ==============
-		A1  A1           A1  A1  A2        A1  A1  A2  A2
-		A2  A2           A2  A3  A3        A3  A3  A4  A4
-		A3  A3           A4  A4  A5        A5  A5  A6  A6
-		A4  A4           A5  A6  A6        A7  A7  A8  A8
-		..  ..           ..  ..  ..        ..  ..  ..  ..
-		========	 ==========	   ==============
-
-		The 2-device layout is equivalent to 2-way RAID1.  The 4-device
-		layout is what a traditional RAID10 would look like.  The
-		3-device layout is what might be called a 'RAID1E - Integrated
-		Adjacent Stripe Mirroring'.
-
-		If 'raid10_copies 2' and 'raid10_format far', then the layouts
-		for 2, 3 and 4 devices are:
-
-		========	     ============	  ===================
-		2 drives             3 drives             4 drives
-		========	     ============	  ===================
-		A1  A2               A1   A2   A3         A1   A2   A3   A4
-		A3  A4               A4   A5   A6         A5   A6   A7   A8
-		A5  A6               A7   A8   A9         A9   A10  A11  A12
-		..  ..               ..   ..   ..         ..   ..   ..   ..
-		A2  A1               A3   A1   A2         A2   A1   A4   A3
-		A4  A3               A6   A4   A5         A6   A5   A8   A7
-		A6  A5               A9   A7   A8         A10  A9   A12  A11
-		..  ..               ..   ..   ..         ..   ..   ..   ..
-		========	     ============	  ===================
-
-		If 'raid10_copies 2' and 'raid10_format offset', then the
-		layouts for 2, 3 and 4 devices are:
-
-		========       ==========         ================
-		2 drives       3 drives           4 drives
-		========       ==========         ================
-		A1  A2         A1  A2  A3         A1  A2  A3  A4
-		A2  A1         A3  A1  A2         A2  A1  A4  A3
-		A3  A4         A4  A5  A6         A5  A6  A7  A8
-		A4  A3         A6  A4  A5         A6  A5  A8  A7
-		A5  A6         A7  A8  A9         A9  A10 A11 A12
-		A6  A5         A9  A7  A8         A10 A9  A12 A11
-		..  ..         ..  ..  ..         ..  ..  ..  ..
-		========       ==========         ================
-
-		Here we see layouts closely akin to 'RAID1E - Integrated
-		Offset Stripe Mirroring'.
-
-        [delta_disks <N>]
-		The delta_disks option value (-251 < N < +251) triggers
-		device removal (negative value) or device addition (positive
-		value) to any reshape supporting raid levels 4/5/6 and 10.
-		RAID levels 4/5/6 allow for addition of devices (metadata
-		and data device tuple), raid10_near and raid10_offset only
-		allow for device addition. raid10_far does not support any
-		reshaping at all.
-		A minimum number of devices has to be kept to enforce resilience,
-		which is 3 devices for raid4/5 and 4 devices for raid6.
-
-        [data_offset <sectors>]
-		This option value defines the offset into each data device
-		where the data starts. This is used to provide out-of-place
-		reshaping space to avoid writing over data while
-		changing the layout of stripes, hence an interruption/crash
-		may happen at any time without the risk of losing data.
-		E.g. when adding devices to an existing raid set during
-		forward reshaping, the out-of-place space will be allocated
-		at the beginning of each raid device. The kernel raid4/5/6/10
-		MD personalities supporting such device addition will read the data from
-		the existing first stripes (those with smaller number of stripes)
-		starting at data_offset to fill up a new stripe with the larger
-		number of stripes, calculate the redundancy blocks (CRC/Q-syndrome)
-		and write that new stripe to offset 0. Same will be applied to all
-		N-1 other new stripes. This out-of-place scheme is used to change
-		the RAID type (i.e. the allocation algorithm) as well, e.g.
-		changing from raid5_ls to raid5_n.
-
-	[journal_dev <dev>]
-		This option adds a journal device to raid4/5/6 raid sets and
-		uses it to close the 'write hole' caused by the non-atomic updates
-		to the component devices which can cause data loss during recovery.
-		The journal device is used as writethrough thus causing writes to
-		be throttled versus non-journaled raid4/5/6 sets.
-		Takeover/reshape is not possible with a raid4/5/6 journal device;
-		it has to be deconfigured before requesting these.
-
-	[journal_mode <mode>]
-		This option sets the caching mode on journaled raid4/5/6 raid sets
-		(see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
-		If 'writeback' is selected the journal device has to be resilient
-		and must not suffer from the 'write hole' problem itself (e.g. use
-		raid1 or raid10) to avoid a single point of failure.
-
-<#raid_devs>: The number of devices composing the array.
-	Each device consists of two entries.  The first is the device
-	containing the metadata (if any); the second is the one containing the
-	data. A maximum of 64 metadata/data device entries are supported
-	up to target version 1.8.0.
-	1.9.0 supports up to 253 which is enforced by the used MD kernel runtime.
-
-	If a drive has failed or is missing at creation time, a '-' can be
-	given for both the metadata and data drives for a given position.
-
-
-Example Tables
---------------
-
-::
-
-  # RAID4 - 4 data drives, 1 parity (no metadata devices)
-  # No metadata devices specified to hold superblock/bitmap info
-  # Chunk size of 1MiB
-  # (Lines separated for easy reading)
-
-  0 1960893648 raid \
-          raid4 1 2048 \
-          5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
-
-  # RAID4 - 4 data drives, 1 parity (with metadata devices)
-  # Chunk size of 1MiB, force RAID initialization,
-  #       min recovery rate at 20 kiB/sec/disk
-
-  0 1960893648 raid \
-          raid4 4 2048 sync min_recovery_rate 20 \
-          5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
-
-
-Status Output
--------------
-'dmsetup table' displays the table used to construct the mapping.
-The optional parameters are always printed in the order listed
-above with "sync" or "nosync" always output ahead of the other
-arguments, regardless of the order used when originally loading the table.
-Arguments that can be repeated are ordered by value.
-
-
-'dmsetup status' yields information on the state and health of the array.
-The output is as follows (normally a single line, but expanded here for
-clarity)::
-
-  1: <s> <l> raid \
-  2:      <raid_type> <#devices> <health_chars> \
-  3:      <sync_ratio> <sync_action> <mismatch_cnt>
-
-Line 1 is the standard output produced by device-mapper.
-
-Lines 2 & 3 are produced by the raid target and are best explained by example::
-
-        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
-
-Here we can see the RAID type is raid4, there are 5 devices - all of
-which are 'A'live, and the array is 2/490221568 complete with its initial
-recovery.  Here is a fuller description of the individual fields:
-
-	=============== =========================================================
-	<raid_type>     Same as the <raid_type> used to create the array.
-	<health_chars>  One char for each device, indicating:
-
-			- 'A' = alive and in-sync
-			- 'a' = alive but not in-sync
-			- 'D' = dead/failed.
-	<sync_ratio>    The ratio indicating how much of the array has undergone
-			the process described by 'sync_action'.  If the
-			'sync_action' is "check" or "repair", then the process
-			of "resync" or "recover" can be considered complete.
-	<sync_action>   One of the following possible states:
-
-			idle
-				- No synchronization action is being performed.
-			frozen
-				- The current action has been halted.
-			resync
-				- Array is undergoing its initial synchronization
-				  or is resynchronizing after an unclean shutdown
-				  (possibly aided by a bitmap).
-			recover
-				- A device in the array is being rebuilt or
-				  replaced.
-			check
-				- A user-initiated full check of the array is
-				  being performed.  All blocks are read and
-				  checked for consistency.  The number of
-				  discrepancies found is recorded in
-				  <mismatch_cnt>.  No changes are made to the
-				  array by this action.
-			repair
-				- The same as "check", but discrepancies are
-				  corrected.
-			reshape
-				- The array is undergoing a reshape.
-	<mismatch_cnt>  The number of discrepancies found between mirror copies
-			in RAID1/10 or wrong parity values found in RAID4/5/6.
-			This value is valid only after a "check" of the array
-			is performed.  A healthy array has a 'mismatch_cnt' of 0.
-	<data_offset>   The current data offset to the start of the user data on
-			each component device of a raid set (see the respective
-			raid parameter to support out-of-place reshaping).
-	<journal_char>	- 'A' - active write-through journal device.
-			- 'a' - active write-back journal device.
-			- 'D' - dead journal device.
-			- '-' - no journal device.
-	=============== =========================================================
-
-
-Message Interface
------------------
-The dm-raid target will accept certain actions through the 'message' interface.
-('man dmsetup' for more information on the message interface.)  These actions
-include:
-
-	========= ================================================
-	"idle"    Halt the current sync action.
-	"frozen"  Freeze the current sync action.
-	"resync"  Initiate/continue a resync.
-	"recover" Initiate/continue a recover process.
-	"check"   Initiate a check (i.e. a "scrub") of the array.
-	"repair"  Initiate a repair of the array.
-	========= ================================================
-
-
-Discard Support
----------------
-The implementation of discard support among hardware vendors varies.
-When a block is discarded, some storage devices will return zeroes when
-the block is read.  These devices set the 'discard_zeroes_data'
-attribute.  Other devices will return random data.  Confusingly, some
-devices that advertise 'discard_zeroes_data' will not reliably return
-zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
-from a number of devices to calculate parity blocks and (for performance
-reasons) relies on 'discard_zeroes_data' being reliable, it is important
-that the devices be consistent.  Blocks may be discarded in the middle
-of a RAID 4/5/6 stripe and if subsequent read results are not
-consistent, the parity blocks may be calculated differently at any time;
-making the parity blocks useless for redundancy.  It is important to
-understand how your hardware behaves with discards if you are going to
-enable discards with RAID 4/5/6.
-
-Since the behavior of storage devices is unreliable in this respect,
-even when reporting 'discard_zeroes_data', by default RAID 4/5/6
-discard support is disabled -- this ensures data integrity at the
-expense of losing some performance.
-
-Storage devices that properly support 'discard_zeroes_data' are
-increasingly whitelisted in the kernel and can thus be trusted.
-
-For trusted devices, the following dm-raid module parameter can be set
-to safely enable discard support for RAID 4/5/6:
-
-    'devices_handle_discard_safely'
-
-
-Version History
----------------
-
-::
-
- 1.0.0	Initial version.  Support for RAID 4/5/6
- 1.1.0	Added support for RAID 1
- 1.2.0	Handle creation of arrays that contain failed devices.
- 1.3.0	Added support for RAID 10
- 1.3.1	Allow device replacement/rebuild for RAID 10
- 1.3.2	Fix/improve redundancy checking for RAID10
- 1.4.0	Non-functional change.  Removes arg from mapping function.
- 1.4.1	RAID10 fix redundancy validation checks (commit 55ebbb5).
- 1.4.2	Add RAID10 "far" and "offset" algorithm support.
- 1.5.0	Add message interface to allow manipulation of the sync_action.
-	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
- 1.5.1	Add ability to restore transiently failed devices on resume.
- 1.5.2	'mismatch_cnt' is zero unless [last_]sync_action is "check".
- 1.6.0	Add discard support (and devices_handle_discard_safely module param).
- 1.7.0	Add support for MD RAID0 mappings.
- 1.8.0	Explicitly check for compatible flags in the superblock metadata
-	and reject to start the raid set if any are set by a newer
-	target version, thus avoiding data corruption on a raid set
-	with a reshape in progress.
- 1.9.0	Add support for RAID level takeover/reshape/region size
-	and set size reduction.
- 1.9.1	Fix activation of existing RAID 4/10 mapped devices
- 1.9.2	Don't emit '- -' on the status table line in case the constructor
-	fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
-	'D' on the status line.  If '- -' is passed into the constructor, emit
-	'- -' on the table line and '-' as the status line health character.
- 1.10.0	Add support for raid4/5/6 journal device
- 1.10.1	Fix data corruption on reshape request
- 1.11.0	Fix table line argument order
-	(wrong raid10_copies/raid10_format sequence)
- 1.11.1	Add raid4/5/6 journal write-back support via journal_mode option
- 1.12.1	Fix for MD deadlock between mddev_suspend() and md_write_start() available
- 1.13.0	Fix dev_health status at end of "recover" (was 'a', now 'A')
- 1.13.1	Fix deadlock caused by early md_stop_writes().  Also fix size and
-	state races.
- 1.13.2	Fix raid redundancy validation and avoid keeping raid set frozen
- 1.14.0	Fix reshape race on small devices.  Fix stripe adding reshape
-	deadlock/potential data corruption.  Update superblock when
-	specific devices are requested via rebuild.  Fix RAID leg
-	rebuild errors.
diff --git a/Documentation/device-mapper/dm-service-time.rst b/Documentation/device-mapper/dm-service-time.rst
deleted file mode 100644
index facf277fc13c..000000000000
--- a/Documentation/device-mapper/dm-service-time.rst
+++ /dev/null
@@ -1,101 +0,0 @@
-===============
-dm-service-time
-===============
-
-dm-service-time is a path selector module for device-mapper targets,
-which selects a path with the shortest estimated service time for
-the incoming I/O.
-
-The service time for each path is estimated by dividing the total size
-of in-flight I/Os on a path by the performance value of the path.
-The performance value is a relative throughput value among all paths
-in a path-group, and it can be specified as a table argument.
-
-The path selector name is 'service-time'.
-
-Table parameters for each path:
-
-    [<repeat_count> [<relative_throughput>]]
-	<repeat_count>:
-			The number of I/Os to dispatch using the selected
-			path before switching to the next path.
-			If not given, internal default is used.  To check
-			the default value, see the activated table.
-	<relative_throughput>:
-			The relative throughput value of the path
-			among all paths in the path-group.
-			The valid range is 0-100.
-			If not given, minimum value '1' is used.
-			If '0' is given, the path isn't selected while
-			other paths having a positive value are available.
-
-Status for each path:
-
-    <status> <fail-count> <in-flight-size> <relative_throughput>
-	<status>:
-		'A' if the path is active, 'F' if the path is failed.
-	<fail-count>:
-		The number of path failures.
-	<in-flight-size>:
-		The size of in-flight I/Os on the path.
-	<relative_throughput>:
-		The relative throughput value of the path
-		among all paths in the path-group.
-
-
-Algorithm
-=========
-
-dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
-dispatched and subtracts when completed.
-Basically, dm-service-time selects a path having minimum service time
-which is calculated by::
-
-	('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
-
-However, some optimizations below are used to reduce the calculation
-as much as possible.
-
-	1. If the paths have the same 'relative_throughput', skip
-	   the division and just compare the 'in-flight-size'.
-
-	2. If the paths have the same 'in-flight-size', skip the division
-	   and just compare the 'relative_throughput'.
-
-	3. If some paths have non-zero 'relative_throughput' and others
-	   have zero 'relative_throughput', ignore those paths with zero
-	   'relative_throughput'.
-
-If such optimizations can't be applied, calculate service time, and
-compare service time.
-If calculated service time is equal, the path having maximum
-'relative_throughput' may be better.  So compare 'relative_throughput'
-then.
-
-
-Examples
-========
-If 2 paths (sda and sdb) are used with repeat_count == 128, and sda
-has an average throughput of 1GB/s while sdb has 4GB/s, the
-'relative_throughput' value may be '1' for sda and '4' for sdb::
-
-  # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" | \
-    dmsetup create test
-  #
-  # dmsetup table
-  test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
-  #
-  # dmsetup status
-  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
-
-
-Alternatively, '2' for sda and '8' for sdb would work equally well::
-
-  # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" | \
-    dmsetup create test
-  #
-  # dmsetup table
-  test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
-  #
-  # dmsetup status
-  test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
diff --git a/Documentation/device-mapper/dm-uevent.rst b/Documentation/device-mapper/dm-uevent.rst
deleted file mode 100644
index 4a8ee8d069c9..000000000000
--- a/Documentation/device-mapper/dm-uevent.rst
+++ /dev/null
@@ -1,110 +0,0 @@
-====================
-device-mapper uevent
-====================
-
-The device-mapper uevent code adds the capability to device-mapper to create
-and send kobject uevents (uevents).  Previously device-mapper events were only
-available through the ioctl interface.  The advantage of the uevents interface
-is the event contains environment attributes providing increased context for
-the event avoiding the need to query the state of the device-mapper device after
-the event is received.
-
-There are two functions currently for device-mapper events.  The first function
-listed creates the event and the second function sends the event(s)::
-
-  void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti,
-                      const char *path, unsigned nr_valid_paths)
-
-  void dm_send_uevents(struct list_head *events, struct kobject *kobj)
-
-
-The variables added to the uevent environment are:
-
-Variable Name: DM_TARGET
-------------------------
-:Uevent Action(s): KOBJ_CHANGE
-:Type: string
-:Description:
-:Value: Name of device-mapper target that generated the event.
-
-Variable Name: DM_ACTION
-------------------------
-:Uevent Action(s): KOBJ_CHANGE
-:Type: string
-:Description:
-:Value: Device-mapper specific action that caused the uevent action.
-	PATH_FAILED - A path has failed;
-	PATH_REINSTATED - A path has been reinstated.
-
-Variable Name: DM_SEQNUM
-------------------------
-:Uevent Action(s): KOBJ_CHANGE
-:Type: unsigned integer
-:Description: A sequence number for this specific device-mapper device.
-:Value: Valid unsigned integer range.
-
-Variable Name: DM_PATH
-----------------------
-:Uevent Action(s): KOBJ_CHANGE
-:Type: string
-:Description: Major and minor number of the path device pertaining to this
-	      event.
-:Value: Path name in the form of "Major:Minor"
-
-Variable Name: DM_NR_VALID_PATHS
---------------------------------
-:Uevent Action(s): KOBJ_CHANGE
-:Type: unsigned integer
-:Description:
-:Value: Valid unsigned integer range.
-
-Variable Name: DM_NAME
-----------------------
-:Uevent Action(s): KOBJ_CHANGE
-:Type: string
-:Description: Name of the device-mapper device.
-:Value: Name
-
-Variable Name: DM_UUID
-----------------------
-:Uevent Action(s): KOBJ_CHANGE
-:Type: string
-:Description: UUID of the device-mapper device.
-:Value: UUID. (Empty string if there isn't one.)
-
-An example of the uevents generated as captured by udevmonitor is shown
-below:
-
-1.) Path failure::
-
-	UEVENT[1192521009.711215] change@/block/dm-3
-	ACTION=change
-	DEVPATH=/block/dm-3
-	SUBSYSTEM=block
-	DM_TARGET=multipath
-	DM_ACTION=PATH_FAILED
-	DM_SEQNUM=1
-	DM_PATH=8:32
-	DM_NR_VALID_PATHS=0
-	DM_NAME=mpath2
-	DM_UUID=mpath-35333333000002328
-	MINOR=3
-	MAJOR=253
-	SEQNUM=1130
-
-2.) Path reinstate::
-
-	UEVENT[1192521132.989927] change@/block/dm-3
-	ACTION=change
-	DEVPATH=/block/dm-3
-	SUBSYSTEM=block
-	DM_TARGET=multipath
-	DM_ACTION=PATH_REINSTATED
-	DM_SEQNUM=2
-	DM_PATH=8:32
-	DM_NR_VALID_PATHS=1
-	DM_NAME=mpath2
-	DM_UUID=mpath-35333333000002328
-	MINOR=3
-	MAJOR=253
-	SEQNUM=1131
diff --git a/Documentation/device-mapper/dm-zoned.rst b/Documentation/device-mapper/dm-zoned.rst
deleted file mode 100644
index 07f56ebc1730..000000000000
--- a/Documentation/device-mapper/dm-zoned.rst
+++ /dev/null
@@ -1,146 +0,0 @@
-========
-dm-zoned
-========
-
-The dm-zoned device mapper target exposes a zoned block device (ZBC and
-ZAC compliant devices) as a regular block device without any write
-pattern constraints. In effect, it implements a drive-managed zoned
-block device which hides from the user (a file system or an application
-doing raw block device accesses) the sequential write constraints of
-host-managed zoned block devices and can mitigate the potential
-device-side performance degradation due to excessive random writes on
-host-aware zoned block devices.
-
-For a more detailed description of the zoned block device models and
-their constraints see (for SCSI devices):
-
-http://www.t10.org/drafts.htm#ZBC_Family
-
-and (for ATA devices):
-
-http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
-
-The dm-zoned implementation is simple and minimizes system overhead (CPU
-and memory usage as well as storage capacity loss). For a 10TB
-host-managed disk with 256 MB zones, dm-zoned memory usage per disk
-instance is at most 4.5 MB and as little as 5 zones will be used
-internally for storing metadata and performing reclaim operations.
-
-dm-zoned target devices are formatted and checked using the dmzadm
-utility available at:
-
-https://github.com/hgst/dm-zoned-tools
-
-Algorithm
-=========
-
-dm-zoned implements an on-disk buffering scheme to handle non-sequential
-write accesses to the sequential zones of a zoned block device.
-Conventional zones are used for caching as well as for storing internal
-metadata.
-
-The zones of the device are separated into 2 types:
-
-1) Metadata zones: these are conventional zones used to store metadata.
-Metadata zones are not reported as useable capacity to the user.
-
-2) Data zones: all remaining zones, the vast majority of which will be
-sequential zones used exclusively to store user data. The conventional
-zones of the device may be used also for buffering user random writes.
-Data in these zones may be directly mapped to the conventional zone, but
-later moved to a sequential zone so that the conventional zone can be
-reused for buffering incoming random writes.
-
-dm-zoned exposes a logical device with a sector size of 4096 bytes,
-irrespective of the physical sector size of the backend zoned block
-device being used. This allows reducing the amount of metadata needed to
-manage valid blocks (blocks written).
-
-The on-disk metadata format is as follows:
-
-1) The first block of the first conventional zone found contains the
-super block which describes the on disk amount and position of metadata
-blocks.
-
-2) Following the super block, a set of blocks is used to describe the
-mapping of the logical device blocks. The mapping is done per chunk of
-blocks, with the chunk size equal to the zoned block device zone size. The
-mapping table is indexed by chunk number and each mapping entry
-indicates the zone number of the device storing the chunk of data. Each
-mapping entry may also indicate the zone number of a conventional
-zone used to buffer random modifications to the data zone.
-
-3) A set of blocks used to store bitmaps indicating the validity of
-blocks in the data zones follows the mapping table. A valid block is
-defined as a block that was written and not discarded. For a buffered
-data chunk, a block is always valid only in the data zone mapping the
-chunk or in the buffer zone of the chunk.
-
-For a logical chunk mapped to a conventional zone, all write operations
-are processed by directly writing to the zone. If the mapping zone is a
-sequential zone, the write operation is processed directly only if the
-write offset within the logical chunk is equal to the write pointer
-offset within the sequential data zone (i.e. the write operation is
-aligned on the zone write pointer). Otherwise, write operations are
-processed indirectly using a buffer zone. In that case, an unused
-conventional zone is allocated and assigned to the chunk being
-accessed. Writing a block to the buffer zone of a chunk will
-automatically invalidate the same block in the sequential zone mapping
-the chunk. If all blocks of the sequential zone become invalid, the zone
-is freed and the chunk buffer zone becomes the primary zone mapping the
-chunk, resulting in native random write performance similar to a regular
-block device.
-
-Read operations are processed according to the block validity
-information provided by the bitmaps. Valid blocks are read either from
-the sequential zone mapping a chunk, or if the chunk is buffered, from
-the buffer zone assigned. If the accessed chunk has no mapping, or the
-accessed blocks are invalid, the read buffer is zeroed and the read
-operation terminated.
-
-After some time, the limited number of conventional zones available may
-be exhausted (all used to map chunks or buffer sequential zones) and
-unaligned writes to unbuffered chunks become impossible. To avoid this
-situation, a reclaim process regularly scans used conventional zones and
-tries to reclaim the least recently used zones by copying the valid
-blocks of the buffer zone to a free sequential zone. Once the copy
-completes, the chunk mapping is updated to point to the sequential zone
-and the buffer zone freed for reuse.
-
-Metadata Protection
-===================
-
-To protect metadata against corruption in case of sudden power loss or
-system crash, 2 sets of metadata zones are used. One set, the primary
-set, is used as the main metadata region, while the secondary set is
-used as a staging area. Modified metadata is first written to the
-secondary set and validated by updating the super block in the secondary
-set; a generation counter is used to indicate that this set contains the
-newest metadata. Once this operation completes, in-place metadata
-block updates can be done in the primary metadata set. This ensures that
-one of the sets is always consistent (all modifications committed or none
-at all). Flush operations are used as a commit point. Upon reception of
-a flush request, metadata modification activity is temporarily blocked
-(for both incoming BIO processing and reclaim process) and all dirty
-metadata blocks are staged and updated. Normal operation is then
-resumed. Flushing metadata thus only temporarily delays write and
-discard requests. Read requests can be processed concurrently while
-metadata flush is being executed.
-
-Usage
-=====
-
-A zoned block device must first be formatted using the dmzadm tool. This
-will analyze the device zone configuration, determine where to place the
-metadata sets on the device and initialize the metadata sets.
-
-Ex::
-
-	dmzadm --format /dev/sdxx
-
-For a formatted device, the target can be created normally with the
-dmsetup utility. The only parameter that dm-zoned requires is the
-underlying zoned block device name. Ex::
-
-	echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
-	dmsetup create dmz-`basename ${dev}`
diff --git a/Documentation/device-mapper/era.rst b/Documentation/device-mapper/era.rst
deleted file mode 100644
index 90dd5c670b9f..000000000000
--- a/Documentation/device-mapper/era.rst
+++ /dev/null
@@ -1,116 +0,0 @@
-======
-dm-era
-======
-
-Introduction
-============
-
-dm-era is a target that behaves similarly to the linear target.  In
-addition it keeps track of which blocks were written within a user
-defined period of time called an 'era'.  Each era target instance
-maintains the current era as a monotonically increasing 32-bit
-counter.
-
-Use cases include tracking changed blocks for backup software, and
-partially invalidating the contents of a cache to restore cache
-coherency after rolling back a vendor snapshot.
-
-Constructor
-===========
-
-era <metadata dev> <origin dev> <block size>
-
- ================ ======================================================
- metadata dev     fast device holding the persistent metadata
- origin dev	  device holding data blocks that may change
- block size       block size of origin data device, granularity that is
-		  tracked by the target
- ================ ======================================================
-
-Messages
-========
-
-None of the dm messages take any arguments.
-
-checkpoint
-----------
-
-Possibly move to a new era.  You shouldn't assume the era has
-incremented.  After sending this message, you should check the
-current era via the status line.
-
-take_metadata_snap
-------------------
-
-Create a clone of the metadata, to allow a userland process to read it.
-
-drop_metadata_snap
-------------------
-
-Drop the metadata snapshot.
-
-Status
-======
-
-<metadata block size> <#used metadata blocks>/<#total metadata blocks>
-<current era> <held metadata root | '-'>
-
-========================= ==============================================
-metadata block size	  Fixed block size for each metadata block in
-			  sectors
-#used metadata blocks	  Number of metadata blocks used
-#total metadata blocks	  Total number of metadata blocks
-current era		  The current era
-held metadata root	  The location, in blocks, of the metadata root
-			  that has been 'held' for userspace read
-			  access. '-' indicates there is no held root
-========================= ==============================================
-
-Detailed use case
-=================
-
-The scenario of invalidating a cache when rolling back a vendor
-snapshot was the primary use case when developing this target:
-
-Taking a vendor snapshot
-------------------------
-
-- Send a checkpoint message to the era target
-- Make a note of the current era in its status line
-- Take vendor snapshot (the era and snapshot should be forever
-  associated now).
-
-Rolling back to a vendor snapshot
-----------------------------------
-
-- Cache enters passthrough mode (see: dm-cache's docs in cache.txt)
-- Rollback vendor storage
-- Take metadata snapshot
-- Ascertain which blocks have been written since the snapshot was taken
-  by checking each block's era
-- Invalidate those blocks in the caching software
-- Cache returns to writeback/writethrough mode
-
-Memory usage
-============
-
-The target uses a bitset to record writes in the current era.  It also
-has a spare bitset ready for switching over to a new era.  Other than
-that it uses a few 4k blocks for updating metadata::
-
-   (4 * nr_blocks) bytes + buffers
-
-Resilience
-==========
-
-Metadata is updated on disk before a write to a previously unwritten
-block is performed.  As such dm-era should not be affected by a hard
-crash such as power failure.
-
-Userland tools
-==============
-
-Userland tools are found in the increasingly poorly named
-thin-provisioning-tools project:
-
-    https://github.com/jthornber/thin-provisioning-tools
diff --git a/Documentation/device-mapper/index.rst b/Documentation/device-mapper/index.rst
deleted file mode 100644
index 105e253bc231..000000000000
--- a/Documentation/device-mapper/index.rst
+++ /dev/null
@@ -1,44 +0,0 @@
-:orphan:
-
-=============
-Device Mapper
-=============
-
-.. toctree::
-    :maxdepth: 1
-
-    cache-policies
-    cache
-    delay
-    dm-crypt
-    dm-flakey
-    dm-init
-    dm-integrity
-    dm-io
-    dm-log
-    dm-queue-length
-    dm-raid
-    dm-service-time
-    dm-uevent
-    dm-zoned
-    era
-    kcopyd
-    linear
-    log-writes
-    persistent-data
-    snapshot
-    statistics
-    striped
-    switch
-    thin-provisioning
-    unstriped
-    verity
-    writecache
-    zero
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/device-mapper/kcopyd.rst b/Documentation/device-mapper/kcopyd.rst
deleted file mode 100644
index 7651d395127f..000000000000
--- a/Documentation/device-mapper/kcopyd.rst
+++ /dev/null
@@ -1,47 +0,0 @@
-======
-kcopyd
-======
-
-Kcopyd provides the ability to copy a range of sectors from one block-device
-to one or more other block-devices, with an asynchronous completion
-notification. It is used by dm-snapshot and dm-mirror.
-
-Users of kcopyd must first create a client and indicate how many memory pages
-to set aside for their copy jobs. This is done with a call to
-kcopyd_client_create()::
-
-   int kcopyd_client_create(unsigned int num_pages,
-                            struct kcopyd_client **result);
-
-To start a copy job, the user must set up io_region structures to describe
-the source and destinations of the copy. Each io_region indicates a
-block-device along with the starting sector and size of the region. The source
-of the copy is given as one io_region structure, and the destinations of the
-copy are given as an array of io_region structures::
-
-   struct io_region {
-      struct block_device *bdev;
-      sector_t sector;
-      sector_t count;
-   };
-
-To start the copy, the user calls kcopyd_copy(), passing in the client
-pointer, pointers to the source and destination io_regions, the name of a
-completion callback routine, and a pointer to some context data for the copy::
-
-   int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
-                   unsigned int num_dests, struct io_region *dests,
-                   unsigned int flags, kcopyd_notify_fn fn, void *context);
-
-   typedef void (*kcopyd_notify_fn)(int read_err, unsigned int write_err,
-				    void *context);
-
-When the copy completes, kcopyd will call the user's completion routine,
-passing back the user's context pointer. It will also indicate if a read or
-write error occurred during the copy.
-
-When a user is done with all their copy jobs, they should call
-kcopyd_client_destroy() to delete the kcopyd client, which will release the
-associated memory pages::
-
-   void kcopyd_client_destroy(struct kcopyd_client *kc);
diff --git a/Documentation/device-mapper/linear.rst b/Documentation/device-mapper/linear.rst
deleted file mode 100644
index 9d17fc6e64a9..000000000000
--- a/Documentation/device-mapper/linear.rst
+++ /dev/null
@@ -1,63 +0,0 @@
-=========
-dm-linear
-=========
-
-Device-Mapper's "linear" target maps a linear range of the Device-Mapper
-device onto a linear range of another device.  This is the basic building
-block of logical volume managers.
-
-Parameters: <dev path> <offset>
-    <dev path>:
-	Full pathname to the underlying block-device, or a
-        "major:minor" device-number.
-    <offset>:
-	Starting sector within the device.
-
-
-Example scripts
-===============
-
-::
-
-  #!/bin/sh
-  # Create an identity mapping for a device
-  echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
-
-::
-
-  #!/bin/sh
-  # Join 2 devices together
-  size1=`blockdev --getsz $1`
-  size2=`blockdev --getsz $2`
-  echo "0 $size1 linear $1 0
-  $size1 $size2 linear $2 0" | dmsetup create joined
-
-::
-
-  #!/usr/bin/perl -w
-  # Split a device into 4M chunks and then join them together in reverse order.
-
-  my $name = "reverse";
-  my $extent_size = 4 * 1024 * 2;
-  my $dev = $ARGV[0];
-  my $table = "";
-  my $count = 0;
-
-  if (!defined($dev)) {
-          die("Please specify a device.\n");
-  }
-
-  my $dev_size = `blockdev --getsz $dev`;
-  my $extents = int($dev_size / $extent_size) -
-                (($dev_size % $extent_size) ? 1 : 0);
-
-  while ($extents > 0) {
-          my $this_start = $count * $extent_size;
-          $extents--;
-          $count++;
-          my $this_offset = $extents * $extent_size;
-
-          $table .= "$this_start $extent_size linear $dev $this_offset\n";
-  }
-
-  `echo \"$table\" | dmsetup create $name`;
diff --git a/Documentation/device-mapper/log-writes.rst b/Documentation/device-mapper/log-writes.rst
deleted file mode 100644
index 23141f2ffb7c..000000000000
--- a/Documentation/device-mapper/log-writes.rst
+++ /dev/null
@@ -1,145 +0,0 @@
-=============
-dm-log-writes
-=============
-
-This target takes 2 devices, one to pass all IO to normally, and one to log all
-of the write operations to.  This is intended for file system developers wishing
-to verify the integrity of metadata or data as the file system is written to.
-There is a log_write_entry written for every WRITE request and the target is
-able to take arbitrary data from userspace to insert into the log.  The data
-that is in the WRITE requests is copied into the log to make the replay happen
-exactly as it happened originally.
-
-Log Ordering
-============
-
-We log things in order of completion once we are sure the write is no longer in
-cache.  This means that normal WRITE requests are not actually logged until the
-next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
-the log in a way that correlates to what is on disk and not what is in cache,
-to make it easier to detect improper waiting/flushing.
-
-This works by attaching all WRITE requests to a list once the write completes.
-Once we see a REQ_PREFLUSH request we splice this list onto the request and once
-the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
-completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
-simulate the worst case scenario with regard to power failures.  Consider the
-following example (W means write, C means complete):
-
-	W1,W2,W3,C3,C2,Wflush,C1,Cflush
-
-The log would show the following:
-
-	W3,W2,flush,W1....
-
-Again, this is to simulate what is actually on disk, which allows us to detect
-cases where a power failure at a particular point in time would create an
-inconsistent file system.
-
-Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
-they complete as those requests will obviously bypass the device cache.
-
-Any REQ_OP_DISCARD requests are treated like WRITE requests.  Otherwise we would
-have all the DISCARD requests, and then the WRITE requests and then the FLUSH
-request.  Consider the following example:
-
-	WRITE block 1, DISCARD block 1, FLUSH
-
-If we logged DISCARD when it completed, the replay would look like this:
-
-	DISCARD 1, WRITE 1, FLUSH
-
-which isn't quite what happened and wouldn't be caught during the log replay.
-
-Target interface
-================
-
-i) Constructor
-
-   log-writes <dev_path> <log_dev_path>
-
-   ============= ==============================================
-   dev_path	 Device that all of the IO will go to normally.
-   log_dev_path  Device where the log entries are written to.
-   ============= ==============================================
-
-ii) Status
-
-    <#logged entries> <highest allocated sector>
-
-    =========================== ========================
-    #logged entries	        Number of logged entries
-    highest allocated sector    Highest allocated sector
-    =========================== ========================
-
-iii) Messages
-
-    mark <description>
-
-	You can use a dmsetup message to set an arbitrary mark in a log.
-	For example, say you want to fsck a file system after every
-	write, but first you need to replay up to the mkfs to make sure
-	we're fsck'ing something reasonable.  You would do something like
-	this::
-
-	  mkfs.btrfs -f /dev/mapper/log
-	  dmsetup message log 0 mark mkfs
-	  <run test>
-
-	This would allow you to replay the log up to the mkfs mark and
-	then replay from that point on doing the fsck check in the
-	interval that you want.
-
-	Every log has a mark at the end labeled "dm-log-writes-end".
-
-Userspace component
-===================
-
-There is a userspace tool that will replay the log for you in various ways.
-It can be found here: https://github.com/josefbacik/log-writes
-
-Example usage
-=============
-
-Say you want to test fsync on your file system.  You would do something like
-this::
-
-  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
-  dmsetup create log --table "$TABLE"
-  mkfs.btrfs -f /dev/mapper/log
-  dmsetup message log 0 mark mkfs
-
-  mount /dev/mapper/log /mnt/btrfs-test
-  <some test that does fsync at the end>
-  dmsetup message log 0 mark fsync
-  md5sum /mnt/btrfs-test/foo
-  umount /mnt/btrfs-test
-
-  dmsetup remove log
-  replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
-  mount /dev/sdb /mnt/btrfs-test
-  md5sum /mnt/btrfs-test/foo
-  <verify md5sum's are correct>
-
-Another option is to do a complicated file system operation and verify the file
-system is consistent during the entire operation.  You could do this with::
-
-  TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
-  dmsetup create log --table "$TABLE"
-  mkfs.btrfs -f /dev/mapper/log
-  dmsetup message log 0 mark mkfs
-
-  mount /dev/mapper/log /mnt/btrfs-test
-  <fsstress to dirty the fs>
-  btrfs filesystem balance /mnt/btrfs-test
-  umount /mnt/btrfs-test
-  dmsetup remove log
-
-  replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
-  btrfsck /dev/sdb
-  replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
-	--fsck "btrfsck /dev/sdb" --check fua
-
-And that will replay the log until it sees a FUA request, run the fsck command
-and if the fsck passes it will replay to the next FUA, until it is completed or
-the fsck command exits abnormally.
diff --git a/Documentation/device-mapper/persistent-data.rst b/Documentation/device-mapper/persistent-data.rst
deleted file mode 100644
index 2065c3c5a091..000000000000
--- a/Documentation/device-mapper/persistent-data.rst
+++ /dev/null
@@ -1,88 +0,0 @@
-===============
-Persistent data
-===============
-
-Introduction
-============
-
-The more-sophisticated device-mapper targets require complex metadata
-that is managed in kernel.  In late 2010 we were seeing that various
-different targets were rolling their own data structures, for example:
-
-- Mikulas Patocka's multisnap implementation
-- Heinz Mauelshagen's thin provisioning target
-- Another btree-based caching target posted to dm-devel
-- Another multi-snapshot target based on a design of Daniel Phillips
-
-Maintaining these data structures takes a lot of work, so if possible
-we'd like to reduce the number.
-
-The persistent-data library is an attempt to provide a re-usable
-framework for people who want to store metadata in device-mapper
-targets.  It's currently used by the thin-provisioning target and an
-upcoming hierarchical storage target.
-
-Overview
-========
-
-The main documentation is in the header files which can all be found
-under drivers/md/persistent-data.
-
-The block manager
------------------
-
-dm-block-manager.[hc]
-
-This provides access to the data on disk in fixed-size blocks.  There
-is a read/write locking interface to prevent concurrent accesses, and
-keep data that is being used in the cache.
-
-Clients of persistent-data are unlikely to use this directly.
-
-The transaction manager
------------------------
-
-dm-transaction-manager.[hc]
-
-This restricts access to blocks and enforces copy-on-write semantics.
-The only way you can get hold of a writable block through the
-transaction manager is by shadowing an existing block (i.e. doing
-copy-on-write) or allocating a fresh one.  Shadowing is elided within
-the same transaction so performance is reasonable.  The commit method
-ensures that all data is flushed before it writes the superblock.
-On power failure your metadata will be as it was when last committed.
-
-The Space Maps
---------------
-
-dm-space-map.h
-dm-space-map-metadata.[hc]
-dm-space-map-disk.[hc]
-
-On-disk data structures that keep track of reference counts of blocks.
-Also acts as the allocator of new blocks.  Currently two
-implementations: a simpler one for managing blocks on a different
-device (e.g. thinly-provisioned data blocks); and one for managing
-the metadata space.  The latter is complicated by the need to store
-its own data within the space it's managing.
-
-The data structures
--------------------
-
-dm-btree.[hc]
-dm-btree-remove.c
-dm-btree-spine.c
-dm-btree-internal.h
-
-Currently there is only one data structure, a hierarchical btree.
-There are plans to add more.  For example, something with an
-array-like interface would see a lot of use.
-
-The btree is 'hierarchical' in that you can define it to be composed
-of nested btrees, and take multiple keys.  For example, the
-thin-provisioning target uses a btree with two levels of nesting.
-The first maps a device id to a mapping tree, and that in turn maps a
-virtual block to a physical block.
-
-Values stored in the btrees can have arbitrary size.  Keys are always
-64bits, although nesting allows you to use multiple keys.
diff --git a/Documentation/device-mapper/snapshot.rst b/Documentation/device-mapper/snapshot.rst
deleted file mode 100644
index ccdd8b587a74..000000000000
--- a/Documentation/device-mapper/snapshot.rst
+++ /dev/null
@@ -1,196 +0,0 @@
-==============================
-Device-mapper snapshot support
-==============================
-
-Device-mapper allows you, without massive data copying:
-
--  To create snapshots of any block device, i.e. mountable, saved states of
-   the block device which are also writable without interfering with the
-   original content;
--  To create device "forks", i.e. multiple different versions of the
-   same data stream.
--  To merge a snapshot of a block device back into the snapshot's origin
-   device.
-
-In the first two cases, dm copies only the chunks of data that get
-changed and uses a separate copy-on-write (COW) block device for
-storage.
-
-For snapshot merge the contents of the COW storage are merged back into
-the origin device.
-
-
-There are three dm targets available:
-snapshot, snapshot-origin, and snapshot-merge.
-
--  snapshot-origin <origin>
-
-which will normally have one or more snapshots based on it.
-Reads will be mapped directly to the backing device. For each write, the
-original data will be saved in the <COW device> of each snapshot to keep
-its visible content unchanged, at least until the <COW device> fills up.
-
-
--  snapshot <origin> <COW device> <persistent?> <chunksize>
-   [<# feature args> [<arg>]*]
-
-A snapshot of the <origin> block device is created. Changed chunks of
-<chunksize> sectors will be stored on the <COW device>.  Writes will
-only go to the <COW device>.  Reads will come from the <COW device> or
-from <origin> for unchanged data.  <COW device> will often be
-smaller than the origin and if it fills up the snapshot will become
-useless and be disabled, returning errors.  So it is important to monitor
-the amount of free space and expand the <COW device> before it fills up.
-
-<persistent?> is P (Persistent) or N (Not persistent - will not survive
-after reboot).  O (Overflow) can be added as a persistent store option
-to allow userspace to advertise its support for seeing "Overflow" in the
-snapshot status.  So supported store types are "P", "PO" and "N".
-
-The difference between persistent and transient is that with transient
-snapshots less metadata must be saved on disk; it can be kept in
-memory by the kernel.
-
-When loading or unloading the snapshot target, the corresponding
-snapshot-origin or snapshot-merge target must be suspended. A failure to
-suspend the origin target could result in data corruption.
-
-Optional features:
-
-   discard_zeroes_cow - a discard issued to the snapshot device that
-   maps to entire chunks will zero the corresponding exception(s) in
-   the snapshot's exception store.
-
-   discard_passdown_origin - a discard to the snapshot device is passed
-   down to the snapshot-origin's underlying device.  This doesn't cause
-   copy-out to the snapshot exception store because the snapshot-origin
-   target is bypassed.
-
-   The discard_passdown_origin feature depends on the discard_zeroes_cow
-   feature being enabled.
-
-
--  snapshot-merge <origin> <COW device> <persistent> <chunksize>
-   [<# feature args> [<arg>]*]
-
-takes the same table arguments as the snapshot target except it only
-works with persistent snapshots.  This target assumes the role of the
-"snapshot-origin" target and must not be loaded if the "snapshot-origin"
-is still present for <origin>.
-
-Creates a merging snapshot that takes control of the changed chunks
-stored in the <COW device> of an existing snapshot, through a handover
-procedure, and merges these chunks back into the <origin>.  Once merging
-has started (in the background) the <origin> may be opened and the merge
-will continue while I/O is flowing to it.  Changes to the <origin> are
-deferred until the merging snapshot's corresponding chunk(s) have been
-merged.  Once merging has started the snapshot device, associated with
-the "snapshot" target, will return -EIO when accessed.
-
-
-How snapshot is used by LVM2
-============================
-When you create the first LVM2 snapshot of a volume, four dm devices are used:
-
-1) a device containing the original mapping table of the source volume;
-2) a device used as the <COW device>;
-3) a "snapshot" device, combining #1 and #2, which is the visible snapshot
-   volume;
-4) the "original" volume (which uses the device number used by the original
-   source volume), whose table is replaced by a "snapshot-origin" mapping
-   from device #1.
-
-A fixed naming scheme is used, so with the following commands::
-
-  lvcreate -L 1G -n base volumeGroup
-  lvcreate -L 100M --snapshot -n snap volumeGroup/base
-
-we'll have this situation (with volumes in above order)::
-
-  # dmsetup table|grep volumeGroup
-
-  volumeGroup-base-real: 0 2097152 linear 8:19 384
-  volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
-  volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
-  volumeGroup-base: 0 2097152 snapshot-origin 254:11
-
-  # ls -lL /dev/mapper/volumeGroup-*
-  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
-  brw-------  1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
-  brw-------  1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
-  brw-------  1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
-
-
-How snapshot-merge is used by LVM2
-==================================
-A merging snapshot assumes the role of the "snapshot-origin" while
-merging.  As such the "snapshot-origin" is replaced with
-"snapshot-merge".  The "-real" device is not changed and the "-cow"
-device is renamed to <origin name>-cow to aid LVM2's cleanup of the
-merging snapshot after it completes.  The "snapshot" that hands over its
-COW device to the "snapshot-merge" is deactivated (unless using lvchange
---refresh); but if it is left active it will simply return I/O errors.
-
-A snapshot will merge into its origin with the following command::
-
-  lvconvert --merge volumeGroup/snap
-
-we'll now have this situation::
-
-  # dmsetup table|grep volumeGroup
-
-  volumeGroup-base-real: 0 2097152 linear 8:19 384
-  volumeGroup-base-cow: 0 204800 linear 8:19 2097536
-  volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
-
-  # ls -lL /dev/mapper/volumeGroup-*
-  brw-------  1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
-  brw-------  1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
-  brw-------  1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
-
-
-How to determine when a merging is complete
-===========================================
-The snapshot-merge and snapshot status lines end with:
-
-  <sectors_allocated>/<total_sectors> <metadata_sectors>
-
-Both <sectors_allocated> and <total_sectors> include both data and metadata.
-During merging, the number of sectors allocated gets smaller and
-smaller.  Merging has finished when the number of sectors holding data
-is zero, in other words <sectors_allocated> == <metadata_sectors>.
-
-Here is a practical example (using a hybrid of lvm and dmsetup commands)::
-
-  # lvs
-    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
-    base    volumeGroup owi-a- 4.00g
-    snap    volumeGroup swi-a- 1.00g base  18.97
-
-  # dmsetup status volumeGroup-snap
-  0 8388608 snapshot 397896/2097152 1560
-                                    ^^^^ metadata sectors
-
-  # lvconvert --merge -b volumeGroup/snap
-    Merging of volume snap started.
-
-  # lvs volumeGroup/snap
-    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
-    base    volumeGroup Owi-a- 4.00g          17.23
-
-  # dmsetup status volumeGroup-base
-  0 8388608 snapshot-merge 281688/2097152 1104
-
-  # dmsetup status volumeGroup-base
-  0 8388608 snapshot-merge 180480/2097152 712
-
-  # dmsetup status volumeGroup-base
-  0 8388608 snapshot-merge 16/2097152 16
-
-Merging has finished.
-
-::
-
-  # lvs
-    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
-    base    volumeGroup owi-a- 4.00g
diff --git a/Documentation/device-mapper/statistics.rst b/Documentation/device-mapper/statistics.rst
deleted file mode 100644
index 3d80a9f850cc..000000000000
--- a/Documentation/device-mapper/statistics.rst
+++ /dev/null
@@ -1,225 +0,0 @@
-=============
-DM statistics
-=============
-
-Device Mapper supports the collection of I/O statistics on user-defined
-regions of a DM device.  If no regions are defined, no statistics are
-collected so there isn't any performance impact.  Only bio-based DM
-devices are currently supported.
-
-Each user-defined region specifies a starting sector, length and step.
-Individual statistics will be collected for each step-sized area within
-the range specified.
-
-The I/O statistics counters for each step-sized area of a region are
-in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see:
-Documentation/iostats.txt).  But two extra counters (12 and 13) are
-provided: total time spent reading and writing.  When the histogram
-argument is used, the 14th parameter is reported that represents the
-histogram of latencies.  All these counters may be accessed by sending
-the @stats_print message to the appropriate DM device via dmsetup.
-
-The reported times are in milliseconds and the granularity depends on
-the kernel ticks.  When the option precise_timestamps is used, the
-reported times are in nanoseconds.
-
-Each region has a corresponding unique identifier, which we call a
-region_id, that is assigned when the region is created.  The region_id
-must be supplied when querying statistics about the region, deleting the
-region, etc.  Unique region_ids enable multiple userspace programs to
-request and process statistics for the same DM device without stepping
-on each other's data.
-
-The creation of DM statistics will allocate memory via kmalloc or
-fall back to using vmalloc space.  At most, 1/4 of the overall system
-memory may be allocated by DM statistics.  The admin can see how much
-memory is used by reading:
-
-	/sys/module/dm_mod/parameters/stats_current_allocated_bytes
-
-Messages
-========
-
-    @stats_create <range> <step> [<number_of_optional_arguments> <optional_arguments>...] [<program_id> [<aux_data>]]
-	Create a new region and return the region_id.
-
-	<range>
-	  "-"
-		whole device
-	  "<start_sector>+<length>"
-		a range of <length> 512-byte sectors
-		starting with <start_sector>.
-
-	<step>
-	  "<area_size>"
-		the range is subdivided into areas each containing
-		<area_size> sectors.
-	  "/<number_of_areas>"
-		the range is subdivided into the specified
-		number of areas.
-
-	<number_of_optional_arguments>
-	  The number of optional arguments
-
-	<optional_arguments>
-	  The following optional arguments are supported:
-
-	  precise_timestamps
-		use precise timer with nanosecond resolution
-		instead of the "jiffies" variable.  When this argument is
-		used, the resulting times are in nanoseconds instead of
-		milliseconds.  Precise timestamps are a little bit slower
-		to obtain than jiffies-based timestamps.
-	  histogram:n1,n2,n3,n4,...
-		collect histogram of latencies.  The
-		numbers n1, n2, etc are times that represent the boundaries
-		of the histogram.  If precise_timestamps is not used, the
-		times are in milliseconds, otherwise they are in
-		nanoseconds.  For each range, the kernel will report the
-		number of requests that completed within this range. For
-		example, if we use "histogram:10,20,30", the kernel will
-		report four numbers a:b:c:d. a is the number of requests
-		that took 0-10 ms to complete, b is the number of requests
-		that took 10-20 ms to complete, c is the number of requests
-		that took 20-30 ms to complete and d is the number of
-		requests that took more than 30 ms to complete.
-
-	<program_id>
-	  An optional parameter.  A name that uniquely identifies
-	  the userspace owner of the range.  This groups ranges together
-	  so that userspace programs can identify the ranges they
-	  created and ignore those created by others.
-	  The kernel returns this string back in the output of
-	  @stats_list message, but it doesn't use it for anything else.
-	  If we omit the number of optional arguments, program id must not
-	  be a number, otherwise it would be interpreted as the number of
-	  optional arguments.
-
-	<aux_data>
-	  An optional parameter.  A word that provides auxiliary data
-	  that is useful to the client program that created the range.
-	  The kernel returns this string back in the output of
-	  @stats_list message, but it doesn't use this value for anything.
-
-    @stats_delete <region_id>
-	Delete the region with the specified id.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-    @stats_clear <region_id>
-	Clear all the counters except the in-flight i/o counters.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-    @stats_list [<program_id>]
-	List all regions registered with @stats_create.
-
-	<program_id>
-	  An optional parameter.
-	  If this parameter is specified, only matching regions
-	  are returned.
-	  If it is not specified, all regions are returned.
-
-	Output format:
-	  <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
-	        precise_timestamps histogram:n1,n2,n3,...
-
-	The strings "precise_timestamps" and "histogram" are printed only
-	if they were specified when creating the region.
-
-    @stats_print <region_id> [<starting_line> <number_of_lines>]
-	Print counters for each step-sized area of a region.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-	<starting_line>
-	  The index of the starting line in the output.
-	  If omitted, all lines are returned.
-
-	<number_of_lines>
-	  The number of lines to include in the output.
-	  If omitted, all lines are returned.
-
-	Output format for each step-sized area of a region:
-
-	  <start_sector>+<length>
-		counters
-
-	  The first 11 counters have the same meaning as
-	  `/sys/block/*/stat` or `/proc/diskstats`.
-
-	  Please refer to Documentation/iostats.txt for details.
-
-	  1. the number of reads completed
-	  2. the number of reads merged
-	  3. the number of sectors read
-	  4. the number of milliseconds spent reading
-	  5. the number of writes completed
-	  6. the number of writes merged
-	  7. the number of sectors written
-	  8. the number of milliseconds spent writing
-	  9. the number of I/Os currently in progress
-	  10. the number of milliseconds spent doing I/Os
-	  11. the weighted number of milliseconds spent doing I/Os
-
-	  Additional counters:
-
-	  12. the total time spent reading in milliseconds
-	  13. the total time spent writing in milliseconds
-
-    @stats_print_clear <region_id> [<starting_line> <number_of_lines>]
-	Atomically print and then clear all the counters except the
-	in-flight i/o counters.  Useful when the client consuming the
-	statistics does not want to lose any statistics (those updated
-	between printing and clearing).
-
-	<region_id>
-	  region_id returned from @stats_create
-
-	<starting_line>
-	  The index of the starting line in the output.
-	  If omitted, all lines are printed and then cleared.
-
-	<number_of_lines>
-	  The number of lines to process.
-	  If omitted, all lines are printed and then cleared.
-
-    @stats_set_aux <region_id> <aux_data>
-	Store auxiliary data aux_data for the specified region.
-
-	<region_id>
-	  region_id returned from @stats_create
-
-	<aux_data>
-	  The string that identifies data which is useful to the client
-	  program that created the range.  The kernel returns this
-	  string back in the output of @stats_list message, but it
-	  doesn't use this value for anything.
-
-Examples
-========
-
-Subdivide the DM device 'vol' into 100 pieces and start collecting
-statistics on them::
-
-  dmsetup message vol 0 @stats_create - /100
-
-Set the auxiliary data string to "foo bar baz" (the escape for each
-space must also be escaped, otherwise the shell will consume them)::
-
-  dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
-
-List the statistics::
-
-  dmsetup message vol 0 @stats_list
-
-Print the statistics::
-
-  dmsetup message vol 0 @stats_print 0
-
-Delete the statistics::
-
-  dmsetup message vol 0 @stats_delete 0
diff --git a/Documentation/device-mapper/striped.rst b/Documentation/device-mapper/striped.rst
deleted file mode 100644
index e9a8da192ae1..000000000000
--- a/Documentation/device-mapper/striped.rst
+++ /dev/null
@@ -1,61 +0,0 @@
-=========
-dm-stripe
-=========
-
-Device-Mapper's "striped" target is used to create a striped (i.e. RAID-0)
-device across one or more underlying devices. Data is written in "chunks",
-with consecutive chunks rotating among the underlying devices. This can
-potentially provide improved I/O throughput by utilizing several physical
-devices in parallel.
-
-Parameters: <num devs> <chunk size> [<dev path> <offset>]+
-    <num devs>:
-	Number of underlying devices.
-    <chunk size>:
-	Size of each chunk of data. Must be at least as
-        large as the system's PAGE_SIZE.
-    <dev path>:
-	Full pathname to the underlying block-device, or a
-	"major:minor" device-number.
-    <offset>:
-	Starting sector within the device.
-
-One or more underlying devices can be specified. The striped device size must
-be a multiple of the chunk size multiplied by the number of underlying devices.
-
-
-Example scripts
-===============
-
-::
-
-  #!/usr/bin/perl -w
-  # Create a striped device across any number of underlying devices. The device
-  # will be called "stripe_dev" and have a chunk-size of 128k.
-
-  my $chunk_size = 128 * 2;
-  my $dev_name = "stripe_dev";
-  my $num_devs = @ARGV;
-  my @devs = @ARGV;
-  my ($min_dev_size, $stripe_dev_size, $i);
-
-  if (!$num_devs) {
-          die("Specify at least one device\n");
-  }
-
-  $min_dev_size = `blockdev --getsz $devs[0]`;
-  for ($i = 1; $i < $num_devs; $i++) {
-          my $this_size = `blockdev --getsz $devs[$i]`;
-          $min_dev_size = ($min_dev_size < $this_size) ?
-                          $min_dev_size : $this_size;
-  }
-
-  $stripe_dev_size = $min_dev_size * $num_devs;
-  $stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs);
-
-  $table = "0 $stripe_dev_size striped $num_devs $chunk_size";
-  for ($i = 0; $i < $num_devs; $i++) {
-          $table .= " $devs[$i] 0";
-  }
-
-  `echo $table | dmsetup create $dev_name`;
diff --git a/Documentation/device-mapper/switch.rst b/Documentation/device-mapper/switch.rst
deleted file mode 100644
index 7dde06be1a4f..000000000000
--- a/Documentation/device-mapper/switch.rst
+++ /dev/null
@@ -1,141 +0,0 @@
-=========
-dm-switch
-=========
-
-The device-mapper switch target creates a device that supports an
-arbitrary mapping of fixed-size regions of I/O across a fixed set of
-paths.  The path used for any specific region can be switched
-dynamically by sending the target a message.
-
-It maps I/O to underlying block devices efficiently when there is a large
-number of fixed-sized address regions but there is no simple pattern
-that would allow for a compact representation of the mapping such as
-dm-stripe.
-
-Background
-----------
-
-Dell EqualLogic and some other iSCSI storage arrays use a distributed
-frameless architecture.  In this architecture, the storage group
-consists of a number of distinct storage arrays ("members") each having
-independent controllers, disk storage and network adapters.  When a LUN
-is created it is spread across multiple members.  The details of the
-spreading are hidden from initiators connected to this storage system.
-The storage group exposes a single target discovery portal, no matter
-how many members are being used.  When iSCSI sessions are created, each
-session is connected to an eth port on a single member.  Data to a LUN
-can be sent on any iSCSI session, and if the blocks being accessed are
-stored on another member the I/O will be forwarded as required.  This
-forwarding is invisible to the initiator.  The storage layout is also
-dynamic, and the blocks stored on disk may be moved from member to
-member as needed to balance the load.
-
-This architecture simplifies the management and configuration of both
-the storage group and initiators.  In a multipathing configuration, it
-is possible to set up multiple iSCSI sessions to use multiple network
-interfaces on both the host and target to take advantage of the
-increased network bandwidth.  An initiator could use a simple round
-robin algorithm to send I/O across all paths and let the storage array
-members forward it as necessary, but there is a performance advantage to
-sending data directly to the correct member.
-
-A device-mapper table already lets you map different regions of a
-device onto different targets.  However in this architecture the LUN is
-spread with an address region size on the order of 10s of MBs, which
-means the resulting table could have more than a million entries and
-consume far too much memory.
-
-Using this device-mapper switch target we can now build a two-layer
-device hierarchy:
-
-    Upper Tier - Determine which array member the I/O should be sent to.
-    Lower Tier - Load balance amongst paths to a particular member.
-
-The lower tier consists of a single dm multipath device for each member.
-Each of these multipath devices contains the set of paths directly to
-the array member in one priority group, and leverages existing path
-selectors to load balance amongst these paths.  We also build a
-non-preferred priority group containing paths to other array members for
-failover reasons.
-
-The upper tier consists of a single dm-switch device.  This device uses
-a bitmap to look up the location of the I/O and choose the appropriate
-lower tier device to route the I/O.  By using a bitmap we are able to
-use 4 bits for each address range in a 16 member group (which is very
-large for us).  This is a much denser representation than the dm table
-b-tree can achieve.
-
-Construction Parameters
-=======================
-
-    <num_paths> <region_size> <num_optional_args> [<optional_args>...] [<dev_path> <offset>]+
-	<num_paths>
-	    The number of paths across which to distribute the I/O.
-
-	<region_size>
-	    The number of 512-byte sectors in a region. Each region can be redirected
-	    to any of the available paths.
-
-	<num_optional_args>
-	    The number of optional arguments. Currently, no optional arguments
-	    are supported and so this must be zero.
-
-	<dev_path>
-	    The block device that represents a specific path to the device.
-
-	<offset>
-	    The offset of the start of data on the specific <dev_path> (in units
-	    of 512-byte sectors). This number is added to the sector number when
-	    forwarding the request to the specific path. Typically it is zero.
-
-Messages
-========
-
-set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...
-
-Modify the region table by specifying which regions are redirected to
-which paths.
-
-<index>
-    The region number (region size was specified in constructor parameters).
-    If index is omitted, the next region (previous index + 1) is used.
-    Expressed in hexadecimal (WITHOUT any prefix like 0x).
-
-<path_nr>
-    The path number in the range 0 ... (<num_paths> - 1).
-    Expressed in hexadecimal (WITHOUT any prefix like 0x).
-
-R<n>,<m>
-    This parameter allows repetitive patterns to be loaded quickly. <n> and <m>
-    are hexadecimal numbers. The last <n> mappings are repeated in the next <m>
-    slots.
-
-Status
-======
-
-No status line is reported.
-
-Example
-=======
-
-Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
-the same size.
-
-Create a switch device with 64kB region size::
-
-    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
-	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"
-
-Set mappings for the first 7 entries to point to devices switch0, switch1,
-switch2, switch0, switch1, switch2, switch1::
-
-    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1
-
-Set repetitive mapping. This command::
-
-    dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10
-
-is equivalent to::
-
-    dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \
-	:1 :2 :1 :2 :1 :2 :1 :2 :1 :2
diff --git a/Documentation/device-mapper/thin-provisioning.rst b/Documentation/device-mapper/thin-provisioning.rst
deleted file mode 100644
index bafebf79da4b..000000000000
--- a/Documentation/device-mapper/thin-provisioning.rst
+++ /dev/null
@@ -1,427 +0,0 @@
-=================
-Thin provisioning
-=================
-
-Introduction
-============
-
-This document describes a collection of device-mapper targets that
-between them implement thin-provisioning and snapshots.
-
-The main highlight of this implementation, compared to the previous
-implementation of snapshots, is that it allows many virtual devices to
-be stored on the same data volume.  This simplifies administration and
-allows the sharing of data between volumes, thus reducing disk usage.
-
-Another significant feature is support for an arbitrary depth of
-recursive snapshots (snapshots of snapshots of snapshots ...).  The
-previous implementation of snapshots did this by chaining together
-lookup tables, and so performance was O(depth).  This new
-implementation uses a single data structure to avoid this degradation
-with depth.  Fragmentation may still be an issue, however, in some
-scenarios.
-
-Metadata is stored on a separate device from data, giving the
-administrator some freedom, for example to:
-
-- Improve metadata resilience by storing metadata on a mirrored volume
-  but data on a non-mirrored one.
-
-- Improve performance by storing the metadata on SSD.
-
-Status
-======
-
-These targets are considered safe for production use.  But different use
-cases will have different performance characteristics, for example due
-to fragmentation of the data volume.
-
-If you find this software is not performing as expected please mail
-dm-devel@redhat.com with details and we'll try our best to improve
-things for you.
-
-Userspace tools for checking and repairing the metadata have been fully
-developed and are available as 'thin_check' and 'thin_repair'.  The name
-of the package that provides these utilities varies by distribution (on
-a Red Hat distribution it is named 'device-mapper-persistent-data').
-
-Cookbook
-========
-
-This section describes some quick recipes for using thin provisioning.
-They use the dmsetup program to control the device-mapper driver
-directly.  End users will be advised to use a higher-level volume
-manager such as LVM2 once support has been added.
-
-Pool device
------------
-
-The pool device ties together the metadata volume and the data volume.
-It maps I/O linearly to the data volume and updates the metadata via
-two mechanisms:
-
-- Function calls from the thin targets
-
-- Device-mapper 'messages' from userspace which control the creation of new
-  virtual devices amongst other things.
-
-Setting up a fresh pool device
-------------------------------
-
-Setting up a pool device requires a valid metadata device, and a
-data device.  If you do not have an existing metadata device you can
-make one by zeroing the first 4k to indicate empty metadata::
-
-    dd if=/dev/zero of=$metadata_dev bs=4096 count=1
-
-The amount of metadata you need will vary according to how many blocks
-are shared between thin devices (i.e. through snapshots).  If you have
-less sharing than average you'll need a larger-than-average metadata device.
-
-As a guide, we suggest you calculate the number of bytes to use in the
-metadata device as 48 * $data_dev_size / $data_block_size but round it up
-to 2MB if the answer is smaller.  If you're creating large numbers of
-snapshots which are recording large amounts of change, you may find you
-need to increase this.
-
-The largest size supported is 16GB.  If the device is larger,
-a warning will be issued and the excess space will not be used.
-
-Reloading a pool table
-----------------------
-
-You may reload a pool's table; indeed, this is how the pool is resized
-if it runs out of space.  (N.B. While specifying a different metadata
-device when reloading is not forbidden at the moment, things will go
-wrong if it does not route I/O to exactly the same on-disk location as
-previously.)
-
-Using an existing pool device
------------------------------
-
-::
-
-    dmsetup create pool \
-	--table "0 20971520 thin-pool $metadata_dev $data_dev \
-		 $data_block_size $low_water_mark"
-
-$data_block_size gives the smallest unit of disk space that can be
-allocated at a time expressed in units of 512-byte sectors.
-$data_block_size must be between 128 (64KB) and 2097152 (1GB) and a
-multiple of 128 (64KB).  $data_block_size cannot be changed after the
-thin-pool is created.  People primarily interested in thin provisioning
-may want to use a value such as 1024 (512KB).  People doing lots of
-snapshotting may want a smaller value such as 128 (64KB).  If you are
-not zeroing newly-allocated data, a larger $data_block_size in the
-region of 256000 (128MB) is suggested.
-
-$low_water_mark is expressed in blocks of size $data_block_size.  If
-free space on the data device drops below this level then a dm event
-will be triggered which a userspace daemon should catch allowing it to
-extend the pool device.  Only one such event will be sent.
-
-No special event is triggered if a just resumed device's free space is below
-the low water mark. However, resuming a device always triggers an
-event; a userspace daemon should verify that free space exceeds the low
-water mark when handling this event.
-
-A low water mark for the metadata device is maintained in the kernel and
-will trigger a dm event if free space on the metadata device drops below
-it.
-
-Updating on-disk metadata
--------------------------
-
-On-disk metadata is committed every time a FLUSH or FUA bio is written.
-If no such requests are made then commits will occur every second.  This
-means the thin-provisioning target behaves like a physical disk that has
-a volatile write cache.  If power is lost you may lose some recent
-writes.  The metadata should always be consistent in spite of any crash.
-
-If data space is exhausted the pool will either error or queue IO
-according to the configuration (see: error_if_no_space).  If metadata
-space is exhausted or a metadata operation fails: the pool will error IO
-until the pool is taken offline and repair is performed to 1) fix any
-potential inconsistencies and 2) clear the flag that imposes repair.
-Once the pool's metadata device is repaired it may be resized, which
-will allow the pool to return to normal operation.  Note that if a pool
-is flagged as needing repair, the pool's data and metadata devices
-cannot be resized until repair is performed.  It should also be noted
-that when the pool's metadata space is exhausted the current metadata
-transaction is aborted.  Given that the pool will cache IO whose
-completion may have already been acknowledged to upper IO layers
-(e.g. filesystem) it is strongly suggested that consistency checks
-(e.g. fsck) be performed on those layers when repair of the pool is
-required.
-
-Thin provisioning
------------------
-
-i) Creating a new thinly-provisioned volume.
-
-  To create a new thinly-provisioned volume you must send a message to an
-  active pool device, /dev/mapper/pool in this example::
-
-    dmsetup message /dev/mapper/pool 0 "create_thin 0"
-
-  Here '0' is an identifier for the volume, a 24-bit number.  It's up
-  to the caller to allocate and manage these identifiers.  If the
-  identifier is already in use, the message will fail with -EEXIST.
-
-ii) Using a thinly-provisioned volume.
-
-  Thinly-provisioned volumes are activated using the 'thin' target::
-
-    dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0"
-
-  The last parameter is the identifier for the thinp device.
-
-Internal snapshots
-------------------
-
-i) Creating an internal snapshot.
-
-  Snapshots are created with another message to the pool.
-
-  N.B.  If the origin device that you wish to snapshot is active, you
-  must suspend it before creating the snapshot to avoid corruption.
-  This is NOT enforced at the moment, so please be careful!
-
-  ::
-
-    dmsetup suspend /dev/mapper/thin
-    dmsetup message /dev/mapper/pool 0 "create_snap 1 0"
-    dmsetup resume /dev/mapper/thin
-
-  Here '1' is the identifier for the volume, a 24-bit number.  '0' is the
-  identifier for the origin device.
-
-ii) Using an internal snapshot.
-
-  Once created, the user doesn't have to worry about any connection
-  between the origin and the snapshot.  Indeed the snapshot is no
-  different from any other thinly-provisioned device and can be
-  snapshotted itself via the same method.  It's perfectly legal to
-  have only one of them active, and there's no ordering requirement on
-  activating or removing them both.  (This differs from conventional
-  device-mapper snapshots.)
-
-  Activate it exactly the same way as any other thinly-provisioned volume::
-
-    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1"
-
-External snapshots
-------------------
-
-You can use an external **read only** device as an origin for a
-thinly-provisioned volume.  Any read to an unprovisioned area of the
-thin device will be passed through to the origin.  Writes trigger
-the allocation of new blocks as usual.
-
-One use case for this is VM hosts that want to run guests on
-thinly-provisioned volumes but have the base image on another device
-(possibly shared between many VMs).
-
-You must not write to the origin device if you use this technique!
-Of course, you may write to the thin device and take internal snapshots
-of the thin volume.
-
-i) Creating a snapshot of an external device
-
-  This is the same as creating a thin device.
-  You don't mention the origin at this stage.
-
-  ::
-
-    dmsetup message /dev/mapper/pool 0 "create_thin 0"
-
-ii) Using a snapshot of an external device.
-
-  Append an extra parameter to the thin target specifying the origin::
-
-    dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image"
-
-  N.B. All descendants (internal snapshots) of this snapshot require the
-  same extra origin parameter.
-
-Deactivation
-------------
-
-All devices using a pool must be deactivated before the pool itself
-can be.
-
-::
-
-    dmsetup remove thin
-    dmsetup remove snap
-    dmsetup remove pool
-
-Reference
-=========
-
-'thin-pool' target
-------------------
-
-i) Constructor
-
-    ::
-
-      thin-pool <metadata dev> <data dev> <data block size (sectors)> \
-	        <low water mark (blocks)> [<number of feature args> [<arg>]*]
-
-    Optional feature arguments:
-
-      skip_block_zeroing:
-	Skip the zeroing of newly-provisioned blocks.
-
-      ignore_discard:
-	Disable discard support.
-
-      no_discard_passdown:
-	Don't pass discards down to the underlying
-	data device, but just remove the mapping.
-
-      read_only:
-		 Don't allow any changes to be made to the pool
-		 metadata.  This mode is only available after the
-		 thin-pool has been created and first used in full
-		 read/write mode.  It cannot be specified on initial
-		 thin-pool creation.
-
-      error_if_no_space:
-	Error IOs, instead of queueing, if no space.
-
-    Data block size must be between 64KB (128 sectors) and 1GB
-    (2097152 sectors) inclusive.
-
-
-ii) Status
-
-    ::
-
-      <transaction id> <used metadata blocks>/<total metadata blocks>
-      <used data blocks>/<total data blocks> <held metadata root>
-      ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
-      needs_check|- metadata_low_watermark
-
-    transaction id:
-	A 64-bit number used by userspace to help synchronise with metadata
-	from volume managers.
-
-    used data blocks / total data blocks
-	If the number of free blocks drops below the pool's low water mark a
-	dm event will be sent to userspace.  This event is edge-triggered and
-	it will occur only once after each resume so volume manager writers
-	should register for the event and then check the target's status.
-
-    held metadata root:
-	The location, in blocks, of the metadata root that has been
-	'held' for userspace read access.  '-' indicates there is no
-	held root.
-
-    discard_passdown|no_discard_passdown
-	Whether or not discards are actually being passed down to the
-	underlying device.  When this is enabled when loading the table,
-	it can get disabled if the underlying device doesn't support it.
-
-    ro|rw|out_of_data_space
-	If the pool encounters certain types of device failures it will
-	drop into a read-only metadata mode in which no changes to
-	the pool metadata (like allocating new blocks) are permitted.
-
-	In serious cases where even a read-only mode is deemed unsafe
-	no further I/O will be permitted and the status will just
-	contain the string 'Fail'.  The userspace recovery tools
-	should then be used.
-
-    error_if_no_space|queue_if_no_space
-	If the pool runs out of data or metadata space, the pool will
-	either queue or error the IO destined to the data device.  The
-	default is to queue the IO until more space is added or the
-	'no_space_timeout' expires.  The 'no_space_timeout' dm-thin-pool
-	module parameter can be used to change this timeout -- it
-	defaults to 60 seconds but may be disabled using a value of 0.
-
-    needs_check
-	A metadata operation has failed, resulting in the needs_check
-	flag being set in the metadata's superblock.  The metadata
-	device must be deactivated and checked/repaired before the
-	thin-pool can be made fully operational again.  '-' indicates
-	needs_check is not set.
-
-    metadata_low_watermark:
-	Value of metadata low watermark in blocks.  The kernel sets this
-	value internally but userspace needs to know this value to
-	determine if an event was caused by crossing this threshold.
-
-iii) Messages
-
-    create_thin <dev id>
-	Create a new thinly-provisioned device.
-	<dev id> is an arbitrary unique 24-bit identifier chosen by
-	the caller.
-
-    create_snap <dev id> <origin id>
-	Create a new snapshot of another thinly-provisioned device.
-	<dev id> is an arbitrary unique 24-bit identifier chosen by
-	the caller.
-	<origin id> is the identifier of the thinly-provisioned device
-	of which the new device will be a snapshot.
-
-    delete <dev id>
-	Deletes a thin device.  Irreversible.
-
-    set_transaction_id <current id> <new id>
-	Userland volume managers, such as LVM, need a way to
-	synchronise their external metadata with the internal metadata of the
-	pool target.  The thin-pool target offers to store an
-	arbitrary 64-bit transaction id and return it on the target's
-	status line.  To avoid races you must provide what you think
-	the current transaction id is when you change it with this
-	compare-and-swap message.
-
-    reserve_metadata_snap
-        Reserve a copy of the data mapping btree for use by userland.
-        This allows userland to inspect the mappings as they were when
-        this message was executed.  Use the pool's status command to
-        get the root block associated with the metadata snapshot.
-
-    release_metadata_snap
-        Release a previously reserved copy of the data mapping btree.
-
-'thin' target
--------------
-
-i) Constructor
-
-    ::
-
-        thin <pool dev> <dev id> [<external origin dev>]
-
-    pool dev:
-	the thin-pool device, e.g. /dev/mapper/my_pool or 253:0
-
-    dev id:
-	the internal device identifier of the device to be
-	activated.
-
-    external origin dev:
-	an optional block device outside the pool to be treated as a
-	read-only snapshot origin: reads to unprovisioned areas of the
-	thin target will be mapped to this device.
-
-The pool doesn't store any size against the thin devices.  If you
-load a thin target that is smaller than you've been using previously,
-then you'll have no access to blocks mapped beyond the end.  If you
-load a target that is bigger than before, then extra blocks will be
-provisioned as and when needed.
-
-ii) Status
-
-    <nr mapped sectors> <highest mapped sector>
-	If the pool has encountered device errors and failed, the status
-	will just contain the string 'Fail'.  The userspace recovery
-	tools should then be used.
-
-    In the case where <nr mapped sectors> is 0, there is no highest
-    mapped sector and the value of <highest mapped sector> is unspecified.
diff --git a/Documentation/device-mapper/unstriped.rst b/Documentation/device-mapper/unstriped.rst
deleted file mode 100644
index 0a8d3eb3f072..000000000000
--- a/Documentation/device-mapper/unstriped.rst
+++ /dev/null
@@ -1,135 +0,0 @@
-================================
-Device-mapper "unstriped" target
-================================
-
-Introduction
-============
-
-The device-mapper "unstriped" target provides a transparent mechanism to
-unstripe a device-mapper "striped" target to access the underlying disks
-without having to touch the true backing block-device.  It can also be
-used to unstripe a hardware RAID-0 to access backing disks.
-
-Parameters:
-<number of stripes> <chunk size> <stripe #> <dev_path> <offset>
-
-<number of stripes>
-        The number of stripes in the RAID 0.
-
-<chunk size>
-	The number of 512-byte sectors in each chunk.
-
-<dev_path>
-	The block device you wish to unstripe.
-
-<stripe #>
-        The stripe number within the device that corresponds to the physical
-        drive you wish to unstripe.  This must be 0 indexed.
-
-
-Why use this module?
-====================
-
-An example of undoing an existing dm-stripe
--------------------------------------------
-
-This small bash script will set up 4 loop devices and use the existing
-striped target to combine the 4 devices into one.  It then will use
-the unstriped target on top of the striped device to access the
-individual backing loop devices.  We write data to the newly exposed
-unstriped devices and verify the data written matches the correct
-underlying device on the striped array::
-
-  #!/bin/bash
-
-  MEMBER_SIZE=$((128 * 1024 * 1024))
-  NUM=4
-  SEQ_END=$((${NUM}-1))
-  CHUNK=256
-  BS=4096
-
-  RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
-  DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
-  COUNT=$((${MEMBER_SIZE} / ${BS}))
-
-  for i in $(seq 0 ${SEQ_END}); do
-    dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
-    losetup /dev/loop${i} member-${i}
-    DM_PARMS+=" /dev/loop${i} 0"
-  done
-
-  echo $DM_PARMS | dmsetup create raid0
-  for i in $(seq 0 ${SEQ_END}); do
-    echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i}
-  done;
-
-  for i in $(seq 0 ${SEQ_END}); do
-    dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
-    diff /dev/mapper/set-${i} member-${i}
-  done;
-
-  for i in $(seq 0 ${SEQ_END}); do
-    dmsetup remove set-${i}
-  done
-
-  dmsetup remove raid0
-
-  for i in $(seq 0 ${SEQ_END}); do
-    losetup -d /dev/loop${i}
-    rm -f member-${i}
-  done
-
-Another example
----------------
-
-Intel NVMe drives contain two cores on the physical device.
-Each core of the drive has segregated access to its LBA range.
-The current LBA model has a RAID 0 128k chunk on each core, resulting
-in a 256k stripe across the two cores::
-
-   Core 0:       Core 1:
-  __________    __________
-  | LBA 512|    | LBA 768|
-  | LBA 0  |    | LBA 256|
-  ----------    ----------
-
-The purpose of this unstriping is to provide better QoS in noisy
-neighbor environments. When two partitions are created on the
-aggregate drive without this unstriping, reads on one partition
-can affect writes on another partition.  This is because the partitions
-are striped across the two cores.  When we unstripe this hardware RAID 0
-and make partitions on each new exposed device the two partitions are now
-physically separated.
-
-With the dm-unstriped target we're able to segregate an fio script that
-has read and write jobs that are independent of each other.  Compared to
-when we run the test on a combined drive with partitions, we were able
-to get a 92% reduction in read latency using this device mapper target.
-
-
-Example dmsetup usage
-=====================
-
-unstriped on top of Intel NVMe device that has 2 cores
---------------------------------------------------------
-
-::
-
-  dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0'
-  dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0'
-
-There will now be two devices that expose Intel NVMe core 0 and 1
-respectively::
-
-  /dev/mapper/nvmset0
-  /dev/mapper/nvmset1
-
-unstriped on top of striped with 4 drives using 128K chunk size
------------------------------------------------------------------
-
-::
-
-  dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0'
-  dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0'
-  dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0'
-  dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0'
diff --git a/Documentation/device-mapper/verity.rst b/Documentation/device-mapper/verity.rst
deleted file mode 100644
index a4d1c1476d72..000000000000
--- a/Documentation/device-mapper/verity.rst
+++ /dev/null
@@ -1,229 +0,0 @@
-=========
-dm-verity
-=========
-
-Device-Mapper's "verity" target provides transparent integrity checking of
-block devices using a cryptographic digest provided by the kernel crypto API.
-This target is read-only.
-
-Construction Parameters
-=======================
-
-::
-
-    <version> <dev> <hash_dev>
-    <data_block_size> <hash_block_size>
-    <num_data_blocks> <hash_start_block>
-    <algorithm> <digest> <salt>
-    [<#opt_params> <opt_params>]
-
-<version>
-    This is the type of the on-disk hash format.
-
-    0 is the original format used in the Chromium OS.
-      The salt is appended when hashing, digests are stored continuously and
-      the rest of the block is padded with zeroes.
-
-    1 is the current format that should be used for new devices.
-      The salt is prepended when hashing and each digest is
-      padded with zeroes to the power of two.
-
-<dev>
-    This is the device containing data, the integrity of which needs to be
-    checked.  It may be specified as a path, like /dev/sdaX, or a device number,
-    <major>:<minor>.
-
-<hash_dev>
-    This is the device that supplies the hash tree data.  It may be
-    specified similarly to the device path and may be the same device.  If the
-    same device is used, the hash_start should be outside the configured
-    dm-verity device.
-
-<data_block_size>
-    The block size on a data device in bytes.
-    Each block corresponds to one digest on the hash device.
-
-<hash_block_size>
-    The size of a hash block in bytes.
-
-<num_data_blocks>
-    The number of data blocks on the data device.  Additional blocks are
-    inaccessible.  You can place hashes on the same partition as data; in this
-    case hashes are placed after <num_data_blocks>.
-
-<hash_start_block>
-    This is the offset, in <hash_block_size>-blocks, from the start of hash_dev
-    to the root block of the hash tree.
-
-<algorithm>
-    The cryptographic hash algorithm used for this device.  This should
-    be the name of the algorithm, like "sha1".
-
-<digest>
-    The hexadecimal encoding of the cryptographic hash of the root hash block
-    and the salt.  This hash should be trusted as there is no other authenticity
-    beyond this point.
-
-<salt>
-    The hexadecimal encoding of the salt value.
-
-<#opt_params>
-    Number of optional parameters. If there are no optional parameters,
-    the optional parameters section can be skipped or #opt_params can be zero.
-    Otherwise #opt_params is the number of following arguments.
-
-    Example of optional parameters section:
-        1 ignore_corruption
-
-ignore_corruption
-    Log corrupted blocks, but allow read operations to proceed normally.
-
-restart_on_corruption
-    Restart the system when a corrupted block is discovered. This option is
-    not compatible with ignore_corruption and requires user space support to
-    avoid restart loops.
-
-ignore_zero_blocks
-    Do not verify blocks that are expected to contain zeroes and always return
-    zeroes instead. This may be useful if the partition contains unused blocks
-    that are not guaranteed to contain zeroes.
-
-use_fec_from_device <fec_dev>
-    Use forward error correction (FEC) to recover from corruption if hash
-    verification fails. Use encoding data from the specified device. This
-    may be the same device where data and hash blocks reside, in which case
-    fec_start must be outside data and hash areas.
-
-    If the encoding data covers additional metadata, it must be accessible
-    on the hash device after the hash blocks.
-
-    Note: block sizes for data and hash devices must match. Also, if the
-    verity <dev> is encrypted the <fec_dev> should be too.
-
-fec_roots <num>
-    Number of generator roots. This equals the number of parity bytes in
-    the encoding data. For example, in RS(M, N) encoding, the number of roots
-    is M-N.
-
-fec_blocks <num>
-    The number of encoding data blocks on the FEC device. The block size for
-    the FEC device is <data_block_size>.
-
-fec_start <offset>
-    This is the offset, in <data_block_size> blocks, from the start of the
-    FEC device to the beginning of the encoding data.
-
-check_at_most_once
-    Verify data blocks only the first time they are read from the data device,
-    rather than every time.  This reduces the overhead of dm-verity so that it
-    can be used on systems that are memory and/or CPU constrained.  However, it
-    provides a reduced level of security because only offline tampering of the
-    data device's content will be detected, not online tampering.
-
-    Hash blocks are still verified each time they are read from the hash device,
-    since verification of hash blocks is less performance critical than data
-    blocks, and a hash block will not be verified any more after all the data
-    blocks it covers have been verified anyway.
-
-Theory of operation
-===================
-
-dm-verity is meant to be set up as part of a verified boot path.  This
-may be anything ranging from a boot using tboot or trustedgrub to just
-booting from a known-good device (like a USB drive or CD).
-
-When a dm-verity device is configured, it is expected that the caller
-has been authenticated in some way (cryptographic signatures, etc).
-After instantiation, all hashes will be verified on-demand during
-disk access.  If they cannot be verified up to the root node of the
-tree, the root hash, then the I/O will fail.  This should detect
-tampering with any data on the device and the hash data.
-
-Cryptographic hashes are used to assert the integrity of the device on a
-per-block basis. This allows for a lightweight hash computation on first read
-into the page cache. Block hashes are stored linearly, aligned to the nearest
-block size.
-
-If forward error correction (FEC) support is enabled any recovery of
-corrupted data will be verified using the cryptographic hash of the
-corresponding data. This is why combining error correction with
-integrity checking is essential.
-
-Hash Tree
----------
-
-Each node in the tree is a cryptographic hash.  If it is a leaf node, the hash
-of some data block on disk is calculated. If it is an intermediary node,
-the hash of a number of child nodes is calculated.
-
-Each entry in the tree is a collection of neighboring nodes that fit in one
-block.  The number is determined based on block_size and the size of the
-selected cryptographic digest algorithm.  The hashes are linearly-ordered in
-this entry and any unaligned trailing space is ignored but included when
-calculating the parent node.
-
-The tree looks something like:
-
-	alg = sha256, num_blocks = 32768, block_size = 4096
-
-::
-
-                                 [   root    ]
-                                /    . . .    \
-                     [entry_0]                 [entry_1]
-                    /  . . .  \                 . . .   \
-         [entry_0_0]   . . .  [entry_0_127]    . . . .  [entry_1_127]
-           / ... \             /   . . .  \             /           \
-     blk_0 ... blk_127  blk_16256   blk_16383      blk_32640 . . . blk_32767
-
-
-On-disk format
-==============
-
-The verity kernel code does not read the verity metadata on-disk header.
-It only reads the hash blocks which directly follow the header.
-It is expected that a user-space tool will verify the integrity of the
-verity header.
-
-Alternatively, the header can be omitted and the dmsetup parameters can
-be passed via the kernel command-line in a rooted chain of trust where
-the command-line is verified.
-
-Directly following the header (and with sector number padded to the next hash
-block boundary) are the hash blocks which are stored a depth at a time
-(starting from the root), sorted in order of increasing index.
-
-The full specification of kernel parameters and on-disk metadata format
-is available at the cryptsetup project's wiki page
-
-  https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity
-
-Status
-======
-V (for Valid) is returned if every check performed so far was valid.
-If any check failed, C (for Corruption) is returned.
-
-Example
-=======
-Set up a device::
-
-  # dmsetup create vroot --readonly --table \
-    "0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\
-    "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\
-    "1234000000000000000000000000000000000000000000000000000000000000"
-
-A command line tool veritysetup is available to compute or verify
-the hash tree or activate the kernel device. This is available from
-the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/
-(as a libcryptsetup extension).
-
-Create hash on the device::
-
-  # veritysetup format /dev/sda1 /dev/sda2
-  ...
-  Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
-
-Activate the device::
-
-  # veritysetup create vroot /dev/sda1 /dev/sda2 \
-    4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076
diff --git a/Documentation/device-mapper/writecache.rst b/Documentation/device-mapper/writecache.rst
deleted file mode 100644
index d3d7690f5e8d..000000000000
--- a/Documentation/device-mapper/writecache.rst
+++ /dev/null
@@ -1,79 +0,0 @@
-=================
-Writecache target
-=================
-
-The writecache target caches writes on persistent memory or on SSD. It
-doesn't cache reads because reads are supposed to be cached in page cache
-in normal RAM.
-
-When the device is constructed, the first sector should be zeroed or the
-first sector should contain a valid superblock from a previous invocation.
-
-Constructor parameters:
-
-1. type of the cache device - "p" or "s"
-
-	- p - persistent memory
-	- s - SSD
-2. the underlying device that will be cached
-3. the cache device
-4. block size (4096 is recommended; the maximum block size is the page
-   size)
-5. the number of optional parameters (the parameters with an argument
-   count as two)
-
-	start_sector n		(default: 0)
-		offset from the start of cache device in 512-byte sectors
-	high_watermark n	(default: 50)
-		start writeback when the number of used blocks reach this
-		watermark
-	low_watermark x		(default: 45)
-		stop writeback when the number of used blocks drops below
-		this watermark
-	writeback_jobs n	(default: unlimited)
-		limit the number of blocks that are in flight during
-		writeback. Setting this value reduces writeback
-		throughput, but it may improve latency of read requests
-	autocommit_blocks n	(default: 64 for pmem, 65536 for ssd)
-		when the application writes this amount of blocks without
-		issuing the FLUSH request, the blocks are automatically
-		commited
-	autocommit_time ms	(default: 1000)
-		autocommit time in milliseconds. The data is automatically
-		commited if this time passes and no FLUSH request is
-		received
-	fua			(by default on)
-		applicable only to persistent memory - use the FUA flag
-		when writing data from persistent memory back to the
-		underlying device
-	nofua
-		applicable only to persistent memory - don't use the FUA
-		flag when writing back data and send the FLUSH request
-		afterwards
-
-		- some underlying devices perform better with fua, some
-		  with nofua. The user should test it
-
-Status:
-1. error indicator - 0 if there was no error, otherwise error number
-2. the number of blocks
-3. the number of free blocks
-4. the number of blocks under writeback
-
-Messages:
-	flush
-		flush the cache device. The message returns successfully
-		if the cache device was flushed without an error
-	flush_on_suspend
-		flush the cache device on next suspend. Use this message
-		when you are going to remove the cache device. The proper
-		sequence for removing the cache device is:
-
-		1. send the "flush_on_suspend" message
-		2. load an inactive table with a linear target that maps
-		   to the underlying device
-		3. suspend the device
-		4. ask for status and verify that there are no errors
-		5. resume the device, so that it will use the linear
-		   target
-		6. the cache device is now inactive and it can be deleted
diff --git a/Documentation/device-mapper/zero.rst b/Documentation/device-mapper/zero.rst
deleted file mode 100644
index 11fb5cf4597c..000000000000
--- a/Documentation/device-mapper/zero.rst
+++ /dev/null
@@ -1,37 +0,0 @@
-=======
-dm-zero
-=======
-
-Device-Mapper's "zero" target provides a block-device that always returns
-zero'd data on reads and silently drops writes. This is similar behavior to
-/dev/zero, but as a block-device instead of a character-device.
-
-Dm-zero has no target-specific parameters.
-
-One very interesting use of dm-zero is for creating "sparse" devices in
-conjunction with dm-snapshot. A sparse device reports a device-size larger
-than the amount of actual storage space available for that device. A user can
-write data anywhere within the sparse device and read it back like a normal
-device. Reads to previously unwritten areas will return a zero'd buffer. When
-enough data has been written to fill up the actual storage space, the sparse
-device is deactivated. This can be very useful for testing device and
-filesystem limitations.
-
-To create a sparse device, start by creating a dm-zero device that's the
-desired size of the sparse device. For this example, we'll assume a 10TB
-sparse device::
-
-  TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2`   # 10 TB in sectors
-  echo "0 $TEN_TERABYTES zero" | dmsetup create zero1
-
-Then create a snapshot of the zero device, using any available block-device as
-the COW device. The size of the COW device will determine the amount of real
-space available to the sparse device. For this example, we'll assume /dev/sdb1
-is an available 10GB partition::
-
-  echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \
-     dmsetup create sparse1
-
-This will create a 10TB sparse device called /dev/mapper/sparse1 that has
-10GB of actual storage space available. If more than 10GB of data is written
-to this device, it will start returning I/O errors.
diff --git a/MAINTAINERS b/MAINTAINERS
index 49e9a58f4799..b0e044be81ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4735,7 +4735,7 @@ Q:	http://patchwork.kernel.org/project/dm-devel/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
 T:	quilt http://people.redhat.com/agk/patches/linux/editing/
 S:	Maintained
-F:	Documentation/device-mapper/
+F:	Documentation/admin-guide/device-mapper/
 F:	drivers/md/Makefile
 F:	drivers/md/Kconfig
 F:	drivers/md/dm*
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 5ccac0b77f17..3834332f4963 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -453,7 +453,7 @@ config DM_INIT
 	Enable "dm-mod.create=" parameter to create mapped devices at init time.
 	This option is useful to allow mounting rootfs without requiring an
 	initramfs.
-	See Documentation/device-mapper/dm-init.rst for dm-mod.create="..."
+	See Documentation/admin-guide/device-mapper/dm-init.rst for dm-mod.create="..."
 	format.
 
 	If unsure, say N.
diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index b65faef2c4b5..b869316d3722 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -25,7 +25,7 @@ static char *create;
  * Format: dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
  * Table format: <start_sector> <num_sectors> <target_type> <target_args>
  *
- * See Documentation/device-mapper/dm-init.rst for dm-mod.create="..." format
+ * See Documentation/admin-guide/device-mapper/dm-init.rst for dm-mod.create="..." format
  * details.
  */
 
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index 7a87a640f8ba..8a60a4a070ac 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3558,7 +3558,7 @@ static void raid_status(struct dm_target *ti, status_type_t type,
 		 * v1.5.0+:
 		 *
 		 * Sync action:
-		 *   See Documentation/device-mapper/dm-raid.rst for
+		 *   See Documentation/admin-guide/device-mapper/dm-raid.rst for
 		 *   information on each of these states.
 		 */
 		DMEMIT(" %s", sync_action);
-- 
cgit v1.2.3-55-g7522


From ec4b78a0e7dd4751423089b7cfd32168f9052377 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 15:00:25 -0300
Subject: docs: early-userspace: move to driver-api guide

These documents describe a kAPI, so move them to the driver-api
book.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 .../driver-api/early-userspace/buffer-format.rst   | 119 ++++++++++++++++
 .../early-userspace/early_userspace_support.rst    | 154 +++++++++++++++++++++
 Documentation/driver-api/early-userspace/index.rst |  16 +++
 Documentation/driver-api/index.rst                 |   1 +
 Documentation/early-userspace/buffer-format.rst    | 119 ----------------
 .../early-userspace/early_userspace_support.rst    | 154 ---------------------
 Documentation/early-userspace/index.rst            |  18 ---
 Documentation/filesystems/nfs/nfsroot.txt          |   2 +-
 .../filesystems/ramfs-rootfs-initramfs.txt         |   4 +-
 usr/Kconfig                                        |   2 +-
 10 files changed, 294 insertions(+), 295 deletions(-)
 create mode 100644 Documentation/driver-api/early-userspace/buffer-format.rst
 create mode 100644 Documentation/driver-api/early-userspace/early_userspace_support.rst
 create mode 100644 Documentation/driver-api/early-userspace/index.rst
 delete mode 100644 Documentation/early-userspace/buffer-format.rst
 delete mode 100644 Documentation/early-userspace/early_userspace_support.rst
 delete mode 100644 Documentation/early-userspace/index.rst

diff --git a/Documentation/driver-api/early-userspace/buffer-format.rst b/Documentation/driver-api/early-userspace/buffer-format.rst
new file mode 100644
index 000000000000..7f74e301fdf3
--- /dev/null
+++ b/Documentation/driver-api/early-userspace/buffer-format.rst
@@ -0,0 +1,119 @@
+=======================
+initramfs buffer format
+=======================
+
+Al Viro, H. Peter Anvin
+
+Last revision: 2002-01-13
+
+Starting with kernel 2.5.x, the old "initial ramdisk" protocol is
+getting {replaced/complemented} with the new "initial ramfs"
+(initramfs) protocol.  The initramfs contents are passed using the same
+memory buffer protocol used by the initrd protocol, but the contents
+are different.  The initramfs buffer contains an archive which is
+expanded into a ramfs filesystem; this document details the format of
+the initramfs buffer.
+
+The initramfs buffer format is based around the "newc" or "crc" CPIO
+formats, and can be created with the cpio(1) utility.  The cpio
+archive can be compressed using gzip(1).  One valid version of an
+initramfs buffer is thus a single .cpio.gz file.
+
+The full format of the initramfs buffer is defined by the following
+grammar, where::
+
+	*	is used to indicate "0 or more occurrences of"
+	(|)	indicates alternatives
+	+	indicates concatenation
+	GZIP()	indicates the gzip(1) of the operand
+	ALGN(n)	means padding with null bytes to an n-byte boundary
+
+	initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
+
+	cpio_gzip_archive := GZIP(cpio_archive)
+
+	cpio_archive := cpio_file* + (<nothing> | cpio_trailer)
+
+	cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
+
+	cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4)
+
+
+In human terms, the initramfs buffer contains a collection of
+compressed and/or uncompressed cpio archives (in the "newc" or "crc"
+formats); arbitrary amounts of zero bytes (for padding) can be added
+between members.
+
+The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
+not ignored; see "handling of hard links" below.
+
+The structure of the cpio_header is as follows (all fields contain
+hexadecimal ASCII numbers fully padded with '0' on the left to the
+full width of the field, for example, the integer 4780 is represented
+by the ASCII string "000012ac"):
+
+============= ================== ==============================================
+Field name    Field size	 Meaning
+============= ================== ==============================================
+c_magic	      6 bytes		 The string "070701" or "070702"
+c_ino	      8 bytes		 File inode number
+c_mode	      8 bytes		 File mode and permissions
+c_uid	      8 bytes		 File uid
+c_gid	      8 bytes		 File gid
+c_nlink	      8 bytes		 Number of links
+c_mtime	      8 bytes		 Modification time
+c_filesize    8 bytes		 Size of data field
+c_maj	      8 bytes		 Major part of file device number
+c_min	      8 bytes		 Minor part of file device number
+c_rmaj	      8 bytes		 Major part of device node reference
+c_rmin	      8 bytes		 Minor part of device node reference
+c_namesize    8 bytes		 Length of filename, including final \0
+c_chksum      8 bytes		 Checksum of data field if c_magic is 070702;
+				 otherwise zero
+============= ================== ==============================================
+
+The c_mode field matches the contents of st_mode returned by stat(2)
+on Linux, and encodes the file type and file permissions.
+
+The c_filesize should be zero for any file which is not a regular file
+or symlink.
+
+The c_chksum field contains a simple 32-bit unsigned sum of all the
+bytes in the data field.  cpio(1) refers to this as "crc", which is
+clearly incorrect (a cyclic redundancy check is a different and
+significantly stronger integrity check); however, this is the
+algorithm used.
+
+If the filename is "TRAILER!!!" this is actually an end-of-archive
+marker; the c_filesize for an end-of-archive marker must be zero.
+
+
+Handling of hard links
+======================
+
+When a nondirectory with c_nlink > 1 is seen, the (c_maj,c_min,c_ino)
+tuple is looked up in a tuple buffer.  If not found, it is entered in
+the tuple buffer and the entry is created as usual; if found, a hard
+link rather than a second copy of the file is created.  It is not
+necessary (but permitted) to include a second copy of the file
+contents; if the file contents is not included, the c_filesize field
+should be set to zero to indicate no data section follows.  If data is
+present, the previous instance of the file is overwritten; this allows
+the data-carrying instance of a file to occur anywhere in the sequence
+(GNU cpio is reported to attach the data to the last instance of a
+file only.)
+
+c_filesize must not be zero for a symlink.
+
+When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is
+reset.  This permits archives which are generated independently to be
+concatenated.
+
+To combine file data from different sources (without having to
+regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
+the following techniques can be used:
+
+a) Separate the different file data sources with a "TRAILER!!!"
+   end-of-archive marker, or
+
+b) Make sure c_nlink == 1 for all nondirectory entries.
diff --git a/Documentation/driver-api/early-userspace/early_userspace_support.rst b/Documentation/driver-api/early-userspace/early_userspace_support.rst
new file mode 100644
index 000000000000..3deefb34046b
--- /dev/null
+++ b/Documentation/driver-api/early-userspace/early_userspace_support.rst
@@ -0,0 +1,154 @@
+=======================
+Early userspace support
+=======================
+
+Last update: 2004-12-20 tlh
+
+
+"Early userspace" is a set of libraries and programs that provide
+various pieces of functionality that are important enough to be
+available while a Linux kernel is coming up, but that don't need to be
+run inside the kernel itself.
+
+It consists of several major infrastructure components:
+
+- gen_init_cpio, a program that builds a cpio-format archive
+  containing a root filesystem image.  This archive is compressed, and
+  the compressed image is linked into the kernel image.
+- initramfs, a chunk of code that unpacks the compressed cpio image
+  midway through the kernel boot process.
+- klibc, a userspace C library, currently packaged separately, that is
+  optimized for correctness and small size.
+
+The cpio file format used by initramfs is the "newc" (aka "cpio -H newc")
+format, and is documented in the file "buffer-format.rst".  There are
+two ways to add an early userspace image: specify an existing cpio
+archive to be used as the image or have the kernel build process build
+the image from specifications.
+
+CPIO ARCHIVE method
+-------------------
+
+You can create a cpio archive that contains the early userspace image.
+Your cpio archive should be specified in CONFIG_INITRAMFS_SOURCE and it
+will be used directly.  Only a single cpio file may be specified in
+CONFIG_INITRAMFS_SOURCE and directory and file names are not allowed in
+combination with a cpio archive.
+
+IMAGE BUILDING method
+---------------------
+
+The kernel build process can also build an early userspace image from
+source parts rather than supplying a cpio archive.  This method provides
+a way to create images with root-owned files even though the image was
+built by an unprivileged user.
+
+The image is specified as one or more sources in
+CONFIG_INITRAMFS_SOURCE.  Sources can be either directories or files -
+cpio archives are *not* allowed when building from sources.
+
+A source directory is packaged together with all of its contents.  The
+specified directory name will be mapped to '/'.  When packaging a
+directory, limited user and group ID translation can be performed.
+INITRAMFS_ROOT_UID can be set to a user ID that needs to be mapped to
+user root (0).  INITRAMFS_ROOT_GID can be set to a group ID that needs
+to be mapped to group root (0).
+
+A source file must contain directives in the format required by the
+usr/gen_init_cpio utility (run 'usr/gen_init_cpio -h' to get the
+file format).  The directives in the file will be passed directly to
+usr/gen_init_cpio.
+
+When a combination of directories and files are specified then the
+initramfs image will be an aggregate of all of them.  In this way a user
+can create a 'root-image' directory and install all files into it.
+Because device-special files cannot be created by an unprivileged user,
+special files can be listed in a 'root-files' file.  Both 'root-image'
+and 'root-files' can be listed in CONFIG_INITRAMFS_SOURCE and a complete
+early userspace image can be built by an unprivileged user.
+
+As a technical note, when directories and files are specified, the
+entire CONFIG_INITRAMFS_SOURCE is passed to
+usr/gen_initramfs_list.sh.  This means that CONFIG_INITRAMFS_SOURCE
+can really be interpreted as any legal argument to
+gen_initramfs_list.sh.  If a directory is specified as an argument then
+the contents are scanned, uid/gid translation is performed, and
+usr/gen_init_cpio file directives are output.  If a file is
+specified as an argument to usr/gen_initramfs_list.sh then the
+contents of the file are simply copied to the output.  All of the output
+directives from directory scanning and file contents copying are
+processed by usr/gen_init_cpio.
+
+See also 'usr/gen_initramfs_list.sh -h'.
+
+Where's this all leading?
+=========================
+
+The klibc distribution contains some of the necessary software to make
+early userspace useful.  The klibc distribution is currently
+maintained separately from the kernel.
+
+You can obtain somewhat infrequent snapshots of klibc from
+https://www.kernel.org/pub/linux/libs/klibc/
+
+For active users, you are better off using the klibc git
+repository, at http://git.kernel.org/?p=libs/klibc/klibc.git
+
+The standalone klibc distribution currently provides three components,
+in addition to the klibc library:
+
+- ipconfig, a program that configures network interfaces.  It can
+  configure them statically, or use DHCP to obtain information
+  dynamically (aka "IP autoconfiguration").
+- nfsmount, a program that can mount an NFS filesystem.
+- kinit, the "glue" that uses ipconfig and nfsmount to replace the old
+  support for IP autoconfig, mount a filesystem over NFS, and continue
+  system boot using that filesystem as root.
+
+kinit is built as a single statically linked binary to save space.
+
+Eventually, several more chunks of kernel functionality will hopefully
+move to early userspace:
+
+- Almost all of init/do_mounts* (the beginning of this is already in
+  place)
+- ACPI table parsing
+- Insert unwieldy subsystem that doesn't really need to be in kernel
+  space here
+
+If kinit doesn't meet your current needs and you've got bytes to burn,
+the klibc distribution includes a small Bourne-compatible shell (ash)
+and a number of other utilities, so you can replace kinit and build
+custom initramfs images that meet your needs exactly.
+
+For questions and help, you can sign up for the early userspace
+mailing list at http://www.zytor.com/mailman/listinfo/klibc
+
+How does it work?
+=================
+
+The kernel currently has three ways to mount the root filesystem:
+
+a) all required device and filesystem drivers compiled into the kernel, no
+   initrd.  init/main.c:init() will call prepare_namespace() to mount the
+   final root filesystem, based on the root= option and optional init= to run
+   some other init binary than listed at the end of init/main.c:init().
+
+b) some device and filesystem drivers built as modules and stored in an
+   initrd.  The initrd must contain a binary '/linuxrc' which is supposed to
+   load these driver modules.  It is also possible to mount the final root
+   filesystem via linuxrc and use the pivot_root syscall.  The initrd is
+   mounted and executed via prepare_namespace().
+
+c) using initramfs.  The call to prepare_namespace() must be skipped.
+   This means that a binary must do all the work.  Said binary can be stored
+   into initramfs either via modifying usr/gen_init_cpio.c or via the new
+   initrd format, a cpio archive.  It must be called "/init".  This binary
+   is responsible for doing all the things prepare_namespace() would do.
+
+   To maintain backwards compatibility, the /init binary will only run if it
+   comes via an initramfs cpio archive.  If this is not the case,
+   init/main.c:init() will run prepare_namespace() to mount the final root
+   and exec one of the predefined init binaries.
+
+Bryan O'Sullivan <bos@serpentine.com>
diff --git a/Documentation/driver-api/early-userspace/index.rst b/Documentation/driver-api/early-userspace/index.rst
new file mode 100644
index 000000000000..6f20c3c560d8
--- /dev/null
+++ b/Documentation/driver-api/early-userspace/index.rst
@@ -0,0 +1,16 @@
+===============
+Early Userspace
+===============
+
+.. toctree::
+    :maxdepth: 1
+
+    early_userspace_support
+    buffer-format
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index baa77a666e46..0f281f4f648f 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -16,6 +16,7 @@ available subsections can be seen below.
 
    basics
    infrastructure
+   early-userspace/index
    pm/index
    clk
    device-io
diff --git a/Documentation/early-userspace/buffer-format.rst b/Documentation/early-userspace/buffer-format.rst
deleted file mode 100644
index 7f74e301fdf3..000000000000
--- a/Documentation/early-userspace/buffer-format.rst
+++ /dev/null
@@ -1,119 +0,0 @@
-=======================
-initramfs buffer format
-=======================
-
-Al Viro, H. Peter Anvin
-
-Last revision: 2002-01-13
-
-Starting with kernel 2.5.x, the old "initial ramdisk" protocol is
-getting {replaced/complemented} with the new "initial ramfs"
-(initramfs) protocol.  The initramfs contents is passed using the same
-memory buffer protocol used by the initrd protocol, but the contents
-is different.  The initramfs buffer contains an archive which is
-expanded into a ramfs filesystem; this document details the format of
-the initramfs buffer format.
-
-The initramfs buffer format is based around the "newc" or "crc" CPIO
-formats, and can be created with the cpio(1) utility.  The cpio
-archive can be compressed using gzip(1).  One valid version of an
-initramfs buffer is thus a single .cpio.gz file.
-
-The full format of the initramfs buffer is defined by the following
-grammar, where::
-
-	*	is used to indicate "0 or more occurrences of"
-	(|)	indicates alternatives
-	+	indicates concatenation
-	GZIP()	indicates the gzip(1) of the operand
-	ALGN(n)	means padding with null bytes to an n-byte boundary
-
-	initramfs  := ("\0" | cpio_archive | cpio_gzip_archive)*
-
-	cpio_gzip_archive := GZIP(cpio_archive)
-
-	cpio_archive := cpio_file* + (<nothing> | cpio_trailer)
-
-	cpio_file := ALGN(4) + cpio_header + filename + "\0" + ALGN(4) + data
-
-	cpio_trailer := ALGN(4) + cpio_header + "TRAILER!!!\0" + ALGN(4)
-
-
-In human terms, the initramfs buffer contains a collection of
-compressed and/or uncompressed cpio archives (in the "newc" or "crc"
-formats); arbitrary amounts zero bytes (for padding) can be added
-between members.
-
-The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
-not ignored; see "handling of hard links" below.
-
-The structure of the cpio_header is as follows (all fields contain
-hexadecimal ASCII numbers fully padded with '0' on the left to the
-full width of the field, for example, the integer 4780 is represented
-by the ASCII string "000012ac"):
-
-============= ================== ==============================================
-Field name    Field size	 Meaning
-============= ================== ==============================================
-c_magic	      6 bytes		 The string "070701" or "070702"
-c_ino	      8 bytes		 File inode number
-c_mode	      8 bytes		 File mode and permissions
-c_uid	      8 bytes		 File uid
-c_gid	      8 bytes		 File gid
-c_nlink	      8 bytes		 Number of links
-c_mtime	      8 bytes		 Modification time
-c_filesize    8 bytes		 Size of data field
-c_maj	      8 bytes		 Major part of file device number
-c_min	      8 bytes		 Minor part of file device number
-c_rmaj	      8 bytes		 Major part of device node reference
-c_rmin	      8 bytes		 Minor part of device node reference
-c_namesize    8 bytes		 Length of filename, including final \0
-c_chksum      8 bytes		 Checksum of data field if c_magic is 070702;
-				 otherwise zero
-============= ================== ==============================================
-
-The c_mode field matches the contents of st_mode returned by stat(2)
-on Linux, and encodes the file type and file permissions.
-
-The c_filesize should be zero for any file which is not a regular file
-or symlink.
-
-The c_chksum field contains a simple 32-bit unsigned sum of all the
-bytes in the data field.  cpio(1) refers to this as "crc", which is
-clearly incorrect (a cyclic redundancy check is a different and
-significantly stronger integrity check), however, this is the
-algorithm used.
-
-If the filename is "TRAILER!!!" this is actually an end-of-archive
-marker; the c_filesize for an end-of-archive marker must be zero.
-
-
-Handling of hard links
-======================
-
-When a nondirectory with c_nlink > 1 is seen, the (c_maj,c_min,c_ino)
-tuple is looked up in a tuple buffer.  If not found, it is entered in
-the tuple buffer and the entry is created as usual; if found, a hard
-link rather than a second copy of the file is created.  It is not
-necessary (but permitted) to include a second copy of the file
-contents; if the file contents is not included, the c_filesize field
-should be set to zero to indicate no data section follows.  If data is
-present, the previous instance of the file is overwritten; this allows
-the data-carrying instance of a file to occur anywhere in the sequence
-(GNU cpio is reported to attach the data to the last instance of a
-file only.)
-
-c_filesize must not be zero for a symlink.
-
-When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is
-reset.  This permits archives which are generated independently to be
-concatenated.
-
-To combine file data from different sources (without having to
-regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
-the following techniques can be used:
-
-a) Separate the different file data sources with a "TRAILER!!!"
-   end-of-archive marker, or
-
-b) Make sure c_nlink == 1 for all nondirectory entries.
diff --git a/Documentation/early-userspace/early_userspace_support.rst b/Documentation/early-userspace/early_userspace_support.rst
deleted file mode 100644
index 3deefb34046b..000000000000
--- a/Documentation/early-userspace/early_userspace_support.rst
+++ /dev/null
@@ -1,154 +0,0 @@
-=======================
-Early userspace support
-=======================
-
-Last update: 2004-12-20 tlh
-
-
-"Early userspace" is a set of libraries and programs that provide
-various pieces of functionality that are important enough to be
-available while a Linux kernel is coming up, but that don't need to be
-run inside the kernel itself.
-
-It consists of several major infrastructure components:
-
-- gen_init_cpio, a program that builds a cpio-format archive
-  containing a root filesystem image.  This archive is compressed, and
-  the compressed image is linked into the kernel image.
-- initramfs, a chunk of code that unpacks the compressed cpio image
-  midway through the kernel boot process.
-- klibc, a userspace C library, currently packaged separately, that is
-  optimized for correctness and small size.
-
-The cpio file format used by initramfs is the "newc" (aka "cpio -H newc")
-format, and is documented in the file "buffer-format.txt".  There are
-two ways to add an early userspace image: specify an existing cpio
-archive to be used as the image or have the kernel build process build
-the image from specifications.
-
-CPIO ARCHIVE method
--------------------
-
-You can create a cpio archive that contains the early userspace image.
-Your cpio archive should be specified in CONFIG_INITRAMFS_SOURCE and it
-will be used directly.  Only a single cpio file may be specified in
-CONFIG_INITRAMFS_SOURCE and directory and file names are not allowed in
-combination with a cpio archive.
-
-IMAGE BUILDING method
----------------------
-
-The kernel build process can also build an early userspace image from
-source parts rather than supplying a cpio archive.  This method provides
-a way to create images with root-owned files even though the image was
-built by an unprivileged user.
-
-The image is specified as one or more sources in
-CONFIG_INITRAMFS_SOURCE.  Sources can be either directories or files -
-cpio archives are *not* allowed when building from sources.
-
-A source directory will have it and all of its contents packaged.  The
-specified directory name will be mapped to '/'.  When packaging a
-directory, limited user and group ID translation can be performed.
-INITRAMFS_ROOT_UID can be set to a user ID that needs to be mapped to
-user root (0).  INITRAMFS_ROOT_GID can be set to a group ID that needs
-to be mapped to group root (0).
-
-A source file must be directives in the format required by the
-usr/gen_init_cpio utility (run 'usr/gen_init_cpio -h' to get the
-file format).  The directives in the file will be passed directly to
-usr/gen_init_cpio.
-
-When a combination of directories and files are specified then the
-initramfs image will be an aggregate of all of them.  In this way a user
-can create a 'root-image' directory and install all files into it.
-Because device-special files cannot be created by a unprivileged user,
-special files can be listed in a 'root-files' file.  Both 'root-image'
-and 'root-files' can be listed in CONFIG_INITRAMFS_SOURCE and a complete
-early userspace image can be built by an unprivileged user.
-
-As a technical note, when directories and files are specified, the
-entire CONFIG_INITRAMFS_SOURCE is passed to
-usr/gen_initramfs_list.sh.  This means that CONFIG_INITRAMFS_SOURCE
-can really be interpreted as any legal argument to
-gen_initramfs_list.sh.  If a directory is specified as an argument then
-the contents are scanned, uid/gid translation is performed, and
-usr/gen_init_cpio file directives are output.  If a directory is
-specified as an argument to usr/gen_initramfs_list.sh then the
-contents of the file are simply copied to the output.  All of the output
-directives from directory scanning and file contents copying are
-processed by usr/gen_init_cpio.
-
-See also 'usr/gen_initramfs_list.sh -h'.
-
-Where's this all leading?
-=========================
-
-The klibc distribution contains some of the necessary software to make
-early userspace useful.  The klibc distribution is currently
-maintained separately from the kernel.
-
-You can obtain somewhat infrequent snapshots of klibc from
-https://www.kernel.org/pub/linux/libs/klibc/
-
-For active users, you are better off using the klibc git
-repository, at http://git.kernel.org/?p=libs/klibc/klibc.git
-
-The standalone klibc distribution currently provides three components,
-in addition to the klibc library:
-
-- ipconfig, a program that configures network interfaces.  It can
-  configure them statically, or use DHCP to obtain information
-  dynamically (aka "IP autoconfiguration").
-- nfsmount, a program that can mount an NFS filesystem.
-- kinit, the "glue" that uses ipconfig and nfsmount to replace the old
-  support for IP autoconfig, mount a filesystem over NFS, and continue
-  system boot using that filesystem as root.
-
-kinit is built as a single statically linked binary to save space.
-
-Eventually, several more chunks of kernel functionality will hopefully
-move to early userspace:
-
-- Almost all of init/do_mounts* (the beginning of this is already in
-  place)
-- ACPI table parsing
-- Insert unwieldy subsystem that doesn't really need to be in kernel
-  space here
-
-If kinit doesn't meet your current needs and you've got bytes to burn,
-the klibc distribution includes a small Bourne-compatible shell (ash)
-and a number of other utilities, so you can replace kinit and build
-custom initramfs images that meet your needs exactly.
-
-For questions and help, you can sign up for the early userspace
-mailing list at http://www.zytor.com/mailman/listinfo/klibc
-
-How does it work?
-=================
-
-The kernel has currently 3 ways to mount the root filesystem:
-
-a) all required device and filesystem drivers compiled into the kernel, no
-   initrd.  init/main.c:init() will call prepare_namespace() to mount the
-   final root filesystem, based on the root= option and optional init= to run
-   some other init binary than listed at the end of init/main.c:init().
-
-b) some device and filesystem drivers built as modules and stored in an
-   initrd.  The initrd must contain a binary '/linuxrc' which is supposed to
-   load these driver modules.  It is also possible to mount the final root
-   filesystem via linuxrc and use the pivot_root syscall.  The initrd is
-   mounted and executed via prepare_namespace().
-
-c) using initramfs.  The call to prepare_namespace() must be skipped.
-   This means that a binary must do all the work.  Said binary can be stored
-   into initramfs either via modifying usr/gen_init_cpio.c or via the new
-   initrd format, an cpio archive.  It must be called "/init".  This binary
-   is responsible to do all the things prepare_namespace() would do.
-
-   To maintain backwards compatibility, the /init binary will only run if it
-   comes via an initramfs cpio archive.  If this is not the case,
-   init/main.c:init() will run prepare_namespace() to mount the final root
-   and exec one of the predefined init binaries.
-
-Bryan O'Sullivan <bos@serpentine.com>
diff --git a/Documentation/early-userspace/index.rst b/Documentation/early-userspace/index.rst
deleted file mode 100644
index 2b8eb6132058..000000000000
--- a/Documentation/early-userspace/index.rst
+++ /dev/null
@@ -1,18 +0,0 @@
-:orphan:
-
-===============
-Early Userspace
-===============
-
-.. toctree::
-    :maxdepth: 1
-
-    early_userspace_support
-    buffer-format
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt
index 4862d3d77e27..ae4332464560 100644
--- a/Documentation/filesystems/nfs/nfsroot.txt
+++ b/Documentation/filesystems/nfs/nfsroot.txt
@@ -239,7 +239,7 @@ rdinit=<executable file>
   A description of the process of mounting the root file system can be
   found in:
 
-    Documentation/early-userspace/early_userspace_support.rst
+    Documentation/driver-api/early-userspace/early_userspace_support.rst
 
 
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
index fa985909dbca..97d42ccaa92d 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -105,7 +105,7 @@ All this differs from the old initrd in several ways:
   - The old initrd file was a gzipped filesystem image (in some file format,
     such as ext2, that needed a driver built into the kernel), while the new
     initramfs archive is a gzipped cpio archive (like tar only simpler,
-    see cpio(1) and Documentation/early-userspace/buffer-format.rst).  The
+    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).  The
     kernel's cpio extraction code is not only extremely small, it's also
     __init text and data that can be discarded during the boot process.
 
@@ -159,7 +159,7 @@ One advantage of the configuration file is that root access is not required to
 set permissions or create device nodes in the new archive.  (Note that those
 two example "file" entries expect to find files named "init.sh" and "busybox" in
 a directory called "initramfs", under the linux-2.6.* directory.  See
-Documentation/early-userspace/early_userspace_support.rst for more details.)
+Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.)
 
 The kernel does not depend on external cpio tools.  If you specify a
 directory instead of a configuration file, the kernel's build infrastructure
diff --git a/usr/Kconfig b/usr/Kconfig
index 86e37e297278..a6b68503d177 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -18,7 +18,7 @@ config INITRAMFS_SOURCE
 	  When multiple directories and files are specified then the
 	  initramfs image will be the aggregate of all of them.
 
-	  See <file:Documentation/early-userspace/early_userspace_support.rst> for more details.
+	  See <file:Documentation/driver-api/early-userspace/early_userspace_support.rst> for more details.
 
 	  If you are not sure, leave it blank.
 
-- 
cgit v1.2.3-55-g7522


From 570432470275c3da15b85362bc1461945b9c1919 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 22 Apr 2019 16:48:00 -0300
Subject: docs: admin-guide: move sysctl directory to it

The stuff under sysctl describes the /proc/sys interface from a userspace
point of view. So, add it to the admin-guide and remove the
:orphan: from its index file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 CREDITS                                         |    2 +-
 Documentation/admin-guide/index.rst             |    1 +
 Documentation/admin-guide/kernel-parameters.txt |    2 +-
 Documentation/admin-guide/mm/index.rst          |    2 +-
 Documentation/admin-guide/mm/ksm.rst            |    2 +-
 Documentation/admin-guide/sysctl/abi.rst        |   67 ++
 Documentation/admin-guide/sysctl/fs.rst         |  384 ++++++++
 Documentation/admin-guide/sysctl/index.rst      |   98 ++
 Documentation/admin-guide/sysctl/kernel.rst     | 1177 +++++++++++++++++++++++
 Documentation/admin-guide/sysctl/net.rst        |  461 +++++++++
 Documentation/admin-guide/sysctl/sunrpc.rst     |   25 +
 Documentation/admin-guide/sysctl/user.rst       |   78 ++
 Documentation/admin-guide/sysctl/vm.rst         |  964 +++++++++++++++++++
 Documentation/core-api/printk-formats.rst       |    2 +-
 Documentation/filesystems/proc.txt              |    2 +-
 Documentation/networking/ip-sysctl.txt          |    2 +-
 Documentation/sysctl/abi.rst                    |   67 --
 Documentation/sysctl/fs.rst                     |  384 --------
 Documentation/sysctl/index.rst                  |  100 --
 Documentation/sysctl/kernel.rst                 | 1177 -----------------------
 Documentation/sysctl/net.rst                    |  461 ---------
 Documentation/sysctl/sunrpc.rst                 |   25 -
 Documentation/sysctl/user.rst                   |   78 --
 Documentation/sysctl/vm.rst                     |  964 -------------------
 Documentation/vm/unevictable-lru.rst            |    2 +-
 fs/proc/Kconfig                                 |    2 +-
 kernel/panic.c                                  |    2 +-
 mm/swap.c                                       |    2 +-
 28 files changed, 3266 insertions(+), 3267 deletions(-)
 create mode 100644 Documentation/admin-guide/sysctl/abi.rst
 create mode 100644 Documentation/admin-guide/sysctl/fs.rst
 create mode 100644 Documentation/admin-guide/sysctl/index.rst
 create mode 100644 Documentation/admin-guide/sysctl/kernel.rst
 create mode 100644 Documentation/admin-guide/sysctl/net.rst
 create mode 100644 Documentation/admin-guide/sysctl/sunrpc.rst
 create mode 100644 Documentation/admin-guide/sysctl/user.rst
 create mode 100644 Documentation/admin-guide/sysctl/vm.rst
 delete mode 100644 Documentation/sysctl/abi.rst
 delete mode 100644 Documentation/sysctl/fs.rst
 delete mode 100644 Documentation/sysctl/index.rst
 delete mode 100644 Documentation/sysctl/kernel.rst
 delete mode 100644 Documentation/sysctl/net.rst
 delete mode 100644 Documentation/sysctl/sunrpc.rst
 delete mode 100644 Documentation/sysctl/user.rst
 delete mode 100644 Documentation/sysctl/vm.rst

diff --git a/CREDITS b/CREDITS
index beac0c81d081..401c5092bbf9 100644
--- a/CREDITS
+++ b/CREDITS
@@ -3120,7 +3120,7 @@ S: France
 N: Rik van Riel
 E: riel@redhat.com
 W: http://www.surriel.com/
-D: Linux-MM site, Documentation/sysctl/*, swap/mm readaround
+D: Linux-MM site, Documentation/admin-guide/sysctl/*, swap/mm readaround
 D: kswapd fixes, random kernel hacker, rmap VM,
 D: nl.linux.org administrator, minor scheduler additions
 S: Red Hat Boston
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 64e97a969857..5c6ae1ccee1a 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -16,6 +16,7 @@ etc.
    README
    kernel-parameters
    devices
+   sysctl/index
 
 This section describes CPU vulnerabilities and their mitigations.
 
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e8e28cac32a3..b323f5d4366a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3144,7 +3144,7 @@
 	numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
 			'node', 'default' can be specified
 			This can be set from sysctl after boot.
-			See Documentation/sysctl/vm.rst for details.
+			See Documentation/admin-guide/sysctl/vm.rst for details.
 
 	ohci1394_dma=early	[HW] enable debugging via the ohci1394 driver.
 			See Documentation/debugging-via-ohci1394.txt for more
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index f5e92f33f96e..5f61a6c429e0 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -11,7 +11,7 @@ processes address space and many other cool things.
 Linux memory management is a complex system with many configurable
 settings. Most of these settings are available via ``/proc``
 filesystem and can be quired and adjusted using ``sysctl``. These APIs
-are described in Documentation/sysctl/vm.rst and in `man 5 proc`_.
+are described in Documentation/admin-guide/sysctl/vm.rst and in `man 5 proc`_.
 
 .. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
 
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 7b2b8767c0b4..874eb0c77d34 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -59,7 +59,7 @@ MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
 
 If a region of memory must be split into at least one new MADV_MERGEABLE
 or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
-will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.rst).
+will exceed ``vm.max_map_count`` (see Documentation/admin-guide/sysctl/vm.rst).
 
 Like other madvise calls, they are intended for use on mapped areas of
 the user address space: they will report ENOMEM if the specified range
diff --git a/Documentation/admin-guide/sysctl/abi.rst b/Documentation/admin-guide/sysctl/abi.rst
new file mode 100644
index 000000000000..599bcde7f0b7
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/abi.rst
@@ -0,0 +1,67 @@
+================================
+Documentation for /proc/sys/abi/
+================================
+
+kernel version 2.6.0.test2
+
+Copyright (c) 2003,  Fabian Frederick <ffrederick@users.sourceforge.net>
+
+For general info: index.rst.
+
+------------------------------------------------------------------------------
+
+This path relates to binary emulation, also known as personality types or ABI.
+When a process is executed, it's linked to an exec_domain whose
+personality is defined using values available from /proc/sys/abi.
+You can find further details about abi in include/linux/personality.h.
+
+These are the files present in the 2.6 kernel:
+
+- defhandler_coff
+- defhandler_elf
+- defhandler_lcall7
+- defhandler_libsco
+- fake_utsname
+- trace
+
+defhandler_coff
+---------------
+
+defined value:
+	PER_SCOSVR3::
+
+		0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
+
+defhandler_elf
+--------------
+
+defined value:
+	PER_LINUX::
+
+		0
+
+defhandler_lcall7
+-----------------
+
+defined value:
+	PER_SVR4::
+
+		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+
+defhandler_libsco
+-----------------
+
+defined value:
+	PER_SVR4::
+
+		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+
+fake_utsname
+------------
+
+Unused
+
+trace
+-----
+
+Unused
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
new file mode 100644
index 000000000000..2a45119e3331
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -0,0 +1,384 @@
+===============================
+Documentation for /proc/sys/fs/
+===============================
+
+kernel version 2.2.10
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2009,        Shen Feng <shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains documentation for the sysctl files in
+/proc/sys/fs/ and is valid for Linux kernel version 2.2.
+
+The files in this directory can be used to tune and monitor
+miscellaneous and general things in the operation of the Linux
+kernel. Since some of the files _can_ be used to screw up your
+system, it is advisable to read both documentation and source
+before actually making adjustments.
+
+1. /proc/sys/fs
+===============
+
+Currently, these files are in /proc/sys/fs:
+
+- aio-max-nr
+- aio-nr
+- dentry-state
+- dquot-max
+- dquot-nr
+- file-max
+- file-nr
+- inode-max
+- inode-nr
+- inode-state
+- nr_open
+- overflowuid
+- overflowgid
+- pipe-user-pages-hard
+- pipe-user-pages-soft
+- protected_fifos
+- protected_hardlinks
+- protected_regular
+- protected_symlinks
+- suid_dumpable
+- super-max
+- super-nr
+
+
+aio-nr & aio-max-nr
+-------------------
+
+aio-nr is the running total of the number of events specified on the
+io_setup system call for all currently active aio contexts.  If aio-nr
+reaches aio-max-nr then io_setup will fail with EAGAIN.  Note that
+raising aio-max-nr does not result in the pre-allocation or re-sizing
+of any kernel data structures.
+
+
+dentry-state
+------------
+
+From linux/include/linux/dcache.h::
+
+  struct dentry_stat_t dentry_stat {
+        int nr_dentry;
+        int nr_unused;
+        int age_limit;         /* age in seconds */
+        int want_pages;        /* pages requested by system */
+        int nr_negative;       /* # of unused negative dentries */
+        int dummy;             /* Reserved for future use */
+  };
+
+Dentries are dynamically allocated and deallocated.
+
+nr_dentry shows the total number of dentries allocated (active
++ unused). nr_unused shows the number of dentries that are not
+actively used, but are saved in the LRU list for future reuse.
+
+Age_limit is the age in seconds after which dcache entries
+can be reclaimed when memory is short and want_pages is
+nonzero when shrink_dcache_pages() has been called and the
+dcache isn't pruned yet.
+
+nr_negative shows the number of unused dentries that are also
+negative dentries which do not map to any files. Instead,
+they help speed up rejection of non-existing files provided
+by the users.
+
+
+dquot-max & dquot-nr
+--------------------
+
+The file dquot-max shows the maximum number of cached disk
+quota entries.
+
+The file dquot-nr shows the number of allocated disk quota
+entries and the number of free disk quota entries.
+
+If the number of free cached disk quotas is very low and
+you have some awesome number of simultaneous system users,
+you might want to raise the limit.
+
+
+file-max & file-nr
+------------------
+
+The value in file-max denotes the maximum number of file-
+handles that the Linux kernel will allocate. When you get lots
+of error messages about running out of file handles, you might
+want to increase this limit.
+
+Historically, the kernel was able to allocate file handles
+dynamically, but not to free them again. The three values in
+file-nr denote the number of allocated file handles, the number
+of allocated but unused file handles, and the maximum number of
+file handles. Linux 2.6 always reports 0 as the number of free
+file handles -- this is not an error, it just means that the
+number of allocated file handles exactly matches the number of
+used file handles.
+
+Attempts to allocate more file descriptors than file-max are
+reported with printk, look for "VFS: file-max limit <number>
+reached".
+
+
+nr_open
+-------
+
+This denotes the maximum number of file-handles a process can
+allocate. Default value is 1024*1024 (1048576) which should be
+enough for most machines. Actual limit depends on RLIMIT_NOFILE
+resource limit.
+
+
+inode-max, inode-nr & inode-state
+---------------------------------
+
+As with file handles, the kernel allocates the inode structures
+dynamically, but can't free them yet.
+
+The value in inode-max denotes the maximum number of inode
+handlers. This value should be 3-4 times larger than the value
+in file-max, since stdin, stdout and network sockets also
+need an inode struct to handle them. When you regularly run
+out of inodes, you need to increase this value.
+
+The file inode-nr contains the first two items from
+inode-state, so we'll skip to that file...
+
+Inode-state contains three actual numbers and four dummies.
+The actual numbers are, in order of appearance, nr_inodes,
+nr_free_inodes and preshrink.
+
+Nr_inodes stands for the number of inodes the system has
+allocated, this can be slightly more than inode-max because
+Linux allocates them one pageful at a time.
+
+Nr_free_inodes represents the number of free inodes (?) and
+preshrink is nonzero when the nr_inodes > inode-max and the
+system needs to prune the inode list instead of allocating
+more.
+
+
+overflowgid & overflowuid
+-------------------------
+
+Some filesystems only support 16-bit UIDs and GIDs, although in Linux
+UIDs and GIDs are 32 bits. When one of these filesystems is mounted
+with writes enabled, any UID or GID that would exceed 65535 is translated
+to a fixed value before being written to disk.
+
+These sysctls allow you to change the value of the fixed UID and GID.
+The default is 65534.
+
+
+pipe-user-pages-hard
+--------------------
+
+Maximum total number of pages a non-privileged user may allocate for pipes.
+Once this limit is reached, no new pipes may be allocated until usage goes
+below the limit again. When set to 0, no limit is applied, which is the default
+setting.
+
+
+pipe-user-pages-soft
+--------------------
+
+Maximum total number of pages a non-privileged user may allocate for pipes
+before the pipe size gets limited to a single page. Once this limit is reached,
+new pipes will be limited to a single page in size for this user in order to
+limit total memory usage, and trying to increase them using fcntl() will be
+denied until usage goes below the limit again. The default value allows
+allocating up to 1024 pipes at their default size. When set to 0, no limit is
+applied.
+
+
+protected_fifos
+---------------
+
+The intent of this protection is to avoid unintentional writes to
+an attacker-controlled FIFO, where a program expected to create a regular
+file.
+
+When set to "0", writing to FIFOs is unrestricted.
+
+When set to "1" don't allow O_CREAT open on FIFOs that we don't own
+in world writable sticky directories, unless they are owned by the
+owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+This protection is based on the restrictions in Openwall.
+
+
+protected_hardlinks
+--------------------
+
+A long-standing class of security issues is the hardlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given hardlink (i.e. a
+root process follows a hardlink created by another user). Additionally,
+on systems without separated partitions, this stops unauthorized users
+from "pinning" vulnerable setuid/setgid files against being upgraded by
+the administrator, or linking to special files.
+
+When set to "0", hardlink creation behavior is unrestricted.
+
+When set to "1" hardlinks cannot be created by users if they do not
+already own the source file, or do not have read/write access to it.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+
+protected_regular
+-----------------
+
+This protection is similar to protected_fifos, but it
+avoids writes to an attacker-controlled regular file, where a program
+expected to create one.
+
+When set to "0", writing to regular files is unrestricted.
+
+When set to "1" don't allow O_CREAT open on regular files that we
+don't own in world writable sticky directories, unless they are
+owned by the owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+
+protected_symlinks
+------------------
+
+A long-standing class of security issues is the symlink-based
+time-of-check-time-of-use race, most commonly seen in world-writable
+directories like /tmp. The common method of exploitation of this flaw
+is to cross privilege boundaries when following a given symlink (i.e. a
+root process follows a symlink belonging to another user). For a likely
+incomplete list of hundreds of examples across the years, please see:
+http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+
+When set to "0", symlink following behavior is unrestricted.
+
+When set to "1" symlinks are permitted to be followed only when outside
+a sticky world-writable directory, or when the uid of the symlink and
+follower match, or when the directory owner matches the symlink's owner.
+
+This protection is based on the restrictions in Openwall and grsecurity.
+
+
+suid_dumpable:
+--------------
+
+This value can be used to query and set the core dump mode for setuid
+or otherwise protected/tainted binaries. The modes are
+
+=   ==========  ===============================================================
+0   (default)	traditional behaviour. Any process which has changed
+		privilege levels or is execute only will not be dumped.
+1   (debug)	all processes dump core when possible. The core dump is
+		owned by the current user and no security is applied. This is
+		intended for system debugging situations only.
+		Ptrace is unchecked.
+		This is insecure as it allows regular users to examine the
+		memory contents of privileged processes.
+2   (suidsafe)	any binary which normally would not be dumped is dumped
+		anyway, but only if the "core_pattern" kernel sysctl is set to
+		either a pipe handler or a fully qualified path. (For more
+		details on this limitation, see CVE-2006-2451.) This mode is
+		appropriate when administrators are attempting to debug
+		problems in a normal environment, and either have a core dump
+		pipe handler that knows to treat privileged core dumps with
+		care, or specific directory defined for catching core dumps.
+		If a core dump happens without a pipe handler or fully
+		qualified path, a message will be emitted to syslog warning
+		about the lack of a correct setting.
+=   ==========  ===============================================================
+
+
+super-max & super-nr
+--------------------
+
+These numbers control the maximum number of superblocks, and
+thus the maximum number of mounted filesystems the kernel
+can have. You only need to increase super-max if you need to
+mount more filesystems than the current value in super-max
+allows you to.
+
+
+aio-nr & aio-max-nr
+-------------------
+
+aio-nr shows the current system-wide number of asynchronous io
+requests.  aio-max-nr allows you to change the maximum value
+aio-nr can grow to.
+
+
+mount-max
+---------
+
+This denotes the maximum number of mounts that may exist
+in a mount namespace.
+
+
+
+2. /proc/sys/fs/binfmt_misc
+===========================
+
+Documentation for the files in /proc/sys/fs/binfmt_misc is
+in Documentation/admin-guide/binfmt-misc.rst.
+
+
+3. /proc/sys/fs/mqueue - POSIX message queues filesystem
+========================================================
+
+
+The "mqueue"  filesystem provides  the necessary kernel features to enable the
+creation of a  user space  library that  implements  the  POSIX message queues
+API (as noted by the  MSG tag in the  POSIX 1003.1-2001 version  of the System
+Interfaces specification.)
+
+The "mqueue" filesystem contains values for determining/setting  the amount of
+resources used by the file system.
+
+/proc/sys/fs/mqueue/queues_max is a read/write  file for  setting/getting  the
+maximum number of message queues allowed on the system.
+
+/proc/sys/fs/mqueue/msg_max  is  a  read/write file  for  setting/getting  the
+maximum number of messages in a queue value.  In fact it is the limiting value
+for another (user) limit which is set in mq_open invocation. This attribute of
+a queue must be less than or equal to msg_max.
+
+/proc/sys/fs/mqueue/msgsize_max is  a read/write  file for setting/getting the
+maximum  message size value (it is every  message queue's attribute set during
+its creation).
+
+/proc/sys/fs/mqueue/msg_default is  a read/write  file for setting/getting the
+default number of messages in a queue value if attr parameter of mq_open(2) is
+NULL. If it exceeds msg_max, the default value is initialized to msg_max.
+
+/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
+the default message size value if attr parameter of mq_open(2) is NULL. If it
+exceeds msgsize_max, the default value is initialized to msgsize_max.
+
+4. /proc/sys/fs/epoll - Configuration options for the epoll interface
+=====================================================================
+
+This directory contains configuration options for the epoll(7) interface.
+
+max_user_watches
+----------------
+
+Every epoll file descriptor can store a number of files to be monitored
+for event readiness. Each one of these monitored files constitutes a "watch".
+This configuration option sets the maximum number of "watches" that are
+allowed for each user.
+Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes
+on a 64bit one.
+The current default value for max_user_watches is 1/32 of the available
+low memory, divided by the "watch" cost in bytes.
diff --git a/Documentation/admin-guide/sysctl/index.rst b/Documentation/admin-guide/sysctl/index.rst
new file mode 100644
index 000000000000..03346f98c7b9
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/index.rst
@@ -0,0 +1,98 @@
+===========================
+Documentation for /proc/sys
+===========================
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+------------------------------------------------------------------------------
+
+'Why', I hear you ask, 'would anyone even _want_ documentation
+for them sysctl files? If anybody really needs it, it's all in
+the source...'
+
+Well, this documentation is written because some people either
+don't know they need to tweak something, or because they don't
+have the time or knowledge to read the source code.
+
+Furthermore, the programmers who built sysctl have built it to
+be actually used, not just for the fun of programming it :-)
+
+------------------------------------------------------------------------------
+
+Legal blurb:
+
+As usual, there are two main things to consider:
+
+1. you get what you pay for
+2. it's free
+
+The consequences are that I won't guarantee the correctness of
+this document, and if you come to me complaining about how you
+screwed up your system because of wrong documentation, I won't
+feel sorry for you. I might even laugh at you...
+
+But of course, if you _do_ manage to screw up your system using
+only the sysctl options used in this file, I'd like to hear of
+it. Not only to have a great laugh, but also to make sure that
+you're the last RTFMing person to screw up.
+
+In short, e-mail your suggestions, corrections and / or horror
+stories to: <riel@nl.linux.org>
+
+Rik van Riel.
+
+--------------------------------------------------------------
+
+Introduction
+============
+
+Sysctl is a means of configuring certain aspects of the kernel
+at run-time, and the /proc/sys/ directory is there so that you
+don't even need special tools to do it!
+In fact, there are only four things needed to use these config
+facilities:
+
+- a running Linux system
+- root access
+- common sense (this is especially hard to come by these days)
+- knowledge of what all those values mean
+
+As a quick 'ls /proc/sys' will show, the directory consists of
+several (arch-dependent?) subdirs. Each subdir is mainly about
+one part of the kernel, so you can do configuration on a piece
+by piece basis, or just some 'thematic frobbing'.
+
+This documentation is about:
+
+=============== ===============================================================
+abi/		execution domains & personalities
+debug/		<empty>
+dev/		device specific information (eg dev/cdrom/info)
+fs/		specific filesystems
+		filehandle, inode, dentry and quota tuning
+		binfmt_misc <Documentation/admin-guide/binfmt-misc.rst>
+kernel/		global kernel info / tuning
+		miscellaneous stuff
+net/		networking stuff, for documentation look in:
+		<Documentation/networking/>
+proc/		<empty>
+sunrpc/		SUN Remote Procedure Call (NFS)
+vm/		memory management tuning
+		buffer and cache management
+user/		per-user, per-user-namespace limits
+=============== ===============================================================
+
+These are the subdirs I have on my system. There might be more
+or other subdirs in another setup. If you see another dir, I'd
+really like to hear about it :-)
+
+.. toctree::
+   :maxdepth: 1
+
+   abi
+   fs
+   kernel
+   net
+   sunrpc
+   user
+   vm
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
new file mode 100644
index 000000000000..a0c1d4ce403a
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -0,0 +1,1177 @@
+===================================
+Documentation for /proc/sys/kernel/
+===================================
+
+kernel version 2.2.10
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2009,        Shen Feng <shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains documentation for the sysctl files in
+/proc/sys/kernel/ and is valid for Linux kernel version 2.2.
+
+The files in this directory can be used to tune and monitor
+miscellaneous and general things in the operation of the Linux
+kernel. Since some of the files _can_ be used to screw up your
+system, it is advisable to read both documentation and source
+before actually making adjustments.
+
+Currently, these files might (depending on your configuration)
+show up in /proc/sys/kernel:
+
+- acct
+- acpi_video_flags
+- auto_msgmni
+- bootloader_type	     [ X86 only ]
+- bootloader_version	     [ X86 only ]
+- cap_last_cap
+- core_pattern
+- core_pipe_limit
+- core_uses_pid
+- ctrl-alt-del
+- dmesg_restrict
+- domainname
+- hostname
+- hotplug
+- hardlockup_all_cpu_backtrace
+- hardlockup_panic
+- hung_task_panic
+- hung_task_check_count
+- hung_task_timeout_secs
+- hung_task_check_interval_secs
+- hung_task_warnings
+- hyperv_record_panic_msg
+- kexec_load_disabled
+- kptr_restrict
+- l2cr                        [ PPC only ]
+- modprobe                    ==> Documentation/debugging-modules.txt
+- modules_disabled
+- msg_next_id		      [ sysv ipc ]
+- msgmax
+- msgmnb
+- msgmni
+- nmi_watchdog
+- osrelease
+- ostype
+- overflowgid
+- overflowuid
+- panic
+- panic_on_oops
+- panic_on_stackoverflow
+- panic_on_unrecovered_nmi
+- panic_on_warn
+- panic_print
+- panic_on_rcu_stall
+- perf_cpu_time_max_percent
+- perf_event_paranoid
+- perf_event_max_stack
+- perf_event_mlock_kb
+- perf_event_max_contexts_per_stack
+- pid_max
+- powersave-nap               [ PPC only ]
+- printk
+- printk_delay
+- printk_ratelimit
+- printk_ratelimit_burst
+- pty                         ==> Documentation/filesystems/devpts.txt
+- randomize_va_space
+- real-root-dev               ==> Documentation/admin-guide/initrd.rst
+- reboot-cmd                  [ SPARC only ]
+- rtsig-max
+- rtsig-nr
+- sched_energy_aware
+- seccomp/                    ==> Documentation/userspace-api/seccomp_filter.rst
+- sem
+- sem_next_id		      [ sysv ipc ]
+- sg-big-buff                 [ generic SCSI device (sg) ]
+- shm_next_id		      [ sysv ipc ]
+- shm_rmid_forced
+- shmall
+- shmmax                      [ sysv ipc ]
+- shmmni
+- softlockup_all_cpu_backtrace
+- soft_watchdog
+- stack_erasing
+- stop-a                      [ SPARC only ]
+- sysrq                       ==> Documentation/admin-guide/sysrq.rst
+- sysctl_writes_strict
+- tainted                     ==> Documentation/admin-guide/tainted-kernels.rst
+- threads-max
+- unknown_nmi_panic
+- watchdog
+- watchdog_thresh
+- version
+
+
+acct:
+=====
+
+highwater lowwater frequency
+
+If BSD-style process accounting is enabled, these values control
+its behaviour. If free space on the filesystem where the log lives
+goes below <lowwater>%, accounting suspends. If free space rises
+above <highwater>%, accounting resumes. <frequency> determines
+how often the amount of free space is checked (value is in
+seconds). Default::
+
+	4 2 30
+
+That is, suspend accounting if <= 2% of the space is free; resume it
+once >= 4% is free; and consider information about the amount of free
+space valid for 30 seconds.
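+
+As a hedged illustration (the values below are arbitrary examples, not
+recommendations), the thresholds can be inspected and changed in the
+same way as any other sysctl file::
+
+	# cat /proc/sys/kernel/acct
+	# echo "10 5 60" > /proc/sys/kernel/acct
+
+With these example values, accounting would suspend at <= 5% free
+space, resume at >= 10%, and recheck free space every 60 seconds.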
+
+
+acpi_video_flags:
+=================
+
+flags
+
+See Doc*/kernel/power/video.txt. This file allows the video boot mode
+to be set at run time.
+
+
+auto_msgmni:
+============
+
+This variable has no effect and may be removed in future kernel
+releases. Reading it always returns 0.
+Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni
+upon memory add/remove or upon ipc namespace creation/removal.
+Echoing "1" into this file enabled msgmni automatic recomputing.
+Echoing "0" turned it off. auto_msgmni default value was 1.
+
+
+bootloader_type:
+================
+
+x86 bootloader identification
+
+This gives the bootloader type number as indicated by the bootloader,
+shifted left by 4, and OR'd with the low four bits of the bootloader
+version.  The reason for this encoding is that this used to match the
+type_of_loader field in the kernel header; the encoding is kept for
+backwards compatibility.  That is, if the full bootloader type number
+is 0x15 and the full version number is 0x234, this file will contain
+the value 340 = 0x154.
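+
+The arithmetic of this example can be checked from a shell. This is
+only a sketch of the encoding described above, using the example
+numbers from the text and assuming a shell with POSIX arithmetic
+expansion::
+
+	$ printf '%d\n' $(( (0x15 << 4) | (0x234 & 0xf) ))
+	340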
+
+See the type_of_loader and ext_loader_type fields in
+Documentation/x86/boot.rst for additional information.
+
+
+bootloader_version:
+===================
+
+x86 bootloader version
+
+The complete bootloader version number.  In the example above, this
+file will contain the value 564 = 0x234.
+
+See the type_of_loader and ext_loader_ver fields in
+Documentation/x86/boot.rst for additional information.
+
+
+cap_last_cap:
+=============
+
+Highest valid capability of the running kernel.  Exports
+CAP_LAST_CAP from the kernel.
+
+
+core_pattern:
+=============
+
+core_pattern is used to specify a core dumpfile pattern name.
+
+* max length 127 characters; default value is "core"
+* core_pattern is used as a pattern template for the output filename;
+  certain string patterns (beginning with '%') are substituted with
+  their actual values.
+* backward compatibility with core_uses_pid:
+
+	If core_pattern does not include "%p" (default does not)
+	and core_uses_pid is set, then .PID will be appended to
+	the filename.
+
+* corename format specifiers::
+
+	%<NUL>	'%' is dropped
+	%%	output one '%'
+	%p	pid
+	%P	global pid (init PID namespace)
+	%i	tid
+	%I	global tid (init PID namespace)
+	%u	uid (in initial user namespace)
+	%g	gid (in initial user namespace)
+	%d	dump mode, matches PR_SET_DUMPABLE and
+		/proc/sys/fs/suid_dumpable
+	%s	signal number
+	%t	UNIX time of dump
+	%h	hostname
+	%e	executable filename (may be shortened)
+	%E	executable path
+	%<OTHER> both are dropped
+
+* If the first character of the pattern is a '|', the kernel will treat
+  the rest of the pattern as a command to run.  The core dump will be
+  written to the standard input of that program instead of to a file.
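+
+A minimal sketch of both forms follows. The directory /var/crash and
+the helper /usr/local/sbin/core-helper are hypothetical names chosen
+for illustration; neither is provided by the kernel::
+
+	# echo "/var/crash/core.%e.%p.%t" > /proc/sys/kernel/core_pattern
+	# echo "|/usr/local/sbin/core-helper %p %s" > /proc/sys/kernel/core_pattern
+
+With the first pattern, a crash of an executable named "foo" with
+pid 1234 at UNIX time 1500000000 would be written to
+/var/crash/core.foo.1234.1500000000.  With the second, the dump is
+piped to the helper, which receives the pid and signal number as
+arguments.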
+
+
+core_pipe_limit:
+================
+
+This sysctl is only applicable when core_pattern is configured to pipe
+core files to a user space helper (when the first character of
+core_pattern is a '|', see above).  When collecting cores via a pipe
+to an application, it is occasionally useful for the collecting
+application to gather data about the crashing process from its
+/proc/pid directory.  In order to do this safely, the kernel must wait
+for the collecting process to exit, so as not to remove the crashing
+process's proc files prematurely.  This in turn creates the
+possibility that a misbehaving userspace collecting process can block
+the reaping of a crashed process simply by never exiting.  This sysctl
+defends against that.  It defines how many concurrent crashing
+processes may be piped to user space applications in parallel.  If
+this value is exceeded, then those crashing processes above that value
+are noted via the kernel log and their cores are skipped.  0 is a
+special value, indicating that unlimited processes may be captured in
+parallel, but that no waiting will take place (i.e. the collecting
+process is not guaranteed access to /proc/<crashing pid>/).  This
+value defaults to 0.
+
+
+core_uses_pid:
+==============
+
+The default coredump filename is "core".  By setting
+core_uses_pid to 1, the coredump filename becomes core.PID.
+If core_pattern does not include "%p" (default does not)
+and core_uses_pid is set, then .PID will be appended to
+the filename.
+
+
+ctrl-alt-del:
+=============
+
+When the value in this file is 0, ctrl-alt-del is trapped and
+sent to the init(1) program to handle a graceful restart.
+When, however, the value is > 0, Linux's reaction to a Vulcan
+Nerve Pinch (tm) will be an immediate reboot, without even
+syncing its dirty buffers.
+
+Note:
+  when a program (like dosemu) has the keyboard in 'raw'
+  mode, the ctrl-alt-del is intercepted by the program before it
+  ever reaches the kernel tty layer, and it's up to the program
+  to decide what to do with it.
+
+
+dmesg_restrict:
+===============
+
+This toggle indicates whether unprivileged users are prevented
+from using dmesg(8) to view messages from the kernel's log buffer.
+When dmesg_restrict is set to (0) there are no restrictions. When
+dmesg_restrict is set to (1), users must have CAP_SYSLOG to use
+dmesg(8).
+
+The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
+default value of dmesg_restrict.
+
+
+domainname & hostname:
+======================
+
+These files can be used to set the NIS/YP domainname and the
+hostname of your box in exactly the same way as the commands
+domainname and hostname, i.e.::
+
+	# echo "darkstar" > /proc/sys/kernel/hostname
+	# echo "mydomain" > /proc/sys/kernel/domainname
+
+has the same effect as::
+
+	# hostname "darkstar"
+	# domainname "mydomain"
+
+Note, however, that the classic darkstar.frop.org has the
+hostname "darkstar" and DNS (Domain Name System)
+domainname "frop.org", not to be confused with the NIS (Network
+Information Service) or YP (Yellow Pages) domainname. These two
+domain names are in general different. For a detailed discussion
+see the hostname(1) man page.
+
+
+hardlockup_all_cpu_backtrace:
+=============================
+
+This value controls the hard lockup detector behavior when a hard
+lockup condition is detected as to whether or not to gather further
+debug information. If enabled, arch-specific all-CPU stack dumping
+will be initiated.
+
+0: do nothing. This is the default behavior.
+
+1: on detection capture more debug information.
+
+
+hardlockup_panic:
+=================
+
+This parameter can be used to control whether the kernel panics
+when a hard lockup is detected.
+
+   0 - don't panic on hard lockup
+   1 - panic on hard lockup
+
+See Documentation/lockup-watchdogs.txt for more information.  This can
+also be set using the nmi_watchdog kernel parameter.
+
+
+hotplug:
+========
+
+Path for the hotplug policy agent.
+Default value is "/sbin/hotplug".
+
+
+hung_task_panic:
+================
+
+Controls the kernel's behavior when a hung task is detected.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+0: continue operation. This is the default behavior.
+
+1: panic immediately.
+
+
+hung_task_check_count:
+======================
+
+The upper bound on the number of tasks that are checked.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+
+hung_task_timeout_secs:
+=======================
+
+When a task in D state has not been scheduled for more than
+this many seconds, a warning is reported.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+0: means infinite timeout - no checking done.
+
+Possible values to set are in range {0..LONG_MAX/HZ}.
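+
+For example, assuming CONFIG_DETECT_HUNG_TASK is enabled, the timeout
+could be raised to an arbitrary illustrative value of 300 seconds, or
+checking could be turned off by writing 0::
+
+	# echo 300 > /proc/sys/kernel/hung_task_timeout_secs
+	# echo 0 > /proc/sys/kernel/hung_task_timeout_secs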
+
+
+hung_task_check_interval_secs:
+==============================
+
+Hung task check interval. If hung task checking is enabled
+(see hung_task_timeout_secs), the check is done every
+hung_task_check_interval_secs seconds.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+0 (default): means use hung_task_timeout_secs as checking interval.
+Possible values to set are in range {0..LONG_MAX/HZ}.
+
+
+hung_task_warnings:
+===================
+
+The maximum number of warnings to report. During a check interval
+if a hung task is detected, this value is decreased by 1.
+When this value reaches 0, no more warnings will be reported.
+This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
+
+-1: report an infinite number of warnings.
+
+
+hyperv_record_panic_msg:
+========================
+
+Controls whether the panic kmsg data should be reported to Hyper-V.
+
+0: do not report panic kmsg data.
+
+1: report the panic kmsg data. This is the default behavior.
+
+
+kexec_load_disabled:
+====================
+
+A toggle indicating if the kexec_load syscall has been disabled. This
+value defaults to 0 (false: kexec_load enabled), but can be set to 1
+(true: kexec_load disabled). Once true, kexec can no longer be used, and
+the toggle cannot be set back to false. This allows a kexec image to be
+loaded before disabling the syscall, allowing a system to set up (and
+later use) an image without it being altered. Generally used together
+with the "modules_disabled" sysctl.
+
+
+kptr_restrict:
+==============
+
+This toggle indicates whether restrictions are placed on
+exposing kernel addresses via /proc and other interfaces.
+
+When kptr_restrict is set to 0 (the default) the address is hashed before
+printing. (This is equivalent to %p.)
+
+When kptr_restrict is set to (1), kernel pointers printed using the %pK
+format specifier will be replaced with 0's unless the user has CAP_SYSLOG
+and effective user and group ids are equal to the real ids. This is
+because %pK checks are done at read() time rather than open() time, so
+if permissions are elevated between the open() and the read() (e.g. via
+a setuid binary) then %pK will not leak kernel pointers to unprivileged
+users. Note, this is a temporary solution only. The correct long-term
+solution is to do the permission checks at open() time. Consider removing
+world read permissions from files that use %pK, and using dmesg_restrict
+to protect against uses of %pK in dmesg(8) if leaking kernel pointer
+values to unprivileged users is a concern.
+
+When kptr_restrict is set to (2), kernel pointers printed using
+%pK will be replaced with 0's regardless of privileges.
+
+
+l2cr: (PPC only)
+================
+
+This flag controls the L2 cache of G3 processor boards. If
+0, the cache is disabled. Enabled if nonzero.
+
+
+modules_disabled:
+=================
+
+A toggle value indicating if modules are allowed to be loaded
+in an otherwise modular kernel.  This toggle defaults to off
+(0), but can be set true (1).  Once true, modules can be
+neither loaded nor unloaded, and the toggle cannot be set back
+to false.  Generally used with the "kexec_load_disabled" toggle.
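+
+A sketch of the one-way behaviour: once the first write below has
+succeeded, the second is expected to be rejected and the value stays
+at 1::
+
+	# echo 1 > /proc/sys/kernel/modules_disabled
+	# echo 0 > /proc/sys/kernel/modules_disabled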
+
+
+msg_next_id, sem_next_id, and shm_next_id:
+==========================================
+
+These three toggles allow the desired id to be specified for the next
+allocated IPC object: message, semaphore or shared memory respectively.
+
+By default they are equal to -1, which means generic allocation logic.
+Possible values to set are in range {0..INT_MAX}.
+
+Notes:
+  1) The kernel doesn't guarantee that the new object will have the
+     desired id. It is up to userspace to decide how to handle an
+     object with the "wrong" id.
+  2) A toggle with a non-default value will be set back to -1 by the
+     kernel after a successful IPC object allocation. If an IPC object
+     allocation syscall fails, it is undefined whether the value remains
+     unmodified or is reset to -1.
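+
+A hedged sketch of requesting a specific id for the next message
+queue.  The id 1000 is an arbitrary example, and ipcmk(1) from
+util-linux is assumed to be installed; as noted above, the kernel may
+still hand out a different id::
+
+	# echo 1000 > /proc/sys/kernel/msg_next_id
+	# ipcmk -Q
+	# cat /proc/sys/kernel/msg_next_id
+
+After a successful allocation, the last command is expected to show
+that the toggle has been reset to -1.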
+
+
+nmi_watchdog:
+=============
+
+This parameter can be used to control the NMI watchdog
+(i.e. the hard lockup detector) on x86 systems.
+
+0 - disable the hard lockup detector
+
+1 - enable the hard lockup detector
+
+The hard lockup detector monitors each CPU for its ability to respond to
+timer interrupts. The mechanism utilizes CPU performance counter registers
+that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
+while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
+
+The NMI watchdog is disabled by default if the kernel is running as a guest
+in a KVM virtual machine. This default can be overridden by adding::
+
+   nmi_watchdog=1
+
+to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst).
+
+
+numa_balancing:
+===============
+
+Enables/disables automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes
+that access it often.
+
+On NUMA machines, there is a performance penalty if remote memory is
+accessed by a CPU. When this feature is enabled, the kernel samples which
+task thread is accessing memory by periodically unmapping pages and later
+trapping a page fault. At the time of the page fault, it is determined
+whether the data being accessed should be migrated to a local memory node.
+
+The unmapping of pages and trapping faults incur additional overhead that
+ideally is offset by improved memory locality but there is no universal
+guarantee. If the target workload is already bound to NUMA nodes then this
+feature should be disabled. Otherwise, if the system overhead from the
+feature is too high then the rate the kernel samples for NUMA hinting
+faults may be controlled by the numa_balancing_scan_period_min_ms,
+numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
+numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls.
+
+numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
+===============================================================================================================================
+
+
+Automatic NUMA balancing scans a task's address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running.  Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases.  The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases.  The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a task's memory is migrated to a local node if the
+workload pattern changes, which minimises the performance impact of remote
+memory accesses. These sysctls control the thresholds for scan delays and
+the number of pages scanned.
+
+numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
+scan a task's virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
+when it initially forks.
+
+numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
+scan a task's virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+numa_balancing_scan_size_mb is how many megabytes worth of pages are
+scanned for a given scan.
+
+
+osrelease, ostype & version:
+============================
+
+::
+
+  # cat osrelease
+  2.1.88
+  # cat ostype
+  Linux
+  # cat version
+  #5 Wed Feb 25 21:49:24 MET 1998
+
+The files osrelease and ostype should be clear enough. Version
+needs a little more clarification however. The '#5' means that
+this is the fifth kernel built from this source base and the
+date behind it indicates the time the kernel was built.
+The only way to tune these values is to rebuild the kernel :-)
+
+
+overflowgid & overflowuid:
+==========================
+
+If your architecture did not always support 32-bit UIDs (i.e. arm,
+i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
+applications that use the old 16-bit UID/GID system calls, if the
+actual UID or GID would exceed 65535.
+
+These sysctls allow you to change the value of the fixed UID and GID.
+The default is 65534.
+
+
+panic:
+======
+
+The value in this file represents the number of seconds the kernel
+waits before rebooting on a panic. When you use the software watchdog,
+the recommended setting is 60.
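+
+For example, the setting recommended above could be applied at run
+time as follows (a sketch; choose a delay that suits your system)::
+
+	# echo 60 > /proc/sys/kernel/panic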
+
+
+panic_on_io_nmi:
+================
+
+Controls the kernel's behavior when a CPU receives an NMI caused by
+an IO error.
+
+0: try to continue operation (default)
+
+1: panic immediately. The IO error triggered an NMI. This indicates a
+   serious system condition which could result in IO data corruption.
+   Rather than continuing, panicking might be a better choice. Some
+   servers issue this sort of NMI when the dump button is pushed,
+   and you can use this option to take a crash dump.
+
+
+panic_on_oops:
+==============
+
+Controls the kernel's behaviour when an oops or BUG is encountered.
+
+0: try to continue operation
+
+1: panic immediately.  If the `panic` sysctl is also non-zero then the
+   machine will be rebooted.
+
+
+panic_on_stackoverflow:
+=======================
+
+Controls the kernel's behavior when an overflow of the kernel, IRQ or
+exception stacks (but not of a user stack) is detected.
+This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
+
+0: try to continue operation.
+
+1: panic immediately.
+
+
+panic_on_unrecovered_nmi:
+=========================
+
+The default Linux behaviour on an NMI caused by a memory error or by an
+unknown source is to continue operation. For many environments, such as
+scientific computing, it is preferable that the box is taken out and the
+error dealt with rather than letting an uncorrected parity/ECC error
+propagate.
+
+A small number of systems do generate NMIs for bizarre random reasons
+such as power management, so the default is off. This sysctl works like
+the existing panic controls already in this directory.
+
+
+panic_on_warn:
+==============
+
+Calls panic() in the WARN() path when set to 1.  This is useful to avoid
+a kernel rebuild when attempting to kdump at the location of a WARN().
+
+0: only WARN(), default behaviour.
+
+1: call panic() after printing out WARN() location.
+
+
+panic_print:
+============
+
+Bitmask for printing system info when a panic happens. The user can choose
+a combination of the following bits:
+
+=====  ========================================
+bit 0  print all tasks info
+bit 1  print system memory info
+bit 2  print timer info
+bit 3  print locks info if CONFIG_LOCKDEP is on
+bit 4  print ftrace buffer
+=====  ========================================
+
+So for example, to print tasks and memory info on panic, the user can run::
+
+  echo 3 > /proc/sys/kernel/panic_print
+
+
+panic_on_rcu_stall:
+===================
+
+When set to 1, calls panic() after RCU stall detection messages. This
+is useful to define the root cause of RCU stalls using a vmcore.
+
+0: do not panic() when RCU stall takes place, default behavior.
+
+1: panic() after printing RCU stall messages.
+
+
+perf_cpu_time_max_percent:
+==========================
+
+Hints to the kernel how much CPU time it should be allowed to
+use to handle perf sampling events.  If the perf subsystem
+is informed that its samples are exceeding this limit, it
+will drop its sampling frequency to attempt to reduce its CPU
+usage.
+
+Some perf sampling happens in NMIs.  If these samples
+unexpectedly take too long to execute, the NMIs can become
+stacked up next to each other so much that nothing else is
+allowed to execute.
+
+0:
+   disable the mechanism.  Do not monitor or correct perf's
+   sampling rate no matter how much CPU time it takes.
+
+1-100:
+   attempt to throttle perf's sample rate to this
+   percentage of CPU.  Note: the kernel calculates an
+   "expected" length of each sample event.  100 here means
+   100% of that expected length.  Even if this is set to
+   100, you may still see sample throttling if this
+   length is exceeded.  Set to 0 if you truly do not care
+   how much CPU is consumed.
+
+
+perf_event_paranoid:
+====================
+
+Controls use of the performance events system by unprivileged
+users (without CAP_SYS_ADMIN).  The default value is 2.
+
+===  ==================================================================
+ -1  Allow use of (almost) all events by all users
+
+     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
+
+>=0  Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
+
+     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
+
+>=1  Disallow CPU event access by users without CAP_SYS_ADMIN
+
+>=2  Disallow kernel profiling by users without CAP_SYS_ADMIN
+===  ==================================================================
+
+
+perf_event_max_stack:
+=====================
+
+Controls maximum number of stack frames to copy for (attr.sample_type &
+PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
+'perf record -g' or 'perf trace --call-graph fp'.
+
+This can only be done when no events are in use that have callchains
+enabled, otherwise writing to this file will return -EBUSY.
+
+The default value is 127.
+
+
+perf_event_mlock_kb:
+====================
+
+Controls the size of the per-cpu ring buffer not counted against the mlock limit.
+
+The default value is 512 + 1 page.
+
+
+perf_event_max_contexts_per_stack:
+==================================
+
+Controls maximum number of stack frame context entries for
+(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
+instance, when using 'perf record -g' or 'perf trace --call-graph fp'.
+
+This can only be done when no events are in use that have callchains
+enabled, otherwise writing to this file will return -EBUSY.
+
+The default value is 8.
+
+
+pid_max:
+========
+
+PID allocation wrap value.  When the kernel's next PID value
+reaches this value, it wraps back to a minimum PID value.
+PIDs of value pid_max or larger are not allocated.
+
+
+ns_last_pid:
+============
+
+The last pid allocated in the current pid namespace (the one in which the
+task using this sysctl lives). When selecting a pid for the next task on
+fork, the kernel tries to allocate a number starting from this one.
+
+
+powersave-nap: (PPC only)
+=========================
+
+If set, Linux-PPC will use the 'nap' mode of powersaving,
+otherwise the 'doze' mode will be used.
+
+
+printk:
+=======
+
+The four values in printk denote: console_loglevel,
+default_message_loglevel, minimum_console_loglevel and
+default_console_loglevel respectively.
+
+These values influence printk() behavior when printing or
+logging error messages. See 'man 2 syslog' for more info on
+the different loglevels.
+
+- console_loglevel:
+	messages with a higher priority than
+	this will be printed to the console
+- default_message_loglevel:
+	messages without an explicit priority
+	will be printed with this priority
+- minimum_console_loglevel:
+	minimum (highest) value to which
+	console_loglevel can be set
+- default_console_loglevel:
+	default value for console_loglevel
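+
+A hedged example of reading and setting the four values.  The numbers
+shown are illustrative; actual defaults depend on the kernel
+configuration::
+
+	# cat /proc/sys/kernel/printk
+	# echo "7 4 1 7" > /proc/sys/kernel/printk
+
+The second command writes the four values in the order listed above,
+so console_loglevel becomes 7 and default_message_loglevel becomes 4.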
+
+
+printk_delay:
+=============
+
+Delay each printk message by printk_delay milliseconds.
+
+Values from 0 to 10000 are allowed.
+
+
+printk_ratelimit:
+=================
+
+Some warning messages are rate limited. printk_ratelimit specifies
+the minimum length of time between these messages (in seconds). By
+default, one message is allowed every 5 seconds.
+
+A value of 0 will disable rate limiting.
+
+
+printk_ratelimit_burst:
+=======================
+
+While long term we enforce one message per printk_ratelimit
+seconds, we do allow a burst of messages to pass through.
+printk_ratelimit_burst specifies the number of messages we can
+send before ratelimiting kicks in.
+
+
+printk_devkmsg:
+===============
+
+Control the logging to /dev/kmsg from userspace:
+
+ratelimit:
+	default, ratelimited
+
+on:
+	unlimited logging to /dev/kmsg from userspace
+
+off:
+	logging to /dev/kmsg disabled
+
+The kernel command line parameter printk.devkmsg= overrides this and is
+a one-time setting until next reboot: once set, it cannot be changed by
+this sysctl interface anymore.
+
+
+randomize_va_space:
+===================
+
+This option can be used to select the type of process address
+space randomization that is used in the system, for architectures
+that support this feature.
+
+==  ===========================================================================
+0   Turn the process address space randomization off.  This is the
+    default for architectures that do not support this feature anyway,
+    and kernels that are booted with the "norandmaps" parameter.
+
+1   Make the addresses of mmap base, stack and VDSO page randomized.
+    This, among other things, implies that shared libraries will be
+    loaded to random addresses.  Also for PIE-linked binaries, the
+    location of code start is randomized.  This is the default if the
+    CONFIG_COMPAT_BRK option is enabled.
+
+2   Additionally enable heap randomization.  This is the default if
+    CONFIG_COMPAT_BRK is disabled.
+
+    There are a few legacy applications out there (such as some ancient
+    versions of libc.so.5 from 1996) that assume that brk area starts
+    just after the end of the code+bss.  These applications break when
+    start of the brk area is randomized.  There are however no known
+    non-legacy applications that would be broken this way, so for most
+    systems it is safe to choose full randomization.
+
+    Systems with ancient and/or broken binaries should be configured
+    with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
+    address space randomization.
+==  ===========================================================================
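+
+As an illustration, the current mode can be read and full
+randomization selected as follows (a sketch; whether a given value is
+appropriate depends on the binaries in use, as described above)::
+
+	# cat /proc/sys/kernel/randomize_va_space
+	# echo 2 > /proc/sys/kernel/randomize_va_space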
+
+
+reboot-cmd: (Sparc only)
+========================
+
+??? This seems to be a way to give an argument to the Sparc
+ROM/Flash boot loader. Maybe to tell it what to do after
+rebooting. ???
+
+
+rtsig-max & rtsig-nr:
+=====================
+
+The file rtsig-max can be used to tune the maximum number
+of POSIX realtime (queued) signals that can be outstanding
+in the system.
+
+rtsig-nr shows the number of RT signals currently queued.
+
+
+sched_energy_aware:
+===================
+
+Enables/disables Energy Aware Scheduling (EAS). EAS starts
+automatically on platforms where it can run (that is,
+platforms with asymmetric CPU topologies and having an Energy
+Model available). If your platform happens to meet the
+requirements for EAS but you do not want to use it, change
+this value to 0.
+
+
+sched_schedstats:
+=================
+
+Enables/disables scheduler statistics. Enabling this feature
+incurs a small amount of overhead in the scheduler but is
+useful for debugging and performance tuning.
+
+
+sg-big-buff:
+============
+
+This file shows the size of the generic SCSI (sg) buffer.
+You can't tune it just yet, but you could change it at
+compile time by editing include/scsi/sg.h and changing
+the value of SG_BIG_BUFF.
+
+There shouldn't be any reason to change this value. If
+you can come up with one, you probably know what you
+are doing anyway :)
+
+
+shmall:
+=======
+
+This parameter sets the total number of shared memory pages that
+can be used system wide. Hence, SHMALL should always be at least
+ceil(shmmax/PAGE_SIZE).
+
+If you are not sure what the default PAGE_SIZE is on your Linux
+system, you can run the following command::
+
+	# getconf PAGE_SIZE
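+
+The lower bound can be worked out with shell arithmetic.  The sketch
+below uses illustrative values only (a shmmax of 68719476736 bytes and
+a PAGE_SIZE of 4096) and rounds up by adding PAGE_SIZE - 1 before
+dividing::
+
+	$ echo $(( (68719476736 + 4096 - 1) / 4096 ))
+	16777216
+
+With these example numbers, shmall should be at least 16777216 pages.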
+
+
+shmmax:
+=======
+
+This value can be used to query and set the run time limit
+on the maximum shared memory segment size that can be created.
+Shared memory segments up to 1Gb are now supported in the
+kernel.  This value defaults to SHMMAX.
+
+
+shm_rmid_forced:
+================
+
+Linux lets you set resource limits, including how much memory one
+process can consume, via setrlimit(2).  Unfortunately, shared memory
+segments are allowed to exist without association with any process, and
+thus might not be counted against any resource limits.  If enabled,
+shared memory segments are automatically destroyed when their attach
+count becomes zero after a detach or a process termination.  It will
+also destroy segments that were created, but never attached to, on exit
+from the process.  The only use left for IPC_RMID is to immediately
+destroy an unattached segment.  Of course, this breaks the way things are
+defined, so some applications might stop working.  Note that this
+feature will do you no good unless you also configure your resource
+limits (in particular, RLIMIT_AS and RLIMIT_NPROC).  Most systems don't
+need this.
+
+Note that if you change this from 0 to 1, already created segments
+without users and whose creating process is dead will be destroyed.
+
+
+sysctl_writes_strict:
+=====================
+
+Control how file position affects the behavior of updating sysctl values
+via the /proc/sys interface:
+
+  ==   ======================================================================
+  -1   Legacy per-write sysctl value handling, with no printk warnings.
+       Each write syscall must fully contain the sysctl value to be
+       written, and multiple writes on the same sysctl file descriptor
+       will rewrite the sysctl value, regardless of file position.
+   0   Same behavior as above, but warn about processes that perform writes
+       to a sysctl file descriptor when the file position is not 0.
+   1   (default) Respect file position when writing sysctl strings. Multiple
+       writes will append to the sysctl value buffer. Anything past the max
+       length of the sysctl value buffer will be ignored. Writes to numeric
+       sysctl entries must always be at file position 0 and the value must
+       be fully contained in the buffer sent in the write syscall.
+  ==   ======================================================================
+
+
+softlockup_all_cpu_backtrace:
+=============================
+
+This value controls the soft lockup detector thread's behavior
+when a soft lockup condition is detected as to whether or not
+to gather further debug information. If enabled, each CPU will
+be issued an NMI and instructed to capture a stack trace.
+
+This feature is only applicable for architectures which support
+NMI.
+
+0: do nothing. This is the default behavior.
+
+1: on detection capture more debug information.
+
+
+soft_watchdog:
+==============
+
+This parameter can be used to control the soft lockup detector.
+
+   0 - disable the soft lockup detector
+
+   1 - enable the soft lockup detector
+
+The soft lockup detector monitors CPUs for threads that are hogging the CPUs
+without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
+from running. The mechanism depends on the CPU's ability to respond to timer
+interrupts, which are needed for the 'watchdog/N' threads to be woken up by
+the watchdog timer function. If the CPU cannot respond to timer interrupts,
+the NMI watchdog - if enabled - can detect a hard lockup condition.
+
+
+stack_erasing:
+==============
+
+This parameter can be used to control kernel stack erasing at the end
+of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.
+
+That erasing reduces the information which kernel stack leak bugs
+can reveal and blocks some uninitialized stack variable attacks.
+The tradeoff is the performance impact: on a single-CPU system, kernel
+compilation sees a 1% slowdown; other systems and workloads may vary.
+
+  0: kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
+
+  1: kernel stack erasing is enabled (default), it is performed before
+     returning to the userspace at the end of syscalls.
+
+
+tainted
+=======
+
+Non-zero if the kernel has been tainted. The value is a bitmask of the
+numeric values below, ORed together. The letters are seen in the "Tainted"
+line of Oops reports.
+
+======  =====  ==============================================================
+     1  `(P)`  proprietary module was loaded
+     2  `(F)`  module was force loaded
+     4  `(S)`  SMP kernel oops on an officially SMP incapable processor
+     8  `(R)`  module was force unloaded
+    16  `(M)`  processor reported a Machine Check Exception (MCE)
+    32  `(B)`  bad page referenced or some unexpected page flags
+    64  `(U)`  taint requested by userspace application
+   128  `(D)`  kernel died recently, i.e. there was an OOPS or BUG
+   256  `(A)`  an ACPI table was overridden by user
+   512  `(W)`  kernel issued warning
+  1024  `(C)`  staging driver was loaded
+  2048  `(I)`  workaround for bug in platform firmware applied
+  4096  `(O)`  externally-built ("out-of-tree") module was loaded
+  8192  `(E)`  unsigned module was loaded
+ 16384  `(L)`  soft lockup occurred
+ 32768  `(K)`  kernel has been live patched
+ 65536  `(X)`  Auxiliary taint, defined for and used by distros
+131072  `(T)`  The kernel was built with the struct randomization plugin
+======  =====  ==============================================================
+
+See Documentation/admin-guide/tainted-kernels.rst for more information.
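+
+As a rough illustration (a sketch, not part of the kernel interface), the
+individual bits of the value can be tested with shell arithmetic. The bit
+values are taken from the table above; the variable name t is arbitrary::
+
+  t=$(cat /proc/sys/kernel/tainted)
+  # 4096 is (O): an out-of-tree module was loaded
+  [ $((t & 4096)) -ne 0 ] && echo "out-of-tree module loaded"
+  # 512 is (W): the kernel issued a warning
+  [ $((t & 512)) -ne 0 ] && echo "kernel issued a warning"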
+
+
+threads-max:
+============
+
+This value controls the maximum number of threads that can be created
+using fork().
+
+During initialization the kernel sets this value such that even if the
+maximum number of threads is created, the thread structures occupy only
+a part (1/8th) of the available RAM pages.
+
+The minimum value that can be written to threads-max is 20.
+
+The maximum value that can be written to threads-max is given by the
+constant FUTEX_TID_MASK (0x3fffffff).
+
+If a value outside of this range is written to threads-max an error
+EINVAL occurs.
+
+The value written is checked against the available RAM pages. If the
+thread structures would occupy too much (more than 1/8th) of the
+available RAM pages threads-max is reduced accordingly.
+
+
+unknown_nmi_panic:
+==================
+
+The value in this file affects behavior of handling NMI. When the
+value is non-zero, unknown NMI is trapped and then panic occurs. At
+that time, kernel debugging information is displayed on console.
+
+For example, the NMI switch found on most IA32 servers raises an unknown
+NMI.  If a system hangs, try pressing the NMI switch.
+
+
+watchdog:
+=========
+
+This parameter can be used to disable or enable the soft lockup detector
+_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
+
+   0 - disable both lockup detectors
+
+   1 - enable both lockup detectors
+
+The soft lockup detector and the NMI watchdog can also be disabled or
+enabled individually, using the soft_watchdog and nmi_watchdog parameters.
+If the watchdog parameter is read, for example by executing::
+
+   cat /proc/sys/kernel/watchdog
+
+the output of this command (0 or 1) shows the logical OR of soft_watchdog
+and nmi_watchdog.
+
+
+watchdog_cpumask:
+=================
+
+This value can be used to control on which cpus the watchdog may run.
+The default cpumask is all possible cores, but if NO_HZ_FULL is
+enabled in the kernel config, and cores are specified with the
+nohz_full= boot argument, those cores are excluded by default.
+Offline cores can be included in this mask, and if the core is later
+brought online, the watchdog will be started based on the mask value.
+
+Typically this value would only be touched in the nohz_full case
+to re-enable cores that by default were not running the watchdog,
+if a kernel lockup was suspected on those cores.
+
+The argument value is the standard cpulist format for cpumasks,
+so for example to enable the watchdog on cores 0, 2, 3, and 4 you
+might say::
+
+  echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
+
+
+watchdog_thresh:
+================
+
+This value can be used to control the frequency of hrtimer and NMI
+events and the soft and hard lockup thresholds. The default threshold
+is 10 seconds.
+
+The softlockup threshold is (2 * watchdog_thresh). Setting this
+tunable to zero will disable lockup detection altogether.
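+
+For example (a sketch assuming root privileges and that the default of 10
+is currently in effect), raising the value to 20 would, by the formula
+above, move the softlockup threshold from 20 to 40 seconds::
+
+  echo 20 > /proc/sys/kernel/watchdog_thresh
+  cat /proc/sys/kernel/watchdog_thresh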
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
new file mode 100644
index 000000000000..a7d44e71019d
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -0,0 +1,461 @@
+================================
+Documentation for /proc/sys/net/
+================================
+
+Copyright
+
+Copyright (c) 1999
+
+	- Terrehon Bowden <terrehon@pacbell.net>
+	- Bodo Bauer <bb@ricochet.net>
+
+Copyright (c) 2000
+
+	- Jorge Nerin <comandante@zaralinux.com>
+
+Copyright (c) 2009
+
+	- Shen Feng <shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/net
+
+The interface to the networking parts of the kernel is located in
+/proc/sys/net. The following table shows all possible subdirectories. You may
+see only some of them, depending on your kernel's configuration.
+
+
+Table : Subdirectories in /proc/sys/net
+
+ ========= =================== = ========== ==================
+ Directory Content               Directory  Content
+ ========= =================== = ========== ==================
+ core      General parameter     appletalk  Appletalk protocol
+ unix      Unix domain sockets   netrom     NET/ROM
+ 802       E802 protocol         ax25       AX25
+ ethernet  Ethernet protocol     rose       X.25 PLP layer
+ ipv4      IP version 4          x25        X.25 protocol
+ ipx       IPX                   token-ring IBM token ring
+ bridge    Bridging              decnet     DEC net
+ ipv6      IP version 6          tipc       TIPC
+ ========= =================== = ========== ==================
+
+1. /proc/sys/net/core - Network core options
+============================================
+
+bpf_jit_enable
+--------------
+
+This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
+and efficient infrastructure that allows bytecode to be executed at various
+hook points. It is used in a number of Linux kernel subsystems such
+as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints)
+and security (e.g. seccomp). LLVM has a BPF back end that can compile
+restricted C into a sequence of BPF instructions. After program load
+through bpf(2) and passing a verifier in the kernel, a JIT will then
+translate these BPF proglets into native CPU instructions. There are
+two flavors of JITs, the newer eBPF JIT currently supported on:
+
+  - x86_64
+  - x86_32
+  - arm64
+  - arm32
+  - ppc64
+  - sparc64
+  - mips64
+  - s390x
+  - riscv
+
+And the older cBPF JIT supported on the following archs:
+
+  - mips
+  - ppc
+  - sparc
+
+eBPF JITs are a superset of cBPF JITs, meaning the kernel will
+migrate cBPF instructions into eBPF instructions and then JIT
+compile them transparently. Older cBPF JITs can only translate
+tcpdump filters, seccomp rules, etc., but not the eBPF programs
+mentioned above that are loaded through bpf(2).
+
+Values:
+
+	- 0 - disable the JIT (default value)
+	- 1 - enable the JIT
+	- 2 - enable the JIT and ask the compiler to emit traces on kernel log.
+
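+As a minimal sketch (assuming root privileges and a kernel built with a
+BPF JIT for the running architecture), the JIT can be switched on and the
+setting read back with::
+
+  echo 1 > /proc/sys/net/core/bpf_jit_enable
+  cat /proc/sys/net/core/bpf_jit_enable
+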
+bpf_jit_harden
+--------------
+
+This enables hardening for the BPF JIT compiler. Only eBPF JIT
+backends are supported. Enabling hardening costs some performance, but
+can mitigate JIT spraying.
+
+Values:
+
+	- 0 - disable JIT hardening (default value)
+	- 1 - enable JIT hardening for unprivileged users only
+	- 2 - enable JIT hardening for all users
+
+bpf_jit_kallsyms
+----------------
+
+When the BPF JIT compiler is enabled, the compiled images are at addresses
+unknown to the kernel, meaning they show up neither in traces nor
+in /proc/kallsyms. This enables export of these addresses, which can
+be used for debugging/tracing. If bpf_jit_harden is enabled, this
+feature is disabled.
+
+Values:
+
+	- 0 - disable JIT kallsyms export (default value)
+	- 1 - enable JIT kallsyms export for privileged users only
+
+bpf_jit_limit
+-------------
+
+This enforces a global limit for memory allocations to the BPF JIT
+compiler in order to reject unprivileged JIT requests once it has
+been surpassed. bpf_jit_limit contains the value of the global limit
+in bytes.
+
+dev_weight
+----------
+
+The maximum number of packets that the kernel can handle on a NAPI interrupt.
+This is a per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
+aggregated packet is counted as one packet in this context.
+
+Default: 64
+
+dev_weight_rx_bias
+------------------
+
+RPS (e.g. RFS, aRFS) processing competes with the registered NAPI poll function
+of the driver for the per-softirq-cycle netdev_budget. This parameter influences
+the proportion of the configured netdev_budget that is spent on RPS-based packet
+processing during RX softirq cycles. It is also meant to make the current
+dev_weight adaptable to asymmetric CPU needs on the RX/TX sides of the network
+stack (see dev_weight_tx_bias). It is effective on a per-CPU basis. The value is
+derived from dev_weight by multiplication (dev_weight * dev_weight_rx_bias).
+
+Default: 1
+
+dev_weight_tx_bias
+------------------
+
+Scales the maximum number of packets that can be processed during a TX softirq cycle.
+Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
+net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
+
+Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
+
+Default: 1
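+
+To illustrate the calculation (the bias of 2 is only an example value, not
+a recommendation): with the default dev_weight of 64, a dev_weight_tx_bias
+of 2 gives 64 * 2 = 128 packets per TX softirq cycle. A sketch, assuming
+root privileges::
+
+  echo 2 > /proc/sys/net/core/dev_weight_tx_bias
+  # print the resulting product of the two current values
+  echo $(( $(cat /proc/sys/net/core/dev_weight) * $(cat /proc/sys/net/core/dev_weight_tx_bias) ))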
+
+default_qdisc
+-------------
+
+The default queuing discipline to use for network devices. This allows
+overriding the default of pfifo_fast with an alternative. Since the default
+queuing discipline is created without additional parameters, it is best suited
+to queuing disciplines that work well without configuration, like stochastic
+fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
+queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
+which require setting up classes and bandwidths. Note that physical multiqueue
+interfaces still use mq as root qdisc, which in turn uses this default for its
+leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead
+default to noqueue.
+
+Default: pfifo_fast
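+
+A short sketch, assuming root privileges and that fq_codel is built in or
+available as a module. Devices that already have a qdisc attached will
+typically keep it until it is re-created; the interface name eth0 below is
+only a placeholder::
+
+  echo fq_codel > /proc/sys/net/core/default_qdisc
+  tc qdisc show dev eth0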
+
+busy_read
+---------
+
+Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
+Approximate time in us to busy loop waiting for packets on the device queue.
+This sets the default value of the SO_BUSY_POLL socket option.
+Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
+which is the preferred method of enabling. If you need to enable the feature
+globally via sysctl, a value of 50 is recommended.
+
+Will increase power usage.
+
+Default: 0 (off)
+
+busy_poll
+---------
+
+Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
+Approximate time in us to busy loop waiting for events.
+Recommended value depends on the number of sockets you poll on.
+For several sockets, use 50; for several hundred, use 100.
+For more than that you probably want to use epoll.
+Note that only sockets with SO_BUSY_POLL set will be busy polled,
+so you want to either selectively set SO_BUSY_POLL on those sockets or set
+net.core.busy_read globally.
+
+Will increase power usage.
+
+Default: 0 (off)
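+
+If the global settings are wanted, a sketch using the values recommended
+above for a handful of sockets (root privileges assumed; per-socket
+SO_BUSY_POLL remains the preferred method)::
+
+  echo 50 > /proc/sys/net/core/busy_read
+  echo 50 > /proc/sys/net/core/busy_poll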
+
+rmem_default
+------------
+
+The default setting of the socket receive buffer in bytes.
+
+rmem_max
+--------
+
+The maximum receive socket buffer size in bytes.
+
+tstamp_allow_data
+-----------------
+Allow processes to receive tx timestamps looped together with the original
+packet contents. If disabled, transmit timestamp requests from unprivileged
+processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
+
+Default: 1 (on)
+
+
+wmem_default
+------------
+
+The default setting (in bytes) of the socket send buffer.
+
+wmem_max
+--------
+
+The maximum send socket buffer size in bytes.
+
+message_burst and message_cost
+------------------------------
+
+These parameters are used to limit the warning messages written to the kernel
+log from the networking code. They enforce a rate limit to make a
+denial-of-service attack impossible. A higher message_cost factor results in
+fewer messages being written. message_burst controls when messages will
+be dropped. The default settings limit warning messages to one every five
+seconds.
+
+warnings
+--------
+
+This sysctl is now unused.
+
+This was used to control console messages from the networking stack that
+occur because of problems on the network like duplicate address or bad
+checksums.
+
+These messages are now emitted at KERN_DEBUG and can generally be enabled
+and controlled by the dynamic_debug facility.
+
+netdev_budget
+-------------
+
+Maximum number of packets taken from all interfaces in one polling cycle (NAPI
+poll). In one polling cycle interfaces which are registered to polling are
+probed in a round-robin manner. Also, a polling cycle may not exceed
+netdev_budget_usecs microseconds, even if netdev_budget has not been
+exhausted.
+
+netdev_budget_usecs
+---------------------
+
+Maximum number of microseconds in one NAPI polling cycle. Polling
+will exit when either netdev_budget_usecs have elapsed during the
+poll cycle or the number of packets processed reaches netdev_budget.
+
+netdev_max_backlog
+------------------
+
+Maximum number of packets queued on the INPUT side when the interface
+receives packets faster than the kernel can process them.
+
+netdev_rss_key
+--------------
+
+RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
+randomly generated.
+Some user space might need to gather its content even if drivers do not
+provide ethtool -x support yet.
+
+::
+
+  myhost:~# cat /proc/sys/net/core/netdev_rss_key
+  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
+
+The file contains NUL bytes if no driver has ever called the netdev_rss_key_fill() function.
+
+Note:
+  /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
+  but most drivers only use 40 bytes of it.
+
+::
+
+  myhost:~# ethtool -x eth0
+  RX flow hash indirection table for eth0 with 8 RX ring(s):
+      0:    0     1     2     3     4     5     6     7
+  RSS hash key:
+  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
+
+netdev_tstamp_prequeue
+----------------------
+
+If set to 0, RX packet timestamps can be sampled after RPS processing, when
+the target CPU processes packets. This may delay timestamps slightly, but
+allows the load to be distributed across several CPUs.
+
+If set to 1 (default), timestamps are sampled as soon as possible, before
+queueing.
+
+optmem_max
+----------
+
+Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
+of struct cmsghdr structures with appended data.
+
+fb_tunnels_only_for_init_net
+----------------------------
+
+Controls whether fallback tunnels (like tunl0, gre0, gretap0, erspan0,
+sit0, ip6tnl0, ip6gre0) are automatically created when a new
+network namespace is created, if the corresponding tunnel is present
+in the initial network namespace.
+If set to 1, these devices are not automatically created, and
+user space is responsible for creating them if needed.
+
+Default: 0 (for compatibility reasons)
+
+devconf_inherit_init_net
+------------------------
+
+Controls whether a new network namespace should inherit all current
+settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
+default, we keep the current behavior: for IPv4 we inherit all current
+settings from init_net and for IPv6 we reset all settings to default.
+
+If set to 1, both IPv4 and IPv6 settings are forced to inherit from
+current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
+forced to reset to their default values.
+
+Default : 0  (for compatibility reasons)
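+
+As a sketch (root privileges assumed; the namespace name testns is
+hypothetical), a namespace created after setting the value to 1 should
+start with the IPv4 and IPv6 conf settings currently in init_net::
+
+  echo 1 > /proc/sys/net/core/devconf_inherit_init_net
+  ip netns add testns
+  ip netns exec testns cat /proc/sys/net/ipv6/conf/all/forwarding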
+
+2. /proc/sys/net/unix - Parameters for Unix domain sockets
+==========================================================
+
+There is only one file in this directory.
+unix_dgram_qlen limits the max number of datagrams queued in a Unix domain
+socket's buffer. It will not take effect unless the PF_UNIX flag is specified.
+
+
+3. /proc/sys/net/ipv4 - IPv4 settings
+=====================================
+
+Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
+descriptions of these entries.
+
+
+4. Appletalk
+============
+
+The /proc/sys/net/appletalk directory holds the Appletalk configuration data
+when Appletalk is loaded. The configurable parameters are:
+
+aarp-expiry-time
+----------------
+
+The amount of time we keep an ARP entry before expiring it. Used to age out
+old hosts.
+
+aarp-resolve-time
+-----------------
+
+The amount of time we will spend trying to resolve an Appletalk address.
+
+aarp-retransmit-limit
+---------------------
+
+The number of times we will retransmit a query before giving up.
+
+aarp-tick-time
+--------------
+
+Controls the rate at which expiries are checked.
+
+The directory /proc/net/appletalk holds the list of active Appletalk sockets
+on a machine.
+
+The fields indicate the DDP type, the local address (in network:node format),
+the remote address, the size of the transmit pending queue, the size of the
+received queue (bytes waiting for applications to read), the state and the uid
+owning the socket.
+
+/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
+shows the name of the interface, its Appletalk address, the network range on
+that address (or network number for phase 1 networks), and the status of the
+interface.
+
+/proc/net/atalk_route lists each known network route. It lists the target
+(network) that the route leads to, the router (may be directly connected), the
+route flags, and the device the route is using.
+
+
+5. IPX
+======
+
+The IPX protocol has no tunable values in /proc/sys/net.
+
+The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX
+socket, giving the local and remote addresses in Novell format (that is,
+network:node:port). In accordance with the strange Novell tradition,
+everything but the port is in hex. Not_Connected is displayed for sockets that
+are not tied to a specific remote address. The Tx and Rx queue sizes indicate
+the number of bytes pending for transmission and reception. The state
+indicates the state the socket is in and the uid is the owning uid of the
+socket.
+
+The /proc/net/ipx_interface file lists all IPX interfaces. For each interface
+it gives the network number, the node number, and indicates if the network is
+the primary network. It also indicates which device it is bound to (or
+Internal for internal networks) and the Frame Type if appropriate. Linux
+supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for
+IPX.
+
+The /proc/net/ipx_route table holds a list of IPX routes. For each route it
+gives the destination network, the router node (or Directly) and the network
+address of the router (or Connected) for internal networks.
+
+6. TIPC
+=======
+
+tipc_rmem
+---------
+
+The TIPC protocol now has a tunable for the receive memory, similar to the
+tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)
+
+::
+
+    # cat /proc/sys/net/tipc/tipc_rmem
+    4252725 34021800        68043600
+    #
+
+The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
+are scaled (shifted) versions of that same value.  Note that the min value
+is not at this point in time used in any meaningful way, but the triplet is
+preserved in order to be consistent with things like tcp_rmem.
+
+named_timeout
+-------------
+
+TIPC name table updates are distributed asynchronously in a cluster, without
+any form of transaction handling. This means that different race scenarios are
+possible. One such is that a name withdrawal sent out by one node and received
+by another node may arrive after a second, overlapping name publication already
+has been accepted from a third node, although the conflicting updates
+originally may have been issued in the correct sequential order.
+If named_timeout is nonzero, failed topology updates will be placed on a defer
+queue until another event arrives that clears the error, or until the timeout
+expires. Value is in milliseconds.
diff --git a/Documentation/admin-guide/sysctl/sunrpc.rst b/Documentation/admin-guide/sysctl/sunrpc.rst
new file mode 100644
index 000000000000..09780a682afd
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/sunrpc.rst
@@ -0,0 +1,25 @@
+===================================
+Documentation for /proc/sys/sunrpc/
+===================================
+
+kernel version 2.2.10
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/sunrpc and is valid for Linux kernel version 2.2.
+
+The files in this directory can be used to (re)set the debug
+flags of the SUN Remote Procedure Call (RPC) subsystem in
+the Linux kernel. This stuff is used for NFS, KNFSD and
+maybe a few other things as well.
+
+The files in there are used to control the debugging flags:
+rpc_debug, nfs_debug, nfsd_debug and nlm_debug.
+
+These flags are for kernel hackers only. You should read the
+source code in net/sunrpc/ for more information.
diff --git a/Documentation/admin-guide/sysctl/user.rst b/Documentation/admin-guide/sysctl/user.rst
new file mode 100644
index 000000000000..650eaa03f15e
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/user.rst
@@ -0,0 +1,78 @@
+=================================
+Documentation for /proc/sys/user/
+=================================
+
+kernel version 4.9.0
+
+Copyright (c) 2016		Eric Biederman <ebiederm@xmission.com>
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/user.
+
+The files in this directory can be used to override the default
+limits on the number of namespaces and other objects that have
+per-user, per-user-namespace limits.
+
+The primary purpose of these limits is to stop programs that
+malfunction and attempt to create a ridiculous number of objects,
+before the malfunction becomes a system wide problem.  It is the
+intention that the defaults of these limits are set high enough that
+no program in normal operation should run into these limits.
+
+The creation of per-user, per-user-namespace objects is charged to
+the user in the user namespace who created the object and
+verified to be below the per-user limit in that user namespace.
+
+The creation of objects is also charged to all of the users
+who created the user namespaces in which the object is created
+(user namespaces can be nested) and verified to be below the per-user
+limits in the user namespaces of those users.
+
+This recursive counting of created objects ensures that creating a
+user namespace does not allow a user to escape their current limits.
+
+Currently, these files are in /proc/sys/user:
+
+max_cgroup_namespaces
+=====================
+
+  The maximum number of cgroup namespaces that any user in the current
+  user namespace may create.
+
+max_ipc_namespaces
+==================
+
+  The maximum number of ipc namespaces that any user in the current
+  user namespace may create.
+
+max_mnt_namespaces
+==================
+
+  The maximum number of mount namespaces that any user in the current
+  user namespace may create.
+
+max_net_namespaces
+==================
+
+  The maximum number of network namespaces that any user in the
+  current user namespace may create.
+
+max_pid_namespaces
+==================
+
+  The maximum number of pid namespaces that any user in the current
+  user namespace may create.
+
+max_user_namespaces
+===================
+
+  The maximum number of user namespaces that any user in the current
+  user namespace may create.
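+
+  As an illustration (the limit of 100 is an arbitrary example, and root
+  privileges are assumed for the write), the limit can be inspected and
+  lowered with::
+
+    cat /proc/sys/user/max_user_namespaces
+    echo 100 > /proc/sys/user/max_user_namespaces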
+
+max_uts_namespaces
+==================
+
+  The maximum number of uts namespaces that any user in the current
+  user namespace may create.
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
new file mode 100644
index 000000000000..5aceb5cd5ce7
--- /dev/null
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -0,0 +1,964 @@
+===============================
+Documentation for /proc/sys/vm/
+===============================
+
+kernel version 2.6.29
+
+Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2008         Peter W. Morreale <pmorreale@novell.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
+
+This file contains the documentation for the sysctl files in
+/proc/sys/vm and is valid for Linux kernel version 2.6.29.
+
+The files in this directory can be used to tune the operation
+of the virtual memory (VM) subsystem of the Linux kernel and
+the writeout of dirty data to disk.
+
+Default values and initialization routines for most of these
+files can be found in mm/swap.c.
+
+Currently, these files are in /proc/sys/vm:
+
+- admin_reserve_kbytes
+- block_dump
+- compact_memory
+- compact_unevictable_allowed
+- dirty_background_bytes
+- dirty_background_ratio
+- dirty_bytes
+- dirty_expire_centisecs
+- dirty_ratio
+- dirtytime_expire_seconds
+- dirty_writeback_centisecs
+- drop_caches
+- extfrag_threshold
+- hugetlb_shm_group
+- laptop_mode
+- legacy_va_layout
+- lowmem_reserve_ratio
+- max_map_count
+- memory_failure_early_kill
+- memory_failure_recovery
+- min_free_kbytes
+- min_slab_ratio
+- min_unmapped_ratio
+- mmap_min_addr
+- mmap_rnd_bits
+- mmap_rnd_compat_bits
+- nr_hugepages
+- nr_hugepages_mempolicy
+- nr_overcommit_hugepages
+- nr_trim_pages         (only if CONFIG_MMU=n)
+- numa_zonelist_order
+- oom_dump_tasks
+- oom_kill_allocating_task
+- overcommit_kbytes
+- overcommit_memory
+- overcommit_ratio
+- page-cluster
+- panic_on_oom
+- percpu_pagelist_fraction
+- stat_interval
+- stat_refresh
+- numa_stat
+- swappiness
+- unprivileged_userfaultfd
+- user_reserve_kbytes
+- vfs_cache_pressure
+- watermark_boost_factor
+- watermark_scale_factor
+- zone_reclaim_mode
+
+
+admin_reserve_kbytes
+====================
+
+The amount of free memory in the system that should be reserved for users
+with the capability cap_sys_admin.
+
+admin_reserve_kbytes defaults to min(3% of free pages, 8MB)
+
+That should provide enough for the admin to log in and kill a process,
+if necessary, under the default overcommit 'guess' mode.
+
+Systems running under overcommit 'never' should increase this to account
+for the full Virtual Memory Size of programs used to recover. Otherwise,
+root may not be able to log in to recover the system.
+
+How do you calculate a minimum useful reserve?
+
+sshd or login + bash (or some other shell) + top (or ps, kill, etc.)
+
+For overcommit 'guess', we can sum resident set sizes (RSS).
+On x86_64 this is about 8MB.
+
+For overcommit 'never', we can take the max of their virtual sizes (VSZ)
+and add the sum of their RSS.
+On x86_64 this is about 128MB.
+
+Changing this takes effect whenever an application requests memory.
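+
+For example, under overcommit 'never', the x86_64 estimate above of about
+128MB corresponds to 128 * 1024 = 131072 kB. This is a sketch only (root
+privileges assumed); real needs depend on the recovery tools in use::
+
+  echo 131072 > /proc/sys/vm/admin_reserve_kbytes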
+
+
+block_dump
+==========
+
+block_dump enables block I/O debugging when set to a nonzero value. More
+information on block I/O debugging is in Documentation/laptops/laptop-mode.rst.
+
+
+compact_memory
+==============
+
+Available only when CONFIG_COMPACTION is set. When 1 is written to the file,
+all zones are compacted such that free memory is available in contiguous
+blocks where possible. This can be important, for example, in the allocation of
+huge pages, although processes will also directly compact memory as required.
+
+
+compact_unevictable_allowed
+===========================
+
+Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
+allowed to examine the unevictable lru (mlocked pages) for pages to compact.
+This should be used on systems where stalls for minor page faults are an
+acceptable trade for large contiguous free memory.  Set to 0 to prevent
+compaction from moving pages that are unevictable.  Default value is 1.
+
+
+dirty_background_bytes
+======================
+
+Contains the amount of dirty memory at which the background kernel
+flusher threads will start writeback.
+
+Note:
+  dirty_background_bytes is the counterpart of dirty_background_ratio. Only
+  one of them may be specified at a time. When one sysctl is written it is
+  immediately taken into account to evaluate the dirty memory limits and the
+  other appears as 0 when read.
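+
+A sketch of the interaction described in the note (the 64 MiB figure,
+64 * 1024 * 1024 = 67108864 bytes, is an arbitrary example; root privileges
+assumed)::
+
+  echo 67108864 > /proc/sys/vm/dirty_background_bytes
+  cat /proc/sys/vm/dirty_background_ratio
+
+After the write, the second command is expected to print 0.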
+
+
+dirty_background_ratio
+======================
+
+Contains, as a percentage of total available memory that contains free pages
+and reclaimable pages, the number of pages at which the background kernel
+flusher threads will start writing out dirty data.
+
+The total available memory is not equal to total system memory.
+
+
+dirty_bytes
+===========
+
+Contains the amount of dirty memory at which a process generating disk writes
+will itself start writeback.
+
+Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be
+specified at a time. When one sysctl is written it is immediately taken into
+account to evaluate the dirty memory limits and the other appears as 0 when
+read.
+
+Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
+value lower than this limit will be ignored and the old configuration will be
+retained.
+
+
+dirty_expire_centisecs
+======================
+
+This tunable is used to define when dirty data is old enough to be eligible
+for writeout by the kernel flusher threads.  It is expressed in 100'ths
+of a second.  Data which has been dirty in-memory for longer than this
+interval will be written out next time a flusher thread wakes up.
+
+
+dirty_ratio
+===========
+
+Contains, as a percentage of total available memory that contains free pages
+and reclaimable pages, the number of pages at which a process which is
+generating disk writes will itself start writing out dirty data.
+
+The total available memory is not equal to total system memory.
+
+
+dirtytime_expire_seconds
+========================
+
+When a lazytime inode is constantly having its pages dirtied, the inode with
+an updated timestamp will never get a chance to be written out.  And, if the
+only thing that has happened on the file system is a dirtytime inode caused
+by an atime update, a worker will be scheduled to make sure that inode
+eventually gets pushed out to disk.  This tunable is used to define when a dirty
+inode is old enough to be eligible for writeback by the kernel flusher threads.
+It is also used as the interval at which to wake up the dirtytime_writeback thread.
+
+
+dirty_writeback_centisecs
+=========================
+
+The kernel flusher threads will periodically wake up and write `old` data
+out to disk.  This tunable expresses the interval between those wakeups, in
+hundredths of a second.
+
+Setting this to zero disables periodic writeback altogether.
+
+
+drop_caches
+===========
+
+Writing to this will cause the kernel to drop clean caches, as well as
+reclaimable slab objects like dentries and inodes.  Once dropped, their
+memory becomes free.
+
+To free pagecache::
+
+	echo 1 > /proc/sys/vm/drop_caches
+
+To free reclaimable slab objects (includes dentries and inodes)::
+
+	echo 2 > /proc/sys/vm/drop_caches
+
+To free slab objects and pagecache::
+
+	echo 3 > /proc/sys/vm/drop_caches
+
+This is a non-destructive operation and will not free any dirty objects.
+To increase the number of objects freed by this operation, the user may run
+`sync` prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
+number of dirty objects on the system and create more candidates to be
+dropped.
+
+This file is not a means to control the growth of the various kernel caches
+(inodes, dentries, pagecache, etc.).  These objects are automatically
+reclaimed by the kernel when memory is needed elsewhere on the system.
+
+Use of this file can cause performance problems.  Since it discards cached
+objects, it may cost a significant amount of I/O and CPU to recreate the
+dropped objects, especially if they were under heavy use.  Because of this,
+use outside of a testing or debugging environment is not recommended.
+
+You may see informational messages in your kernel log when this file is
+used::
+
+	cat (1234): drop_caches: 3
+
+These are informational only.  They do not mean that anything is wrong
+with your system.  To disable them, echo 4 (bit 2) into drop_caches.
+
+
+extfrag_threshold
+=================
+
+This parameter affects whether the kernel will compact memory or direct
+reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
+debugfs shows what the fragmentation index for each order is in each zone in
+the system. Values tending towards 0 imply allocations would fail due to lack
+of memory, values towards 1000 imply failures are due to fragmentation and -1
+implies that the allocation will succeed as long as watermarks are met.
+
+The kernel will not compact memory in a zone if the
+fragmentation index is <= extfrag_threshold. The default value is 500.
+
+
+highmem_is_dirtyable
+====================
+
+Available only for systems with CONFIG_HIGHMEM enabled (32-bit systems).
+
+This parameter controls whether the high memory is considered for dirty
+writers throttling.  This is not the case by default which means that
+only the amount of memory directly visible/usable by the kernel can
+be dirtied. As a result, on systems with a large amount of memory and
+lowmem basically depleted, writers might be throttled too early and
+streaming writes can get very slow.
+
+Changing the value to non-zero would allow more memory to be dirtied
+and thus allow writers to write more data which can be flushed to the
+storage more effectively. Note this also comes with a risk of premature
+OOM kills because some writers (e.g. direct block device writes) can
+only use the low memory and they can fill it up with dirty data without
+any throttling.
+
+
+hugetlb_shm_group
+=================
+
+hugetlb_shm_group contains the group ID that is allowed to create SysV
+shared memory segments using hugetlb pages.
+
+
+laptop_mode
+===========
+
+laptop_mode is a knob that controls "laptop mode". All the things that are
+controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst.
+
+
+legacy_va_layout
+================
+
+If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
+will use the legacy (2.4) layout for all processes.
+
+
+lowmem_reserve_ratio
+====================
+
+For some specialised workloads on highmem machines it is dangerous for
+the kernel to allow process memory to be allocated from the "lowmem"
+zone.  This is because that memory could then be pinned via the mlock()
+system call, or by unavailability of swapspace.
+
+And on large highmem machines this lack of reclaimable lowmem memory
+can be fatal.
+
+So the Linux page allocator has a mechanism which prevents allocations
+which *could* use highmem from using too much lowmem.  This means that
+a certain amount of lowmem is defended from the possibility of being
+captured into pinned user memory.
+
+(The same argument applies to the old 16 megabyte ISA DMA region.  This
+mechanism will also defend that region from allocations which could use
+highmem or lowmem).
+
+The `lowmem_reserve_ratio` tunable determines how aggressive the kernel is
+in defending these lower zones.
+
+If you have a machine which uses highmem or ISA DMA and your
+applications are using mlock(), or if you are running with no swap then
+you probably should change the lowmem_reserve_ratio setting.
+
+The lowmem_reserve_ratio is an array. You can see its values by reading this file::
+
+	% cat /proc/sys/vm/lowmem_reserve_ratio
+	256     256     32
+
+However, these values are not used directly. The kernel calculates the number
+of protection pages for each zone from them. These are shown as an array of
+protection pages in /proc/zoneinfo, as in the following example from an x86-64
+box. Each zone has an array of protection pages like this::
+
+  Node 0, zone      DMA
+    pages free     1355
+          min      3
+          low      3
+          high     4
+	:
+	:
+      numa_other   0
+          protection: (0, 2004, 2004, 2004)
+	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    pagesets
+      cpu: 0 pcp: 0
+          :
+
+These protections are added to the score used to judge whether this zone should
+be used for page allocation or should be reclaimed.
+
+In this example, if normal pages (index=2) are requested from this DMA zone and
+watermark[WMARK_HIGH] is used as the watermark, the kernel judges that this zone
+should not be used because pages_free (1355) is smaller than watermark +
+protection[2] (4 + 2004 = 2008). If this protection value were 0, this zone
+would be used for normal page requests. If the request is for the DMA zone
+(index=0), protection[0] (=0) is used.
+
+zone[i]'s protection[j] is calculated by the following expression::
+
+  (i < j):
+    zone[i]->protection[j]
+    = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
+      / lowmem_reserve_ratio[i];
+  (i = j):
+     (should not be protected) = 0;
+  (i > j):
+     (not necessary, but looks 0)
+
+The default values of lowmem_reserve_ratio[i] are
+
+    === ====================================
+    256 (if zone[i] means DMA or DMA32 zone)
+    32  (others)
+    === ====================================
+
+As the expression above shows, these values are the reciprocals of the ratios.
+256 means 1/256. The number of protection pages becomes about "0.39%" of the
+total managed pages of higher zones on the node.
+
+If you would like to protect more pages, smaller values are effective.
+The minimum value is 1 (1/1 -> 100%). A value less than 1 completely
+disables protection of the pages.
+
+
+max_map_count
+=============
+
+This file contains the maximum number of memory map areas a process
+may have. Memory map areas are used as a side-effect of calling
+malloc, directly by mmap, mprotect, and madvise, and also when loading
+shared libraries.
+
+While most applications need less than a thousand maps, certain
+programs, particularly malloc debuggers, may consume lots of them,
+e.g., up to one or two maps per allocation.
+
+The default value is 65536.
+
+
+memory_failure_early_kill
+=========================
+
+Control how to kill processes when an uncorrected memory error (typically
+a 2-bit error in a memory module) that cannot be handled by the kernel is
+detected in the background by hardware. In some cases (like the page
+still having a valid copy on disk) the kernel will handle the failure
+transparently without affecting any applications. But if there is
+no other up-to-date copy of the data, it will kill processes to prevent any
+data corruption from propagating.
+
+1: Kill all processes that have the corrupted and not reloadable page mapped
+as soon as the corruption is detected.  Note this is not supported
+for a few types of pages, like kernel internally allocated data or
+the swap cache, but works for the majority of user pages.
+
+0: Only unmap the corrupted page from all processes and only kill a process
+that tries to access it.
+
+The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can
+handle this if they want to.
+
+This is only active on architectures/platforms with advanced machine
+check handling and depends on the hardware capabilities.
+
+Applications can override this setting individually with the PR_MCE_KILL prctl.
+
+
+memory_failure_recovery
+=======================
+
+Enable memory failure recovery (when supported by the platform).
+
+1: Attempt recovery.
+
+0: Always panic on a memory failure.
+
+
+min_free_kbytes
+===============
+
+This is used to force the Linux VM to keep a minimum number
+of kilobytes free.  The VM uses this number to compute a
+watermark[WMARK_MIN] value for each lowmem zone in the system.
+Each lowmem zone gets a number of reserved free pages based
+proportionally on its size.
+
+Some minimal amount of memory is needed to satisfy PF_MEMALLOC
+allocations; if you set this to lower than 1024KB, your system will
+become subtly broken, and prone to deadlock under high loads.
+
+Setting this too high will OOM your machine instantly.
+
+
+min_slab_ratio
+==============
+
+This is available only on NUMA kernels.
+
+A percentage of the total pages in each zone.  On Zone reclaim
+(fallback from the local zone occurs) slabs will be reclaimed if more
+than this percentage of pages in a zone are reclaimable slab pages.
+This ensures that the slab growth stays under control even in NUMA
+systems that rarely perform global reclaim.
+
+The default is 5 percent.
+
+Note that slab reclaim is triggered in a per zone / node fashion.
+The process of reclaiming slab memory is currently not node specific
+and may not be fast.
+
+
+min_unmapped_ratio
+==================
+
+This is available only on NUMA kernels.
+
+This is a percentage of the total pages in each zone. Zone reclaim will
+only occur if more than this percentage of pages are in a state that
+zone_reclaim_mode allows to be reclaimed.
+
+If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
+against all file-backed unmapped pages including swapcache pages and tmpfs
+files. Otherwise, only unmapped pages backed by normal files but not tmpfs
+files and similar are considered.
+
+The default is 1 percent.
+
+
+mmap_min_addr
+=============
+
+This file indicates the amount of address space which a user process will
+be restricted from mmapping.  Since kernel null dereference bugs could
+accidentally operate based on the information in the first couple of pages
+of memory, userspace processes should not be allowed to write to them.  By
+default this value is set to 0 and no protections will be enforced by the
+security module.  Setting this value to something like 64k will allow the
+vast majority of applications to work correctly and provide defense in depth
+against future potential kernel bugs.
+
+
+mmap_rnd_bits
+=============
+
+This value can be used to select the number of bits to use to
+determine the random offset to the base address of vma regions
+resulting from mmap allocations on architectures which support
+tuning address space randomization.  This value will be bounded
+by the architecture's minimum and maximum supported values.
+
+This value can be changed after boot using the
+/proc/sys/vm/mmap_rnd_bits tunable.
+
+
+mmap_rnd_compat_bits
+====================
+
+This value can be used to select the number of bits to use to
+determine the random offset to the base address of vma regions
+resulting from mmap allocations for applications run in
+compatibility mode on architectures which support tuning address
+space randomization.  This value will be bounded by the
+architecture's minimum and maximum supported values.
+
+This value can be changed after boot using the
+/proc/sys/vm/mmap_rnd_compat_bits tunable.
+
+
+nr_hugepages
+============
+
+Change the minimum size of the hugepage pool.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
+nr_hugepages_mempolicy
+======================
+
+Change the size of the hugepage pool at run-time on a specific
+set of NUMA nodes.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
+nr_overcommit_hugepages
+=======================
+
+Change the maximum size of the hugepage pool. The maximum is
+nr_hugepages + nr_overcommit_hugepages.
+
+See Documentation/admin-guide/mm/hugetlbpage.rst
+
+
+nr_trim_pages
+=============
+
+This is available only on NOMMU kernels.
+
+This value adjusts the excess page trimming behaviour of power-of-2 aligned
+NOMMU mmap allocations.
+
+A value of 0 disables trimming of allocations entirely, while a value of 1
+trims excess pages aggressively. Any value >= 1 acts as the watermark where
+trimming of allocations is initiated.
+
+The default value is 1.
+
+See Documentation/nommu-mmap.txt for more information.
+
+
+numa_zonelist_order
+===================
+
+This sysctl is only for NUMA and it is deprecated. Anything but
+Node order will fail!
+
+Where memory is allocated from is controlled by zonelists.
+
+(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 to keep the explanation
+simple. You may be able to read ZONE_DMA as ZONE_DMA32...)
+
+In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows:
+ZONE_NORMAL -> ZONE_DMA.
+This means that a memory allocation request for GFP_KERNEL will
+get memory from ZONE_DMA only when ZONE_NORMAL is not available.
+
+In the NUMA case, you can think of the following 2 types of order.
+Assume a 2-node NUMA system; below is the zonelist of Node(0)'s GFP_KERNEL::
+
+  (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
+  (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.
+
+Type(A) offers the best locality for processes on Node(0), but ZONE_DMA
+will be used before ZONE_NORMAL exhaustion. This increases the possibility of
+out-of-memory (OOM) in ZONE_DMA because ZONE_DMA tends to be small.
+
+Type(B) cannot offer the best locality but is more robust against OOM of
+the DMA zone.
+
+Type(A) is called "Node" order. Type(B) is called "Zone" order.
+
+"Node order" orders the zonelists by node, then by zone within each node.
+Specify "[Nn]ode" for node order.
+
+"Zone Order" orders the zonelists by zone type, then by node within each
+zone.  Specify "[Zz]one" for zone order.
+
+Specify "[Dd]efault" to request automatic configuration.
+
+On 32-bit, the Normal zone needs to be preserved for allocations accessible
+by the kernel, so "zone" order will be selected.
+
+On 64-bit, devices that require DMA32/DMA are relatively rare, so "node"
+order will be selected.
+
+Default order is recommended unless this is causing problems for your
+system/application.
+
+
+oom_dump_tasks
+==============
+
+Enables a system-wide task dump (excluding kernel threads) to be produced
+when the kernel performs an OOM-killing and includes such information as
+pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
+score, and name.  This is helpful to determine why the OOM killer was
+invoked, to identify the rogue task that caused it, and to determine why
+the OOM killer chose the task it did to kill.
+
+If this is set to zero, this information is suppressed.  On very
+large systems with thousands of tasks it may not be feasible to dump
+the memory state information for each one.  Such systems should not
+be forced to incur a performance penalty in OOM conditions when the
+information may not be desired.
+
+If this is set to non-zero, this information is shown whenever the
+OOM killer actually kills a memory-hogging task.
+
+The default value is 1 (enabled).
+
+
+oom_kill_allocating_task
+========================
+
+This enables or disables killing the OOM-triggering task in
+out-of-memory situations.
+
+If this is set to zero, the OOM killer will scan through the entire
+tasklist and select a task based on heuristics to kill.  This normally
+selects a rogue memory-hogging task that frees up a large amount of
+memory when killed.
+
+If this is set to non-zero, the OOM killer simply kills the task that
+triggered the out-of-memory condition.  This avoids the expensive
+tasklist scan.
+
+If panic_on_oom is selected, it takes precedence over whatever value
+is used in oom_kill_allocating_task.
+
+The default value is 0.
+
+
+overcommit_kbytes
+=================
+
+When overcommit_memory is set to 2, the committed address space is not
+permitted to exceed swap plus this amount of physical RAM. See below.
+
+Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one
+of them may be specified at a time. Setting one disables the other (which
+then appears as 0 when read).
+
+
+overcommit_memory
+=================
+
+This value contains a flag that enables memory overcommitment.
+
+When this flag is 0, the kernel attempts to estimate the amount
+of free memory left when userspace requests more memory.
+
+When this flag is 1, the kernel pretends there is always enough
+memory until it actually runs out.
+
+When this flag is 2, the kernel uses a "never overcommit"
+policy that attempts to prevent any overcommit of memory.
+Note that user_reserve_kbytes affects this policy.
+
+This feature can be very useful because there are a lot of
+programs that malloc() huge amounts of memory "just-in-case"
+and don't use much of it.
+
+The default value is 0.
+
+See Documentation/vm/overcommit-accounting.rst and
+mm/util.c::__vm_enough_memory() for more information.
+
+
+overcommit_ratio
+================
+
+When overcommit_memory is set to 2, the committed address
+space is not permitted to exceed swap plus this percentage
+of physical RAM.  See above.
+
+
+page-cluster
+============
+
+page-cluster controls the number of pages up to which consecutive pages
+are read in from swap in a single attempt. This is the swap counterpart
+to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.
+
+It is a logarithmic value - setting it to zero means "1 page", setting
+it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
+
+The default value is three (eight pages at a time).  There may be some
+small benefits in tuning this to a different value if your workload is
+swap-intensive.
+
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part of
+the consecutive pages that readahead would have brought in.
+
+
+panic_on_oom
+============
+
+This enables or disables the panic on out-of-memory feature.
+
+If this is set to 0, the kernel will kill some rogue process, using what is
+called the oom_killer.  Usually, the oom_killer can kill rogue processes and
+the system will survive.
+
+If this is set to 1, the kernel panics when out-of-memory happens.
+However, if a process limits the nodes it uses by mempolicy/cpusets,
+and those nodes run out of memory, one process
+may be killed by the oom_killer. No panic occurs in this case,
+because other nodes' memory may be free. This means the system as a whole
+may not be in a fatal state yet.
+
+If this is set to 2, the kernel panics compulsorily even in the
+above-mentioned case. Even if OOM happens under a memory cgroup, the whole
+system panics.
+
+The default value is 0.
+
+1 and 2 are for failover in clusters. Please select either
+according to your failover policy.
+
+panic_on_oom=2+kdump gives you a very strong tool to investigate
+why OOM happens. You can get a snapshot.
+
+
+percpu_pagelist_fraction
+========================
+
+This is the fraction of pages at most (high mark pcp->high) in each zone that
+are allocated for each per cpu page list.  The min value for this is 8.  It
+means that we don't allow more than 1/8th of pages in each zone to be
+allocated in any single per_cpu_pagelist.  This entry only changes the value
+of hot per cpu pagelists.  User can specify a number like 100 to allocate
+1/100th of each zone to each per cpu page list.
+
+The batch value of each per cpu pagelist is also updated as a result.  It is
+set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8).
+
+The initial value is zero.  Kernel does not use this value at boot time to set
+the high water marks for each per cpu page list.  If the user writes '0' to this
+sysctl, it will revert to this default behavior.
+
+
+stat_interval
+=============
+
+The time interval between which vm statistics are updated.  The default
+is 1 second.
+
+
+stat_refresh
+============
+
+Any read or write (by root only) flushes all the per-cpu vm statistics
+into their global totals, for more accurate reports when testing
+e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
+
+As a side-effect, it also checks for negative totals (elsewhere reported
+as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
+(At time of writing, a few stats are known sometimes to be found negative,
+with no ill effects: errors and warnings on these stats are suppressed.)
+
+
+numa_stat
+=========
+
+This interface allows runtime configuration of numa statistics.
+
+When page allocation performance becomes a bottleneck and you can tolerate
+some possible tool breakage and decreased numa counter precision, you can
+do::
+
+	echo 0 > /proc/sys/vm/numa_stat
+
+When page allocation performance is not a bottleneck and you want all
+tooling to work, you can do::
+
+	echo 1 > /proc/sys/vm/numa_stat
+
+
+swappiness
+==========
+
+This control is used to define how aggressively the kernel will swap
+memory pages.  Higher values will increase aggressiveness, lower values
+decrease the amount of swap.  A value of 0 instructs the kernel not to
+initiate swap until the amount of free and file-backed pages is less
+than the high water mark in a zone.
+
+The default value is 60.
+
+
+unprivileged_userfaultfd
+========================
+
+This flag controls whether unprivileged users can use the userfaultfd
+system calls.  Set this to 1 to allow unprivileged users to use the
+userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
+privileged users (with CAP_SYS_PTRACE capability).
+
+The default value is 1.
+
+
+user_reserve_kbytes
+===================
+
+When overcommit_memory is set to 2, "never overcommit" mode, reserve
+min(3% of current process size, user_reserve_kbytes) of free memory.
+This is intended to prevent a user from starting a single memory hogging
+process, such that they cannot recover (kill the hog).
+
+user_reserve_kbytes defaults to min(3% of the current process size, 128MB).
+
+If this is reduced to zero, then the user will be allowed to allocate
+all free memory with a single process, minus admin_reserve_kbytes.
+Any subsequent attempts to execute a command will result in
+"fork: Cannot allocate memory".
+
+Changing this takes effect whenever an application requests memory.
+
+
+vfs_cache_pressure
+==================
+
+This percentage value controls the tendency of the kernel to reclaim
+the memory which is used for caching of directory and inode objects.
+
+At the default value of vfs_cache_pressure=100 the kernel will attempt to
+reclaim dentries and inodes at a "fair" rate with respect to pagecache and
+swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
+to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
+never reclaim dentries and inodes due to memory pressure and this can easily
+lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
+causes the kernel to prefer to reclaim dentries and inodes.
+
+Increasing vfs_cache_pressure significantly beyond 100 may have negative
+performance impact. Reclaim code needs to take various locks to find freeable
+directory and inode objects. With vfs_cache_pressure=1000, it will look for
+ten times more freeable objects than there are.
+
+
+watermark_boost_factor
+======================
+
+This factor controls the level of reclaim when memory is being fragmented.
+It defines the percentage of the high watermark of a zone that will be
+reclaimed if pages of different mobility are being mixed within pageblocks.
+The intent is that compaction has less work to do in the future and to
+increase the success rate of future high-order allocations such as SLUB
+allocations, THP and hugetlbfs pages.
+
+To make it sensible with respect to the watermark_scale_factor
+parameter, the unit is in fractions of 10,000. The default value of
+15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
+watermark will be reclaimed in the event of a pageblock being mixed due
+to fragmentation. The level of reclaim is determined by the number of
+fragmentation events that occurred in the recent past. If this value is
+smaller than a pageblock then a pageblock's worth of pages will be reclaimed
+(e.g.  2MB on 64-bit x86). A boost factor of 0 will disable the feature.
+
+
+watermark_scale_factor
+======================
+
+This factor controls the aggressiveness of kswapd. It defines the
+amount of memory left in a node/system before kswapd is woken up and
+how much memory needs to be free before kswapd goes back to sleep.
+
+The unit is in fractions of 10,000. The default value of 10 means the
+distances between watermarks are 0.1% of the available memory in the
+node/system. The maximum value is 1000, or 10% of memory.
+
+A high rate of threads entering direct reclaim (allocstall) or kswapd
+going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
+that the number of free pages kswapd maintains for latency reasons is
+too small for the allocation bursts occurring in the system. This knob
+can then be used to tune kswapd aggressiveness accordingly.
+
+
+zone_reclaim_mode
+=================
+
+Zone_reclaim_mode allows someone to set more or less aggressive approaches to
+reclaim memory when a zone runs out of memory. If it is set to zero then no
+zone reclaim occurs. Allocations will be satisfied from other zones / nodes
+in the system.
+
+This value is the following flags OR'ed together:
+
+=	===================================
+1	Zone reclaim on
+2	Zone reclaim writes dirty pages out
+4	Zone reclaim swaps pages
+=	===================================
+
+zone_reclaim_mode is disabled by default.  For file servers or workloads
+that benefit from having their data cached, zone_reclaim_mode should be
+left disabled as the caching effect is likely to be more important than
+data locality.
+
+zone_reclaim may be enabled if it's known that the workload is partitioned
+such that each partition fits within a NUMA node and that accessing remote
+memory would cause a measurable performance reduction.  The page allocator
+will then reclaim easily reusable pages (those page cache pages that are
+currently not used) before allocating off node pages.
+
+Allowing zone reclaim to write out pages stops processes that are
+writing large amounts of data from dirtying pages on other nodes. Zone
+reclaim will write out dirty pages if a zone fills up and so effectively
+throttle the process. This may decrease the performance of a single process
+since it cannot use all of system memory to buffer the outgoing writes
+anymore but it preserves the memory on other nodes so that the performance
+of other processes running on other nodes will not be affected.
+
+Allowing regular swap effectively restricts allocations to the local
+node unless explicitly overridden by memory policies or cpuset
+configurations.
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 1d8e748f909f..c6224d039bcb 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -119,7 +119,7 @@ Kernel Pointers
 
 For printing kernel pointers which should be hidden from unprivileged
 users. The behaviour of %pK depends on the kptr_restrict sysctl - see
-Documentation/sysctl/kernel.rst for more details.
+Documentation/admin-guide/sysctl/kernel.rst for more details.
 
 Unmodified Addresses
 --------------------
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d750b6926899..fb4735fd73b0 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1500,7 +1500,7 @@ review the kernel documentation in the directory /usr/src/linux/Documentation.
 This chapter  is  heavily  based  on the documentation included in the pre 2.2
 kernels, and became part of it in version 2.2.1 of the Linux kernel.
 
-Please see: Documentation/sysctl/ directory for descriptions of these
+Please see: Documentation/admin-guide/sysctl/ directory for descriptions of these
 entries.
 
 ------------------------------------------------------------------------------
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5c3399cde1c4..df33674799b5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -2287,7 +2287,7 @@ addr_scope_policy - INTEGER
 
 
 /proc/sys/net/core/*
-	Please see: Documentation/sysctl/net.rst for descriptions of these entries.
+	Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
 
 
 /proc/sys/net/unix/*
diff --git a/Documentation/sysctl/abi.rst b/Documentation/sysctl/abi.rst
deleted file mode 100644
index 599bcde7f0b7..000000000000
--- a/Documentation/sysctl/abi.rst
+++ /dev/null
@@ -1,67 +0,0 @@
-================================
-Documentation for /proc/sys/abi/
-================================
-
-kernel version 2.6.0.test2
-
-Copyright (c) 2003,  Fabian Frederick <ffrederick@users.sourceforge.net>
-
-For general info: index.rst.
-
-------------------------------------------------------------------------------
-
-This path is binary emulation relevant aka personality types aka abi.
-When a process is executed, it's linked to an exec_domain whose
-personality is defined using values available from /proc/sys/abi.
-You can find further details about abi in include/linux/personality.h.
-
-Here are the files featuring in 2.6 kernel:
-
-- defhandler_coff
-- defhandler_elf
-- defhandler_lcall7
-- defhandler_libcso
-- fake_utsname
-- trace
-
-defhandler_coff
----------------
-
-defined value:
-	PER_SCOSVR3::
-
-		0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
-
-defhandler_elf
---------------
-
-defined value:
-	PER_LINUX::
-
-		0
-
-defhandler_lcall7
------------------
-
-defined value :
-	PER_SVR4::
-
-		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-defhandler_libsco
------------------
-
-defined value:
-	PER_SVR4::
-
-		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-fake_utsname
-------------
-
-Unused
-
-trace
------
-
-Unused
diff --git a/Documentation/sysctl/fs.rst b/Documentation/sysctl/fs.rst
deleted file mode 100644
index 2a45119e3331..000000000000
--- a/Documentation/sysctl/fs.rst
+++ /dev/null
@@ -1,384 +0,0 @@
-===============================
-Documentation for /proc/sys/fs/
-===============================
-
-kernel version 2.2.10
-
-Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-
-Copyright (c) 2009,        Shen Feng<shen@cn.fujitsu.com>
-
-For general info and legal blurb, please look in intro.rst.
-
-------------------------------------------------------------------------------
-
-This file contains documentation for the sysctl files in
-/proc/sys/fs/ and is valid for Linux kernel version 2.2.
-
-The files in this directory can be used to tune and monitor
-miscellaneous and general things in the operation of the Linux
-kernel. Since some of the files _can_ be used to screw up your
-system, it is advisable to read both documentation and source
-before actually making adjustments.
-
-1. /proc/sys/fs
-===============
-
-Currently, these files are in /proc/sys/fs:
-
-- aio-max-nr
-- aio-nr
-- dentry-state
-- dquot-max
-- dquot-nr
-- file-max
-- file-nr
-- inode-max
-- inode-nr
-- inode-state
-- nr_open
-- overflowuid
-- overflowgid
-- pipe-user-pages-hard
-- pipe-user-pages-soft
-- protected_fifos
-- protected_hardlinks
-- protected_regular
-- protected_symlinks
-- suid_dumpable
-- super-max
-- super-nr
-
-
-aio-nr & aio-max-nr
--------------------
-
-aio-nr is the running total of the number of events specified on the
-io_setup system call for all currently active aio contexts.  If aio-nr
-reaches aio-max-nr then io_setup will fail with EAGAIN.  Note that
-raising aio-max-nr does not result in the pre-allocation or re-sizing
-of any kernel data structures.
-
-
-dentry-state
-------------
-
-From linux/include/linux/dcache.h::
-
-  struct dentry_stat_t dentry_stat {
-        int nr_dentry;
-        int nr_unused;
-        int age_limit;         /* age in seconds */
-        int want_pages;        /* pages requested by system */
-        int nr_negative;       /* # of unused negative dentries */
-        int dummy;             /* Reserved for future use */
-  };
-
-Dentries are dynamically allocated and deallocated.
-
-nr_dentry shows the total number of dentries allocated (active
-+ unused). nr_unused shows the number of dentries that are not
-actively used, but are saved in the LRU list for future reuse.
-
-Age_limit is the age in seconds after which dcache entries
-can be reclaimed when memory is short and want_pages is
-nonzero when shrink_dcache_pages() has been called and the
-dcache isn't pruned yet.
-
-nr_negative shows the number of unused dentries that are also
-negative dentries which do not map to any files. Instead,
-they help speed up rejection of non-existing files provided
-by the users.
-
-
-dquot-max & dquot-nr
---------------------
-
-The file dquot-max shows the maximum number of cached disk
-quota entries.
-
-The file dquot-nr shows the number of allocated disk quota
-entries and the number of free disk quota entries.
-
-If the number of free cached disk quotas is very low and
-you have some awesome number of simultaneous system users,
-you might want to raise the limit.
-
-
-file-max & file-nr
-------------------
-
-The value in file-max denotes the maximum number of file-
-handles that the Linux kernel will allocate. When you get lots
-of error messages about running out of file handles, you might
-want to increase this limit.
-
-Historically, the kernel was able to allocate file handles
-dynamically, but not to free them again. The three values in
-file-nr denote the number of allocated file handles, the number
-of allocated but unused file handles, and the maximum number of
-file handles. Linux 2.6 always reports 0 as the number of free
-file handles -- this is not an error, it just means that the
-number of allocated file handles exactly matches the number of
-used file handles.
-
-Attempts to allocate more file descriptors than file-max are
-reported with printk, look for "VFS: file-max limit <number>
-reached".
-
-
-nr_open
--------
-
-This denotes the maximum number of file-handles a process can
-allocate. Default value is 1024*1024 (1048576) which should be
-enough for most machines. Actual limit depends on RLIMIT_NOFILE
-resource limit.
-
-
-inode-max, inode-nr & inode-state
----------------------------------
-
-As with file handles, the kernel allocates the inode structures
-dynamically, but can't free them yet.
-
-The value in inode-max denotes the maximum number of inode
-handlers. This value should be 3-4 times larger than the value
-in file-max, since stdin, stdout and network sockets also
-need an inode struct to handle them. When you regularly run
-out of inodes, you need to increase this value.
-
-The file inode-nr contains the first two items from
-inode-state, so we'll skip to that file...
-
-Inode-state contains three actual numbers and four dummies.
-The actual numbers are, in order of appearance, nr_inodes,
-nr_free_inodes and preshrink.
-
-Nr_inodes stands for the number of inodes the system has
-allocated, this can be slightly more than inode-max because
-Linux allocates them one pageful at a time.
-
-Nr_free_inodes represents the number of free inodes (?) and
-preshrink is nonzero when the nr_inodes > inode-max and the
-system needs to prune the inode list instead of allocating
-more.
-
-
-overflowgid & overflowuid
--------------------------
-
-Some filesystems only support 16-bit UIDs and GIDs, although in Linux
-UIDs and GIDs are 32 bits. When one of these filesystems is mounted
-with writes enabled, any UID or GID that would exceed 65535 is translated
-to a fixed value before being written to disk.
-
-These sysctls allow you to change the value of the fixed UID and GID.
-The default is 65534.
-
-
-pipe-user-pages-hard
---------------------
-
-Maximum total number of pages a non-privileged user may allocate for pipes.
-Once this limit is reached, no new pipes may be allocated until usage goes
-below the limit again. When set to 0, no limit is applied, which is the default
-setting.
-
-
-pipe-user-pages-soft
---------------------
-
-Maximum total number of pages a non-privileged user may allocate for pipes
-before the pipe size gets limited to a single page. Once this limit is reached,
-new pipes will be limited to a single page in size for this user in order to
-limit total memory usage, and trying to increase them using fcntl() will be
-denied until usage goes below the limit again. The default value allows
-allocating up to 1024 pipes at their default size. When set to 0, no limit is
-applied.
-
-
-protected_fifos
----------------
-
-The intent of this protection is to avoid unintentional writes to
-an attacker-controlled FIFO, where a program expected to create a regular
-file.
-
-When set to "0", writing to FIFOs is unrestricted.
-
-When set to "1" don't allow O_CREAT open on FIFOs that we don't own
-in world writable sticky directories, unless they are owned by the
-owner of the directory.
-
-When set to "2" it also applies to group writable sticky directories.
-
-This protection is based on the restrictions in Openwall.
-
-
-protected_hardlinks
---------------------
-
-A long-standing class of security issues is the hardlink-based
-time-of-check-time-of-use race, most commonly seen in world-writable
-directories like /tmp. The common method of exploitation of this flaw
-is to cross privilege boundaries when following a given hardlink (i.e. a
-root process follows a hardlink created by another user). Additionally,
-on systems without separated partitions, this stops unauthorized users
-from "pinning" vulnerable setuid/setgid files against being upgraded by
-the administrator, or linking to special files.
-
-When set to "0", hardlink creation behavior is unrestricted.
-
-When set to "1" hardlinks cannot be created by users if they do not
-already own the source file, or do not have read/write access to it.
-
-This protection is based on the restrictions in Openwall and grsecurity.
-
-
-protected_regular
------------------
-
-This protection is similar to protected_fifos, but it
-avoids writes to an attacker-controlled regular file, where a program
-expected to create one.
-
-When set to "0", writing to regular files is unrestricted.
-
-When set to "1" don't allow O_CREAT open on regular files that we
-don't own in world writable sticky directories, unless they are
-owned by the owner of the directory.
-
-When set to "2" it also applies to group writable sticky directories.
-
-
-protected_symlinks
-------------------
-
-A long-standing class of security issues is the symlink-based
-time-of-check-time-of-use race, most commonly seen in world-writable
-directories like /tmp. The common method of exploitation of this flaw
-is to cross privilege boundaries when following a given symlink (i.e. a
-root process follows a symlink belonging to another user). For a likely
-incomplete list of hundreds of examples across the years, please see:
-http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
-
-When set to "0", symlink following behavior is unrestricted.
-
-When set to "1" symlinks are permitted to be followed only when outside
-a sticky world-writable directory, or when the uid of the symlink and
-follower match, or when the directory owner matches the symlink's owner.
-
-This protection is based on the restrictions in Openwall and grsecurity.
-
-
-suid_dumpable:
---------------
-
-This value can be used to query and set the core dump mode for setuid
-or otherwise protected/tainted binaries. The modes are
-
-=   ==========  ===============================================================
-0   (default)	traditional behaviour. Any process which has changed
-		privilege levels or is execute only will not be dumped.
-1   (debug)	all processes dump core when possible. The core dump is
-		owned by the current user and no security is applied. This is
-		intended for system debugging situations only.
-		Ptrace is unchecked.
-		This is insecure as it allows regular users to examine the
-		memory contents of privileged processes.
-2   (suidsafe)	any binary which normally would not be dumped is dumped
-		anyway, but only if the "core_pattern" kernel sysctl is set to
-		either a pipe handler or a fully qualified path. (For more
-		details on this limitation, see CVE-2006-2451.) This mode is
-		appropriate when administrators are attempting to debug
-		problems in a normal environment, and either have a core dump
-		pipe handler that knows to treat privileged core dumps with
-		care, or specific directory defined for catching core dumps.
-		If a core dump happens without a pipe handler or fully
-		qualified path, a message will be emitted to syslog warning
-		about the lack of a correct setting.
-=   ==========  ===============================================================
-
-
-super-max & super-nr
---------------------
-
-These numbers control the maximum number of superblocks, and
-thus the maximum number of mounted filesystems the kernel
-can have. You only need to increase super-max if you need to
-mount more filesystems than the current value in super-max
-allows you to.
-
-
-aio-nr & aio-max-nr
--------------------
-
-aio-nr shows the current system-wide number of asynchronous io
-requests.  aio-max-nr allows you to change the maximum value
-aio-nr can grow to.
-
-
-mount-max
----------
-
-This denotes the maximum number of mounts that may exist
-in a mount namespace.
-
-
-
-2. /proc/sys/fs/binfmt_misc
-===========================
-
-Documentation for the files in /proc/sys/fs/binfmt_misc is
-in Documentation/admin-guide/binfmt-misc.rst.
-
-
-3. /proc/sys/fs/mqueue - POSIX message queues filesystem
-========================================================
-
-
-The "mqueue"  filesystem provides  the necessary kernel features to enable the
-creation of a  user space  library that  implements  the  POSIX message queues
-API (as noted by the  MSG tag in the  POSIX 1003.1-2001 version  of the System
-Interfaces specification.)
-
-The "mqueue" filesystem contains values for determining/setting  the amount of
-resources used by the file system.
-
-/proc/sys/fs/mqueue/queues_max is a read/write  file for  setting/getting  the
-maximum number of message queues allowed on the system.
-
-/proc/sys/fs/mqueue/msg_max  is  a  read/write file  for  setting/getting  the
-maximum number of messages in a queue value.  In fact it is the limiting value
-for another (user) limit which is set in mq_open invocation. This attribute of
-a queue must be less than or equal to msg_max.
-
-/proc/sys/fs/mqueue/msgsize_max is  a read/write  file for setting/getting the
-maximum  message size value (it is every  message queue's attribute set during
-its creation).
-
-/proc/sys/fs/mqueue/msg_default is  a read/write  file for setting/getting the
-default number of messages in a queue value if attr parameter of mq_open(2) is
-NULL. If it exceeds msg_max, the default value is initialized to msg_max.
-
-/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
-the default message size value if attr parameter of mq_open(2) is NULL. If it
-exceeds msgsize_max, the default value is initialized to msgsize_max.
-
-4. /proc/sys/fs/epoll - Configuration options for the epoll interface
-=====================================================================
-
-This directory contains configuration options for the epoll(7) interface.
-
-max_user_watches
-----------------
-
-Every epoll file descriptor can store a number of files to be monitored
-for event readiness. Each one of these monitored files constitutes a "watch".
-This configuration option sets the maximum number of "watches" that are
-allowed for each user.
-Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes
-on a 64bit one.
-The current default value for max_user_watches is 1/32 of the available
-low memory, divided by the "watch" cost in bytes.
diff --git a/Documentation/sysctl/index.rst b/Documentation/sysctl/index.rst
deleted file mode 100644
index efbcde8c1c9c..000000000000
--- a/Documentation/sysctl/index.rst
+++ /dev/null
@@ -1,100 +0,0 @@
-:orphan:
-
-===========================
-Documentation for /proc/sys
-===========================
-
-Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-
-------------------------------------------------------------------------------
-
-'Why', I hear you ask, 'would anyone even _want_ documentation
-for them sysctl files? If anybody really needs it, it's all in
-the source...'
-
-Well, this documentation is written because some people either
-don't know they need to tweak something, or because they don't
-have the time or knowledge to read the source code.
-
-Furthermore, the programmers who built sysctl have built it to
-be actually used, not just for the fun of programming it :-)
-
-------------------------------------------------------------------------------
-
-Legal blurb:
-
-As usual, there are two main things to consider:
-
-1. you get what you pay for
-2. it's free
-
-The consequences are that I won't guarantee the correctness of
-this document, and if you come to me complaining about how you
-screwed up your system because of wrong documentation, I won't
-feel sorry for you. I might even laugh at you...
-
-But of course, if you _do_ manage to screw up your system using
-only the sysctl options used in this file, I'd like to hear of
-it. Not only to have a great laugh, but also to make sure that
-you're the last RTFMing person to screw up.
-
-In short, e-mail your suggestions, corrections and / or horror
-stories to: <riel@nl.linux.org>
-
-Rik van Riel.
-
---------------------------------------------------------------
-
-Introduction
-============
-
-Sysctl is a means of configuring certain aspects of the kernel
-at run-time, and the /proc/sys/ directory is there so that you
-don't even need special tools to do it!
-In fact, there are only four things needed to use these config
-facilities:
-
-- a running Linux system
-- root access
-- common sense (this is especially hard to come by these days)
-- knowledge of what all those values mean
-
-As a quick 'ls /proc/sys' will show, the directory consists of
-several (arch-dependent?) subdirs. Each subdir is mainly about
-one part of the kernel, so you can do configuration on a piece
-by piece basis, or just some 'thematic frobbing'.
-
-This documentation is about:
-
-=============== ===============================================================
-abi/		execution domains & personalities
-debug/		<empty>
-dev/		device specific information (eg dev/cdrom/info)
-fs/		specific filesystems
-		filehandle, inode, dentry and quota tuning
-		binfmt_misc <Documentation/admin-guide/binfmt-misc.rst>
-kernel/		global kernel info / tuning
-		miscellaneous stuff
-net/		networking stuff, for documentation look in:
-		<Documentation/networking/>
-proc/		<empty>
-sunrpc/		SUN Remote Procedure Call (NFS)
-vm/		memory management tuning
-		buffer and cache management
-user/		Per user per user namespace limits
-=============== ===============================================================
-
-These are the subdirs I have on my system. There might be more
-or other subdirs in another setup. If you see another dir, I'd
-really like to hear about it :-)
-
-.. toctree::
-   :maxdepth: 1
-
-   abi
-   fs
-   kernel
-   net
-   sunrpc
-   user
-   vm
diff --git a/Documentation/sysctl/kernel.rst b/Documentation/sysctl/kernel.rst
deleted file mode 100644
index a0c1d4ce403a..000000000000
--- a/Documentation/sysctl/kernel.rst
+++ /dev/null
@@ -1,1177 +0,0 @@
-===================================
-Documentation for /proc/sys/kernel/
-===================================
-
-kernel version 2.2.10
-
-Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-
-Copyright (c) 2009,        Shen Feng<shen@cn.fujitsu.com>
-
-For general info and legal blurb, please look in index.rst.
-
-------------------------------------------------------------------------------
-
-This file contains documentation for the sysctl files in
-/proc/sys/kernel/ and is valid for Linux kernel version 2.2.
-
-The files in this directory can be used to tune and monitor
-miscellaneous and general things in the operation of the Linux
-kernel. Since some of the files _can_ be used to screw up your
-system, it is advisable to read both documentation and source
-before actually making adjustments.
-
-Currently, these files might (depending on your configuration)
-show up in /proc/sys/kernel:
-
-- acct
-- acpi_video_flags
-- auto_msgmni
-- bootloader_type	     [ X86 only ]
-- bootloader_version	     [ X86 only ]
-- cap_last_cap
-- core_pattern
-- core_pipe_limit
-- core_uses_pid
-- ctrl-alt-del
-- dmesg_restrict
-- domainname
-- hostname
-- hotplug
-- hardlockup_all_cpu_backtrace
-- hardlockup_panic
-- hung_task_panic
-- hung_task_check_count
-- hung_task_timeout_secs
-- hung_task_check_interval_secs
-- hung_task_warnings
-- hyperv_record_panic_msg
-- kexec_load_disabled
-- kptr_restrict
-- l2cr                        [ PPC only ]
-- modprobe                    ==> Documentation/debugging-modules.txt
-- modules_disabled
-- msg_next_id		      [ sysv ipc ]
-- msgmax
-- msgmnb
-- msgmni
-- nmi_watchdog
-- osrelease
-- ostype
-- overflowgid
-- overflowuid
-- panic
-- panic_on_oops
-- panic_on_stackoverflow
-- panic_on_unrecovered_nmi
-- panic_on_warn
-- panic_print
-- panic_on_rcu_stall
-- perf_cpu_time_max_percent
-- perf_event_paranoid
-- perf_event_max_stack
-- perf_event_mlock_kb
-- perf_event_max_contexts_per_stack
-- pid_max
-- powersave-nap               [ PPC only ]
-- printk
-- printk_delay
-- printk_ratelimit
-- printk_ratelimit_burst
-- pty                         ==> Documentation/filesystems/devpts.txt
-- randomize_va_space
-- real-root-dev               ==> Documentation/admin-guide/initrd.rst
-- reboot-cmd                  [ SPARC only ]
-- rtsig-max
-- rtsig-nr
-- sched_energy_aware
-- seccomp/                    ==> Documentation/userspace-api/seccomp_filter.rst
-- sem
-- sem_next_id		      [ sysv ipc ]
-- sg-big-buff                 [ generic SCSI device (sg) ]
-- shm_next_id		      [ sysv ipc ]
-- shm_rmid_forced
-- shmall
-- shmmax                      [ sysv ipc ]
-- shmmni
-- softlockup_all_cpu_backtrace
-- soft_watchdog
-- stack_erasing
-- stop-a                      [ SPARC only ]
-- sysrq                       ==> Documentation/admin-guide/sysrq.rst
-- sysctl_writes_strict
-- tainted                     ==> Documentation/admin-guide/tainted-kernels.rst
-- threads-max
-- unknown_nmi_panic
-- watchdog
-- watchdog_thresh
-- version
-
-
-acct:
-=====
-
-highwater lowwater frequency
-
-If BSD-style process accounting is enabled these values control
-its behaviour. If free space on filesystem where the log lives
-goes below <lowwater>% accounting suspends. If free space gets
-above <highwater>% accounting resumes. <Frequency> determines
-how often do we check the amount of free space (value is in
-seconds). Default:
-4 2 30
-That is, suspend accounting if <= 2% free space is left; resume it
-if we got >=4%; consider information about amount of free space
-valid for 30 seconds.
-
-
-acpi_video_flags:
-=================
-
-flags
-
-See Doc*/kernel/power/video.txt, it allows mode of video boot to be
-set during run time.
-
-
-auto_msgmni:
-============
-
-This variable has no effect and may be removed in future kernel
-releases. Reading it always returns 0.
-Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni
-upon memory add/remove or upon ipc namespace creation/removal.
-Echoing "1" into this file enabled msgmni automatic recomputing.
-Echoing "0" turned it off. auto_msgmni default value was 1.
-
-
-bootloader_type:
-================
-
-x86 bootloader identification
-
-This gives the bootloader type number as indicated by the bootloader,
-shifted left by 4, and OR'd with the low four bits of the bootloader
-version.  The reason for this encoding is that this used to match the
-type_of_loader field in the kernel header; the encoding is kept for
-backwards compatibility.  That is, if the full bootloader type number
-is 0x15 and the full version number is 0x234, this file will contain
-the value 340 = 0x154.
-
-See the type_of_loader and ext_loader_type fields in
-Documentation/x86/boot.rst for additional information.
-
-
-bootloader_version:
-===================
-
-x86 bootloader version
-
-The complete bootloader version number.  In the example above, this
-file will contain the value 564 = 0x234.
-
-See the type_of_loader and ext_loader_ver fields in
-Documentation/x86/boot.rst for additional information.
-
-
-cap_last_cap:
-=============
-
-Highest valid capability of the running kernel.  Exports
-CAP_LAST_CAP from the kernel.
-
-
-core_pattern:
-=============
-
-core_pattern is used to specify a core dumpfile pattern name.
-
-* max length 127 characters; default value is "core"
-* core_pattern is used as a pattern template for the output filename;
-  certain string patterns (beginning with '%') are substituted with
-  their actual values.
-* backward compatibility with core_uses_pid:
-
-	If core_pattern does not include "%p" (default does not)
-	and core_uses_pid is set, then .PID will be appended to
-	the filename.
-
-* corename format specifiers::
-
-	%<NUL>	'%' is dropped
-	%%	output one '%'
-	%p	pid
-	%P	global pid (init PID namespace)
-	%i	tid
-	%I	global tid (init PID namespace)
-	%u	uid (in initial user namespace)
-	%g	gid (in initial user namespace)
-	%d	dump mode, matches PR_SET_DUMPABLE and
-		/proc/sys/fs/suid_dumpable
-	%s	signal number
-	%t	UNIX time of dump
-	%h	hostname
-	%e	executable filename (may be shortened)
-	%E	executable path
-	%<OTHER> both are dropped
-
-* If the first character of the pattern is a '|', the kernel will treat
-  the rest of the pattern as a command to run.  The core dump will be
-  written to the standard input of that program instead of to a file.
-
-
-core_pipe_limit:
-================
-
-This sysctl is only applicable when core_pattern is configured to pipe
-core files to a user space helper (when the first character of
-core_pattern is a '|', see above).  When collecting cores via a pipe
-to an application, it is occasionally useful for the collecting
-application to gather data about the crashing process from its
-/proc/pid directory.  In order to do this safely, the kernel must wait
-for the collecting process to exit, so as not to remove the crashing
-process's proc files prematurely.  This in turn creates the
-possibility that a misbehaving userspace collecting process can block
-the reaping of a crashed process simply by never exiting.  This sysctl
-defends against that.  It defines how many concurrent crashing
-processes may be piped to user space applications in parallel.  If
-this value is exceeded, then those crashing processes above that value
-are noted via the kernel log and their cores are skipped.  0 is a
-special value, indicating that unlimited processes may be captured in
-parallel, but that no waiting will take place (i.e. the collecting
-process is not guaranteed access to /proc/<crashing pid>/).  This
-value defaults to 0.
-
-
-core_uses_pid:
-==============
-
-The default coredump filename is "core".  By setting
-core_uses_pid to 1, the coredump filename becomes core.PID.
-If core_pattern does not include "%p" (default does not)
-and core_uses_pid is set, then .PID will be appended to
-the filename.
-
-
-ctrl-alt-del:
-=============
-
-When the value in this file is 0, ctrl-alt-del is trapped and
-sent to the init(1) program to handle a graceful restart.
-When, however, the value is > 0, Linux's reaction to a Vulcan
-Nerve Pinch (tm) will be an immediate reboot, without even
-syncing its dirty buffers.
-
-Note:
-  when a program (like dosemu) has the keyboard in 'raw'
-  mode, the ctrl-alt-del is intercepted by the program before it
-  ever reaches the kernel tty layer, and it's up to the program
-  to decide what to do with it.
-
-
-dmesg_restrict:
-===============
-
-This toggle indicates whether unprivileged users are prevented
-from using dmesg(8) to view messages from the kernel's log buffer.
-When dmesg_restrict is set to (0) there are no restrictions. When
-dmesg_restrict is set to (1), users must have CAP_SYSLOG to use
-dmesg(8).
-
-The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
-default value of dmesg_restrict.
-
-
-domainname & hostname:
-======================
-
-These files can be used to set the NIS/YP domainname and the
-hostname of your box in exactly the same way as the commands
-domainname and hostname, i.e.::
-
-	# echo "darkstar" > /proc/sys/kernel/hostname
-	# echo "mydomain" > /proc/sys/kernel/domainname
-
-has the same effect as::
-
-	# hostname "darkstar"
-	# domainname "mydomain"
-
-Note, however, that the classic darkstar.frop.org has the
-hostname "darkstar" and DNS (Internet Domain Name Server)
-domainname "frop.org", not to be confused with the NIS (Network
-Information Service) or YP (Yellow Pages) domainname. These two
-domain names are in general different. For a detailed discussion
-see the hostname(1) man page.
-
-
-hardlockup_all_cpu_backtrace:
-=============================
-
-This value controls the hard lockup detector behavior when a hard
-lockup condition is detected as to whether or not to gather further
-debug information. If enabled, arch-specific all-CPU stack dumping
-will be initiated.
-
-0: do nothing. This is the default behavior.
-
-1: on detection capture more debug information.
-
-
-hardlockup_panic:
-=================
-
-This parameter can be used to control whether the kernel panics
-when a hard lockup is detected.
-
-   0 - don't panic on hard lockup
-   1 - panic on hard lockup
-
-See Documentation/lockup-watchdogs.txt for more information.  This can
-also be set using the nmi_watchdog kernel parameter.
-
-
-hotplug:
-========
-
-Path for the hotplug policy agent.
-Default value is "/sbin/hotplug".
-
-
-hung_task_panic:
-================
-
-Controls the kernel's behavior when a hung task is detected.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-0: continue operation. This is the default behavior.
-
-1: panic immediately.
-
-
-hung_task_check_count:
-======================
-
-The upper bound on the number of tasks that are checked.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-
-hung_task_timeout_secs:
-=======================
-
-When a task in D state did not get scheduled
-for more than this value report a warning.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-0: means infinite timeout - no checking done.
-
-Possible values to set are in range {0..LONG_MAX/HZ}.
-
-
-hung_task_check_interval_secs:
-==============================
-
-Hung task check interval. If hung task checking is enabled
-(see hung_task_timeout_secs), the check is done every
-hung_task_check_interval_secs seconds.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
-0 (default): means use hung_task_timeout_secs as checking interval.
-Possible values to set are in range {0..LONG_MAX/HZ}.
-
-
-hung_task_warnings:
-===================
-
-The maximum number of warnings to report. During a check interval
-if a hung task is detected, this value is decreased by 1.
-When this value reaches 0, no more warnings will be reported.
-This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
-
--1: report an infinite number of warnings.
-
-
-hyperv_record_panic_msg:
-========================
-
-Controls whether the panic kmsg data should be reported to Hyper-V.
-
-0: do not report panic kmsg data.
-
-1: report the panic kmsg data. This is the default behavior.
-
-
-kexec_load_disabled:
-====================
-
-A toggle indicating if the kexec_load syscall has been disabled. This
-value defaults to 0 (false: kexec_load enabled), but can be set to 1
-(true: kexec_load disabled). Once true, kexec can no longer be used, and
-the toggle cannot be set back to false. This allows a kexec image to be
-loaded before disabling the syscall, allowing a system to set up (and
-later use) an image without it being altered. Generally used together
-with the "modules_disabled" sysctl.
-
-
-kptr_restrict:
-==============
-
-This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces.
-
-When kptr_restrict is set to 0 (the default) the address is hashed before
-printing. (This is the equivalent to %p.)
-
-When kptr_restrict is set to (1), kernel pointers printed using the %pK
-format specifier will be replaced with 0's unless the user has CAP_SYSLOG
-and effective user and group ids are equal to the real ids. This is
-because %pK checks are done at read() time rather than open() time, so
-if permissions are elevated between the open() and the read() (e.g. via
-a setuid binary) then %pK will not leak kernel pointers to unprivileged
-users. Note, this is a temporary solution only. The correct long-term
-solution is to do the permission checks at open() time. Consider removing
-world read permissions from files that use %pK, and using dmesg_restrict
-to protect against uses of %pK in dmesg(8) if leaking kernel pointer
-values to unprivileged users is a concern.
-
-When kptr_restrict is set to (2), kernel pointers printed using
-%pK will be replaced with 0's regardless of privileges.
-
-
-l2cr: (PPC only)
-================
-
-This flag controls the L2 cache of G3 processor boards. If
-0, the cache is disabled. Enabled if nonzero.
-
-
-modules_disabled:
-=================
-
-A toggle value indicating if modules are allowed to be loaded
-in an otherwise modular kernel.  This toggle defaults to off
-(0), but can be set true (1).  Once true, modules can be
-neither loaded nor unloaded, and the toggle cannot be set back
-to false.  Generally used with the "kexec_load_disabled" toggle.
-
-
-msg_next_id, sem_next_id, and shm_next_id:
-==========================================
-
-These three toggles allow specifying the desired id for the next allocated IPC
-object: message, semaphore or shared memory respectively.
-
-By default they are equal to -1, which means generic allocation logic.
-Possible values to set are in range {0..INT_MAX}.
-
-Notes:
-  1) The kernel doesn't guarantee that the new object will have the desired id.
-     So it's up to userspace how to handle an object with a "wrong" id.
-  2) Toggle with non-default value will be set back to -1 by kernel after
-     successful IPC object allocation. If an IPC object allocation syscall
-     fails, it is undefined if the value remains unmodified or is reset to -1.
-
-
-nmi_watchdog:
-=============
-
-This parameter can be used to control the NMI watchdog
-(i.e. the hard lockup detector) on x86 systems.
-
-0 - disable the hard lockup detector
-
-1 - enable the hard lockup detector
-
-The hard lockup detector monitors each CPU for its ability to respond to
-timer interrupts. The mechanism utilizes CPU performance counter registers
-that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
-while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
-
-The NMI watchdog is disabled by default if the kernel is running as a guest
-in a KVM virtual machine. This default can be overridden by adding::
-
-   nmi_watchdog=1
-
-to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst).
-
-
-numa_balancing:
-===============
-
-Enables/disables automatic page fault based NUMA memory
-balancing. Memory is moved automatically to nodes
-that access it often.
-
-Enables/disables automatic NUMA memory balancing. On NUMA machines, there
-is a performance penalty if remote memory is accessed by a CPU. When this
-feature is enabled the kernel samples what task thread is accessing memory
-by periodically unmapping pages and later trapping a page fault. At the
-time of the page fault, it is determined if the data being accessed should
-be migrated to a local memory node.
-
-The unmapping of pages and trapping faults incur additional overhead that
-ideally is offset by improved memory locality but there is no universal
-guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls.
-
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-===============================================================================================================================
-
-
-Automatic NUMA balancing scans a task's address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running.  Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
-
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases.  The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases.  The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
-
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a task's memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
-
-numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
-scan a task's virtual memory. It effectively controls the maximum scanning
-rate for each task.
-
-numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
-when it initially forks.
-
-numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
-scan a task's virtual memory. It effectively controls the minimum scanning
-rate for each task.
-
-numa_balancing_scan_size_mb is how many megabytes worth of pages are
-scanned for a given scan.
-
-
-osrelease, ostype & version:
-============================
-
-::
-
-  # cat osrelease
-  2.1.88
-  # cat ostype
-  Linux
-  # cat version
-  #5 Wed Feb 25 21:49:24 MET 1998
-
-The files osrelease and ostype should be clear enough. Version
-needs a little more clarification however. The '#5' means that
-this is the fifth kernel built from this source base and the
-date behind it indicates the time the kernel was built.
-The only way to tune these values is to rebuild the kernel :-)
-
-
-overflowgid & overflowuid:
-==========================
-
-If your architecture did not always support 32-bit UIDs (i.e. arm,
-i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
-applications that use the old 16-bit UID/GID system calls, if the
-actual UID or GID would exceed 65535.
-
-These sysctls allow you to change the value of the fixed UID and GID.
-The default is 65534.
-
-
-panic:
-======
-
-The value in this file represents the number of seconds the kernel
-waits before rebooting on a panic. When you use the software watchdog,
-the recommended setting is 60.
-
-
-panic_on_io_nmi:
-================
-
-Controls the kernel's behavior when a CPU receives an NMI caused by
-an IO error.
-
-0: try to continue operation (default)
-
-1: panic immediately. The IO error triggered an NMI. This indicates a
-   serious system condition which could result in IO data corruption.
-   Rather than continuing, panicking might be a better choice. Some
-   servers issue this sort of NMI when the dump button is pushed,
-   and you can use this option to take a crash dump.
-
-
-panic_on_oops:
-==============
-
-Controls the kernel's behaviour when an oops or BUG is encountered.
-
-0: try to continue operation
-
-1: panic immediately.  If the `panic` sysctl is also non-zero then the
-   machine will be rebooted.
-
-
-panic_on_stackoverflow:
-=======================
-
-Controls the kernel's behavior when it detects an overflow of the
-kernel, IRQ or exception stacks (but not of a user stack).
-This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
-
-0: try to continue operation.
-
-1: panic immediately.
-
-
-panic_on_unrecovered_nmi:
-=========================
-
-The default Linux behaviour on an NMI of either memory or unknown is
-to continue operation. For many environments such as scientific
-computing it is preferable that the box is taken out and the error
-dealt with than an uncorrected parity/ECC error get propagated.
-
-A small number of systems do generate NMIs for bizarre random reasons
-such as power management so the default is off. That sysctl works like
-the existing panic controls already in that directory.
-
-
-panic_on_warn:
-==============
-
-Calls panic() in the WARN() path when set to 1.  This is useful to avoid
-a kernel rebuild when attempting to kdump at the location of a WARN().
-
-0: only WARN(), default behaviour.
-
-1: call panic() after printing out WARN() location.
-
-
-panic_print:
-============
-
-Bitmask for printing system info when a panic happens. The user can choose a
-combination of the following bits:
-
-=====  ========================================
-bit 0  print all tasks info
-bit 1  print system memory info
-bit 2  print timer info
-bit 3  print locks info if CONFIG_LOCKDEP is on
-bit 4  print ftrace buffer
-=====  ========================================
-
-So for example, to print tasks and memory info on panic, the user can run::
-
-  echo 3 > /proc/sys/kernel/panic_print
-
-
-panic_on_rcu_stall:
-===================
-
-When set to 1, calls panic() after RCU stall detection messages. This
-is useful to determine the root cause of RCU stalls using a vmcore.
-
-0: do not panic() when RCU stall takes place, default behavior.
-
-1: panic() after printing RCU stall messages.
-
-
-perf_cpu_time_max_percent:
-==========================
-
-Hints to the kernel how much CPU time it should be allowed to
-use to handle perf sampling events.  If the perf subsystem
-is informed that its samples are exceeding this limit, it
-will drop its sampling frequency to attempt to reduce its CPU
-usage.
-
-Some perf sampling happens in NMIs.  If these samples
-unexpectedly take too long to execute, the NMIs can become
-stacked up next to each other so much that nothing else is
-allowed to execute.
-
-0:
-   disable the mechanism.  Do not monitor or correct perf's
-   sampling rate no matter how much CPU time it takes.
-
-1-100:
-   attempt to throttle perf's sample rate to this
-   percentage of CPU.  Note: the kernel calculates an
-   "expected" length of each sample event.  100 here means
-   100% of that expected length.  Even if this is set to
-   100, you may still see sample throttling if this
-   length is exceeded.  Set to 0 if you truly do not care
-   how much CPU is consumed.
-
-
-perf_event_paranoid:
-====================
-
-Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN).  The default value is 2.
-
-===  ==================================================================
- -1  Allow use of (almost) all events by all users
-
-     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
-
->=0  Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
-
-     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
-
->=1  Disallow CPU event access by users without CAP_SYS_ADMIN
-
->=2  Disallow kernel profiling by users without CAP_SYS_ADMIN
-===  ==================================================================
-
-
-perf_event_max_stack:
-=====================
-
-Controls maximum number of stack frames to copy for (attr.sample_type &
-PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
-'perf record -g' or 'perf trace --call-graph fp'.
-
-This can only be done when no events are in use that have callchains
-enabled, otherwise writing to this file will return -EBUSY.
-
-The default value is 127.
-
-
-perf_event_mlock_kb:
-====================
-
-Controls the size of the per-cpu ring buffer not counted against the mlock limit.
-
-The default value is 512 + 1 page
-
-
-perf_event_max_contexts_per_stack:
-==================================
-
-Controls maximum number of stack frame context entries for
-(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
-instance, when using 'perf record -g' or 'perf trace --call-graph fp'.
-
-This can only be done when no events are in use that have callchains
-enabled, otherwise writing to this file will return -EBUSY.
-
-The default value is 8.
-
-
-pid_max:
-========
-
-PID allocation wrap value.  When the kernel's next PID value
-reaches this value, it wraps back to a minimum PID value.
-PIDs of value pid_max or larger are not allocated.
-
-
-ns_last_pid:
-============
-
-The last pid allocated in the current pid namespace (the one in which the task
-using this sysctl lives). When selecting a pid for the next task on fork, the
-kernel tries to allocate a number starting from this one.
-
-
-powersave-nap: (PPC only)
-=========================
-
-If set, Linux-PPC will use the 'nap' mode of powersaving,
-otherwise the 'doze' mode will be used.
-
-==============================================================
-
-printk:
-=======
-
-The four values in printk denote: console_loglevel,
-default_message_loglevel, minimum_console_loglevel and
-default_console_loglevel respectively.
-
-These values influence printk() behavior when printing or
-logging error messages. See 'man 2 syslog' for more info on
-the different loglevels.
-
-- console_loglevel:
-	messages with a higher priority than
-	this will be printed to the console
-- default_message_loglevel:
-	messages without an explicit priority
-	will be printed with this priority
-- minimum_console_loglevel:
-	minimum (highest) value to which
-	console_loglevel can be set
-- default_console_loglevel:
-	default value for console_loglevel
-
-
-printk_delay:
-=============
-
-Delay each printk message by printk_delay milliseconds.
-
-Values from 0 to 10000 are allowed.
-
-
-printk_ratelimit:
-=================
-
-Some warning messages are rate limited. printk_ratelimit specifies
-the minimum length of time between these messages (in jiffies). By
-default we allow one every 5 seconds.
-
-A value of 0 will disable rate limiting.
-
-
-printk_ratelimit_burst:
-=======================
-
-While long term we enforce one message per printk_ratelimit
-seconds, we do allow a burst of messages to pass through.
-printk_ratelimit_burst specifies the number of messages we can
-send before ratelimiting kicks in.
-
-
-printk_devkmsg:
-===============
-
-Control the logging to /dev/kmsg from userspace:
-
-ratelimit:
-	default, ratelimited
-
-on: unlimited logging to /dev/kmsg from userspace
-
-off: logging to /dev/kmsg disabled
-
-The kernel command line parameter printk.devkmsg= overrides this and is
-a one-time setting until next reboot: once set, it cannot be changed by
-this sysctl interface anymore.
-
-
-randomize_va_space:
-===================
-
-This option can be used to select the type of process address
-space randomization that is used in the system, for architectures
-that support this feature.
-
-==  ===========================================================================
-0   Turn the process address space randomization off.  This is the
-    default for architectures that do not support this feature anyways,
-    and kernels that are booted with the "norandmaps" parameter.
-
-1   Make the addresses of mmap base, stack and VDSO page randomized.
-    This, among other things, implies that shared libraries will be
-    loaded to random addresses.  Also for PIE-linked binaries, the
-    location of code start is randomized.  This is the default if the
-    CONFIG_COMPAT_BRK option is enabled.
-
-2   Additionally enable heap randomization.  This is the default if
-    CONFIG_COMPAT_BRK is disabled.
-
-    There are a few legacy applications out there (such as some ancient
-    versions of libc.so.5 from 1996) that assume that brk area starts
-    just after the end of the code+bss.  These applications break when
-    start of the brk area is randomized.  There are however no known
-    non-legacy applications that would be broken this way, so for most
-    systems it is safe to choose full randomization.
-
-    Systems with ancient and/or broken binaries should be configured
-    with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
-    address space randomization.
-==  ===========================================================================
-
-
-reboot-cmd: (Sparc only)
-========================
-
-??? This seems to be a way to give an argument to the Sparc
-ROM/Flash boot loader. Maybe to tell it what to do after
-rebooting. ???
-
-
-rtsig-max & rtsig-nr:
-=====================
-
-The file rtsig-max can be used to tune the maximum number
-of POSIX realtime (queued) signals that can be outstanding
-in the system.
-
-rtsig-nr shows the number of RT signals currently queued.
-
-
-sched_energy_aware:
-===================
-
-Enables/disables Energy Aware Scheduling (EAS). EAS starts
-automatically on platforms where it can run (that is,
-platforms with asymmetric CPU topologies and having an Energy
-Model available). If your platform happens to meet the
-requirements for EAS but you do not want to use it, change
-this value to 0.
-
-
-sched_schedstats:
-=================
-
-Enables/disables scheduler statistics. Enabling this feature
-incurs a small amount of overhead in the scheduler but is
-useful for debugging and performance tuning.
-
-
-sg-big-buff:
-============
-
-This file shows the size of the generic SCSI (sg) buffer.
-You can't tune it just yet, but you could change it at
-compile time by editing include/scsi/sg.h and changing
-the value of SG_BIG_BUFF.
-
-There shouldn't be any reason to change this value. If
-you can come up with one, you probably know what you
-are doing anyway :)
-
-
-shmall:
-=======
-
-This parameter sets the total amount of shared memory pages that
-can be used system wide. Hence, SHMALL should always be at least
-ceil(shmmax/PAGE_SIZE).
-
-If you are not sure what the default PAGE_SIZE is on your Linux
-system, you can run the following command::
-
-	# getconf PAGE_SIZE
-
-
-shmmax:
-=======
-
-This value can be used to query and set the run time limit
-on the maximum shared memory segment size that can be created.
-Shared memory segments up to 1Gb are now supported in the
-kernel.  This value defaults to SHMMAX.
-
-
-shm_rmid_forced:
-================
-
-Linux lets you set resource limits, including how much memory one
-process can consume, via setrlimit(2).  Unfortunately, shared memory
-segments are allowed to exist without association with any process, and
-thus might not be counted against any resource limits.  If enabled,
-shared memory segments are automatically destroyed when their attach
-count becomes zero after a detach or a process termination.  It will
-also destroy segments that were created, but never attached to, on exit
-from the process.  The only use left for IPC_RMID is to immediately
-destroy an unattached segment.  Of course, this breaks the way things are
-defined, so some applications might stop working.  Note that this
-feature will do you no good unless you also configure your resource
-limits (in particular, RLIMIT_AS and RLIMIT_NPROC).  Most systems don't
-need this.
-
-Note that if you change this from 0 to 1, already created segments
-without users and with a dead originative process will be destroyed.
-
-
-sysctl_writes_strict:
-=====================
-
-Control how file position affects the behavior of updating sysctl values
-via the /proc/sys interface:
-
-  ==   ======================================================================
-  -1   Legacy per-write sysctl value handling, with no printk warnings.
-       Each write syscall must fully contain the sysctl value to be
-       written, and multiple writes on the same sysctl file descriptor
-       will rewrite the sysctl value, regardless of file position.
-   0   Same behavior as above, but warn about processes that perform writes
-       to a sysctl file descriptor when the file position is not 0.
-   1   (default) Respect file position when writing sysctl strings. Multiple
-       writes will append to the sysctl value buffer. Anything past the max
-       length of the sysctl value buffer will be ignored. Writes to numeric
-       sysctl entries must always be at file position 0 and the value must
-       be fully contained in the buffer sent in the write syscall.
-  ==   ======================================================================
-
-
-softlockup_all_cpu_backtrace:
-=============================
-
-This value controls the soft lockup detector thread's behavior
-when a soft lockup condition is detected as to whether or not
-to gather further debug information. If enabled, each cpu will
-be issued an NMI and instructed to capture a stack trace.
-
-This feature is only applicable for architectures which support
-NMI.
-
-0: do nothing. This is the default behavior.
-
-1: on detection capture more debug information.
-
-
-soft_watchdog:
-==============
-
-This parameter can be used to control the soft lockup detector.
-
-   0 - disable the soft lockup detector
-
-   1 - enable the soft lockup detector
-
-The soft lockup detector monitors CPUs for threads that are hogging the CPUs
-without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
-from running. The mechanism depends on the CPU's ability to respond to timer
-interrupts which are needed for the 'watchdog/N' threads to be woken up by
-the watchdog timer function, otherwise the NMI watchdog - if enabled - can
-detect a hard lockup condition.
-
-
-stack_erasing:
-==============
-
-This parameter can be used to control kernel stack erasing at the end
-of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.
-
-That erasing reduces the information which kernel stack leak bugs
-can reveal and blocks some uninitialized stack variable attacks.
-The tradeoff is the performance impact: on a single CPU system kernel
-compilation sees a 1% slowdown, other systems and workloads may vary.
-
-  0: kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
-
-  1: kernel stack erasing is enabled (default), it is performed before
-     returning to the userspace at the end of syscalls.
-
-
-tainted
-=======
-
-Non-zero if the kernel has been tainted. Numeric values, which can be
-ORed together. The letters are seen in "Tainted" line of Oops reports.
-
-======  =====  ==============================================================
-     1  `(P)`  proprietary module was loaded
-     2  `(F)`  module was force loaded
-     4  `(S)`  SMP kernel oops on an officially SMP incapable processor
-     8  `(R)`  module was force unloaded
-    16  `(M)`  processor reported a Machine Check Exception (MCE)
-    32  `(B)`  bad page referenced or some unexpected page flags
-    64  `(U)`  taint requested by userspace application
-   128  `(D)`  kernel died recently, i.e. there was an OOPS or BUG
-   256  `(A)`  an ACPI table was overridden by user
-   512  `(W)`  kernel issued warning
-  1024  `(C)`  staging driver was loaded
-  2048  `(I)`  workaround for bug in platform firmware applied
-  4096  `(O)`  externally-built ("out-of-tree") module was loaded
-  8192  `(E)`  unsigned module was loaded
- 16384  `(L)`  soft lockup occurred
- 32768  `(K)`  kernel has been live patched
- 65536  `(X)`  Auxiliary taint, defined for and used by distros
-131072  `(T)`  The kernel was built with the struct randomization plugin
-======  =====  ==============================================================
-
-See Documentation/admin-guide/tainted-kernels.rst for more information.
-
-
-threads-max:
-============
-
-This value controls the maximum number of threads that can be created
-using fork().
-
-During initialization the kernel sets this value such that even if the
-maximum number of threads is created, the thread structures occupy only
-a part (1/8th) of the available RAM pages.
-
-The minimum value that can be written to threads-max is 20.
-
-The maximum value that can be written to threads-max is given by the
-constant FUTEX_TID_MASK (0x3fffffff).
-
-If a value outside of this range is written to threads-max an error
-EINVAL occurs.
-
-The value written is checked against the available RAM pages. If the
-thread structures would occupy too much (more than 1/8th) of the
-available RAM pages threads-max is reduced accordingly.
-
-
-unknown_nmi_panic:
-==================
-
-The value in this file affects how NMIs are handled. When the
-value is non-zero, an unknown NMI is trapped and then a panic occurs. At
-that time, kernel debugging information is displayed on the console.
-
-For example, the NMI switch that most IA32 servers have triggers an unknown
-NMI.  If a system hangs, try pressing the NMI switch.
-
-
-watchdog:
-=========
-
-This parameter can be used to disable or enable the soft lockup detector
-_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
-
-   0 - disable both lockup detectors
-
-   1 - enable both lockup detectors
-
-The soft lockup detector and the NMI watchdog can also be disabled or
-enabled individually, using the soft_watchdog and nmi_watchdog parameters.
-If the watchdog parameter is read, for example by executing::
-
-   cat /proc/sys/kernel/watchdog
-
-the output of this command (0 or 1) shows the logical OR of soft_watchdog
-and nmi_watchdog.
-
-
-watchdog_cpumask:
-=================
-
-This value can be used to control on which cpus the watchdog may run.
-The default cpumask is all possible cores, but if NO_HZ_FULL is
-enabled in the kernel config, and cores are specified with the
-nohz_full= boot argument, those cores are excluded by default.
-Offline cores can be included in this mask, and if the core is later
-brought online, the watchdog will be started based on the mask value.
-
-Typically this value would only be touched in the nohz_full case
-to re-enable cores that by default were not running the watchdog,
-if a kernel lockup was suspected on those cores.
-
-The argument value is the standard cpulist format for cpumasks,
-so for example to enable the watchdog on cores 0, 2, 3, and 4 you
-might say::
-
-  echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
-
-
-watchdog_thresh:
-================
-
-This value can be used to control the frequency of hrtimer and NMI
-events and the soft and hard lockup thresholds. The default threshold
-is 10 seconds.
-
-The softlockup threshold is (2 * watchdog_thresh). Setting this
-tunable to zero will disable lockup detection altogether.
diff --git a/Documentation/sysctl/net.rst b/Documentation/sysctl/net.rst
deleted file mode 100644
index a7d44e71019d..000000000000
--- a/Documentation/sysctl/net.rst
+++ /dev/null
@@ -1,461 +0,0 @@
-================================
-Documentation for /proc/sys/net/
-================================
-
-Copyright
-
-Copyright (c) 1999
-
-	- Terrehon Bowden <terrehon@pacbell.net>
-	- Bodo Bauer <bb@ricochet.net>
-
-Copyright (c) 2000
-
-	- Jorge Nerin <comandante@zaralinux.com>
-
-Copyright (c) 2009
-
-	- Shen Feng <shen@cn.fujitsu.com>
-
-For general info and legal blurb, please look in index.rst.
-
-------------------------------------------------------------------------------
-
-This file contains the documentation for the sysctl files in
-/proc/sys/net
-
-The interface to the networking parts of the kernel is located in
-/proc/sys/net. The following table shows all possible subdirectories. You may
-see only some of them, depending on your kernel's configuration.
-
-
-Table : Subdirectories in /proc/sys/net
-
- ========= =================== = ========== ==================
- Directory Content               Directory  Content
- ========= =================== = ========== ==================
- core      General parameter     appletalk  Appletalk protocol
- unix      Unix domain sockets   netrom     NET/ROM
- 802       E802 protocol         ax25       AX25
- ethernet  Ethernet protocol     rose       X.25 PLP layer
- ipv4      IP version 4          x25        X.25 protocol
- ipx       IPX                   token-ring IBM token ring
- bridge    Bridging              decnet     DEC net
- ipv6      IP version 6          tipc       TIPC
- ========= =================== = ========== ==================
-
-1. /proc/sys/net/core - Network core options
-============================================
-
-bpf_jit_enable
---------------
-
-This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
-and efficient infrastructure that allows bytecode to be executed at various
-hook points. It is used in a number of Linux kernel subsystems such
-as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints)
-and security (e.g. seccomp). LLVM has a BPF back end that can compile
-restricted C into a sequence of BPF instructions. After program load
-through bpf(2) and passing a verifier in the kernel, a JIT will then
-translate these BPF proglets into native CPU instructions. There are
-two flavors of JITs, the newer eBPF JIT currently supported on:
-
-  - x86_64
-  - x86_32
-  - arm64
-  - arm32
-  - ppc64
-  - sparc64
-  - mips64
-  - s390x
-  - riscv
-
-And the older cBPF JIT supported on the following archs:
-
-  - mips
-  - ppc
-  - sparc
-
-eBPF JITs are a superset of cBPF JITs, meaning the kernel will
-migrate cBPF instructions into eBPF instructions and then JIT
-compile them transparently. Older cBPF JITs can only translate
-tcpdump filters, seccomp rules, etc., but not the eBPF programs mentioned
-above, which are loaded through bpf(2).
-
-Values:
-
-	- 0 - disable the JIT (default value)
-	- 1 - enable the JIT
-	- 2 - enable the JIT and ask the compiler to emit traces on kernel log.
-
-bpf_jit_harden
---------------
-
-This enables hardening for the BPF JIT compiler. Supported are eBPF
-JIT backends. Enabling hardening trades off performance, but can
-mitigate JIT spraying.
-
-Values:
-
-	- 0 - disable JIT hardening (default value)
-	- 1 - enable JIT hardening for unprivileged users only
-	- 2 - enable JIT hardening for all users
-
-bpf_jit_kallsyms
-----------------
-
-When the BPF JIT compiler is enabled, the compiled images are at addresses
-unknown to the kernel, meaning they neither show up in traces nor
-in /proc/kallsyms. This enables export of these addresses, which can
-be used for debugging/tracing. If bpf_jit_harden is enabled, this
-feature is disabled.
-
-Values:
-
-	- 0 - disable JIT kallsyms export (default value)
-	- 1 - enable JIT kallsyms export for privileged users only
-
-bpf_jit_limit
--------------
-
-This enforces a global limit for memory allocations to the BPF JIT
-compiler in order to reject unprivileged JIT requests once it has
-been surpassed. bpf_jit_limit contains the value of the global limit
-in bytes.
-
-dev_weight
-----------
-
-The maximum number of packets that the kernel can handle on a NAPI interrupt.
-It is a per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
-aggregated packet is counted as one packet in this context.
-
-Default: 64
-
-dev_weight_rx_bias
-------------------
-
-RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
-of the driver for the per softirq cycle netdev_budget. This parameter influences
-the proportion of the configured netdev_budget that is spent on RPS based packet
-processing during RX softirq cycles. It is further meant for making current
-dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
-(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
-on dev_weight and is calculated multiplicatively (dev_weight * dev_weight_rx_bias).
-
-Default: 1
-
-dev_weight_tx_bias
-------------------
-
-Scales the maximum number of packets that can be processed during a TX softirq cycle.
-Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
-net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
-
-Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
-
-Default: 1
-
-default_qdisc
--------------
-
-The default queuing discipline to use for network devices. This allows
-overriding the default of pfifo_fast with an alternative. Since the default
-queuing discipline is created without additional parameters, it is best suited
-to queuing disciplines that work well without configuration like stochastic
-fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
-queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
-which require setting up classes and bandwidths. Note that physical multiqueue
-interfaces still use mq as root qdisc, which in turn uses this default for its
-leaves. Virtual devices (e.g. lo or veth) ignore this setting and instead
-default to noqueue.
-
-Default: pfifo_fast
-
-busy_read
----------
-
-Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
-Approximate time in us to busy loop waiting for packets on the device queue.
-This sets the default value of the SO_BUSY_POLL socket option.
-Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
-which is the preferred method of enabling. If you need to enable the feature
-globally via sysctl, a value of 50 is recommended.
-
-Will increase power usage.
-
-Default: 0 (off)
-
-busy_poll
-----------------
-Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
-Approximate time in us to busy loop waiting for events.
-Recommended value depends on the number of sockets you poll on.
-For several sockets 50, for several hundreds 100.
-For more than that you probably want to use epoll.
-Note that only sockets with SO_BUSY_POLL set will be busy polled,
-so you want to either selectively set SO_BUSY_POLL on those sockets or set
-the net.core.busy_read sysctl globally.
-
-Will increase power usage.
-
-Default: 0 (off)
-
-rmem_default
-------------
-
-The default setting of the socket receive buffer in bytes.
-
-rmem_max
---------
-
-The maximum receive socket buffer size in bytes.
-
-tstamp_allow_data
------------------
-Allow processes to receive tx timestamps looped together with the original
-packet contents. If disabled, transmit timestamp requests from unprivileged
-processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
-
-Default: 1 (on)
-
-
-wmem_default
-------------
-
-The default setting (in bytes) of the socket send buffer.
-
-wmem_max
---------
-
-The maximum send socket buffer size in bytes.
-
-message_burst and message_cost
-------------------------------
-
-These parameters are used to limit the warning messages written to the kernel
-log from the networking code. They enforce a rate limit to make a
-denial-of-service attack impossible. A higher message_cost factor results in
-fewer messages being written. message_burst controls when messages will
-be dropped. The default settings limit warning messages to one every five
-seconds.
-
-warnings
---------
-
-This sysctl is now unused.
-
-This was used to control console messages from the networking stack that
-occur because of problems on the network like duplicate address or bad
-checksums.
-
-These messages are now emitted at KERN_DEBUG and can generally be enabled
-and controlled by the dynamic_debug facility.
-
-netdev_budget
--------------
-
-Maximum number of packets taken from all interfaces in one polling cycle (NAPI
-poll). In one polling cycle interfaces which are registered to polling are
-probed in a round-robin manner. Also, a polling cycle may not exceed
-netdev_budget_usecs microseconds, even if netdev_budget has not been
-exhausted.
-
-netdev_budget_usecs
----------------------
-
-Maximum number of microseconds in one NAPI polling cycle. Polling
-will exit when either netdev_budget_usecs have elapsed during the
-poll cycle or the number of packets processed reaches netdev_budget.
-
-netdev_max_backlog
-------------------
-
-Maximum number of packets queued on the INPUT side when the interface
-receives packets faster than the kernel can process them.
-
-netdev_rss_key
---------------
-
-RSS (Receive Side Scaling) enabled drivers use a 40-byte host key that is
-randomly generated.
-Some user space might need to gather its content even if drivers do not
-provide ethtool -x support yet.
-
-::
-
-  myhost:~# cat /proc/sys/net/core/netdev_rss_key
-  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
-
-The file contains null bytes if no driver has ever called netdev_rss_key_fill().
-
-Note:
-  /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
-  but most drivers only use 40 bytes of it.
-
-::
-
-  myhost:~# ethtool -x eth0
-  RX flow hash indirection table for eth0 with 8 RX ring(s):
-      0:    0     1     2     3     4     5     6     7
-  RSS hash key:
-  84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
-
-netdev_tstamp_prequeue
-----------------------
-
-If set to 0, RX packet timestamps can be sampled after RPS processing, when
-the target CPU processes packets. This might delay the timestamps slightly, but
-allows the load to be distributed across several CPUs.
-
-If set to 1 (default), timestamps are sampled as soon as possible, before
-queueing.
-
-optmem_max
-----------
-
-Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
-of struct cmsghdr structures with appended data.
-
-fb_tunnels_only_for_init_net
-----------------------------
-
-Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
-sit0, ip6tnl0, ip6gre0) are automatically created when a new
-network namespace is created, if corresponding tunnel is present
-in initial network namespace.
-If set to 1, these devices are not automatically created, and
-user space is responsible for creating them if needed.
-
-Default : 0  (for compatibility reasons)
-
-devconf_inherit_init_net
-------------------------
-
-Controls if a new network namespace should inherit all current
-settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
-default, we keep the current behavior: for IPv4 we inherit all current
-settings from init_net and for IPv6 we reset all settings to default.
-
-If set to 1, both IPv4 and IPv6 settings are forced to inherit from
-current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
-forced to reset to their default values.
-
-Default : 0  (for compatibility reasons)
-
-2. /proc/sys/net/unix - Parameters for Unix domain sockets
-----------------------------------------------------------
-
-There is only one file in this directory.
-unix_dgram_qlen limits the maximum number of datagrams queued in a Unix domain
-socket's buffer. It will not take effect unless the PF_UNIX flag is specified.
-
-
-3. /proc/sys/net/ipv4 - IPv4 settings
--------------------------------------
-Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
-descriptions of these entries.
-
-
-4. Appletalk
-------------
-
-The /proc/sys/net/appletalk directory holds the Appletalk configuration data
-when Appletalk is loaded. The configurable parameters are:
-
-aarp-expiry-time
-----------------
-
-The amount of time we keep an ARP entry before expiring it. Used to age out
-old hosts.
-
-aarp-resolve-time
------------------
-
-The amount of time we will spend trying to resolve an Appletalk address.
-
-aarp-retransmit-limit
----------------------
-
-The number of times we will retransmit a query before giving up.
-
-aarp-tick-time
---------------
-
-Controls the rate at which expires are checked.
-
-The directory /proc/net/appletalk holds the list of active Appletalk sockets
-on a machine.
-
-The fields indicate the DDP type, the local address (in network:node format),
-the remote address, the size of the transmit pending queue, the size of the
-receive queue (bytes waiting for applications to read), the state and the uid
-owning the socket.
-
-/proc/net/atalk_iface lists all the interfaces configured for Appletalk. It
-shows the name of the interface, its Appletalk address, the network range on
-that address (or network number for phase 1 networks), and the status of the
-interface.
-
-/proc/net/atalk_route lists each known network route. It lists the target
-(network) that the route leads to, the router (may be directly connected), the
-route flags, and the device the route is using.
-
-
-5. IPX
-------
-
-The IPX protocol has no tunable values in /proc/sys/net.
-
-The IPX protocol does, however, provide /proc/net/ipx. This lists each IPX
-socket, giving the local and remote addresses in Novell format (that is,
-network:node:port). In accordance with the strange Novell tradition,
-everything but the port is in hex. Not_Connected is displayed for sockets that
-are not tied to a specific remote address. The Tx and Rx queue sizes indicate
-the number of bytes pending for transmission and reception. The state
-indicates the state the socket is in and the uid is the owning uid of the
-socket.
-
-The /proc/net/ipx_interface file lists all IPX interfaces. For each interface
-it gives the network number, the node number, and indicates if the network is
-the primary network. It also indicates which device it is bound to (or
-Internal for internal networks) and the Frame Type if appropriate. Linux
-supports 802.3, 802.2, 802.2 SNAP and DIX (Blue Book) ethernet framing for
-IPX.
-
-The /proc/net/ipx_route table holds a list of IPX routes. For each route it
-gives the destination network, the router node (or Directly) and the network
-address of the router (or Connected) for internal networks.
-
-6. TIPC
--------
-
-tipc_rmem
----------
-
-The TIPC protocol now has a tunable for the receive memory, similar to the
-tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)
-
-::
-
-    # cat /proc/sys/net/tipc/tipc_rmem
-    4252725 34021800        68043600
-    #
-
-The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
-are scaled (shifted) versions of that same value.  Note that the min value
-is not at this point in time used in any meaningful way, but the triplet is
-preserved in order to be consistent with things like tcp_rmem.
-
-named_timeout
--------------
-
-TIPC name table updates are distributed asynchronously in a cluster, without
-any form of transaction handling. This means that different race scenarios are
-possible. One such is that a name withdrawal sent out by one node and received
-by another node may arrive after a second, overlapping name publication already
-has been accepted from a third node, although the conflicting updates
-originally may have been issued in the correct sequential order.
-If named_timeout is nonzero, failed topology updates will be placed on a defer
-queue until another event arrives that clears the error, or until the timeout
-expires. Value is in milliseconds.
diff --git a/Documentation/sysctl/sunrpc.rst b/Documentation/sysctl/sunrpc.rst
deleted file mode 100644
index 09780a682afd..000000000000
--- a/Documentation/sysctl/sunrpc.rst
+++ /dev/null
@@ -1,25 +0,0 @@
-===================================
-Documentation for /proc/sys/sunrpc/
-===================================
-
-kernel version 2.2.10
-
-Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-
-For general info and legal blurb, please look in index.rst.
-
-------------------------------------------------------------------------------
-
-This file contains the documentation for the sysctl files in
-/proc/sys/sunrpc and is valid for Linux kernel version 2.2.
-
-The files in this directory can be used to (re)set the debug
-flags of the SUN Remote Procedure Call (RPC) subsystem in
-the Linux kernel. This stuff is used for NFS, KNFSD and
-maybe a few other things as well.
-
-The files in there are used to control the debugging flags:
-rpc_debug, nfs_debug, nfsd_debug and nlm_debug.
-
-These flags are for kernel hackers only. You should read the
-source code in net/sunrpc/ for more information.
diff --git a/Documentation/sysctl/user.rst b/Documentation/sysctl/user.rst
deleted file mode 100644
index 650eaa03f15e..000000000000
--- a/Documentation/sysctl/user.rst
+++ /dev/null
@@ -1,78 +0,0 @@
-=================================
-Documentation for /proc/sys/user/
-=================================
-
-kernel version 4.9.0
-
-Copyright (c) 2016		Eric Biederman <ebiederm@xmission.com>
-
-------------------------------------------------------------------------------
-
-This file contains the documentation for the sysctl files in
-/proc/sys/user.
-
-The files in this directory can be used to override the default
-limits on the number of namespaces and other objects that have
-per-user, per-user-namespace limits.
-
-The primary purpose of these limits is to stop programs that
-malfunction and attempt to create a ridiculous number of objects,
-before the malfunction becomes a system wide problem.  It is the
-intention that the defaults of these limits are set high enough that
-no program in normal operation should run into these limits.
-
-The creation of per-user, per-user-namespace objects is charged to
-the user in the user namespace who created the object and
-verified to be below the per-user limit in that user namespace.
-
-The creation of objects is also charged to all of the users
-who created the user namespaces in which the object is created
-(user namespaces can be nested) and verified to be below the per-user
-limits in the user namespaces of those users.
-
-This recursive counting of created objects ensures that creating a
-user namespace does not allow a user to escape their current limits.
-
-Currently, these files are in /proc/sys/user:
-
-max_cgroup_namespaces
-=====================
-
-  The maximum number of cgroup namespaces that any user in the current
-  user namespace may create.
-
-max_ipc_namespaces
-==================
-
-  The maximum number of ipc namespaces that any user in the current
-  user namespace may create.
-
-max_mnt_namespaces
-==================
-
-  The maximum number of mount namespaces that any user in the current
-  user namespace may create.
-
-max_net_namespaces
-==================
-
-  The maximum number of network namespaces that any user in the
-  current user namespace may create.
-
-max_pid_namespaces
-==================
-
-  The maximum number of pid namespaces that any user in the current
-  user namespace may create.
-
-max_user_namespaces
-===================
-
-  The maximum number of user namespaces that any user in the current
-  user namespace may create.
-
-max_uts_namespaces
-==================
-
-  The maximum number of uts namespaces that any user in the current
-  user namespace may create.
diff --git a/Documentation/sysctl/vm.rst b/Documentation/sysctl/vm.rst
deleted file mode 100644
index 5aceb5cd5ce7..000000000000
--- a/Documentation/sysctl/vm.rst
+++ /dev/null
@@ -1,964 +0,0 @@
-===============================
-Documentation for /proc/sys/vm/
-===============================
-
-kernel version 2.6.29
-
-Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>
-
-Copyright (c) 2008         Peter W. Morreale <pmorreale@novell.com>
-
-For general info and legal blurb, please look in index.rst.
-
-------------------------------------------------------------------------------
-
-This file contains the documentation for the sysctl files in
-/proc/sys/vm and is valid for Linux kernel version 2.6.29.
-
-The files in this directory can be used to tune the operation
-of the virtual memory (VM) subsystem of the Linux kernel and
-the writeout of dirty data to disk.
-
-Default values and initialization routines for most of these
-files can be found in mm/swap.c.
-
-Currently, these files are in /proc/sys/vm:
-
-- admin_reserve_kbytes
-- block_dump
-- compact_memory
-- compact_unevictable_allowed
-- dirty_background_bytes
-- dirty_background_ratio
-- dirty_bytes
-- dirty_expire_centisecs
-- dirty_ratio
-- dirtytime_expire_seconds
-- dirty_writeback_centisecs
-- drop_caches
-- extfrag_threshold
-- hugetlb_shm_group
-- laptop_mode
-- legacy_va_layout
-- lowmem_reserve_ratio
-- max_map_count
-- memory_failure_early_kill
-- memory_failure_recovery
-- min_free_kbytes
-- min_slab_ratio
-- min_unmapped_ratio
-- mmap_min_addr
-- mmap_rnd_bits
-- mmap_rnd_compat_bits
-- nr_hugepages
-- nr_hugepages_mempolicy
-- nr_overcommit_hugepages
-- nr_trim_pages         (only if CONFIG_MMU=n)
-- numa_zonelist_order
-- oom_dump_tasks
-- oom_kill_allocating_task
-- overcommit_kbytes
-- overcommit_memory
-- overcommit_ratio
-- page-cluster
-- panic_on_oom
-- percpu_pagelist_fraction
-- stat_interval
-- stat_refresh
-- numa_stat
-- swappiness
-- unprivileged_userfaultfd
-- user_reserve_kbytes
-- vfs_cache_pressure
-- watermark_boost_factor
-- watermark_scale_factor
-- zone_reclaim_mode
-
-
-admin_reserve_kbytes
-====================
-
-The amount of free memory in the system that should be reserved for users
-with the capability cap_sys_admin.
-
-admin_reserve_kbytes defaults to min(3% of free pages, 8MB)
-
-That should provide enough for the admin to log in and kill a process,
-if necessary, under the default overcommit 'guess' mode.
-
-Systems running under overcommit 'never' should increase this to account
-for the full Virtual Memory Size of programs used to recover. Otherwise,
-root may not be able to log in to recover the system.
-
-How do you calculate a minimum useful reserve?
-
-sshd or login + bash (or some other shell) + top (or ps, kill, etc.)
-
-For overcommit 'guess', we can sum resident set sizes (RSS).
-On x86_64 this is about 8MB.
-
-For overcommit 'never', we can take the max of their virtual sizes (VSZ)
-and add the sum of their RSS.
-On x86_64 this is about 128MB.
-
-Changing this takes effect whenever an application requests memory.
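-
-A sketch of the default calculation for two hypothetical amounts of free
-memory (the figures are assumptions chosen for illustration only)::
-
-  free memory = 128MB:  3% = 3.84MB   ->  min(3.84MB, 8MB)  = 3.84MB
-  free memory = 1GB:    3% = 30.72MB  ->  min(30.72MB, 8MB) = 8MB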
-
-
-block_dump
-==========
-
-block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/laptops/laptop-mode.rst.
-
-
-compact_memory
-==============
-
-Available only when CONFIG_COMPACTION is set. When 1 is written to the file,
-all zones are compacted such that free memory is available in contiguous
-blocks where possible. This can be important for example in the allocation of
-huge pages although processes will also directly compact memory as required.
-
-
-compact_unevictable_allowed
-===========================
-
-Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
-allowed to examine the unevictable lru (mlocked pages) for pages to compact.
-This should be used on systems where stalls for minor page faults are an
-acceptable trade for large contiguous free memory.  Set to 0 to prevent
-compaction from moving pages that are unevictable.  Default value is 1.
-
-
-dirty_background_bytes
-======================
-
-Contains the amount of dirty memory at which the background kernel
-flusher threads will start writeback.
-
-Note:
-  dirty_background_bytes is the counterpart of dirty_background_ratio. Only
-  one of them may be specified at a time. When one sysctl is written it is
-  immediately taken into account to evaluate the dirty memory limits and the
-  other appears as 0 when read.
-
-
-dirty_background_ratio
-======================
-
-Contains, as a percentage of total available memory that contains free pages
-and reclaimable pages, the number of pages at which the background kernel
-flusher threads will start writing out dirty data.
-
-The total available memory is not equal to total system memory.
-
-
-dirty_bytes
-===========
-
-Contains the amount of dirty memory at which a process generating disk writes
-will itself start writeback.
-
-Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be
-specified at a time. When one sysctl is written it is immediately taken into
-account to evaluate the dirty memory limits and the other appears as 0 when
-read.
-
-Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
-value lower than this limit will be ignored and the old configuration will be
-retained.
-
-
-dirty_expire_centisecs
-======================
-
-This tunable is used to define when dirty data is old enough to be eligible
-for writeout by the kernel flusher threads.  It is expressed in hundredths
-of a second.  Data which has been dirty in-memory for longer than this
-interval will be written out next time a flusher thread wakes up.
-
-
-dirty_ratio
-===========
-
-Contains, as a percentage of total available memory that contains free pages
-and reclaimable pages, the number of pages at which a process which is
-generating disk writes will itself start writing out dirty data.
-
-The total available memory is not equal to total system memory.
-
-
-dirtytime_expire_seconds
-========================
-
-When a lazytime inode is constantly having its pages dirtied, the inode with
-an updated timestamp will never get a chance to be written out.  And, if the
-only thing that has happened on the file system is a dirtytime inode caused
-by an atime update, a worker will be scheduled to make sure that inode
-eventually gets pushed out to disk.  This tunable is used to define when a dirty
-inode is old enough to be eligible for writeback by the kernel flusher threads.
-It is also used as the interval at which the dirtytime_writeback thread wakes up.
-
-
-dirty_writeback_centisecs
-=========================
-
-The kernel flusher threads will periodically wake up and write `old` data
-out to disk.  This tunable expresses the interval between those wakeups, in
-hundredths of a second.
-
-Setting this to zero disables periodic writeback altogether.
-
-
-drop_caches
-===========
-
-Writing to this will cause the kernel to drop clean caches, as well as
-reclaimable slab objects like dentries and inodes.  Once dropped, their
-memory becomes free.
-
-To free pagecache::
-
-	echo 1 > /proc/sys/vm/drop_caches
-
-To free reclaimable slab objects (includes dentries and inodes)::
-
-	echo 2 > /proc/sys/vm/drop_caches
-
-To free slab objects and pagecache::
-
-	echo 3 > /proc/sys/vm/drop_caches
-
-This is a non-destructive operation and will not free any dirty objects.
-To increase the number of objects freed by this operation, the user may run
-`sync` prior to writing to /proc/sys/vm/drop_caches.  This will minimize the
-number of dirty objects on the system and create more candidates to be
-dropped.
-
-This file is not a means to control the growth of the various kernel caches
-(inodes, dentries, pagecache, etc.).  These objects are automatically
-reclaimed by the kernel when memory is needed elsewhere on the system.
-
-Use of this file can cause performance problems.  Since it discards cached
-objects, it may cost a significant amount of I/O and CPU to recreate the
-dropped objects, especially if they were under heavy use.  Because of this,
-use outside of a testing or debugging environment is not recommended.
-
-You may see informational messages in your kernel log when this file is
-used::
-
-	cat (1234): drop_caches: 3
-
-These are informational only.  They do not mean that anything is wrong
-with your system.  To disable them, echo 4 (bit 2) into drop_caches.
-
-
-extfrag_threshold
-=================
-
-This parameter affects whether the kernel will compact memory or direct
-reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
-debugfs shows what the fragmentation index for each order is in each zone in
-the system. Values tending towards 0 imply allocations would fail due to lack
-of memory, values towards 1000 imply failures are due to fragmentation and -1
-implies that the allocation will succeed as long as watermarks are met.
-
-The kernel will not compact memory in a zone if the
-fragmentation index is <= extfrag_threshold. The default value is 500.
-
-
-highmem_is_dirtyable
-====================
-
-Available only for systems with CONFIG_HIGHMEM enabled (32b systems).
-
-This parameter controls whether the high memory is considered for dirty
-writers throttling.  This is not the case by default which means that
-only the amount of memory directly visible/usable by the kernel can
-be dirtied. As a result, on systems with a large amount of memory and
-lowmem basically depleted, writers might be throttled too early and
-streaming writes can get very slow.
-
-Changing the value to non zero would allow more memory to be dirtied
-and thus allow writers to write more data which can be flushed to the
-storage more effectively. Note this also comes with a risk of premature
-OOM kills because some writers (e.g. direct block device writes) can
-only use the low memory and they can fill it up with dirty data without
-any throttling.
-
-
-hugetlb_shm_group
-=================
-
-hugetlb_shm_group contains the group id that is allowed to create SysV
-shared memory segments using hugetlb pages.
-
-
-laptop_mode
-===========
-
-laptop_mode is a knob that controls "laptop mode". All the things that are
-controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst.
-
-
-legacy_va_layout
-================
-
-If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
-will use the legacy (2.4) layout for all processes.
-
-
-lowmem_reserve_ratio
-====================
-
-For some specialised workloads on highmem machines it is dangerous for
-the kernel to allow process memory to be allocated from the "lowmem"
-zone.  This is because that memory could then be pinned via the mlock()
-system call, or by unavailability of swapspace.
-
-And on large highmem machines this lack of reclaimable lowmem memory
-can be fatal.
-
-So the Linux page allocator has a mechanism which prevents allocations
-which *could* use highmem from using too much lowmem.  This means that
-a certain amount of lowmem is defended from the possibility of being
-captured into pinned user memory.
-
-(The same argument applies to the old 16 megabyte ISA DMA region.  This
-mechanism will also defend that region from allocations which could use
-highmem or lowmem).
-
-The `lowmem_reserve_ratio` tunable determines how aggressive the kernel is
-in defending these lower zones.
-
-If you have a machine which uses highmem or ISA DMA and your
-applications are using mlock(), or if you are running with no swap then
-you probably should change the lowmem_reserve_ratio setting.
-
-The lowmem_reserve_ratio is an array. You can see them by reading this file::
-
-	% cat /proc/sys/vm/lowmem_reserve_ratio
-	256     256     32
-
-However, these values are not used directly. The kernel calculates the number of
-protection pages for each zone from them. These are shown as an array of
-protection pages in /proc/zoneinfo, as in the following example from an x86-64
-box. Each zone has an array of protection pages like this::
-
-  Node 0, zone      DMA
-    pages free     1355
-          min      3
-          low      3
-          high     4
-	:
-	:
-      numa_other   0
-          protection: (0, 2004, 2004, 2004)
-	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-    pagesets
-      cpu: 0 pcp: 0
-          :
-
-These protections are added to the score used to judge whether this zone should
-be used for page allocation or should be reclaimed.
-
-In this example, if normal pages (index=2) are requested from this DMA zone and
-watermark[WMARK_HIGH] is used as the watermark, the kernel judges that this zone
-should not be used because pages_free (1355) is smaller than watermark +
-protection[2] (4 + 2004 = 2008). If this protection value were 0, this zone would
-be used for the normal page request. If the request is for the DMA zone
-(index=0), protection[0] (=0) is used.
-
-zone[i]'s protection[j] is calculated by the following expression::
-
-  (i < j):
-    zone[i]->protection[j]
-    = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
-      / lowmem_reserve_ratio[i];
-  (i = j):
-     (should not be protected) = 0;
-  (i > j):
-     (not necessary, but looks 0)
-
-The default values of lowmem_reserve_ratio[i] are
-
-    === ====================================
-    256 (if zone[i] means DMA or DMA32 zone)
-    32  (others)
-    === ====================================
-
-As the expression above shows, these values are the reciprocals of ratios.
-256 means 1/256. The number of protection pages becomes about 0.39% of the
-total managed pages of higher zones on the node.
-
-If you would like to protect more pages, smaller values are effective.
-The minimum value is 1 (1/1 -> 100%). A value less than 1 completely
-disables protection of the pages.
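-
-As a worked illustration of the expression above (the managed-page total is a
-hypothetical figure, chosen only so that the result matches the 2004 shown in
-the /proc/zoneinfo example)::
-
-  managed_pages summed over the higher zones = 513024   (assumed)
-  lowmem_reserve_ratio[DMA]                  = 256
-  DMA zone protection                        = 513024 / 256 = 2004
-
-  fraction protected = 1 / 256 = 0.0039 (about 0.39%)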
-
-
-max_map_count
-==============
-
-This file contains the maximum number of memory map areas a process
-may have. Memory map areas are used as a side-effect of calling
-malloc, directly by mmap, mprotect, and madvise, and also when loading
-shared libraries.
-
-While most applications need less than a thousand maps, certain
-programs, particularly malloc debuggers, may consume lots of them,
-e.g., up to one or two maps per allocation.
-
-The default value is 65530.
-
-
-memory_failure_early_kill
-==========================
-
-Control how to kill processes when an uncorrected memory error (typically
-a 2bit error in a memory module) that cannot be handled by the kernel is
-detected in the background by hardware. In some cases (like the page
-still having a valid copy on disk) the kernel will handle the failure
-transparently without affecting any applications. But if there is
-no other up-to-date copy of the data, it will kill processes to prevent any
-data corruption from propagating.
-
-1: Kill all processes that have the corrupted and not reloadable page mapped
-as soon as the corruption is detected.  Note this is not supported
-for a few types of pages, like kernel internally allocated data or
-the swap cache, but works for the majority of user pages.
-
-0: Only unmap the corrupted page from all processes and only kill a process
-that tries to access it.
-
-The kill is done using a catchable SIGBUS with BUS_MCEERR_AO, so processes can
-handle this if they want to.
-
-This is only active on architectures/platforms with advanced machine
-check handling and depends on the hardware capabilities.
-
-Applications can override this setting individually with the PR_MCE_KILL prctl.
-
-
-memory_failure_recovery
-=======================
-
-Enable memory failure recovery (when supported by the platform).
-
-1: Attempt recovery.
-
-0: Always panic on a memory failure.
-
-
-min_free_kbytes
-===============
-
-This is used to force the Linux VM to keep a minimum number
-of kilobytes free.  The VM uses this number to compute a
-watermark[WMARK_MIN] value for each lowmem zone in the system.
-Each lowmem zone gets a number of reserved free pages based
-proportionally on its size.
-
-Some minimal amount of memory is needed to satisfy PF_MEMALLOC
-allocations; if you set this to lower than 1024KB, your system will
-become subtly broken, and prone to deadlock under high loads.
-
-Setting this too high will OOM your machine instantly.
-
-
-min_slab_ratio
-==============
-
-This is available only on NUMA kernels.
-
-A percentage of the total pages in each zone.  On Zone reclaim
-(fallback from the local zone occurs) slabs will be reclaimed if more
-than this percentage of pages in a zone are reclaimable slab pages.
-This ensures that the slab growth stays under control even in NUMA
-systems that rarely perform global reclaim.
-
-The default is 5 percent.
-
-Note that slab reclaim is triggered in a per zone / node fashion.
-The process of reclaiming slab memory is currently not node specific
-and may not be fast.
-
-
-min_unmapped_ratio
-==================
-
-This is available only on NUMA kernels.
-
-This is a percentage of the total pages in each zone. Zone reclaim will
-only occur if more than this percentage of pages are in a state that
-zone_reclaim_mode allows to be reclaimed.
-
-If zone_reclaim_mode has the value 4 OR'd, then the percentage is compared
-against all file-backed unmapped pages including swapcache pages and tmpfs
-files. Otherwise, only unmapped pages backed by normal files but not tmpfs
-files and similar are considered.
-
-The default is 1 percent.
-
-
-mmap_min_addr
-=============
-
-This file indicates the amount of address space which a user process will
-be restricted from mmapping.  Since kernel null dereference bugs could
-accidentally operate based on the information in the first couple of pages
-of memory, userspace processes should not be allowed to write to them.  By
-default this value is set to 0 and no protections will be enforced by the
-security module.  Setting this value to something like 64k will allow the
-vast majority of applications to work correctly and provide defense in depth
-against future potential kernel bugs.
-
-
-mmap_rnd_bits
-=============
-
-This value can be used to select the number of bits to use to
-determine the random offset to the base address of vma regions
-resulting from mmap allocations on architectures which support
-tuning address space randomization.  This value will be bounded
-by the architecture's minimum and maximum supported values.
-
-This value can be changed after boot using the
-/proc/sys/vm/mmap_rnd_bits tunable
-
-
-mmap_rnd_compat_bits
-====================
-
-This value can be used to select the number of bits to use to
-determine the random offset to the base address of vma regions
-resulting from mmap allocations for applications run in
-compatibility mode on architectures which support tuning address
-space randomization.  This value will be bounded by the
-architecture's minimum and maximum supported values.
-
-This value can be changed after boot using the
-/proc/sys/vm/mmap_rnd_compat_bits tunable
-
-
-nr_hugepages
-============
-
-Change the minimum size of the hugepage pool.
-
-See Documentation/admin-guide/mm/hugetlbpage.rst
-
-
-nr_hugepages_mempolicy
-======================
-
-Change the size of the hugepage pool at run-time on a specific
-set of NUMA nodes.
-
-See Documentation/admin-guide/mm/hugetlbpage.rst
-
-
-nr_overcommit_hugepages
-=======================
-
-Change the maximum size of the hugepage pool. The maximum is
-nr_hugepages + nr_overcommit_hugepages.
-
-See Documentation/admin-guide/mm/hugetlbpage.rst
-
-
-nr_trim_pages
-=============
-
-This is available only on NOMMU kernels.
-
-This value adjusts the excess page trimming behaviour of power-of-2 aligned
-NOMMU mmap allocations.
-
-A value of 0 disables trimming of allocations entirely, while a value of 1
-trims excess pages aggressively. Any value >= 1 acts as the watermark where
-trimming of allocations is initiated.
-
-The default value is 1.
-
-See Documentation/nommu-mmap.txt for more information.
-
-
-numa_zonelist_order
-===================
-
-This sysctl is only for NUMA and it is deprecated. Anything but
-Node order will fail!
-
-'where the memory is allocated from' is controlled by zonelists.
-
-(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simple explanation.
-you may be able to read ZONE_DMA as ZONE_DMA32...)
-
-In non-NUMA case, a zonelist for GFP_KERNEL is ordered as following.
-ZONE_NORMAL -> ZONE_DMA
-This means that a memory allocation request for GFP_KERNEL will
-get memory from ZONE_DMA only when ZONE_NORMAL is not available.
-
-In NUMA case, you can think of following 2 types of order.
-Assume 2 node NUMA and below is zonelist of Node(0)'s GFP_KERNEL::
-
-  (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
-  (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.
-
-Type(A) offers the best locality for processes on Node(0), but ZONE_DMA
-will be used before ZONE_NORMAL exhaustion. This increases possibility of
-out-of-memory(OOM) of ZONE_DMA because ZONE_DMA is tend to be small.
-
-Type(B) cannot offer the best locality but is more robust against OOM of
-the DMA zone.
-
-Type(A) is called as "Node" order. Type (B) is "Zone" order.
-
-"Node order" orders the zonelists by node, then by zone within each node.
-Specify "[Nn]ode" for node order
-
-"Zone Order" orders the zonelists by zone type, then by node within each
-zone.  Specify "[Zz]one" for zone order.
-
-Specify "[Dd]efault" to request automatic configuration.
-
-On 32-bit, the Normal zone needs to be preserved for allocations accessible
-by the kernel, so "zone" order will be selected.
-
-On 64-bit, devices that require DMA32/DMA are relatively rare, so "node"
-order will be selected.
-
-Default order is recommended unless this is causing problems for your
-system/application.
-
-
-oom_dump_tasks
-==============
-
-Enables a system-wide task dump (excluding kernel threads) to be produced
-when the kernel performs an OOM-killing and includes such information as
-pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
-score, and name.  This is helpful to determine why the OOM killer was
-invoked, to identify the rogue task that caused it, and to determine why
-the OOM killer chose the task it did to kill.
-
-If this is set to zero, this information is suppressed.  On very
-large systems with thousands of tasks it may not be feasible to dump
-the memory state information for each one.  Such systems should not
-be forced to incur a performance penalty in OOM conditions when the
-information may not be desired.
-
-If this is set to non-zero, this information is shown whenever the
-OOM killer actually kills a memory-hogging task.
-
-The default value is 1 (enabled).
-
-
-oom_kill_allocating_task
-========================
-
-This enables or disables killing the OOM-triggering task in
-out-of-memory situations.
-
-If this is set to zero, the OOM killer will scan through the entire
-tasklist and select a task based on heuristics to kill.  This normally
-selects a rogue memory-hogging task that frees up a large amount of
-memory when killed.
-
-If this is set to non-zero, the OOM killer simply kills the task that
-triggered the out-of-memory condition.  This avoids the expensive
-tasklist scan.
-
-If panic_on_oom is selected, it takes precedence over whatever value
-is used in oom_kill_allocating_task.
-
-The default value is 0.
-
-
-overcommit_kbytes
-=================
-
-When overcommit_memory is set to 2, the committed address space is not
-permitted to exceed swap plus this amount of physical RAM. See below.
-
-Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one
-of them may be specified at a time. Setting one disables the other (which
-then appears as 0 when read).
-
-
-overcommit_memory
-=================
-
-This value contains a flag that enables memory overcommitment.
-
-When this flag is 0, the kernel attempts to estimate the amount
-of free memory left when userspace requests more memory.
-
-When this flag is 1, the kernel pretends there is always enough
-memory until it actually runs out.
-
-When this flag is 2, the kernel uses a "never overcommit"
-policy that attempts to prevent any overcommit of memory.
-Note that user_reserve_kbytes affects this policy.
-
-This feature can be very useful because there are a lot of
-programs that malloc() huge amounts of memory "just-in-case"
-and don't use much of it.
-
-The default value is 0.
-
-See Documentation/vm/overcommit-accounting.rst and
-mm/util.c::__vm_enough_memory() for more information.
-
-
-overcommit_ratio
-================
-
-When overcommit_memory is set to 2, the committed address
-space is not permitted to exceed swap plus this percentage
-of physical RAM.  See above.
-
-
-page-cluster
-============
-
-page-cluster controls the number of pages up to which consecutive pages
-are read in from swap in a single attempt. This is the swap counterpart
-to page cache readahead.
-The mentioned consecutivity is not in terms of virtual/physical addresses,
-but consecutive on swap space - that means they were swapped out together.
-
-It is a logarithmic value - setting it to zero means "1 page", setting
-it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
-Zero disables swap readahead completely.
-
-The default value is three (eight pages at a time).  There may be some
-small benefits in tuning this to a different value if your workload is
-swap-intensive.
-
-Lower values mean lower latencies for initial faults, but at the same time
-extra faults and I/O delays for following faults if they would have been part of
-that consecutive pages readahead would have brought in.
-
-
-panic_on_oom
-============
-
-This enables or disables panic on out-of-memory feature.
-
-If this is set to 0, the kernel will kill some rogue process,
-called oom_killer.  Usually, oom_killer can kill rogue processes and
-system will survive.
-
-If this is set to 1, the kernel panics when out-of-memory happens.
-However, if a process limits using nodes by mempolicy/cpusets,
-and those nodes become memory exhaustion status, one process
-may be killed by oom-killer. No panic occurs in this case.
-Because other nodes' memory may be free. This means system total status
-may be not fatal yet.
-
-If this is set to 2, the kernel panics compulsorily even on the
-above-mentioned. Even oom happens under memory cgroup, the whole
-system panics.
-
-The default value is 0.
-
-1 and 2 are for failover of clustering. Please select either
-according to your policy of failover.
-
-panic_on_oom=2+kdump gives you very strong tool to investigate
-why oom happens. You can get snapshot.
-
-
-percpu_pagelist_fraction
-========================
-
-This is the fraction of pages at most (high mark pcp->high) in each zone that
-are allocated for each per cpu page list.  The min value for this is 8.  It
-means that we don't allow more than 1/8th of pages in each zone to be
-allocated in any single per_cpu_pagelist.  This entry only changes the value
-of hot per cpu pagelists.  User can specify a number like 100 to allocate
-1/100th of each zone to each per cpu page list.
-
-The batch value of each per cpu pagelist is also updated as a result.  It is
-set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)
-
-The initial value is zero.  Kernel does not use this value at boot time to set
-the high water marks for each per cpu page list.  If the user writes '0' to this
-sysctl, it will revert to this default behavior.
-
-
-stat_interval
-=============
-
-The time interval between which vm statistics are updated.  The default
-is 1 second.
-
-
-stat_refresh
-============
-
-Any read or write (by root only) flushes all the per-cpu vm statistics
-into their global totals, for more accurate reports when testing
-e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
-
-As a side-effect, it also checks for negative totals (elsewhere reported
-as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
-(At time of writing, a few stats are known sometimes to be found negative,
-with no ill effects: errors and warnings on these stats are suppressed.)
-
-
-numa_stat
-=========
-
-This interface allows runtime configuration of numa statistics.
-
-When page allocation performance becomes a bottleneck and you can tolerate
-some possible tool breakage and decreased numa counter precision, you can
-do::
-
-	echo 0 > /proc/sys/vm/numa_stat
-
-When page allocation performance is not a bottleneck and you want all
-tooling to work, you can do::
-
-	echo 1 > /proc/sys/vm/numa_stat
-
-
-swappiness
-==========
-
-This control is used to define how aggressive the kernel will swap
-memory pages.  Higher values will increase aggressiveness, lower values
-decrease the amount of swap.  A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
-
-The default value is 60.
-
-
-unprivileged_userfaultfd
-========================
-
-This flag controls whether unprivileged users can use the userfaultfd
-system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
-
-The default value is 1.
-
-
-user_reserve_kbytes
-===================
-
-When overcommit_memory is set to 2, "never overcommit" mode, reserve
-min(3% of current process size, user_reserve_kbytes) of free memory.
-This is intended to prevent a user from starting a single memory hogging
-process, such that they cannot recover (kill the hog).
-
-user_reserve_kbytes defaults to min(3% of the current process size, 128MB).
-
-If this is reduced to zero, then the user will be allowed to allocate
-all free memory with a single process, minus admin_reserve_kbytes.
-Any subsequent attempts to execute a command will result in
-"fork: Cannot allocate memory".
-
-Changing this takes effect whenever an application requests memory.
-
-
-vfs_cache_pressure
-==================
-
-This percentage value controls the tendency of the kernel to reclaim
-the memory which is used for caching of directory and inode objects.
-
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
-
-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
-
-
-watermark_boost_factor
-======================
-
-This factor controls the level of reclaim when memory is being fragmented.
-It defines the percentage of the high watermark of a zone that will be
-reclaimed if pages of different mobility are being mixed within pageblocks.
-The intent is that compaction has less work to do in the future and to
-increase the success rate of future high-order allocations such as SLUB
-allocations, THP and hugetlbfs pages.
-
-To make it sensible with respect to the watermark_scale_factor
-parameter, the unit is in fractions of 10,000. The default value of
-15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
-watermark will be reclaimed in the event of a pageblock being mixed due
-to fragmentation. The level of reclaim is determined by the number of
-fragmentation events that occurred in the recent past. If this value is
-smaller than a pageblock then a pageblocks worth of pages will be reclaimed
-(e.g.  2MB on 64-bit x86). A boost factor of 0 will disable the feature.
-
-
-watermark_scale_factor
-======================
-
-This factor controls the aggressiveness of kswapd. It defines the
-amount of memory left in a node/system before kswapd is woken up and
-how much memory needs to be free before kswapd goes back to sleep.
-
-The unit is in fractions of 10,000. The default value of 10 means the
-distances between watermarks are 0.1% of the available memory in the
-node/system. The maximum value is 1000, or 10% of memory.
-
-A high rate of threads entering direct reclaim (allocstall) or kswapd
-going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
-that the number of free pages kswapd maintains for latency reasons is
-too small for the allocation bursts occurring in the system. This knob
-can then be used to tune kswapd aggressiveness accordingly.
-
-
-zone_reclaim_mode
-=================
-
-Zone_reclaim_mode allows someone to set more or less aggressive approaches to
-reclaim memory when a zone runs out of memory. If it is set to zero then no
-zone reclaim occurs. Allocations will be satisfied from other zones / nodes
-in the system.
-
-This is value OR'ed together of
-
-=	===================================
-1	Zone reclaim on
-2	Zone reclaim writes dirty pages out
-4	Zone reclaim swaps pages
-=	===================================
-
-zone_reclaim_mode is disabled by default.  For file servers or workloads
-that benefit from having their data cached, zone_reclaim_mode should be
-left disabled as the caching effect is likely to be more important than
-data locality.
-
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction.  The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
-
-Allowing zone reclaim to write out pages stops processes that are
-writing large amounts of data from dirtying pages on other nodes. Zone
-reclaim will write out dirty pages if a zone fills up and so effectively
-throttle the process. This may decrease the performance of a single process
-since it cannot use all of system memory to buffer the outgoing writes
-anymore but it preserve the memory on other nodes so that the performance
-of other processes running on other nodes will not be affected.
-
-Allowing regular swap effectively restricts allocations to the local
-node unless explicitly overridden by memory policies or cpuset
-configurations.
diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index 8ba656f37cd8..109052215bce 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -439,7 +439,7 @@ Compacting MLOCKED Pages
 
 The unevictable LRU can be scanned for compactable regions and the default
 behavior is to do so.  /proc/sys/vm/compact_unevictable_allowed controls
-this behavior (see Documentation/sysctl/vm.rst).  Once scanning of the
+this behavior (see Documentation/admin-guide/sysctl/vm.rst).  Once scanning of the
 unevictable LRU is enabled, the work of compaction is mostly handled by
 the page migration code and the same work flow as described in MIGRATING
 MLOCKED PAGES will apply.
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 4c3dcb718961..47d2651fd9dc 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -72,7 +72,7 @@ config PROC_SYSCTL
 	  interface is through /proc/sys.  If you say Y here a tree of
 	  modifiable sysctl entries will be generated beneath the
           /proc/sys directory. They are explained in the files
-	  in <file:Documentation/sysctl/>.  Note that enabling this
+	  in <file:Documentation/admin-guide/sysctl/>.  Note that enabling this
 	  option will enlarge the kernel by at least 8 KB.
 
 	  As it is generally a good thing, you should say Y here unless
diff --git a/kernel/panic.c b/kernel/panic.c
index e0ea74bbb41d..057540b6eee9 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -372,7 +372,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 /**
  * print_tainted - return a string to represent the kernel taint state.
  *
- * For individual taint flag meanings, see Documentation/sysctl/kernel.rst
+ * For individual taint flag meanings, see Documentation/admin-guide/sysctl/kernel.rst
  *
  * The string is overwritten by the next call to print_tainted(),
  * but is always NULL terminated.
diff --git a/mm/swap.c b/mm/swap.c
index 83a2a15f4836..ae300397dfda 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -8,7 +8,7 @@
 /*
  * This file contains the default values for the operation of the
  * Linux VM subsystem. Fine-tuning documentation can be found in
- * Documentation/sysctl/vm.rst.
+ * Documentation/admin-guide/sysctl/vm.rst.
  * Started 18.12.91
  * Swap aging added 23.2.95, Stephen Tweedie.
  * Buffermem limits added 12.3.98, Rik van Riel.
-- 
cgit v1.2.3-55-g7522


From 9e1cbede267916e737c4a755059418da3ac4de95 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 13 Jun 2019 15:07:43 -0300
Subject: docs: admin-guide: add laptops documentation

The docs under Documentation/laptops contain user-specific
information.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
---
 Documentation/ABI/testing/sysfs-block-device       |    2 +-
 .../ABI/testing/sysfs-platform-asus-laptop         |    2 +-
 Documentation/admin-guide/index.rst                |    1 +
 Documentation/admin-guide/kernel-parameters.txt    |    2 +-
 Documentation/admin-guide/laptops/asus-laptop.rst  |  271 ++++
 .../admin-guide/laptops/disk-shock-protection.rst  |  151 ++
 Documentation/admin-guide/laptops/index.rst        |   16 +
 Documentation/admin-guide/laptops/laptop-mode.rst  |  781 ++++++++++
 Documentation/admin-guide/laptops/lg-laptop.rst    |   84 ++
 Documentation/admin-guide/laptops/sony-laptop.rst  |  174 +++
 Documentation/admin-guide/laptops/sonypi.rst       |  160 ++
 .../admin-guide/laptops/thinkpad-acpi.rst          | 1562 ++++++++++++++++++++
 Documentation/admin-guide/laptops/toshiba_haps.rst |   87 ++
 Documentation/admin-guide/sysctl/vm.rst            |    4 +-
 Documentation/laptops/asus-laptop.rst              |  271 ----
 Documentation/laptops/disk-shock-protection.rst    |  151 --
 Documentation/laptops/index.rst                    |   17 -
 Documentation/laptops/laptop-mode.rst              |  781 ----------
 Documentation/laptops/lg-laptop.rst                |   85 --
 Documentation/laptops/sony-laptop.rst              |  174 ---
 Documentation/laptops/sonypi.rst                   |  160 --
 Documentation/laptops/thinkpad-acpi.rst            | 1562 --------------------
 Documentation/laptops/toshiba_haps.rst             |   87 --
 MAINTAINERS                                        |    4 +-
 drivers/char/Kconfig                               |    2 +-
 drivers/platform/x86/Kconfig                       |    4 +-
 26 files changed, 3297 insertions(+), 3298 deletions(-)
 create mode 100644 Documentation/admin-guide/laptops/asus-laptop.rst
 create mode 100644 Documentation/admin-guide/laptops/disk-shock-protection.rst
 create mode 100644 Documentation/admin-guide/laptops/index.rst
 create mode 100644 Documentation/admin-guide/laptops/laptop-mode.rst
 create mode 100644 Documentation/admin-guide/laptops/lg-laptop.rst
 create mode 100644 Documentation/admin-guide/laptops/sony-laptop.rst
 create mode 100644 Documentation/admin-guide/laptops/sonypi.rst
 create mode 100644 Documentation/admin-guide/laptops/thinkpad-acpi.rst
 create mode 100644 Documentation/admin-guide/laptops/toshiba_haps.rst
 delete mode 100644 Documentation/laptops/asus-laptop.rst
 delete mode 100644 Documentation/laptops/disk-shock-protection.rst
 delete mode 100644 Documentation/laptops/index.rst
 delete mode 100644 Documentation/laptops/laptop-mode.rst
 delete mode 100644 Documentation/laptops/lg-laptop.rst
 delete mode 100644 Documentation/laptops/sony-laptop.rst
 delete mode 100644 Documentation/laptops/sonypi.rst
 delete mode 100644 Documentation/laptops/thinkpad-acpi.rst
 delete mode 100644 Documentation/laptops/toshiba_haps.rst

diff --git a/Documentation/ABI/testing/sysfs-block-device b/Documentation/ABI/testing/sysfs-block-device
index 0d57bbb4fddc..17f2bc7dd261 100644
--- a/Documentation/ABI/testing/sysfs-block-device
+++ b/Documentation/ABI/testing/sysfs-block-device
@@ -45,7 +45,7 @@ Description:
 		- Values below -2 are rejected with -EINVAL
 
 		For more information, see
-		Documentation/laptops/disk-shock-protection.rst
+		Documentation/admin-guide/laptops/disk-shock-protection.rst
 
 
 What:		/sys/block/*/device/ncq_prio_enable
diff --git a/Documentation/ABI/testing/sysfs-platform-asus-laptop b/Documentation/ABI/testing/sysfs-platform-asus-laptop
index d67fa4bafa70..8b0e8205a6a2 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-laptop
+++ b/Documentation/ABI/testing/sysfs-platform-asus-laptop
@@ -31,7 +31,7 @@ Description:
 		To control the LED display, use the following :
 		    echo 0x0T000DDD > /sys/devices/platform/asus_laptop/
 		where T control the 3 letters display, and DDD the 3 digits display.
-		The DDD table can be found in Documentation/laptops/asus-laptop.rst
+		The DDD table can be found in Documentation/admin-guide/laptops/asus-laptop.rst
 
 What:		/sys/devices/platform/asus_laptop/bluetooth
 Date:		January 2007
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 5c6ae1ccee1a..6fcc83aaa9b6 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
    perf-security
    acpi/index
    device-mapper/index
+   laptops/index
 
 .. only::  subproject and html
 
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b323f5d4366a..4821175a3769 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4347,7 +4347,7 @@
 			Format: <integer>
 
 	sonypi.*=	[HW] Sony Programmable I/O Control Device driver
-			See Documentation/laptops/sonypi.rst
+			See Documentation/admin-guide/laptops/sonypi.rst
 
 	spectre_v2=	[X86] Control mitigation of Spectre variant 2
 			(indirect branch speculation) vulnerability.
diff --git a/Documentation/admin-guide/laptops/asus-laptop.rst b/Documentation/admin-guide/laptops/asus-laptop.rst
new file mode 100644
index 000000000000..95176321a25a
--- /dev/null
+++ b/Documentation/admin-guide/laptops/asus-laptop.rst
@@ -0,0 +1,271 @@
+==================
+Asus Laptop Extras
+==================
+
+Version 0.1
+
+August 6, 2009
+
+Corentin Chary <corentincj@iksaif.net>
+http://acpi4asus.sf.net/
+
+ This driver provides support for extra features of ACPI-compatible ASUS laptops.
+ It may also support some MEDION, JVC or VICTOR laptops (such as the MEDION 9675
+ or VICTOR XP7210). It makes all the extra buttons generate input
+ events (like keyboards).
+
+ On some models it adds support for changing the display brightness and output,
+ switching the LCD backlight on and off, and most importantly, allows you to
+ blink those fancy LEDs intended for reporting mail and wireless status.
+
+This driver supersedes the old asus_acpi driver.
+
+Requirements
+------------
+
+  Kernel 2.6.X sources, configured for your computer, with ACPI support.
+  You also need CONFIG_INPUT and CONFIG_ACPI.
+
+Status
+------
+
+ The features currently supported are the following (see below for
+ detailed description):
+
+ - Fn key combinations
+ - Bluetooth enable and disable
+ - Wlan enable and disable
+ - GPS enable and disable
+ - Video output switching
+ - Ambient Light Sensor on and off
+ - LED control
+ - LED Display control
+ - LCD brightness control
+ - LCD on and off
+
+ A compatibility table by model and feature is maintained on the web
+ site, http://acpi4asus.sf.net/.
+
+Usage
+-----
+
+  Try "modprobe asus-laptop". Check your dmesg (simply type dmesg). You should
+  see some lines like this::
+
+      Asus Laptop Extras version 0.42
+        - L2D model detected.
+
+  If it is not the output you have on your laptop, send it (and the laptop's
+  DSDT) to me.
+
+  That is all. The events generated by the hotkeys of your laptop should now
+  be reported via netlink events. You can check with
+  "acpi_genl monitor" (part of the acpica project).
+
+  Hotkeys are also reported as input keys (like keyboards); you can check
+  which keys are supported using "xev" under X11.
+
+  You can get information on the version of your DSDT table by reading the
+  /sys/devices/platform/asus-laptop/infos entry. If you have a question or a
+  bug report, please include the output of this entry.
+
+LEDs
+----
+
+  You can modify LEDs by echoing values to `/sys/class/leds/asus/*/brightness`::
+
+    echo 1 >  /sys/class/leds/asus::mail/brightness
+
+  will switch the mail LED on.
+
+  You can also check whether they are on or off by reading these files, and
+  use kernel triggers like disk-activity or heartbeat.
+
+Backlight
+---------
+
+  You can control LCD backlight power and brightness with
+  /sys/class/backlight/asus-laptop/. Brightness values are between 0 and 15.
+
+Wireless devices
+----------------
+
+  You can turn the internal Bluetooth adapter on/off with the bluetooth entry
+  (only on models with Bluetooth). This usually controls the associated LED.
+  Same for Wlan adapter.
+
+Display switching
+-----------------
+
+  Note: the display switching code is currently considered EXPERIMENTAL.
+
+  Switching works for the following models:
+
+    - L3800C
+    - A2500H
+    - L5800C
+    - M5200N
+    - W1000N (albeit with some glitches)
+    - M6700R
+    - A6JC
+    - F3J
+
+  Switching doesn't work for the following:
+
+    - M3700N
+    - L2X00D (locks the laptop under certain conditions)
+
+  To switch the displays, echo values from 0 to 15 to
+  /sys/devices/platform/asus-laptop/display. The significance of those values
+  is as follows:
+
+  +-------+-----+-----+-----+-----+-----+
+  | Bin   | Val | DVI | TV  | CRT | LCD |
+  +-------+-----+-----+-----+-----+-----+
+  | 0000  |   0 |     |     |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0001  |   1 |     |     |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 0010  |   2 |     |     |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0011  |   3 |     |     |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 0100  |   4 |     |  X  |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0101  |   5 |     |  X  |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 0110  |   6 |     |  X  |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 0111  |   7 |     |  X  |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1000  |   8 |  X  |     |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1001  |   9 |  X  |     |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1010  |  10 |  X  |     |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1011  |  11 |  X  |     |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1100  |  12 |  X  |  X  |     |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1101  |  13 |  X  |  X  |     |  X  |
+  +-------+-----+-----+-----+-----+-----+
+  | 1110  |  14 |  X  |  X  |  X  |     |
+  +-------+-----+-----+-----+-----+-----+
+  | 1111  |  15 |  X  |  X  |  X  |  X  |
+  +-------+-----+-----+-----+-----+-----+
+
+  In most cases, the appropriate displays must be plugged in for the above
+  combinations to work. TV-Out may need to be initialized at boot time.
+
+  Debugging:
+
+  1) Check whether the Fn+F8 key:
+
+     a) does not lock the laptop (try a boot with noapic / nolapic if it does)
+     b) generates events (0x6n, where n is the value corresponding to the
+        configuration above)
+     c) actually works
+
+     Record the disp value at every configuration.
+  2) Echo values from 0 to 15 to /sys/devices/platform/asus-laptop/display.
+     Record its value, note any change. If nothing changes, try a broader range,
+     up to 65535.
+  3) Send ANY output (both positive and negative reports are needed, unless your
+     machine is already listed above) to the acpi4asus-user mailing list.
+
+  Note: on some machines (e.g. L3C), after the module has been loaded, only 0x6n
+  events are generated and no actual switching occurs. In such a case, a line
+  like::
+
+    echo $((10#$arg-60)) > /sys/devices/platform/asus-laptop/display
+
+  will usually do the trick ($arg is the 0000006n-like event passed to acpid).
+
+  Note: there is currently no reliable way to read display status on xxN
+  (Centrino) models.
+
+LED display
+-----------
+
+  Some models like the W1N have an LED display that can be used to display
+  several items of information.
+
+  LED display works for the following models:
+
+    - W1000N
+    - W1J
+
+  To control the LED display, use the following::
+
+    echo 0x0T000DDD > /sys/devices/platform/asus-laptop/
+
+  where T controls the 3-letter display, and DDD the 3-digit display,
+  according to the tables below::
+
+         DDD (digits)
+         000 to 999 = display digits
+         AAA        = ---
+         BBB to FFF = turn-off
+
+         T  (type)
+         0 = off
+         1 = dvd
+         2 = vcd
+         3 = mp3
+         4 = cd
+         5 = tv
+         6 = cpu
+         7 = vol
+
+  For example "echo 0x01000001 >/sys/devices/platform/asus-laptop/ledd"
+  would display "DVD001".
+
+Driver options
+--------------
+
+ Options can be passed to the asus-laptop driver using the standard
+ module argument syntax (<param>=<value> when passing the option to the
+ module or asus-laptop.<param>=<value> on the kernel boot line when
+ asus-laptop is statically linked into the kernel).
+
+	     wapf: WAPF defines the behavior of the Fn+Fx wlan key
+		   The significance of values is yet to be found, but
+		   most of the time:
+
+		   - 0x0 should do nothing
+		   - 0x1 should allow controlling the device with the Fn+Fx key.
+		   - 0x4 should send an ACPI event (0x88) while pressing the Fn+Fx key
+		   - 0x5 like 0x1 or 0x4
+
+ The default value is 0x1.
+
+Unsupported models
+------------------
+
+ These models will never be supported by this module, as they use a completely
+ different mechanism to handle LEDs and extra stuff (meaning we have no clue
+ how it works):
+
+ - ASUS A1300 (A1B), A1370D
+ - ASUS L7300G
+ - ASUS L8400
+
+Patches, Errors, Questions
+--------------------------
+
+ I appreciate any success or failure
+ reports, especially if they add to or correct the compatibility table.
+ Please include the following information in your report:
+
+ - Asus model name
+ - a copy of your ACPI tables, using the "acpidump" utility
+ - a copy of /sys/devices/platform/asus-laptop/infos
+ - which driver features work and which don't
+ - the observed behavior of non-working features
+
+ Any other comments or patches are also more than welcome.
+
+ acpi4asus-user@lists.sourceforge.net
+
+ http://sourceforge.net/projects/acpi4asus
diff --git a/Documentation/admin-guide/laptops/disk-shock-protection.rst b/Documentation/admin-guide/laptops/disk-shock-protection.rst
new file mode 100644
index 000000000000..e97c5f78d8c3
--- /dev/null
+++ b/Documentation/admin-guide/laptops/disk-shock-protection.rst
@@ -0,0 +1,151 @@
+==========================
+Hard disk shock protection
+==========================
+
+Author: Elias Oltmanns <eo@nebensachen.de>
+
+Last modified: 2008-10-03
+
+
+.. 0. Contents
+
+   1. Intro
+   2. The interface
+   3. References
+   4. CREDITS
+
+
+1. Intro
+--------
+
+ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature.
+Issuing this command should cause the drive to switch to idle mode and
+unload disk heads. This feature is being used in modern laptops in
+conjunction with accelerometers and appropriate software to implement
+a shock protection facility. The idea is to stop all I/O operations on
+the internal hard drive and park its heads on the ramp when critical
+situations are anticipated. The desire to have such a feature
+available on GNU/Linux systems was the original motivation for
+implementing a generic disk head parking interface in the Linux kernel.
+Please note, however, that other components have to be set up on your
+system in order to get disk shock protection working (see
+section 3. References below for pointers to more information about
+that).
+
+
+2. The interface
+----------------
+
+For each ATA device, the kernel exports the file
+`block/*/device/unload_heads` in sysfs (here assumed to be mounted under
+/sys). Access to `/sys/block/*/device/unload_heads` is denied with
+-EOPNOTSUPP if the device does not support the unload feature.
+Otherwise, writing an integer value to this file will take the heads
+of the respective drive off the platter and block all I/O operations
+for the specified number of milliseconds. When the timeout expires and
+no further disk head park request has been issued in the meantime,
+normal operation will be resumed. The maximum value accepted for a
+timeout is 30000 milliseconds. Exceeding this limit will return
+-EOVERFLOW, but heads will be parked anyway and the timeout will be
+set to 30 seconds. However, you can always change a timeout to any
+value between 0 and 30000 by issuing a subsequent head park request
+before the timeout of the previous one has expired. In particular, the
+total timeout can exceed 30 seconds and, more importantly, you can
+cancel a previously set timeout and resume normal operation
+immediately by specifying a timeout of 0. Values below -2 are rejected
+with -EINVAL (see below for the special meaning of -1 and -2). If the
+timeout specified for a recent head park request has not yet expired,
+reading from `/sys/block/*/device/unload_heads` will report the number
+of milliseconds remaining until normal operation will be resumed;
+otherwise, reading the unload_heads attribute will return 0.
+
+For example, do the following in order to park the heads of drive
+/dev/sda and stop all I/O operations for five seconds::
+
+	# echo 5000 > /sys/block/sda/device/unload_heads
+
+A simple::
+
+	# cat /sys/block/sda/device/unload_heads
+
+will show you how many milliseconds are left before normal operation
+will be resumed.
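+
+As a sketch of the rules described above (using the same example drive,
+/dev/sda), a pending timeout can be replaced or cancelled by a subsequent
+write before it expires::
+
+	# echo 5000 > /sys/block/sda/device/unload_heads
+	# echo 10000 > /sys/block/sda/device/unload_heads
+	# echo 0 > /sys/block/sda/device/unload_heads
+
+The second write replaces the remaining timeout with ten seconds, and the
+final write of 0 cancels the timeout so that normal operation resumes
+immediately.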
+
+A word of caution: The fact that the interface operates on a basis of
+milliseconds may raise expectations that cannot be satisfied in
+reality. In fact, the ATA specs clearly state that the time for an
+unload operation to complete is vendor specific. The hint in ATA-7
+that this will typically be within 500 milliseconds apparently has
+been dropped in ATA-8.
+
+There is a technical detail of this implementation that may cause some
+confusion and should be discussed here. When a head park request has
+been issued to a device successfully, all I/O operations on the
+controller port this device is attached to will be deferred. That is
+to say, any other device that may be connected to the same port will
+be affected too. The only exception is that a subsequent head unload
+request to that other device will be executed immediately. Further
+operations on that port will be deferred until the timeout specified
+for either device on the port has expired. As far as PATA (old style
+IDE) configurations are concerned, there can only be two devices
+attached to any single port. In the SATA world we have port multipliers,
+which means that a user-issued head parking request to one device may
+actually result in stopping I/O to a whole bunch of devices. However,
+since this feature is supposed to be used on laptops and does not seem
+to be very useful in any other environment, there will mostly be one
+device per port. Even if the CD/DVD writer happens to be connected to
+the same port as the hard drive, it generally *should* recover just
+fine from the occasional buffer under-run incurred by a head park
+request to the HD. Actually, when you are using an IDE driver rather
+than its libata counterpart (i.e. your disk is called /dev/hda
+instead of /dev/sda), then parking the heads of one drive (drive X)
+will generally not affect the mode of operation of another drive
+(drive Y) on the same port as described above. It is only when a port
+reset is required to recover from an exception on drive Y that further
+I/O operations on that drive (and the reset itself) will be delayed
+until drive X is no longer in the parked state.
+
+Finally, there are some hard drives that only comply with an earlier
+version of the ATA standard than ATA-7, but do support the unload
+feature nonetheless. Unfortunately, there is no safe way Linux can
+detect these devices, so you won't be able to write to the
+unload_heads attribute. If you know that your device really does
+support the unload feature (for instance, because the vendor of your
+laptop or the hard drive itself told you so), then you can tell the
+kernel to enable the usage of this feature for that drive by writing
+the special value -1 to the unload_heads attribute. The command::
+
+	# echo -1 > /sys/block/sda/device/unload_heads
+
+will enable the feature for /dev/sda, and giving -2 instead of -1 will
+disable it again.
+
+
+3. References
+-------------
+
+There are several laptops from different vendors featuring shock
+protection capabilities. As manufacturers have refused to support open
+source development of the required software components so far, Linux
+support for shock protection varies considerably between different
+hardware implementations. Ideally, this section should contain a list
+of pointers to different projects aiming at an implementation of shock
+protection on different systems. Unfortunately, I only know of a
+single project which, although still considered experimental, is fit
+for use. Please feel free to add projects that have been the victims
+of my ignorance.
+
+- http://www.thinkwiki.org/wiki/HDAPS
+
+  See this page for information about Linux support of the hard disk
+  active protection system as implemented in IBM/Lenovo Thinkpads.
+
+
+4. CREDITS
+----------
+
+This implementation of disk head parking has been inspired by a patch
+originally published by Jon Escombe <lists@dresco.co.uk>. My efforts
+to develop an implementation of this feature that is fit to be merged
+into mainline have been aided by various kernel developers, in
+particular by Tejun Heo and Bartlomiej Zolnierkiewicz.
diff --git a/Documentation/admin-guide/laptops/index.rst b/Documentation/admin-guide/laptops/index.rst
new file mode 100644
index 000000000000..6b554e39863b
--- /dev/null
+++ b/Documentation/admin-guide/laptops/index.rst
@@ -0,0 +1,16 @@
+
+==============
+Laptop Drivers
+==============
+
+.. toctree::
+   :maxdepth: 1
+
+   asus-laptop
+   disk-shock-protection
+   laptop-mode
+   lg-laptop
+   sony-laptop
+   sonypi
+   thinkpad-acpi
+   toshiba_haps
diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst
new file mode 100644
index 000000000000..c984c4262f2e
--- /dev/null
+++ b/Documentation/admin-guide/laptops/laptop-mode.rst
@@ -0,0 +1,781 @@
+===============================================
+How to conserve battery power using laptop-mode
+===============================================
+
+Document Author: Bart Samwel (bart@samwel.tk)
+
+Date created: January 2, 2004
+
+Last modified: December 06, 2004
+
+Introduction
+------------
+
+Laptop mode is used to minimize the time that the hard disk needs to be spun up,
+to conserve battery power on laptops. It has been reported to yield significant
+power savings.
+
+.. Contents
+
+   * Introduction
+   * Installation
+   * Caveats
+   * The Details
+   * Tips & Tricks
+   * Control script
+   * ACPI integration
+   * Monitoring tool
+
+
+Installation
+------------
+
+To use laptop mode, you don't need to set any kernel configuration
+options. Simply install all the files included in this document, and
+laptop mode will automatically be started when you're on battery. For
+your convenience, a tarball containing an installer can be downloaded at:
+
+	http://www.samwel.tk/laptop_mode/laptop_mode/
+
+To configure laptop mode, you need to edit the configuration file, which is
+located in /etc/default/laptop-mode on Debian-based systems, or in
+/etc/sysconfig/laptop-mode on other systems.
+
+Unfortunately, automatic enabling of laptop mode does not work for
+laptops that don't have ACPI. On those laptops, you need to start laptop
+mode manually. To start laptop mode, run "laptop_mode start", and to
+stop it, run "laptop_mode stop". (Note: The laptop mode tools package now
+has experimental support for APM; you might want to try that first.)
+
+
+Caveats
+-------
+
+* The downside of laptop mode is that you have a chance of losing up to 10
+  minutes of work. If you cannot afford this, don't use it! The supplied ACPI
+  scripts automatically turn off laptop mode when the battery almost runs out,
+  so that you won't lose any data at the end of your battery life.
+
+* Most desktop hard drives have a very limited lifetime measured in spindown
+  cycles, typically about 50,000 times (it's usually listed on the spec sheet).
+  Check your drive's rating, and don't wear down your drive's lifetime if you
+  don't need to.
+
+* If you mount some of your ext3/reiserfs filesystems with the -n option, then
+  the control script will not be able to remount them correctly. You must set
+  DO_REMOUNTS=0 in the control script, otherwise it will remount them with the
+  wrong options -- or it will fail because it cannot write to /etc/mtab.
+
+* If you have your filesystems listed as type "auto" in fstab, like I did, then
+  the control script will not recognize them as filesystems that need remounting.
+  You must list the filesystems with their true type instead.
+
+* It has been reported that some versions of the mutt mail client use file access
+  times to determine whether a folder contains new mail. If you use mutt and
+  experience this, you must disable the noatime remounting by setting the option
+  DO_REMOUNT_NOATIME to 0 in the configuration file.
+
+
+The Details
+-----------
+
+Laptop mode is controlled by the knob /proc/sys/vm/laptop_mode. This knob is
+present for all kernels that have the laptop mode patch, regardless of any
+configuration options. When the knob is set, any physical disk I/O (that might
+have caused the hard disk to spin up) causes Linux to flush all dirty blocks. The
+result of this is that after a disk has spun down, it will not be spun up
+anymore to write dirty blocks, because those blocks had already been written
+immediately after the most recent read operation. The value of the laptop_mode
+knob determines the time between the occurrence of disk I/O and when the flush
+is triggered. A sensible value for the knob is 5 seconds. Setting the knob to
+0 disables laptop mode.
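+
+For illustration, assuming a kernel that provides the knob and a root
+shell, laptop mode can be switched on with the 5 second value mentioned
+above, inspected, and switched off again by hand::
+
+	# echo 5 > /proc/sys/vm/laptop_mode
+	# cat /proc/sys/vm/laptop_mode
+	# echo 0 > /proc/sys/vm/laptop_mode
+
+This only sets the knob itself; the other tunables are normally adjusted
+by the control script.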
+
+To increase the effectiveness of the laptop_mode strategy, the laptop_mode
+control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
+/proc/sys/vm to about 10 minutes (by default), which means that pages that are
+dirtied are not forced to be written to disk as often. The control script also
+changes the dirty background ratio, so that background writeback of dirty pages
+is not done anymore. Combined with a higher commit value (also 10 minutes) for
+ext3 or ReiserFS filesystems (also done automatically by the control script),
+this results in concentration of disk activity in a small time interval which
+occurs only once every 10 minutes, or whenever the disk is forced to spin up by
+a cache miss. The disk can then be spun down in the periods of inactivity.
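+
+As a worked example of the arithmetic (mirroring the AGE computation in the
+control script further down), the default of 10 minutes corresponds to::
+
+	MAX_AGE=600 seconds  ->  100 * 600 = 60000 centisecs
+
+which is the value the script writes to both dirty_expire_centisecs and
+dirty_writeback_centisecs.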
+
+If you want to find out which process caused the disk to spin up, you can
+gather information by setting the flag /proc/sys/vm/block_dump. When this flag
+is set, Linux reports all disk read and write operations that take place, and
+all block dirtyings done to files. This makes it possible to debug why a disk
+needs to spin up, and to increase battery life even more. The output of
+block_dump is written to the kernel output, and it can be retrieved using
+"dmesg". When you use block_dump and your kernel logging level also includes
+kernel debugging messages, you probably want to turn off klogd, otherwise
+the output of block_dump will be logged, causing disk activity that is not
+normally there.
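+
+A minimal sketch of such a debugging session, assuming a root shell (the
+klogd caveat above still applies)::
+
+	# echo 1 > /proc/sys/vm/block_dump
+	# dmesg | tail
+	# echo 0 > /proc/sys/vm/block_dump
+
+The final write of 0 clears the flag again so that the extra kernel
+messages stop.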
+
+
+Configuration
+-------------
+
+The laptop mode configuration file is located in /etc/default/laptop-mode on
+Debian-based systems, or in /etc/sysconfig/laptop-mode on other systems. It
+contains the following options:
+
+MAX_AGE:
+
+Maximum time, in seconds, of hard drive spindown time that you are
+comfortable with. Worst case, it's possible that you could lose this
+amount of work if your battery fails while you're in laptop mode.
+
+MINIMUM_BATTERY_MINUTES:
+
+Automatically disable laptop mode if the remaining number of minutes of
+battery power is less than this value. Default is 10 minutes.
+
+AC_HD/BATT_HD:
+
+The idle timeout that should be set on your hard drive when laptop mode
+is active (BATT_HD) and when it is not active (AC_HD). The defaults are
+20 seconds (value 4) for BATT_HD and 2 hours (value 244) for AC_HD. The
+possible values are those listed in the manual page for "hdparm" for the
+"-S" option.
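+
+As a worked example of that encoding (taken from the hdparm manual page,
+not from the control script): values from 1 to 240 specify multiples of
+5 seconds, and values from 241 to 251 specify units of 30 minutes::
+
+	BATT_HD=4    ->  4 * 5 seconds             =  20 seconds
+	AC_HD=244    ->  (244 - 240) * 30 minutes  =  2 hours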
+
+HD:
+
+The devices for which the spindown timeout should be adjusted by laptop mode.
+Default is /dev/hda. If you specify multiple devices, separate them by a space.
+
+READAHEAD:
+
+Disk readahead, in 512-byte sectors, while laptop mode is active. A large
+readahead can prevent disk accesses for things like executable pages (which are
+loaded on demand while the application executes) and sequentially accessed data
+(MP3s).
+
+DO_REMOUNTS:
+
+The control script automatically remounts any mounted journaled filesystems
+with appropriate commit interval options. When this option is set to 0, this
+feature is disabled.
+
+DO_REMOUNT_NOATIME:
+
+When remounting, should the filesystems be remounted with the noatime option?
+Normally, this is set to "1" (enabled), but there may be programs that require
+access time recording.
+
+DIRTY_RATIO:
+
+The percentage of memory that is allowed to contain "dirty" or unsaved data
+before a writeback is forced, while laptop mode is active. Corresponds to
+the /proc/sys/vm/dirty_ratio sysctl.
+
+DIRTY_BACKGROUND_RATIO:
+
+The percentage of memory that is allowed to contain "dirty" or unsaved data
+after a forced writeback, triggered by exceeding DIRTY_RATIO, is done. Set
+this nice and low. This corresponds to the /proc/sys/vm/dirty_background_ratio
+sysctl.
+
+Note that the behaviour of dirty_background_ratio is quite different
+when laptop mode is active and when it isn't. When laptop mode is inactive,
+dirty_background_ratio is the threshold percentage at which background writeouts
+start taking place. When laptop mode is active, however, background writeouts
+are disabled, and the dirty_background_ratio only determines how much writeback
+is done when dirty_ratio is reached.
+
+DO_CPU:
+
+Enable CPU frequency scaling when in laptop mode. (Requires CPUFreq to be set up.
+See Documentation/admin-guide/pm/cpufreq.rst for more info. Disabled by default.)
+
+CPU_MAXFREQ:
+
+When on battery, what is the maximum CPU speed that the system should use? Legal
+values are "slowest" for the slowest speed that your CPU is able to operate at,
+or a value listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies.
+
+
+Tips & Tricks
+-------------
+
+* Bartek Kania reports getting up to 50 minutes of extra battery life (on top
+  of his regular 3 to 3.5 hours) using a spindown time of 5 seconds (BATT_HD=1).
+
+* You can spin down the disk while playing MP3, by setting disk readahead
+  to 8MB (READAHEAD=16384). Effectively, the disk will read a complete MP3 at
+  once, and will then spin down while the MP3 is playing. (Thanks to Bartek
+  Kania.)
+
+* Drew Scott Daniels observed: "I don't know why, but when I decrease the number
+  of colours that my display uses it consumes less battery power. I've seen
+  this on powerbooks too. I hope that this is a piece of information that
+  might be useful to the Laptop Mode patch or its users."
+
+* In syslog.conf, you can prefix entries with a dash `-` to omit syncing the
+  file after every log message. When you're using laptop-mode and your disk doesn't
+  spin down, this is a likely culprit.
+
+* Richard Atterer observed that laptop mode does not work well with noflushd
+  (http://noflushd.sourceforge.net/); it seems that noflushd prevents laptop-mode
+  from doing its thing.
+
+* If you're worried about your data, you might want to consider using a USB
+  memory stick or something like that as a "working area". (Be aware though
+  that flash memory can only handle a limited number of writes, and overuse
+  may wear out your memory stick pretty quickly. Do _not_ use journalling
+  filesystems on flash memory sticks.)
+
+
+Configuration file for control and ACPI battery scripts
+-------------------------------------------------------
+
+This allows the tunables to be changed for the scripts via an external
+configuration file.
+
+It should be installed as /etc/default/laptop-mode on Debian, and as
+/etc/sysconfig/laptop-mode on Red Hat, SUSE, Mandrake, and other work-alikes.
+
+Config file::
+
+  # Maximum time, in seconds, of hard drive spindown time that you are
+  # comfortable with. Worst case, it's possible that you could lose this
+  # amount of work if your battery fails you while in laptop mode.
+  #MAX_AGE=600
+
+  # Automatically disable laptop mode when the number of minutes of battery
+  # that you have left goes below this threshold.
+  MINIMUM_BATTERY_MINUTES=10
+
+  # Read-ahead, in 512-byte sectors. You can spin down the disk while playing MP3/OGG
+  # by setting the disk readahead to 8MB (READAHEAD=16384). Effectively, the disk
+  # will read a complete MP3 at once, and will then spin down while the MP3/OGG is
+  # playing.
+  #READAHEAD=4096
+
+  # Shall we remount journaled fs. with appropriate commit interval? (1=yes)
+  #DO_REMOUNTS=1
+
+  # And shall we add the "noatime" option to that as well? (1=yes)
+  #DO_REMOUNT_NOATIME=1
+
+  # Dirty synchronous ratio, in percent.  Once this percentage of memory
+  # holds dirty pages, the process which calls write() does its own
+  # writeback.
+  #DIRTY_RATIO=40
+
+  #
+  # Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
+  # exceeded, the kernel will wake flusher threads which will then reduce the
+  # amount of dirty memory to dirty_background_ratio.  Set this nice and low,
+  # so once some writeout has commenced, we do a lot of it.
+  #
+  #DIRTY_BACKGROUND_RATIO=5
+
+  # kernel default dirty buffer age
+  #DEF_AGE=30
+  #DEF_UPDATE=5
+  #DEF_DIRTY_BACKGROUND_RATIO=10
+  #DEF_DIRTY_RATIO=40
+  #DEF_XFS_AGE_BUFFER=15
+  #DEF_XFS_SYNC_INTERVAL=30
+  #DEF_XFS_BUFD_INTERVAL=1
+
+  # This must be adjusted manually to the value of HZ in the running kernel
+  # on 2.4, until the XFS people change their 2.4 external interfaces to work in
+  # centisecs. This can be automated, but it's a work in progress that still
+  # needs some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for
+  # external interfaces, and that is currently always set to 100. So you don't
+  # need to change this on 2.6.
+  #XFS_HZ=100
+
+  # Should the maximum CPU frequency be adjusted down while on battery?
+  # Requires CPUFreq to be set up.
+  # See Documentation/admin-guide/pm/cpufreq.rst for more info
+  #DO_CPU=0
+
+  # When on battery what is the maximum CPU speed that the system should
+  # use? Legal values are "slowest" for the slowest speed that your
+  # CPU is able to operate at, or a value listed in:
+  # /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
+  # Only applicable if DO_CPU=1.
+  #CPU_MAXFREQ=slowest
+
+  # Idle timeout for your hard drive (man hdparm for valid values, -S option)
+  # Default is 2 hours on AC (AC_HD=244) and 20 seconds for battery (BATT_HD=4).
+  #AC_HD=244
+  #BATT_HD=4
+
+  # The drives for which to adjust the idle timeout. Separate them by a space,
+  # e.g. HD="/dev/hda /dev/hdb".
+  #HD="/dev/hda"
+
+  # Set the spindown timeout on a hard drive?
+  #DO_HD=1
+
+
+Control script
+--------------
+
+Please note that this control script works for the Linux 2.4 and 2.6 series (thanks
+to Kiko Piris).
+
+Control script::
+
+  #!/bin/bash
+
+  # start or stop laptop_mode, best run by a power management daemon when
+  # ac gets connected/disconnected from a laptop
+  #
+  # install as /sbin/laptop_mode
+  #
+  # Contributors to this script:   Kiko Piris
+  #				 Bart Samwel
+  #				 Micha Feigin
+  #				 Andrew Morton
+  #				 Herve Eychenne
+  #				 Dax Kelson
+  #
+  # Original Linux 2.4 version by: Jens Axboe
+
+  #############################################################################
+
+  # Source config
+  if [ -f /etc/default/laptop-mode ] ; then
+	# Debian
+	. /etc/default/laptop-mode
+  elif [ -f /etc/sysconfig/laptop-mode ] ; then
+	# Others
+	. /etc/sysconfig/laptop-mode
+  fi
+
+  # Don't raise an error if the config file is incomplete
+  # set defaults instead:
+
+  # Maximum time, in seconds, of hard drive spindown time that you are
+  # comfortable with. Worst case, it's possible that you could lose this
+  # amount of work if your battery fails you while in laptop mode.
+  MAX_AGE=${MAX_AGE:-'600'}
+
+  # Read-ahead, in kilobytes
+  READAHEAD=${READAHEAD:-'4096'}
+
+  # Shall we remount journaled fs. with appropriate commit interval? (1=yes)
+  DO_REMOUNTS=${DO_REMOUNTS:-'1'}
+
+  # And shall we add the "noatime" option to that as well? (1=yes)
+  DO_REMOUNT_NOATIME=${DO_REMOUNT_NOATIME:-'1'}
+
+  # Shall we adjust the idle timeout on a hard drive?
+  DO_HD=${DO_HD:-'1'}
+
+  # Adjust idle timeout on which hard drive?
+  HD="${HD:-/dev/hda}"
+
+  # spindown time for HD (hdparm -S values)
+  AC_HD=${AC_HD:-'244'}
+  BATT_HD=${BATT_HD:-'4'}
+
+  # Dirty synchronous ratio.  At this percentage of dirty pages the process which
+  # calls write() does its own writeback
+  DIRTY_RATIO=${DIRTY_RATIO:-'40'}
+
+  # cpu frequency scaling
+  # See Documentation/admin-guide/pm/cpufreq.rst for more info
+  DO_CPU=${DO_CPU:-${CPU_MANAGE:-'0'}}
+  CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'}
+
+  #
+  # Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
+  # exceeded, the kernel will wake flusher threads which will then reduce the
+  # amount of dirty memory to dirty_background_ratio.  Set this nice and low,
+  # so once some writeout has commenced, we do a lot of it.
+  #
+  DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'}
+
+  # kernel default dirty buffer age
+  DEF_AGE=${DEF_AGE:-'30'}
+  DEF_UPDATE=${DEF_UPDATE:-'5'}
+  DEF_DIRTY_BACKGROUND_RATIO=${DEF_DIRTY_BACKGROUND_RATIO:-'10'}
+  DEF_DIRTY_RATIO=${DEF_DIRTY_RATIO:-'40'}
+  DEF_XFS_AGE_BUFFER=${DEF_XFS_AGE_BUFFER:-'15'}
+  DEF_XFS_SYNC_INTERVAL=${DEF_XFS_SYNC_INTERVAL:-'30'}
+  DEF_XFS_BUFD_INTERVAL=${DEF_XFS_BUFD_INTERVAL:-'1'}
+
+  # This must be adjusted manually to the value of HZ in the running kernel
+  # on 2.4, until the XFS people change their 2.4 external interfaces to work in
+  # centisecs. This can be automated, but it's a work in progress that still needs
+  # some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for external
+  # interfaces, and that is currently always set to 100. So you don't need to
+  # change this on 2.6.
+  XFS_HZ=${XFS_HZ:-'100'}
+
+  #############################################################################
+
+  KLEVEL="$(uname -r |
+               {
+	       IFS='.' read a b c
+	       echo $a.$b
+	     }
+  )"
+  case "$KLEVEL" in
+	"2.4"|"2.6")
+		;;
+	*)
+		echo "Unhandled kernel version: $KLEVEL ('uname -r' = '$(uname -r)')" >&2
+		exit 1
+		;;
+  esac
+
+  if [ ! -e /proc/sys/vm/laptop_mode ] ; then
+	echo "Kernel is not patched with laptop_mode patch." >&2
+	exit 1
+  fi
+
+  if [ ! -w /proc/sys/vm/laptop_mode ] ; then
+	echo "You do not have enough privileges to enable laptop_mode." >&2
+	exit 1
+  fi
+
+  # Remove an option (the first parameter) of the form option=<number> from
+  # a mount options string (the rest of the parameters).
+  parse_mount_opts () {
+	OPT="$1"
+	shift
+	echo ",$*," | sed		\
+	 -e 's/,'"$OPT"'=[0-9]*,/,/g'	\
+	 -e 's/,,*/,/g'			\
+	 -e 's/^,//'			\
+	 -e 's/,$//'
+  }
+
+  # Remove an option (the first parameter) without any arguments from
+  # a mount option string (the rest of the parameters).
+  parse_nonumber_mount_opts () {
+	OPT="$1"
+	shift
+	echo ",$*," | sed		\
+	 -e 's/,'"$OPT"',/,/g'		\
+	 -e 's/,,*/,/g'			\
+	 -e 's/^,//'			\
+	 -e 's/,$//'
+  }
+
+  # Find out the state of a yes/no option (e.g. "atime"/"noatime") in
+  # fstab for a given filesystem, and use this state to replace the
+  # value of the option in another mount options string. The device
+  # is the first argument, the option name the second, and the default
+  # value the third. The remainder is the mount options string.
+  #
+  # Example:
+  # parse_yesno_opts_wfstab /dev/hda1 atime atime defaults,noatime
+  #
+  # If fstab contains, say, "rw" for this filesystem, then the result
+  # will be "defaults,atime".
+  parse_yesno_opts_wfstab () {
+	L_DEV="$1"
+	OPT="$2"
+	DEF_OPT="$3"
+	shift 3
+	L_OPTS="$*"
+	PARSEDOPTS1="$(parse_nonumber_mount_opts $OPT $L_OPTS)"
+	PARSEDOPTS1="$(parse_nonumber_mount_opts no$OPT $PARSEDOPTS1)"
+	# Watch for a default atime in fstab
+	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
+	if echo "$FSTAB_OPTS" | grep "$OPT" > /dev/null ; then
+		# option specified in fstab: extract the value and use it
+		if echo "$FSTAB_OPTS" | grep "no$OPT" > /dev/null ; then
+			echo "$PARSEDOPTS1,no$OPT"
+		else
+			# no$OPT not found -- so we must have $OPT.
+			echo "$PARSEDOPTS1,$OPT"
+		fi
+	else
+		# option not specified in fstab -- choose the default.
+		echo "$PARSEDOPTS1,$DEF_OPT"
+	fi
+  }
+
+  # Find out the state of a numbered option (e.g. "commit=NNN") in
+  # fstab for a given filesystem, and use this state to replace the
+  # value of the option in another mount options string. The device
+  # is the first argument, and the option name the second. The
+  # remainder is the mount options string in which the replacement
+  # must be done.
+  #
+  # Example:
+  # parse_mount_opts_wfstab /dev/hda1 commit defaults,commit=7
+  #
+  # If fstab contains, say, "commit=3,rw" for this filesystem, then the
+  # result will be "rw,commit=3".
+  parse_mount_opts_wfstab () {
+	L_DEV="$1"
+	OPT="$2"
+	shift 2
+	L_OPTS="$*"
+	PARSEDOPTS1="$(parse_mount_opts $OPT $L_OPTS)"
+	# Watch for a default commit in fstab
+	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
+	if echo "$FSTAB_OPTS" | grep "$OPT=" > /dev/null ; then
+		# option specified in fstab: extract the value, and use it
+		echo -n "$PARSEDOPTS1,$OPT="
+		echo ",$FSTAB_OPTS," | sed \
+		 -e 's/.*,'"$OPT"'=//'	\
+		 -e 's/,.*//'
+	else
+		# option not specified in fstab: set it to 0
+		echo "$PARSEDOPTS1,$OPT=0"
+	fi
+  }
+
+  deduce_fstype () {
+	MP="$1"
+	# My root filesystem unfortunately has
+	# type "unknown" in /etc/mtab. If we encounter
+	# "unknown", we try to get the type from fstab.
+	cat /etc/fstab |
+	grep -v '^#' |
+	while read FSTAB_DEV FSTAB_MP FSTAB_FST FSTAB_OPTS FSTAB_DUMP FSTAB_PASS ; do
+		if [ "$FSTAB_MP" = "$MP" ]; then
+			echo $FSTAB_FST
+			exit 0
+		fi
+	done
+  }
+
+  if [ $DO_REMOUNT_NOATIME -eq 1 ] ; then
+	NOATIME_OPT=",noatime"
+  fi
+
+  case "$1" in
+	start)
+		AGE=$((100*$MAX_AGE))
+		XFS_AGE=$(($XFS_HZ*$MAX_AGE))
+		echo -n "Starting laptop_mode"
+
+		if [ -d /proc/sys/vm/pagebuf ] ; then
+			# (For 2.4 and early 2.6.)
+			# This only needs to be set, not reset -- it is only used when
+			# laptop mode is enabled.
+			echo $XFS_AGE > /proc/sys/vm/pagebuf/lm_flush_age
+			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
+		elif [ -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
+			# (A couple of early 2.6 laptop mode patches had these.)
+			# The same goes for these.
+			echo $XFS_AGE > /proc/sys/fs/xfs/lm_age_buffer
+			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
+		elif [ -f /proc/sys/fs/xfs/age_buffer ] ; then
+			# (2.6.6)
+			# But not for these -- they are also used in normal
+			# operation.
+			echo $XFS_AGE > /proc/sys/fs/xfs/age_buffer
+			echo $XFS_AGE > /proc/sys/fs/xfs/sync_interval
+		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
+			# (2.6.7 upwards)
+			# And not for these either. These are in centisecs,
+			# not USER_HZ, so we have to use $AGE, not $XFS_AGE.
+			echo $AGE > /proc/sys/fs/xfs/age_buffer_centisecs
+			echo $AGE > /proc/sys/fs/xfs/xfssyncd_centisecs
+			echo 3000 > /proc/sys/fs/xfs/xfsbufd_centisecs
+		fi
+
+		case "$KLEVEL" in
+			"2.4")
+				echo 1					> /proc/sys/vm/laptop_mode
+				echo "30 500 0 0 $AGE $AGE 60 20 0"	> /proc/sys/vm/bdflush
+				;;
+			"2.6")
+				echo 5					> /proc/sys/vm/laptop_mode
+				echo "$AGE"				> /proc/sys/vm/dirty_writeback_centisecs
+				echo "$AGE"				> /proc/sys/vm/dirty_expire_centisecs
+				echo "$DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
+				echo "$DIRTY_BACKGROUND_RATIO"		> /proc/sys/vm/dirty_background_ratio
+				;;
+		esac
+		if [ $DO_REMOUNTS -eq 1 ]; then
+			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
+				PARSEDOPTS="$(parse_mount_opts "$OPTS")"
+				if [ "$FST" = 'unknown' ]; then
+					FST=$(deduce_fstype $MP)
+				fi
+				case "$FST" in
+					"ext3"|"reiserfs")
+						PARSEDOPTS="$(parse_mount_opts commit "$OPTS")"
+						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT
+						;;
+					"xfs")
+						mount $DEV -t $FST $MP -o remount,$OPTS$NOATIME_OPT
+						;;
+				esac
+				if [ -b $DEV ] ; then
+					blockdev --setra $(($READAHEAD * 2)) $DEV
+				fi
+			done
+		fi
+		if [ $DO_HD -eq 1 ] ; then
+			for THISHD in $HD ; do
+				/sbin/hdparm -S $BATT_HD $THISHD > /dev/null 2>&1
+				/sbin/hdparm -B 1 $THISHD > /dev/null 2>&1
+			done
+		fi
+		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
+			if [ $CPU_MAXFREQ = 'slowest' ]; then
+				CPU_MAXFREQ=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq`
+			fi
+			echo $CPU_MAXFREQ > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
+		fi
+		echo "."
+		;;
+	stop)
+		U_AGE=$((100*$DEF_UPDATE))
+		B_AGE=$((100*$DEF_AGE))
+		echo -n "Stopping laptop_mode"
+		echo 0 > /proc/sys/vm/laptop_mode
+		if [ -f /proc/sys/fs/xfs/age_buffer -a ! -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
+			# These need to be restored, if there are no lm_*.
+			echo $(($XFS_HZ*$DEF_XFS_AGE_BUFFER))	 	> /proc/sys/fs/xfs/age_buffer
+			echo $(($XFS_HZ*$DEF_XFS_SYNC_INTERVAL)) 	> /proc/sys/fs/xfs/sync_interval
+		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
+			# These need to be restored as well.
+			echo $((100*$DEF_XFS_AGE_BUFFER))	> /proc/sys/fs/xfs/age_buffer_centisecs
+			echo $((100*$DEF_XFS_SYNC_INTERVAL))	> /proc/sys/fs/xfs/xfssyncd_centisecs
+			echo $((100*$DEF_XFS_BUFD_INTERVAL))	> /proc/sys/fs/xfs/xfsbufd_centisecs
+		fi
+		case "$KLEVEL" in
+			"2.4")
+				echo "30 500 0 0 $U_AGE $B_AGE 60 20 0"	> /proc/sys/vm/bdflush
+				;;
+			"2.6")
+				echo "$U_AGE"				> /proc/sys/vm/dirty_writeback_centisecs
+				echo "$B_AGE"				> /proc/sys/vm/dirty_expire_centisecs
+				echo "$DEF_DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
+				echo "$DEF_DIRTY_BACKGROUND_RATIO"	> /proc/sys/vm/dirty_background_ratio
+				;;
+		esac
+		if [ $DO_REMOUNTS -eq 1 ] ; then
+			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
+				# Reset commit and atime options to defaults.
+				if [ "$FST" = 'unknown' ]; then
+					FST=$(deduce_fstype $MP)
+				fi
+				case "$FST" in
+					"ext3"|"reiserfs")
+						PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)"
+						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)"
+						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
+						;;
+					"xfs")
+						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $OPTS)"
+						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
+						;;
+				esac
+				if [ -b $DEV ] ; then
+					blockdev --setra 256 $DEV
+				fi
+			done
+		fi
+		if [ $DO_HD -eq 1 ] ; then
+			for THISHD in $HD ; do
+				/sbin/hdparm -S $AC_HD $THISHD > /dev/null 2>&1
+				/sbin/hdparm -B 255 $THISHD > /dev/null 2>&1
+			done
+		fi
+		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
+			echo `cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
+		fi
+		echo "."
+		;;
+	*)
+		echo "Usage: $0 {start|stop}" >&2
+		exit 1
+		;;
+
+  esac
+
+  exit 0
+
+
+ACPI integration
+----------------
+
+Dax Kelson submitted this so that the ACPI acpid daemon will
+kick off the laptop_mode script and run hdparm. The part that
+automatically disables laptop mode when the battery is low was
+written by Jan Topinski.
+
+/etc/acpi/events/ac_adapter::
+
+	event=ac_adapter
+	action=/etc/acpi/actions/ac.sh %e
+
+/etc/acpi/events/battery::
+
+	event=battery.*
+	action=/etc/acpi/actions/battery.sh %e
+
+/etc/acpi/actions/ac.sh::
+
+  #!/bin/bash
+
+  # ac on/offline event handler
+
+  status=`awk '/^state: / { print $2 }' /proc/acpi/ac_adapter/$2/state`
+
+  case $status in
+          "on-line")
+                  /sbin/laptop_mode stop
+                  exit 0
+          ;;
+          "off-line")
+                  /sbin/laptop_mode start
+                  exit 0
+          ;;
+  esac
+
+
+/etc/acpi/actions/battery.sh::
+
+  #! /bin/bash
+
+  # Automatically disable laptop mode when the battery almost runs out.
+
+  BATT_INFO=/proc/acpi/battery/$2/state
+
+  if [[ -f /proc/sys/vm/laptop_mode ]]
+  then
+     LM=`cat /proc/sys/vm/laptop_mode`
+     if [[ $LM -gt 0 ]]
+     then
+       if [[ -f $BATT_INFO ]]
+       then
+          # Source the config file only now that we know we need it
+          if [ -f /etc/default/laptop-mode ] ; then
+                  # Debian
+                  . /etc/default/laptop-mode
+          elif [ -f /etc/sysconfig/laptop-mode ] ; then
+                  # Others
+                  . /etc/sysconfig/laptop-mode
+          fi
+          MINIMUM_BATTERY_MINUTES=${MINIMUM_BATTERY_MINUTES:-'10'}
+
+          ACTION="`cat $BATT_INFO | grep charging | cut -c 26-`"
+          if [[ "$ACTION" == "discharging" ]]
+          then
+             PRESENT_RATE=`cat $BATT_INFO | grep "present rate:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
+             REMAINING=`cat $BATT_INFO | grep "remaining capacity:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
+             # Only estimate the remaining time while discharging at a non-zero rate.
+             if ((PRESENT_RATE > 0 && REMAINING * 60 / PRESENT_RATE < MINIMUM_BATTERY_MINUTES))
+             then
+                /sbin/laptop_mode stop
+             fi
+          fi
+       else
+         logger -p daemon.warning "You are using laptop mode and your battery interface $BATT_INFO is missing. This may lead to loss of data when the battery runs out. Check kernel ACPI support and /proc/acpi/battery folder, and edit /etc/acpi/actions/battery.sh to set BATT_INFO to the correct path."
+       fi
+     fi
+  fi
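
The low-battery check in battery.sh above comes down to one integer division: remaining capacity times 60, divided by the present discharge rate, gives an estimate of the minutes left. A minimal sketch of that arithmetic follows. The function name `battery_minutes_left` and the sample numbers are illustrative assumptions, not part of the original script, and the guard against a zero rate is an addition.

```shell
#!/bin/bash
# battery_minutes_left REMAINING PRESENT_RATE
# Hypothetical helper (not in the original script): estimate the minutes of
# battery life left from the remaining capacity and the present discharge
# rate, using the same integer arithmetic as battery.sh.
battery_minutes_left() {
	local remaining=$1 rate=$2
	# A missing or zero rate would make the division fail, so bail out.
	if [ -z "$rate" ] || [ "$rate" -le 0 ]; then
		return 1
	fi
	echo $(( remaining * 60 / rate ))
}
```

With 1500 units remaining and a rate of 3000 units per hour, `battery_minutes_left 1500 3000` prints 30, which is above the default MINIMUM_BATTERY_MINUTES of 10, so laptop mode would stay enabled.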
+
+
+Monitoring tool
+---------------
+
+Bartek Kania submitted this; it can be used to measure how much time your disk
+spends spun up/down.  See tools/laptop/dslm/dslm.c
diff --git a/Documentation/admin-guide/laptops/lg-laptop.rst b/Documentation/admin-guide/laptops/lg-laptop.rst
new file mode 100644
index 000000000000..ce9b14671cb9
--- /dev/null
+++ b/Documentation/admin-guide/laptops/lg-laptop.rst
@@ -0,0 +1,84 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+
+LG Gram laptop extra features
+=============================
+
+By Matan Ziv-Av <matan@svgalib.org>
+
+
+Hotkeys
+-------
+
+The following FN keys are ignored by the kernel without this driver:
+
+- FN-F1 (LG control panel)   - Generates F15
+- FN-F5 (Touchpad toggle)    - Generates F13
+- FN-F6 (Airplane mode)      - Generates RFKILL
+- FN-F8 (Keyboard backlight) - Generates F16.
+  This key also changes keyboard backlight mode.
+- FN-F9 (Reader mode)        - Generates F14
+
+The rest of the FN keys work without a need for a special driver.
+
+
+Reader mode
+-----------
+
+Writing 0/1 to /sys/devices/platform/lg-laptop/reader_mode disables/enables
+reader mode. In this mode the screen colors change (blue color reduced),
+and the reader mode indicator LED (on F9 key) turns on.
+
+
+FN Lock
+-------
+
+Writing 0/1 to /sys/devices/platform/lg-laptop/fn_lock disables/enables
+FN lock.
+
+
+Battery care limit
+------------------
+
+Writing 80/100 to /sys/devices/platform/lg-laptop/battery_care_limit
+sets the maximum capacity to charge the battery. Limiting the charge
+reduces battery capacity loss over time.
+
+This value is reset to 100 when the kernel boots.
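
Since the limit is reset at boot, it may be convenient to reapply it from a startup script. The helper below is a hypothetical sketch, not something shipped with the driver: the name `set_battery_care_limit` and the `LG_SYSFS` override variable are assumptions, and it accepts only the two values mentioned above.

```shell
#!/bin/sh
# Base directory taken from the text above. It can be overridden so the
# helper can be tried against a scratch directory instead of real sysfs.
LG_SYSFS=${LG_SYSFS:-/sys/devices/platform/lg-laptop}

# set_battery_care_limit VALUE
# Hypothetical helper: write the battery care limit, accepting only the
# two values documented above (80 and 100).
set_battery_care_limit() {
	case "$1" in
		80|100)
			echo "$1" > "$LG_SYSFS/battery_care_limit"
			;;
		*)
			echo "battery_care_limit must be 80 or 100" >&2
			return 1
			;;
	esac
}
```

Calling `set_battery_care_limit 80` as root would then cap charging at 80 percent until the next boot.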
+
+
+Fan mode
+--------
+
+Writing 1/0 to /sys/devices/platform/lg-laptop/fan_mode disables/enables
+the fan silent mode.
+
+
+USB charge
+----------
+
+Writing 0/1 to /sys/devices/platform/lg-laptop/usb_charge disables/enables
+charging another device from the USB port while the device is turned off.
+
+This value is reset to 0 when the kernel boots.
+
+
+LEDs
+~~~~
+
+There are two LED devices supported by the driver:
+
+Keyboard backlight
+------------------
+
+A led device named kbd_led controls the keyboard backlight. There are three
+lighting levels: off (0), low (127) and high (255).
+
+The keyboard backlight is also controlled by the key combination FN-F8
+which cycles through those levels.
+
+
+Touchpad indicator LED
+----------------------
+
+On the F5 key. Controlled by the led device named tpad_led.
diff --git a/Documentation/admin-guide/laptops/sony-laptop.rst b/Documentation/admin-guide/laptops/sony-laptop.rst
new file mode 100644
index 000000000000..9edcc7f6612f
--- /dev/null
+++ b/Documentation/admin-guide/laptops/sony-laptop.rst
@@ -0,0 +1,174 @@
+=========================================
+Sony Notebook Control Driver (SNC) Readme
+=========================================
+
+	- Copyright (C) 2004- 2005 Stelian Pop <stelian@popies.net>
+	- Copyright (C) 2007 Mattia Dongili <malattia@linux.it>
+
+This mini-driver drives the SNC and SPIC device present in the ACPI BIOS of the
+Sony Vaio laptops. This driver mixes both devices' functions under the same
+(hopefully consistent) interface. This also means that the sonypi driver is
+obsoleted by sony-laptop now.
+
+Fn keys (hotkeys):
+------------------
+
+Some models report hotkeys through the SNC or SPIC devices. Such events are
+reported both through the ACPI subsystem as ACPI events and through the INPUT
+subsystem. See the contents of /proc/bus/input/devices to find out what those
+events are and which input devices are created by the driver.
+Additionally, loading the driver with the debug option will report all events
+in the kernel log.
+
+The "scancodes" passed to the input system (that can be remapped with udev)
+are indexes to the table "sony_laptop_input_keycode_map" in the sony-laptop.c
+module.  For example the "FN/E" key combination (EJECTCD on some models)
+generates the scancode 20 (0x14).
+
+Backlight control:
+------------------
+If your laptop model supports it, you will find sysfs files in the
+/sys/class/backlight/sony/
+directory. You will be able to query and set the current screen
+brightness:
+
+	======================	=========================================
+	brightness		get/set screen brightness (an integer
+				between 0 and 7)
+	actual_brightness	reading from this file will query the HW
+				to get real brightness value
+	max_brightness		the maximum brightness value
+	======================	=========================================
+
+
+Platform specific:
+------------------
+Loading the sony-laptop module will create a
+/sys/devices/platform/sony-laptop/
+directory populated with some files.
+
+You then read/write integer values from/to those files by using
+standard UNIX tools.
+
+The files are:
+
+	======================	==========================================
+	brightness_default	screen brightness which will be set
+				when the laptop will be rebooted
+	cdpower			power on/off the internal CD drive
+	audiopower		power on/off the internal sound card
+	lanpower		power on/off the internal ethernet card
+				(only in debug mode)
+	bluetoothpower		power on/off the internal bluetooth device
+	fanspeed		get/set the fan speed
+	======================	==========================================
+
+Note that some files may be missing if they are not supported
+by your particular laptop model.
+
+Example usage::
+
+	# echo "1" > /sys/devices/platform/sony-laptop/brightness_default
+
+sets the lowest screen brightness for the next and later reboots
+
+::
+
+	# echo "8" > /sys/devices/platform/sony-laptop/brightness_default
+
+sets the highest screen brightness for the next and later reboots
+
+::
+
+	# cat /sys/devices/platform/sony-laptop/brightness_default
+
+retrieves the value
+
+::
+
+	# echo "0" > /sys/devices/platform/sony-laptop/audiopower
+
+powers off the sound card
+
+::
+
+	# echo "1" > /sys/devices/platform/sony-laptop/audiopower
+
+powers on the sound card.
+
+
+RFkill control:
+---------------
+More recent Vaio models expose a consistent set of ACPI methods to
+control radio frequency emitting devices. If you are a lucky owner of
+such a laptop you will find the necessary rfkill devices under
+/sys/class/rfkill. Check those starting with sony-* in::
+
+	# grep . /sys/class/rfkill/*/{state,name}
+
+
+Development:
+------------
+
+If you want to help with the development of this driver (and
+you are not afraid of any side effects doing strange things with
+your ACPI BIOS could have on your laptop), load the driver and
+pass the option 'debug=1'.
+
+REPEAT:
+	**DON'T DO THIS IF YOU DON'T LIKE RISKY BUSINESS.**
+
+In your kernel logs you will find the list of all ACPI methods
+the SNC device has on your laptop.
+
+* For new models you will see a long list of meaningless method names,
+  reading the DSDT table source should reveal that:
+
+(1) the SNC device uses an internal capability lookup table
+(2) SN00 is used to find values in the lookup table
+(3) SN06 and SN07 are used to call into the real methods based on
+    offsets you can obtain iterating the table using SN00
+(4) SN02 is used to enable events.
+
+Some values in the capability lookup table are more or less known, see
+the code for all sony_call_snc_handle calls, others are more obscure.
+
+* For old models you can see the GCDP/SCDP methods used to power on/off
+  the CD drive, but there are others and they are usually different from
+  model to model.
+
+**I HAVE NO IDEA WHAT THOSE METHODS DO.**
+
+The sony-laptop driver creates, for some of those methods (the most
+current ones found on several Vaio models), an entry under
+/sys/devices/platform/sony-laptop, just like the 'cdpower' one.
+You can create other entries corresponding to your own laptop methods by
+further editing the source (see the 'sony_nc_values' table, and add a new
+entry to this table with your get/set method names using the
+SNC_HANDLE_NAMES macro).
+
+Your mission, should you accept it, is to try finding out what
+those entries are for, by reading/writing random values from/to those
+files and finding out what the impact on your laptop is.
+
+Should you find anything interesting, please report it back to me,
+I will not disavow all knowledge of your actions :)
+
+See also http://www.linux.it/~malattia/wiki/index.php/Sony_drivers for other
+useful info.
+
+Bugs/Limitations:
+-----------------
+
+* This driver is not based on official documentation from Sony
+  (because there is none), so there is no guarantee this driver
+  will work at all, or do the right thing. Although this hasn't
+  happened to me, this driver could do very bad things to your
+  laptop, including permanent damage.
+
+* The sony-laptop and sonypi drivers do not interact at all. In the
+  future, sonypi will be removed and replaced by sony-laptop.
+
+* spicctrl, which is the userspace tool used to communicate with the
+  sonypi driver (through /dev/sonypi) is deprecated as well since all
+  its features are now available under the sysfs tree via sony-laptop.
diff --git a/Documentation/admin-guide/laptops/sonypi.rst b/Documentation/admin-guide/laptops/sonypi.rst
new file mode 100644
index 000000000000..2a1975ed7ee4
--- /dev/null
+++ b/Documentation/admin-guide/laptops/sonypi.rst
@@ -0,0 +1,160 @@
+==================================================
+Sony Programmable I/O Control Device Driver Readme
+==================================================
+
+	- Copyright (C) 2001-2004 Stelian Pop <stelian@popies.net>
+	- Copyright (C) 2001-2002 Alcôve <www.alcove.com>
+	- Copyright (C) 2001 Michael Ashley <m.ashley@unsw.edu.au>
+	- Copyright (C) 2001 Junichi Morita <jun1m@mars.dti.ne.jp>
+	- Copyright (C) 2000 Takaya Kinjo <t-kinjo@tc4.so-net.ne.jp>
+	- Copyright (C) 2000 Andrew Tridgell <tridge@samba.org>
+
+This driver enables access to the Sony Programmable I/O Control Device which
+can be found in many Sony Vaio laptops. Some newer Sony laptops (seems to be
+limited to new FX series laptops, at least the FX501 and the FX702) lack a
+sonypi device and are not supported at all by this driver.
+
+It will give access (through a user space utility) to some events those laptops
+generate, like:
+
+	- jogdial events (the small wheel on the side of Vaios)
+	- capture button events (only on Vaio Picturebook series)
+	- Fn keys
+	- bluetooth button (only on C1VR model)
+	- programmable keys, back, help, zoom, thumbphrase buttons, etc.
+	  (when available)
+
+Those events (see linux/sonypi.h) can be polled using the character device node
+/dev/sonypi (major 10, minor auto allocated or specified as an option).
+A simple daemon which translates the jogdial movements into mouse wheel events
+can be downloaded at: <http://popies.net/sonypi/>
+
+Another option to intercept the events is to get them directly through the
+input layer.
+
+This driver also supports some ioctl commands for setting the LCD screen
+brightness and querying the battery charge information (some more
+commands may be added in the future).
+
+This driver can also be used to set the camera controls on Picturebook series
+(brightness, contrast etc), and is used by the video4linux driver for the
+Motion Eye camera.
+
+Please note that this driver was created by reverse engineering the Windows
+driver and the ACPI BIOS, because Sony doesn't agree to release any programming
+specs for its laptops. If someone convinces them to do so, drop me a note.
+
+Driver options:
+---------------
+
+Several options can be passed to the sonypi driver using the standard
+module argument syntax (<param>=<value> when passing the option to the
+module or sonypi.<param>=<value> on the kernel boot line when sonypi is
+statically linked into the kernel). Those options are:
+
+	=============== =======================================================
+	minor: 		minor number of the misc device /dev/sonypi,
+			default is -1 (automatic allocation, see /proc/misc
+			or kernel logs)
+
+	camera:		if you have a PictureBook series Vaio (with the
+			integrated MotionEye camera), set this parameter to 1
+			in order to let the driver access the camera
+
+	fnkeyinit:	on some Vaios (C1VE, C1VR etc), the Fn key events don't
+			get enabled unless you set this parameter to 1.
+			Do not use this option unless it's actually necessary,
+			some Vaio models don't deal well with this option.
+			This option is available only if the kernel is
+			compiled without ACPI support (since it conflicts
+			with it and it shouldn't be required anyway if
+			ACPI is already enabled).
+
+	verbose:	set to 1 to print unknown events received from the
+			sonypi device.
+			set to 2 to print all events received from the
+			sonypi device.
+
+	compat:		uses some compatibility code for enabling the sonypi
+			events. If the driver worked for you in the past
+			(prior to version 1.5) and does not work anymore,
+			add this option and report to the author.
+
+	mask:		event mask telling the driver what events will be
+			reported to the user. This parameter is required for
+			some Vaio models where the hardware reuses values
+			used in other Vaio models (like the FX series which does
+			not have a jogdial but reuses the jogdial events for
+			programmable keys events). The default event mask is
+			set to 0xffffffff, meaning that all possible events
+			will be tried. You can use the following bits to
+			construct your own event mask (from
+			drivers/char/sonypi.h):
+
+				========================	======
+				SONYPI_JOGGER_MASK 		0x0001
+				SONYPI_CAPTURE_MASK 		0x0002
+				SONYPI_FNKEY_MASK 		0x0004
+				SONYPI_BLUETOOTH_MASK 		0x0008
+				SONYPI_PKEY_MASK 		0x0010
+				SONYPI_BACK_MASK 		0x0020
+				SONYPI_HELP_MASK 		0x0040
+				SONYPI_LID_MASK 		0x0080
+				SONYPI_ZOOM_MASK 		0x0100
+				SONYPI_THUMBPHRASE_MASK 	0x0200
+				SONYPI_MEYE_MASK		0x0400
+				SONYPI_MEMORYSTICK_MASK		0x0800
+				SONYPI_BATTERY_MASK		0x1000
+				SONYPI_WIRELESS_MASK		0x2000
+				========================	======
+
+	useinput:	if set (which is the default) two input devices are
+			created, one which interprets the jogdial events as
+			mouse events, the other one which acts like a
+			keyboard reporting the pressing of the special keys.
+	=============== =======================================================
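
The event mask is the bitwise OR of the bit values listed for the `mask` option above. As a sketch, the hypothetical shell helper below combines the chosen bits and prints the result in hex. The name `sonypi_mask` is illustrative and is not part of the driver or of any existing tool.

```shell
#!/bin/bash
# sonypi_mask BIT...
# Hypothetical helper: OR together the given event bits (hex or decimal)
# and print the combined mask as a four-digit hex value.
sonypi_mask() {
	local m=0 bit
	for bit in "$@"; do
		m=$(( m | bit ))
	done
	printf '0x%04x\n' "$m"
}
```

For example, `sonypi_mask 0x0004 0x0010 0x0020` (Fn keys, programmable keys and the back button) prints `0x0034`, which could then be passed as `mask=0x0034` using the module argument syntax described above.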
+
+Module use:
+-----------
+
+In order to automatically load the sonypi module on use, you can put these
+lines in a configuration file in /etc/modprobe.d/::
+
+	alias char-major-10-250 sonypi
+	options sonypi minor=250
+
+This supposes the use of minor 250 for the sonypi device::
+
+	# mknod /dev/sonypi c 10 250
+
+Bugs:
+-----
+
+	- several users reported that this driver disables the BIOS-managed
+	  Fn-keys which put the laptop in sleeping state, or switch the
+	  external monitor on/off. There is no workaround yet, since this
+	  driver disables all APM management for those keys, by enabling the
+	  ACPI management (and the ACPI core stuff is not complete yet). If
+	  you have one of those laptops with working Fn keys and want to
+	  continue to use them, don't use this driver.
+
+	- some users reported that the laptop speed is lower (dhrystone
+	  tested) when using the driver with the fnkeyinit parameter. I cannot
+	  reproduce it on my laptop and not all users have this problem.
+	  This happens because the fnkeyinit parameter enables the ACPI
+	  mode (but without additional ACPI control, like processor
+	  speed handling etc). Use ACPI instead of APM if it works on your
+	  laptop.
+
+	- sonypi lacks the ability to distinguish between certain key
+	  events on some models.
+
+	- some models with the nvidia card (geforce go 6200 tc) use a
+	  different way to adjust the backlighting of the screen. There
+	  is a userspace utility to adjust the brightness on those models,
+	  which can be downloaded from
+	  http://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
+
+	- since all development was done by reverse engineering, there is
+	  *absolutely no guarantee* that this driver will not crash your
+	  laptop. Permanently.
diff --git a/Documentation/admin-guide/laptops/thinkpad-acpi.rst b/Documentation/admin-guide/laptops/thinkpad-acpi.rst
new file mode 100644
index 000000000000..19d52fc3c5e9
--- /dev/null
+++ b/Documentation/admin-guide/laptops/thinkpad-acpi.rst
@@ -0,0 +1,1562 @@
+===========================
+ThinkPad ACPI Extras Driver
+===========================
+
+Version 0.25
+
+October 16th,  2013
+
+- Borislav Deianov <borislav@users.sf.net>
+- Henrique de Moraes Holschuh <hmh@hmh.eng.br>
+
+http://ibm-acpi.sf.net/
+
+This is a Linux driver for the IBM and Lenovo ThinkPad laptops. It
+supports various features of these laptops which are accessible
+through the ACPI and ACPI EC framework, but not otherwise fully
+supported by the generic Linux ACPI drivers.
+
+This driver used to be named ibm-acpi until kernel 2.6.21 and release
+0.13-20070314.  It used to be in the drivers/acpi tree, but it was
+moved to the drivers/misc tree and renamed to thinkpad-acpi for kernel
+2.6.22, and release 0.14.  It was moved to drivers/platform/x86 for
+kernel 2.6.29 and release 0.22.
+
+The driver is named "thinkpad-acpi".  In some places, like module
+names and log messages, "thinkpad_acpi" is used because of userspace
+issues.
+
+"tpacpi" is used as a shorthand where "thinkpad-acpi" would be too
+long due to length limitations on some Linux kernel versions.
+
+Status
+------
+
+The features currently supported are the following (see below for
+detailed description):
+
+	- Fn key combinations
+	- Bluetooth enable and disable
+	- video output switching, expansion control
+	- ThinkLight on and off
+	- CMOS/UCMS control
+	- LED control
+	- ACPI sounds
+	- temperature sensors
+	- Experimental: embedded controller register dump
+	- LCD brightness control
+	- Volume control
+	- Fan control and monitoring: fan speed, fan enable/disable
+	- WAN enable and disable
+	- UWB enable and disable
+
+A compatibility table by model and feature is maintained on the web
+site, http://ibm-acpi.sf.net/. I appreciate any success or failure
+reports, especially if they add to or correct the compatibility table.
+Please include the following information in your report:
+
+	- ThinkPad model name
+	- a copy of your ACPI tables, using the "acpidump" utility
+	- a copy of the output of dmidecode, with serial numbers
+	  and UUIDs masked off
+	- which driver features work and which don't
+	- the observed behavior of non-working features
+
+Any other comments or patches are also more than welcome.
+
+
+Installation
+------------
+
+If you are compiling this driver as included in the Linux kernel
+sources, look for the CONFIG_THINKPAD_ACPI Kconfig option.
+It is located on the menu path: "Device Drivers" -> "X86 Platform
+Specific Device Drivers" -> "ThinkPad ACPI Laptop Extras".
+
+
+Features
+--------
+
+The driver exports two different interfaces to userspace, which can be
+used to access the features it provides.  One is a legacy procfs-based
+interface, which will be removed at some time in the future.  The other
+is a new sysfs-based interface which is not complete yet.
+
+The procfs interface creates the /proc/acpi/ibm directory.  There is a
+file under that directory for each feature it supports.  The procfs
+interface is mostly frozen, and will change very little if at all: it
+will not be extended to add any new functionality in the driver, instead
+all new functionality will be implemented on the sysfs interface.
+
+The sysfs interface tries to blend in the generic Linux sysfs subsystems
+and classes as much as possible.  Since some of these subsystems are not
+yet ready or stabilized, it is expected that this interface will change,
+and any and all userspace programs must deal with it.
+
+
+Notes about the sysfs interface
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Unlike what was done with the procfs interface, correctness when talking
+to the sysfs interfaces will be enforced, as will correctness in the
+thinkpad-acpi's implementation of sysfs interfaces.
+
+Also, any bugs in the thinkpad-acpi sysfs driver code or in the
+thinkpad-acpi's implementation of the sysfs interfaces will be fixed for
+maximum correctness, even if that means changing an interface in
+non-compatible ways.  As these interfaces mature both in the kernel and
+in thinkpad-acpi, such changes should become quite rare.
+
+Applications interfacing to the thinkpad-acpi sysfs interfaces must
+follow all sysfs guidelines and correctly process all errors (the sysfs
+interface makes extensive use of errors).  File descriptors and open /
+close operations to the sysfs inodes must also be properly implemented.
+
+The version of thinkpad-acpi's sysfs interface is exported by the driver
+as a driver attribute (see below).
+
+Sysfs driver attributes are on the driver's sysfs attribute space,
+for 2.6.23+ this is /sys/bus/platform/drivers/thinkpad_acpi/ and
+/sys/bus/platform/drivers/thinkpad_hwmon/
+
+Sysfs device attributes are on the thinkpad_acpi device sysfs attribute
+space, for 2.6.23+ this is /sys/devices/platform/thinkpad_acpi/.
+
+Sysfs device attributes for the sensors and fan are on the
+thinkpad_hwmon device's sysfs attribute space, but you should locate it
+looking for a hwmon device with the name attribute of "thinkpad", or
+better yet, through libsensors. For 4.14+ sysfs attributes were moved to the
+hwmon device (/sys/bus/platform/devices/thinkpad_hwmon/hwmon/hwmon? or
+/sys/class/hwmon/hwmon?).
+
+Driver version
+--------------
+
+procfs: /proc/acpi/ibm/driver
+
+sysfs driver attribute: version
+
+The driver name and version. No commands can be written to this file.
+
+
+Sysfs interface version
+-----------------------
+
+sysfs driver attribute: interface_version
+
+Version of the thinkpad-acpi sysfs interface, as an unsigned long
+(output in hex format: 0xAAAABBCC), where:
+
+	AAAA
+	  - major revision
+	BB
+	  - minor revision
+	CC
+	  - bugfix revision
+
+The sysfs interface version changelog for the driver can be found at the
+end of this document.  Changes to the sysfs interface done by the kernel
+subsystems are not documented here, nor are they tracked by this
+attribute.
+
+Changes to the thinkpad-acpi sysfs interface are only considered
+non-experimental when they are submitted to Linux mainline, at which
+point the changes in this interface are documented and interface_version
+may be updated.  If you are using any thinkpad-acpi features not yet
+sent to mainline for merging, you do so at your own risk: these features
+may disappear, or be implemented in a different and incompatible way by
+the time they are merged in Linux mainline.
+
+Changes that are backwards-compatible by nature (e.g. the addition of
+attributes that do not change the way the other attributes work) do not
+always warrant an update of interface_version.  Therefore, one must
+expect that an attribute might not be there, and deal with it properly
+(an attribute not being there *is* a valid way to make it clear that a
+feature is not available in sysfs).
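
The 0xAAAABBCC layout described above can be unpacked with plain shell arithmetic. The following is a minimal sketch under stated assumptions: `decode_interface_version` is a hypothetical helper name, and the value 0x00020700 used below is a made-up sample, not a statement about any particular driver release.

```shell
#!/bin/bash
# decode_interface_version VALUE
# Hypothetical helper: split a value in the 0xAAAABBCC format into its
# major (AAAA), minor (BB) and bugfix (CC) fields, printed in decimal as
# "major.minor.bugfix".
decode_interface_version() {
	local v=$(( $1 ))
	printf '%d.%d.%d\n' \
		$(( (v >> 16) & 0xffff )) \
		$(( (v >> 8) & 0xff )) \
		$(( v & 0xff ))
}
```

`decode_interface_version 0x00020700` prints `2.7.0`. On a running system the argument would come from the interface_version driver attribute, for example `decode_interface_version "$(cat /sys/bus/platform/drivers/thinkpad_acpi/interface_version)"`.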
+
+
+Hot keys
+--------
+
+procfs: /proc/acpi/ibm/hotkey
+
+sysfs device attribute: hotkey_*
+
+In a ThinkPad, the ACPI HKEY handler is responsible for communicating
+some important events and also keyboard hot key presses to the operating
+system.  Enabling the hotkey functionality of thinkpad-acpi signals the
+firmware that such a driver is present, and modifies how the ThinkPad
+firmware will behave in many situations.
+
+The driver enables the HKEY ("hot key") event reporting automatically
+when loaded, and disables it when it is removed.
+
+The driver will report HKEY events in the following format::
+
+	ibm/hotkey HKEY 00000080 0000xxxx
+
+Some of these events refer to hot key presses, but not all of them.
+
+The driver will generate events over the input layer for hot keys and
+radio switches, and over the ACPI netlink layer for other events.  The
+input layer support accepts the standard IOCTLs to remap the keycodes
+assigned to each hot key.
+
+The hot key bit mask allows some control over which hot keys generate
+events.  If a key is "masked" (bit set to 0 in the mask), the firmware
+will handle it.  If it is "unmasked", it signals the firmware that
+thinkpad-acpi would prefer to handle it, if the firmware would be so
+kind to allow it (and it often doesn't!).
+
+Not all bits in the mask can be modified.  Not all bits that can be
+modified do anything.  Not all hot keys can be individually controlled
+by the mask.  Some models do not support the mask at all.  The behaviour
+of the mask is, therefore, highly dependent on the ThinkPad model.
+
+The driver will filter out any unmasked hotkeys, so even if the firmware
+doesn't allow disabling a specific hotkey, the driver will not report
+events for unmasked hotkeys.
+
+Note that unmasking some keys prevents their default behavior.  For
+example, if Fn+F5 is unmasked, that key will no longer enable/disable
+Bluetooth by itself in firmware.
+
+Note also that not all Fn key combinations are supported through ACPI
+depending on the ThinkPad model and firmware version.  On those
+ThinkPads, it is still possible to support some extra hotkeys by
+polling the "CMOS NVRAM" at least 10 times per second.  The driver
+attempts to enable this functionality automatically when required.
+
+procfs notes
+^^^^^^^^^^^^
+
+The following commands can be written to the /proc/acpi/ibm/hotkey file::
+
+	echo 0xffffffff > /proc/acpi/ibm/hotkey -- enable all hot keys
+	echo 0 > /proc/acpi/ibm/hotkey -- disable all possible hot keys
+	... any other 8-hex-digit mask ...
+	echo reset > /proc/acpi/ibm/hotkey -- restore the recommended mask
+
+The following commands have been deprecated and will cause the kernel
+to log a warning::
+
+	echo enable > /proc/acpi/ibm/hotkey -- does nothing
+	echo disable > /proc/acpi/ibm/hotkey -- returns an error
+
+The procfs interface does not support NVRAM polling control.  So as to
+maintain maximum bug-to-bug compatibility, it does not report any masks,
+nor does it allow one to manipulate the hot key mask when the firmware
+does not support masks at all, even if NVRAM polling is in use.
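
Because the hot key mask is an ordinary bit mask of up to eight hex digits, individual bits can be set or cleared with shell arithmetic before the value is written back. The helpers below are a hypothetical sketch: the function names are illustrative, and which bit corresponds to which key is not specified in this section, so the bit index is left to the caller.

```shell
#!/bin/bash
# hotkey_mask_set_bit MASK BIT
# Hypothetical helper: print MASK with bit number BIT (0-31) set.
hotkey_mask_set_bit() {
	printf '0x%08x\n' $(( ($1 | (1 << $2)) & 0xffffffff ))
}

# hotkey_mask_clear_bit MASK BIT
# Hypothetical helper: print MASK with bit number BIT (0-31) cleared.
hotkey_mask_clear_bit() {
	printf '0x%08x\n' $(( $1 & ~(1 << $2) & 0xffffffff ))
}
```

For instance, `hotkey_mask_clear_bit 0xffffffff 4` prints `0xffffffef`, which can then be echoed to /proc/acpi/ibm/hotkey like any other 8-hex-digit mask.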
+
+sysfs notes
+^^^^^^^^^^^
+
+	hotkey_bios_enabled:
+		DEPRECATED, WILL BE REMOVED SOON.
+
+		Returns 0.
+
+	hotkey_bios_mask:
+		DEPRECATED, DON'T USE, WILL BE REMOVED IN THE FUTURE.
+
+		Returns the hot keys mask when thinkpad-acpi was loaded.
+		Upon module unload, the hot keys mask will be restored
+		to this value.   This is always 0x80c, because those are
+		the hotkeys that were supported by ancient firmware
+		without mask support.
+
+	hotkey_enable:
+		DEPRECATED, WILL BE REMOVED SOON.
+
+		0: returns -EPERM
+		1: does nothing
+
+	hotkey_mask:
+		bit mask to enable reporting (and depending on
+		the firmware, ACPI event generation) for each hot key
+		(see above).  Returns the current status of the hot keys
+		mask, and allows one to modify it.
+
+	hotkey_all_mask:
+		bit mask that should enable event reporting for all
+		supported hot keys, when echoed to hotkey_mask above.
+		Unless you know which events need to be handled
+		passively (because the firmware *will* handle them
+		anyway), do *not* use hotkey_all_mask.  Use
+		hotkey_recommended_mask, instead. You have been warned.
+
+	hotkey_recommended_mask:
+		bit mask that should enable event reporting for all
+		supported hot keys, except those which are always
+		handled by the firmware anyway.  Echo it to
+		hotkey_mask above, to use.  This is the default mask
+		used by the driver.
+
+	hotkey_source_mask:
+		bit mask that selects which hot keys the driver will
+		poll the NVRAM for.  This is auto-detected by the driver
+		based on the capabilities reported by the ACPI firmware,
+		but it can be overridden at runtime.
+
+		Hot keys whose bits are set in hotkey_source_mask are
+		polled for in NVRAM, and reported as hotkey events if
+		enabled in hotkey_mask.  Only a few hot keys are
+		available through CMOS NVRAM polling.
+
+		Warning: when in NVRAM mode, the volume up/down/mute
+		keys are synthesized according to changes in the mixer,
+		which uses a single volume up or volume down hotkey
+		press to unmute, as per the ThinkPad volume mixer user
+		interface.  When in ACPI event mode, volume up/down/mute
+		events are reported by the firmware and can behave
+		differently (and that behaviour changes with firmware
+		version -- not just with firmware models -- as well as
+		OSI(Linux) state).
+
+	hotkey_poll_freq:
+		frequency in Hz for hot key polling. It must be between
+		0 and 25 Hz.  Polling is only carried out when strictly
+		needed.
+
+		Setting hotkey_poll_freq to zero disables polling, and
+		will cause hot key presses that require NVRAM polling
+		to never be reported.
+
+		Setting hotkey_poll_freq too low may cause repeated
+		pressings of the same hot key to be misreported as a
+		single key press, or to not even be detected at all.
+		The recommended polling frequency is 10 Hz.
+
+	hotkey_radio_sw:
+		If the ThinkPad has a hardware radio switch, this
+		attribute will read 0 if the switch is in the "radios
+		disabled" position, and 1 if the switch is in the
+		"radios enabled" position.
+
+		This attribute has poll()/select() support.
+
+	hotkey_tablet_mode:
+		If the ThinkPad has tablet capabilities, this attribute
+		will read 0 if the ThinkPad is in normal mode, and
+		1 if the ThinkPad is in tablet mode.
+
+		This attribute has poll()/select() support.
+
+	wakeup_reason:
+		Set to 1 if the system is waking up because the user
+		requested a bay ejection.  Set to 2 if the system is
+		waking up because the user requested the system to
+		undock.  Set to zero for normal wake-ups or wake-ups
+		due to unknown reasons.
+
+		This attribute has poll()/select() support.
+
+	wakeup_hotunplug_complete:
+		Set to 1 if the system was woken up because of an
+		undock or bay ejection request, and that request
+		was successfully completed.  At this point, it might
+		be useful to send the system back to sleep, at the
+		user's choice.  Refer to HKEY events 0x4003 and
+		0x3003, below.
+
+		This attribute has poll()/select() support.
+
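+As a minimal sketch of the mask attributes above, and assuming the
+thinkpad-acpi device attributes live in their usual location,
+/sys/devices/platform/thinkpad_acpi (an assumption, as the path is not
+stated in this section), the recommended mask could be applied and
+polling set to the recommended frequency like this::
+
+	cd /sys/devices/platform/thinkpad_acpi
+	# show the current mask and the mask the driver recommends
+	cat hotkey_mask hotkey_recommended_mask
+	# enable event reporting for the recommended set of hot keys
+	cat hotkey_recommended_mask > hotkey_mask
+	# poll NVRAM at the recommended 10 Hz (only if polling is in use)
+	echo 10 > hotkey_poll_freq
+
+Writing to these attributes normally requires root privileges, and
+hotkey_poll_freq may be absent if the driver was built without NVRAM
+polling support.
+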
+input layer notes
+^^^^^^^^^^^^^^^^^
+
+A hot key is mapped to a single input layer EV_KEY event, possibly
+followed by an EV_MSC MSC_SCAN event that shall contain that key's scan
+code.  An EV_SYN event will always be generated to mark the end of the
+event block.
+
+Do not use the EV_MSC MSC_SCAN events to process keys.  They are to be
+used as a helper to remap keys, only.  They are particularly useful when
+remapping KEY_UNKNOWN keys.
+
+The events are available in an input device, with the following id:
+
+	==============  ==============================
+	Bus		BUS_HOST
+	vendor		0x1014 (PCI_VENDOR_ID_IBM)  or
+			0x17aa (PCI_VENDOR_ID_LENOVO)
+	product		0x5054 ("TP")
+	version		0x4101
+	==============  ==============================
+
+The version will have its LSB incremented if the keymap changes in a
+backwards-compatible way.  The MSB shall always be 0x41 for this input
+device.  If the MSB is not 0x41, do not use the device as described in
+this section, as it is either something else (e.g. another input device
+exported by a thinkpad driver, such as HDAPS) or its functionality has
+been changed in a non-backwards compatible way.
+
+Adding other event types for other functionalities shall be considered a
+backwards-compatible change for this input device.
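+
+A script that wants to apply the version rule above could do something
+like the following sketch.  It assumes the standard input sysfs layout
+(/sys/class/input/input*/id/, holding hexadecimal values without a "0x"
+prefix), which is not described in this document::
+
+	for id in /sys/class/input/input*/id; do
+		# skip devices that are not the thinkpad-acpi hot key device
+		case "$(cat "$id/vendor")" in
+		1014|17aa) ;;
+		*) continue ;;
+		esac
+		[ "$(cat "$id/product")" = "5054" ] || continue
+		version=$(cat "$id/version")
+		# the MSB must be 0x41 for the keymap described here
+		if [ "$((0x$version >> 8))" -eq "$((0x41))" ]; then
+			echo "${id%/id}: hot key map, version 0x$version"
+		fi
+	done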
+
+Thinkpad-acpi Hot Key event map (version 0x4101):
+
+=======	=======	==============	==============================================
+ACPI	Scan
+event	code	Key		Notes
+=======	=======	==============	==============================================
+0x1001	0x00	FN+F1		-
+
+0x1002	0x01	FN+F2		IBM: battery (rare)
+				Lenovo: Screen lock
+
+0x1003	0x02	FN+F3		Many IBM models always report
+				this hot key, even with hot keys
+				disabled or with Fn+F3 masked
+				off
+				IBM: screen lock, often turns
+				off the ThinkLight as side-effect
+				Lenovo: battery
+
+0x1004	0x03	FN+F4		Sleep button (ACPI sleep button
+				semantics, i.e. sleep-to-RAM).
+				It always generates some kind
+				of event, either the hot key
+				event or an ACPI sleep button
+				event. The firmware may
+				refuse to generate further FN+F4
+				key presses until a S3 or S4 ACPI
+				sleep cycle is performed or some
+				time passes.
+
+0x1005	0x04	FN+F5		Radio.  Enables/disables
+				the internal Bluetooth hardware
+				and W-WAN card if left in control
+				of the firmware.  Does not affect
+				the WLAN card.
+				Should be used to turn on/off all
+				radios (Bluetooth+W-WAN+WLAN),
+				really.
+
+0x1006	0x05	FN+F6		-
+
+0x1007	0x06	FN+F7		Video output cycle.
+				Do you feel lucky today?
+
+0x1008	0x07	FN+F8		IBM: toggle screen expand
+				Lenovo: configure UltraNav,
+				or toggle screen expand
+
+0x1009	0x08	FN+F9		-
+
+...	...	...		...
+
+0x100B	0x0A	FN+F11		-
+
+0x100C	0x0B	FN+F12		Sleep to disk.  You are always
+				supposed to handle it yourself,
+				either through the ACPI event,
+				or through a hotkey event.
+				The firmware may refuse to
+				generate further FN+F12 key
+				press events until a S3 or S4
+				ACPI sleep cycle is performed,
+				or some time passes.
+
+0x100D	0x0C	FN+BACKSPACE	-
+0x100E	0x0D	FN+INSERT	-
+0x100F	0x0E	FN+DELETE	-
+
+0x1010	0x0F	FN+HOME		Brightness up.  This key is
+				always handled by the firmware
+				in IBM ThinkPads, even when
+				unmasked.  Just leave it alone.
+				For Lenovo ThinkPads with a new
+				BIOS, it has to be handled either
+				by the ACPI OSI, or by userspace.
+				The driver does the right thing,
+				never mess with this.
+0x1011	0x10	FN+END		Brightness down.  See brightness
+				up for details.
+
+0x1012	0x11	FN+PGUP		ThinkLight toggle.  This key is
+				always handled by the firmware,
+				even when unmasked.
+
+0x1013	0x12	FN+PGDOWN	-
+
+0x1014	0x13	FN+SPACE	Zoom key
+
+0x1015	0x14	VOLUME UP	Internal mixer volume up. This
+				key is always handled by the
+				firmware, even when unmasked.
+				NOTE: Lenovo seems to be changing
+				this.
+0x1016	0x15	VOLUME DOWN	Internal mixer volume down. This
+				key is always handled by the
+				firmware, even when unmasked.
+				NOTE: Lenovo seems to be changing
+				this.
+0x1017	0x16	MUTE		Mute internal mixer. This
+				key is always handled by the
+				firmware, even when unmasked.
+
+0x1018	0x17	THINKPAD	ThinkPad/Access IBM/Lenovo key
+
+0x1019	0x18	unknown
+
+...	...	...
+
+0x1020	0x1F	unknown
+=======	=======	==============	==============================================
+
+The ThinkPad firmware does not allow one to differentiate when most hot
+keys are pressed or released (either that, or we don't know how to, yet).
+For these keys, the driver generates a set of events for a key press and
+immediately issues the same set of events for a key release.  It is
+unknown by the driver if the ThinkPad firmware triggered these events on
+hot key press or release, but the firmware will do it for either one, not
+both.
+
+If a key is mapped to KEY_RESERVED, it generates no input events at all.
+If a key is mapped to KEY_UNKNOWN, it generates an input event that
+includes a scan code.  If a key is mapped to anything else, it will
+generate input device EV_KEY events.
+
+In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
+events for switches:
+
+==============	==============================================
+SW_RFKILL_ALL	T60 and later hardware rfkill rocker switch
+SW_TABLET_MODE	Tablet ThinkPads HKEY events 0x5009 and 0x500A
+==============	==============================================
+
+Non hotkey ACPI HKEY event map
+------------------------------
+
+Events that are never propagated by the driver:
+
+======		==================================================
+0x2304		System is waking up from suspend to undock
+0x2305		System is waking up from suspend to eject bay
+0x2404		System is waking up from hibernation to undock
+0x2405		System is waking up from hibernation to eject bay
+0x5001		Lid closed
+0x5002		Lid opened
+0x5009		Tablet swivel: switched to tablet mode
+0x500A		Tablet swivel: switched to normal mode
+0x5010		Brightness level changed/control event
+0x6000		KEYBOARD: Numlock key pressed
+0x6005		KEYBOARD: Fn key pressed (TO BE VERIFIED)
+0x7000		Radio Switch may have changed state
+======		==================================================
+
+
+Events that are propagated by the driver to userspace:
+
+======		=====================================================
+0x2313		ALARM: System is waking up from suspend because
+		the battery is nearly empty
+0x2413		ALARM: System is waking up from hibernation because
+		the battery is nearly empty
+0x3003		Bay ejection (see 0x2x05) complete, can sleep again
+0x3006		Bay hotplug request (hint to power up SATA link when
+		the optical drive tray is ejected)
+0x4003		Undocked (see 0x2x04), can sleep again
+0x4010		Docked into hotplug port replicator (non-ACPI dock)
+0x4011		Undocked from hotplug port replicator (non-ACPI dock)
+0x500B		Tablet pen inserted into its storage bay
+0x500C		Tablet pen removed from its storage bay
+0x6011		ALARM: battery is too hot
+0x6012		ALARM: battery is extremely hot
+0x6021		ALARM: a sensor is too hot
+0x6022		ALARM: a sensor is extremely hot
+0x6030		System thermal table changed
+0x6032		Thermal Control command set completion  (DYTC, Windows)
+0x6040		Nvidia Optimus/AC adapter related (TO BE VERIFIED)
+0x60C0		X1 Yoga 2016, Tablet mode status changed
+0x60F0		Thermal Transformation changed (GMTS, Windows)
+======		=====================================================
+
+Battery nearly empty alarms are a last resort attempt to get the
+operating system to hibernate or shut down cleanly (0x2313), or shut down
+cleanly (0x2413) before power is lost.  They must be acted upon, as the
+wake-up caused by the firmware will have negated most safety nets...
+
+When any of the "too hot" alarms happen, according to Lenovo the user
+should suspend or hibernate the laptop (and in the case of battery
+alarms, unplug the AC adapter) to let it cool down.  These alarms do
+signal that something is wrong; they should never happen under normal
+operating conditions.
+
+The "extremely hot" alarms are emergencies.  According to Lenovo, the
+operating system is to force either an immediate suspend or hibernate
+cycle, or a system shutdown.  Obviously, something is very wrong if this
+happens.
+
+
+Brightness hotkey notes
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Don't mess with the brightness hotkeys in a ThinkPad.  If you want
+notifications for OSD, use the sysfs backlight class event support.
+
+The driver will issue KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN events
+automatically for the cases where userspace has to do something to
+implement brightness changes.  When you override these events, you will
+either fail to properly handle the ThinkPads that require explicit
+action to change backlight brightness, or the ThinkPads that require
+that no action be taken to work properly.
+
+
+Bluetooth
+---------
+
+procfs: /proc/acpi/ibm/bluetooth
+
+sysfs device attribute: bluetooth_enable (deprecated)
+
+sysfs rfkill class: switch "tpacpi_bluetooth_sw"
+
+This feature shows the presence and current state of a ThinkPad
+Bluetooth device in the internal ThinkPad CDC slot.
+
+If the ThinkPad supports it, the Bluetooth state is stored in NVRAM,
+so it is kept across reboots and power-off.
+
+Procfs notes
+^^^^^^^^^^^^
+
+If Bluetooth is installed, the following commands can be used::
+
+	echo enable > /proc/acpi/ibm/bluetooth
+	echo disable > /proc/acpi/ibm/bluetooth
+
+Sysfs notes
+^^^^^^^^^^^
+
+	If the Bluetooth CDC card is installed, it can be enabled /
+	disabled through the "bluetooth_enable" thinkpad-acpi device
+	attribute, and its current status can also be queried.
+
+	enable:
+
+		- 0: disables Bluetooth / Bluetooth is disabled
+		- 1: enables Bluetooth / Bluetooth is enabled.
+
+	Note: this interface has been superseded by the generic rfkill
+	class.  It has been deprecated, and it will be removed in year
+	2010.
+
+	rfkill controller switch "tpacpi_bluetooth_sw": refer to
+	Documentation/rfkill.txt for details.
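+
+As a hedged sketch of the rfkill route, assuming the standard rfkill
+class layout under /sys/class/rfkill (the "name" and "soft" attributes
+come from the rfkill class, not from this document)::
+
+	for sw in /sys/class/rfkill/rfkill*; do
+		[ "$(cat "$sw/name")" = "tpacpi_bluetooth_sw" ] || continue
+		cat "$sw/soft"		# 1 means soft-blocked (radio off)
+		echo 0 > "$sw/soft"	# unblock, i.e. enable Bluetooth
+	done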
+
+
+Video output control -- /proc/acpi/ibm/video
+--------------------------------------------
+
+This feature allows control over the devices used for video output -
+LCD, CRT or DVI (if available). The following commands are available::
+
+	echo lcd_enable > /proc/acpi/ibm/video
+	echo lcd_disable > /proc/acpi/ibm/video
+	echo crt_enable > /proc/acpi/ibm/video
+	echo crt_disable > /proc/acpi/ibm/video
+	echo dvi_enable > /proc/acpi/ibm/video
+	echo dvi_disable > /proc/acpi/ibm/video
+	echo auto_enable > /proc/acpi/ibm/video
+	echo auto_disable > /proc/acpi/ibm/video
+	echo expand_toggle > /proc/acpi/ibm/video
+	echo video_switch > /proc/acpi/ibm/video
+
+NOTE:
+  Access to this feature is restricted to processes owning the
+  CAP_SYS_ADMIN capability for safety reasons, as it can interact badly
+  enough with some versions of X.org to crash it.
+
+Each video output device can be enabled or disabled individually.
+Reading /proc/acpi/ibm/video shows the status of each device.
+
+Automatic video switching can be enabled or disabled.  When automatic
+video switching is enabled, certain events (e.g. opening the lid,
+docking or undocking) cause the video output device to change
+automatically. While this can be useful, it also causes flickering
+and, on the X40, video corruption. By disabling automatic switching,
+the flickering or video corruption can be avoided.
+
+The video_switch command cycles through the available video outputs
+(it simulates the behavior of Fn-F7).
+
+Video expansion can be toggled through this feature. This controls
+whether the display is expanded to fill the entire LCD screen when a
+mode with less than full resolution is used. Note that the current
+video expansion status cannot be determined through this feature.
+
+Note that on many models (particularly those using Radeon graphics
+chips) the X driver configures the video card in a way which prevents
+Fn-F7 from working. This also disables the video output switching
+features of this driver, as it uses the same ACPI methods as
+Fn-F7. Video switching on the console should still work.
+
+UPDATE: refer to https://bugs.freedesktop.org/show_bug.cgi?id=2000
+
+
+ThinkLight control
+------------------
+
+procfs: /proc/acpi/ibm/light
+
+sysfs attributes: as per LED class, for the "tpacpi::thinklight" LED
+
+procfs notes
+^^^^^^^^^^^^
+
+The ThinkLight status can be read and set through the procfs interface.  A
+few models which do not make the status available will show the ThinkLight
+status as "unknown". The available commands are::
+
+	echo on  > /proc/acpi/ibm/light
+	echo off > /proc/acpi/ibm/light
+
+sysfs notes
+^^^^^^^^^^^
+
+The ThinkLight sysfs interface is documented by the LED class
+documentation, in Documentation/leds/leds-class.rst.  The ThinkLight LED name
+is "tpacpi::thinklight".
+
+Due to limitations in the sysfs LED class, if the status of the ThinkLight
+cannot be read or if it is unknown, thinkpad-acpi will report it as "off".
+It is impossible to know if the status returned through sysfs is valid.
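+
+For illustration, and assuming the usual LED class directory layout
+under /sys/class/leds (the LED class documentation remains the
+authoritative description), the ThinkLight could be driven like this::
+
+	led='/sys/class/leds/tpacpi::thinklight'
+	cat "$led/max_brightness"
+	echo 1 > "$led/brightness"	# on
+	echo 0 > "$led/brightness"	# off
+	cat "$led/brightness"		# also reads 0 when the state is unknown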
+
+
+CMOS/UCMS control
+-----------------
+
+procfs: /proc/acpi/ibm/cmos
+
+sysfs device attribute: cmos_command
+
+This feature is mostly used internally by the ACPI firmware to keep the legacy
+CMOS NVRAM bits in sync with the current machine state, and to record this
+state so that the ThinkPad will retain such settings across reboots.
+
+Some of these commands actually perform actions in some ThinkPad models, but
+this is expected to disappear more and more in newer models.  As an example, in
+a T43 and in a X40, commands 12 and 13 still control the ThinkLight state for
+real, but commands 0 to 2 don't control the mixer anymore (they have been
+phased out) and just update the NVRAM.
+
+The range of valid cmos command numbers is 0 to 21, but not all have an
+effect and the behavior varies from model to model.  Here is the behavior
+on the X40 (tpb is the ThinkPad Buttons utility):
+
+	- 0 - Related to "Volume down" key press
+	- 1 - Related to "Volume up" key press
+	- 2 - Related to "Mute on" key press
+	- 3 - Related to "Access IBM" key press
+	- 4 - Related to "LCD brightness up" key press
+	- 5 - Related to "LCD brightness down" key press
+	- 11 - Related to "toggle screen expansion" key press/function
+	- 12 - Related to "ThinkLight on"
+	- 13 - Related to "ThinkLight off"
+	- 14 - Related to "ThinkLight" key press (toggle ThinkLight)
+
+The cmos command interface is prone to firmware split-brain problems, as
+in newer ThinkPads it is just a compatibility layer.  Do not use it: it
+is exported only as a debug tool.
+
+
+LED control
+-----------
+
+procfs: /proc/acpi/ibm/led
+
+sysfs attributes: as per LED class, see below for names
+
+Some of the LED indicators can be controlled through this feature.  On
+some older ThinkPad models, it is possible to query the status of the
+LED indicators as well.  Newer ThinkPads cannot query the real status
+of the LED indicators.
+
+Because misuse of the LEDs could induce an unaware user to perform
+dangerous actions (like undocking or ejecting a bay device while the
+buses are still active), or mask an important alarm (such as a nearly
+empty battery, or a broken battery), access to most LEDs is
+restricted.
+
+Unrestricted access to all LEDs requires that thinkpad-acpi be
+compiled with the CONFIG_THINKPAD_ACPI_UNSAFE_LEDS option enabled.
+Distributions must never enable this option.  Individual users that
+are aware of the consequences are welcome to enable it.
+
+Audio mute and microphone mute LEDs are supported, but currently not
+visible to userspace. They are used by the snd-hda-intel audio driver.
+
+procfs notes
+^^^^^^^^^^^^
+
+The available commands are::
+
+	echo '<LED number> on' >/proc/acpi/ibm/led
+	echo '<LED number> off' >/proc/acpi/ibm/led
+	echo '<LED number> blink' >/proc/acpi/ibm/led
+
+The <LED number> range is 0 to 15. The set of LEDs that can be
+controlled varies from model to model. Here is the common ThinkPad
+mapping:
+
+	- 0 - power
+	- 1 - battery (orange)
+	- 2 - battery (green)
+	- 3 - UltraBase/dock
+	- 4 - UltraBay
+	- 5 - UltraBase battery slot
+	- 6 - (unknown)
+	- 7 - standby
+	- 8 - dock status 1
+	- 9 - dock status 2
+	- 10, 11 - (unknown)
+	- 12 - thinkvantage
+	- 13, 14, 15 - (unknown)
+
+All of the above can be turned on and off and can be made to blink.
+
+sysfs notes
+^^^^^^^^^^^
+
+The ThinkPad LED sysfs interface is described in detail by the LED class
+documentation, in Documentation/leds/leds-class.rst.
+
+The LEDs are named (in LED ID order, from 0 to 12):
+"tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
+"tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
+"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
+"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
+"tpacpi::thinkvantage".
+
+Due to limitations in the sysfs LED class, if the status of the LED
+indicators cannot be read due to an error, thinkpad-acpi will report it as
+a brightness of zero (same as LED off).
+
+If the ThinkPad firmware doesn't support reading the current status,
+trying to read the current LED brightness will just return whatever
+brightness was last written to that attribute.
+
+These LEDs can blink using hardware acceleration.  To request that a
+ThinkPad indicator LED should blink in hardware accelerated mode, use the
+"timer" trigger, and leave the delay_on and delay_off parameters set to
+zero (to request hardware acceleration autodetection).
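+
+A sketch of such a request, assuming the standard LED class "trigger"
+attribute and using "tpacpi::thinkvantage" purely as an example name
+(the LED may not exist, or may be restricted, on a given model)::
+
+	cd '/sys/class/leds/tpacpi::thinkvantage'
+	echo timer > trigger	# delay_on and delay_off are left untouched
+	cat delay_on delay_off	# shows the delays actually in use
+	echo none > trigger	# stop blinking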
+
+LEDs that are known not to exist in a given ThinkPad model are not
+made available through the sysfs interface.  If you have a dock and you
+notice there are LEDs listed for your ThinkPad that do not exist (and
+are not in the dock), or if you notice that there are missing LEDs,
+a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
+
+
+ACPI sounds -- /proc/acpi/ibm/beep
+----------------------------------
+
+The BEEP method is used internally by the ACPI firmware to provide
+audible alerts in various situations. This feature allows the same
+sounds to be triggered manually.
+
+The commands are non-negative integer numbers::
+
+	echo <number> >/proc/acpi/ibm/beep
+
+The valid <number> range is 0 to 17. Not all numbers trigger sounds
+and the sounds vary from model to model. Here is the behavior on the
+X40:
+
+	- 0 - stop a sound in progress (but use 17 to stop 16)
+	- 2 - two beeps, pause, third beep ("low battery")
+	- 3 - single beep
+	- 4 - high, followed by low-pitched beep ("unable")
+	- 5 - single beep
+	- 6 - very high, followed by high-pitched beep ("AC/DC")
+	- 7 - high-pitched beep
+	- 9 - three short beeps
+	- 10 - very long beep
+	- 12 - low-pitched beep
+	- 15 - three high-pitched beeps repeating constantly, stop with 0
+	- 16 - one medium-pitched beep repeating constantly, stop with 17
+	- 17 - stop 16
+
+
+Temperature sensors
+-------------------
+
+procfs: /proc/acpi/ibm/thermal
+
+sysfs device attributes: (hwmon "thinkpad") temp*_input
+
+Most ThinkPads include six or more separate temperature sensors but only
+expose the CPU temperature through the standard ACPI methods.  This
+feature shows readings from up to eight different sensors on older
+ThinkPads, and up to sixteen different sensors on newer ThinkPads.
+
+For example, on the X40, a typical output may be::
+
+	temperatures:
+		42 42 45 41 36 -128 33 -128
+
+On the T43/p, a typical output may be::
+
+	temperatures:
+		48 48 36 52 38 -128 31 -128 48 52 48 -128 -128 -128 -128 -128
+
+The mapping of thermal sensors to physical locations varies depending on
+system-board model (and thus, on ThinkPad model).
+
+http://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
+tries to track down these locations for various models.
+
+Most (newer?) models seem to follow this pattern:
+
+- 1:  CPU
+- 2:  (depends on model)
+- 3:  (depends on model)
+- 4:  GPU
+- 5:  Main battery: main sensor
+- 6:  Bay battery: main sensor
+- 7:  Main battery: secondary sensor
+- 8:  Bay battery: secondary sensor
+- 9-15: (depends on model)
+
+For the R51 (source: Thomas Gruber):
+
+- 2:  Mini-PCI
+- 3:  Internal HDD
+
+For the T43, T43/p (source: Shmidoax/Thinkwiki.org)
+http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
+
+- 2:  System board, left side (near PCMCIA slot), reported as HDAPS temp
+- 3:  PCMCIA slot
+- 9:  MCH (northbridge) to DRAM Bus
+- 10: Clock-generator, mini-pci card and ICH (southbridge), under Mini-PCI
+      card, under touchpad
+- 11: Power regulator, underside of system board, below F2 key
+
+The A31 has a very atypical layout for the thermal sensors
+(source: Milos Popovic, http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
+
+- 1:  CPU
+- 2:  Main Battery: main sensor
+- 3:  Power Converter
+- 4:  Bay Battery: main sensor
+- 5:  MCH (northbridge)
+- 6:  PCMCIA/ambient
+- 7:  Main Battery: secondary sensor
+- 8:  Bay Battery: secondary sensor
+
+
+Procfs notes
+^^^^^^^^^^^^
+
+	Readings from sensors that are not available return -128.
+	No commands can be written to this file.
+
+Sysfs notes
+^^^^^^^^^^^
+
+	Sensors that are not available return the ENXIO error.  This
+	status may change at runtime, as there are hotplug thermal
+	sensors, like those inside the batteries and docks.
+
+	thinkpad-acpi thermal sensors are reported through the hwmon
+	subsystem, and follow all of the hwmon guidelines at
+	Documentation/hwmon.
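+
+The sysfs behaviour above can be sketched as a small shell loop.  It
+assumes the usual hwmon class layout under /sys/class/hwmon and the
+hwmon convention that temp*_input values are in millidegrees Celsius;
+neither is spelled out in this document::
+
+	for hw in /sys/class/hwmon/hwmon*; do
+		# only look at the hwmon device registered as "thinkpad"
+		[ "$(cat "$hw/name")" = "thinkpad" ] || continue
+		for t in "$hw"/temp*_input; do
+			# reads fail (ENXIO) for sensors that are not available
+			if v=$(cat "$t" 2>/dev/null); then
+				echo "${t##*/}: $((v / 1000)) C"
+			else
+				echo "${t##*/}: not available"
+			fi
+		done
+	done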
+
+EXPERIMENTAL: Embedded controller register dump
+-----------------------------------------------
+
+This feature is not included in the thinkpad driver anymore.
+Instead the EC can be accessed through /sys/kernel/debug/ec with
+a userspace tool which can be found here:
+ftp://ftp.suse.com/pub/people/trenn/sources/ec
+
+Use it to determine the register holding the fan
+speed on some models. To do that, do the following:
+
+	- make sure the battery is fully charged
+	- make sure the fan is running
+	- use above mentioned tool to read out the EC
+
+Often fan and temperature values vary between
+readings. Since temperatures don't change very fast, you can take
+several quick dumps to eliminate them.
+
+You can use a similar method to figure out the meaning of other
+embedded controller registers - e.g. make sure nothing else changes
+except the charging or discharging battery to determine which
+registers contain the current battery capacity, etc. If you experiment
+with this, do send me your results (including some complete dumps with
+a description of the conditions when they were taken.)
+
+
+LCD brightness control
+----------------------
+
+procfs: /proc/acpi/ibm/brightness
+
+sysfs backlight device "thinkpad_screen"
+
+This feature allows software control of the LCD brightness on ThinkPad
+models which don't have a hardware brightness slider.
+
+It has some limitations: the LCD backlight cannot be actually turned
+on or off by this interface, it just controls the backlight brightness
+level.
+
+On IBM (and some of the earlier Lenovo) ThinkPads, the backlight control
+has eight brightness levels, ranging from 0 to 7.  Some of the levels
+may not be distinct.  Later Lenovo models that implement the ACPI
+display backlight brightness control methods have 16 levels, ranging
+from 0 to 15.
+
+For IBM ThinkPads, there are two interfaces to the firmware for direct
+brightness control, EC and UCMS (or CMOS).  To select which one should be
+used, use the brightness_mode module parameter: brightness_mode=1 selects
+EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
+mode with NVRAM backing (so that brightness changes are remembered across
+shutdown/reboot).
+
+The driver tries to select which interface to use from a table of
+defaults for each ThinkPad model.  If it makes a wrong choice, please
+report this as a bug, so that we can fix it.
+
+Lenovo ThinkPads only support brightness_mode=2 (UCMS).
+
+When display backlight brightness controls are available through the
+standard ACPI interface, it is best to use it instead of this direct
+ThinkPad-specific interface.  The driver will disable its native
+backlight brightness control interface if it detects that the standard
+ACPI interface is available in the ThinkPad.
+
+If you want to use the thinkpad-acpi backlight brightness control
+instead of the generic ACPI video backlight brightness control for some
+reason, you should use the acpi_backlight=vendor kernel parameter.
+
+The brightness_enable module parameter can be used to control whether
+the LCD brightness control feature will be enabled when available.
+brightness_enable=0 forces it to be disabled.  brightness_enable=1
+forces it to be enabled when available, even if the standard ACPI
+interface is also available.
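+
+For example, the parameters described above could be set at module load
+time.  This is only a sketch: the module name "thinkpad_acpi" and the
+modprobe.d file name are assumptions not stated in this section::
+
+	# one-off, when loading the module by hand
+	modprobe thinkpad_acpi brightness_mode=3 brightness_enable=1
+
+	# persistently, in e.g. /etc/modprobe.d/thinkpad_acpi.conf
+	options thinkpad_acpi brightness_mode=3 brightness_enable=1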
+
+Procfs notes
+^^^^^^^^^^^^
+
+The available commands are::
+
+	echo up   >/proc/acpi/ibm/brightness
+	echo down >/proc/acpi/ibm/brightness
+	echo 'level <level>' >/proc/acpi/ibm/brightness
+
+Sysfs notes
+^^^^^^^^^^^
+
+The interface is implemented through the backlight sysfs class, which is
+poorly documented at this time.
+
+Locate the thinkpad_screen device under /sys/class/backlight, and inside
+it there will be the following attributes:
+
+	max_brightness:
+		Reads the maximum brightness the hardware can be set to.
+		The minimum is always zero.
+
+	actual_brightness:
+		Reads what brightness the screen is set to at this instant.
+
+	brightness:
+		Writes request the driver to change brightness to the
+		given value.  Reads will tell you what brightness the
+		driver is trying to set the display to when "power" is set
+		to zero and the display has not been dimmed by a kernel
+		power management event.
+
+	power:
+		power management mode, where 0 is "display on", and 1 to 3
+		will dim the display backlight to brightness level 0
+		because thinkpad-acpi cannot really turn the backlight
+		off.  Kernel power management events can temporarily
+		increase the current power management level, i.e. they can
+		dim the display.
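+
+A minimal sketch of using these attributes from a shell, assuming the
+device is indeed named thinkpad_screen on the running system::
+
+	bl=/sys/class/backlight/thinkpad_screen
+	max=$(cat "$bl/max_brightness")
+	echo "$((max / 2))" > "$bl/brightness"	# request half brightness
+	cat "$bl/actual_brightness"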
+
+
+WARNING:
+
+    Whatever you do, do NOT ever call the thinkpad-acpi backlight-level
+    change interface and the ACPI-based backlight level change interface
+    (available on newer BIOSes, and driven by the Linux ACPI video driver)
+    at the same time.  The two will interact in bad ways, do funny things,
+    and maybe reduce the life of the backlight lamps by needlessly kicking
+    their level up and down at every change.
+
+
+Volume control (Console Audio control)
+--------------------------------------
+
+procfs: /proc/acpi/ibm/volume
+
+ALSA: "ThinkPad Console Audio Control", default ID: "ThinkPadEC"
+
+NOTE: by default, the volume control interface operates in read-only
+mode, as it is supposed to be used for on-screen-display purposes.
+The read/write mode can be enabled through the use of the
+"volume_control=1" module parameter.
+
+NOTE: distros are urged to not enable volume_control by default; this
+should be done by the local admin only.  The ThinkPad UI is for the
+console audio control to be done through the volume keys only, and for
+the desktop environment to just provide on-screen-display feedback.
+Software volume control should be done only in the main AC97/HDA
+mixer.
+
+
+About the ThinkPad Console Audio control
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ThinkPads have a built-in amplifier and muting circuit that drives the
+console headphone and speakers.  This circuit is after the main AC97
+or HDA mixer in the audio path, and under exclusive control of the
+firmware.
+
+ThinkPads have three special hotkeys to interact with the console
+audio control: volume up, volume down and mute.
+
+It is worth noting that the normal way the mute function works (on
+ThinkPads that do not have a "mute LED") is:
+
+1. Press mute to mute.  It will *always* mute, you can press it as
+   many times as you want, and the sound will remain mute.
+
+2. Press either volume key to unmute the ThinkPad (it will *not*
+   change the volume, it will just unmute).
+
+This is a far superior design when compared to the cheap software-only
+mute-toggle solution found on normal consumer laptops:  you can be
+absolutely sure the ThinkPad will not make noise if you press the mute
+button, no matter the previous state.
+
+The IBM ThinkPads and the earlier Lenovo ThinkPads have variable-gain
+amplifiers driving the speakers and headphone output, and the firmware
+also handles volume control for the headphone and speakers on these
+ThinkPads without any help from the operating system (this volume
+control stage exists after the main AC97 or HDA mixer in the audio
+path).
+
+The newer Lenovo models only have firmware mute control, and depend on
+the main HDA mixer to do volume control (which is done by the operating
+system).  In this case, the volume keys are filtered out for unmute
+key press (there are some firmware bugs in this area) and delivered as
+normal key presses to the operating system (thinkpad-acpi is not
+involved).
+
+
+The ThinkPad-ACPI volume control
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The preferred way to interact with the Console Audio control is the
+ALSA interface.
+
+The legacy procfs interface allows one to read the current state,
+and if volume control is enabled, accepts the following commands::
+
+	echo up   >/proc/acpi/ibm/volume
+	echo down >/proc/acpi/ibm/volume
+	echo mute >/proc/acpi/ibm/volume
+	echo unmute >/proc/acpi/ibm/volume
+	echo 'level <level>' >/proc/acpi/ibm/volume
+
+The <level> number range is 0 to 14 although not all of them may be
+distinct. To unmute the volume after the mute command, use either the
+up or down command (the level command will not unmute the volume), or
+the unmute command.
+
+You can use the volume_capabilities parameter to tell the driver
+whether your ThinkPad has volume control or mute-only control:
+volume_capabilities=1 for mixers with mute and volume control,
+volume_capabilities=2 for mixers with only mute control.
+
+If the driver misdetects the capabilities for your ThinkPad model,
+please report this to ibm-acpi-devel@lists.sourceforge.net, so that we
+can update the driver.
+
+There are two strategies for volume control.  To select which one
+should be used, use the volume_mode module parameter: volume_mode=1
+selects EC mode, and volume_mode=3 selects EC mode with NVRAM backing
+(so that volume/mute changes are remembered across shutdown/reboot).
+
+The driver will operate in volume_mode=3 by default. If that does not
+work well on your ThinkPad model, please report this to
+ibm-acpi-devel@lists.sourceforge.net.
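+
+As a sketch of how the two parameters above combine (the values are picked
+purely for illustration, not as a recommendation), a ThinkPad with a
+mute-only mixer could be loaded in plain EC mode, without NVRAM backing,
+with::
+
+	modprobe thinkpad_acpi volume_capabilities=2 volume_mode=1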
+
+The driver supports the standard ALSA module parameters.  If the ALSA
+mixer is disabled, the driver will disable all volume functionality.
+
+
+Fan control and monitoring: fan speed, fan enable/disable
+---------------------------------------------------------
+
+procfs: /proc/acpi/ibm/fan
+
+sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1, pwm1_enable, fan2_input
+
+sysfs hwmon driver attributes: fan_watchdog
+
+NOTE NOTE NOTE:
+   fan control operations are disabled by default for
+   safety reasons.  To enable them, the module parameter "fan_control=1"
+   must be given to thinkpad-acpi.
+
+This feature attempts to show the current fan speed, control mode and
+other fan data that might be available.  The speed is read directly
+from the hardware registers of the embedded controller.  This is known
+to work on later R, T, X and Z series ThinkPads but may show a bogus
+value on other models.
+
+Some Lenovo ThinkPads support a secondary fan.  This fan cannot be
+controlled separately; it shares the main fan control.
+
+Fan levels
+^^^^^^^^^^
+
+Most ThinkPad fans work in "levels" at the firmware interface.  Level 0
+stops the fan.  The higher the level, the higher the fan speed, although
+adjacent levels often map to the same fan speed.  7 is the highest
+level, where the fan reaches the maximum recommended speed.
+
+Level "auto" means the EC changes the fan level according to some
+internal algorithm, usually based on readings from the thermal sensors.
+
+There is also a "full-speed" level, also known as "disengaged" level.
+In this level, the EC disables the speed-locked closed-loop fan control,
+and drives the fan as fast as it can go, which might exceed hardware
+limits, so use this level with caution.
+
+The fan usually ramps up or down slowly from one speed to another, and
+it is normal for the EC to take several seconds to react to fan
+commands.  The full-speed level may take up to two minutes to ramp up to
+maximum speed, and in some ThinkPads, the tachometer readings go stale
+while the EC is transitioning to the full-speed level.
+
+WARNING WARNING WARNING: do not leave the fan disabled unless you are
+monitoring all of the temperature sensor readings and you are ready to
+enable it if necessary to avoid overheating.
+
+An enabled fan in level "auto" may stop spinning if the EC decides the
+ThinkPad is cool enough and doesn't need the extra airflow.  This is
+normal, and the EC will spin the fan up if the various thermal readings
+rise too much.
+
+On the X40, this seems to depend on the CPU and HDD temperatures.
+Specifically, the fan is turned on when either the CPU temperature
+climbs to 56 degrees or the HDD temperature climbs to 46 degrees.  The
+fan is turned off when the CPU temperature drops to 49 degrees and the
+HDD temperature drops to 41 degrees.  These thresholds cannot
+currently be controlled.
+
+The ThinkPad's ACPI DSDT code will reprogram the fan on its own when
+certain conditions are met.  It will override any fan programming done
+through thinkpad-acpi.
+
+The thinkpad-acpi kernel driver can be programmed to revert the fan
+level to a safe setting if userspace does not issue one of the procfs
+fan commands: "enable", "disable", "level" or "watchdog", or if there
+are no writes to pwm1_enable (or to pwm1 *if and only if* pwm1_enable is
+set to 1, manual mode) within a configurable amount of time of up to
+120 seconds.  This functionality is called fan safety watchdog.
+
+Note that the watchdog timer stops after it enables the fan.  It will be
+rearmed again automatically (using the same interval) when one of the
+above mentioned fan commands is received.  The fan watchdog is,
+therefore, not suitable to protect against fan mode changes made through
+means other than the "enable", "disable", and "level" procfs fan
+commands, or the hwmon fan control sysfs interface.
+
+Procfs notes
+^^^^^^^^^^^^
+
+The fan may be enabled or disabled with the following commands::
+
+	echo enable  >/proc/acpi/ibm/fan
+	echo disable >/proc/acpi/ibm/fan
+
+Placing a fan on level 0 is the same as disabling it.  Enabling a fan
+will try to place it in a safe level if it is too slow or disabled.
+
+The fan level can be controlled with the command::
+
+	echo 'level <level>' > /proc/acpi/ibm/fan
+
+Where <level> is an integer from 0 to 7, or one of the words "auto" or
+"full-speed" (without the quotes).  Not all ThinkPads support the "auto"
+and "full-speed" levels.  The driver accepts "disengaged" as an alias for
+"full-speed", and reports it as "disengaged" for backwards
+compatibility.
+
+On the X31 and X40 (and ONLY on those models), the fan speed can be
+controlled to a certain degree.  Once the fan is running, it can be
+forced to run faster or slower with the following command::
+
+	echo 'speed <speed>' > /proc/acpi/ibm/fan
+
+The sustainable range of fan speeds on the X40 appears to be from about
+3700 to about 7350. Values outside this range either do not have any
+effect or the fan speed eventually settles somewhere in that range.  The
+fan cannot be stopped or started with this command.  This functionality
+is incomplete, and not available through the sysfs interface.
+
+To program the safety watchdog, use the "watchdog" command::
+
+	echo 'watchdog <interval in seconds>' > /proc/acpi/ibm/fan
+
+If you want to disable the watchdog, use 0 as the interval.
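+
+A minimal sketch of a userspace keepalive, assuming the module was loaded
+with fan_control=1 and that level 3 and the 30 second interval (both chosen
+only for illustration) suit the machine::
+
+	# Arm the watchdog, then select a fixed fan level.
+	echo 'watchdog 30' > /proc/acpi/ibm/fan
+	echo 'level 3' > /proc/acpi/ibm/fan
+
+	# Reissue the level command more often than the watchdog interval.
+	while sleep 20; do
+		echo 'level 3' > /proc/acpi/ibm/fan
+	done
+
+If the loop dies, the driver reverts the fan to a safe setting once the
+interval expires.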
+
+Sysfs notes
+^^^^^^^^^^^
+
+The sysfs interface follows the hwmon subsystem guidelines for the most
+part; the exception is the fan safety watchdog.
+
+Writes to any of the sysfs attributes may return the EINVAL error if
+that operation is not supported in a given ThinkPad or if the parameter
+is out-of-bounds, and EPERM if it is forbidden.  They may also return
+EINTR (interrupted system call), and EIO (I/O error while trying to talk
+to the firmware).
+
+Features not yet implemented by the driver return ENOSYS.
+
+hwmon device attribute pwm1_enable:
+	- 0: PWM offline (fan is set to full-speed mode)
+	- 1: Manual PWM control (use pwm1 to set fan level)
+	- 2: Hardware PWM control (EC "auto" mode)
+	- 3: reserved (Software PWM control, not implemented yet)
+
+	Modes 0 and 2 are not supported by all ThinkPads, and the
+	driver is not always able to detect this.  If it does know a
+	mode is unsupported, it will return -EINVAL.
+
+hwmon device attribute pwm1:
+	Fan level, scaled from the firmware values of 0-7 to the hwmon
+	scale of 0-255.  0 means fan stopped, 255 means highest normal
+	speed (level 7).
+
+	This attribute only commands the fan if pwm1_enable is set to 1
+	(manual PWM control).
+
+hwmon device attribute fan1_input:
+	Fan tachometer reading, in RPM.  May go stale on certain
+	ThinkPads while the EC transitions the PWM to offline mode,
+	which can take up to two minutes.  May return rubbish on older
+	ThinkPads.
+
+hwmon device attribute fan2_input:
+	Fan tachometer reading, in RPM, for the secondary fan.
+	Available only on some ThinkPads.  If the secondary fan is
+	not installed, will always read 0.
+
+hwmon driver attribute fan_watchdog:
+	Fan safety watchdog timer interval, in seconds.  Minimum is
+	1 second, maximum is 120 seconds.  0 disables the watchdog.
+
+To stop the fan: set pwm1 to zero, and pwm1_enable to 1.
+
+To start the fan in a safe mode: set pwm1_enable to 2.  If that fails
+with EINVAL, try to set pwm1_enable to 1 and pwm1 to at least 128 (255
+would be the safest choice, though).
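+
+A sketch of the safe-start sequence from a shell, assuming the module was
+loaded with fan_control=1 and that the hwmon device shows up under
+/sys/class/hwmon (the usual hwmon class location; the hwmonN index varies,
+so the device is located by its name, "thinkpad")::
+
+	# Find the hwmon device registered by thinkpad-acpi.
+	for d in /sys/class/hwmon/hwmon*; do
+		[ "$(cat "$d/name" 2>/dev/null)" = thinkpad ] && hwmon=$d
+	done
+	: "${hwmon:?thinkpad hwmon device not found}"
+
+	# Read the tachometer, then hand the fan back to the EC.  If that
+	# write is rejected, fall back to manual mode at the highest level.
+	cat "$hwmon/fan1_input"
+	echo 2 > "$hwmon/pwm1_enable" ||
+		{ echo 1 > "$hwmon/pwm1_enable"; echo 255 > "$hwmon/pwm1"; }
+
+Note that this falls back on any write error, not only EINVAL.  On drivers
+older than sysfs interface version 0x030000 the attributes sit on the
+backing platform device instead, so the paths above would need adjusting.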
+
+
+WAN
+---
+
+procfs: /proc/acpi/ibm/wan
+
+sysfs device attribute: wwan_enable (deprecated)
+
+sysfs rfkill class: switch "tpacpi_wwan_sw"
+
+This feature shows the presence and current state of the built-in
+Wireless WAN device.
+
+If the ThinkPad supports it, the WWAN state is stored in NVRAM,
+so it is kept across reboots and power-off.
+
+It was tested on a Lenovo ThinkPad X60. It should probably work on other
+ThinkPad models which come with this module installed.
+
+Procfs notes
+^^^^^^^^^^^^
+
+If the W-WAN card is installed, the following commands can be used::
+
+	echo enable > /proc/acpi/ibm/wan
+	echo disable > /proc/acpi/ibm/wan
+
+Sysfs notes
+^^^^^^^^^^^
+
+	If the W-WAN card is installed, it can be enabled /
+	disabled through the "wwan_enable" thinkpad-acpi device
+	attribute, and its current status can also be queried.
+
+	enable:
+		- 0: disables WWAN card / WWAN card is disabled
+		- 1: enables WWAN card / WWAN card is enabled.
+
+	Note: this interface has been superseded by the generic rfkill
+	class.  It has been deprecated, and it will be removed in year
+	2010.
+
+	rfkill controller switch "tpacpi_wwan_sw": refer to
+	Documentation/rfkill.txt for details.
+
+
+EXPERIMENTAL: UWB
+-----------------
+
+This feature is considered EXPERIMENTAL because it has not been extensively
+tested and validated in various ThinkPad models yet.  The feature may not
+work as expected. USE WITH CAUTION! To use this feature, you need to supply
+the experimental=1 parameter when loading the module.
+
+sysfs rfkill class: switch "tpacpi_uwb_sw"
+
+This feature exports an rfkill controller for the UWB device, if one is
+present and enabled in the BIOS.
+
+Sysfs notes
+^^^^^^^^^^^
+
+	rfkill controller switch "tpacpi_uwb_sw": refer to
+	Documentation/rfkill.txt for details.
+
+Adaptive keyboard
+-----------------
+
+sysfs device attribute: adaptive_kbd_mode
+
+This sysfs attribute controls the keyboard "face" that will be shown on the
+adaptive keyboard of the Lenovo X1 Carbon 2nd gen (2014). The value can be
+read and set.
+
+- 1 = Home mode
+- 2 = Web-browser mode
+- 3 = Web-conference mode
+- 4 = Function mode
+- 5 = Layflat mode
+
+For more details about which buttons will appear depending on the mode, please
+review the laptop's user guide:
+http://www.lenovo.com/shop/americas/content/user_guides/x1carbon_2_ug_en.pdf
+
+Multiple Commands, Module Parameters
+------------------------------------
+
+Multiple commands can be written to the proc files in one shot by
+separating them with commas, for example::
+
+	echo enable,0xffff > /proc/acpi/ibm/hotkey
+	echo lcd_disable,crt_enable > /proc/acpi/ibm/video
+
+Commands can also be specified when loading the thinkpad-acpi module,
+for example::
+
+	modprobe thinkpad_acpi hotkey=enable,0xffff video=auto_disable
+
+
+Enabling debugging output
+-------------------------
+
+The module takes a debug parameter which can be used to selectively
+enable various classes of debugging output, for example::
+
+	modprobe thinkpad_acpi debug=0xffff
+
+will enable all debugging output classes.  It takes a bitmask, so
+to enable more than one output class, just add their values.
+
+	=============		======================================
+	Debug bitmask		Description
+	=============		======================================
+	0x8000			Disclose PID of userspace programs
+				accessing some functions of the driver
+	0x0001			Initialization and probing
+	0x0002			Removal
+	0x0004			RF Transmitter control (RFKILL)
+				(bluetooth, WWAN, UWB...)
+	0x0008			HKEY event interface, hotkeys
+	0x0010			Fan control
+	0x0020			Backlight brightness
+	0x0040			Audio mixer/volume control
+	=============		======================================
+
+There is also a kernel build option to enable more debugging
+information, which may be necessary to debug driver problems.
+
+The level of debugging information output by the driver can be changed
+at runtime through sysfs, using the driver attribute debug_level.  The
+attribute takes the same bitmask as the debug module parameter above.
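+
+As a worked example of adding bitmask values, fan control (0x0010) plus
+audio mixer/volume control (0x0040) gives 0x0050::
+
+	modprobe thinkpad_acpi debug=0x0050
+
+To change it at runtime, assuming the driver attributes are exposed at the
+usual platform driver location (an assumption; check your sysfs tree)::
+
+	echo 0x0050 > /sys/bus/platform/drivers/thinkpad_acpi/debug_level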
+
+
+Force loading of module
+-----------------------
+
+If thinkpad-acpi refuses to detect your ThinkPad, you can try to specify
+the module parameter force_load=1.  Regardless of whether this works or
+not, please contact ibm-acpi-devel@lists.sourceforge.net with a report.
+
+
+Sysfs interface changelog
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+=========	===============================================================
+0x000100:	Initial sysfs support, as a single platform driver and
+		device.
+0x000200:	Hot key support for 32 hot keys, and radio slider switch
+		support.
+0x010000:	Hot keys are now handled by default over the input
+		layer, the radio switch generates input event EV_RADIO,
+		and the driver enables hot key handling by default in
+		the firmware.
+
+0x020000:	ABI fix: added a separate hwmon platform device and
+		driver, which must be located by name (thinkpad)
+		and the hwmon class for libsensors4 (lm-sensors 3)
+		compatibility.  Moved all hwmon attributes to this
+		new platform device.
+
+0x020100:	Marker for thinkpad-acpi with hot key NVRAM polling
+		support.  If you must, use it to know you should not
+		start a userspace NVRAM poller (this allows detecting when
+		NVRAM is compiled out by the user because it is
+		unneeded/undesired in the first place).
+0x020101:	Marker for thinkpad-acpi with hot key NVRAM polling
+		and proper hotkey_mask semantics (version 8 of the
+		NVRAM polling patch).  Some development snapshots of
+		0.18 had an earlier version that did strange things
+		to hotkey_mask.
+
+0x020200:	Add poll()/select() support to the following attributes:
+		hotkey_radio_sw, wakeup_hotunplug_complete, wakeup_reason
+
+0x020300:	hotkey enable/disable support removed, attributes
+		hotkey_bios_enabled and hotkey_enable deprecated and
+		marked for removal.
+
+0x020400:	Marker for 16 LEDs support.  Also, LEDs that are known
+		to not exist in a given model are not registered with
+		the LED sysfs class anymore.
+
+0x020500:	Updated hotkey driver, hotkey_mask is always available
+		and it is always able to disable hot keys.  Very old
+		ThinkPads are properly supported.  hotkey_bios_mask
+		is deprecated and marked for removal.
+
+0x020600:	Marker for backlight change event support.
+
+0x020700:	Support for mute-only mixers.
+		Volume control in read-only mode by default.
+		Marker for ALSA mixer support.
+
+0x030000:	Thermal and fan sysfs attributes were moved to the hwmon
+		device instead of being attached to the backing platform
+		device.
+=========	===============================================================
diff --git a/Documentation/admin-guide/laptops/toshiba_haps.rst b/Documentation/admin-guide/laptops/toshiba_haps.rst
new file mode 100644
index 000000000000..11dfc428c080
--- /dev/null
+++ b/Documentation/admin-guide/laptops/toshiba_haps.rst
@@ -0,0 +1,87 @@
+====================================
+Toshiba HDD Active Protection Sensor
+====================================
+
+Kernel driver: toshiba_haps
+
+Author: Azael Avalos <coproscefalo@gmail.com>
+
+
+.. 0. Contents
+
+   1. Description
+   2. Interface
+   3. Accelerometer axes
+   4. Supported devices
+   5. Usage
+
+
+1. Description
+--------------
+
+This driver provides support for the accelerometer found in various Toshiba
+laptops, officially called "Toshiba HDD Protection - Shock Sensor", and
+automatically detects laptops that have this device.
+On Windows, Toshiba-provided software monitors this device and provides
+automatic HDD protection (head unload) on sudden moves or harsh vibrations.
+This driver, however, only provides a notification via a sysfs file to let
+userspace tools or daemons act accordingly, as well as a sysfs file to set
+the desired protection level or sensor sensitivity.
+
+
+2. Interface
+------------
+
+This device comes with 3 methods:
+
+====	=====================================================================
+_STA    Checks existence of the device, returning Zero if the device does not
+	exist or is not supported.
+PTLV    Sets the desired protection level.
+RSSS    Shuts down the HDD protection interface for a few seconds,
+	then restores normal operation.
+====	=====================================================================
+
+Note:
+  The presence of Solid State Drives (SSD) can make this driver fail to load:
+  such drives have no moving parts and thus need no "protection", and the
+  driver then fails during the evaluation of the _STA method found under
+  this device.
+
+
+3. Accelerometer axes
+---------------------
+
+This device does not report any axes.  However, a couple of HCI (Hardware
+Configuration Interface) calls (0x6D and 0xA6) are provided to query the
+sensor position; these are handled by the kernel module toshiba_acpi since
+kernel version 3.15.
+
+
+4. Supported devices
+--------------------
+
+This driver binds itself to the ACPI device TOS620A, and any Toshiba laptop
+with this device is supported, provided that it has a conventional HDD
+(either alone or in combination with an SSD) rather than only an SSD.
+
+
+5. Usage
+--------
+
+The sysfs files under /sys/devices/LNXSYSTM:00/LNXSYBUS:00/TOS620A:00/ are:
+
+================   ============================================================
+protection_level   The protection_level entry is readable and writeable.  It
+		   lets userspace query the current protection level, as well
+		   as set the desired protection level.  The available
+		   protection levels are:
+
+		   ============   =======   ==========   ========
+		   0 - Disabled   1 - Low   2 - Medium   3 - High
+		   ============   =======   ==========   ========
+
+reset_protection   The reset_protection entry is write-only and accepts "1"
+		   as its only parameter.  It is used to trigger a reset of
+		   the protection interface.
+================   ============================================================
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 5aceb5cd5ce7..64aeee1009ca 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -108,7 +108,7 @@ block_dump
 ==========
 
 block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/laptops/laptop-mode.rst.
+information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.
 
 
 compact_memory
@@ -298,7 +298,7 @@ laptop_mode
 ===========
 
 laptop_mode is a knob that controls "laptop mode". All the things that are
-controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst.
+controlled by this knob are discussed in Documentation/admin-guide/laptops/laptop-mode.rst.
 
 
 legacy_va_layout
diff --git a/Documentation/laptops/asus-laptop.rst b/Documentation/laptops/asus-laptop.rst
deleted file mode 100644
index 95176321a25a..000000000000
--- a/Documentation/laptops/asus-laptop.rst
+++ /dev/null
@@ -1,271 +0,0 @@
-==================
-Asus Laptop Extras
-==================
-
-Version 0.1
-
-August 6, 2009
-
-Corentin Chary <corentincj@iksaif.net>
-http://acpi4asus.sf.net/
-
- This driver provides support for extra features of ACPI-compatible ASUS laptops.
- It may also support some MEDION, JVC or VICTOR laptops (such as MEDION 9675 or
- VICTOR XP7210 for example). It makes all the extra buttons generate input
- events (like keyboards).
-
- On some models adds support for changing the display brightness and output,
- switching the LCD backlight on and off, and most importantly, allows you to
- blink those fancy LEDs intended for reporting mail and wireless status.
-
-This driver supersedes the old asus_acpi driver.
-
-Requirements
-------------
-
-  Kernel 2.6.X sources, configured for your computer, with ACPI support.
-  You also need CONFIG_INPUT and CONFIG_ACPI.
-
-Status
-------
-
- The features currently supported are the following (see below for
- detailed description):
-
- - Fn key combinations
- - Bluetooth enable and disable
- - Wlan enable and disable
- - GPS enable and disable
- - Video output switching
- - Ambient Light Sensor on and off
- - LED control
- - LED Display control
- - LCD brightness control
- - LCD on and off
-
- A compatibility table by model and feature is maintained on the web
- site, http://acpi4asus.sf.net/.
-
-Usage
------
-
-  Try "modprobe asus-laptop". Check your dmesg (simply type dmesg). You should
-  see some lines like this :
-
-      Asus Laptop Extras version 0.42
-        - L2D model detected.
-
-  If it is not the output you have on your laptop, send it (and the laptop's
-  DSDT) to me.
-
-  That's all, now, all the events generated by the hotkeys of your laptop
-  should be reported via netlink events. You can check with
-  "acpi_genl monitor" (part of the acpica project).
-
-  Hotkeys are also reported as input keys (like keyboards) you can check
-  which key are supported using "xev" under X11.
-
-  You can get information on the version of your DSDT table by reading the
-  /sys/devices/platform/asus-laptop/infos entry. If you have a question or a
-  bug report to do, please include the output of this entry.
-
-LEDs
-----
-
-  You can modify LEDs be echoing values to `/sys/class/leds/asus/*/brightness`::
-
-    echo 1 >  /sys/class/leds/asus::mail/brightness
-
-  will switch the mail LED on.
-
-  You can also know if they are on/off by reading their content and use
-  kernel triggers like disk-activity or heartbeat.
-
-Backlight
----------
-
-  You can control lcd backlight power and brightness with
-  /sys/class/backlight/asus-laptop/. Brightness Values are between 0 and 15.
-
-Wireless devices
-----------------
-
-  You can turn the internal Bluetooth adapter on/off with the bluetooth entry
-  (only on models with Bluetooth). This usually controls the associated LED.
-  Same for Wlan adapter.
-
-Display switching
------------------
-
-  Note: the display switching code is currently considered EXPERIMENTAL.
-
-  Switching works for the following models:
-
-    - L3800C
-    - A2500H
-    - L5800C
-    - M5200N
-    - W1000N (albeit with some glitches)
-    - M6700R
-    - A6JC
-    - F3J
-
-  Switching doesn't work for the following:
-
-    - M3700N
-    - L2X00D (locks the laptop under certain conditions)
-
-  To switch the displays, echo values from 0 to 15 to
-  /sys/devices/platform/asus-laptop/display. The significance of those values
-  is as follows:
-
-  +-------+-----+-----+-----+-----+-----+
-  | Bin   | Val | DVI | TV  | CRT | LCD |
-  +-------+-----+-----+-----+-----+-----+
-  | 0000  |   0 |     |     |     |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 0001  |   1 |     |     |     |  X  |
-  +-------+-----+-----+-----+-----+-----+
-  | 0010  |   2 |     |     |  X  |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 0011  |   3 |     |     |  X  |  X  |
-  +-------+-----+-----+-----+-----+-----+
-  | 0100  |   4 |     |  X  |     |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 0101  |   5 |     |  X  |     | X   |
-  +-------+-----+-----+-----+-----+-----+
-  | 0110  |   6 |     |  X  |  X  |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 0111  |   7 |     |  X  |  X  |  X  |
-  +-------+-----+-----+-----+-----+-----+
-  | 1000  |   8 |  X  |     |     |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 1001  |   9 |  X  |     |     |  X  |
-  +-------+-----+-----+-----+-----+-----+
-  | 1010  |  10 |  X  |     |  X  |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 1011  |  11 |  X  |     |  X  |  X  |
-  +-------+-----+-----+-----+-----+-----+
-  | 1100  |  12 |  X  |  X  |     |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 1101  |  13 |  X  |  X  |     |  X  |
-  +-------+-----+-----+-----+-----+-----+
-  | 1110  |  14 |  X  |  X  |  X  |     |
-  +-------+-----+-----+-----+-----+-----+
-  | 1111  |  15 |  X  |  X  |  X  |  X  |
-  +-------+-----+-----+-----+-----+-----+
-
-  In most cases, the appropriate displays must be plugged in for the above
-  combinations to work. TV-Out may need to be initialized at boot time.
-
-  Debugging:
-
-  1) Check whether the Fn+F8 key:
-
-     a) does not lock the laptop (try a boot with noapic / nolapic if it does)
-     b) generates events (0x6n, where n is the value corresponding to the
-        configuration above)
-     c) actually works
-
-     Record the disp value at every configuration.
-  2) Echo values from 0 to 15 to /sys/devices/platform/asus-laptop/display.
-     Record its value, note any change. If nothing changes, try a broader range,
-     up to 65535.
-  3) Send ANY output (both positive and negative reports are needed, unless your
-     machine is already listed above) to the acpi4asus-user mailing list.
-
-  Note: on some machines (e.g. L3C), after the module has been loaded, only 0x6n
-  events are generated and no actual switching occurs. In such a case, a line
-  like::
-
-    echo $((10#$arg-60)) > /sys/devices/platform/asus-laptop/display
-
-  will usually do the trick ($arg is the 0000006n-like event passed to acpid).
-
-  Note: there is currently no reliable way to read display status on xxN
-  (Centrino) models.
-
-LED display
------------
-
-  Some models like the W1N have a LED display that can be used to display
-  several items of information.
-
-  LED display works for the following models:
-
-    - W1000N
-    - W1J
-
-  To control the LED display, use the following::
-
-    echo 0x0T000DDD > /sys/devices/platform/asus-laptop/
-
-  where T control the 3 letters display, and DDD the 3 digits display,
-  according to the tables below::
-
-         DDD (digits)
-         000 to 999 = display digits
-         AAA        = ---
-         BBB to FFF = turn-off
-
-         T  (type)
-         0 = off
-         1 = dvd
-         2 = vcd
-         3 = mp3
-         4 = cd
-         5 = tv
-         6 = cpu
-         7 = vol
-
-  For example "echo 0x01000001 >/sys/devices/platform/asus-laptop/ledd"
-  would display "DVD001".
-
-Driver options
---------------
-
- Options can be passed to the asus-laptop driver using the standard
- module argument syntax (<param>=<value> when passing the option to the
- module or asus-laptop.<param>=<value> on the kernel boot line when
- asus-laptop is statically linked into the kernel).
-
-	     wapf: WAPF defines the behavior of the Fn+Fx wlan key
-		   The significance of values is yet to be found, but
-		   most of the time:
-
-		   - 0x0 should do nothing
-		   - 0x1 should allow to control the device with Fn+Fx key.
-		   - 0x4 should send an ACPI event (0x88) while pressing the Fn+Fx key
-		   - 0x5 like 0x1 or 0x4
-
- The default value is 0x1.
-
-Unsupported models
-------------------
-
- These models will never be supported by this module, as they use a completely
- different mechanism to handle LEDs and extra stuff (meaning we have no clue
- how it works):
-
- - ASUS A1300 (A1B), A1370D
- - ASUS L7300G
- - ASUS L8400
-
-Patches, Errors, Questions
---------------------------
-
- I appreciate any success or failure
- reports, especially if they add to or correct the compatibility table.
- Please include the following information in your report:
-
- - Asus model name
- - a copy of your ACPI tables, using the "acpidump" utility
- - a copy of /sys/devices/platform/asus-laptop/infos
- - which driver features work and which don't
- - the observed behavior of non-working features
-
- Any other comments or patches are also more than welcome.
-
- acpi4asus-user@lists.sourceforge.net
-
- http://sourceforge.net/projects/acpi4asus
diff --git a/Documentation/laptops/disk-shock-protection.rst b/Documentation/laptops/disk-shock-protection.rst
deleted file mode 100644
index e97c5f78d8c3..000000000000
--- a/Documentation/laptops/disk-shock-protection.rst
+++ /dev/null
@@ -1,151 +0,0 @@
-==========================
-Hard disk shock protection
-==========================
-
-Author: Elias Oltmanns <eo@nebensachen.de>
-
-Last modified: 2008-10-03
-
-
-.. 0. Contents
-
-   1. Intro
-   2. The interface
-   3. References
-   4. CREDITS
-
-
-1. Intro
---------
-
-ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature.
-Issuing this command should cause the drive to switch to idle mode and
-unload disk heads. This feature is being used in modern laptops in
-conjunction with accelerometers and appropriate software to implement
-a shock protection facility. The idea is to stop all I/O operations on
-the internal hard drive and park its heads on the ramp when critical
-situations are anticipated. The desire to have such a feature
-available on GNU/Linux systems has been the original motivation to
-implement a generic disk head parking interface in the Linux kernel.
-Please note, however, that other components have to be set up on your
-system in order to get disk shock protection working (see
-section 3. References below for pointers to more information about
-that).
-
-
-2. The interface
-----------------
-
-For each ATA device, the kernel exports the file
-`block/*/device/unload_heads` in sysfs (here assumed to be mounted under
-/sys). Access to `/sys/block/*/device/unload_heads` is denied with
--EOPNOTSUPP if the device does not support the unload feature.
-Otherwise, writing an integer value to this file will take the heads
-of the respective drive off the platter and block all I/O operations
-for the specified number of milliseconds. When the timeout expires and
-no further disk head park request has been issued in the meantime,
-normal operation will be resumed. The maximal value accepted for a
-timeout is 30000 milliseconds. Exceeding this limit will return
--EOVERFLOW, but heads will be parked anyway and the timeout will be
-set to 30 seconds. However, you can always change a timeout to any
-value between 0 and 30000 by issuing a subsequent head park request
-before the timeout of the previous one has expired. In particular, the
-total timeout can exceed 30 seconds and, more importantly, you can
-cancel a previously set timeout and resume normal operation
-immediately by specifying a timeout of 0. Values below -2 are rejected
-with -EINVAL (see below for the special meaning of -1 and -2). If the
-timeout specified for a recent head park request has not yet expired,
-reading from `/sys/block/*/device/unload_heads` will report the number
-of milliseconds remaining until normal operation will be resumed;
-otherwise, reading the unload_heads attribute will return 0.
-
-For example, do the following in order to park the heads of drive
-/dev/sda and stop all I/O operations for five seconds::
-
-	# echo 5000 > /sys/block/sda/device/unload_heads
-
-A simple::
-
-	# cat /sys/block/sda/device/unload_heads
-
-will show you how many milliseconds are left before normal operation
-will be resumed.
-
-A word of caution: The fact that the interface operates on a basis of
-milliseconds may raise expectations that cannot be satisfied in
-reality. In fact, the ATA specs clearly state that the time for an
-unload operation to complete is vendor specific. The hint in ATA-7
-that this will typically be within 500 milliseconds apparently has
-been dropped in ATA-8.
-
-There is a technical detail of this implementation that may cause some
-confusion and should be discussed here. When a head park request has
-been issued to a device successfully, all I/O operations on the
-controller port this device is attached to will be deferred. That is
-to say, any other device that may be connected to the same port will
-be affected too. The only exception is that a subsequent head unload
-request to that other device will be executed immediately. Further
-operations on that port will be deferred until the timeout specified
-for either device on the port has expired. As far as PATA (old style
-IDE) configurations are concerned, there can only be two devices
-attached to any single port. In the SATA world we have port multipliers,
-which means that a user-issued head parking request to one device may
-actually result in stopping I/O to a whole bunch of devices. However,
-since this feature is supposed to be used on laptops and does not seem
-to be very useful in any other environment, there will mostly be one
-device per port. Even if the CD/DVD writer happens to be connected to
-the same port as the hard drive, it generally *should* recover just
-fine from the occasional buffer under-run incurred by a head park
-request to the HD. Actually, when you are using an IDE driver rather
-than its libata counterpart (i.e. your disk is called /dev/hda
-instead of /dev/sda), then parking the heads of one drive (drive X)
-will generally not affect the mode of operation of another drive
-(drive Y) on the same port as described above. It is only when a port
-reset is required to recover from an exception on drive Y that further
-I/O operations on that drive (and the reset itself) will be delayed
-until drive X is no longer in the parked state.
-
-Finally, there are some hard drives that only comply with an earlier
-version of the ATA standard than ATA-7, but do support the unload
-feature nonetheless. Unfortunately, there is no safe way Linux can
-detect these devices, so you won't be able to write to the
-unload_heads attribute. If you know that your device really does
-support the unload feature (for instance, because the vendor of your
-laptop or the hard drive itself told you so), then you can tell the
-kernel to enable the usage of this feature for that drive by writing
-the special value -1 to the unload_heads attribute::
-
-	# echo -1 > /sys/block/sda/device/unload_heads
-
-will enable the feature for /dev/sda, and giving -2 instead of -1 will
-disable it again.
-
-
-3. References
--------------
-
-There are several laptops from different vendors featuring shock
-protection capabilities. As manufacturers have refused to support open
-source development of the required software components so far, Linux
-support for shock protection varies considerably between different
-hardware implementations. Ideally, this section should contain a list
-of pointers to different projects aiming at an implementation of shock
-protection on different systems. Unfortunately, I only know of a
-single project which, although still considered experimental, is fit
-for use. Please feel free to add projects that have been the victims
-of my ignorance.
-
-- http://www.thinkwiki.org/wiki/HDAPS
-
-  See this page for information about Linux support of the hard disk
-  active protection system as implemented in IBM/Lenovo Thinkpads.
-
-
-4. CREDITS
-----------
-
-This implementation of disk head parking has been inspired by a patch
-originally published by Jon Escombe <lists@dresco.co.uk>. My efforts
-to develop an implementation of this feature that is fit to be merged
-into mainline have been aided by various kernel developers, in
-particular by Tejun Heo and Bartlomiej Zolnierkiewicz.
diff --git a/Documentation/laptops/index.rst b/Documentation/laptops/index.rst
deleted file mode 100644
index 001a30910d09..000000000000
--- a/Documentation/laptops/index.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-:orphan:
-
-==============
-Laptop Drivers
-==============
-
-.. toctree::
-   :maxdepth: 1
-
-   asus-laptop
-   disk-shock-protection
-   laptop-mode
-   lg-laptop
-   sony-laptop
-   sonypi
-   thinkpad-acpi
-   toshiba_haps
diff --git a/Documentation/laptops/laptop-mode.rst b/Documentation/laptops/laptop-mode.rst
deleted file mode 100644
index c984c4262f2e..000000000000
--- a/Documentation/laptops/laptop-mode.rst
+++ /dev/null
@@ -1,781 +0,0 @@
-===============================================
-How to conserve battery power using laptop-mode
-===============================================
-
-Document Author: Bart Samwel (bart@samwel.tk)
-
-Date created: January 2, 2004
-
-Last modified: December 06, 2004
-
-Introduction
-------------
-
-Laptop mode is used to minimize the time that the hard disk needs to be spun up,
-to conserve battery power on laptops. It has been reported to cause significant
-power savings.
-
-.. Contents
-
-   * Introduction
-   * Installation
-   * Caveats
-   * The Details
-   * Tips & Tricks
-   * Control script
-   * ACPI integration
-   * Monitoring tool
-
-
-Installation
-------------
-
-To use laptop mode, you don't need to set any kernel configuration options
-or anything. Simply install all the files included in this document, and
-laptop mode will automatically be started when you're on battery. For
-your convenience, a tarball containing an installer can be downloaded at:
-
-	http://www.samwel.tk/laptop_mode/laptop_mode/
-
-To configure laptop mode, you need to edit the configuration file, which is
-located in /etc/default/laptop-mode on Debian-based systems, or in
-/etc/sysconfig/laptop-mode on other systems.
-
-Unfortunately, automatic enabling of laptop mode does not work for
-laptops that don't have ACPI. On those laptops, you need to start laptop
-mode manually. To start laptop mode, run "laptop_mode start", and to
-stop it, run "laptop_mode stop". (Note: The laptop mode tools package now
-has experimental support for APM; you might want to try that first.)
-
-
-Caveats
--------
-
-* The downside of laptop mode is that you have a chance of losing up to 10
-  minutes of work. If you cannot afford this, don't use it! The supplied ACPI
-  scripts automatically turn off laptop mode when the battery almost runs out,
-  so that you won't lose any data at the end of your battery life.
-
-* Most desktop hard drives have a very limited lifetime measured in spindown
-  cycles, typically about 50,000 times (it's usually listed on the spec sheet).
-  Check your drive's rating, and don't wear down your drive's lifetime if you
-  don't need to.
-
-* If you mount some of your ext3/reiserfs filesystems with the -n option, then
-  the control script will not be able to remount them correctly. You must set
-  DO_REMOUNTS=0 in the control script, otherwise it will remount them with the
-  wrong options -- or it will fail because it cannot write to /etc/mtab.
-
-* If you have your filesystems listed as type "auto" in fstab, like I did, then
-  the control script will not recognize them as filesystems that need remounting.
-  You must list the filesystems with their true type instead.
-
-* It has been reported that some versions of the mutt mail client use file access
-  times to determine whether a folder contains new mail. If you use mutt and
-  experience this, you must disable the noatime remounting by setting the option
-  DO_REMOUNT_NOATIME to 0 in the configuration file.
-
-
-The Details
------------
-
-Laptop mode is controlled by the knob /proc/sys/vm/laptop_mode. This knob is
-present for all kernels that have the laptop mode patch, regardless of any
-configuration options. When the knob is set, any physical disk I/O (that might
-have caused the hard disk to spin up) causes Linux to flush all dirty blocks. The
-result of this is that after a disk has spun down, it will not be spun up
-anymore to write dirty blocks, because those blocks had already been written
-immediately after the most recent read operation. The value of the laptop_mode
-knob determines the time between the occurrence of disk I/O and when the flush
-is triggered. A sensible value for the knob is 5 seconds. Setting the knob to
-0 disables laptop mode.
-
-To increase the effectiveness of the laptop_mode strategy, the laptop_mode
-control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
-/proc/sys/vm to about 10 minutes (by default), which means that pages that are
-dirtied are not forced to be written to disk as often. The control script also
-changes the dirty background ratio, so that background writeback of dirty pages
-is not done anymore. Combined with a higher commit value (also 10 minutes) for
-ext3 or ReiserFS filesystems (also done automatically by the control script),
-this results in concentration of disk activity in a small time interval which
-occurs only once every 10 minutes, or whenever the disk is forced to spin up by
-a cache miss. The disk can then be spun down in the periods of inactivity.
-
-If you want to find out which process caused the disk to spin up, you can
-gather information by setting the flag /proc/sys/vm/block_dump. When this flag
-is set, Linux reports all disk read and write operations that take place, and
-all block dirtyings done to files. This makes it possible to debug why a disk
-needs to spin up, and to increase battery life even more. The output of
-block_dump is written to the kernel output, and it can be retrieved using
-"dmesg". When you use block_dump and your kernel logging level also includes
-kernel debugging messages, you probably want to turn off klogd, otherwise
-the output of block_dump will be logged, causing disk activity that is not
-normally there.
-
-
-Configuration
--------------
-
-The laptop mode configuration file is located in /etc/default/laptop-mode on
-Debian-based systems, or in /etc/sysconfig/laptop-mode on other systems. It
-contains the following options:
-
-MAX_AGE:
-
-Maximum time, in seconds, of hard drive spindown time that you are
-comfortable with. Worst case, it's possible that you could lose this
-amount of work if your battery fails while you're in laptop mode.
-
-MINIMUM_BATTERY_MINUTES:
-
-Automatically disable laptop mode if the remaining number of minutes of
-battery power is less than this value. Default is 10 minutes.
-
-AC_HD/BATT_HD:
-
-The idle timeout that should be set on your hard drive when laptop mode
-is active (BATT_HD) and when it is not active (AC_HD). The defaults are
-20 seconds (value 4) for BATT_HD and 2 hours (value 244) for AC_HD. The
-possible values are those listed in the manual page for "hdparm" for the
-"-S" option.
-
-HD:
-
-The devices for which the spindown timeout should be adjusted by laptop mode.
-Default is /dev/hda. If you specify multiple devices, separate them by a space.
-
-READAHEAD:
-
-Disk readahead, in 512-byte sectors, while laptop mode is active. A large
-readahead can prevent disk accesses for things like executable pages (which are
-loaded on demand while the application executes) and sequentially accessed data
-(MP3s).
-
-DO_REMOUNTS:
-
-The control script automatically remounts any mounted journaled filesystems
-with appropriate commit interval options. When this option is set to 0, this
-feature is disabled.
-
-DO_REMOUNT_NOATIME:
-
-When remounting, should the filesystems be remounted with the noatime option?
-Normally, this is set to "1" (enabled), but there may be programs that require
-access time recording.
-
-DIRTY_RATIO:
-
-The percentage of memory that is allowed to contain "dirty" or unsaved data
-before a writeback is forced, while laptop mode is active. Corresponds to
-the /proc/sys/vm/dirty_ratio sysctl.
-
-DIRTY_BACKGROUND_RATIO:
-
-The percentage of memory that is allowed to contain "dirty" or unsaved data
-after a forced writeback is done due to an exceeding of DIRTY_RATIO. Set
-this nice and low. This corresponds to the /proc/sys/vm/dirty_background_ratio
-sysctl.
-
-Note that the behaviour of dirty_background_ratio is quite different
-when laptop mode is active and when it isn't. When laptop mode is inactive,
-dirty_background_ratio is the threshold percentage at which background writeouts
-start taking place. When laptop mode is active, however, background writeouts
-are disabled, and the dirty_background_ratio only determines how much writeback
-is done when dirty_ratio is reached.
-
-DO_CPU:
-
-Enable CPU frequency scaling when in laptop mode. (Requires CPUFreq to be set up.
-See Documentation/admin-guide/pm/cpufreq.rst for more info. Disabled by default.)
-
-CPU_MAXFREQ:
-
-When on battery, what is the maximum CPU speed that the system should use? Legal
-values are "slowest" for the slowest speed that your CPU is able to operate at,
-or a value listed in /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies.
-
-
-Tips & Tricks
--------------
-
-* Bartek Kania reports getting up to 50 minutes of extra battery life (on top
-  of his regular 3 to 3.5 hours) using a spindown time of 5 seconds (BATT_HD=1).
-
-* You can spin down the disk while playing MP3, by setting disk readahead
-  to 8MB (READAHEAD=16384). Effectively, the disk will read a complete MP3 at
-  once, and will then spin down while the MP3 is playing. (Thanks to Bartek
-  Kania.)
-
-* Drew Scott Daniels observed: "I don't know why, but when I decrease the number
-  of colours that my display uses it consumes less battery power. I've seen
-  this on powerbooks too. I hope that this is a piece of information that
-  might be useful to the Laptop Mode patch or its users."
-
-* In syslog.conf, you can prefix entries with a dash `-` to omit syncing the
-  file after every logging. When you're using laptop-mode and your disk doesn't
-  spin down, this is a likely culprit.
-
-* Richard Atterer observed that laptop mode does not work well with noflushd
-  (http://noflushd.sourceforge.net/); it seems that noflushd prevents laptop-mode
-  from doing its thing.
-
-* If you're worried about your data, you might want to consider using a USB
-  memory stick or something like that as a "working area". (Be aware though
-  that flash memory can only handle a limited number of writes, and overuse
-  may wear out your memory stick pretty quickly. Do _not_ use journalling
-  filesystems on flash memory sticks.)
-
-
-Configuration file for control and ACPI battery scripts
--------------------------------------------------------
-
-This allows the tunables to be changed for the scripts via an external
-configuration file.
-
-It should be installed as /etc/default/laptop-mode on Debian, and as
-/etc/sysconfig/laptop-mode on Red Hat, SUSE, Mandrake, and other work-alikes.
-
-Config file::
-
-  # Maximum time, in seconds, of hard drive spindown time that you are
-  # comfortable with. Worst case, it's possible that you could lose this
-  # amount of work if your battery fails you while in laptop mode.
-  #MAX_AGE=600
-
-  # Automatically disable laptop mode when the number of minutes of battery
-  # that you have left goes below this threshold.
-  MINIMUM_BATTERY_MINUTES=10
-
-  # Read-ahead, in 512-byte sectors. You can spin down the disk while playing MP3/OGG
-  # by setting the disk readahead to 8MB (READAHEAD=16384). Effectively, the disk
-  # will read a complete MP3 at once, and will then spin down while the MP3/OGG is
-  # playing.
-  #READAHEAD=4096
-
-  # Shall we remount journaled fs. with appropriate commit interval? (1=yes)
-  #DO_REMOUNTS=1
-
-  # And shall we add the "noatime" option to that as well? (1=yes)
-  #DO_REMOUNT_NOATIME=1
-
-  # Dirty synchronous ratio.  At this percentage of dirty pages the process
-  # which
-  # calls write() does its own writeback
-  #DIRTY_RATIO=40
-
-  #
-  # Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
-  # exceeded, the kernel will wake flusher threads which will then reduce the
-  # amount of dirty memory to dirty_background_ratio.  Set this nice and low,
-  # so once some writeout has commenced, we do a lot of it.
-  #
-  #DIRTY_BACKGROUND_RATIO=5
-
-  # kernel default dirty buffer age
-  #DEF_AGE=30
-  #DEF_UPDATE=5
-  #DEF_DIRTY_BACKGROUND_RATIO=10
-  #DEF_DIRTY_RATIO=40
-  #DEF_XFS_AGE_BUFFER=15
-  #DEF_XFS_SYNC_INTERVAL=30
-  #DEF_XFS_BUFD_INTERVAL=1
-
-  # This must be adjusted manually to the value of HZ in the running kernel
-  # on 2.4, until the XFS people change their 2.4 external interfaces to work in
-  # centisecs. This can be automated, but it's a work in progress that still
-  # needs some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for
-  # external interfaces, and that is currently always set to 100. So you don't
-  # need to change this on 2.6.
-  #XFS_HZ=100
-
-  # Should the maximum CPU frequency be adjusted down while on battery?
-  # Requires CPUFreq to be set up.
-  # See Documentation/admin-guide/pm/cpufreq.rst for more info
-  #DO_CPU=0
-
-  # When on battery what is the maximum CPU speed that the system should
-  # use? Legal values are "slowest" for the slowest speed that your
-  # CPU is able to operate at, or a value listed in:
-  # /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
-  # Only applicable if DO_CPU=1.
-  #CPU_MAXFREQ=slowest
-
-  # Idle timeout for your hard drive (man hdparm for valid values, -S option)
-  # Default is 2 hours on AC (AC_HD=244) and 20 seconds for battery (BATT_HD=4).
-  #AC_HD=244
-  #BATT_HD=4
-
-  # The drives for which to adjust the idle timeout. Separate them by a space,
-  # e.g. HD="/dev/hda /dev/hdb".
-  #HD="/dev/hda"
-
-  # Set the spindown timeout on a hard drive?
-  #DO_HD=1
-
-
-Control script
---------------
-
-Please note that this control script works for the Linux 2.4 and 2.6 series (thanks
-to Kiko Piris).
-
-Control script::
-
-  #!/bin/bash
-
-  # start or stop laptop_mode, best run by a power management daemon when
-  # ac gets connected/disconnected from a laptop
-  #
-  # install as /sbin/laptop_mode
-  #
-  # Contributors to this script:   Kiko Piris
-  #				 Bart Samwel
-  #				 Micha Feigin
-  #				 Andrew Morton
-  #				 Herve Eychenne
-  #				 Dax Kelson
-  #
-  # Original Linux 2.4 version by: Jens Axboe
-
-  #############################################################################
-
-  # Source config
-  if [ -f /etc/default/laptop-mode ] ; then
-	# Debian
-	. /etc/default/laptop-mode
-  elif [ -f /etc/sysconfig/laptop-mode ] ; then
-	# Others
-	. /etc/sysconfig/laptop-mode
-  fi
-
-  # Don't raise an error if the config file is incomplete;
-  # set defaults instead:
-
-  # Maximum time, in seconds, of hard drive spindown time that you are
-  # comfortable with. Worst case, it's possible that you could lose this
-  # amount of work if your battery fails you while in laptop mode.
-  MAX_AGE=${MAX_AGE:-'600'}
-
-  # Read-ahead, in kilobytes
-  READAHEAD=${READAHEAD:-'4096'}
-
-  # Shall we remount journaled fs. with appropriate commit interval? (1=yes)
-  DO_REMOUNTS=${DO_REMOUNTS:-'1'}
-
-  # And shall we add the "noatime" option to that as well? (1=yes)
-  DO_REMOUNT_NOATIME=${DO_REMOUNT_NOATIME:-'1'}
-
-  # Shall we adjust the idle timeout on a hard drive?
-  DO_HD=${DO_HD:-'1'}
-
-  # Adjust idle timeout on which hard drive?
-  HD=${HD:-'/dev/hda'}
-
-  # spindown time for HD (hdparm -S values)
-  AC_HD=${AC_HD:-'244'}
-  BATT_HD=${BATT_HD:-'4'}
-
-  # Dirty synchronous ratio.  At this percentage of dirty pages the process which
-  # calls write() does its own writeback
-  DIRTY_RATIO=${DIRTY_RATIO:-'40'}
-
-  # cpu frequency scaling
-  # See Documentation/admin-guide/pm/cpufreq.rst for more info
-  DO_CPU=${DO_CPU:-'0'}
-  CPU_MAXFREQ=${CPU_MAXFREQ:-'slowest'}
-
-  #
-  # Allowed dirty background ratio, in percent.  Once DIRTY_RATIO has been
-  # exceeded, the kernel will wake flusher threads which will then reduce the
-  # amount of dirty memory to dirty_background_ratio.  Set this nice and low,
-  # so once some writeout has commenced, we do a lot of it.
-  #
-  DIRTY_BACKGROUND_RATIO=${DIRTY_BACKGROUND_RATIO:-'5'}
-
-  # kernel default dirty buffer age
-  DEF_AGE=${DEF_AGE:-'30'}
-  DEF_UPDATE=${DEF_UPDATE:-'5'}
-  DEF_DIRTY_BACKGROUND_RATIO=${DEF_DIRTY_BACKGROUND_RATIO:-'10'}
-  DEF_DIRTY_RATIO=${DEF_DIRTY_RATIO:-'40'}
-  DEF_XFS_AGE_BUFFER=${DEF_XFS_AGE_BUFFER:-'15'}
-  DEF_XFS_SYNC_INTERVAL=${DEF_XFS_SYNC_INTERVAL:-'30'}
-  DEF_XFS_BUFD_INTERVAL=${DEF_XFS_BUFD_INTERVAL:-'1'}
-
-  # This must be adjusted manually to the value of HZ in the running kernel
-  # on 2.4, until the XFS people change their 2.4 external interfaces to work in
-  # centisecs. This can be automated, but it's a work in progress that still needs
-  # some fixes. On 2.6 kernels, XFS uses USER_HZ instead of HZ for external
-  # interfaces, and that is currently always set to 100. So you don't need to
-  # change this on 2.6.
-  XFS_HZ=${XFS_HZ:-'100'}
-
-  #############################################################################
-
-  KLEVEL="$(uname -r |
-               {
-	       IFS='.' read a b c
-	       echo $a.$b
-	     }
-  )"
-  case "$KLEVEL" in
-	"2.4"|"2.6")
-		;;
-	*)
-		echo "Unhandled kernel version: $KLEVEL ('uname -r' = '$(uname -r)')" >&2
-		exit 1
-		;;
-  esac
-
-  if [ ! -e /proc/sys/vm/laptop_mode ] ; then
-	echo "Kernel is not patched with laptop_mode patch." >&2
-	exit 1
-  fi
-
-  if [ ! -w /proc/sys/vm/laptop_mode ] ; then
-	echo "You do not have enough privileges to enable laptop_mode." >&2
-	exit 1
-  fi
-
-  # Remove an option (the first parameter) of the form option=<number> from
-  # a mount options string (the rest of the parameters).
-  parse_mount_opts () {
-	OPT="$1"
-	shift
-	echo ",$*," | sed		\
-	 -e 's/,'"$OPT"'=[0-9]*,/,/g'	\
-	 -e 's/,,*/,/g'			\
-	 -e 's/^,//'			\
-	 -e 's/,$//'
-  }
-
-  # Remove an option (the first parameter) without any arguments from
-  # a mount option string (the rest of the parameters).
-  parse_nonumber_mount_opts () {
-	OPT="$1"
-	shift
-	echo ",$*," | sed		\
-	 -e 's/,'"$OPT"',/,/g'		\
-	 -e 's/,,*/,/g'			\
-	 -e 's/^,//'			\
-	 -e 's/,$//'
-  }
-
-  # Find out the state of a yes/no option (e.g. "atime"/"noatime") in
-  # fstab for a given filesystem, and use this state to replace the
-  # value of the option in another mount options string. The device
-  # is the first argument, the option name the second, and the default
-  # value the third. The remainder is the mount options string.
-  #
-  # Example:
-  # parse_yesno_opts_wfstab /dev/hda1 atime atime defaults,noatime
-  #
-  # If fstab contains, say, "rw" for this filesystem, then the result
-  # will be "defaults,atime".
-  parse_yesno_opts_wfstab () {
-	L_DEV="$1"
-	OPT="$2"
-	DEF_OPT="$3"
-	shift 3
-	L_OPTS="$*"
-	PARSEDOPTS1="$(parse_nonumber_mount_opts $OPT $L_OPTS)"
-	PARSEDOPTS1="$(parse_nonumber_mount_opts no$OPT $PARSEDOPTS1)"
-	# Watch for a default atime in fstab
-	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
-	if echo "$FSTAB_OPTS" | grep "$OPT" > /dev/null ; then
-		# option specified in fstab: extract the value and use it
-		if echo "$FSTAB_OPTS" | grep "no$OPT" > /dev/null ; then
-			echo "$PARSEDOPTS1,no$OPT"
-		else
-			# no$OPT not found -- so we must have $OPT.
-			echo "$PARSEDOPTS1,$OPT"
-		fi
-	else
-		# option not specified in fstab -- choose the default.
-		echo "$PARSEDOPTS1,$DEF_OPT"
-	fi
-  }
-
-  # Find out the state of a numbered option (e.g. "commit=NNN") in
-  # fstab for a given filesystem, and use this state to replace the
-  # value of the option in another mount options string. The device
-  # is the first argument, and the option name the second. The
-  # remainder is the mount options string in which the replacement
-  # must be done.
-  #
-  # Example:
-  # parse_mount_opts_wfstab /dev/hda1 commit defaults,commit=7
-  #
-  # If fstab contains, say, "commit=3,rw" for this filesystem, then the
-  # result will be "rw,commit=3".
-  parse_mount_opts_wfstab () {
-	L_DEV="$1"
-	OPT="$2"
-	shift 2
-	L_OPTS="$*"
-	PARSEDOPTS1="$(parse_mount_opts $OPT $L_OPTS)"
-	# Watch for a default commit in fstab
-	FSTAB_OPTS="$(awk '$1 == "'$L_DEV'" { print $4 }' /etc/fstab)"
-	if echo "$FSTAB_OPTS" | grep "$OPT=" > /dev/null ; then
-		# option specified in fstab: extract the value, and use it
-		echo -n "$PARSEDOPTS1,$OPT="
-		echo ",$FSTAB_OPTS," | sed \
-		 -e 's/.*,'"$OPT"'=//'	\
-		 -e 's/,.*//'
-	else
-		# option not specified in fstab: set it to 0
-		echo "$PARSEDOPTS1,$OPT=0"
-	fi
-  }
-
-  deduce_fstype () {
-	MP="$1"
-	# My root filesystem unfortunately has
-	# type "unknown" in /etc/mtab. If we encounter
-	# "unknown", we try to get the type from fstab.
-	cat /etc/fstab |
-	grep -v '^#' |
-	while read FSTAB_DEV FSTAB_MP FSTAB_FST FSTAB_OPTS FSTAB_DUMP FSTAB_PASS ; do
-		if [ "$FSTAB_MP" = "$MP" ]; then
-			echo $FSTAB_FST
-			exit 0
-		fi
-	done
-  }
-
-  if [ $DO_REMOUNT_NOATIME -eq 1 ] ; then
-	NOATIME_OPT=",noatime"
-  fi
-
-  case "$1" in
-	start)
-		AGE=$((100*$MAX_AGE))
-		XFS_AGE=$(($XFS_HZ*$MAX_AGE))
-		echo -n "Starting laptop_mode"
-
-		if [ -d /proc/sys/vm/pagebuf ] ; then
-			# (For 2.4 and early 2.6.)
-			# This only needs to be set, not reset -- it is only used when
-			# laptop mode is enabled.
-			echo $XFS_AGE > /proc/sys/vm/pagebuf/lm_flush_age
-			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
-		elif [ -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
-			# (A couple of early 2.6 laptop mode patches had these.)
-			# The same goes for these.
-			echo $XFS_AGE > /proc/sys/fs/xfs/lm_age_buffer
-			echo $XFS_AGE > /proc/sys/fs/xfs/lm_sync_interval
-		elif [ -f /proc/sys/fs/xfs/age_buffer ] ; then
-			# (2.6.6)
-			# But not for these -- they are also used in normal
-			# operation.
-			echo $XFS_AGE > /proc/sys/fs/xfs/age_buffer
-			echo $XFS_AGE > /proc/sys/fs/xfs/sync_interval
-		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
-			# (2.6.7 upwards)
-			# And not for these either. These are in centisecs,
-			# not USER_HZ, so we have to use $AGE, not $XFS_AGE.
-			echo $AGE > /proc/sys/fs/xfs/age_buffer_centisecs
-			echo $AGE > /proc/sys/fs/xfs/xfssyncd_centisecs
-			echo 3000 > /proc/sys/fs/xfs/xfsbufd_centisecs
-		fi
-
-		case "$KLEVEL" in
-			"2.4")
-				echo 1					> /proc/sys/vm/laptop_mode
-				echo "30 500 0 0 $AGE $AGE 60 20 0"	> /proc/sys/vm/bdflush
-				;;
-			"2.6")
-				echo 5					> /proc/sys/vm/laptop_mode
-				echo "$AGE"				> /proc/sys/vm/dirty_writeback_centisecs
-				echo "$AGE"				> /proc/sys/vm/dirty_expire_centisecs
-				echo "$DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
-				echo "$DIRTY_BACKGROUND_RATIO"		> /proc/sys/vm/dirty_background_ratio
-				;;
-		esac
-		if [ $DO_REMOUNTS -eq 1 ]; then
-			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
-				PARSEDOPTS="$(parse_mount_opts "$OPTS")"
-				if [ "$FST" = 'unknown' ]; then
-					FST=$(deduce_fstype $MP)
-				fi
-				case "$FST" in
-					"ext3"|"reiserfs")
-						PARSEDOPTS="$(parse_mount_opts commit "$OPTS")"
-						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT
-						;;
-					"xfs")
-						mount $DEV -t $FST $MP -o remount,$OPTS$NOATIME_OPT
-						;;
-				esac
-				if [ -b $DEV ] ; then
-					blockdev --setra $(($READAHEAD * 2)) $DEV
-				fi
-			done
-		fi
-		if [ $DO_HD -eq 1 ] ; then
-			for THISHD in $HD ; do
-				/sbin/hdparm -S $BATT_HD $THISHD > /dev/null 2>&1
-				/sbin/hdparm -B 1 $THISHD > /dev/null 2>&1
-			done
-		fi
-		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
-			if [ $CPU_MAXFREQ = 'slowest' ]; then
-				CPU_MAXFREQ=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq`
-			fi
-			echo $CPU_MAXFREQ > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-		fi
-		echo "."
-		;;
-	stop)
-		U_AGE=$((100*$DEF_UPDATE))
-		B_AGE=$((100*$DEF_AGE))
-		echo -n "Stopping laptop_mode"
-		echo 0 > /proc/sys/vm/laptop_mode
-		if [ -f /proc/sys/fs/xfs/age_buffer -a ! -f /proc/sys/fs/xfs/lm_age_buffer ] ; then
-			# These need to be restored, if there are no lm_*.
-			echo $(($XFS_HZ*$DEF_XFS_AGE_BUFFER))	 	> /proc/sys/fs/xfs/age_buffer
-			echo $(($XFS_HZ*$DEF_XFS_SYNC_INTERVAL)) 	> /proc/sys/fs/xfs/sync_interval
-		elif [ -f /proc/sys/fs/xfs/age_buffer_centisecs ] ; then
-			# These need to be restored as well.
-			echo $((100*$DEF_XFS_AGE_BUFFER))	> /proc/sys/fs/xfs/age_buffer_centisecs
-			echo $((100*$DEF_XFS_SYNC_INTERVAL))	> /proc/sys/fs/xfs/xfssyncd_centisecs
-			echo $((100*$DEF_XFS_BUFD_INTERVAL))	> /proc/sys/fs/xfs/xfsbufd_centisecs
-		fi
-		case "$KLEVEL" in
-			"2.4")
-				echo "30 500 0 0 $U_AGE $B_AGE 60 20 0"	> /proc/sys/vm/bdflush
-				;;
-			"2.6")
-				echo "$U_AGE"				> /proc/sys/vm/dirty_writeback_centisecs
-				echo "$B_AGE"				> /proc/sys/vm/dirty_expire_centisecs
-				echo "$DEF_DIRTY_RATIO"			> /proc/sys/vm/dirty_ratio
-				echo "$DEF_DIRTY_BACKGROUND_RATIO"	> /proc/sys/vm/dirty_background_ratio
-				;;
-		esac
-		if [ $DO_REMOUNTS -eq 1 ] ; then
-			cat /etc/mtab | while read DEV MP FST OPTS DUMP PASS ; do
-				# Reset commit and atime options to defaults.
-				if [ "$FST" = 'unknown' ]; then
-					FST=$(deduce_fstype $MP)
-				fi
-				case "$FST" in
-					"ext3"|"reiserfs")
-						PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)"
-						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)"
-						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
-						;;
-					"xfs")
-						PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $OPTS)"
-						mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
-						;;
-				esac
-				if [ -b $DEV ] ; then
-					blockdev --setra 256 $DEV
-				fi
-			done
-		fi
-		if [ $DO_HD -eq 1 ] ; then
-			for THISHD in $HD ; do
-				/sbin/hdparm -S $AC_HD $THISHD > /dev/null 2>&1
-				/sbin/hdparm -B 255 $THISHD > /dev/null 2>&1
-			done
-		fi
-		if [ $DO_CPU -eq 1 -a -e /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq ]; then
-			echo `cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq` > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-		fi
-		echo "."
-		;;
-	*)
-		echo "Usage: $0 {start|stop}" >&2
-		exit 1
-		;;
-
-  esac
-
-  exit 0
-
-
-ACPI integration
-----------------
-
-Dax Kelson submitted this so that the ACPI acpid daemon will
-kick off the laptop_mode script and run hdparm. The part that
-automatically disables laptop mode when the battery is low was
-written by Jan Topinski.
-
-/etc/acpi/events/ac_adapter::
-
-	event=ac_adapter
-	action=/etc/acpi/actions/ac.sh %e
-
-/etc/acpi/events/battery::
-
-	event=battery.*
-	action=/etc/acpi/actions/battery.sh %e
-
-/etc/acpi/actions/ac.sh::
-
-  #!/bin/bash
-
-  # ac on/offline event handler
-
-  status=`awk '/^state: / { print $2 }' /proc/acpi/ac_adapter/$2/state`
-
-  case $status in
-          "on-line")
-                  /sbin/laptop_mode stop
-                  exit 0
-          ;;
-          "off-line")
-                  /sbin/laptop_mode start
-                  exit 0
-          ;;
-  esac
-
-
-/etc/acpi/actions/battery.sh::
-
-  #! /bin/bash
-
-  # Automatically disable laptop mode when the battery almost runs out.
-
-  BATT_INFO=/proc/acpi/battery/$2/state
-
-  if [[ -f /proc/sys/vm/laptop_mode ]]
-  then
-     LM=`cat /proc/sys/vm/laptop_mode`
-     if [[ $LM -gt 0 ]]
-     then
-       if [[ -f $BATT_INFO ]]
-       then
-          # Source the config file only now that we know we need it
-          if [ -f /etc/default/laptop-mode ] ; then
-                  # Debian
-                  . /etc/default/laptop-mode
-          elif [ -f /etc/sysconfig/laptop-mode ] ; then
-                  # Others
-                  . /etc/sysconfig/laptop-mode
-          fi
-          MINIMUM_BATTERY_MINUTES=${MINIMUM_BATTERY_MINUTES:-'10'}
-
-          ACTION="`cat $BATT_INFO | grep charging | cut -c 26-`"
-          if [[ "$ACTION" = "discharging" ]]
-          then
-             PRESENT_RATE=`cat $BATT_INFO | grep "present rate:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
-             REMAINING=`cat $BATT_INFO | grep "remaining capacity:" | sed  "s/.* \([0-9][0-9]* \).*/\1/" `
-             if ((PRESENT_RATE > 0 && REMAINING * 60 / PRESENT_RATE < MINIMUM_BATTERY_MINUTES))
-             then
-                /sbin/laptop_mode stop
-             fi
-          fi
-       else
-         logger -p daemon.warning "You are using laptop mode and your battery interface $BATT_INFO is missing. This may lead to loss of data when the battery runs out. Check kernel ACPI support and /proc/acpi/battery folder, and edit /etc/acpi/battery.sh to set BATT_INFO to the correct path."
-       fi
-     fi
-  fi
-
-
-Monitoring tool
----------------
-
-Bartek Kania submitted this tool, which can be used to measure how much time
-your disk spends spun up/down.  See tools/laptop/dslm/dslm.c
diff --git a/Documentation/laptops/lg-laptop.rst b/Documentation/laptops/lg-laptop.rst
deleted file mode 100644
index f2c2ffe31101..000000000000
--- a/Documentation/laptops/lg-laptop.rst
+++ /dev/null
@@ -1,85 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0+
-
-:orphan:
-
-LG Gram laptop extra features
-=============================
-
-By Matan Ziv-Av <matan@svgalib.org>
-
-
-Hotkeys
--------
-
-The following FN keys are ignored by the kernel without this driver:
-
-- FN-F1 (LG control panel)   - Generates F15
-- FN-F5 (Touchpad toggle)    - Generates F13
-- FN-F6 (Airplane mode)      - Generates RFKILL
-- FN-F8 (Keyboard backlight) - Generates F16.
-  This key also changes keyboard backlight mode.
-- FN-F9 (Reader mode)        - Generates F14
-
-The rest of the FN keys work without a need for a special driver.
-
-
-Reader mode
------------
-
-Writing 0/1 to /sys/devices/platform/lg-laptop/reader_mode disables/enables
-reader mode. In this mode the screen colors change (blue color reduced),
-and the reader mode indicator LED (on F9 key) turns on.
-
-
-FN Lock
--------
-
-Writing 0/1 to /sys/devices/platform/lg-laptop/fn_lock disables/enables
-FN lock.
-
-
-Battery care limit
-------------------
-
-Writing 80/100 to /sys/devices/platform/lg-laptop/battery_care_limit
-sets the maximum capacity to charge the battery. Limiting the charge
-reduces battery capacity loss over time.
-
-This value is reset to 100 when the kernel boots.
-
-
-Fan mode
---------
-
-Writing 1/0 to /sys/devices/platform/lg-laptop/fan_mode disables/enables
-the fan silent mode.
-
-
-USB charge
-----------
-
-Writing 0/1 to /sys/devices/platform/lg-laptop/usb_charge disables/enables
-charging another device from the USB port while the device is turned off.
-
-This value is reset to 0 when the kernel boots.
-
-
-LEDs
-~~~~
-
-There are two LED devices supported by the driver:
-
-Keyboard backlight
-------------------
-
-An LED device named kbd_led controls the keyboard backlight. There are three
-lighting levels: off (0), low (127) and high (255).
-
-The keyboard backlight is also controlled by the key combination FN-F8
-which cycles through those levels.
-
-
-Touchpad indicator LED
-----------------------
-
-Located on the F5 key. Controlled by the LED device named tpad_led.
diff --git a/Documentation/laptops/sony-laptop.rst b/Documentation/laptops/sony-laptop.rst
deleted file mode 100644
index 9edcc7f6612f..000000000000
--- a/Documentation/laptops/sony-laptop.rst
+++ /dev/null
@@ -1,174 +0,0 @@
-=========================================
-Sony Notebook Control Driver (SNC) Readme
-=========================================
-
-	- Copyright (C) 2004-2005 Stelian Pop <stelian@popies.net>
-	- Copyright (C) 2007 Mattia Dongili <malattia@linux.it>
-
-This mini-driver drives the SNC and SPIC devices present in the ACPI BIOS of
-Sony Vaio laptops. This driver mixes both devices' functions under the same
-(hopefully consistent) interface. This also means that the sonypi driver is
-now obsoleted by sony-laptop.
-
-Fn keys (hotkeys):
-------------------
-
-Some models report hotkeys through the SNC or SPIC devices; such events are
-reported both through the ACPI subsystem as ACPI events and through the INPUT
-subsystem. See the contents of /proc/bus/input/devices to find out what those
-events are and which input devices are created by the driver.
-Additionally, loading the driver with the debug option will report all events
-in the kernel log.
-
-The "scancodes" passed to the input system (that can be remapped with udev)
-are indexes to the table "sony_laptop_input_keycode_map" in the sony-laptop.c
-module.  For example the "FN/E" key combination (EJECTCD on some models)
-generates the scancode 20 (0x14).
-
-Backlight control:
-------------------
-If your laptop model supports it, you will find sysfs files in the
-/sys/class/backlight/sony/
-directory. You will be able to query and set the current screen
-brightness:
-
-	======================	=========================================
-	brightness		get/set screen brightness (an integer
-				between 0 and 7)
-	actual_brightness	reading from this file will query the HW
-				to get real brightness value
-	max_brightness		the maximum brightness value
-	======================	=========================================
-
-
-Platform specific:
-------------------
-Loading the sony-laptop module will create a
-/sys/devices/platform/sony-laptop/
-directory populated with some files.
-
-You then read/write integer values from/to those files by using
-standard UNIX tools.
-
-The files are:
-
-	======================	==========================================
-	brightness_default	screen brightness which will be set
-				when the laptop is rebooted
-	cdpower			power on/off the internal CD drive
-	audiopower		power on/off the internal sound card
-	lanpower		power on/off the internal ethernet card
-				(only in debug mode)
-	bluetoothpower		power on/off the internal bluetooth device
-	fanspeed		get/set the fan speed
-	======================	==========================================
-
-Note that some files may be missing if they are not supported
-by your particular laptop model.
-
-Example usage::
-
-	# echo "1" > /sys/devices/platform/sony-laptop/brightness_default
-
-sets the lowest screen brightness for the next and later reboots
-
-::
-
-	# echo "8" > /sys/devices/platform/sony-laptop/brightness_default
-
-sets the highest screen brightness for the next and later reboots
-
-::
-
-	# cat /sys/devices/platform/sony-laptop/brightness_default
-
-retrieves the value
-
-::
-
-	# echo "0" > /sys/devices/platform/sony-laptop/audiopower
-
-powers off the sound card
-
-::
-
-	# echo "1" > /sys/devices/platform/sony-laptop/audiopower
-
-powers on the sound card.
-
-
-RFkill control:
----------------
-More recent Vaio models expose a consistent set of ACPI methods to
-control radio frequency emitting devices. If you are a lucky owner of
-such a laptop you will find the necessary rfkill devices under
-/sys/class/rfkill. Check those starting with sony-* in::
-
-	# grep . /sys/class/rfkill/*/{state,name}
-
-
-Development:
-------------
-
-If you want to help with the development of this driver (and
-you are not afraid of any side effects doing strange things with
-your ACPI BIOS could have on your laptop), load the driver and
-pass the option 'debug=1'.
-
-REPEAT:
-	**DON'T DO THIS IF YOU DON'T LIKE RISKY BUSINESS.**
-
-In your kernel logs you will find the list of all ACPI methods
-the SNC device has on your laptop.
-
-* For new models you will see a long list of meaningless method names;
-  reading the DSDT table source should reveal that:
-
-(1) the SNC device uses an internal capability lookup table
-(2) SN00 is used to find values in the lookup table
-(3) SN06 and SN07 are used to call into the real methods based on
-    offsets you can obtain iterating the table using SN00
-(4) SN02 is used to enable events.
-
-Some values in the capability lookup table are more or less known (see
-the code for all sony_call_snc_handle calls); others are more obscure.
-
-* For old models you can see the GCDP/SCDP methods used to power on/off
-  the CD drive, but there are others and they are usually different from
-  model to model.
-
-**I HAVE NO IDEA WHAT THOSE METHODS DO.**
-
-The sony-laptop driver creates, for some of those methods (the most
-current ones found on several Vaio models), an entry under
-/sys/devices/platform/sony-laptop, just like the 'cdpower' one.
-You can create other entries corresponding to your own laptop methods by
-further editing the source (see the 'sony_nc_values' table, and add a new
-entry to this table with your get/set method names using the
-SNC_HANDLE_NAMES macro).
-
-Your mission, should you accept it, is to try finding out what
-those entries are for, by reading/writing random values from/to those
-files and finding out what the impact on your laptop is.
-
-Should you find anything interesting, please report it back to me,
-I will not disavow all knowledge of your actions :)
-
-See also http://www.linux.it/~malattia/wiki/index.php/Sony_drivers for other
-useful info.
-
-Bugs/Limitations:
------------------
-
-* This driver is not based on official documentation from Sony
-  (because there is none), so there is no guarantee this driver
-  will work at all, or do the right thing. Although this hasn't
-  happened to me, this driver could do very bad things to your
-  laptop, including permanent damage.
-
-* The sony-laptop and sonypi drivers do not interact at all. In the
-  future, sonypi will be removed and replaced by sony-laptop.
-
-* spicctrl, which is the userspace tool used to communicate with the
-  sonypi driver (through /dev/sonypi) is deprecated as well since all
-  its features are now available under the sysfs tree via sony-laptop.
diff --git a/Documentation/laptops/sonypi.rst b/Documentation/laptops/sonypi.rst
deleted file mode 100644
index 2a1975ed7ee4..000000000000
--- a/Documentation/laptops/sonypi.rst
+++ /dev/null
@@ -1,160 +0,0 @@
-==================================================
-Sony Programmable I/O Control Device Driver Readme
-==================================================
-
-	- Copyright (C) 2001-2004 Stelian Pop <stelian@popies.net>
-	- Copyright (C) 2001-2002 Alcôve <www.alcove.com>
-	- Copyright (C) 2001 Michael Ashley <m.ashley@unsw.edu.au>
-	- Copyright (C) 2001 Junichi Morita <jun1m@mars.dti.ne.jp>
-	- Copyright (C) 2000 Takaya Kinjo <t-kinjo@tc4.so-net.ne.jp>
-	- Copyright (C) 2000 Andrew Tridgell <tridge@samba.org>
-
-This driver enables access to the Sony Programmable I/O Control Device which
-can be found in many Sony Vaio laptops. Some newer Sony laptops (seems to be
-limited to new FX series laptops, at least the FX501 and the FX702) lack a
-sonypi device and are not supported at all by this driver.
-
-It will give access (through a user space utility) to some events those laptops
-generate, like:
-
-	- jogdial events (the small wheel on the side of Vaios)
-	- capture button events (only on Vaio Picturebook series)
-	- Fn keys
-	- bluetooth button (only on C1VR model)
-	- programmable keys, back, help, zoom, thumbphrase buttons, etc.
-	  (when available)
-
-Those events (see linux/sonypi.h) can be polled using the character device node
-/dev/sonypi (major 10, minor auto allocated or specified as an option).
-A simple daemon which translates the jogdial movements into mouse wheel events
-can be downloaded at: <http://popies.net/sonypi/>
-
-Another option to intercept the events is to get them directly through the
-input layer.
-
-This driver also supports some ioctl commands for setting the LCD screen
-brightness and querying the battery charge information (some more
-commands may be added in the future).
-
-This driver can also be used to set the camera controls on Picturebook series
-(brightness, contrast etc), and is used by the video4linux driver for the
-Motion Eye camera.
-
-Please note that this driver was created by reverse engineering the Windows
-driver and the ACPI BIOS, because Sony doesn't agree to release any programming
-specs for its laptops. If someone convinces them to do so, drop me a note.
-
-Driver options:
----------------
-
-Several options can be passed to the sonypi driver using the standard
-module argument syntax (<param>=<value> when passing the option to the
-module or sonypi.<param>=<value> on the kernel boot line when sonypi is
-statically linked into the kernel). Those options are:
-
-	=============== =======================================================
-	minor: 		minor number of the misc device /dev/sonypi,
-			default is -1 (automatic allocation, see /proc/misc
-			or kernel logs)
-
-	camera:		if you have a PictureBook series Vaio (with the
-			integrated MotionEye camera), set this parameter to 1
-			in order to let the driver access the camera
-
-	fnkeyinit:	on some Vaios (C1VE, C1VR etc), the Fn key events don't
-			get enabled unless you set this parameter to 1.
-			Do not use this option unless it's actually necessary;
-			some Vaio models don't deal well with this option.
-			This option is available only if the kernel is
-			compiled without ACPI support (since it conflicts
-			with it and it shouldn't be required anyway if
-			ACPI is already enabled).
-
-	verbose:	set to 1 to print unknown events received from the
-			sonypi device.
-			set to 2 to print all events received from the
-			sonypi device.
-
-	compat:		uses some compatibility code for enabling the sonypi
-			events. If the driver worked for you in the past
-			(prior to version 1.5) and does not work anymore,
-			add this option and report to the author.
-
-	mask:		event mask telling the driver what events will be
-			reported to the user. This parameter is required for
-			some Vaio models where the hardware reuses values
-			used in other Vaio models (like the FX series which does
-			not have a jogdial but reuses the jogdial events for
-			programmable keys events). The default event mask is
-			set to 0xffffffff, meaning that all possible events
-			will be tried. You can use the following bits to
-			construct your own event mask (from
-			drivers/char/sonypi.h):
-
-				========================	======
-				SONYPI_JOGGER_MASK 		0x0001
-				SONYPI_CAPTURE_MASK 		0x0002
-				SONYPI_FNKEY_MASK 		0x0004
-				SONYPI_BLUETOOTH_MASK 		0x0008
-				SONYPI_PKEY_MASK 		0x0010
-				SONYPI_BACK_MASK 		0x0020
-				SONYPI_HELP_MASK 		0x0040
-				SONYPI_LID_MASK 		0x0080
-				SONYPI_ZOOM_MASK 		0x0100
-				SONYPI_THUMBPHRASE_MASK 	0x0200
-				SONYPI_MEYE_MASK		0x0400
-				SONYPI_MEMORYSTICK_MASK		0x0800
-				SONYPI_BATTERY_MASK		0x1000
-				SONYPI_WIRELESS_MASK		0x2000
-				========================	======
-
-	useinput:	if set (which is the default) two input devices are
-			created, one which interprets the jogdial events as
-			mouse events, the other one which acts like a
-			keyboard reporting the pressing of the special keys.
-	=============== =======================================================
-
-Module use:
------------
-
-In order to automatically load the sonypi module on use, you can put these
-lines in a configuration file in /etc/modprobe.d/::
-
-	alias char-major-10-250 sonypi
-	options sonypi minor=250
-
-This assumes the use of minor 250 for the sonypi device::
-
-	# mknod /dev/sonypi c 10 250
-
-Bugs:
------
-
-	- several users reported that this driver disables the BIOS-managed
-	  Fn-keys which put the laptop in sleeping state, or switch the
-	  external monitor on/off. There is no workaround yet, since this
-	  driver disables all APM management for those keys, by enabling the
-	  ACPI management (and the ACPI core stuff is not complete yet). If
-	  you have one of those laptops with working Fn keys and want to
-	  continue to use them, don't use this driver.
-
-	- some users reported that the laptop speed is lower (dhrystone
-	  tested) when using the driver with the fnkeyinit parameter. I cannot
-	  reproduce it on my laptop and not all users have this problem.
-	  This happens because the fnkeyinit parameter enables the ACPI
-	  mode (but without additional ACPI control, like processor
-	  speed handling etc). Use ACPI instead of APM if it works on your
-	  laptop.
-
-	- sonypi lacks the ability to distinguish between certain key
-	  events on some models.
-
-	- some models with the NVIDIA card (GeForce Go 6200 TC) use a
-	  different way to adjust the backlighting of the screen. There
-	  is a userspace utility to adjust the brightness on those models,
-	  which can be downloaded from
-	  http://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
-
-	- since all development was done by reverse engineering, there is
-	  *absolutely no guarantee* that this driver will not crash your
-	  laptop. Permanently.
diff --git a/Documentation/laptops/thinkpad-acpi.rst b/Documentation/laptops/thinkpad-acpi.rst
deleted file mode 100644
index 19d52fc3c5e9..000000000000
--- a/Documentation/laptops/thinkpad-acpi.rst
+++ /dev/null
@@ -1,1562 +0,0 @@
-===========================
-ThinkPad ACPI Extras Driver
-===========================
-
-Version 0.25
-
-October 16th, 2013
-
-- Borislav Deianov <borislav@users.sf.net>
-- Henrique de Moraes Holschuh <hmh@hmh.eng.br>
-
-http://ibm-acpi.sf.net/
-
-This is a Linux driver for the IBM and Lenovo ThinkPad laptops. It
-supports various features of these laptops which are accessible
-through the ACPI and ACPI EC framework, but not otherwise fully
-supported by the generic Linux ACPI drivers.
-
-This driver used to be named ibm-acpi until kernel 2.6.21 and release
-0.13-20070314.  It used to be in the drivers/acpi tree, but it was
-moved to the drivers/misc tree and renamed to thinkpad-acpi for kernel
-2.6.22, and release 0.14.  It was moved to drivers/platform/x86 for
-kernel 2.6.29 and release 0.22.
-
-The driver is named "thinkpad-acpi".  In some places, like module
-names and log messages, "thinkpad_acpi" is used because of userspace
-issues.
-
-"tpacpi" is used as a shorthand where "thinkpad-acpi" would be too
-long due to length limitations on some Linux kernel versions.
-
-Status
-------
-
-The features currently supported are the following (see below for
-detailed description):
-
-	- Fn key combinations
-	- Bluetooth enable and disable
-	- video output switching, expansion control
-	- ThinkLight on and off
-	- CMOS/UCMS control
-	- LED control
-	- ACPI sounds
-	- temperature sensors
-	- Experimental: embedded controller register dump
-	- LCD brightness control
-	- Volume control
-	- Fan control and monitoring: fan speed, fan enable/disable
-	- WAN enable and disable
-	- UWB enable and disable
-
-A compatibility table by model and feature is maintained on the web
-site, http://ibm-acpi.sf.net/. I appreciate any success or failure
-reports, especially if they add to or correct the compatibility table.
-Please include the following information in your report:
-
-	- ThinkPad model name
-	- a copy of your ACPI tables, using the "acpidump" utility
-	- a copy of the output of dmidecode, with serial numbers
-	  and UUIDs masked off
-	- which driver features work and which don't
-	- the observed behavior of non-working features
-
-Any other comments or patches are also more than welcome.
-
-
-Installation
-------------
-
-If you are compiling this driver as included in the Linux kernel
-sources, look for the CONFIG_THINKPAD_ACPI Kconfig option.
-It is located on the menu path: "Device Drivers" -> "X86 Platform
-Specific Device Drivers" -> "ThinkPad ACPI Laptop Extras".
-
-
-Features
---------
-
-The driver exports two different interfaces to userspace, which can be
-used to access the features it provides.  One is a legacy procfs-based
-interface, which will be removed at some time in the future.  The other
-is a new sysfs-based interface which is not complete yet.
-
-The procfs interface creates the /proc/acpi/ibm directory.  There is a
-file under that directory for each feature it supports.  The procfs
-interface is mostly frozen, and will change very little if at all: it
-will not be extended to add any new functionality in the driver, instead
-all new functionality will be implemented on the sysfs interface.
-
-The sysfs interface tries to blend in the generic Linux sysfs subsystems
-and classes as much as possible.  Since some of these subsystems are not
-yet ready or stabilized, it is expected that this interface will change,
-and any and all userspace programs must deal with it.
-
-
-Notes about the sysfs interface
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Unlike what was done with the procfs interface, correctness when talking
-to the sysfs interfaces will be enforced, as will correctness in the
-thinkpad-acpi's implementation of sysfs interfaces.
-
-Also, any bugs in the thinkpad-acpi sysfs driver code or in the
-thinkpad-acpi's implementation of the sysfs interfaces will be fixed for
-maximum correctness, even if that means changing an interface in
-non-compatible ways.  As these interfaces mature both in the kernel and
-in thinkpad-acpi, such changes should become quite rare.
-
-Applications interfacing to the thinkpad-acpi sysfs interfaces must
-follow all sysfs guidelines and correctly process all errors (the sysfs
-interface makes extensive use of errors).  File descriptors and open /
-close operations to the sysfs inodes must also be properly implemented.
-
-The version of thinkpad-acpi's sysfs interface is exported by the driver
-as a driver attribute (see below).
-
-Sysfs driver attributes are on the driver's sysfs attribute space;
-for 2.6.23+ this is /sys/bus/platform/drivers/thinkpad_acpi/ and
-/sys/bus/platform/drivers/thinkpad_hwmon/
-
-Sysfs device attributes are on the thinkpad_acpi device sysfs attribute
-space; for 2.6.23+ this is /sys/devices/platform/thinkpad_acpi/.
-
-Sysfs device attributes for the sensors and fan are on the
-thinkpad_hwmon device's sysfs attribute space, but you should locate it
-looking for a hwmon device with the name attribute of "thinkpad", or
-better yet, through libsensors. For 4.14+ sysfs attributes were moved to the
-hwmon device (/sys/bus/platform/devices/thinkpad_hwmon/hwmon/hwmon? or
-/sys/class/hwmon/hwmon?).
-
-Driver version
---------------
-
-procfs: /proc/acpi/ibm/driver
-
-sysfs driver attribute: version
-
-The driver name and version. No commands can be written to this file.
-
-
-Sysfs interface version
------------------------
-
-sysfs driver attribute: interface_version
-
-Version of the thinkpad-acpi sysfs interface, as an unsigned long
-(output in hex format: 0xAAAABBCC), where:
-
-	AAAA
-	  - major revision
-	BB
-	  - minor revision
-	CC
-	  - bugfix revision
-
-The sysfs interface version changelog for the driver can be found at the
-end of this document.  Changes to the sysfs interface done by the kernel
-subsystems are not documented here, nor are they tracked by this
-attribute.
-
-Changes to the thinkpad-acpi sysfs interface are only considered
-non-experimental when they are submitted to Linux mainline, at which
-point the changes in this interface are documented and interface_version
-may be updated.  If you are using any thinkpad-acpi features not yet
-sent to mainline for merging, you do so at your own risk: these features
-may disappear, or be implemented in a different and incompatible way by
-the time they are merged in Linux mainline.
-
-Changes that are backwards-compatible by nature (e.g. the addition of
-attributes that do not change the way the other attributes work) do not
-always warrant an update of interface_version.  Therefore, one must
-expect that an attribute might not be there, and deal with it properly
-(an attribute not being there *is* a valid way to make it clear that a
-feature is not available in sysfs).
-
-
-Hot keys
---------
-
-procfs: /proc/acpi/ibm/hotkey
-
-sysfs device attribute: hotkey_*
-
-In a ThinkPad, the ACPI HKEY handler is responsible for communicating
-some important events and also keyboard hot key presses to the operating
-system.  Enabling the hotkey functionality of thinkpad-acpi signals the
-firmware that such a driver is present, and modifies how the ThinkPad
-firmware will behave in many situations.
-
-The driver enables the HKEY ("hot key") event reporting automatically
-when loaded, and disables it when it is removed.
-
-The driver will report HKEY events in the following format::
-
-	ibm/hotkey HKEY 00000080 0000xxxx
-
-Some of these events refer to hot key presses, but not all of them.
-
-The driver will generate events over the input layer for hot keys and
-radio switches, and over the ACPI netlink layer for other events.  The
-input layer support accepts the standard IOCTLs to remap the keycodes
-assigned to each hot key.
-
-The hot key bit mask allows some control over which hot keys generate
-events.  If a key is "masked" (bit set to 0 in the mask), the firmware
-will handle it.  If it is "unmasked", it signals the firmware that
-thinkpad-acpi would prefer to handle it, if the firmware would be so
-kind as to allow it (and it often doesn't!).
-
-Not all bits in the mask can be modified.  Not all bits that can be
-modified do anything.  Not all hot keys can be individually controlled
-by the mask.  Some models do not support the mask at all.  The behaviour
-of the mask is, therefore, highly dependent on the ThinkPad model.
-
-The driver will filter out any masked hotkeys, so even if the firmware
-doesn't allow disabling a specific hotkey, the driver will not report
-events for masked hotkeys.
-
-Note that unmasking some keys prevents their default behavior.  For
-example, if Fn+F5 is unmasked, that key will no longer enable/disable
-Bluetooth by itself in firmware.
-
-Note also that not all Fn key combinations are supported through ACPI
-depending on the ThinkPad model and firmware version.  On those
-ThinkPads, it is still possible to support some extra hotkeys by
-polling the "CMOS NVRAM" at least 10 times per second.  The driver
-attempts to enable this functionality automatically when required.
-
-procfs notes
-^^^^^^^^^^^^
-
-The following commands can be written to the /proc/acpi/ibm/hotkey file::
-
-	echo 0xffffffff > /proc/acpi/ibm/hotkey -- enable all hot keys
-	echo 0 > /proc/acpi/ibm/hotkey -- disable all possible hot keys
-	... any other 8-hex-digit mask ...
-	echo reset > /proc/acpi/ibm/hotkey -- restore the recommended mask
-
-The following commands have been deprecated and will cause the kernel
-to log a warning::
-
-	echo enable > /proc/acpi/ibm/hotkey -- does nothing
-	echo disable > /proc/acpi/ibm/hotkey -- returns an error
-
-The procfs interface does not support NVRAM polling control.  So as to
-maintain maximum bug-to-bug compatibility, it does not report any masks,
-nor does it allow one to manipulate the hot key mask when the firmware
-does not support masks at all, even if NVRAM polling is in use.
-
-sysfs notes
-^^^^^^^^^^^
-
-	hotkey_bios_enabled:
-		DEPRECATED, WILL BE REMOVED SOON.
-
-		Returns 0.
-
-	hotkey_bios_mask:
-		DEPRECATED, DON'T USE, WILL BE REMOVED IN THE FUTURE.
-
-		Returns the hot keys mask when thinkpad-acpi was loaded.
-		Upon module unload, the hot keys mask will be restored
-		to this value.   This is always 0x80c, because those are
-		the hotkeys that were supported by ancient firmware
-		without mask support.
-
-	hotkey_enable:
-		DEPRECATED, WILL BE REMOVED SOON.
-
-		0: returns -EPERM
-		1: does nothing
-
-	hotkey_mask:
-		bit mask to enable reporting (and depending on
-		the firmware, ACPI event generation) for each hot key
-		(see above).  Returns the current status of the hot keys
-		mask, and allows one to modify it.
-
-	hotkey_all_mask:
-		bit mask that should enable event reporting for all
-		supported hot keys, when echoed to hotkey_mask above.
-		Unless you know which events need to be handled
-		passively (because the firmware *will* handle them
-		anyway), do *not* use hotkey_all_mask.  Use
-		hotkey_recommended_mask, instead. You have been warned.
-
-	hotkey_recommended_mask:
-		bit mask that should enable event reporting for all
-		supported hot keys, except those which are always
-		handled by the firmware anyway.  Echo it to
-		hotkey_mask above, to use.  This is the default mask
-		used by the driver.
-
-	hotkey_source_mask:
-		bit mask that selects which hot keys the driver will
-		poll the NVRAM for.  This is auto-detected by the driver
-		based on the capabilities reported by the ACPI firmware,
-		but it can be overridden at runtime.
-
-		Hot keys whose bits are set in hotkey_source_mask are
-		polled for in NVRAM, and reported as hotkey events if
-		enabled in hotkey_mask.  Only a few hot keys are
-		available through CMOS NVRAM polling.
-
-		Warning: when in NVRAM mode, the volume up/down/mute
-		keys are synthesized according to changes in the mixer,
-		which uses a single volume up or volume down hotkey
-		press to unmute, as per the ThinkPad volume mixer user
-		interface.  When in ACPI event mode, volume up/down/mute
-		events are reported by the firmware and can behave
-		differently (and that behaviour changes with firmware
-		version -- not just with firmware models -- as well as
-		OSI(Linux) state).
-
-	hotkey_poll_freq:
-		frequency in Hz for hot key polling. It must be between
-		0 and 25 Hz.  Polling is only carried out when strictly
-		needed.
-
-		Setting hotkey_poll_freq to zero disables polling, and
-		will cause hot key presses that require NVRAM polling
-		to never be reported.
-
-		Setting hotkey_poll_freq too low may cause repeated
-		pressings of the same hot key to be misreported as a
-		single key press, or to not even be detected at all.
-		The recommended polling frequency is 10Hz.
-
-	hotkey_radio_sw:
-		If the ThinkPad has a hardware radio switch, this
-		attribute will read 0 if the switch is in the "radios
-		disabled" position, and 1 if the switch is in the
-		"radios enabled" position.
-
-		This attribute has poll()/select() support.
-
-	hotkey_tablet_mode:
-		If the ThinkPad has tablet capabilities, this attribute
-		will read 0 if the ThinkPad is in normal mode, and
-		1 if the ThinkPad is in tablet mode.
-
-		This attribute has poll()/select() support.
-
-	wakeup_reason:
-		Set to 1 if the system is waking up because the user
-		requested a bay ejection.  Set to 2 if the system is
-		waking up because the user requested the system to
-		undock.  Set to zero for normal wake-ups or wake-ups
-		due to unknown reasons.
-
-		This attribute has poll()/select() support.
-
-	wakeup_hotunplug_complete:
-		Set to 1 if the system was woken up because of an
-		undock or bay ejection request, and that request
-		was successfully completed.  At this point, it might
-		be useful to send the system back to sleep, at the
-		user's choice.  Refer to HKEY events 0x4003 and
-		0x3003, below.
-
-		This attribute has poll()/select() support.
-
-input layer notes
-^^^^^^^^^^^^^^^^^
-
-A hot key is mapped to a single input layer EV_KEY event, possibly
-followed by an EV_MSC MSC_SCAN event that shall contain that key's scan
-code.  An EV_SYN event will always be generated to mark the end of the
-event block.
-
-Do not use the EV_MSC MSC_SCAN events to process keys.  They are to be
-used as a helper to remap keys, only.  They are particularly useful when
-remapping KEY_UNKNOWN keys.
-
-The events are available in an input device, with the following id:
-
-	==============  ==============================
-	Bus		BUS_HOST
-	vendor		0x1014 (PCI_VENDOR_ID_IBM)  or
-			0x17aa (PCI_VENDOR_ID_LENOVO)
-	product		0x5054 ("TP")
-	version		0x4101
-	==============  ==============================
-
-The version will have its LSB incremented if the keymap changes in a
-backwards-compatible way.  The MSB shall always be 0x41 for this input
-device.  If the MSB is not 0x41, do not use the device as described in
-this section, as it is either something else (e.g. another input device
-exported by a thinkpad driver, such as HDAPS) or its functionality has
-been changed in a non-backwards compatible way.
-
-Adding other event types for other functionalities shall be considered a
-backwards-compatible change for this input device.
-
-Thinkpad-acpi Hot Key event map (version 0x4101):
-
-=======	=======	==============	==============================================
-ACPI	Scan
-event	code	Key		Notes
-=======	=======	==============	==============================================
-0x1001	0x00	FN+F1		-
-
-0x1002	0x01	FN+F2		IBM: battery (rare)
-				Lenovo: Screen lock
-
-0x1003	0x02	FN+F3		Many IBM models always report
-				this hot key, even with hot keys
-				disabled or with Fn+F3 masked
-				off
-				IBM: screen lock, often turns
-				off the ThinkLight as side-effect
-				Lenovo: battery
-
-0x1004	0x03	FN+F4		Sleep button (ACPI sleep button
-				semantics, i.e. sleep-to-RAM).
-				It always generates some kind
-				of event, either the hot key
-				event or an ACPI sleep button
-				event. The firmware may
-				refuse to generate further FN+F4
-				key presses until a S3 or S4 ACPI
-				sleep cycle is performed or some
-				time passes.
-
-0x1005	0x04	FN+F5		Radio.  Enables/disables
-				the internal Bluetooth hardware
-				and W-WAN card if left in control
-				of the firmware.  Does not affect
-				the WLAN card.
-				Should be used to turn on/off all
-				radios (Bluetooth+W-WAN+WLAN),
-				really.
-
-0x1006	0x05	FN+F6		-
-
-0x1007	0x06	FN+F7		Video output cycle.
-				Do you feel lucky today?
-
-0x1008	0x07	FN+F8		IBM: toggle screen expand
-				Lenovo: configure UltraNav,
-				or toggle screen expand
-
-0x1009	0x08	FN+F9		-
-
-...	...	...		...
-
-0x100B	0x0A	FN+F11		-
-
-0x100C	0x0B	FN+F12		Sleep to disk.  You are always
-				supposed to handle it yourself,
-				either through the ACPI event,
-				or through a hotkey event.
-				The firmware may refuse to
-				generate further FN+F12 key
-				press events until a S3 or S4
-				ACPI sleep cycle is performed,
-				or some time passes.
-
-0x100D	0x0C	FN+BACKSPACE	-
-0x100E	0x0D	FN+INSERT	-
-0x100F	0x0E	FN+DELETE	-
-
-0x1010	0x0F	FN+HOME		Brightness up.  This key is
-				always handled by the firmware
-				in IBM ThinkPads, even when
-				unmasked.  Just leave it alone.
-				For Lenovo ThinkPads with a new
-				BIOS, it has to be handled either
-				by the ACPI OSI, or by userspace.
-				The driver does the right thing,
-				never mess with this.
-0x1011	0x10	FN+END		Brightness down.  See brightness
-				up for details.
-
-0x1012	0x11	FN+PGUP		ThinkLight toggle.  This key is
-				always handled by the firmware,
-				even when unmasked.
-
-0x1013	0x12	FN+PGDOWN	-
-
-0x1014	0x13	FN+SPACE	Zoom key
-
-0x1015	0x14	VOLUME UP	Internal mixer volume up. This
-				key is always handled by the
-				firmware, even when unmasked.
-				NOTE: Lenovo seems to be changing
-				this.
-0x1016	0x15	VOLUME DOWN	Internal mixer volume down. This
-				key is always handled by the
-				firmware, even when unmasked.
-				NOTE: Lenovo seems to be changing
-				this.
-0x1017	0x16	MUTE		Mute internal mixer. This
-				key is always handled by the
-				firmware, even when unmasked.
-
-0x1018	0x17	THINKPAD	ThinkPad/Access IBM/Lenovo key
-
-0x1019	0x18	unknown
-
-...	...	...
-
-0x1020	0x1F	unknown
-=======	=======	==============	==============================================
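-
-In the table above, each scan code is the ACPI event number minus 0x1001.
-A minimal Python sketch of that relation, with a hypothetical helper name
-(this is not code from the driver):
-

```python
HOTKEY_EVENT_BASE = 0x1001  # ACPI event for scan code 0x00 (FN+F1)
HOTKEY_EVENT_LAST = 0x1020  # ACPI event for scan code 0x1F


def hotkey_scan_code(acpi_event):
    """Map an ACPI hot key event from the version 0x4101 table to its scan code."""
    if not HOTKEY_EVENT_BASE <= acpi_event <= HOTKEY_EVENT_LAST:
        raise ValueError("not a hot key event in this map: %#x" % acpi_event)
    return acpi_event - HOTKEY_EVENT_BASE
```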
-
-The ThinkPad firmware does not allow one to differentiate when most hot
-keys are pressed or released (either that, or we don't know how to, yet).
-For these keys, the driver generates a set of events for a key press and
-immediately issues the same set of events for a key release.  The driver
-cannot tell whether the firmware triggered these events on hot key press
-or on release, but the firmware does so for only one of the two, never
-both.
-
-If a key is mapped to KEY_RESERVED, it generates no input events at all.
-If a key is mapped to KEY_UNKNOWN, it generates an input event that
-includes a scan code.  If a key is mapped to anything else, it will
-generate input device EV_KEY events.
-
-In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
-events for switches:
-
-==============	==============================================
-SW_RFKILL_ALL	T60 and later hardware rfkill rocker switch
-SW_TABLET_MODE	Tablet ThinkPads HKEY events 0x5009 and 0x500A
-==============	==============================================
-
-Non hotkey ACPI HKEY event map
-------------------------------
-
-Events that are never propagated by the driver:
-
-======		==================================================
-0x2304		System is waking up from suspend to undock
-0x2305		System is waking up from suspend to eject bay
-0x2404		System is waking up from hibernation to undock
-0x2405		System is waking up from hibernation to eject bay
-0x5001		Lid closed
-0x5002		Lid opened
-0x5009		Tablet swivel: switched to tablet mode
-0x500A		Tablet swivel: switched to normal mode
-0x5010		Brightness level changed/control event
-0x6000		KEYBOARD: Numlock key pressed
-0x6005		KEYBOARD: Fn key pressed (TO BE VERIFIED)
-0x7000		Radio Switch may have changed state
-======		==================================================
-
-
-Events that are propagated by the driver to userspace:
-
-======		=====================================================
-0x2313		ALARM: System is waking up from suspend because
-		the battery is nearly empty
-0x2413		ALARM: System is waking up from hibernation because
-		the battery is nearly empty
-0x3003		Bay ejection (see 0x2x05) complete, can sleep again
-0x3006		Bay hotplug request (hint to power up SATA link when
-		the optical drive tray is ejected)
-0x4003		Undocked (see 0x2x04), can sleep again
-0x4010		Docked into hotplug port replicator (non-ACPI dock)
-0x4011		Undocked from hotplug port replicator (non-ACPI dock)
-0x500B		Tablet pen inserted into its storage bay
-0x500C		Tablet pen removed from its storage bay
-0x6011		ALARM: battery is too hot
-0x6012		ALARM: battery is extremely hot
-0x6021		ALARM: a sensor is too hot
-0x6022		ALARM: a sensor is extremely hot
-0x6030		System thermal table changed
-0x6032		Thermal Control command set completion  (DYTC, Windows)
-0x6040		Nvidia Optimus/AC adapter related (TO BE VERIFIED)
-0x60C0		X1 Yoga 2016, Tablet mode status changed
-0x60F0		Thermal Transformation changed (GMTS, Windows)
-======		=====================================================
-
-Battery nearly empty alarms are a last resort attempt to get the
-operating system to hibernate or shut down cleanly (0x2313), or shut down
-cleanly (0x2413) before power is lost.  They must be acted upon, as the
-wake up caused by the firmware will have negated most safety nets...
-
-When any of the "too hot" alarms happen, according to Lenovo the user
-should suspend or hibernate the laptop (and in the case of battery
-alarms, unplug the AC adapter) to let it cool down.  These alarms do
-signal that something is wrong; they should never happen under normal
-operating conditions.
-
-The "extremely hot" alarms are emergencies.  According to Lenovo, the
-operating system is to force either an immediate suspend or hibernate
-cycle, or a system shutdown.  Obviously, something is very wrong if this
-happens.
-
-
-Brightness hotkey notes
-^^^^^^^^^^^^^^^^^^^^^^^
-
-Don't mess with the brightness hotkeys in a ThinkPad.  If you want
-notifications for OSD, use the sysfs backlight class event support.
-
-The driver will issue KEY_BRIGHTNESS_UP and KEY_BRIGHTNESS_DOWN events
-automatically for the cases where userspace has to do something to
-implement brightness changes.  When you override these events, you will
-fail to properly handle either the ThinkPads that require explicit
-action to change backlight brightness, or the ThinkPads that require
-that no action be taken to work properly.
-
-
-Bluetooth
----------
-
-procfs: /proc/acpi/ibm/bluetooth
-
-sysfs device attribute: bluetooth_enable (deprecated)
-
-sysfs rfkill class: switch "tpacpi_bluetooth_sw"
-
-This feature shows the presence and current state of a ThinkPad
-Bluetooth device in the internal ThinkPad CDC slot.
-
-If the ThinkPad supports it, the Bluetooth state is stored in NVRAM,
-so it is kept across reboots and power-off.
-
-Procfs notes
-^^^^^^^^^^^^
-
-If Bluetooth is installed, the following commands can be used::
-
-	echo enable > /proc/acpi/ibm/bluetooth
-	echo disable > /proc/acpi/ibm/bluetooth
-
-Sysfs notes
-^^^^^^^^^^^
-
-	If the Bluetooth CDC card is installed, it can be enabled /
-	disabled through the "bluetooth_enable" thinkpad-acpi device
-	attribute, and its current status can also be queried.
-
-	enable:
-
-		- 0: disables Bluetooth / Bluetooth is disabled
-		- 1: enables Bluetooth / Bluetooth is enabled.
-
-	Note: this interface has been superseded by the generic rfkill
-	class.  It has been deprecated, and it will be removed in year
-	2010.
-
-	rfkill controller switch "tpacpi_bluetooth_sw": refer to
-	Documentation/rfkill.txt for details.
-
-
-Video output control -- /proc/acpi/ibm/video
---------------------------------------------
-
-This feature allows control over the devices used for video output -
-LCD, CRT or DVI (if available). The following commands are available::
-
-	echo lcd_enable > /proc/acpi/ibm/video
-	echo lcd_disable > /proc/acpi/ibm/video
-	echo crt_enable > /proc/acpi/ibm/video
-	echo crt_disable > /proc/acpi/ibm/video
-	echo dvi_enable > /proc/acpi/ibm/video
-	echo dvi_disable > /proc/acpi/ibm/video
-	echo auto_enable > /proc/acpi/ibm/video
-	echo auto_disable > /proc/acpi/ibm/video
-	echo expand_toggle > /proc/acpi/ibm/video
-	echo video_switch > /proc/acpi/ibm/video
-
-NOTE:
-  Access to this feature is restricted to processes owning the
-  CAP_SYS_ADMIN capability for safety reasons, as it can interact badly
-  enough with some versions of X.org to crash it.
-
-Each video output device can be enabled or disabled individually.
-Reading /proc/acpi/ibm/video shows the status of each device.
-
-Automatic video switching can be enabled or disabled.  When automatic
-video switching is enabled, certain events (e.g. opening the lid,
-docking or undocking) cause the video output device to change
-automatically. While this can be useful, it also causes flickering
-and, on the X40, video corruption. By disabling automatic switching,
-the flickering or video corruption can be avoided.
-
-The video_switch command cycles through the available video outputs
-(it simulates the behavior of Fn-F7).
-
-Video expansion can be toggled through this feature. This controls
-whether the display is expanded to fill the entire LCD screen when a
-mode with less than full resolution is used. Note that the current
-video expansion status cannot be determined through this feature.
-
-Note that on many models (particularly those using Radeon graphics
-chips) the X driver configures the video card in a way which prevents
-Fn-F7 from working. This also disables the video output switching
-features of this driver, as it uses the same ACPI methods as
-Fn-F7. Video switching on the console should still work.
-
-UPDATE: refer to https://bugs.freedesktop.org/show_bug.cgi?id=2000
-
-
-ThinkLight control
-------------------
-
-procfs: /proc/acpi/ibm/light
-
-sysfs attributes: as per LED class, for the "tpacpi::thinklight" LED
-
-procfs notes
-^^^^^^^^^^^^
-
-The ThinkLight status can be read and set through the procfs interface.  A
-few models which do not make the status available will show the ThinkLight
-status as "unknown". The available commands are::
-
-	echo on  > /proc/acpi/ibm/light
-	echo off > /proc/acpi/ibm/light
-
-sysfs notes
-^^^^^^^^^^^
-
-The ThinkLight sysfs interface is documented by the LED class
-documentation, in Documentation/leds/leds-class.rst.  The ThinkLight LED name
-is "tpacpi::thinklight".
-
-Due to limitations in the sysfs LED class, if the status of the ThinkLight
-cannot be read or if it is unknown, thinkpad-acpi will report it as "off".
-It is impossible to know if the status returned through sysfs is valid.
-
-
-CMOS/UCMS control
------------------
-
-procfs: /proc/acpi/ibm/cmos
-
-sysfs device attribute: cmos_command
-
-This feature is mostly used internally by the ACPI firmware to keep the legacy
-CMOS NVRAM bits in sync with the current machine state, and to record this
-state so that the ThinkPad will retain such settings across reboots.
-
-Some of these commands actually perform actions in some ThinkPad models, but
-this is expected to disappear more and more in newer models.  As an example, in
-a T43 and in a X40, commands 12 and 13 still control the ThinkLight state for
-real, but commands 0 to 2 don't control the mixer anymore (they have been
-phased out) and just update the NVRAM.
-
-The range of valid cmos command numbers is 0 to 21, but not all have an
-effect and the behavior varies from model to model.  Here is the behavior
-on the X40 (tpb is the ThinkPad Buttons utility):
-
-	- 0 - Related to "Volume down" key press
-	- 1 - Related to "Volume up" key press
-	- 2 - Related to "Mute on" key press
-	- 3 - Related to "Access IBM" key press
-	- 4 - Related to "LCD brightness up" key press
-	- 5 - Related to "LCD brightness down" key press
-	- 11 - Related to "toggle screen expansion" key press/function
-	- 12 - Related to "ThinkLight on"
-	- 13 - Related to "ThinkLight off"
-	- 14 - Related to "ThinkLight" key press (toggle ThinkLight)
-
-The cmos command interface is prone to firmware split-brain problems, as
-in newer ThinkPads it is just a compatibility layer.  Do not use it; it is
-exported just as a debug tool.
-
-
-LED control
------------
-
-procfs: /proc/acpi/ibm/led
-
-sysfs attributes: as per LED class, see below for names
-
-Some of the LED indicators can be controlled through this feature.  On
-some older ThinkPad models, it is possible to query the status of the
-LED indicators as well.  Newer ThinkPads cannot query the real status
-of the LED indicators.
-
-Because misuse of the LEDs could induce an unaware user to perform
-dangerous actions (like undocking or ejecting a bay device while the
-buses are still active), or mask an important alarm (such as a nearly
-empty battery, or a broken battery), access to most LEDs is
-restricted.
-
-Unrestricted access to all LEDs requires that thinkpad-acpi be
-compiled with the CONFIG_THINKPAD_ACPI_UNSAFE_LEDS option enabled.
-Distributions must never enable this option.  Individual users that
-are aware of the consequences are welcome to enable it.
-
-Audio mute and microphone mute LEDs are supported, but currently not
-visible to userspace. They are used by the snd-hda-intel audio driver.
-
-procfs notes
-^^^^^^^^^^^^
-
-The available commands are::
-
-	echo '<LED number> on' >/proc/acpi/ibm/led
-	echo '<LED number> off' >/proc/acpi/ibm/led
-	echo '<LED number> blink' >/proc/acpi/ibm/led
-
-The <LED number> range is 0 to 15. The set of LEDs that can be
-controlled varies from model to model. Here is the common ThinkPad
-mapping:
-
-	- 0 - power
-	- 1 - battery (orange)
-	- 2 - battery (green)
-	- 3 - UltraBase/dock
-	- 4 - UltraBay
-	- 5 - UltraBase battery slot
-	- 6 - (unknown)
-	- 7 - standby
-	- 8 - dock status 1
-	- 9 - dock status 2
-	- 10, 11 - (unknown)
-	- 12 - thinkvantage
-	- 13, 14, 15 - (unknown)
-
-All of the above can be turned on and off and can be made to blink.
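-
-A small Python sketch of how a script might validate and build one of
-the procfs LED commands before writing it.  The helper name is
-hypothetical; only the command syntax and the 0 to 15 range come from
-the text above.
-

```python
LED_STATES = ("on", "off", "blink")


def led_command(led, state):
    """Build the string to write to /proc/acpi/ibm/led."""
    if not 0 <= led <= 15:
        raise ValueError("LED number must be in the range 0 to 15")
    if state not in LED_STATES:
        raise ValueError("state must be one of: " + ", ".join(LED_STATES))
    return "%d %s" % (led, state)
```
-
-The resulting string would then be written to /proc/acpi/ibm/led.  The
-driver may still reject the write for LEDs whose access is restricted.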
-
-sysfs notes
-^^^^^^^^^^^
-
-The ThinkPad LED sysfs interface is described in detail by the LED class
-documentation, in Documentation/leds/leds-class.rst.
-
-The LEDs are named (in LED ID order, from 0 to 12):
-"tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
-"tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
-"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
-"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
-"tpacpi::thinkvantage".
-
-Due to limitations in the sysfs LED class, if the status of the LED
-indicators cannot be read due to an error, thinkpad-acpi will report it as
-a brightness of zero (same as LED off).
-
-If the ThinkPad firmware doesn't support reading the current status,
-trying to read the current LED brightness will just return whatever
-brightness was last written to that attribute.
-
-These LEDs can blink using hardware acceleration.  To request that a
-ThinkPad indicator LED should blink in hardware accelerated mode, use the
-"timer" trigger, and leave the delay_on and delay_off parameters set to
-zero (to request hardware acceleration autodetection).
-
-LEDs that are known not to exist in a given ThinkPad model are not
-made available through the sysfs interface.  If you have a dock and you
-notice there are LEDs listed for your ThinkPad that do not exist (and
-are not in the dock), or if you notice that there are missing LEDs,
-a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
-
-
-ACPI sounds -- /proc/acpi/ibm/beep
-----------------------------------
-
-The BEEP method is used internally by the ACPI firmware to provide
-audible alerts in various situations. This feature allows the same
-sounds to be triggered manually.
-
-The commands are non-negative integer numbers::
-
-	echo <number> >/proc/acpi/ibm/beep
-
-The valid <number> range is 0 to 17. Not all numbers trigger sounds
-and the sounds vary from model to model. Here is the behavior on the
-X40:
-
-	- 0 - stop a sound in progress (but use 17 to stop 16)
-	- 2 - two beeps, pause, third beep ("low battery")
-	- 3 - single beep
-	- 4 - high, followed by low-pitched beep ("unable")
-	- 5 - single beep
-	- 6 - very high, followed by high-pitched beep ("AC/DC")
-	- 7 - high-pitched beep
-	- 9 - three short beeps
-	- 10 - very long beep
-	- 12 - low-pitched beep
-	- 15 - three high-pitched beeps repeating constantly, stop with 0
-	- 16 - one medium-pitched beep repeating constantly, stop with 17
-	- 17 - stop 16
-
-
-Temperature sensors
--------------------
-
-procfs: /proc/acpi/ibm/thermal
-
-sysfs device attributes: (hwmon "thinkpad") temp*_input
-
-Most ThinkPads include six or more separate temperature sensors but only
-expose the CPU temperature through the standard ACPI methods.  This
-feature shows readings from up to eight different sensors on older
-ThinkPads, and up to sixteen different sensors on newer ThinkPads.
-
-For example, on the X40, a typical output may be::
-
-	temperatures:
-		42 42 45 41 36 -128 33 -128
-
-On the T43/p, a typical output may be::
-
-	temperatures:
-		48 48 36 52 38 -128 31 -128 48 52 48 -128 -128 -128 -128 -128
-
-The mapping of thermal sensors to physical locations varies depending on
-system-board model (and thus, on ThinkPad model).
-
-http://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
-tries to track down these locations for various models.
-
-Most (newer?) models seem to follow this pattern:
-
-- 1:  CPU
-- 2:  (depends on model)
-- 3:  (depends on model)
-- 4:  GPU
-- 5:  Main battery: main sensor
-- 6:  Bay battery: main sensor
-- 7:  Main battery: secondary sensor
-- 8:  Bay battery: secondary sensor
-- 9-15: (depends on model)
-
-For the R51 (source: Thomas Gruber):
-
-- 2:  Mini-PCI
-- 3:  Internal HDD
-
-For the T43, T43/p (source: Shmidoax/Thinkwiki.org)
-http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
-
-- 2:  System board, left side (near PCMCIA slot), reported as HDAPS temp
-- 3:  PCMCIA slot
-- 9:  MCH (northbridge) to DRAM Bus
-- 10: Clock-generator, mini-pci card and ICH (southbridge), under Mini-PCI
-      card, under touchpad
-- 11: Power regulator, underside of system board, below F2 key
-
-The A31 has a very atypical layout for the thermal sensors
-(source: Milos Popovic, http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
-
-- 1:  CPU
-- 2:  Main Battery: main sensor
-- 3:  Power Converter
-- 4:  Bay Battery: main sensor
-- 5:  MCH (northbridge)
-- 6:  PCMCIA/ambient
-- 7:  Main Battery: secondary sensor
-- 8:  Bay Battery: secondary sensor
-
-
-Procfs notes
-^^^^^^^^^^^^
-
-	Readings from sensors that are not available return -128.
-	No commands can be written to this file.
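-
-	A minimal Python sketch of parsing this output, assuming the file
-	contains a single "temperatures:" label followed by
-	whitespace-separated integers, as in the examples above.  The
-	function name is hypothetical.
-

```python
UNAVAILABLE = -128  # reading reported for a sensor that is not available


def parse_thermal(text):
    """Return readings in sensor order, with None for unavailable sensors."""
    _, label, values = text.partition("temperatures:")
    if not label:
        raise ValueError("no 'temperatures:' label found")
    readings = [int(field) for field in values.split()]
    return [None if r == UNAVAILABLE else r for r in readings]
```
-
-	Sensor N in the lists above corresponds to index N-1 of the
-	returned list.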
-
-Sysfs notes
-^^^^^^^^^^^
-
-	Sensors that are not available return the ENXIO error.  This
-	status may change at runtime, as there are hotplug thermal
-	sensors, like those inside the batteries and docks.
-
-	thinkpad-acpi thermal sensors are reported through the hwmon
-	subsystem, and follow all of the hwmon guidelines at
-	Documentation/hwmon.
-
-EXPERIMENTAL: Embedded controller register dump
------------------------------------------------
-
-This feature is not included in the thinkpad driver anymore.
-Instead the EC can be accessed through /sys/kernel/debug/ec with
-a userspace tool which can be found here:
-ftp://ftp.suse.com/pub/people/trenn/sources/ec
-
-Use it to determine the register holding the fan
-speed on some models. To do that, do the following:
-
-	- make sure the battery is fully charged
-	- make sure the fan is running
-	- use above mentioned tool to read out the EC
-
-Often fan and temperature values vary between
-readings. Since temperatures don't change very fast, you can take
-several quick dumps to eliminate them.
-
-You can use a similar method to figure out the meaning of other
-embedded controller registers - e.g. make sure nothing else changes
-except the charging or discharging battery to determine which
-registers contain the current battery capacity, etc. If you experiment
-with this, do send me your results (including some complete dumps with
-a description of the conditions when they were taken.)
-
-
-LCD brightness control
-----------------------
-
-procfs: /proc/acpi/ibm/brightness
-
-sysfs backlight device "thinkpad_screen"
-
-This feature allows software control of the LCD brightness on ThinkPad
-models which don't have a hardware brightness slider.
-
-It has some limitations: the LCD backlight cannot be actually turned
-on or off by this interface, it just controls the backlight brightness
-level.
-
-On IBM (and some of the earlier Lenovo) ThinkPads, the backlight control
-has eight brightness levels, ranging from 0 to 7.  Some of the levels
-may not be distinct.  Later Lenovo models that implement the ACPI
-display backlight brightness control methods have 16 levels, ranging
-from 0 to 15.
-
-For IBM ThinkPads, there are two interfaces to the firmware for direct
-brightness control, EC and UCMS (or CMOS).  To select which one should be
-used, use the brightness_mode module parameter: brightness_mode=1 selects
-EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
-mode with NVRAM backing (so that brightness changes are remembered across
-shutdown/reboot).
-
-The driver tries to select which interface to use from a table of
-defaults for each ThinkPad model.  If it makes a wrong choice, please
-report this as a bug, so that we can fix it.
-
-Lenovo ThinkPads only support brightness_mode=2 (UCMS).
-
-When display backlight brightness controls are available through the
-standard ACPI interface, it is best to use it instead of this direct
-ThinkPad-specific interface.  The driver will disable its native
-backlight brightness control interface if it detects that the standard
-ACPI interface is available in the ThinkPad.
-
-If you want to use the thinkpad-acpi backlight brightness control
-instead of the generic ACPI video backlight brightness control for some
-reason, you should use the acpi_backlight=vendor kernel parameter.
-
-The brightness_enable module parameter can be used to control whether
-the LCD brightness control feature will be enabled when available.
-brightness_enable=0 forces it to be disabled.  brightness_enable=1
-forces it to be enabled when available, even if the standard ACPI
-interface is also available.
-
-Procfs notes
-^^^^^^^^^^^^
-
-The available commands are::
-
-	echo up   >/proc/acpi/ibm/brightness
-	echo down >/proc/acpi/ibm/brightness
-	echo 'level <level>' >/proc/acpi/ibm/brightness
-
-Sysfs notes
-^^^^^^^^^^^
-
-The interface is implemented through the backlight sysfs class, which is
-poorly documented at this time.
-
-Locate the thinkpad_screen device under /sys/class/backlight, and inside
-it there will be the following attributes:
-
-	max_brightness:
-		Reads the maximum brightness the hardware can be set to.
-		The minimum is always zero.
-
-	actual_brightness:
-		Reads what brightness the screen is set to at this instant.
-
-	brightness:
-		Writes request the driver to change brightness to the
-		given value.  Reads will tell you what brightness the
-		driver is trying to set the display to when "power" is set
-		to zero and the display has not been dimmed by a kernel
-		power management event.
-
-	power:
-		power management mode, where 0 is "display on", and 1 to 3
-		will dim the display backlight to brightness level 0
-		because thinkpad-acpi cannot really turn the backlight
-		off.  Kernel power management events can temporarily
-		increase the current power management level, i.e. they can
-		dim the display.
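-
-A minimal Python sketch of using these attributes.  It assumes the
-caller passes the directory of the thinkpad_screen device (for example
-/sys/class/backlight/thinkpad_screen) and has permission to write to it.
-The helper name is hypothetical.
-

```python
import os


def set_brightness(backlight_dir, level):
    """Request a brightness level, clamped to the range 0..max_brightness."""
    with open(os.path.join(backlight_dir, "max_brightness")) as f:
        maximum = int(f.read())
    # The minimum is always zero, so only the upper bound has to be read.
    level = max(0, min(level, maximum))
    with open(os.path.join(backlight_dir, "brightness"), "w") as f:
        f.write("%d\n" % level)
    return level
```
-
-Reading actual_brightness afterwards shows what the screen is really set
-to, which may differ from the requested value while the display is dimmed.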
-
-
-WARNING:
-
-    Whatever you do, do NOT ever call the thinkpad-acpi backlight-level change
-    interface and the ACPI-based backlight level change interface
-    (available on newer BIOSes, and driven by the Linux ACPI video driver)
-    at the same time.  The two will interact in bad ways, do funny things,
-    and maybe reduce the life of the backlight lamps by needlessly kicking
-    their level up and down at every change.
-
-
-Volume control (Console Audio control)
---------------------------------------
-
-procfs: /proc/acpi/ibm/volume
-
-ALSA: "ThinkPad Console Audio Control", default ID: "ThinkPadEC"
-
-NOTE: by default, the volume control interface operates in read-only
-mode, as it is supposed to be used for on-screen-display purposes.
-The read/write mode can be enabled through the use of the
-"volume_control=1" module parameter.
-
-NOTE: distros are urged to not enable volume_control by default; this
-should be done by the local admin only.  The ThinkPad UI is for the
-console audio control to be done through the volume keys only, and for
-the desktop environment to just provide on-screen-display feedback.
-Software volume control should be done only in the main AC97/HDA
-mixer.
-
-
-About the ThinkPad Console Audio control
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-ThinkPads have a built-in amplifier and muting circuit that drives the
-console headphone and speakers.  This circuit is after the main AC97
-or HDA mixer in the audio path, and under exclusive control of the
-firmware.
-
-ThinkPads have three special hotkeys to interact with the console
-audio control: volume up, volume down and mute.
-
-It is worth noting that the normal way the mute function works (on
-ThinkPads that do not have a "mute LED") is:
-
-1. Press mute to mute.  It will *always* mute, you can press it as
-   many times as you want, and the sound will remain muted.
-
-2. Press either volume key to unmute the ThinkPad (it will *not*
-   change the volume, it will just unmute).
-
-This is a far superior design when compared to the cheap software-only
-mute-toggle solution found on normal consumer laptops:  you can be
-absolutely sure the ThinkPad will not make noise if you press the mute
-button, no matter the previous state.
-
-The IBM ThinkPads, and the earlier Lenovo ThinkPads have variable-gain
-amplifiers driving the speakers and headphone output, and the firmware
-also handles volume control for the headphone and speakers on these
-ThinkPads without any help from the operating system (this volume
-control stage exists after the main AC97 or HDA mixer in the audio
-path).
-
-The newer Lenovo models only have firmware mute control, and depend on
-the main HDA mixer to do volume control (which is done by the operating
-system).  In this case, the volume keys are filtered out for unmute
-key press (there are some firmware bugs in this area) and delivered as
-normal key presses to the operating system (thinkpad-acpi is not
-involved).
-
-
-The ThinkPad-ACPI volume control
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The preferred way to interact with the Console Audio control is the
-ALSA interface.
-
-The legacy procfs interface allows one to read the current state,
-and if volume control is enabled, accepts the following commands::
-
-	echo up   >/proc/acpi/ibm/volume
-	echo down >/proc/acpi/ibm/volume
-	echo mute >/proc/acpi/ibm/volume
-	echo unmute >/proc/acpi/ibm/volume
-	echo 'level <level>' >/proc/acpi/ibm/volume
-
-The <level> number range is 0 to 14 although not all of them may be
-distinct. To unmute the volume after the mute command, use either the
-up or down command (the level command will not unmute the volume), or
-the unmute command.
-
-You can use the volume_capabilities parameter to tell the driver
-whether your ThinkPad has volume control or mute-only control:
-volume_capabilities=1 for mixers with mute and volume control,
-volume_capabilities=2 for mixers with only mute control.
-
-If the driver misdetects the capabilities for your ThinkPad model,
-please report this to ibm-acpi-devel@lists.sourceforge.net, so that we
-can update the driver.
-
-There are two strategies for volume control.  To select which one
-should be used, use the volume_mode module parameter: volume_mode=1
-selects EC mode, and volume_mode=3 selects EC mode with NVRAM backing
-(so that volume/mute changes are remembered across shutdown/reboot).
-
-The driver will operate in volume_mode=3 by default. If that does not
-work well on your ThinkPad model, please report this to
-ibm-acpi-devel@lists.sourceforge.net.
-
-The driver supports the standard ALSA module parameters.  If the ALSA
-mixer is disabled, the driver will disable all volume functionality.
-
-
-Fan control and monitoring: fan speed, fan enable/disable
----------------------------------------------------------
-
-procfs: /proc/acpi/ibm/fan
-
-sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1, pwm1_enable, fan2_input
-
-sysfs hwmon driver attributes: fan_watchdog
-
-NOTE NOTE NOTE:
-   fan control operations are disabled by default for
-   safety reasons.  To enable them, the module parameter "fan_control=1"
-   must be given to thinkpad-acpi.
-
-This feature attempts to show the current fan speed, control mode and
-other fan data that might be available.  The speed is read directly
-from the hardware registers of the embedded controller.  This is known
-to work on later R, T, X and Z series ThinkPads but may show a bogus
-value on other models.
-
-Some Lenovo ThinkPads support a secondary fan.  This fan cannot be
-controlled separately; it shares the main fan control.
-
-Fan levels
-^^^^^^^^^^
-
-Most ThinkPad fans work in "levels" at the firmware interface.  Level 0
-stops the fan.  The higher the level, the higher the fan speed, although
-adjacent levels often map to the same fan speed.  7 is the highest
-level, where the fan reaches the maximum recommended speed.
-
-Level "auto" means the EC changes the fan level according to some
-internal algorithm, usually based on readings from the thermal sensors.
-
-There is also a "full-speed" level, also known as "disengaged" level.
-In this level, the EC disables the speed-locked closed-loop fan control,
-and drives the fan as fast as it can go, which might exceed hardware
-limits, so use this level with caution.
-
-The fan usually ramps up or down slowly from one speed to another, and
-it is normal for the EC to take several seconds to react to fan
-commands.  The full-speed level may take up to two minutes to ramp up to
-maximum speed, and in some ThinkPads, the tachometer readings go stale
-while the EC is transitioning to the full-speed level.
-
-WARNING WARNING WARNING: do not leave the fan disabled unless you are
-monitoring all of the temperature sensor readings and you are ready to
-enable it if necessary to avoid overheating.
-
-An enabled fan in level "auto" may stop spinning if the EC decides the
-ThinkPad is cool enough and doesn't need the extra airflow.  This is
-normal, and the EC will spin the fan up if the various thermal readings
-rise too much.
-
-On the X40, this seems to depend on the CPU and HDD temperatures.
-Specifically, the fan is turned on when either the CPU temperature
-climbs to 56 degrees or the HDD temperature climbs to 46 degrees.  The
-fan is turned off when the CPU temperature drops to 49 degrees and the
-HDD temperature drops to 41 degrees.  These thresholds cannot
-currently be controlled.
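-
-The X40 behavior described above is a hysteresis: the fan starts when
-either temperature reaches its upper threshold and stops only when both
-have fallen to their lower thresholds.  A Python model of that logic
-follows.  It is a sketch of the observed behavior, not EC or driver code,
-and it assumes "climbs to" and "drops to" include the threshold value.
-

```python
def x40_fan_should_run(running, cpu_temp, hdd_temp):
    """Model the fan on/off thresholds observed on the X40."""
    if not running:
        # Either temperature reaching its upper threshold starts the fan.
        return cpu_temp >= 56 or hdd_temp >= 46
    # Both temperatures must fall to their lower thresholds to stop it.
    return not (cpu_temp <= 49 and hdd_temp <= 41)
```
-
-Between the two sets of thresholds the fan keeps its previous state,
-which avoids rapid on/off cycling around a single trip point.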
-
-The ThinkPad's ACPI DSDT code will reprogram the fan on its own when
-certain conditions are met.  It will override any fan programming done
-through thinkpad-acpi.
-
-The thinkpad-acpi kernel driver can be programmed to revert the fan
-level to a safe setting if userspace does not issue one of the procfs
-fan commands: "enable", "disable", "level" or "watchdog", or if there
-are no writes to pwm1_enable (or to pwm1 *if and only if* pwm1_enable is
-set to 1, manual mode) within a configurable amount of time of up to
-120 seconds.  This functionality is called fan safety watchdog.
-
-Note that the watchdog timer stops after it enables the fan.  It will be
-rearmed again automatically (using the same interval) when one of the
-above mentioned fan commands is received.  The fan watchdog is,
-therefore, not suitable to protect against fan mode changes made through
-means other than the "enable", "disable", and "level" procfs fan
-commands, or the hwmon fan control sysfs interface.
-
-Procfs notes
-^^^^^^^^^^^^
-
-The fan may be enabled or disabled with the following commands::
-
-	echo enable  >/proc/acpi/ibm/fan
-	echo disable >/proc/acpi/ibm/fan
-
-Placing a fan on level 0 is the same as disabling it.  Enabling a fan
-will try to place it in a safe level if it is too slow or disabled.
-
-The fan level can be controlled with the command::
-
-	echo 'level <level>' > /proc/acpi/ibm/fan
-
-Where <level> is an integer from 0 to 7, or one of the words "auto" or
-"full-speed" (without the quotes).  Not all ThinkPads support the "auto"
-and "full-speed" levels.  The driver accepts "disengaged" as an alias for
-"full-speed", and reports it as "disengaged" for backwards
-compatibility.
-
-On the X31 and X40 (and ONLY on those models), the fan speed can be
-controlled to a certain degree.  Once the fan is running, it can be
-forced to run faster or slower with the following command::
-
-	echo 'speed <speed>' > /proc/acpi/ibm/fan
-
-The sustainable range of fan speeds on the X40 appears to be from about
-3700 to about 7350. Values outside this range either do not have any
-effect or the fan speed eventually settles somewhere in that range.  The
-fan cannot be stopped or started with this command.  This functionality
-is incomplete, and not available through the sysfs interface.
-
-To program the safety watchdog, use the "watchdog" command::
-
-	echo 'watchdog <interval in seconds>' > /proc/acpi/ibm/fan
-
-If you want to disable the watchdog, use 0 as the interval.
-
-Sysfs notes
-^^^^^^^^^^^
-
-The sysfs interface follows the hwmon subsystem guidelines for the most
-part, and the exception is the fan safety watchdog.
-
-Writes to any of the sysfs attributes may return the EINVAL error if
-that operation is not supported in a given ThinkPad or if the parameter
-is out-of-bounds, and EPERM if it is forbidden.  They may also return
-EINTR (interrupted system call), and EIO (I/O error while trying to talk
-to the firmware).
-
-Features not yet implemented by the driver return ENOSYS.
-
-hwmon device attribute pwm1_enable:
-	- 0: PWM offline (fan is set to full-speed mode)
-	- 1: Manual PWM control (use pwm1 to set fan level)
-	- 2: Hardware PWM control (EC "auto" mode)
-	- 3: reserved (Software PWM control, not implemented yet)
-
-	Modes 0 and 2 are not supported by all ThinkPads, and the
-	driver is not always able to detect this.  If it does know a
-	mode is unsupported, it will return -EINVAL.
-
-hwmon device attribute pwm1:
-	Fan level, scaled from the firmware values of 0-7 to the hwmon
-	scale of 0-255.  0 means fan stopped, 255 means highest normal
-	speed (level 7).
-
-	This attribute only commands the fan if pmw1_enable is set to 1
-	(manual PWM control).
-
-hwmon device attribute fan1_input:
-	Fan tachometer reading, in RPM.  May go stale on certain
-	ThinkPads while the EC transitions the PWM to offline mode,
-	which can take up to two minutes.  May return rubbish on older
-	ThinkPads.
-
-hwmon device attribute fan2_input:
-	Fan tachometer reading, in RPM, for the secondary fan.
-	Available only on some ThinkPads.  If the secondary fan is
-	not installed, will always read 0.
-
-hwmon driver attribute fan_watchdog:
-	Fan safety watchdog timer interval, in seconds.  Minimum is
-	1 second, maximum is 120 seconds.  0 disables the watchdog.
-
-To stop the fan: set pwm1 to zero, and pwm1_enable to 1.
-
-To start the fan in a safe mode: set pwm1_enable to 2.  If that fails
-with EINVAL, try to set pwm1_enable to 1 and pwm1 to at least 128 (255
-would be the safest choice, though).
-
-
-WAN
----
-
-procfs: /proc/acpi/ibm/wan
-
-sysfs device attribute: wwan_enable (deprecated)
-
-sysfs rfkill class: switch "tpacpi_wwan_sw"
-
-This feature shows the presence and current state of the built-in
-Wireless WAN device.
-
-If the ThinkPad supports it, the WWAN state is stored in NVRAM,
-so it is kept across reboots and power-off.
-
-It was tested on a Lenovo ThinkPad X60. It should probably work on other
-ThinkPad models which come with this module installed.
-
-Procfs notes
-^^^^^^^^^^^^
-
-If the W-WAN card is installed, the following commands can be used::
-
-	echo enable > /proc/acpi/ibm/wan
-	echo disable > /proc/acpi/ibm/wan
-
-Sysfs notes
-^^^^^^^^^^^
-
-	If the W-WAN card is installed, it can be enabled /
-	disabled through the "wwan_enable" thinkpad-acpi device
-	attribute, and its current status can also be queried.
-
-	enable:
-		- 0: disables WWAN card / WWAN card is disabled
-		- 1: enables WWAN card / WWAN card is enabled.
-
-	Note: this interface has been superseded by the	generic rfkill
-	class.  It has been deprecated, and it will be removed in year
-	2010.
-
-	rfkill controller switch "tpacpi_wwan_sw": refer to
-	Documentation/rfkill.txt for details.
-
-
-EXPERIMENTAL: UWB
------------------
-
-This feature is considered EXPERIMENTAL because it has not been extensively
-tested and validated in various ThinkPad models yet.  The feature may not
-work as expected. USE WITH CAUTION! To use this feature, you need to supply
-the experimental=1 parameter when loading the module.
-
-sysfs rfkill class: switch "tpacpi_uwb_sw"
-
-This feature exports an rfkill controller for the UWB device, if one is
-present and enabled in the BIOS.
-
-Sysfs notes
-^^^^^^^^^^^
-
-	rfkill controller switch "tpacpi_uwb_sw": refer to
-	Documentation/rfkill.txt for details.
-
-Adaptive keyboard
------------------
-
-sysfs device attribute: adaptive_kbd_mode
-
-This sysfs attribute controls the keyboard "face" that will be shown on the
-Lenovo X1 Carbon 2nd gen (2014)'s adaptive keyboard. The value can be read
-and set.
-
-- 1 = Home mode
-- 2 = Web-browser mode
-- 3 = Web-conference mode
-- 4 = Function mode
-- 5 = Layflat mode
-
-For more details about which buttons will appear depending on the mode, please
-review the laptop's user guide:
-http://www.lenovo.com/shop/americas/content/user_guides/x1carbon_2_ug_en.pdf
-
-Multiple Commands, Module Parameters
-------------------------------------
-
-Multiple commands can be written to the proc files in one shot by
-separating them with commas, for example::
-
-	echo enable,0xffff > /proc/acpi/ibm/hotkey
-	echo lcd_disable,crt_enable > /proc/acpi/ibm/video
-
-Commands can also be specified when loading the thinkpad-acpi module,
-for example::
-
-	modprobe thinkpad_acpi hotkey=enable,0xffff video=auto_disable
-
-
-Enabling debugging output
--------------------------
-
-The module takes a debug parameter which can be used to selectively
-enable various classes of debugging output, for example::
-
-	 modprobe thinkpad_acpi debug=0xffff
-
-will enable all debugging output classes.  It takes a bitmask, so
-to enable more than one output class, just add their values.
-
-	=============		======================================
-	Debug bitmask		Description
-	=============		======================================
-	0x8000			Disclose PID of userspace programs
-				accessing some functions of the driver
-	0x0001			Initialization and probing
-	0x0002			Removal
-	0x0004			RF Transmitter control (RFKILL)
-				(bluetooth, WWAN, UWB...)
-	0x0008			HKEY event interface, hotkeys
-	0x0010			Fan control
-	0x0020			Backlight brightness
-	0x0040			Audio mixer/volume control
-	=============		======================================
-
-There is also a kernel build option to enable more debugging
-information, which may be necessary to debug driver problems.
-
-The level of debugging information output by the driver can be changed
-at runtime through sysfs, using the driver attribute debug_level.  The
-attribute takes the same bitmask as the debug module parameter above.
-
-
-Force loading of module
------------------------
-
-If thinkpad-acpi refuses to detect your ThinkPad, you can try to specify
-the module parameter force_load=1.  Regardless of whether this works or
-not, please contact ibm-acpi-devel@lists.sourceforge.net with a report.
-
-
-Sysfs interface changelog
-^^^^^^^^^^^^^^^^^^^^^^^^^
-
-=========	===============================================================
-0x000100:	Initial sysfs support, as a single platform driver and
-		device.
-0x000200:	Hot key support for 32 hot keys, and radio slider switch
-		support.
-0x010000:	Hot keys are now handled by default over the input
-		layer, the radio switch generates input event EV_RADIO,
-		and the driver enables hot key handling by default in
-		the firmware.
-
-0x020000:	ABI fix: added a separate hwmon platform device and
-		driver, which must be located by name (thinkpad)
-		and the hwmon class for libsensors4 (lm-sensors 3)
-		compatibility.  Moved all hwmon attributes to this
-		new platform device.
-
-0x020100:	Marker for thinkpad-acpi with hot key NVRAM polling
-		support.  If you must, use it to know you should not
-		start a userspace NVRAM poller (allows to detect when
-		NVRAM is compiled out by the user because it is
-		unneeded/undesired in the first place).
-0x020101:	Marker for thinkpad-acpi with hot key NVRAM polling
-		and proper hotkey_mask semantics (version 8 of the
-		NVRAM polling patch).  Some development snapshots of
-		0.18 had an earlier version that did strange things
-		to hotkey_mask.
-
-0x020200:	Add poll()/select() support to the following attributes:
-		hotkey_radio_sw, wakeup_hotunplug_complete, wakeup_reason
-
-0x020300:	hotkey enable/disable support removed, attributes
-		hotkey_bios_enabled and hotkey_enable deprecated and
-		marked for removal.
-
-0x020400:	Marker for 16 LEDs support.  Also, LEDs that are known
-		to not exist in a given model are not registered with
-		the LED sysfs class anymore.
-
-0x020500:	Updated hotkey driver, hotkey_mask is always available
-		and it is always able to disable hot keys.  Very old
-		thinkpads are properly supported.  hotkey_bios_mask
-		is deprecated and marked for removal.
-
-0x020600:	Marker for backlight change event support.
-
-0x020700:	Support for mute-only mixers.
-		Volume control in read-only mode by default.
-		Marker for ALSA mixer support.
-
-0x030000:	Thermal and fan sysfs attributes were moved to the hwmon
-		device instead of being attached to the backing platform
-		device.
-=========	===============================================================
diff --git a/Documentation/laptops/toshiba_haps.rst b/Documentation/laptops/toshiba_haps.rst
deleted file mode 100644
index 11dfc428c080..000000000000
--- a/Documentation/laptops/toshiba_haps.rst
+++ /dev/null
@@ -1,87 +0,0 @@
-====================================
-Toshiba HDD Active Protection Sensor
-====================================
-
-Kernel driver: toshiba_haps
-
-Author: Azael Avalos <coproscefalo@gmail.com>
-
-
-.. 0. Contents
-
-   1. Description
-   2. Interface
-   3. Accelerometer axes
-   4. Supported devices
-   5. Usage
-
-
-1. Description
---------------
-
-This driver provides support for the accelerometer found in various Toshiba
-laptops, being called "Toshiba HDD Protection - Shock Sensor" officially,
-and detects laptops automatically with this device.
-On Windows, Toshiba provided software monitors this device and provides
-automatic HDD protection (head unload) on sudden moves or harsh vibrations,
-however, this driver only provides a notification via a sysfs file to let
-userspace tools or daemons act accordingly, as well as providing a sysfs
-file to set the desired protection level or sensor sensibility.
-
-
-2. Interface
-------------
-
-This device comes with 3 methods:
-
-====	=====================================================================
-_STA    Checks existence of the device, returning Zero if the device does not
-	exists or is not supported.
-PTLV    Sets the desired protection level.
-RSSS    Shuts down the HDD protection interface for a few seconds,
-	then restores normal operation.
-====	=====================================================================
-
-Note:
-  The presence of Solid State Drives (SSD) can make this driver to fail loading,
-  given the fact that such drives have no movable parts, and thus, not requiring
-  any "protection" as well as failing during the evaluation of the _STA method
-  found under this device.
-
-
-3. Accelerometer axes
----------------------
-
-This device does not report any axes, however, to query the sensor position
-a couple HCI (Hardware Configuration Interface) calls (0x6D and 0xA6) are
-provided to query such information, handled by the kernel module toshiba_acpi
-since kernel version 3.15.
-
-
-4. Supported devices
---------------------
-
-This driver binds itself to the ACPI device TOS620A, and any Toshiba laptop
-with this device is supported, given the fact that they have the presence of
-conventional HDD and not only SSD, or a combination of both HDD and SSD.
-
-
-5. Usage
---------
-
-The sysfs files under /sys/devices/LNXSYSTM:00/LNXSYBUS:00/TOS620A:00/ are:
-
-================   ============================================================
-protection_level   The protection_level is readable and writeable, and
-		   provides a way to let userspace query the current protection
-		   level, as well as set the desired protection level, the
-		   available protection levels are:
-
-		   ============   =======   ==========   ========
-		   0 - Disabled   1 - Low   2 - Medium   3 - High
-		   ============   =======   ==========   ========
-
-reset_protection   The reset_protection entry is writeable only, being "1"
-		   the only parameter it accepts, it is used to trigger
-		   a reset of the protection interface.
-================   ============================================================
diff --git a/MAINTAINERS b/MAINTAINERS
index b0e044be81ac..288f84dbd480 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9029,7 +9029,7 @@ M:	Matan Ziv-Av <matan@svgalib.org>
 L:	platform-driver-x86@vger.kernel.org
 S:	Maintained
 F:	Documentation/ABI/testing/sysfs-platform-lg-laptop
-F:	Documentation/laptops/lg-laptop.rst
+F:	Documentation/admin-guide/laptops/lg-laptop.rst
 F:	drivers/platform/x86/lg-laptop.c
 
 LG2160 MEDIA DRIVER
@@ -14888,7 +14888,7 @@ M:	Mattia Dongili <malattia@linux.it>
 L:	platform-driver-x86@vger.kernel.org
 W:	http://www.linux.it/~malattia/wiki/index.php/Sony_drivers
 S:	Maintained
-F:	Documentation/laptops/sony-laptop.rst
+F:	Documentation/admin-guide/laptops/sony-laptop.rst
 F:	drivers/char/sonypi.c
 F:	drivers/platform/x86/sony-laptop.c
 F:	include/linux/sony-laptop.h
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index bb734066075f..442403abd73a 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -382,7 +382,7 @@ config SONYPI
 	  Device which can be found in many (all ?) Sony Vaio laptops.
 
 	  If you have one of those laptops, read
-	  <file:Documentation/laptops/sonypi.rst>, and say Y or M here.
+	  <file:Documentation/admin-guide/laptops/sonypi.rst>, and say Y or M here.
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called sonypi.
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 8f91d9ef8a7b..5f580580a8e0 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -448,7 +448,7 @@ config SONY_LAPTOP
 	  screen brightness control, Fn keys and allows powering on/off some
 	  devices.
 
-	  Read <file:Documentation/laptops/sony-laptop.rst> for more information.
+	  Read <file:Documentation/admin-guide/laptops/sony-laptop.rst> for more information.
 
 config SONYPI_COMPAT
 	bool "Sonypi compatibility"
@@ -500,7 +500,7 @@ config THINKPAD_ACPI
 	  support for Fn-Fx key combinations, Bluetooth control, video
 	  output switching, ThinkLight control, UltraBay eject and more.
 	  For more information about this driver see
-	  <file:Documentation/laptops/thinkpad-acpi.rst> and
+	  <file:Documentation/admin-guide/laptops/thinkpad-acpi.rst> and
 	  <http://ibm-acpi.sf.net/> .
 
 	  This driver was formerly known as ibm-acpi.
-- 
cgit v1.2.3-55-g7522


From 330d48105245abfb8c9ca491dc53ea500657217a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 13 Jun 2019 15:21:39 -0300
Subject: docs: admin-guide: add kdump documentation into it

The Kdump documentation describes procedures which admins use
in order to solve issues on their systems.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/bug-hunting.rst         |   4 +-
 Documentation/admin-guide/index.rst               |   1 +
 Documentation/admin-guide/kdump/gdbmacros.txt     | 264 +++++++++++
 Documentation/admin-guide/kdump/index.rst         |  20 +
 Documentation/admin-guide/kdump/kdump.rst         | 534 ++++++++++++++++++++++
 Documentation/admin-guide/kdump/vmcoreinfo.rst    | 488 ++++++++++++++++++++
 Documentation/admin-guide/kernel-parameters.txt   |   6 +-
 Documentation/kdump/gdbmacros.txt                 | 264 -----------
 Documentation/kdump/index.rst                     |  21 -
 Documentation/kdump/kdump.rst                     | 534 ----------------------
 Documentation/kdump/vmcoreinfo.rst                | 488 --------------------
 Documentation/powerpc/firmware-assisted-dump.txt  |   2 +-
 Documentation/translations/zh_CN/oops-tracing.txt |   4 +-
 MAINTAINERS                                       |   2 +-
 arch/arm/Kconfig                                  |   2 +-
 arch/arm64/Kconfig                                |   2 +-
 arch/sh/Kconfig                                   |   2 +-
 arch/x86/Kconfig                                  |   4 +-
 18 files changed, 1321 insertions(+), 1321 deletions(-)
 create mode 100644 Documentation/admin-guide/kdump/gdbmacros.txt
 create mode 100644 Documentation/admin-guide/kdump/index.rst
 create mode 100644 Documentation/admin-guide/kdump/kdump.rst
 create mode 100644 Documentation/admin-guide/kdump/vmcoreinfo.rst
 delete mode 100644 Documentation/kdump/gdbmacros.txt
 delete mode 100644 Documentation/kdump/index.rst
 delete mode 100644 Documentation/kdump/kdump.rst
 delete mode 100644 Documentation/kdump/vmcoreinfo.rst

diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index b761aa2a51d2..44b8a4edd348 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -90,9 +90,9 @@ the disk is not available then you have three options:
     run a null modem to a second machine and capture the output there
     using your favourite communication program.  Minicom works well.
 
-(3) Use Kdump (see Documentation/kdump/kdump.rst),
+(3) Use Kdump (see Documentation/admin-guide/kdump/kdump.rst),
     extract the kernel ring buffer from old memory with using dmesg
-    gdbmacro in Documentation/kdump/gdbmacros.txt.
+    gdbmacro in Documentation/admin-guide/kdump/gdbmacros.txt.
 
 Finding the bug's location
 --------------------------
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 6fcc83aaa9b6..5b63182ceb5f 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -39,6 +39,7 @@ problems and bugs in particular.
    ramoops
    dynamic-debug-howto
    init
+   kdump/index
    perf/index
 
 This is the beginning of a section with information of interest to
diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
new file mode 100644
index 000000000000..220d0a80ca2c
--- /dev/null
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -0,0 +1,264 @@
+#
+# This file contains a few gdb macros (user defined commands) to extract
+# useful information from kernel crashdump (kdump) like stack traces of
+# all the processes or a particular process and trapinfo.
+#
+# These macros can be used by copying this file in .gdbinit (put in home
+# directory or current directory) or by invoking gdb command with
+# --command=<command-file-name> option
+#
+# Credits:
+# Alexander Nyberg <alexn@telia.com>
+# V Srivatsa <vatsa@in.ibm.com>
+# Maneesh Soni <maneesh@in.ibm.com>
+#
+
+define bttnobp
+	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
+	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
+	set $init_t=&init_task
+	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
+	set var $stacksize = sizeof(union thread_union)
+	while ($next_t != $init_t)
+		set $next_t=(struct task_struct *)$next_t
+		printf "\npid %d; comm %s:\n", $next_t.pid, $next_t.comm
+		printf "===================\n"
+		set var $stackp = $next_t.thread.sp
+		set var $stack_top = ($stackp & ~($stacksize - 1)) + $stacksize
+
+		while ($stackp < $stack_top)
+			if (*($stackp) > _stext && *($stackp) < _sinittext)
+				info symbol *($stackp)
+			end
+			set $stackp += 4
+		end
+		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
+		while ($next_th != $next_t)
+			set $next_th=(struct task_struct *)$next_th
+			printf "\npid %d; comm %s:\n", $next_th.pid, $next_th.comm
+			printf "===================\n"
+			set var $stackp = $next_th.thread.sp
+			set var $stack_top = ($stackp & ~($stacksize - 1)) + $stacksize
+
+			while ($stackp < $stack_top)
+				if (*($stackp) > _stext && *($stackp) < _sinittext)
+					info symbol *($stackp)
+				end
+				set $stackp += 4
+			end
+			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
+		end
+		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
+	end
+end
+document bttnobp
+	dump all thread stack traces on a kernel compiled with !CONFIG_FRAME_POINTER
+end
+
+define btthreadstack
+	set var $pid_task = $arg0
+
+	printf "\npid %d; comm %s:\n", $pid_task.pid, $pid_task.comm
+	printf "task struct: "
+	print $pid_task
+	printf "===================\n"
+	set var $stackp = $pid_task.thread.sp
+	set var $stacksize = sizeof(union thread_union)
+	set var $stack_top = ($stackp & ~($stacksize - 1)) + $stacksize
+	set var $stack_bot = ($stackp & ~($stacksize - 1))
+
+	set $stackp = *((unsigned long *) $stackp)
+	while (($stackp < $stack_top) && ($stackp > $stack_bot))
+		set var $addr = *(((unsigned long *) $stackp) + 1)
+		info symbol $addr
+		set $stackp = *((unsigned long *) $stackp)
+	end
+end
+document btthreadstack
+	 dump a thread stack using the given task structure pointer
+end
+
+
+define btt
+	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
+	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
+	set $init_t=&init_task
+	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
+	while ($next_t != $init_t)
+		set $next_t=(struct task_struct *)$next_t
+		btthreadstack $next_t
+
+		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
+		while ($next_th != $next_t)
+			set $next_th=(struct task_struct *)$next_th
+			btthreadstack $next_th
+			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
+		end
+		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
+	end
+end
+document btt
+	dump all thread stack traces on a kernel compiled with CONFIG_FRAME_POINTER
+end
+
+define btpid
+	set var $pid = $arg0
+	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
+	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
+	set $init_t=&init_task
+	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
+	set var $pid_task = 0
+
+	while ($next_t != $init_t)
+		set $next_t=(struct task_struct *)$next_t
+
+		if ($next_t.pid == $pid)
+			set $pid_task = $next_t
+		end
+
+		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
+		while ($next_th != $next_t)
+			set $next_th=(struct task_struct *)$next_th
+			if ($next_th.pid == $pid)
+				set $pid_task = $next_th
+			end
+			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
+		end
+		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
+	end
+
+	btthreadstack $pid_task
+end
+document btpid
+	backtrace of pid
+end
+
+
+define trapinfo
+	set var $pid = $arg0
+	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
+	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
+	set $init_t=&init_task
+	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
+	set var $pid_task = 0
+
+	while ($next_t != $init_t)
+		set $next_t=(struct task_struct *)$next_t
+
+		if ($next_t.pid == $pid)
+			set $pid_task = $next_t
+		end
+
+		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
+		while ($next_th != $next_t)
+			set $next_th=(struct task_struct *)$next_th
+			if ($next_th.pid == $pid)
+				set $pid_task = $next_th
+			end
+			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
+		end
+		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
+	end
+
+	printf "Trapno %ld, cr2 0x%lx, error_code %ld\n", $pid_task.thread.trap_no, \
+				$pid_task.thread.cr2, $pid_task.thread.error_code
+
+end
+document trapinfo
+	Run info threads and lookup pid of thread #1
+	'trapinfo <pid>' will tell you by which trap & possibly
+	address the kernel panicked.
+end
+
+define dump_log_idx
+	set $idx = $arg0
+	if ($argc > 1)
+		set $prev_flags = $arg1
+	else
+		set $prev_flags = 0
+	end
+	set $msg = ((struct printk_log *) (log_buf + $idx))
+	set $prefix = 1
+	set $newline = 1
+	set $log = log_buf + $idx + sizeof(*$msg)
+
+	# prev & LOG_CONT && !(msg->flags & LOG_PREFIX)
+	if (($prev_flags & 8) && !($msg->flags & 4))
+		set $prefix = 0
+	end
+
+	# msg->flags & LOG_CONT
+	if ($msg->flags & 8)
+		# (prev & LOG_CONT && !(prev & LOG_NEWLINE))
+		if (($prev_flags & 8) && !($prev_flags & 2))
+			set $prefix = 0
+		end
+		# (!(msg->flags & LOG_NEWLINE))
+		if (!($msg->flags & 2))
+			set $newline = 0
+		end
+	end
+
+	if ($prefix)
+		printf "[%5lu.%06lu] ", $msg->ts_nsec / 1000000000, $msg->ts_nsec % 1000000000
+	end
+	if ($msg->text_len != 0)
+		eval "printf \"%%%d.%ds\", $log", $msg->text_len, $msg->text_len
+	end
+	if ($newline)
+		printf "\n"
+	end
+	if ($msg->dict_len > 0)
+		set $dict = $log + $msg->text_len
+		set $idx = 0
+		set $line = 1
+		while ($idx < $msg->dict_len)
+			if ($line)
+				printf " "
+				set $line = 0
+			end
+			set $c = $dict[$idx]
+			if ($c == '\0')
+				printf "\n"
+				set $line = 1
+			else
+				if ($c < ' ' || $c >= 127 || $c == '\\')
+					printf "\\x%02x", $c
+				else
+					printf "%c", $c
+				end
+			end
+			set $idx = $idx + 1
+		end
+		printf "\n"
+	end
+end
+document dump_log_idx
+	Dump a single log given its index in the log buffer.  The first
+	parameter is the index into log_buf, the second is optional and
+	specifies the previous log buffer's flags, used for properly
+	formatting continued lines.
+end
+
+define dmesg
+	set $i = log_first_idx
+	set $end_idx = log_first_idx
+	set $prev_flags = 0
+
+	while (1)
+		set $msg = ((struct printk_log *) (log_buf + $i))
+		if ($msg->len == 0)
+			set $i = 0
+		else
+			dump_log_idx $i $prev_flags
+			set $i = $i + $msg->len
+			set $prev_flags = $msg->flags
+		end
+		if ($i == $end_idx)
+			loop_break
+		end
+	end
+end
+document dmesg
+	print the kernel ring buffer
+end
diff --git a/Documentation/admin-guide/kdump/index.rst b/Documentation/admin-guide/kdump/index.rst
new file mode 100644
index 000000000000..8e2ebd0383cd
--- /dev/null
+++ b/Documentation/admin-guide/kdump/index.rst
@@ -0,0 +1,20 @@
+
+================================================================
+Documentation for Kdump - The kexec-based Crash Dumping Solution
+================================================================
+
+This document includes overview, setup and installation, and analysis
+information.
+
+.. toctree::
+    :maxdepth: 1
+
+    kdump
+    vmcoreinfo
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
new file mode 100644
index 000000000000..ac7e131d2935
--- /dev/null
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -0,0 +1,534 @@
+================================================================
+Documentation for Kdump - The kexec-based Crash Dumping Solution
+================================================================
+
+This document includes overview, setup and installation, and analysis
+information.
+
+Overview
+========
+
+Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
+dump of the system kernel's memory needs to be taken (for example, when
+the system panics). The system kernel's memory image is preserved across
+the reboot and is accessible to the dump-capture kernel.
+
+You can use common commands, such as cp and scp, to copy the
+memory image to a dump file on the local disk, or across the network to
+a remote system.
+
+Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
+s390x, arm and arm64 architectures.
+
+When the system kernel boots, it reserves a small section of memory for
+the dump-capture kernel. This ensures that ongoing Direct Memory Access
+(DMA) from the system kernel does not corrupt the dump-capture kernel.
+The kexec -p command loads the dump-capture kernel into this reserved
+memory.
+
+On x86 machines, the first 640 KB of physical memory is needed to boot,
+regardless of where the kernel loads. Therefore, kexec backs up this
+region just before rebooting into the dump-capture kernel.
+
+Similarly, on PPC64 machines the first 32 KB of physical memory is needed
+for booting, regardless of where the kernel is loaded. To support the 64K
+page size, kexec backs up the first 64 KB of memory.
+
+For s390x, when kdump is triggered, the crashkernel region is exchanged
+with the region [0, crashkernel region size] and then the kdump kernel
+runs in [0, crashkernel region size]. Therefore no relocatable kernel is
+needed for s390x.
+
+All of the necessary information about the system kernel's core image is
+encoded in the ELF format, and stored in a reserved area of memory
+before a crash. The physical address of the start of the ELF header is
+passed to the dump-capture kernel through the elfcorehdr= boot
+parameter. Optionally the size of the ELF header can also be passed
+when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax.
+
+
+With the dump-capture kernel, you can access the memory image through
+/proc/vmcore. This exports the dump as an ELF-format file that you can
+write out using file copy commands such as cp or scp. Further, you can
+use analysis tools such as the GNU Debugger (GDB) and the Crash tool to
+debug the dump file. This method ensures that the dump pages are correctly
+ordered.
+
+
+Setup and Installation
+======================
+
+Install kexec-tools
+-------------------
+
+1) Log in as the root user.
+
+2) Download the kexec-tools user-space package from the following URL:
+
+http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
+
+This is a symlink to the latest version.
+
+The latest kexec-tools git tree is available at:
+
+- git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
+- http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
+
+There is also a gitweb interface available at
+http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
+
+More information about kexec-tools can be found at
+http://horms.net/projects/kexec/
+
+3) Unpack the tarball with the tar command, as follows::
+
+	tar xvpzf kexec-tools.tar.gz
+
+4) Change to the kexec-tools directory, as follows::
+
+	cd kexec-tools-VERSION
+
+5) Configure the package, as follows::
+
+	./configure
+
+6) Compile the package, as follows::
+
+	make
+
+7) Install the package, as follows::
+
+	make install
+
+
+Build the system and dump-capture kernels
+-----------------------------------------
+There are two possible methods of using Kdump.
+
+1) Build a separate custom dump-capture kernel for capturing the
+   kernel core dump.
+
+2) Use the system kernel binary itself as the dump-capture kernel, so there
+   is no need to build a separate dump-capture kernel. This is possible
+   only on architectures that support a relocatable kernel. As of this
+   writing, the i386, x86_64, ppc64, ia64, arm and arm64 architectures
+   support a relocatable kernel.
+
+Building a relocatable kernel is advantageous in that one does not have
+to build a second kernel for capturing the dump. At the same time, one
+might still want to build a custom dump-capture kernel suited to one's
+needs.
+
+The following configuration settings are required for the system and
+dump-capture kernels to enable kdump support.
+
+System kernel config options
+----------------------------
+
+1) Enable "kexec system call" in "Processor type and features."::
+
+	CONFIG_KEXEC=y
+
+2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
+   filesystems." This is usually enabled by default::
+
+	CONFIG_SYSFS=y
+
+   Note that "sysfs file system support" might not appear in the "Pseudo
+   filesystems" menu if "Configure standard kernel features (for small
+   systems)" is not enabled in "General Setup." In this case, check the
+   .config file itself to ensure that sysfs is turned on, as follows::
+
+	grep 'CONFIG_SYSFS' .config
+
+3) Enable "Compile the kernel with debug info" in "Kernel hacking."::
+
+	CONFIG_DEBUG_INFO=y
+
+   This causes the kernel to be built with debug symbols. The dump
+   analysis tools require a vmlinux with debug symbols in order to read
+   and analyze a dump file.
+
+Dump-capture kernel config options (Arch Independent)
+-----------------------------------------------------
+
+1) Enable "kernel crash dumps" support under "Processor type and
+   features"::
+
+	CONFIG_CRASH_DUMP=y
+
+2) Enable "/proc/vmcore support" under "Filesystems" -> "Pseudo filesystems"::
+
+	CONFIG_PROC_VMCORE=y
+
+   (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
+
+Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
+--------------------------------------------------------------------
+
+1) On i386, enable high memory support under "Processor type and
+   features"::
+
+	CONFIG_HIGHMEM64G=y
+
+   or::
+
+	CONFIG_HIGHMEM4G=y
+
+2) On i386 and x86_64, disable symmetric multi-processing support
+   under "Processor type and features"::
+
+	CONFIG_SMP=n
+
+   (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
+   when loading the dump-capture kernel, see section "Load the Dump-capture
+   Kernel".)
+
+3) If one wants to build and use a relocatable kernel,
+   enable "Build a relocatable kernel" support under "Processor type and
+   features"::
+
+	CONFIG_RELOCATABLE=y
+
+4) Use a suitable value for "Physical address where the kernel is
+   loaded" (under "Processor type and features"). This only appears when
+   "kernel crash dumps" is enabled. A suitable value depends upon
+   whether kernel is relocatable or not.
+
+   If you are using a relocatable kernel, use CONFIG_PHYSICAL_START=0x100000.
+   This compiles the kernel for physical address 1MB, but because the
+   kernel is relocatable it can run from any physical address, so the
+   kexec boot loader will load it into the memory region reserved for the
+   dump-capture kernel.
+
+   Otherwise it should be the start of the memory region reserved for
+   the second kernel using the boot parameter "crashkernel=Y@X". Here X
+   is the start of the memory region reserved for the dump-capture kernel.
+   Generally X is 16MB (0x1000000), so you can set
+   CONFIG_PHYSICAL_START=0x1000000.
+
+5) Make and install the kernel and its modules. DO NOT add this kernel
+   to the boot loader configuration files.
+
+Dump-capture kernel config options (Arch Dependent, ppc64)
+----------------------------------------------------------
+
+1) Enable "Build a kdump crash kernel" support under "Kernel" options::
+
+	CONFIG_CRASH_DUMP=y
+
+2) Enable "Build a relocatable kernel" support::
+
+	CONFIG_RELOCATABLE=y
+
+   Make and install the kernel and its modules.
+
+Dump-capture kernel config options (Arch Dependent, ia64)
+----------------------------------------------------------
+
+- No specific options are required to create a dump-capture kernel
+  for ia64, other than those specified in the arch independent section
+  above. This means that it is possible to use the system kernel
+  as a dump-capture kernel if desired.
+
+  The crashkernel region can be automatically placed by the system
+  kernel at run time. This is done by specifying the base address as 0,
+  or omitting it altogether::
+
+	crashkernel=256M@0
+
+  or::
+
+	crashkernel=256M
+
+  If the start address is specified, note that the start address of the
+  kernel will be aligned to 64MB, so if the given address is not aligned,
+  any space below the alignment point will be wasted.
+
+Dump-capture kernel config options (Arch Dependent, arm)
+----------------------------------------------------------
+
+- To use a relocatable kernel,
+  enable "AUTO_ZRELADDR" support under "Boot" options::
+
+	CONFIG_AUTO_ZRELADDR=y
+
+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+- Please note that KVM in the dump-capture kernel will not be enabled
+  on non-VHE systems even if it is configured. This is because the CPU
+  will not be reset to EL2 on panic.
+
+Extended crashkernel syntax
+===========================
+
+While the "crashkernel=size[@offset]" syntax is sufficient for most
+configurations, sometimes it's handy to have the reserved memory depend
+on the amount of System RAM -- that's mostly for distributors that pre-set
+the kernel command line to avoid an unbootable system after some memory has
+been removed from the machine.
+
+The syntax is::
+
+    crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+    range=start-[end]
+
+For example::
+
+    crashkernel=512M-2G:64M,2G-:128M
+
+This would mean:
+
+    1) if the RAM is smaller than 512M, then don't reserve anything
+       (this is the "rescue" case)
+    2) if the RAM size is at least 512M but less than 2G, then reserve 64M
+    3) if the RAM size is 2G or larger, then reserve 128M
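+
+As a further illustration, consider a hypothetical three-range setting
+(the sizes are arbitrary and chosen only to show how ranges are matched)::
+
+    crashkernel=1G-4G:128M,4G-16G:256M,16G-:512M
+
+Under the rules above this would reserve nothing on a machine with 512M
+of RAM, 128M on a machine with 2G, 256M on a machine with 8G, and 512M on
+a machine with 32G.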
+
+
+
+Boot into System Kernel
+=======================
+
+1) Update the boot loader (such as grub, yaboot, or lilo) configuration
+   files as necessary.
+
+2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
+   where Y specifies how much memory to reserve for the dump-capture kernel
+   and X specifies the beginning of this reserved memory. For example,
+   "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
+   starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
+
+   On x86 and x86_64, use "crashkernel=64M@16M".
+
+   On ppc64, use "crashkernel=128M@32M".
+
+   On ia64, 256M@256M is a generous value that typically works.
+   The region may be automatically placed on ia64, see the
+   dump-capture kernel config option notes above.
+   If sparse memory is used, the size should be rounded to GRANULE boundaries.
+
+   On s390x, typically use "crashkernel=xxM". The value of xx is dependent
+   on the memory consumption of the kdump system. In general this is not
+   dependent on the memory size of the production system.
+
+   On arm, the use of "crashkernel=Y@X" is no longer necessary; the
+   kernel will automatically locate the crash kernel image within the
+   first 512MB of RAM if X is not given.
+
+   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+
+Load the Dump-capture Kernel
+============================
+
+After booting to the system kernel, the dump-capture kernel needs to be
+loaded.
+
+Based on the architecture and type of image (relocatable or not), one
+can choose to load the uncompressed vmlinux or compressed bzImage/vmlinuz
+of the dump-capture kernel. The following is a summary.
+
+For i386 and x86_64:
+
+	- Use vmlinux if kernel is not relocatable.
+	- Use bzImage/vmlinuz if kernel is relocatable.
+
+For ppc64:
+
+	- Use vmlinux
+
+For ia64:
+
+	- Use vmlinux or vmlinuz.gz
+
+For s390x:
+
+	- Use image or bzImage
+
+For arm:
+
+	- Use zImage
+
+For arm64:
+
+	- Use vmlinux or Image
+
+If you are using an uncompressed vmlinux image, use the following command
+to load the dump-capture kernel::
+
+   kexec -p <dump-capture-kernel-vmlinux-image> \
+   --initrd=<initrd-for-dump-capture-kernel> --args-linux \
+   --append="root=<root-dev> <arch-specific-options>"
+
+If you are using a compressed bzImage/vmlinuz, use the following command
+to load the dump-capture kernel::
+
+   kexec -p <dump-capture-kernel-bzImage> \
+   --initrd=<initrd-for-dump-capture-kernel> \
+   --append="root=<root-dev> <arch-specific-options>"
+
+If you are using a compressed zImage, use the following command
+to load the dump-capture kernel::
+
+   kexec --type zImage -p <dump-capture-kernel-zImage> \
+   --initrd=<initrd-for-dump-capture-kernel> \
+   --dtb=<dtb-for-dump-capture-kernel> \
+   --append="root=<root-dev> <arch-specific-options>"
+
+If you are using an uncompressed Image, use the following command
+to load the dump-capture kernel::
+
+   kexec -p <dump-capture-kernel-Image> \
+   --initrd=<initrd-for-dump-capture-kernel> \
+   --append="root=<root-dev> <arch-specific-options>"
+
+Please note that --args-linux does not need to be specified for ia64.
+It is planned to make this a no-op on that architecture, but for now
+it should be omitted.
+
+The following are the arch-specific command line options to be used while
+loading the dump-capture kernel.
+
+For i386, x86_64 and ia64:
+
+	"1 irqpoll maxcpus=1 reset_devices"
+
+For ppc64:
+
+	"1 maxcpus=1 noirqdistrib reset_devices"
+
+For s390x:
+
+	"1 maxcpus=1 cgroup_disable=memory"
+
+For arm:
+
+	"1 maxcpus=1 reset_devices"
+
+For arm64:
+
+	"1 maxcpus=1 reset_devices"
+
+Notes on loading the dump-capture kernel:
+
+* By default, the ELF headers are stored in ELF64 format to support
+  systems with more than 4GB memory. On i386, kexec automatically checks if
+  the physical RAM size exceeds the 4 GB limit and if not, uses ELF32.
+  So, on non-PAE systems, ELF32 is always used.
+
+  The --elf32-core-headers option can be used to force the generation of ELF32
+  headers. This is necessary because GDB currently cannot open vmcore files
+  with ELF64 headers on 32-bit systems.
+
+* The "irqpoll" boot parameter reduces driver initialization failures
+  due to shared interrupts in the dump-capture kernel.
+
+* You must specify <root-dev> in the format corresponding to the root
+  device name in the output of the mount command.
+
+* Boot parameter "1" boots the dump-capture kernel into single-user
+  mode without networking. If you want networking, use "3".
+
+* We generally don't have to bring up an SMP kernel just to capture the
+  dump. Hence it is generally useful either to build a UP dump-capture
+  kernel or to specify maxcpus=1 while loading the dump-capture kernel.
+  Note that although maxcpus always works, nr_cpus is preferable because
+  it saves memory, if the current ARCH (such as x86) supports it.
+
+* You should enable multi-CPU support in the dump-capture kernel if you
+  intend to use multi-threaded programs with it, such as the parallel dump
+  feature of makedumpfile. Otherwise, multi-threaded programs may suffer
+  severe performance degradation. To enable multi-CPU support, bring up an
+  SMP dump-capture kernel and specify the maxcpus/nr_cpus and
+  disable_cpu_apicid=[X] options while loading it.
+
+* For s390x there are two kdump modes: If an ELF header is specified with
+  the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
+  is done on all other architectures. If no elfcorehdr= kernel parameter is
+  specified, the s390x kdump kernel dynamically creates the header. The
+  second mode has the advantage that for CPU and memory hotplug, kdump does
+  not have to be reloaded with kexec_load().
+
+* For s390x systems with many attached devices the "cio_ignore" kernel
+  parameter should be used for the kdump kernel in order to prevent allocation
+  of kernel memory for devices that are not relevant for kdump. The same
+  applies to systems that use SCSI/FCP devices. In that case the
+  "allow_lun_scan" zfcp module parameter should be set to zero before
+  setting FCP devices online.
+
+Kernel Panic
+============
+
+After successfully loading the dump-capture kernel as previously
+described, the system will reboot into the dump-capture kernel if a
+system crash is triggered.  Trigger points are located in panic(),
+die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
+
+The following conditions will execute a crash trigger point:
+
+If a hard lockup is detected and "NMI watchdog" is configured, the system
+will boot into the dump-capture kernel ( die_nmi() ).
+
+If die() is called, and it happens to be a thread with pid 0 or 1, or die()
+is called inside interrupt context or die() is called and panic_on_oops is set,
+the system will boot into the dump-capture kernel.
+
+On powerpc systems when a soft-reset is generated, die() is called by all cpus
+and the system will boot into the dump-capture kernel.
+
+For testing purposes, you can trigger a crash by using "ALT-SysRq-c",
+"echo c > /proc/sysrq-trigger" or write a module to force the panic.
+
+Write Out the Dump File
+=======================
+
+After the dump-capture kernel is booted, write out the dump file with
+the following command::
+
+   cp /proc/vmcore <dump-file>
+
+
+Analysis
+========
+
+Before analyzing the dump image, you should reboot into a stable kernel.
+
+You can do limited analysis using GDB on the dump file copied out of
+/proc/vmcore. Use the debug vmlinux built with -g and run the following
+command::
+
+   gdb vmlinux <dump-file>
+
+Stack trace for the task on processor 0, register display, and memory
+display work fine.
+
+Note: GDB cannot analyze core files generated in ELF64 format for x86.
+On systems with a maximum of 4GB of memory, you can generate
+ELF32-format headers by passing the --elf32-core-headers option to kexec
+when loading the dump-capture kernel.
+
+You can also use the Crash utility to analyze dump files in Kdump
+format. Crash is available on Dave Anderson's site at the following URL:
+
+   http://people.redhat.com/~anderson/
+
+Trigger Kdump on WARN()
+=======================
+
+The kernel parameter, panic_on_warn, calls panic() in all WARN() paths.  This
+will cause a kdump to occur at the panic() call.  In cases where a user wants
+to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
+to achieve the same behaviour.
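+
+As a sketch, either of the following enables this behaviour. The first is
+appended to the kernel command line at boot (the "=1" value form is an
+assumption about how the parameter is spelled there); the second is run
+as root on a running system::
+
+    panic_on_warn=1
+
+    echo 1 > /proc/sys/kernel/panic_on_warn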
+
+Contact
+=======
+
+- Vivek Goyal (vgoyal@redhat.com)
+- Maneesh Soni (maneesh@in.ibm.com)
+
+GDB macros
+==========
+
+.. include:: gdbmacros.txt
+   :literal:
diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
new file mode 100644
index 000000000000..007a6b86e0ee
--- /dev/null
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -0,0 +1,488 @@
+==========
+VMCOREINFO
+==========
+
+What is it?
+===========
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+Common variables
+================
+
+init_uts_ns.name.release
+------------------------
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built. For example, crash uses it to
+find the corresponding vmlinux in order to process vmcore.
+
+PAGE_SIZE
+---------
+
+The size of a page. It is the smallest unit of data used by the memory
+management facilities. It is usually 4096 bytes in size and a page is
+aligned on 4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+-----------
+
+The UTS namespace which is used to isolate two specific elements of the
+system that relate to the uname(2) system call. It is named after the
+data structure used to store information returned by the uname(2) system
+call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---------------
+
+An array node_states[N_ONLINE] which represents the set of online nodes
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+--------------
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+------
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--------------
+
+Stores the virtual area list. makedumpfile gets the vmalloc start value
+from this variable and its value is necessary for vmalloc translation.
+
+mem_map
+-------
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+----------------
+
+Makedumpfile gets the pglist_data structure from this symbol, which is
+used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--------------------------------------------------------------------------
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable; both of them are used to translate an
+address.
+
+page
+----
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute contiguous memory.
+
+pglist_data
+-----------
+
+The size of a pglist_data structure. This value is used to check if the
+pglist_data structure is valid. It is also used for checking the memory
+type.
+
+zone
+----
+
+The size of a zone structure. This value is used to check if the zone
+structure has been found. It is also used for excluding free pages.
+
+free_area
+---------
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful when excluding free pages.
+
+list_head
+---------
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
+
+nodemask_t
+----------
+
+The size of a nodemask_t type. Used to compute the number of online
+nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
+-------------------------------------------------------------------------------------------------
+
+User-space tools compute their values based on the offset of these
+variables. The variables are used when excluding unnecessary pages.
+
+(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_spanned_pages|node_id)
+-----------------------------------------------------------------------------------------
+
+On NUMA machines, each NUMA node has a pg_data_t to describe its memory
+layout. On UMA machines there is a single pglist_data which describes the
+whole memory.
+
+These values are used to check the memory type and to compute the
+virtual address for memory map.
+
+(zone, free_area|vm_stat|spanned_pages)
+---------------------------------------
+
+Each node is divided into a number of blocks called zones which
+represent ranges within memory. A zone is described by a structure zone.
+
+User-space tools compute required values based on the offset of these
+variables.
+
+(free_area, free_list)
+----------------------
+
+Offset of the free_list's member. This value is used to compute the number
+of free pages.
+
+Each zone has a free_area structure array called free_area[MAX_ORDER].
+The free_list represents a linked list of free page blocks.
+
+(list_head, next|prev)
+----------------------
+
+Offsets of the list_head's members. list_head is used to define a
+circular linked list. User-space tools need these in order to traverse
+lists.
+
+(vmap_area, va_start|list)
+--------------------------
+
+Offsets of the vmap_area's members. They carry vmalloc-specific
+information. Makedumpfile gets the start address of the vmalloc region
+from this.
+
+(zone.free_area, MAX_ORDER)
+---------------------------
+
+Free areas descriptor. User-space tools use this value to iterate the
+free_area ranges. MAX_ORDER is used by the zone buddy allocator.
+
+log_first_idx
+-------------
+
+Index of the first record stored in the buffer log_buf. Used by
+user-space tools to read the strings in the log_buf.
+
+log_buf
+-------
+
+Console output is written to the ring buffer log_buf at index
+log_first_idx. Used to get the kernel log.
+
+log_buf_len
+-----------
+
+log_buf's length.
+
+clear_idx
+---------
+
+The index of the next printk() record to read after the last clear
+command. It indicates the first record after the last
+SYSLOG_ACTION_CLEAR, such as that issued by 'dmesg -c'. Used by
+user-space tools to dump the dmesg log.
+
+log_next_idx
+------------
+
+The index of the next record to store in the buffer log_buf. Used to
+compute the index of the current buffer position.
+
+printk_log
+----------
+
+The size of a structure printk_log. Used to compute the size of
+messages, and extract dmesg log. It encapsulates header information for
+log_buf, such as timestamp, syslog level, etc.
+
+(printk_log, ts_nsec|len|text_len|dict_len)
+-------------------------------------------
+
+It represents field offsets in struct printk_log. User space tools
+parse it and check whether the values of printk_log's members have been
+changed.
+
+(free_area.free_list, MIGRATE_TYPES)
+------------------------------------
+
+The number of migrate types for pages. The free_list is described by the
+array. Used by tools to compute the number of free pages.
+
+NR_FREE_PAGES
+-------------
+
+On linux-2.6.21 or later, the number of free pages is in
+vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
+
+PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoison|PG_head_mask
+------------------------------------------------------------------------------
+
+Page attributes. These flags are used to filter out various pages that are
+unnecessary for dumping.
+
+PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
+-----------------------------------------------------------------------------
+
+More page attributes. These flags are used to filter out various pages that
+are unnecessary for dumping.
+
+
+HUGETLB_PAGE_DTOR
+-----------------
+
+The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
+excludes these pages.
+
+x86_64
+======
+
+phys_base
+---------
+
+Used to convert the virtual address of an exported kernel symbol to its
+corresponding physical address.
+
+init_top_pgt
+------------
+
+Used to walk through the whole page table and convert virtual addresses
+to physical addresses. The init_top_pgt is somewhat similar to
+swapper_pg_dir, but it is only used in x86_64.
+
+pgtable_l5_enabled
+------------------
+
+User-space tools need to know whether the crash kernel was in 5-level
+paging mode.
+
+node_data
+---------
+
+This is a struct pglist_data array that stores information about all NUMA
+nodes. Makedumpfile gets the pglist_data structure from it.
+
+(node_data, MAX_NUMNODES)
+-------------------------
+
+The maximum number of nodes in the system.
+
+KERNELOFFSET
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
+
+KERNEL_IMAGE_SIZE
+-----------------
+
+Currently unused by Makedumpfile. Used to compute the module virtual
+address by Crash.
+
+sme_mask
+--------
+
+AMD-specific with SME support: it indicates the secure memory encryption
+mask. Makedumpfile tools need to know whether the crash kernel was
+encrypted. If SME is enabled in the first kernel, the crash kernel's
+page table entries (pgd/pud/pmd/pte) contain the memory encryption
+mask. This is used to remove the SME mask and obtain the true physical
+address.
+
+Currently, sme_mask stores the value of the C-bit position. If needed,
+additional SME-relevant info can be placed in that variable.
+
+For example::
+
+  [ misc	        ][ enc bit  ][ other misc SME info       ]
+  0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
+  63   59   55   51   47   43   39   35   31   27   ... 3
+
+x86_32
+======
+
+X86_PAE
+-------
+
+Denotes whether physical address extensions are enabled. It has the cost
+of a higher page table lookup overhead, and also consumes more page
+table space per process. Used to check whether PAE was enabled in the
+crash kernel when converting virtual addresses to physical addresses.
+
+ia64
+====
+
+pgdat_list|(pgdat_list, MAX_NUMNODES)
+-------------------------------------
+
+pg_data_t array storing information about all NUMA nodes. MAX_NUMNODES
+indicates the number of nodes.
+
+node_memblk|(node_memblk, NR_NODE_MEMBLKS)
+------------------------------------------
+
+List of node memory chunks. Filled when parsing the SRAT table to obtain
+information about memory nodes. NR_NODE_MEMBLKS indicates the number of
+node memory chunks.
+
+These values are used to compute the number of nodes the crashed kernel used.
+
+node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
+----------------------------------------------------------------
+
+The size of a struct node_memblk_s and the offsets of the
+node_memblk_s's members. Used to compute the number of nodes.
+
+PGTABLE_3|PGTABLE_4
+-------------------
+
+User-space tools need to know whether the crash kernel was in 3-level or
+4-level paging mode. Used to distinguish the page table.
+
+ARM64
+=====
+
+VA_BITS
+-------
+
+The maximum number of bits for virtual addresses. Used to compute the
+virtual memory ranges.
+
+kimage_voffset
+--------------
+
+The offset between the kernel virtual and physical mappings. Used to
+translate virtual to physical addresses.
+
+PHYS_OFFSET
+-----------
+
+Indicates the physical address of the start of memory. Like
+kimage_voffset, it is used to translate virtual to physical
+addresses.
+
+KERNELOFFSET
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
+
+arm
+===
+
+ARM_LPAE
+--------
+
+It indicates whether the crash kernel supports large physical address
+extensions. Used to translate virtual to physical addresses.
+
+s390
+====
+
+lowcore_ptr
+-----------
+
+An array with a pointer to the lowcore of every CPU. Used to print the
+PSW and all register information.
+
+high_memory
+-----------
+
+Used to get the vmalloc_start address from the high_memory symbol.
+
+(lowcore_ptr, NR_CPUS)
+----------------------
+
+The maximum number of CPUs.
+
+powerpc
+=======
+
+
+node_data|(node_data, MAX_NUMNODES)
+-----------------------------------
+
+See above.
+
+contig_page_data
+----------------
+
+See above.
+
+vmemmap_list
+------------
+
+The vmemmap_list maintains the entire vmemmap physical mapping. Used
+to get vmemmap list count and populated vmemmap regions info. If the
+vmemmap address translation information is stored in the crash kernel,
+it is used to translate vmemmap kernel virtual addresses.
+
+mmu_vmemmap_psize
+-----------------
+
+The size of a page. Used to translate virtual to physical addresses.
+
+mmu_psize_defs
+--------------
+
+Page size definitions, i.e. 4k, 64k, or 16M.
+
+Used to make vtop translations.
+
+vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|(vmemmap_backing, virt_addr)
+--------------------------------------------------------------------------------------------
+
+The vmemmap virtual address space management does not have a traditional
+page table to track which virtual struct pages are backed by a physical
+mapping. The virtual to physical mappings are tracked in a simple linked
+list format.
+
+User-space tools need to know the offset of list, phys and virt_addr
+when computing the count of vmemmap regions.
+
+mmu_psize_def|(mmu_psize_def, shift)
+------------------------------------
+
+The size of a struct mmu_psize_def and the offset of mmu_psize_def's
+member.
+
+Used in vtop translations.
+
+sh
+==
+
+node_data|(node_data, MAX_NUMNODES)
+-----------------------------------
+
+See above.
+
+X2TLB
+-----
+
+Indicates whether the crashed kernel enabled SH extended mode.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4821175a3769..e645b3ab4b6f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -708,14 +708,14 @@
 			[KNL, x86_64] select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
-			See Documentation/kdump/kdump.rst for further details.
+			See Documentation/admin-guide/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
 			[KNL] Same as above, but depends on the memory
 			in the running system. The syntax of range is
 			start-[end] where start and end are both
 			a memory unit (amount[KMG]). See also
-			Documentation/kdump/kdump.rst for an example.
+			Documentation/admin-guide/kdump/kdump.rst for an example.
 
 	crashkernel=size[KMG],high
 			[KNL, x86_64] range could be above 4G. Allow kernel
@@ -1207,7 +1207,7 @@
 			Specifies physical address of start of kernel core
 			image elf header and optionally the size. Generally
 			kexec loader will pass this option to capture kernel.
-			See Documentation/kdump/kdump.rst for details.
+			See Documentation/admin-guide/kdump/kdump.rst for details.
 
 	enable_mtrr_cleanup [X86]
 			The kernel tries to adjust MTRR layout from continuous
diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt
deleted file mode 100644
index 220d0a80ca2c..000000000000
--- a/Documentation/kdump/gdbmacros.txt
+++ /dev/null
@@ -1,264 +0,0 @@
-#
-# This file contains a few gdb macros (user defined commands) to extract
-# useful information from kernel crashdump (kdump) like stack traces of
-# all the processes or a particular process and trapinfo.
-#
-# These macros can be used by copying this file in .gdbinit (put in home
-# directory or current directory) or by invoking gdb command with
-# --command=<command-file-name> option
-#
-# Credits:
-# Alexander Nyberg <alexn@telia.com>
-# V Srivatsa <vatsa@in.ibm.com>
-# Maneesh Soni <maneesh@in.ibm.com>
-#
-
-define bttnobp
-	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
-	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
-	set $init_t=&init_task
-	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
-	set var $stacksize = sizeof(union thread_union)
-	while ($next_t != $init_t)
-		set $next_t=(struct task_struct *)$next_t
-		printf "\npid %d; comm %s:\n", $next_t.pid, $next_t.comm
-		printf "===================\n"
-		set var $stackp = $next_t.thread.sp
-		set var $stack_top = ($stackp & ~($stacksize - 1)) + $stacksize
-
-		while ($stackp < $stack_top)
-			if (*($stackp) > _stext && *($stackp) < _sinittext)
-				info symbol *($stackp)
-			end
-			set $stackp += 4
-		end
-		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
-		while ($next_th != $next_t)
-			set $next_th=(struct task_struct *)$next_th
-			printf "\npid %d; comm %s:\n", $next_t.pid, $next_t.comm
-			printf "===================\n"
-			set var $stackp = $next_t.thread.sp
-			set var $stack_top = ($stackp & ~($stacksize - 1)) + stacksize
-
-			while ($stackp < $stack_top)
-				if (*($stackp) > _stext && *($stackp) < _sinittext)
-					info symbol *($stackp)
-				end
-				set $stackp += 4
-			end
-			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
-		end
-		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
-	end
-end
-document bttnobp
-	dump all thread stack traces on a kernel compiled with !CONFIG_FRAME_POINTER
-end
-
-define btthreadstack
-	set var $pid_task = $arg0
-
-	printf "\npid %d; comm %s:\n", $pid_task.pid, $pid_task.comm
-	printf "task struct: "
-	print $pid_task
-	printf "===================\n"
-	set var $stackp = $pid_task.thread.sp
-	set var $stacksize = sizeof(union thread_union)
-	set var $stack_top = ($stackp & ~($stacksize - 1)) + $stacksize
-	set var $stack_bot = ($stackp & ~($stacksize - 1))
-
-	set $stackp = *((unsigned long *) $stackp)
-	while (($stackp < $stack_top) && ($stackp > $stack_bot))
-		set var $addr = *(((unsigned long *) $stackp) + 1)
-		info symbol $addr
-		set $stackp = *((unsigned long *) $stackp)
-	end
-end
-document btthreadstack
-	 dump a thread stack using the given task structure pointer
-end
-
-
-define btt
-	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
-	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
-	set $init_t=&init_task
-	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
-	while ($next_t != $init_t)
-		set $next_t=(struct task_struct *)$next_t
-		btthreadstack $next_t
-
-		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
-		while ($next_th != $next_t)
-			set $next_th=(struct task_struct *)$next_th
-			btthreadstack $next_th
-			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
-		end
-		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
-	end
-end
-document btt
-	dump all thread stack traces on a kernel compiled with CONFIG_FRAME_POINTER
-end
-
-define btpid
-	set var $pid = $arg0
-	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
-	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
-	set $init_t=&init_task
-	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
-	set var $pid_task = 0
-
-	while ($next_t != $init_t)
-		set $next_t=(struct task_struct *)$next_t
-
-		if ($next_t.pid == $pid)
-			set $pid_task = $next_t
-		end
-
-		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
-		while ($next_th != $next_t)
-			set $next_th=(struct task_struct *)$next_th
-			if ($next_th.pid == $pid)
-				set $pid_task = $next_th
-			end
-			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
-		end
-		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
-	end
-
-	btthreadstack $pid_task
-end
-document btpid
-	backtrace of pid
-end
-
-
-define trapinfo
-	set var $pid = $arg0
-	set $tasks_off=((size_t)&((struct task_struct *)0)->tasks)
-	set $pid_off=((size_t)&((struct task_struct *)0)->thread_group.next)
-	set $init_t=&init_task
-	set $next_t=(((char *)($init_t->tasks).next) - $tasks_off)
-	set var $pid_task = 0
-
-	while ($next_t != $init_t)
-		set $next_t=(struct task_struct *)$next_t
-
-		if ($next_t.pid == $pid)
-			set $pid_task = $next_t
-		end
-
-		set $next_th=(((char *)$next_t->thread_group.next) - $pid_off)
-		while ($next_th != $next_t)
-			set $next_th=(struct task_struct *)$next_th
-			if ($next_th.pid == $pid)
-				set $pid_task = $next_th
-			end
-			set $next_th=(((char *)$next_th->thread_group.next) - $pid_off)
-		end
-		set $next_t=(char *)($next_t->tasks.next) - $tasks_off
-	end
-
-	printf "Trapno %ld, cr2 0x%lx, error_code %ld\n", $pid_task.thread.trap_no, \
-				$pid_task.thread.cr2, $pid_task.thread.error_code
-
-end
-document trapinfo
-	Run info threads and lookup pid of thread #1
-	'trapinfo <pid>' will tell you by which trap & possibly
-	address the kernel panicked.
-end
-
-define dump_log_idx
-	set $idx = $arg0
-	if ($argc > 1)
-		set $prev_flags = $arg1
-	else
-		set $prev_flags = 0
-	end
-	set $msg = ((struct printk_log *) (log_buf + $idx))
-	set $prefix = 1
-	set $newline = 1
-	set $log = log_buf + $idx + sizeof(*$msg)
-
-	# prev & LOG_CONT && !(msg->flags & LOG_PREFIX)
-	if (($prev_flags & 8) && !($msg->flags & 4))
-		set $prefix = 0
-	end
-
-	# msg->flags & LOG_CONT
-	if ($msg->flags & 8)
-		# (prev & LOG_CONT && !(prev & LOG_NEWLINE))
-		if (($prev_flags & 8) && !($prev_flags & 2))
-			set $prefix = 0
-		end
-		# (!(msg->flags & LOG_NEWLINE))
-		if (!($msg->flags & 2))
-			set $newline = 0
-		end
-	end
-
-	if ($prefix)
-		printf "[%5lu.%06lu] ", $msg->ts_nsec / 1000000000, $msg->ts_nsec % 1000000000
-	end
-	if ($msg->text_len != 0)
-		eval "printf \"%%%d.%ds\", $log", $msg->text_len, $msg->text_len
-	end
-	if ($newline)
-		printf "\n"
-	end
-	if ($msg->dict_len > 0)
-		set $dict = $log + $msg->text_len
-		set $idx = 0
-		set $line = 1
-		while ($idx < $msg->dict_len)
-			if ($line)
-				printf " "
-				set $line = 0
-			end
-			set $c = $dict[$idx]
-			if ($c == '\0')
-				printf "\n"
-				set $line = 1
-			else
-				if ($c < ' ' || $c >= 127 || $c == '\\')
-					printf "\\x%02x", $c
-				else
-					printf "%c", $c
-				end
-			end
-			set $idx = $idx + 1
-		end
-		printf "\n"
-	end
-end
-document dump_log_idx
-	Dump a single log given its index in the log buffer.  The first
-	parameter is the index into log_buf, the second is optional and
-	specifies the previous log buffer's flags, used for properly
-	formatting continued lines.
-end
-
-define dmesg
-	set $i = log_first_idx
-	set $end_idx = log_first_idx
-	set $prev_flags = 0
-
-	while (1)
-		set $msg = ((struct printk_log *) (log_buf + $i))
-		if ($msg->len == 0)
-			set $i = 0
-		else
-			dump_log_idx $i $prev_flags
-			set $i = $i + $msg->len
-			set $prev_flags = $msg->flags
-		end
-		if ($i == $end_idx)
-			loop_break
-		end
-	end
-end
-document dmesg
-	print the kernel ring buffer
-end
diff --git a/Documentation/kdump/index.rst b/Documentation/kdump/index.rst
deleted file mode 100644
index 2b17fcf6867a..000000000000
--- a/Documentation/kdump/index.rst
+++ /dev/null
@@ -1,21 +0,0 @@
-:orphan:
-
-================================================================
-Documentation for Kdump - The kexec-based Crash Dumping Solution
-================================================================
-
-This document includes overview, setup and installation, and analysis
-information.
-
-.. toctree::
-    :maxdepth: 1
-
-    kdump
-    vmcoreinfo
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/kdump/kdump.rst b/Documentation/kdump/kdump.rst
deleted file mode 100644
index ac7e131d2935..000000000000
--- a/Documentation/kdump/kdump.rst
+++ /dev/null
@@ -1,534 +0,0 @@
-================================================================
-Documentation for Kdump - The kexec-based Crash Dumping Solution
-================================================================
-
-This document includes overview, setup and installation, and analysis
-information.
-
-Overview
-========
-
-Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
-dump of the system kernel's memory needs to be taken (for example, when
-the system panics). The system kernel's memory image is preserved across
-the reboot and is accessible to the dump-capture kernel.
-
-You can use common commands, such as cp and scp, to copy the
-memory image to a dump file on the local disk, or across the network to
-a remote system.
-
-Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x, arm and arm64 architectures.
-
-When the system kernel boots, it reserves a small section of memory for
-the dump-capture kernel. This ensures that ongoing Direct Memory Access
-(DMA) from the system kernel does not corrupt the dump-capture kernel.
-The kexec -p command loads the dump-capture kernel into this reserved
-memory.
-
-On x86 machines, the first 640 KB of physical memory is needed to boot,
-regardless of where the kernel loads. Therefore, kexec backs up this
-region just before rebooting into the dump-capture kernel.
-
-Similarly, on PPC64 machines the first 32KB of physical memory is needed
-for booting regardless of where the kernel is loaded. To support a 64K
-page size, kexec backs up the first 64KB of memory.
-
-For s390x, when kdump is triggered, the crashkernel region is exchanged
-with the region [0, crashkernel region size] and then the kdump kernel
-runs in [0, crashkernel region size]. Therefore no relocatable kernel is
-needed for s390x.
-
-All of the necessary information about the system kernel's core image is
-encoded in the ELF format, and stored in a reserved area of memory
-before a crash. The physical address of the start of the ELF header is
-passed to the dump-capture kernel through the elfcorehdr= boot
-parameter. Optionally the size of the ELF header can also be passed
-when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax.
-
-
-With the dump-capture kernel, you can access the memory image through
-/proc/vmcore. This exports the dump as an ELF-format file that you can
-write out using file copy commands such as cp or scp. Further, you can
-use analysis tools such as the GNU Debugger (GDB) and the Crash tool to
-debug the dump file. This method ensures that the dump pages are correctly
-ordered.
-
-
-Setup and Installation
-======================
-
-Install kexec-tools
--------------------
-
-1) Log in as the root user.
-
-2) Download the kexec-tools user-space package from the following URL:
-
-http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
-
-This is a symlink to the latest version.
-
-The latest kexec-tools git tree is available at:
-
-- git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
-- http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
-
-There is also a gitweb interface available at
-http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
-
-More information about kexec-tools can be found at
-http://horms.net/projects/kexec/
-
-3) Unpack the tarball with the tar command, as follows::
-
-	tar xvpzf kexec-tools.tar.gz
-
-4) Change to the kexec-tools directory, as follows::
-
-	cd kexec-tools-VERSION
-
-5) Configure the package, as follows::
-
-	./configure
-
-6) Compile the package, as follows::
-
-	make
-
-7) Install the package, as follows::
-
-	make install
-
-
-Build the system and dump-capture kernels
------------------------------------------
-There are two possible methods of using Kdump.
-
-1) Build a separate custom dump-capture kernel for capturing the
-   kernel core dump.
-
-2) Or use the system kernel binary itself as the dump-capture kernel, so
-   that there is no need to build a separate dump-capture kernel. This is
-   possible only on architectures that support a relocatable kernel. As
-   of today, the i386, x86_64, ppc64, ia64, arm and arm64 architectures
-   support a relocatable kernel.
-
-Building a relocatable kernel is advantageous in that one does not have
-to build a second kernel for capturing the dump. At the same time, one
-might want to build a custom dump-capture kernel suited to one's own
-needs.
-
-The following configuration settings are required for the system and
-dump-capture kernels to enable kdump support.
-
-System kernel config options
-----------------------------
-
-1) Enable "kexec system call" in "Processor type and features."::
-
-	CONFIG_KEXEC=y
-
-2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
-   filesystems." This is usually enabled by default::
-
-	CONFIG_SYSFS=y
-
-   Note that "sysfs file system support" might not appear in the "Pseudo
-   filesystems" menu if "Configure standard kernel features (for small
-   systems)" is not enabled in "General Setup." In this case, check the
-   .config file itself to ensure that sysfs is turned on, as follows::
-
-	grep 'CONFIG_SYSFS' .config
-
-3) Enable "Compile the kernel with debug info" in "Kernel hacking."::
-
-	CONFIG_DEBUG_INFO=y
-
-   This causes the kernel to be built with debug symbols. The dump
-   analysis tools require a vmlinux with debug symbols in order to read
-   and analyze a dump file.
-
-Dump-capture kernel config options (Arch Independent)
------------------------------------------------------
-
-1) Enable "kernel crash dumps" support under "Processor type and
-   features"::
-
-	CONFIG_CRASH_DUMP=y
-
-2) Enable "/proc/vmcore support" under "Filesystems" -> "Pseudo filesystems"::
-
-	CONFIG_PROC_VMCORE=y
-
-   (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
-
-Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
---------------------------------------------------------------------
-
-1) On i386, enable high memory support under "Processor type and
-   features"::
-
-	CONFIG_HIGHMEM64G=y
-
-   or::
-
-	CONFIG_HIGHMEM4G=y
-
-2) On i386 and x86_64, disable symmetric multi-processing support
-   under "Processor type and features"::
-
-	CONFIG_SMP=n
-
-   (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
-   when loading the dump-capture kernel, see section "Load the Dump-capture
-   Kernel".)
-
-3) If one wants to build and use a relocatable kernel,
-   enable "Build a relocatable kernel" support under "Processor type and
-   features"::
-
-	CONFIG_RELOCATABLE=y
-
-4) Use a suitable value for "Physical address where the kernel is
-   loaded" (under "Processor type and features"). This only appears when
-   "kernel crash dumps" is enabled. A suitable value depends upon
-   whether kernel is relocatable or not.
-
-   If you are using a relocatable kernel, use CONFIG_PHYSICAL_START=0x100000.
-   This compiles the kernel for physical address 1MB, but because the
-   kernel is relocatable, it can run from any physical address, so the
-   kexec boot loader will load it into the memory region reserved for the
-   dump-capture kernel.
-
-   Otherwise it should be the start of the memory region reserved for the
-   second kernel using the boot parameter "crashkernel=Y@X". Here X is
-   the start of the memory region reserved for the dump-capture kernel.
-   Generally X is 16MB (0x1000000), so you can set
-   CONFIG_PHYSICAL_START=0x1000000.
-
-5) Make and install the kernel and its modules. DO NOT add this kernel
-   to the boot loader configuration files.
-
-Dump-capture kernel config options (Arch Dependent, ppc64)
-----------------------------------------------------------
-
-1) Enable "Build a kdump crash kernel" support under "Kernel" options::
-
-	CONFIG_CRASH_DUMP=y
-
-2) Enable "Build a relocatable kernel" support::
-
-	CONFIG_RELOCATABLE=y
-
-   Make and install the kernel and its modules.
-
-Dump-capture kernel config options (Arch Dependent, ia64)
-----------------------------------------------------------
-
-- No specific options are required to create a dump-capture kernel
-  for ia64, other than those specified in the arch independent section
-  above. This means that it is possible to use the system kernel
-  as a dump-capture kernel if desired.
-
-  The crashkernel region can be automatically placed by the system
-  kernel at run time. This is done by specifying the base address as 0,
-  or omitting it altogether::
-
-	crashkernel=256M@0
-
-  or::
-
-	crashkernel=256M
-
-  If the start address is specified, note that the start address of the
-  kernel will be aligned to 64MB, so if the specified address is not
-  aligned, any space below the alignment point will be wasted.
-
-Dump-capture kernel config options (Arch Dependent, arm)
-----------------------------------------------------------
-
--   To use a relocatable kernel,
-    Enable "AUTO_ZRELADDR" support under "Boot" options::
-
-	AUTO_ZRELADDR=y
-
-Dump-capture kernel config options (Arch Dependent, arm64)
-----------------------------------------------------------
-
-- Please note that kvm of the dump-capture kernel will not be enabled
-  on non-VHE systems even if it is configured. This is because the CPU
-  will not be reset to EL2 on panic.
-
-Extended crashkernel syntax
-===========================
-
-While the "crashkernel=size[@offset]" syntax is sufficient for most
-configurations, sometimes it's handy to have the reserved memory dependent
-on the value of System RAM -- that's mostly for distributors that pre-setup
-the kernel command line to avoid an unbootable system after some memory has
-been removed from the machine.
-
-The syntax is::
-
-    crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
-    range=start-[end]
-
-For example::
-
-    crashkernel=512M-2G:64M,2G-:128M
-
-This would mean:
-
-    1) if the RAM is smaller than 512M, then don't reserve anything
-       (this is the "rescue" case)
-    2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
-    3) if the RAM size is 2G or larger, then reserve 128M
-
-
-
-Boot into System Kernel
-=======================
-
-1) Update the boot loader (such as grub, yaboot, or lilo) configuration
-   files as necessary.
-
-2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
-   where Y specifies how much memory to reserve for the dump-capture kernel
-   and X specifies the beginning of this reserved memory. For example,
-   "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
-   starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
-
-   On x86 and x86_64, use "crashkernel=64M@16M".
-
-   On ppc64, use "crashkernel=128M@32M".
-
-   On ia64, 256M@256M is a generous value that typically works.
-   The region may be automatically placed on ia64, see the
-   dump-capture kernel config option notes above.
-   If sparse memory is used, the size should be rounded to GRANULE boundaries.
-
-   On s390x, typically use "crashkernel=xxM". The value of xx is dependent
-   on the memory consumption of the kdump system. In general this is not
-   dependent on the memory size of the production system.
-
-   On arm, the use of "crashkernel=Y@X" is no longer necessary; the
-   kernel will automatically locate the crash kernel image within the
-   first 512MB of RAM if X is not given.
-
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
-
-Load the Dump-capture Kernel
-============================
-
-After booting to the system kernel, the dump-capture kernel needs to be
-loaded.
-
-Based on the architecture and type of image (relocatable or not), one
-can choose to load the uncompressed vmlinux or compressed bzImage/vmlinuz
-of the dump-capture kernel. The following is a summary.
-
-For i386 and x86_64:
-
-	- Use vmlinux if kernel is not relocatable.
-	- Use bzImage/vmlinuz if kernel is relocatable.
-
-For ppc64:
-
-	- Use vmlinux
-
-For ia64:
-
-	- Use vmlinux or vmlinuz.gz
-
-For s390x:
-
-	- Use image or bzImage
-
-For arm:
-
-	- Use zImage
-
-For arm64:
-
-	- Use vmlinux or Image
-
-If you are using an uncompressed vmlinux image, use the following command
-to load the dump-capture kernel::
-
-   kexec -p <dump-capture-kernel-vmlinux-image> \
-   --initrd=<initrd-for-dump-capture-kernel> --args-linux \
-   --append="root=<root-dev> <arch-specific-options>"
-
-If you are using a compressed bzImage/vmlinuz, use the following command
-to load the dump-capture kernel::
-
-   kexec -p <dump-capture-kernel-bzImage> \
-   --initrd=<initrd-for-dump-capture-kernel> \
-   --append="root=<root-dev> <arch-specific-options>"
-
-If you are using a compressed zImage, use the following command
-to load the dump-capture kernel::
-
-   kexec --type zImage -p <dump-capture-kernel-zImage> \
-   --initrd=<initrd-for-dump-capture-kernel> \
-   --dtb=<dtb-for-dump-capture-kernel> \
-   --append="root=<root-dev> <arch-specific-options>"
-
-If you are using an uncompressed Image, use the following command
-to load the dump-capture kernel::
-
-   kexec -p <dump-capture-kernel-Image> \
-   --initrd=<initrd-for-dump-capture-kernel> \
-   --append="root=<root-dev> <arch-specific-options>"
-
-Please note that --args-linux does not need to be specified for ia64.
-It is planned to make this a no-op on that architecture, but for now
-it should be omitted.
-
-The following are the arch-specific command line options to be used while
-loading the dump-capture kernel.
-
-For i386, x86_64 and ia64:
-
-	"1 irqpoll maxcpus=1 reset_devices"
-
-For ppc64:
-
-	"1 maxcpus=1 noirqdistrib reset_devices"
-
-For s390x:
-
-	"1 maxcpus=1 cgroup_disable=memory"
-
-For arm:
-
-	"1 maxcpus=1 reset_devices"
-
-For arm64:
-
-	"1 maxcpus=1 reset_devices"
-
-Notes on loading the dump-capture kernel:
-
-* By default, the ELF headers are stored in ELF64 format to support
-  systems with more than 4GB memory. On i386, kexec automatically checks if
-  the physical RAM size exceeds the 4 GB limit and if not, uses ELF32.
-  So, on non-PAE systems, ELF32 is always used.
-
-  The --elf32-core-headers option can be used to force the generation of ELF32
-  headers. This is necessary because GDB currently cannot open vmcore files
-  with ELF64 headers on 32-bit systems.
-
-* The "irqpoll" boot parameter reduces driver initialization failures
-  due to shared interrupts in the dump-capture kernel.
-
-* You must specify <root-dev> in the format corresponding to the root
-  device name in the output of mount command.
-
-* Boot parameter "1" boots the dump-capture kernel into single-user
-  mode without networking. If you want networking, use "3".
-
-* We generally don't have to bring up an SMP kernel just to capture the
-  dump. Hence it is generally useful either to build a UP dump-capture
-  kernel or to specify the maxcpus=1 option while loading the dump-capture
-  kernel. Note that although maxcpus always works, it is better to replace
-  it with nr_cpus to save memory if the current ARCH supports it, such as x86.
-
-* You should enable multi-cpu support in the dump-capture kernel if you
-  intend to use multi-threaded programs with it, such as the parallel dump
-  feature of makedumpfile. Otherwise, the multi-threaded program may suffer
-  a great performance degradation. To enable multi-cpu support, you should
-  bring up an SMP dump-capture kernel and specify the maxcpus/nr_cpus and
-  disable_cpu_apicid=[X] options while loading it.
-
-* For s390x there are two kdump modes: If an ELF header is specified with
-  the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
-  is done on all other architectures. If no elfcorehdr= kernel parameter is
-  specified, the s390x kdump kernel dynamically creates the header. The
-  second mode has the advantage that for CPU and memory hotplug, kdump does
-  not have to be reloaded with kexec_load().
-
-* For s390x systems with many attached devices the "cio_ignore" kernel
-  parameter should be used for the kdump kernel in order to prevent allocation
-  of kernel memory for devices that are not relevant for kdump. The same
-  applies to systems that use SCSI/FCP devices. In that case the
-  "allow_lun_scan" zfcp module parameter should be set to zero before
-  setting FCP devices online.
-
-Kernel Panic
-============
-
-After successfully loading the dump-capture kernel as previously
-described, the system will reboot into the dump-capture kernel if a
-system crash is triggered.  Trigger points are located in panic(),
-die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
-
-The following conditions will execute a crash trigger point:
-
-If a hard lockup is detected and "NMI watchdog" is configured, the system
-will boot into the dump-capture kernel ( die_nmi() ).
-
-If die() is called, and it happens to be a thread with pid 0 or 1, or die()
-is called inside interrupt context or die() is called and panic_on_oops is set,
-the system will boot into the dump-capture kernel.
-
-On powerpc systems when a soft-reset is generated, die() is called by all cpus
-and the system will boot into the dump-capture kernel.
-
-For testing purposes, you can trigger a crash by using "ALT-SysRq-c" or
-"echo c > /proc/sysrq-trigger", or by writing a module to force the panic.
-
-Write Out the Dump File
-=======================
-
-After the dump-capture kernel is booted, write out the dump file with
-the following command::
-
-   cp /proc/vmcore <dump-file>
-
-
-Analysis
-========
-
-Before analyzing the dump image, you should reboot into a stable kernel.
-
-You can do limited analysis using GDB on the dump file copied out of
-/proc/vmcore. Use the debug vmlinux built with -g and run the following
-command::
-
-   gdb vmlinux <dump-file>
-
-Stack trace for the task on processor 0, register display, and memory
-display work fine.
-
-Note: GDB cannot analyze core files generated in ELF64 format for x86.
-On systems with a maximum of 4GB of memory, you can generate
-ELF32-format headers using the --elf32-core-headers kexec option when
-loading the dump-capture kernel.
-
-You can also use the Crash utility to analyze dump files in Kdump
-format. Crash is available on Dave Anderson's site at the following URL:
-
-   http://people.redhat.com/~anderson/
-
-Trigger Kdump on WARN()
-=======================
-
-The panic_on_warn kernel parameter calls panic() in all WARN() paths.  This
-will cause a kdump to occur at the panic() call.  In cases where a user wants
-to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
-to achieve the same behaviour.
-
-Contact
-=======
-
-- Vivek Goyal (vgoyal@redhat.com)
-- Maneesh Soni (maneesh@in.ibm.com)
-
-GDB macros
-==========
-
-.. include:: gdbmacros.txt
-   :literal:
diff --git a/Documentation/kdump/vmcoreinfo.rst b/Documentation/kdump/vmcoreinfo.rst
deleted file mode 100644
index 007a6b86e0ee..000000000000
--- a/Documentation/kdump/vmcoreinfo.rst
+++ /dev/null
@@ -1,488 +0,0 @@
-==========
-VMCOREINFO
-==========
-
-What is it?
-===========
-
-VMCOREINFO is a special ELF note section. It contains various
-information from the kernel like structure size, page size, symbol
-values, field offsets, etc. These data are packed into an ELF note
-section and used by user-space tools like crash and makedumpfile to
-analyze a kernel's memory layout.
-
-Common variables
-================
-
-init_uts_ns.name.release
-------------------------
-
-The version of the Linux kernel. Used to find the corresponding source
-code from which the kernel has been built. For example, crash uses it to
-find the corresponding vmlinux in order to process vmcore.
-
-PAGE_SIZE
----------
-
-The size of a page. It is the smallest unit of data used by the memory
-management facilities. It is usually 4096 bytes in size, and pages are
-aligned on 4096-byte boundaries. Used for computing page addresses.
-
-init_uts_ns
------------
-
-The UTS namespace which is used to isolate two specific elements of the
-system that relate to the uname(2) system call. It is named after the
-data structure used to store information returned by the uname(2) system
-call.
-
-User-space tools can get the kernel name, host name, kernel release
-number, kernel version, architecture name and OS type from it.
-
-node_online_map
----------------
-
-An array node_states[N_ONLINE] which represents the set of online nodes
-in a system, one bit position per node number. Used to keep track of
-which nodes are in the system and online.
-
-swapper_pg_dir
---------------
-
-The global page directory pointer of the kernel. Used to translate
-virtual to physical addresses.
-
-_stext
-------
-
-Defines the beginning of the text section. In general, _stext indicates
-the kernel start address. Used to convert a virtual address from the
-direct kernel map to a physical address.
-
-vmap_area_list
---------------
-
-Stores the virtual area list. makedumpfile gets the vmalloc start value
-from this variable and its value is necessary for vmalloc translation.
-
-mem_map
--------
-
-Physical addresses are translated to struct pages by treating them as
-an index into the mem_map array. Right-shifting a physical address
-PAGE_SHIFT bits converts it into a page frame number which is an index
-into that mem_map array.
-
-Used to map an address to the corresponding struct page.
-
-contig_page_data
-----------------
-
-Makedumpfile gets the pglist_data structure from this symbol, which is
-used to describe the memory layout.
-
-User-space tools use this to exclude free pages when dumping memory.
-
-mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
---------------------------------------------------------------------------
-
-The address of the mem_section array, its length, structure size, and
-the section_mem_map offset.
-
-It exists in the sparse memory mapping model and is somewhat similar to
-the mem_map variable: both of them are used to translate an
-address.
-
-page
-----
-
-The size of a page structure. struct page is an important data structure
-and it is widely used to compute contiguous memory.
-
-pglist_data
------------
-
-The size of a pglist_data structure. This value is used to check if the
-pglist_data structure is valid. It is also used for checking the memory
-type.
-
-zone
-----
-
-The size of a zone structure. This value is used to check if the zone
-structure has been found. It is also used for excluding free pages.
-
-free_area
----------
-
-The size of a free_area structure. It indicates whether the free_area
-structure is valid or not. Useful when excluding free pages.
-
-list_head
----------
-
-The size of a list_head structure. Used when iterating lists in a
-post-mortem analysis session.
-
-nodemask_t
-----------
-
-The size of a nodemask_t type. Used to compute the number of online
-nodes.
-
-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
--------------------------------------------------------------------------------------------------
-
-User-space tools compute their values based on the offset of these
-variables. The variables are used when excluding unnecessary pages.
-
-(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_spanned_pages|node_id)
------------------------------------------------------------------------------------------
-
-On NUMA machines, each NUMA node has a pg_data_t to describe its memory
-layout. On UMA machines there is a single pglist_data which describes the
-whole memory.
-
-These values are used to check the memory type and to compute the
-virtual address for memory map.
-
-(zone, free_area|vm_stat|spanned_pages)
----------------------------------------
-
-Each node is divided into a number of blocks called zones which
-represent ranges within memory. A zone is described by a structure zone.
-
-User-space tools compute required values based on the offset of these
-variables.
-
-(free_area, free_list)
-----------------------
-
-Offset of the free_list's member. This value is used to compute the number
-of free pages.
-
-Each zone has a free_area structure array called free_area[MAX_ORDER].
-The free_list represents a linked list of free page blocks.
-
-(list_head, next|prev)
-----------------------
-
-Offsets of the list_head's members. list_head is used to define a
-circular linked list. User-space tools need these in order to traverse
-lists.
-
-(vmap_area, va_start|list)
---------------------------
-
-Offsets of the vmap_area's members. They carry vmalloc-specific
-information. Makedumpfile gets the start address of the vmalloc region
-from this.
-
-(zone.free_area, MAX_ORDER)
----------------------------
-
-Free areas descriptor. User-space tools use this value to iterate the
-free_area ranges. MAX_ORDER is used by the zone buddy allocator.
-
-log_first_idx
--------------
-
-Index of the first record stored in the buffer log_buf. Used by
-user-space tools to read the strings in the log_buf.
-
-log_buf
--------
-
-Console output is written to the ring buffer log_buf at index
-log_first_idx. Used to get the kernel log.
-
-log_buf_len
------------
-
-log_buf's length.
-
-clear_idx
----------
-
-The index of the next printk() record to read after the last clear
-command. It indicates the first record after the last
-SYSLOG_ACTION_CLEAR, like issued by 'dmesg -c'. Used by user-space
-tools to dump the dmesg log.
-
-log_next_idx
-------------
-
-The index of the next record to store in the buffer log_buf. Used to
-compute the index of the current buffer position.
-
-printk_log
-----------
-
-The size of a structure printk_log. Used to compute the size of
-messages, and extract dmesg log. It encapsulates header information for
-log_buf, such as timestamp, syslog level, etc.
-
-(printk_log, ts_nsec|len|text_len|dict_len)
--------------------------------------------
-
-It represents field offsets in struct printk_log. User space tools
-parse it and check whether the values of printk_log's members have been
-changed.
-
-(free_area.free_list, MIGRATE_TYPES)
-------------------------------------
-
-The number of migrate types for pages. The free_list is described by the
-array. Used by tools to compute the number of free pages.
-
-NR_FREE_PAGES
--------------
-
-On linux-2.6.21 or later, the number of free pages is in
-vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
-
-PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask
-------------------------------------------------------------------------------
-
-Page attributes. These flags are used to filter various unnecessary for
-dumping pages.
-
-PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
------------------------------------------------------------------------------
-
-More page attributes. These flags are used to filter various unnecessary for
-dumping pages.
-
-
-HUGETLB_PAGE_DTOR
------------------
-
-The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
-excludes these pages.
-
-x86_64
-======
-
-phys_base
----------
-
-Used to convert the virtual address of an exported kernel symbol to its
-corresponding physical address.
-
-init_top_pgt
-------------
-
-Used to walk through the whole page table and convert virtual addresses
-to physical addresses. The init_top_pgt is somewhat similar to
-swapper_pg_dir, but it is only used in x86_64.
-
-pgtable_l5_enabled
-------------------
-
-User-space tools need to know whether the crash kernel was in 5-level
-paging mode.
-
-node_data
----------
-
-This is a struct pglist_data array and stores all NUMA nodes
-information. Makedumpfile gets the pglist_data structure from it.
-
-(node_data, MAX_NUMNODES)
--------------------------
-
-The maximum number of nodes in system.
-
-KERNELOFFSET
-------------
-
-The kernel randomization offset. Used to compute the page offset. If
-KASLR is disabled, this value is zero.
-
-KERNEL_IMAGE_SIZE
------------------
-
-Currently unused by Makedumpfile. Used to compute the module virtual
-address by Crash.
-
-sme_mask
---------
-
-AMD-specific with SME support: it indicates the secure memory encryption
-mask. Makedumpfile tools need to know whether the crash kernel was
-encrypted. If SME is enabled in the first kernel, the crash kernel's
-page table entries (pgd/pud/pmd/pte) contain the memory encryption
-mask. This is used to remove the SME mask and obtain the true physical
-address.
-
-Currently, sme_mask stores the value of the C-bit position. If needed,
-additional SME-relevant info can be placed in that variable.
-
-For example::
-
-  [ misc	        ][ enc bit  ][ other misc SME info       ]
-  0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
-  63   59   55   51   47   43   39   35   31   27   ... 3
-
-x86_32
-======
-
-X86_PAE
--------
-
-Denotes whether physical address extensions are enabled. It has the cost
-of a higher page table lookup overhead, and also consumes more page
-table space per process. Used to check whether PAE was enabled in the
-crash kernel when converting virtual addresses to physical addresses.
-
-ia64
-====
-
-pgdat_list|(pgdat_list, MAX_NUMNODES)
--------------------------------------
-
-pg_data_t array storing all NUMA nodes information. MAX_NUMNODES
-indicates the number of the nodes.
-
-node_memblk|(node_memblk, NR_NODE_MEMBLKS)
-------------------------------------------
-
-List of node memory chunks. Filled when parsing the SRAT table to obtain
-information about memory nodes. NR_NODE_MEMBLKS indicates the number of
-node memory chunks.
-
-These values are used to compute the number of nodes the crashed kernel used.
-
-node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
-----------------------------------------------------------------
-
-The size of a struct node_memblk_s and the offsets of the
-node_memblk_s's members. Used to compute the number of nodes.
-
-PGTABLE_3|PGTABLE_4
--------------------
-
-User-space tools need to know whether the crash kernel was in 3-level or
-4-level paging mode. Used to distinguish the page table.
-
-ARM64
-=====
-
-VA_BITS
--------
-
-The maximum number of bits for virtual addresses. Used to compute the
-virtual memory ranges.
-
-kimage_voffset
---------------
-
-The offset between the kernel virtual and physical mappings. Used to
-translate virtual to physical addresses.
-
-PHYS_OFFSET
------------
-
-Indicates the physical address of the start of memory. Similar to
-kimage_voffset, which is used to translate virtual to physical
-addresses.
-
-KERNELOFFSET
-------------
-
-The kernel randomization offset. Used to compute the page offset. If
-KASLR is disabled, this value is zero.
-
-arm
-===
-
-ARM_LPAE
---------
-
-It indicates whether the crash kernel supports large physical address
-extensions. Used to translate virtual to physical addresses.
-
-s390
-====
-
-lowcore_ptr
------------
-
-An array with a pointer to the lowcore of every CPU. Used to print the
-psw and all registers information.
-
-high_memory
------------
-
-Used to get the vmalloc_start address from the high_memory symbol.
-
-(lowcore_ptr, NR_CPUS)
-----------------------
-
-The maximum number of CPUs.
-
-powerpc
-=======
-
-
-node_data|(node_data, MAX_NUMNODES)
------------------------------------
-
-See above.
-
-contig_page_data
-----------------
-
-See above.
-
-vmemmap_list
-------------
-
-The vmemmap_list maintains the entire vmemmap physical mapping. Used
-to get vmemmap list count and populated vmemmap regions info. If the
-vmemmap address translation information is stored in the crash kernel,
-it is used to translate vmemmap kernel virtual addresses.
-
-mmu_vmemmap_psize
------------------
-
-The size of a page. Used to translate virtual to physical addresses.
-
-mmu_psize_defs
---------------
-
-Page size definitions, i.e. 4k, 64k, or 16M.
-
-Used to make vtop translations.
-
-vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|(vmemmap_backing, virt_addr)
---------------------------------------------------------------------------------------------
-
-The vmemmap virtual address space management does not have a traditional
-page table to track which virtual struct pages are backed by a physical
-mapping. The virtual to physical mappings are tracked in a simple linked
-list format.
-
-User-space tools need to know the offset of list, phys and virt_addr
-when computing the count of vmemmap regions.
-
-mmu_psize_def|(mmu_psize_def, shift)
-------------------------------------
-
-The size of a struct mmu_psize_def and the offset of mmu_psize_def's
-member.
-
-Used in vtop translations.
-
-sh
-==
-
-node_data|(node_data, MAX_NUMNODES)
------------------------------------
-
-See above.
-
-X2TLB
------
-
-Indicates whether the crashed kernel enabled SH extended mode.
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index 0c41d6d463f3..10e7f4d16c14 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -59,7 +59,7 @@ as follows:
          the default calculated size. Use this option if default
          boot memory size is not sufficient for second kernel to
          boot successfully. For syntax of crashkernel= parameter,
-         refer to Documentation/kdump/kdump.rst. If any offset is
+         refer to Documentation/admin-guide/kdump/kdump.rst. If any offset is
          provided in crashkernel= parameter, it will be ignored
          as fadump uses a predefined offset to reserve memory
          for boot memory dump preservation in case of a crash.
diff --git a/Documentation/translations/zh_CN/oops-tracing.txt b/Documentation/translations/zh_CN/oops-tracing.txt
index 368ddd05b304..c5f3bda7abcb 100644
--- a/Documentation/translations/zh_CN/oops-tracing.txt
+++ b/Documentation/translations/zh_CN/oops-tracing.txt
@@ -53,8 +53,8 @@ cat /proc/kmsg > file， 然而你必须介入中止传输， kmsg是一个“
 （2）用串口终端启动（请参看Documentation/admin-guide/serial-console.rst），运行一个null
 modem到另一台机器并用你喜欢的通讯工具获取输出。Minicom工作地很好。
 
-（3）使用Kdump（请参看Documentation/kdump/kdump.rst），
-使用在Documentation/kdump/gdbmacros.txt中定义的dmesg gdb宏，从旧的内存中提取内核
+（3）使用Kdump（请参看Documentation/admin-guide/kdump/kdump.rst），
+使用在Documentation/admin-guide/kdump/gdbmacros.txt中定义的dmesg gdb宏，从旧的内存中提取内核
 环形缓冲区。
 
 完整信息
diff --git a/MAINTAINERS b/MAINTAINERS
index 288f84dbd480..b36028f43192 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8675,7 +8675,7 @@ R:	Vivek Goyal <vgoyal@redhat.com>
 L:	kexec@lists.infradead.org
 W:	http://lse.sourceforge.net/kdump/
 S:	Maintained
-F:	Documentation/kdump/
+F:	Documentation/admin-guide/kdump/
 
 KEENE FM RADIO TRANSMITTER DRIVER
 M:	Hans Verkuil <hverkuil@xs4all.nl>
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 6425871e9903..20afd6077465 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2036,7 +2036,7 @@ config CRASH_DUMP
 	  kdump/kexec. The crash dump kernel must be compiled to a
 	  memory address not used by the main kernel
 
-	  For more details see Documentation/kdump/kdump.rst
+	  For more details see Documentation/admin-guide/kdump/kdump.rst
 
 config AUTO_ZRELADDR
 	bool "Auto calculation of the decompressed kernel image address"
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4b22bbf0590..86f81b5afd95 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -996,7 +996,7 @@ config CRASH_DUMP
 	  reserved region and then later executed after a crash by
 	  kdump/kexec.
 
-	  For more details see Documentation/kdump/kdump.rst
+	  For more details see Documentation/admin-guide/kdump/kdump.rst
 
 config XEN_DOM0
 	def_bool y
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 31a7d12db705..c2858ac6a46a 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -626,7 +626,7 @@ config CRASH_DUMP
 	  to a memory address not used by the main kernel using
 	  PHYSICAL_START.
 
-	  For more details see Documentation/kdump/kdump.rst
+	  For more details see Documentation/admin-guide/kdump/kdump.rst
 
 config KEXEC_JUMP
 	bool "kexec jump (EXPERIMENTAL)"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d0bbca65e4a4..9505066b7ba3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2057,7 +2057,7 @@ config CRASH_DUMP
 	  to a memory address not used by the main kernel or BIOS using
 	  PHYSICAL_START, or it must be built as a relocatable image
 	  (CONFIG_RELOCATABLE=y).
-	  For more details see Documentation/kdump/kdump.rst
+	  For more details see Documentation/admin-guide/kdump/kdump.rst
 
 config KEXEC_JUMP
 	bool "kexec jump"
@@ -2094,7 +2094,7 @@ config PHYSICAL_START
 	  the reserved region.  In other words, it can be set based on
 	  the "X" value as specified in the "crashkernel=YM@XM"
 	  command line boot parameter passed to the panic-ed
-	  kernel. Please take a look at Documentation/kdump/kdump.rst
+	  kernel. Please take a look at Documentation/admin-guide/kdump/kdump.rst
 	  for more details about crash dumps.
 
 	  Usage of bzImage for capturing the crash dump is recommended as
-- 
cgit v1.2.3-55-g7522


From e7751617dd0599ceadf4221cb08e04307b00aa1f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 11:47:10 -0300
Subject: docs: blockdev: add it to the admin-guide

The blockdev book basically contains user-facing documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 .../blockdev/drbd/DRBD-8.3-data-packets.svg        | 588 +++++++++++++++++++++
 .../blockdev/drbd/DRBD-data-packets.svg            | 459 ++++++++++++++++
 .../admin-guide/blockdev/drbd/conn-states-8.dot    |  18 +
 .../blockdev/drbd/data-structure-v9.rst            |  42 ++
 .../admin-guide/blockdev/drbd/disk-states-8.dot    |  16 +
 .../drbd/drbd-connection-state-overview.dot        |  85 +++
 .../admin-guide/blockdev/drbd/figures.rst          |  28 +
 Documentation/admin-guide/blockdev/drbd/index.rst  |  19 +
 .../admin-guide/blockdev/drbd/node-states-8.dot    |  13 +
 Documentation/admin-guide/blockdev/floppy.rst      | 255 +++++++++
 Documentation/admin-guide/blockdev/index.rst       |  14 +
 Documentation/admin-guide/blockdev/nbd.rst         |  31 ++
 Documentation/admin-guide/blockdev/paride.rst      | 439 +++++++++++++++
 Documentation/admin-guide/blockdev/ramdisk.rst     | 177 +++++++
 Documentation/admin-guide/blockdev/zram.rst        | 422 +++++++++++++++
 Documentation/admin-guide/index.rst                |   1 +
 Documentation/admin-guide/kernel-parameters.txt    |  18 +-
 .../blockdev/drbd/DRBD-8.3-data-packets.svg        | 588 ---------------------
 Documentation/blockdev/drbd/DRBD-data-packets.svg  | 459 ----------------
 Documentation/blockdev/drbd/conn-states-8.dot      |  18 -
 Documentation/blockdev/drbd/data-structure-v9.rst  |  42 --
 Documentation/blockdev/drbd/disk-states-8.dot      |  16 -
 .../drbd/drbd-connection-state-overview.dot        |  85 ---
 Documentation/blockdev/drbd/figures.rst            |  28 -
 Documentation/blockdev/drbd/index.rst              |  19 -
 Documentation/blockdev/drbd/node-states-8.dot      |  14 -
 Documentation/blockdev/floppy.rst                  | 255 ---------
 Documentation/blockdev/index.rst                   |  16 -
 Documentation/blockdev/nbd.rst                     |  31 --
 Documentation/blockdev/paride.rst                  | 439 ---------------
 Documentation/blockdev/ramdisk.rst                 | 177 -------
 Documentation/blockdev/zram.rst                    | 422 ---------------
 MAINTAINERS                                        |  10 +-
 drivers/block/Kconfig                              |   8 +-
 drivers/block/floppy.c                             |   2 +-
 drivers/block/zram/Kconfig                         |   6 +-
 tools/testing/selftests/zram/README                |   2 +-
 37 files changed, 2630 insertions(+), 2632 deletions(-)
 create mode 100644 Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg
 create mode 100644 Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg
 create mode 100644 Documentation/admin-guide/blockdev/drbd/conn-states-8.dot
 create mode 100644 Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst
 create mode 100644 Documentation/admin-guide/blockdev/drbd/disk-states-8.dot
 create mode 100644 Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot
 create mode 100644 Documentation/admin-guide/blockdev/drbd/figures.rst
 create mode 100644 Documentation/admin-guide/blockdev/drbd/index.rst
 create mode 100644 Documentation/admin-guide/blockdev/drbd/node-states-8.dot
 create mode 100644 Documentation/admin-guide/blockdev/floppy.rst
 create mode 100644 Documentation/admin-guide/blockdev/index.rst
 create mode 100644 Documentation/admin-guide/blockdev/nbd.rst
 create mode 100644 Documentation/admin-guide/blockdev/paride.rst
 create mode 100644 Documentation/admin-guide/blockdev/ramdisk.rst
 create mode 100644 Documentation/admin-guide/blockdev/zram.rst
 delete mode 100644 Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg
 delete mode 100644 Documentation/blockdev/drbd/DRBD-data-packets.svg
 delete mode 100644 Documentation/blockdev/drbd/conn-states-8.dot
 delete mode 100644 Documentation/blockdev/drbd/data-structure-v9.rst
 delete mode 100644 Documentation/blockdev/drbd/disk-states-8.dot
 delete mode 100644 Documentation/blockdev/drbd/drbd-connection-state-overview.dot
 delete mode 100644 Documentation/blockdev/drbd/figures.rst
 delete mode 100644 Documentation/blockdev/drbd/index.rst
 delete mode 100644 Documentation/blockdev/drbd/node-states-8.dot
 delete mode 100644 Documentation/blockdev/floppy.rst
 delete mode 100644 Documentation/blockdev/index.rst
 delete mode 100644 Documentation/blockdev/nbd.rst
 delete mode 100644 Documentation/blockdev/paride.rst
 delete mode 100644 Documentation/blockdev/ramdisk.rst
 delete mode 100644 Documentation/blockdev/zram.rst

diff --git a/Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg b/Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg
new file mode 100644
index 000000000000..f87cfa0dc2fb
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg
@@ -0,0 +1,588 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+<svg
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   version="1.0"
+   width="210mm"
+   height="297mm"
+   viewBox="0 0 21000 29700"
+   id="svg2"
+   style="fill-rule:evenodd">
+  <defs
+     id="defs4" />
+  <g
+     id="Default"
+     style="visibility:visible">
+    <desc
+       id="desc180">Master slide</desc>
+  </g>
+  <path
+     d="M 11999,8601 L 11899,8301 L 12099,8301 L 11999,8601 z"
+     id="path193"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 11999,7801 L 11999,8361"
+     id="path197"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 7999,10401 L 7899,10101 L 8099,10101 L 7999,10401 z"
+     id="path209"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 7999,9601 L 7999,10161"
+     id="path213"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 11999,7801 L 11685,7840 L 11724,7644 L 11999,7801 z"
+     id="path225"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 7999,7001 L 11764,7754"
+     id="path229"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <g
+     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,-1244.4792,1416.5139)"
+     id="g245"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text247">
+      <tspan
+         x="9139 9368 9579 9808 9986 10075 10252 10481 10659 10837 10909"
+         y="9284"
+         id="tspan249">RSDataReply</tspan>
+    </text>
+  </g>
+  <path
+     d="M 7999,9601 L 8281,9458 L 8311,9655 L 7999,9601 z"
+     id="path259"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 11999,9001 L 8236,9565"
+     id="path263"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <g
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,1620.9382,-1639.4947)"
+     id="g279"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text281">
+      <tspan
+         x="8743 8972 9132 9310 9573 9801 10013 10242 10419 10597 10775 10953 11114"
+         y="7023"
+         id="tspan283">CsumRSRequest</tspan>
+    </text>
+  </g>
+  <text
+     id="text297"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4034 4263 4440 4703 4881 5042 5219 5397 5503 5681 5842 6003 6180 6341 6519 6625 6803 6980 7158 7336 7497 7586 7692"
+       y="5707"
+       id="tspan299">w_make_resync_request()</tspan>
+  </text>
+  <text
+     id="text313"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12305 12483 12644 12821 12893 13054 13232 13410 13638 13816 13905 14083 14311 14489 14667 14845 15023 15184 15272 15378"
+       y="7806"
+       id="tspan315">receive_DataRequest()</tspan>
+  </text>
+  <text
+     id="text329"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12377 12483 12660 12838 13016 13194 13372 13549 13621 13799 13977 14083 14261 14438 14616 14794 14955 15133 15294 15399"
+       y="8606"
+       id="tspan331">drbd_endio_read_sec()</tspan>
+  </text>
+  <text
+     id="text345"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12191 12420 12597 12775 12953 13131 13309 13486 13664 13825 13986 14164 14426 14604 14710 14871 15049 15154 15332 15510 15616"
+       y="9007"
+       id="tspan347">w_e_end_csum_rs_req()</tspan>
+  </text>
+  <text
+     id="text361"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4444 4550 4728 4889 5066 5138 5299 5477 5655 5883 6095 6324 6501 6590 6768 6997 7175 7352 7424 7585 7691"
+       y="9507"
+       id="tspan363">receive_RSDataReply()</tspan>
+  </text>
+  <text
+     id="text377"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4457 4635 4741 4918 5096 5274 5452 5630 5807 5879 6057 6235 6464 6569 6641 6730 6908 7086 7247 7425 7585 7691"
+       y="10407"
+       id="tspan379">drbd_endio_write_sec()</tspan>
+  </text>
+  <text
+     id="text393"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4647 4825 5003 5180 5358 5536 5714 5820 5997 6158 6319 6497 6658 6836 7013 7085 7263 7424 7585 7691"
+       y="10907"
+       id="tspan395">e_end_resync_block()</tspan>
+  </text>
+  <path
+     d="M 11999,11601 L 11685,11640 L 11724,11444 L 11999,11601 z"
+     id="path405"
+     style="fill:#000080;visibility:visible" />
+  <path
+     d="M 7999,10801 L 11764,11554"
+     id="path409"
+     style="fill:none;stroke:#000080;visibility:visible" />
+  <g
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,2434.7562,-1674.649)"
+     id="g425"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text427">
+      <tspan
+         x="9320 9621 9726 9798 9887 10065 10277 10438"
+         y="10943"
+         id="tspan429">WriteAck</tspan>
+    </text>
+  </g>
+  <text
+     id="text443"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12377 12555 12644 12821 13033 13105 13283 13444 13604 13816 13977 14138 14244"
+       y="11559"
+       id="tspan445">got_BlockAck()</tspan>
+  </text>
+  <text
+     id="text459"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="7999 8304 8541 8778 8990 9201 9413 9650 10001 10120 10357 10594 10806 11043 11280 11398 11703 11940 12152 12364 12601 12812 12931 13049 13261 13498 13710 13947 14065 14302 14540 14658 14777 14870 15107 15225 15437 15649 15886"
+       y="4877"
+       id="tspan461">Checksum based Resync, case not in sync</tspan>
+  </text>
+  <text
+     id="text475"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="6961 7266 7571 7854 8159 8299 8536 8654 8891 9010 9247 9484 9603 9840 9958 10077 10170 10407"
+       y="2806"
+       id="tspan477">DRBD-8.3 data flow</tspan>
+  </text>
+  <text
+     id="text491"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="5190 5419 5596 5774 5952 6113 6291 6468 6646 6824 6985 7146 7324 7586 7692"
+       y="7005"
+       id="tspan493">w_e_send_csum()</tspan>
+  </text>
+  <path
+     d="M 11999,17601 L 11899,17301 L 12099,17301 L 11999,17601 z"
+     id="path503"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 11999,16801 L 11999,17361"
+     id="path507"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 11999,16801 L 11685,16840 L 11724,16644 L 11999,16801 z"
+     id="path519"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 7999,16001 L 11764,16754"
+     id="path523"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <g
+     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,-2539.5806,1529.3491)"
+     id="g539"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text541">
+      <tspan
+         x="9269 9498 9709 9798 9959 10048 10226 10437 10598 10776"
+         y="18265"
+         id="tspan543">RSIsInSync</tspan>
+    </text>
+  </g>
+  <path
+     d="M 7999,18601 L 8281,18458 L 8311,18655 L 7999,18601 z"
+     id="path553"
+     style="fill:#000080;visibility:visible" />
+  <path
+     d="M 11999,18001 L 8236,18565"
+     id="path557"
+     style="fill:none;stroke:#000080;visibility:visible" />
+  <g
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,3461.4027,-1449.3012)"
+     id="g573"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text575">
+      <tspan
+         x="8743 8972 9132 9310 9573 9801 10013 10242 10419 10597 10775 10953 11114"
+         y="16023"
+         id="tspan577">CsumRSRequest</tspan>
+    </text>
+  </g>
+  <text
+     id="text591"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12305 12483 12644 12821 12893 13054 13232 13410 13638 13816 13905 14083 14311 14489 14667 14845 15023 15184 15272 15378"
+       y="16806"
+       id="tspan593">receive_DataRequest()</tspan>
+  </text>
+  <text
+     id="text607"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12377 12483 12660 12838 13016 13194 13372 13549 13621 13799 13977 14083 14261 14438 14616 14794 14955 15133 15294 15399"
+       y="17606"
+       id="tspan609">drbd_endio_read_sec()</tspan>
+  </text>
+  <text
+     id="text623"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12191 12420 12597 12775 12953 13131 13309 13486 13664 13825 13986 14164 14426 14604 14710 14871 15049 15154 15332 15510 15616"
+       y="18007"
+       id="tspan625">w_e_end_csum_rs_req()</tspan>
+  </text>
+  <text
+     id="text639"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="5735 5913 6091 6180 6357 6446 6607 6696 6874 7085 7246 7424 7585 7691"
+       y="18507"
+       id="tspan641">got_IsInSync()</tspan>
+  </text>
+  <text
+     id="text655"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="7999 8304 8541 8778 8990 9201 9413 9650 10001 10120 10357 10594 10806 11043 11280 11398 11703 11940 12152 12364 12601 12812 12931 13049 13261 13498 13710 13947 14065 14159 14396 14514 14726 14937 15175"
+       y="13877"
+       id="tspan657">Checksum based Resync, case in sync</tspan>
+  </text>
+  <path
+     d="M 12000,24601 L 11900,24301 L 12100,24301 L 12000,24601 z"
+     id="path667"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 12000,23801 L 12000,24361"
+     id="path671"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 8000,26401 L 7900,26101 L 8100,26101 L 8000,26401 z"
+     id="path683"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 8000,25601 L 8000,26161"
+     id="path687"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 12000,23801 L 11686,23840 L 11725,23644 L 12000,23801 z"
+     id="path699"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 8000,23001 L 11765,23754"
+     id="path703"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <g
+     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,-3543.8452,1630.5143)"
+     id="g719"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text721">
+      <tspan
+         x="9464 9710 9921 10150 10328 10505 10577"
+         y="25236"
+         id="tspan723">OVReply</tspan>
+    </text>
+  </g>
+  <path
+     d="M 8000,25601 L 8282,25458 L 8312,25655 L 8000,25601 z"
+     id="path733"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 12000,25001 L 8237,25565"
+     id="path737"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <g
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,4918.2801,-1381.2128)"
+     id="g753"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text755">
+      <tspan
+         x="9142 9388 9599 9828 10006 10183 10361 10539 10700"
+         y="23106"
+         id="tspan757">OVRequest</tspan>
+    </text>
+  </g>
+  <text
+     id="text771"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12200 12306 12484 12645 12822 12894 13055 13233 13411 13656 13868 14097 14274 14452 14630 14808 14969 15058 15163"
+       y="23806"
+       id="tspan773">receive_OVRequest()</tspan>
+  </text>
+  <text
+     id="text787"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12200 12378 12484 12661 12839 13017 13195 13373 13550 13622 13800 13978 14084 14262 14439 14617 14795 14956 15134 15295 15400"
+       y="24606"
+       id="tspan789">drbd_endio_read_sec()</tspan>
+  </text>
+  <text
+     id="text803"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12192 12421 12598 12776 12954 13132 13310 13487 13665 13843 14004 14182 14288 14465 14643 14749"
+       y="25007"
+       id="tspan805">w_e_end_ov_req()</tspan>
+  </text>
+  <text
+     id="text819"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="5101 5207 5385 5546 5723 5795 5956 6134 6312 6557 6769 6998 7175 7353 7425 7586 7692"
+       y="25507"
+       id="tspan821">receive_OVReply()</tspan>
+  </text>
+  <text
+     id="text835"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4492 4670 4776 4953 5131 5309 5487 5665 5842 5914 6092 6270 6376 6554 6731 6909 7087 7248 7426 7587 7692"
+       y="26407"
+       id="tspan837">drbd_endio_read_sec()</tspan>
+  </text>
+  <text
+     id="text851"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4902 5131 5308 5486 5664 5842 6020 6197 6375 6553 6714 6892 6998 7175 7353 7425 7586 7692"
+       y="26907"
+       id="tspan853">w_e_end_ov_reply()</tspan>
+  </text>
+  <path
+     d="M 12000,27601 L 11686,27640 L 11725,27444 L 12000,27601 z"
+     id="path863"
+     style="fill:#000080;visibility:visible" />
+  <path
+     d="M 8000,26801 L 11765,27554"
+     id="path867"
+     style="fill:none;stroke:#000080;visibility:visible" />
+  <g
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,5704.1907,-1328.312)"
+     id="g883"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <text
+       id="text885">
+      <tspan
+         x="9279 9525 9736 9965 10143 10303 10481 10553"
+         y="26935"
+         id="tspan887">OVResult</tspan>
+    </text>
+  </g>
+  <text
+     id="text901"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12200 12378 12556 12645 12822 13068 13280 13508 13686 13847 14025 14097 14185 14291"
+       y="27559"
+       id="tspan903">got_OVResult()</tspan>
+  </text>
+  <text
+     id="text917"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="8000 8330 8567 8660 8754 8991 9228 9346 9558 9795 9935 10028 10146"
+       y="21877"
+       id="tspan919">Online verify</tspan>
+  </text>
+  <text
+     id="text933"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4641 4870 5047 5310 5488 5649 5826 6004 6182 6343 6521 6626 6804 6982 7160 7338 7499 7587 7693"
+       y="23005"
+       id="tspan935">w_make_ov_request()</tspan>
+  </text>
+  <path
+     d="M 8000,6500 L 7900,6200 L 8100,6200 L 8000,6500 z"
+     id="path945"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 8000,5700 L 8000,6260"
+     id="path949"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 3900,5500 L 3700,5500 L 3700,11000 L 3900,11000"
+     id="path961"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <path
+     d="M 3900,14500 L 3700,14500 L 3700,18600 L 3900,18600"
+     id="path973"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <path
+     d="M 3900,22800 L 3700,22800 L 3700,26900 L 3900,26900"
+     id="path985"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <text
+     id="text1001"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4492 4670 4776 4953 5131 5309 5487 5665 5842 5914 6092 6270 6376 6554 6731 6909 7087 7248 7426 7587 7692"
+       y="6506"
+       id="tspan1003">drbd_endio_read_sec()</tspan>
+  </text>
+  <text
+     id="text1017"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4034 4263 4440 4703 4881 5042 5219 5397 5503 5681 5842 6003 6180 6341 6519 6625 6803 6980 7158 7336 7497 7586 7692"
+       y="14708"
+       id="tspan1019">w_make_resync_request()</tspan>
+  </text>
+  <text
+     id="text1033"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="5190 5419 5596 5774 5952 6113 6291 6468 6646 6824 6985 7146 7324 7586 7692"
+       y="16006"
+       id="tspan1035">w_e_send_csum()</tspan>
+  </text>
+  <path
+     d="M 8000,15501 L 7900,15201 L 8100,15201 L 8000,15501 z"
+     id="path1045"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 8000,14701 L 8000,15261"
+     id="path1049"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <text
+     id="text1065"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4492 4670 4776 4953 5131 5309 5487 5665 5842 5914 6092 6270 6376 6554 6731 6909 7087 7248 7426 7587 7692"
+       y="15507"
+       id="tspan1067">drbd_endio_read_sec()</tspan>
+  </text>
+  <path
+     d="M 16100,9000 L 16300,9000 L 16300,7500 L 16100,7500"
+     id="path1077"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <path
+     d="M 16100,18000 L 16300,18000 L 16300,16500 L 16100,16500"
+     id="path1089"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <path
+     d="M 16100,25000 L 16300,25000 L 16300,23500 L 16100,23500"
+     id="path1101"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <text
+     id="text1117"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="2026 2132 2293 2471 2648 2826 3004 3076 3254 3431 3503 3681 3787"
+       y="5402"
+       id="tspan1119">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text1133"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="2027 2133 2294 2472 2649 2827 3005 3077 3255 3432 3504 3682 3788"
+       y="14402"
+       id="tspan1135">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text1149"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="2026 2132 2293 2471 2648 2826 3004 3076 3254 3431 3503 3681 3787"
+       y="22602"
+       id="tspan1151">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text1165"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="1426 1532 1693 1871 2031 2209 2472 2649 2721 2899 2988 3166 3344 3416 3593 3699"
+       y="11302"
+       id="tspan1167">rs_complete_io()</tspan>
+  </text>
+  <text
+     id="text1181"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="1526 1632 1793 1971 2131 2309 2572 2749 2821 2999 3088 3266 3444 3516 3693 3799"
+       y="18931"
+       id="tspan1183">rs_complete_io()</tspan>
+  </text>
+  <text
+     id="text1197"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="1526 1632 1793 1971 2131 2309 2572 2749 2821 2999 3088 3266 3444 3516 3693 3799"
+       y="27231"
+       id="tspan1199">rs_complete_io()</tspan>
+  </text>
+  <text
+     id="text1213"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16126 16232 16393 16571 16748 16926 17104 17176 17354 17531 17603 17781 17887"
+       y="7402"
+       id="tspan1215">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text1229"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16127 16233 16394 16572 16749 16927 17105 17177 17355 17532 17604 17782 17888"
+       y="16331"
+       id="tspan1231">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text1245"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16127 16233 16394 16572 16749 16927 17105 17177 17355 17532 17604 17782 17888"
+       y="23302"
+       id="tspan1247">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text1261"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16115 16221 16382 16560 16720 16898 17161 17338 17410 17588 17677 17855 18033 18105 18282 18388"
+       y="9302"
+       id="tspan1263">rs_complete_io()</tspan>
+  </text>
+  <text
+     id="text1277"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16115 16221 16382 16560 16720 16898 17161 17338 17410 17588 17677 17855 18033 18105 18282 18388"
+       y="18331"
+       id="tspan1279">rs_complete_io()</tspan>
+  </text>
+  <text
+     id="text1293"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16126 16232 16393 16571 16731 16909 17172 17349 17421 17599 17688 17866 18044 18116 18293 18399"
+       y="25302"
+       id="tspan1295">rs_complete_io()</tspan>
+  </text>
+</svg>
diff --git a/Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg b/Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg
new file mode 100644
index 000000000000..48a1e2165fec
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg
@@ -0,0 +1,459 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!-- Created with Inkscape (http://www.inkscape.org/) -->
+<svg
+   xmlns:svg="http://www.w3.org/2000/svg"
+   xmlns="http://www.w3.org/2000/svg"
+   version="1.0"
+   width="210mm"
+   height="297mm"
+   viewBox="0 0 21000 29700"
+   id="svg2"
+   style="fill-rule:evenodd">
+  <defs
+     id="defs4" />
+  <g
+     id="Default"
+     style="visibility:visible">
+    <desc
+       id="desc176">Master slide</desc>
+  </g>
+  <path
+     d="M 11999,19601 L 11899,19301 L 12099,19301 L 11999,19601 z"
+     id="path189"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 11999,18801 L 11999,19361"
+     id="path193"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 7999,21401 L 7899,21101 L 8099,21101 L 7999,21401 z"
+     id="path205"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 7999,20601 L 7999,21161"
+     id="path209"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 11999,18801 L 11685,18840 L 11724,18644 L 11999,18801 z"
+     id="path221"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 7999,18001 L 11764,18754"
+     id="path225"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <text
+     x="-3023.845"
+     y="1106.8124"
+     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,0,0)"
+     id="text243"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="6115.1553 6344.1553 6555.1553 6784.1553 6962.1553 7051.1553 7228.1553 7457.1553 7635.1553 7813.1553 7885.1553"
+       y="21390.812"
+       id="tspan245">RSDataReply</tspan>
+  </text>
+  <path
+     d="M 7999,20601 L 8281,20458 L 8311,20655 L 7999,20601 z"
+     id="path255"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 11999,20001 L 8236,20565"
+     id="path259"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <text
+     x="3502.5356"
+     y="-2184.6621"
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
+     id="text277"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12321.536 12550.536 12761.536 12990.536 13168.536 13257.536 13434.536 13663.536 13841.536 14019.536 14196.536 14374.536 14535.536"
+       y="15854.338"
+       id="tspan279">RSDataRequest</tspan>
+  </text>
+  <text
+     id="text293"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4034 4263 4440 4703 4881 5042 5219 5397 5503 5681 5842 6003 6180 6341 6519 6625 6803 6980 7158 7336 7497 7586 7692"
+       y="17807"
+       id="tspan295">w_make_resync_request()</tspan>
+  </text>
+  <text
+     id="text309"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12305 12483 12644 12821 12893 13054 13232 13410 13638 13816 13905 14083 14311 14489 14667 14845 15023 15184 15272 15378"
+       y="18806"
+       id="tspan311">receive_DataRequest()</tspan>
+  </text>
+  <text
+     id="text325"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12377 12483 12660 12838 13016 13194 13372 13549 13621 13799 13977 14083 14261 14438 14616 14794 14955 15133 15294 15399"
+       y="19606"
+       id="tspan327">drbd_endio_read_sec()</tspan>
+  </text>
+  <text
+     id="text341"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12191 12420 12597 12775 12953 13131 13309 13486 13664 13770 13931 14109 14287 14375 14553 14731 14837 15015 15192 15298"
+       y="20007"
+       id="tspan343">w_e_end_rsdata_req()</tspan>
+  </text>
+  <text
+     id="text357"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4444 4550 4728 4889 5066 5138 5299 5477 5655 5883 6095 6324 6501 6590 6768 6997 7175 7352 7424 7585 7691"
+       y="20507"
+       id="tspan359">receive_RSDataReply()</tspan>
+  </text>
+  <text
+     id="text373"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4457 4635 4741 4918 5096 5274 5452 5630 5807 5879 6057 6235 6464 6569 6641 6730 6908 7086 7247 7425 7585 7691"
+       y="21407"
+       id="tspan375">drbd_endio_write_sec()</tspan>
+  </text>
+  <text
+     id="text389"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4647 4825 5003 5180 5358 5536 5714 5820 5997 6158 6319 6497 6658 6836 7013 7085 7263 7424 7585 7691"
+       y="21907"
+       id="tspan391">e_end_resync_block()</tspan>
+  </text>
+  <path
+     d="M 11999,22601 L 11685,22640 L 11724,22444 L 11999,22601 z"
+     id="path401"
+     style="fill:#000080;visibility:visible" />
+  <path
+     d="M 7999,21801 L 11764,22554"
+     id="path405"
+     style="fill:none;stroke:#000080;visibility:visible" />
+  <text
+     x="4290.3008"
+     y="-2369.6162"
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
+     id="text423"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="13610.301 13911.301 14016.301 14088.301 14177.301 14355.301 14567.301 14728.301"
+       y="19573.385"
+       id="tspan425">WriteAck</tspan>
+  </text>
+  <text
+     id="text439"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12199 12377 12555 12644 12821 13033 13105 13283 13444 13604 13816 13977 14138 14244"
+       y="22559"
+       id="tspan441">got_BlockAck()</tspan>
+  </text>
+  <text
+     id="text455"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="7999 8304 8541 8753 8964 9201 9413 9531 9769 9862 10099 10310 10522 10734 10852 10971 11208 11348 11585 11822"
+       y="16877"
+       id="tspan457">Resync blocks, 4-32K</tspan>
+  </text>
+  <path
+     d="M 12000,7601 L 11900,7301 L 12100,7301 L 12000,7601 z"
+     id="path467"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 12000,6801 L 12000,7361"
+     id="path471"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 12000,6801 L 11686,6840 L 11725,6644 L 12000,6801 z"
+     id="path483"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 8000,6001 L 11765,6754"
+     id="path487"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <text
+     x="-1288.1796"
+     y="1279.7666"
+     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,0,0)"
+     id="text505"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="8174.8208 8475.8203 8580.8203 8652.8203 8741.8203 8919.8203 9131.8203 9292.8203"
+       y="9516.7666"
+       id="tspan507">WriteAck</tspan>
+  </text>
+  <path
+     d="M 8000,8601 L 8282,8458 L 8312,8655 L 8000,8601 z"
+     id="path517"
+     style="fill:#000080;visibility:visible" />
+  <path
+     d="M 12000,8001 L 8237,8565"
+     id="path521"
+     style="fill:none;stroke:#000080;visibility:visible" />
+  <text
+     x="1065.6655"
+     y="-2097.7664"
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
+     id="text539"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="10682.666 10911.666 11088.666 11177.666"
+       y="4107.2339"
+       id="tspan541">Data</tspan>
+  </text>
+  <text
+     id="text555"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4746 4924 5030 5207 5385 5563 5826 6003 6164 6342 6520 6626 6803 6981 7159 7337 7498 7587 7692"
+       y="5505"
+       id="tspan557">drbd_make_request()</tspan>
+  </text>
+  <text
+     id="text571"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12200 12306 12484 12645 12822 12894 13055 13233 13411 13639 13817 13906 14084 14190"
+       y="6806"
+       id="tspan573">receive_Data()</tspan>
+  </text>
+  <text
+     id="text587"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12200 12378 12484 12661 12839 13017 13195 13373 13550 13622 13800 13978 14207 14312 14384 14473 14651 14829 14990 15168 15328 15434"
+       y="7606"
+       id="tspan589">drbd_endio_write_sec()</tspan>
+  </text>
+  <text
+     id="text603"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12192 12370 12548 12725 12903 13081 13259 13437 13509 13686 13847 14008 14114"
+       y="8007"
+       id="tspan605">e_end_block()</tspan>
+  </text>
+  <text
+     id="text619"
+     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="5647 5825 6003 6092 6269 6481 6553 6731 6892 7052 7264 7425 7586 7692"
+       y="8606"
+       id="tspan621">got_BlockAck()</tspan>
+  </text>
+  <text
+     id="text635"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="8000 8305 8542 8779 9016 9109 9346 9486 9604 9956 10049 10189 10328 10565 10705 10942 11179 11298 11603 11742 11835 11954 12191 12310 12428 12665 12902 13139 13279 13516 13753"
+       y="4877"
+       id="tspan637">Regular mirrored write, 512-32K</tspan>
+  </text>
+  <text
+     id="text651"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="5381 5610 5787 5948 6126 6304 6482 6659 6837 7015 7087 7265 7426 7587 7692"
+       y="6003"
+       id="tspan653">w_send_dblock()</tspan>
+  </text>
+  <path
+     d="M 8000,6800 L 7900,6500 L 8100,6500 L 8000,6800 z"
+     id="path663"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 8000,6000 L 8000,6560"
+     id="path667"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <text
+     id="text683"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4602 4780 4886 5063 5241 5419 5597 5775 5952 6024 6202 6380 6609 6714 6786 6875 7053 7231 7409 7515 7587 7692"
+       y="6905"
+       id="tspan685">drbd_endio_write_pri()</tspan>
+  </text>
+  <path
+     d="M 12000,13602 L 11900,13302 L 12100,13302 L 12000,13602 z"
+     id="path695"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 12000,12802 L 12000,13362"
+     id="path699"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <path
+     d="M 12000,12802 L 11686,12841 L 11725,12645 L 12000,12802 z"
+     id="path711"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 8000,12002 L 11765,12755"
+     id="path715"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <text
+     x="-2155.5266"
+     y="1201.5964"
+     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,0,0)"
+     id="text733"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="7202.4736 7431.4736 7608.4736 7697.4736 7875.4736 8104.4736 8282.4736 8459.4736 8531.4736"
+       y="15454.597"
+       id="tspan735">DataReply</tspan>
+  </text>
+  <path
+     d="M 8000,14602 L 8282,14459 L 8312,14656 L 8000,14602 z"
+     id="path745"
+     style="fill:#008000;visibility:visible" />
+  <path
+     d="M 12000,14002 L 8237,14566"
+     id="path749"
+     style="fill:none;stroke:#008000;visibility:visible" />
+  <text
+     x="2280.3804"
+     y="-2103.2141"
+     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
+     id="text767"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="11316.381 11545.381 11722.381 11811.381 11989.381 12218.381 12396.381 12573.381 12751.381 12929.381 13090.381"
+       y="9981.7861"
+       id="tspan769">DataRequest</tspan>
+  </text>
+  <text
+     id="text783"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="4746 4924 5030 5207 5385 5563 5826 6003 6164 6342 6520 6626 6803 6981 7159 7337 7498 7587 7692"
+       y="11506"
+       id="tspan785">drbd_make_request()</tspan>
+  </text>
+  <text
+     id="text799"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12200 12306 12484 12645 12822 12894 13055 13233 13411 13639 13817 13906 14084 14312 14490 14668 14846 15024 15185 15273 15379"
+       y="12807"
+       id="tspan801">receive_DataRequest()</tspan>
+  </text>
+  <text
+     id="text815"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12200 12378 12484 12661 12839 13017 13195 13373 13550 13622 13800 13978 14084 14262 14439 14617 14795 14956 15134 15295 15400"
+       y="13607"
+       id="tspan817">drbd_endio_read_sec()</tspan>
+  </text>
+  <text
+     id="text831"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="12192 12421 12598 12776 12954 13132 13310 13487 13665 13843 14021 14110 14288 14465 14571 14749 14927 15033"
+       y="14008"
+       id="tspan833">w_e_end_data_req()</tspan>
+  </text>
+  <g
+     id="g835"
+     style="visibility:visible">
+    <desc
+       id="desc837">Drawing</desc>
+    <text
+       id="text847"
+       style="font-size:318px;font-weight:400;fill:#008000;font-family:Helvetica embedded">
+      <tspan
+         x="4885 4991 5169 5330 5507 5579 5740 5918 6096 6324 6502 6591 6769 6997 7175 7353 7425 7586 7692"
+         y="14607"
+         id="tspan849">receive_DataReply()</tspan>
+    </text>
+  </g>
+  <text
+     id="text863"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="8000 8305 8398 8610 8821 8914 9151 9363 9575 9693 9833 10070 10307 10544 10663 10781 11018 11255 11493 11632 11869 12106"
+       y="10878"
+       id="tspan865">Diskless read, 512-32K</tspan>
+  </text>
+  <text
+     id="text879"
+     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="5029 5258 5435 5596 5774 5952 6130 6307 6413 6591 6769 6947 7125 7230 7408 7586 7692"
+       y="12004"
+       id="tspan881">w_send_read_req()</tspan>
+  </text>
+  <text
+     id="text895"
+     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="6961 7266 7571 7854 8159 8278 8515 8633 8870 9107 9226 9463 9581 9700 9793 10030"
+       y="2806"
+       id="tspan897">DRBD 8 data flow</tspan>
+  </text>
+  <path
+     d="M 3900,5300 L 3700,5300 L 3700,7000 L 3900,7000"
+     id="path907"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <path
+     d="M 3900,17600 L 3700,17600 L 3700,22000 L 3900,22000"
+     id="path919"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <path
+     d="M 16100,20000 L 16300,20000 L 16300,18500 L 16100,18500"
+     id="path931"
+     style="fill:none;stroke:#000000;visibility:visible" />
+  <text
+     id="text947"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="2126 2304 2376 2554 2731 2909 3087 3159 3337 3515 3587 3764 3870"
+       y="5202"
+       id="tspan949">al_begin_io()</tspan>
+  </text>
+  <text
+     id="text963"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="1632 1810 1882 2060 2220 2398 2661 2839 2910 3088 3177 3355 3533 3605 3783 3888"
+       y="7331"
+       id="tspan965">al_complete_io()</tspan>
+  </text>
+  <text
+     id="text979"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="2126 2232 2393 2571 2748 2926 3104 3176 3354 3531 3603 3781 3887"
+       y="17431"
+       id="tspan981">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text995"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="1626 1732 1893 2071 2231 2409 2672 2849 2921 3099 3188 3366 3544 3616 3793 3899"
+       y="22331"
+       id="tspan997">rs_complete_io()</tspan>
+  </text>
+  <text
+     id="text1011"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16027 16133 16294 16472 16649 16827 17005 17077 17255 17432 17504 17682 17788"
+       y="18402"
+       id="tspan1013">rs_begin_io()</tspan>
+  </text>
+  <text
+     id="text1027"
+     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
+    <tspan
+       x="16115 16221 16382 16560 16720 16898 17161 17338 17410 17588 17677 17855 18033 18105 18282 18388"
+       y="20331"
+       id="tspan1029">rs_complete_io()</tspan>
+  </text>
+</svg>
diff --git a/Documentation/admin-guide/blockdev/drbd/conn-states-8.dot b/Documentation/admin-guide/blockdev/drbd/conn-states-8.dot
new file mode 100644
index 000000000000..025e8cf5e64a
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/conn-states-8.dot
@@ -0,0 +1,18 @@
+digraph conn_states {
+	StandAlone   -> WFConnection   [ label = "ioctl_set_net()" ]
+	WFConnection -> Unconnected    [ label = "unable to bind()" ]
+	WFConnection -> WFReportParams [ label = "in connect() after accept" ]
+	WFReportParams -> StandAlone   [ label = "checks in receive_param()" ]
+	WFReportParams -> Connected    [ label = "in receive_param()" ]
+	WFReportParams -> WFBitMapS    [ label = "sync_handshake()" ]
+	WFReportParams -> WFBitMapT    [ label = "sync_handshake()" ]
+	WFBitMapS -> SyncSource        [ label = "receive_bitmap()" ]
+	WFBitMapT -> SyncTarget        [ label = "receive_bitmap()" ]
+	SyncSource -> Connected
+	SyncTarget -> Connected
+	SyncSource -> PausedSyncS
+	SyncTarget -> PausedSyncT
+	PausedSyncS -> SyncSource
+	PausedSyncT -> SyncTarget
+	Connected   -> WFConnection    [ label = "* on network error" ]
+}
diff --git a/Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst b/Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst
new file mode 100644
index 000000000000..66036b901644
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst
@@ -0,0 +1,42 @@
+================================
+kernel data structure for DRBD-9
+================================
+
+This describes the in-kernel data structure for DRBD-9. Starting with
+Linux v3.14, we are reorganizing DRBD to use this data structure.
+
+Basic Data Structure
+====================
+
+A node has a number of DRBD resources.  Each such resource has a number of
+devices (aka volumes) and connections to other nodes ("peer nodes"). Each DRBD
+device is represented by a block device locally.
+
+The DRBD objects are interconnected to form a matrix as depicted below; a
+drbd_peer_device object sits at each intersection between a drbd_device and a
+drbd_connection::
+
+  /--------------+---------------+.....+---------------\
+  |   resource   |    device     |     |    device     |
+  +--------------+---------------+.....+---------------+
+  |  connection  |  peer_device  |     |  peer_device  |
+  +--------------+---------------+.....+---------------+
+  :              :               :     :               :
+  :              :               :     :               :
+  +--------------+---------------+.....+---------------+
+  |  connection  |  peer_device  |     |  peer_device  |
+  \--------------+---------------+.....+---------------/
+
+In this table, horizontally, devices can be accessed from resources by their
+volume number.  Likewise, peer_devices can be accessed from connections by
+their volume number.  Objects in the vertical direction are connected by doubly
+linked lists.  There are back pointers from peer_devices to their connections
+and devices, and from connections and devices to their resource.
+
+All resources are in the drbd_resources doubly linked list.  In addition, all
+devices can be accessed by their minor device number via the drbd_devices idr.
+
+The drbd_resource, drbd_connection, and drbd_device objects are reference
+counted.  The peer_device objects only serve to establish the links between
+devices and connections; their lifetime is determined by the lifetime of the
+device and connection which they reference.
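
The matrix described in data-structure-v9.rst above can be sketched in
user-space C as follows. This is a minimal illustration under stated
assumptions, not the DRBD implementation: every `sketch_*` name, the fixed
`MAX_VOLUMES` bound, and the helper functions are hypothetical. The kernel
uses idr lookups and reference counting, and also chains peer_devices
vertically; here only the connections are chained, to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VOLUMES 4	/* hypothetical bound; the kernel uses an idr instead */

struct sketch_resource;
struct sketch_connection;

/* A device (volume) of a resource: one column of the matrix. */
struct sketch_device {
	struct sketch_resource *resource;	/* back pointer */
	int vnr;				/* volume number */
};

/* One cell of the matrix: where a device and a connection intersect. */
struct sketch_peer_device {
	struct sketch_device *device;		/* back pointer (column) */
	struct sketch_connection *connection;	/* back pointer (row) */
};

/* A connection to a peer node: one row of the matrix. */
struct sketch_connection {
	struct sketch_resource *resource;	/* back pointer */
	struct sketch_connection *prev, *next;	/* vertical doubly linked list */
	struct sketch_peer_device peer_devices[MAX_VOLUMES]; /* by volume number */
};

struct sketch_resource {
	struct sketch_device devices[MAX_VOLUMES];	/* by volume number */
	int nr_devices;
	struct sketch_connection *connections;		/* list head */
};

/* Set up the top row: the resource and its devices. */
static void resource_init(struct sketch_resource *res, int nr_devices)
{
	assert(nr_devices >= 0 && nr_devices <= MAX_VOLUMES);
	res->nr_devices = nr_devices;
	res->connections = NULL;
	for (int vnr = 0; vnr < nr_devices; vnr++) {
		res->devices[vnr].resource = res;
		res->devices[vnr].vnr = vnr;
	}
}

/* Add a row: link the connection in and fill one peer_device per device. */
static void resource_add_connection(struct sketch_resource *res,
				    struct sketch_connection *conn)
{
	conn->resource = res;
	conn->prev = NULL;
	conn->next = res->connections;
	if (res->connections)
		res->connections->prev = conn;
	res->connections = conn;
	for (int vnr = 0; vnr < res->nr_devices; vnr++) {
		conn->peer_devices[vnr].device = &res->devices[vnr];
		conn->peer_devices[vnr].connection = conn;
	}
}

/* Horizontal access: look up a peer_device from a connection by volume number. */
static struct sketch_peer_device *
conn_peer_device(struct sketch_connection *conn, int vnr)
{
	if (vnr < 0 || vnr >= conn->resource->nr_devices)
		return NULL;
	return &conn->peer_devices[vnr];
}
```

Because the peer_device cells are embedded in their connection, they cannot
outlive it in this sketch. That loosely mirrors the statement that a
peer_device's lifetime is determined by the device and connection it
references, although the real code manages this through the reference counts
on those objects.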
diff --git a/Documentation/admin-guide/blockdev/drbd/disk-states-8.dot b/Documentation/admin-guide/blockdev/drbd/disk-states-8.dot
new file mode 100644
index 000000000000..d06cfb46fb98
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/disk-states-8.dot
@@ -0,0 +1,16 @@
+digraph disk_states {
+	Diskless -> Inconsistent       [ label = "ioctl_set_disk()" ]
+	Diskless -> Consistent         [ label = "ioctl_set_disk()" ]
+	Diskless -> Outdated           [ label = "ioctl_set_disk()" ]
+	Consistent -> Outdated         [ label = "receive_param()" ]
+	Consistent -> UpToDate         [ label = "receive_param()" ]
+	Consistent -> Inconsistent     [ label = "start resync" ]
+	Outdated   -> Inconsistent     [ label = "start resync" ]
+	UpToDate   -> Inconsistent     [ label = "ioctl_replicate" ]
+	Inconsistent -> UpToDate       [ label = "resync completed" ]
+	Consistent -> Failed           [ label = "io completion error" ]
+	Outdated   -> Failed           [ label = "io completion error" ]
+	UpToDate   -> Failed           [ label = "io completion error" ]
+	Inconsistent -> Failed         [ label = "io completion error" ]
+	Failed -> Diskless             [ label = "sending notify to peer" ]
+}
diff --git a/Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot b/Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot
new file mode 100644
index 000000000000..6d9cf0a7b11d
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot
@@ -0,0 +1,85 @@
+// vim: set sw=2 sts=2 :
+digraph {
+  rankdir=BT
+  bgcolor=white
+
+  node [shape=plaintext]
+  node [fontcolor=black]
+
+  StandAlone     [ style=filled,fillcolor=gray,label=StandAlone ]
+
+  node [fontcolor=lightgray]
+
+  Unconnected    [ label=Unconnected ]
+
+  CommTrouble [ shape=record,
+    label="{communication loss|{Timeout|BrokenPipe|NetworkFailure}}" ]
+
+  node [fontcolor=gray]
+
+  subgraph cluster_try_connect {
+    label="try to connect, handshake"
+    rank=max
+    WFConnection   [ label=WFConnection ]
+    WFReportParams [ label=WFReportParams ]
+  }
+
+  TearDown       [ label=TearDown ]
+
+  Connected      [ label=Connected,style=filled,fillcolor=green,fontcolor=black ]
+
+  node [fontcolor=lightblue]
+
+  StartingSyncS  [ label=StartingSyncS ]
+  StartingSyncT  [ label=StartingSyncT ]
+
+  subgraph cluster_bitmap_exchange {
+    node [fontcolor=red]
+    fontcolor=red
+    label="new application (WRITE?) requests blocked\lwhile bitmap is exchanged"
+
+    WFBitMapT      [ label=WFBitMapT ]
+    WFSyncUUID     [ label=WFSyncUUID ]
+    WFBitMapS      [ label=WFBitMapS ]
+  }
+
+  node [fontcolor=blue]
+
+  cluster_resync [ shape=record,label="{<any>resynchronisation process running\l'concurrent' application requests allowed|{{<T>PausedSyncT\nSyncTarget}|{<S>PausedSyncS\nSyncSource}}}" ]
+
+  node [shape=box,fontcolor=black]
+
+  // drbdadm [label="drbdadm connect"]
+  // handshake [label="drbd_connect()\ndrbd_do_handshake\ndrbd_sync_handshake() etc."]
+  // comm_error [label="communication trouble"]
+
+  //
+  // edges
+  // --------------------------------------
+
+  StandAlone -> Unconnected [ label="drbdadm connect" ]
+  Unconnected -> StandAlone  [ label="drbdadm disconnect\lor serious communication trouble" ]
+  Unconnected -> WFConnection [ label="receiver thread is started" ]
+  WFConnection -> WFReportParams [ headlabel="accept()\land/or                        \lconnect()\l" ]
+
+  WFReportParams -> StandAlone [ label="during handshake\lpeers do not agree\labout something essential" ]
+  WFReportParams -> Connected [ label="data identical\lno sync needed",color=green,fontcolor=green ]
+
+    WFReportParams -> WFBitMapS
+    WFReportParams -> WFBitMapT
+    WFBitMapT -> WFSyncUUID [minlen=0.1,constraint=false]
+
+      WFBitMapS -> cluster_resync:S
+      WFSyncUUID -> cluster_resync:T
+
+  edge [color=green]
+  cluster_resync:any -> Connected [ label="resync done",fontcolor=green ]
+
+  edge [color=red]
+  WFReportParams -> CommTrouble
+  Connected -> CommTrouble
+  cluster_resync:any -> CommTrouble
+  edge [color=black]
+  CommTrouble -> Unconnected [label="receiver thread is stopped" ]
+
+}
diff --git a/Documentation/admin-guide/blockdev/drbd/figures.rst b/Documentation/admin-guide/blockdev/drbd/figures.rst
new file mode 100644
index 000000000000..3e3fd4b8a478
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/figures.rst
@@ -0,0 +1,28 @@
+.. The files included here are intended to help understand the implementation
+
+Data flows that relate some functions, and write packets
+========================================================
+
+.. kernel-figure:: DRBD-8.3-data-packets.svg
+    :alt:   DRBD-8.3-data-packets.svg
+    :align: center
+
+.. kernel-figure:: DRBD-data-packets.svg
+    :alt:   DRBD-data-packets.svg
+    :align: center
+
+
+Sub graphs of DRBD's state transitions
+======================================
+
+.. kernel-figure:: conn-states-8.dot
+    :alt:   conn-states-8.dot
+    :align: center
+
+.. kernel-figure:: disk-states-8.dot
+    :alt:   disk-states-8.dot
+    :align: center
+
+.. kernel-figure:: node-states-8.dot
+    :alt:   node-states-8.dot
+    :align: center
diff --git a/Documentation/admin-guide/blockdev/drbd/index.rst b/Documentation/admin-guide/blockdev/drbd/index.rst
new file mode 100644
index 000000000000..68ecd5c113e9
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/index.rst
@@ -0,0 +1,19 @@
+==========================================
+Distributed Replicated Block Device - DRBD
+==========================================
+
+Description
+===========
+
+  DRBD is a shared-nothing, synchronously replicated block device. It
+  is designed to serve as a building block for high availability
+  clusters and in this context, is a "drop-in" replacement for shared
+  storage. Simplistically, you could see it as a network RAID 1.
+
+  Please visit http://www.drbd.org to find out more.
+
+.. toctree::
+   :maxdepth: 1
+
+   data-structure-v9
+   figures
diff --git a/Documentation/admin-guide/blockdev/drbd/node-states-8.dot b/Documentation/admin-guide/blockdev/drbd/node-states-8.dot
new file mode 100644
index 000000000000..bfa54e1f8016
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/drbd/node-states-8.dot
@@ -0,0 +1,13 @@
+digraph node_states {
+	Secondary -> Primary           [ label = "ioctl_set_state()" ]
+	Primary   -> Secondary 	       [ label = "ioctl_set_state()" ]
+}
+
+digraph peer_states {
+	Secondary -> Primary           [ label = "recv state packet" ]
+	Primary   -> Secondary 	       [ label = "recv state packet" ]
+	Primary   -> Unknown 	       [ label = "connection lost" ]
+	Secondary  -> Unknown  	       [ label = "connection lost" ]
+	Unknown   -> Primary           [ label = "connected" ]
+	Unknown   -> Secondary         [ label = "connected" ]
+}
diff --git a/Documentation/admin-guide/blockdev/floppy.rst b/Documentation/admin-guide/blockdev/floppy.rst
new file mode 100644
index 000000000000..4a8f31cf4139
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/floppy.rst
@@ -0,0 +1,255 @@
+=============
+Floppy Driver
+=============
+
+FAQ list:
+=========
+
+A FAQ list may be found in the fdutils package (see below), and also
+at <http://fdutils.linux.lu/faq.html>.
+
+
+LILO configuration options (Thinkpad users, read this)
+======================================================
+
+The floppy driver is configured using the 'floppy=' option in
+lilo. This option can be typed at the boot prompt, or entered in the
+lilo configuration file.
+
+Example: If your kernel is called linux-2.6.9, type the following line
+at the lilo boot prompt (if you have a thinkpad)::
+
+ linux-2.6.9 floppy=thinkpad
+
+You may also enter the following line in /etc/lilo.conf, in the description
+of linux-2.6.9::
+
+ append = "floppy=thinkpad"
+
+Several floppy-related options may be given, for example::
+
+ linux-2.6.9 floppy=daring floppy=two_fdc
+ append = "floppy=daring floppy=two_fdc"
+
+If you give options both in the lilo config file and on the boot
+prompt, the option strings of both places are concatenated, the boot
+prompt options coming last. That's why there are also options to
+restore the default behavior.
+
+
+Module configuration options
+============================
+
+If you use the floppy driver as a module, use the following syntax::
+
+	modprobe floppy floppy="<options>"
+
+Example::
+
+	modprobe floppy floppy="omnibook messages"
+
+If you need certain options enabled every time you load the floppy driver,
+you can put::
+
+	options floppy floppy="omnibook messages"
+
+in a configuration file in /etc/modprobe.d/.
+
+
+The floppy driver related options are:
+
+ floppy=asus_pci
+	Sets the bit mask to allow only units 0 and 1. (default)
+
+ floppy=daring
+	Tells the floppy driver that you have a well behaved floppy controller.
+	This allows more efficient and smoother operation, but may fail on
+	certain controllers. This may speed up certain operations.
+
+ floppy=0,daring
+	Tells the floppy driver that your floppy controller should be used
+	with caution.
+
+ floppy=one_fdc
+	Tells the floppy driver that you have only one floppy controller.
+	(default)
+
+ floppy=two_fdc / floppy=<address>,two_fdc
+	Tells the floppy driver that you have two floppy controllers.
+	The second floppy controller is assumed to be at <address>.
+	This option is not needed if the second controller is at address
+	0x370, and if you use the 'cmos' option.
+
+ floppy=thinkpad
+	Tells the floppy driver that you have a Thinkpad. Thinkpads use an
+	inverted convention for the disk change line.
+
+ floppy=0,thinkpad
+	Tells the floppy driver that you don't have a Thinkpad.
+
+ floppy=omnibook / floppy=nodma
+	Tells the floppy driver not to use DMA for data transfers.
+	This is needed on HP Omnibooks, which don't have a workable
+	DMA channel for the floppy driver. This option is also useful
+	if you frequently get "Unable to allocate DMA memory" messages.
+	Indeed, DMA memory needs to be contiguous in physical memory,
+	and is thus harder to find, whereas non-DMA buffers may be
+	allocated in virtual memory. However, I advise against this if
+	you have an FDC without a FIFO (8272A or 82072). 82072A and
+	later are OK. You also need at least a 486 to use nodma.
+	If you use nodma mode, I suggest you also set the FIFO
+	threshold to 10 or lower, in order to limit the number of data
+	transfer interrupts.
+
+	If you have a FIFO-able FDC, the floppy driver automatically
+	falls back on non-DMA mode if no DMA-able memory can be found.
+	If you want to avoid this, explicitly ask for 'yesdma'.
+
+ floppy=yesdma
+	Tells the floppy driver that a workable DMA channel is available.
+	(default)
+
+ floppy=nofifo
+	Disables the FIFO entirely. This is needed if you get "Bus
+	master arbitration error" messages from your Ethernet card (or
+	from other devices) while accessing the floppy.
+
+ floppy=usefifo
+	Enables the FIFO. (default)
+
+ floppy=<threshold>,fifo_depth
+	Sets the FIFO threshold. This is mostly relevant in DMA
+	mode. If this is higher, the floppy driver tolerates more
+	interrupt latency, but it triggers more interrupts (i.e. it
+	imposes more load on the rest of the system). If this is
+	lower, the interrupt latency should be lower too (faster
+	processor). The benefit of a lower threshold is fewer
+	interrupts.
+
+	To tune the FIFO threshold, switch on over/underrun messages
+	using 'floppycontrol --messages'. Then access a floppy
+	disk. If you get a large number of "Over/Underrun - retrying"
+	messages, then the FIFO threshold is too low. Try with a
+	higher value, until you only get an occasional Over/Underrun.
+	It is a good idea to compile the floppy driver as a module
+	when doing this tuning, since that lets you try different
+	FIFO values without rebooting the machine for each test. Note
+	that you need to do 'floppycontrol --messages' every time you
+	re-insert the module.
+
+	Usually, tuning the FIFO threshold should not be needed, as
+	the default (0xa) is reasonable.
+
+ floppy=<drive>,<type>,cmos
+	Sets the CMOS type of <drive> to <type>. This is mandatory if
+	you have more than two floppy drives (only two can be
+	described in the physical CMOS), or if your BIOS uses
+	non-standard CMOS types. The CMOS types are:
+
+	       ==  ==================================
+		0  Use the value of the physical CMOS
+		1  5 1/4 DD
+		2  5 1/4 HD
+		3  3 1/2 DD
+		4  3 1/2 HD
+		5  3 1/2 ED
+		6  3 1/2 ED
+	       16  unknown or not installed
+	       ==  ==================================
+
+	(Note: there are two valid types for ED drives. This is because 5 was
+	initially chosen to represent floppy *tapes*, and 6 for ED drives.
+	AMI ignored this, and used 5 for ED drives. That's why the floppy
+	driver handles both.)
+
+ floppy=unexpected_interrupts
+	Print a warning message when an unexpected interrupt is received.
+	(default)
+
+ floppy=no_unexpected_interrupts / floppy=L40SX
+	Don't print a message when an unexpected interrupt is received. This
+	is needed on IBM L40SX laptops in certain video modes. (There seems
+	to be an interaction between video and floppy. The unexpected
+	interrupts affect only performance, and can be safely ignored.)
+
+ floppy=broken_dcl
+	Don't use the disk change line, but assume that the disk was
+	changed whenever the device node is reopened. Needed on some
+	boxes where the disk change line is broken or unsupported.
+	This should be regarded as a stopgap measure: it makes
+	floppy operation less efficient due to unneeded cache
+	flushes, and slightly less reliable. Please verify your
+	cable, connection and jumper settings if you have any DCL
+	problems. However, some older drives, and also some laptops
+	are known not to have a DCL.
+
+ floppy=debug
+	Print debugging messages.
+
+ floppy=messages
+	Print informational messages for some operations (disk change
+	notifications, warnings about over and underruns, and about
+	autodetection).
+
+ floppy=silent_dcl_clear
+	Uses a less noisy way to clear the disk change line (which
+	doesn't involve seeks). Implied by 'daring' option.
+
+ floppy=<nr>,irq
+	Sets the floppy IRQ to <nr> instead of 6.
+
+ floppy=<nr>,dma
+	Sets the floppy DMA channel to <nr> instead of 2.
+
+ floppy=slow
+	Use PS/2 stepping rate.
+
+	PS/2 floppies have much slower step rates than regular floppies.
+	It has been recommended to use about 1/4 of the default speed
+	in some of the more extreme cases.
+
+
+Supporting utilities and additional documentation:
+==================================================
+
+Additional parameters of the floppy driver can be configured at
+runtime. Utilities which do this can be found in the fdutils package.
+This package also contains a new version of mtools which allows
+access to high capacity disks (up to 1992K on a high density 3 1/2 disk!).
+It also contains additional documentation about the floppy driver.
+
+The latest version can be found at the fdutils homepage:
+
+ http://fdutils.linux.lu
+
+The fdutils releases can be found at:
+
+ http://fdutils.linux.lu/download.html
+
+ http://www.tux.org/pub/knaff/fdutils/
+
+ ftp://metalab.unc.edu/pub/Linux/utils/disk-management/
+
+Reporting problems about the floppy driver
+==========================================
+
+If you have a question or a bug report about the floppy driver, mail
+me at Alain.Knaff@poboxes.com . If you post to Usenet, preferably use
+comp.os.linux.hardware. As the volume in these groups is rather high,
+be sure to include the word "floppy" (or "FLOPPY") in the subject
+line.  If the reported problem happens when mounting floppy disks, be
+sure to mention also the type of the filesystem in the subject line.
+
+Be sure to read the FAQ before mailing/posting any bug reports!
+
+Alain
+
+Changelog
+=========
+
+10-30-2004 :
+		Cleanup, updating, add reference to module configuration.
+		James Nelson <james4765@gmail.com>
+
+6-3-2000 :
+		Original Document
diff --git a/Documentation/admin-guide/blockdev/index.rst b/Documentation/admin-guide/blockdev/index.rst
new file mode 100644
index 000000000000..20a738d9d047
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/index.rst
@@ -0,0 +1,14 @@
+=============
+Block Devices
+=============
+
+.. toctree::
+   :maxdepth: 1
+
+   floppy
+   nbd
+   paride
+   ramdisk
+   zram
+
+   drbd/index
diff --git a/Documentation/admin-guide/blockdev/nbd.rst b/Documentation/admin-guide/blockdev/nbd.rst
new file mode 100644
index 000000000000..d78dfe559dcf
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/nbd.rst
@@ -0,0 +1,31 @@
+==================================
+Network Block Device (TCP version)
+==================================
+
+1) Overview
+-----------
+
+What is it: With this compiled in the kernel (or as a module), Linux
+can use a remote server as one of its block devices. So every time
+the client computer wants to read, e.g., /dev/nbd0, it sends a
+request over TCP to the server, which will reply with the data read.
+This can be used for stations with low disk space (or even diskless)
+to borrow disk space from another computer.
+Unlike NFS, it is possible to put any filesystem on it, etc.
+
+For more information, or to download the nbd-client and nbd-server
+tools, go to http://nbd.sf.net/.
+
+The nbd kernel module need only be installed on the client
+system, as the nbd-server is completely in userspace. In fact,
+the nbd-server has been successfully ported to other operating
+systems, including Windows.
+
+A) NBD parameters
+-----------------
+
+max_part
+	Number of partitions per device (default: 0).
+
+nbds_max
+	Number of block devices that should be initialized (default: 16).
diff --git a/Documentation/admin-guide/blockdev/paride.rst b/Documentation/admin-guide/blockdev/paride.rst
new file mode 100644
index 000000000000..87b4278bf314
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/paride.rst
@@ -0,0 +1,439 @@
+===================================
+Linux and parallel port IDE devices
+===================================
+
+PARIDE v1.03   (c) 1997-8  Grant Guenther <grant@torque.net>
+
+1. Introduction
+===============
+
+Owing to the simplicity and near universality of the parallel port interface
+to personal computers, many external devices such as portable hard-disk,
+CD-ROM, LS-120 and tape drives use the parallel port to connect to their
+host computer.  While some devices (notably scanners) use ad-hoc methods
+to pass commands and data through the parallel port interface, most
+external devices are actually identical to an internal model, but with
+a parallel-port adapter chip added in.  Some of the original parallel port
+adapters were little more than mechanisms for multiplexing a SCSI bus.
+(The Iomega PPA-3 adapter used in the ZIP drives is an example of this
+approach).  Most current designs, however, take a different approach.
+The adapter chip reproduces a small ISA or IDE bus in the external device
+and the communication protocol provides operations for reading and writing
+device registers, as well as data block transfer functions.  Sometimes,
+the device being addressed via the parallel cable is a standard SCSI
+controller like an NCR 5380.  The "ditto" family of external tape
+drives use the ISA replicator to interface a floppy disk controller,
+which is then connected to a floppy-tape mechanism.  The vast majority
+of external parallel port devices, however, are now based on standard
+IDE type devices, which require no intermediate controller.  If one
+were to open up a parallel port CD-ROM drive, for instance, one would
+find a standard ATAPI CD-ROM drive, a power supply, and a single adapter
+that interconnected a standard PC parallel port cable and a standard
+IDE cable.  It is usually possible to exchange the CD-ROM device with
+any other device using the IDE interface.
+
+This document describes the support in Linux for parallel port IDE
+devices.  It does not cover parallel port SCSI devices, "ditto" tape
+drives or scanners.  Many different devices are supported by the
+parallel port IDE subsystem, including:
+
+	- MicroSolutions backpack CD-ROM
+	- MicroSolutions backpack PD/CD
+	- MicroSolutions backpack hard-drives
+	- MicroSolutions backpack 8000t tape drive
+	- SyQuest EZ-135, EZ-230 & SparQ drives
+	- Avatar Shark
+	- Imation Superdisk LS-120
+	- Maxell Superdisk LS-120
+	- FreeCom Power CD
+	- Hewlett-Packard 5GB and 8GB tape drives
+	- Hewlett-Packard 7100 and 7200 CD-RW drives
+
+as well as most of the clone and no-name products on the market.
+
+To support such a wide range of devices, PARIDE, the parallel port IDE
+subsystem, is actually structured in three parts.   There is a base
+paride module which provides a registry and some common methods for
+accessing the parallel ports.  The second component is a set of
+high-level drivers for each of the different types of supported devices:
+
+	===	=============
+	pd	IDE disk
+	pcd	ATAPI CD-ROM
+	pf	ATAPI disk
+	pt	ATAPI tape
+	pg	ATAPI generic
+	===	=============
+
+(Currently, the pg driver is only used with CD-R drives).
+
+The high-level drivers function according to the relevant standards.
+The third component of PARIDE is a set of low-level protocol drivers
+for each of the parallel port IDE adapter chips.  Thanks to the interest
+and encouragement of Linux users from many parts of the world,
+support is available for almost all known adapter protocols:
+
+	====    ====================================== ====
+        aten    ATEN EH-100                            (HK)
+        bpck    Microsolutions backpack                (US)
+        comm    DataStor (old-type) "commuter" adapter (TW)
+        dstr    DataStor EP-2000                       (TW)
+        epat    Shuttle EPAT                           (UK)
+        epia    Shuttle EPIA                           (UK)
+	fit2    FIT TD-2000			       (US)
+	fit3    FIT TD-3000			       (US)
+	friq    Freecom IQ cable                       (DE)
+        frpw    Freecom Power                          (DE)
+        kbic    KingByte KBIC-951A and KBIC-971A       (TW)
+	ktti    KT Technology PHd adapter              (SG)
+        on20    OnSpec 90c20                           (US)
+        on26    OnSpec 90c26                           (US)
+	====    ====================================== ====
+
+
+2. Using the PARIDE subsystem
+=============================
+
+While configuring the Linux kernel, you may choose either to build
+the PARIDE drivers into your kernel, or to build them as modules.
+
+In either case, you will need to select "Parallel port IDE device support"
+as well as at least one of the high-level drivers and at least one
+of the parallel port communication protocols.  If you do not know
+what kind of parallel port adapter is used in your drive, you could
+begin by checking the file names and any text files on your DOS
+installation floppy.  Alternatively, you can look at the markings on
+the adapter chip itself.  That's usually sufficient to identify the
+correct device.
+
+You can actually select all the protocol modules, and allow the PARIDE
+subsystem to try them all for you.
+
+For the "brand-name" products listed above, here are the protocol
+and high-level drivers that you would use:
+
+	================	============	======	========
+	Manufacturer		Model		Driver	Protocol
+	================	============	======	========
+	MicroSolutions		CD-ROM		pcd	bpck
+	MicroSolutions		PD drive	pf	bpck
+	MicroSolutions		hard-drive	pd	bpck
+	MicroSolutions          8000t tape      pt      bpck
+	SyQuest			EZ, SparQ	pd	epat
+	Imation			Superdisk	pf	epat
+	Maxell                  Superdisk       pf      friq
+	Avatar			Shark		pd	epat
+	FreeCom			CD-ROM		pcd	frpw
+	Hewlett-Packard		5GB Tape	pt	epat
+	Hewlett-Packard		7200e (CD)	pcd	epat
+	Hewlett-Packard		7200e (CD-R)	pg	epat
+	================	============	======	========
+
+2.1  Configuring built-in drivers
+---------------------------------
+
+We recommend that you get to know how the drivers work and how to
+configure them as loadable modules, before attempting to compile a
+kernel with the drivers built-in.
+
+If you built all of your PARIDE support directly into your kernel,
+and you have just a single parallel port IDE device, your kernel should
+locate it automatically for you.  If you have more than one device,
+you may need to give some command line options to your bootloader
+(e.g. LILO); how to do that is beyond the scope of this document.
+
+The high-level drivers accept a number of command line parameters, all
+of which are documented in the source files in linux/drivers/block/paride.
+By default, each driver will automatically try all parallel ports it
+can find, and all protocol types that have been installed, until it finds
+a parallel port IDE adapter.  Once it finds one, the probe stops.  So,
+if you have more than one device, you will need to tell the drivers
+how to identify them.  This requires specifying the port address, the
+protocol identification number and, for some devices, the drive's
+chain ID.  While your system is booting, a number of messages are
+displayed on the console.  Like all such messages, they can be
+reviewed with the 'dmesg' command.  Among those messages will be
+some lines like::
+
+	paride: bpck registered as protocol 0
+	paride: epat registered as protocol 1
+
+The numbers will always be the same until you build a new kernel with
+different protocol selections.  You should note these numbers as you
+will need them to identify the devices.
+
+If you happen to be using a MicroSolutions backpack device, you will
+also need to know the unit ID number for each drive.  This is usually
+the last two digits of the drive's serial number (but read MicroSolutions'
+documentation about this).
+
+As an example, let's assume that you have a MicroSolutions PD/CD drive
+with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
+EZ-135 connected to the chained port on the PD/CD drive and also an
+Imation Superdisk connected to port 0x278.  You could give the following
+options on your boot command::
+
+	pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
+
+In the last option, pf.drive1 configures device /dev/pf1, the 0x378
+is the parallel port base address, the 0 is the protocol registration
+number and 36 is the chain ID.
+
+Please note:  while PARIDE will work both with and without the
+PARPORT parallel port sharing system that is included by the
+"Parallel port support" option, PARPORT must be included and enabled
+if you want to use chains of devices on the same parallel port.
+
+2.2  Loading and configuring PARIDE as modules
+----------------------------------------------
+
+It is much faster and simpler to get to understand the PARIDE drivers
+if you use them as loadable kernel modules.
+
+Note 1:
+	using these drivers with the "kerneld" automatic module loading
+	system is not recommended for beginners, and is not documented here.
+
+Note 2:
+	if you build PARPORT support as a loadable module, PARIDE must
+	also be built as loadable modules, and PARPORT must be loaded before
+	the PARIDE modules.
+
+To use PARIDE, you must begin by::
+
+	insmod paride
+
+This loads a base module which provides a registry for the protocols,
+among other tasks.
+
+Then, load as many of the protocol modules as you think you might need.
+As you load each module, it will register the protocols that it supports,
+and print a log message to your kernel log file and your console. For
+example::
+
+	# insmod epat
+	paride: epat registered as protocol 0
+	# insmod kbic
+	paride: k951 registered as protocol 1
+	paride: k971 registered as protocol 2
+
+Finally, you can load high-level drivers for each kind of device that
+you have connected.  By default, each driver will autoprobe for a single
+device, but you can support up to four similar devices by giving their
+individual co-ordinates when you load the driver.
+
+For example, if you had two no-name CD-ROM drives both using the
+KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
+you could give the following command::
+
+	# insmod pcd drive0=0x378,1 drive1=0x3bc,1
+
+For most adapters, giving a port address and protocol number is sufficient,
+but check the source files in linux/drivers/block/paride for more
+information.  (Hopefully someone will write some man pages one day!)
+
+As another example, here's what happens when PARPORT is installed, and
+a SyQuest EZ-135 is attached to port 0x378::
+
+	# insmod paride
+	paride: version 1.0 installed
+	# insmod epat
+	paride: epat registered as protocol 0
+	# insmod pd
+	pd: pd version 1.0, major 45, cluster 64, nice 0
+	pda: Sharing parport1 at 0x378
+	pda: epat 1.0, Shuttle EPAT chip c3 at 0x378, mode 5 (EPP-32), delay 1
+	pda: SyQuest EZ135A, 262144 blocks [128M], (512/16/32), removable media
+	 pda: pda1
+
+Note that the last line is the output from the generic partition table
+scanner - in this case it reports that it has found a disk with one partition.
+
+2.3  Using a PARIDE device
+--------------------------
+
+Once the drivers have been loaded, you can access PARIDE devices in the
+same way as their traditional counterparts.  You will probably need to
+create the device "special files".  Here is a simple script that you can
+cut to a file and execute::
+
+  #!/bin/bash
+  #
+  # mkd -- a script to create the device special files for the PARIDE subsystem
+  #
+  function mkdev {
+    mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
+  }
+  #
+  function pd {
+    D=$( printf \\$( printf "x%02x" $(( $1 + 97 )) ) )  # unit 0 -> a, 1 -> b, ...
+    mkdev pd$D b 45 $(( $1 * 16 ))
+    for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+    do mkdev pd$D$P b 45 $(( $1 * 16 + $P ))
+    done
+  }
+  #
+  cd /dev
+  #
+  for u in 0 1 2 3 ; do pd $u ; done
+  for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
+  for u in 0 1 2 3 ; do mkdev pf$u  b 47 $u ; done
+  for u in 0 1 2 3 ; do mkdev pt$u  c 96 $u ; done
+  for u in 0 1 2 3 ; do mkdev npt$u c 96 $(( $u + 128 )) ; done
+  for u in 0 1 2 3 ; do mkdev pg$u  c 97 $u ; done
+  #
+  # end of mkd
+
+With the device files and drivers in place, you can access PARIDE devices
+like any other Linux device.   For example, to mount a CD-ROM in pcd0, use::
+
+	mount /dev/pcd0 /cdrom
+
+If you have a fresh Avatar Shark cartridge, and the drive is pda, you
+might do something like::
+
+	fdisk /dev/pda		-- make a new partition table with
+				   partition 1 of type 83
+
+	mke2fs /dev/pda1	-- to build the file system
+
+	mkdir /shark		-- make a place to mount the disk
+
+	mount /dev/pda1 /shark
+
+Devices like the Imation Superdisk work in the same way, except that
+they do not have a partition table.  For example, to make a 120MB
+floppy that you could share with a DOS system::
+
+	mkdosfs /dev/pf0
+	mount /dev/pf0 /mnt
+
+
+2.4  The pf driver
+------------------
+
+The pf driver is intended for use with parallel port ATAPI disk
+devices.  The most common devices in this category are PD drives
+and LS-120 drives.  Traditionally, media for these devices are not
+partitioned.  Consequently, the pf driver does not support partitioned
+media.  This may be changed in a future version of the driver.
+
+2.5  Using the pt driver
+------------------------
+
+The pt driver for parallel port ATAPI tape drives is a minimal driver.
+It does not yet support many of the standard tape ioctl operations.
+For best performance, a block size of 32KB should be used.  You will
+probably want to set the parallel port delay to 0, if you can.
+
+2.6  Using the pg driver
+------------------------
+
+The pg driver can be used in conjunction with the cdrecord program
+to create CD-ROMs.  Please get cdrecord version 1.6.1 or later
+from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ .  To record CD-R media
+your parallel port should ideally be set to EPP mode, and the "port delay"
+should be set to 0.  With those settings it is possible to record at 2x
+speed without any buffer underruns.  If you cannot get the driver to work
+in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
+
+
+3. Troubleshooting
+==================
+
+3.1  Use EPP mode if you can
+----------------------------
+
+The most common problems that people report with the PARIDE drivers
+concern the parallel port CMOS settings.  At this time, none of the
+PARIDE protocol modules support ECP mode, or any ECP combination modes.
+If you are able to do so, please set your parallel port into EPP mode
+using your CMOS setup procedure.
+
+3.2  Check the port delay
+-------------------------
+
+Some parallel ports cannot reliably transfer data at full speed.  To
+offset the errors, the PARIDE protocol modules introduce a "port
+delay" between each access to the i/o ports.  Each protocol sets
+a default value for this delay.  In most cases, the user can override
+the default and set it to 0 - resulting in somewhat higher transfer
+rates.  In some rare cases (especially with older 486 systems) the
+default delays are not long enough.  If you experience corrupt data
+transfers, or unexpected failures, you may wish to increase the
+port delay.   The delay can be programmed using the "driveN" parameters
+to each of the high-level drivers.  Please see the notes above, or
+read the comments at the beginning of the driver source files in
+linux/drivers/block/paride.
+
+3.3  Some drives need a printer reset
+-------------------------------------
+
+There appear to be a number of "no-name" external drives on the market
+that do not always power up correctly.  We have noticed this with some
+drives based on OnSpec and older Freecom adapters.  In these rare cases,
+the adapter can often be reinitialised by issuing a "printer reset" on
+the parallel port.  As the reset operation is potentially disruptive in
+multiple device environments, the PARIDE drivers will not do it
+automatically.  You can, however, force a printer reset by doing::
+
+	insmod lp reset=1
+	rmmod lp
+
+If you have one of these marginal cases, you should probably build
+your paride drivers as modules, and arrange to do the printer reset
+before loading the PARIDE drivers.
+
+3.4  Use the verbose option and dmesg if you need help
+------------------------------------------------------
+
+While a lot of testing has gone into these drivers to make them work
+as smoothly as possible, problems will arise.  If you do have problems,
+please check all the obvious things first:  does the drive work in
+DOS with the manufacturer's drivers?  If that doesn't yield any useful
+clues, then please make sure that only one drive is hooked to your system,
+and that either (a) PARPORT is enabled or (b) no other device driver
+is using your parallel port (check in /proc/ioports).  Then, load the
+appropriate drivers (you can load several protocol modules if you want)
+as in::
+
+	# insmod paride
+	# insmod epat
+	# insmod bpck
+	# insmod kbic
+	...
+	# insmod pd verbose=1
+
+(using the correct driver for the type of device you have, of course).
+The verbose=1 parameter will cause the drivers to log a trace of their
+activity as they attempt to locate your drive.
+
+Use 'dmesg' to capture a log of all the PARIDE messages (any messages
+beginning with paride:, a protocol module's name or a driver's name) and
+include that with your bug report.  You can submit a bug report in one
+of two ways.  Either send it directly to the author of the PARIDE suite,
+by e-mail to grant@torque.net, or join the linux-parport mailing list
+and post your report there.
+
+3.5  For more information or help
+---------------------------------
+
+You can join the linux-parport mailing list by sending a mail message
+to:
+
+		linux-parport-request@torque.net
+
+with the single word::
+
+		subscribe
+
+in the body of the mail message (not in the subject line).   Please be
+sure that your mail program is correctly set up when you do this,  as
+the list manager is a robot that will subscribe you using the reply
+address in your mail headers.  REMOVE any anti-spam gimmicks you may
+have in your mail headers, when sending mail to the list server.
+
+You might also find some useful information on the linux-parport
+web pages (although they are not always up to date) at
+
+	http://web.archive.org/web/%2E/http://www.torque.net/parport/
diff --git a/Documentation/admin-guide/blockdev/ramdisk.rst b/Documentation/admin-guide/blockdev/ramdisk.rst
new file mode 100644
index 000000000000..b7c2268f8dec
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/ramdisk.rst
@@ -0,0 +1,177 @@
+==========================================
+Using the RAM disk block device with Linux
+==========================================
+
+.. Contents:
+
+	1) Overview
+	2) Parameters
+	3) Using "rdev -r"
+	4) An Example of Creating a Compressed RAM Disk
+
+
+1) Overview
+-----------
+
+The RAM disk driver is a way to use main system memory as a block device.  It
+is required for initrd, an initial filesystem used if you need to load modules
+in order to access the root filesystem (see Documentation/admin-guide/initrd.rst).  It can
+also be used for a temporary filesystem for crypto work, since the contents
+are erased on reboot.
+
+The RAM disk dynamically grows as more space is required. It does this by using
+RAM from the buffer cache. The driver marks the buffers it is using as dirty
+so that the VM subsystem does not try to reclaim them later.
+
+The RAM disk driver supports up to 16 RAM disks by default, and can be reconfigured
+to support an unlimited number of RAM disks (at your own risk).  Just change
+the configuration symbol BLK_DEV_RAM_COUNT in the Block drivers config menu
+and (re)build the kernel.
+
+To use RAM disk support with your system, run './MAKEDEV ram' from the /dev
+directory.  RAM disks are all major number 1, and start with minor number 0
+for /dev/ram0, etc.  If used, modern kernels use /dev/ram0 for an initrd.
+
+The new RAM disk also has the ability to load compressed RAM disk images,
+allowing one to squeeze more programs onto an average installation or
+rescue floppy disk.
+
+
+2) Parameters
+-------------
+
+2a) Kernel Command Line Parameters
+
+	ramdisk_size=N
+		Size of the ramdisk.
+
+This parameter tells the RAM disk driver to set up RAM disks of N kB in size.  The
+default is 4096 (4 MB).
+
+2b) Module parameters
+
+	rd_nr
+		/dev/ramX devices created.
+
+	max_part
+		Maximum partition number.
+
+	rd_size
+		See ramdisk_size.
+
+3) Using "rdev -r"
+------------------
+
+The usage of the word (two bytes) that "rdev -r" sets in the kernel image is
+as follows. The low 11 bits (0 -> 10) specify an offset (in 1 k blocks) of up
+to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
+14 indicates that a RAM disk is to be loaded, and bit 15 indicates whether a
+prompt/wait sequence is to be given before trying to read the RAM disk. Since
+the RAM disk dynamically grows as data is being written into it, a size field
+is not required. Bits 11 to 13 are not currently used and may as well be zero.
+These numbers are no magical secrets, as seen below::
+
+  ./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK     0x07FF
+  ./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG          0x8000
+  ./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG            0x4000
+
+Consider a typical two floppy disk setup, where you will have the
+kernel on disk one, and have already put a RAM disk image onto disk #2.
+
+Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
+starts at an offset of 0 kB from the beginning of the floppy.
+The command line equivalent is: "ramdisk_start=0"
+
+You want bit 14 as one, indicating that a RAM disk is to be loaded.
+The command line equivalent is: "load_ramdisk=1"
+
+You want bit 15 as one, indicating that you want a prompt/keypress
+sequence so that you have a chance to switch floppy disks.
+The command line equivalent is: "prompt_ramdisk=1"
+
+Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
+So to create disk one of the set, you would do::
+
+	/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
+	/usr/src/linux# rdev /dev/fd0 /dev/fd0
+	/usr/src/linux# rdev -r /dev/fd0 49152
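+
+As a quick check of that arithmetic, the word can be rebuilt in a POSIX shell
+from the flag values listed above. This is only an illustrative sketch; the
+variable names are made up for this example and are not used by rdev::
+
+	PROMPT=$((1 << 15))   # bit 15, same value as RAMDISK_PROMPT_FLAG (0x8000)
+	LOAD=$((1 << 14))     # bit 14, same value as RAMDISK_LOAD_FLAG (0x4000)
+	START=0               # offset in 1 kB blocks, must fit in bits 0 to 10
+	echo $((PROMPT | LOAD | START))   # prints 49152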
+
+If you make a boot disk that has LILO, then for the above, you would use::
+
+	append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
+
+Since the default start = 0 and the default prompt = 1, you could use::
+
+	append = "load_ramdisk=1"
+
+
+4) An Example of Creating a Compressed RAM Disk
+-----------------------------------------------
+
+To create a RAM disk image, you will need a spare block device to
+construct it on. This can be the RAM disk device itself, or an
+unused disk partition (such as an unmounted swap partition). For this
+example, we will use the RAM disk device, "/dev/ram0".
+
+Note: This technique should not be done on a machine with less than 8 MB
+of RAM. If using a spare disk partition instead of /dev/ram0, then this
+restriction does not apply.
+
+a) Decide on the RAM disk size that you want. Say 2 MB for this example.
+   Create it by writing to the RAM disk device. (This step is not currently
+   required, but may be in the future.) It is wise to zero out the
+   area (esp. for disks) so that maximal compression is achieved for
+   the unused blocks of the image that you are about to create::
+
+	dd if=/dev/zero of=/dev/ram0 bs=1k count=2048
+
+b) Make a filesystem on it. Say ext2fs for this example::
+
+	mke2fs -vm0 /dev/ram0 2048
+
+c) Mount it, copy the files you want to it (e.g. /etc/* /dev/* ...)
+   and unmount it again.
+
+d) Compress the contents of the RAM disk. The compressed image will be
+   approximately 50% of the size of the space used by the files. Unused
+   space on the RAM disk will compress to almost nothing::
+
+	dd if=/dev/ram0 bs=1k count=2048 | gzip -v9 > /tmp/ram_image.gz
+
+e) Put the kernel onto the floppy::
+
+	dd if=zImage of=/dev/fd0 bs=1k
+
+f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
+   that is slightly larger than the kernel, so that you can put another
+   (possibly larger) kernel onto the same floppy later without overlapping
+   the RAM disk image. An offset of 400 kB for kernels about 350 kB in
+   size would be reasonable. Make sure offset+size of ram_image.gz is
+   not larger than the total space on your floppy (usually 1440 kB)::
+
+	dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
+
+g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
+   For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
+   have 2^15 + 2^14 + 400 = 49552::
+
+	rdev /dev/fd0 /dev/fd0
+	rdev -r /dev/fd0 49552
+
+That is it. You now have your boot/root compressed RAM disk floppy. Some
+users may wish to combine steps (d) and (f) by using a pipe.
+
+
+						Paul Gortmaker 12/95
+
+Changelog:
+----------
+
+10-22-04 :
+		Updated to reflect changes in command line options, remove
+		obsolete references, general cleanup.
+		James Nelson (james4765@gmail.com)
+
+
+12-95 :
+		Original Document
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
new file mode 100644
index 000000000000..6eccf13219ff
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -0,0 +1,422 @@
+========================================
+zram: Compressed RAM based block devices
+========================================
+
+Introduction
+============
+
+The zram module creates RAM based block devices named /dev/zram<id>
+(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
+in memory itself. These disks allow very fast I/O and compression provides
+good amounts of memory savings. Some of the use cases include /tmp storage,
+use as swap disks, various caches under /var and maybe many more :)
+
+Statistics for individual zram devices are exported through sysfs nodes at
+/sys/block/zram<id>/
+
+Usage
+=====
+
+There are several ways to configure and manage zram device(-s):
+
+a) using zram and zram_control sysfs attributes
+b) using zramctl utility, provided by util-linux (util-linux@vger.kernel.org).
+
+In this document we will describe only 'manual' zram configuration steps,
+IOW, zram and zram_control sysfs attributes.
+
+In order to get a better idea about zramctl please consult util-linux
+documentation, zramctl man-page or `zramctl --help`. Please be informed
+that zram maintainers do not develop/maintain util-linux or zramctl, should
+you have any questions please contact util-linux@vger.kernel.org
+
+The following shows a typical sequence of steps for using zram.
+
+WARNING
+=======
+
+For the sake of simplicity we skip error checking parts in most of the
+examples below. However, it is your sole responsibility to handle errors.
+
+zram sysfs attributes always return negative values in case of errors.
+The list of possible return codes:
+
+========  =============================================================
+-EBUSY	  an attempt to modify an attribute that cannot be changed once
+	  the device has been initialised. Please reset device first;
+-ENOMEM	  zram was not able to allocate enough memory to fulfil your
+	  needs;
+-EINVAL	  invalid input has been provided.
+========  =============================================================
+
+If you use 'echo', the returned value is the one set by the 'echo' utility,
+and, in the general case, something like::
+
+	echo 3 > /sys/block/zram0/max_comp_streams
+	if [ $? -ne 0 ]; then
+		handle_error
+	fi
+
+should suffice.
+
+1) Load Module
+==============
+
+::
+
+	modprobe zram num_devices=4
+	# This creates 4 devices: /dev/zram{0,1,2,3}
+
+num_devices parameter is optional and tells zram how many devices should be
+pre-created. Default: 1.
+
+2) Set max number of compression streams
+========================================
+
+Regardless of the value passed to this attribute, ZRAM will always
+allocate multiple compression streams - one per online CPU - thus
+allowing several concurrent compression operations. The number of
+allocated compression streams goes down when some of the CPUs
+go offline. There is no single-compression-stream mode anymore,
+unless you are running a UP system or have only 1 CPU online.
+
+To find out how many streams are currently available::
+
+	cat /sys/block/zram0/max_comp_streams
+
+3) Select compression algorithm
+===============================
+
+Using the comp_algorithm device attribute, one can see the available and
+currently selected (shown in square brackets) compression algorithms, and
+change the selected compression algorithm (once the device is initialised
+there is no way to change the compression algorithm).
+
+Examples::
+
+	#show supported compression algorithms
+	cat /sys/block/zram0/comp_algorithm
+	lzo [lz4]
+
+	#select lzo compression algorithm
+	echo lzo > /sys/block/zram0/comp_algorithm
+
+For the time being, the `comp_algorithm` content does not necessarily
+show every compression algorithm supported by the kernel. We keep this
+list primarily to simplify device configuration and one can configure
+a new device with a compression algorithm that is not listed in
+`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
+and, if some of the algorithms were built as modules, it's impossible
+to list all of them using, for instance, /proc/crypto or any other
+method. This, however, has an advantage of permitting the usage of
+custom crypto compression modules (implementing S/W or H/W compression).
+
+4) Set Disksize
+===============
+
+Set disk size by writing the value to sysfs node 'disksize'.
+The value can be either in bytes or you can use mem suffixes.
+Examples::
+
+	# Initialize /dev/zram0 with 50MB disksize
+	echo $((50*1024*1024)) > /sys/block/zram0/disksize
+
+	# Using mem suffixes
+	echo 256K > /sys/block/zram0/disksize
+	echo 512M > /sys/block/zram0/disksize
+	echo 1G > /sys/block/zram0/disksize
+
+Note:
+There is little point creating a zram of greater than twice the size of memory
+since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
+size of the disk when not in use so a huge zram is wasteful.
+
+5) Set memory limit: Optional
+=============================
+
+Set memory limit by writing the value to sysfs node 'mem_limit'.
+The value can be either in bytes or you can use mem suffixes.
+In addition, you can change the value at runtime.
+Examples::
+
+	# limit /dev/zram0 with 50MB memory
+	echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
+
+	# Using mem suffixes
+	echo 256K > /sys/block/zram0/mem_limit
+	echo 512M > /sys/block/zram0/mem_limit
+	echo 1G > /sys/block/zram0/mem_limit
+
+	# To disable memory limit
+	echo 0 > /sys/block/zram0/mem_limit
+
+6) Activate
+===========
+
+::
+
+	mkswap /dev/zram0
+	swapon /dev/zram0
+
+	mkfs.ext4 /dev/zram1
+	mount /dev/zram1 /tmp
+
+7) Add/remove zram devices
+==========================
+
+zram provides a control interface, which enables dynamic (on-demand) device
+addition and removal.
+
+In order to add a new /dev/zramX device, perform a read operation on the hot_add
+attribute. This will return either the new device's id (meaning that you
+can use /dev/zram<id>) or an error code.
+
+Example::
+
+	cat /sys/class/zram-control/hot_add
+	1
+
+To remove the existing /dev/zramX device (where X is a device id)
+execute::
+
+	echo X > /sys/class/zram-control/hot_remove
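+
+Putting the two attributes together, a minimal script sketch (run as root,
+assuming the zram module is loaded; the 64M size is an arbitrary example
+value) might look like::
+
+	# reading hot_add creates a device and prints its id
+	id=$(cat /sys/class/zram-control/hot_add) || exit 1
+	echo 64M > /sys/block/zram${id}/disksize
+	# ... use /dev/zram${id} here ...
+	echo "${id}" > /sys/class/zram-control/hot_remove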
+
+8) Stats
+========
+
+Per-device statistics are exported as various nodes under /sys/block/zram<id>/
+
+A brief description of exported device attributes. For more details please
+read Documentation/ABI/testing/sysfs-block-zram.
+
+======================  ======  ===============================================
+Name            	access            description
+======================  ======  ===============================================
+disksize          	RW	show and set the device's disk size
+initstate         	RO	shows the initialization state of the device
+reset             	WO	trigger device reset
+mem_used_max      	WO	reset the `mem_used_max` counter (see later)
+mem_limit         	WO	specifies the maximum amount of memory ZRAM can
+				use to store the compressed data
+writeback_limit   	RW	specifies the maximum amount of write IO zram
+				can write out to backing device as 4KB unit
+writeback_limit_enable  RW	show and set writeback_limit feature
+max_comp_streams  	RW	the number of possible concurrent compress
+				operations
+comp_algorithm    	RW	show and change the compression algorithm
+compact           	WO	trigger memory compaction
+debug_stat        	RO	this file is used for zram debugging purposes
+backing_dev	  	RW	set up backend storage for zram to write out
+idle		  	WO	mark allocated slot as idle
+======================  ======  ===============================================
+
+
+User space is advised to use the following files to read the device statistics.
+
+File /sys/block/zram<id>/stat
+
+Represents block layer statistics. Read Documentation/block/stat.rst for
+details.
+
+File /sys/block/zram<id>/io_stat
+
+The stat file represents device's I/O statistics not accounted by block
+layer and, thus, not available in zram<id>/stat file. It consists of a
+single line of text and contains the following stats separated by
+whitespace:
+
+ =============    =============================================================
+ failed_reads     The number of failed reads
+ failed_writes    The number of failed writes
+ invalid_io       The number of non-page-size-aligned I/O requests
+ notify_free      Depending on device usage scenario it may account
+
+                  a) the number of pages freed because of swap slot free
+                     notifications
+                  b) the number of pages freed because of
+                     REQ_OP_DISCARD requests sent by bio. The former ones are
+                     sent to a swap block device when a swap slot is freed,
+                     which implies that this disk is being used as a swap disk.
+
+                  The latter ones are sent by filesystem mounted with
+                  discard option, whenever some data blocks are getting
+                  discarded.
+ =============    =============================================================
+
+File /sys/block/zram<id>/mm_stat
+
+The stat file represents device's mm statistics. It consists of a single
+line of text and contains the following stats separated by whitespace:
+
+ ================ =============================================================
+ orig_data_size   uncompressed size of data stored in this disk.
+		  This excludes same-element-filled pages (same_pages) since
+		  no memory is allocated for them.
+                  Unit: bytes
+ compr_data_size  compressed size of data stored in this disk
+ mem_used_total   the amount of memory allocated for this disk. This
+                  includes allocator fragmentation and metadata overhead,
+                  allocated for this disk. So, allocator space efficiency
+                  can be calculated using compr_data_size and this statistic.
+                  Unit: bytes
+ mem_limit        the maximum amount of memory ZRAM can use to store
+                  the compressed data
+ mem_used_max     the maximum amount of memory zram has consumed to
+                  store the data
+ same_pages       the number of same element filled pages written to this disk.
+                  No memory is allocated for such pages.
+ pages_compacted  the number of pages freed during compaction
+ huge_pages	  the number of incompressible pages
+ ================ =============================================================
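+
+As an illustration, the first three fields can be combined into a rough
+compression ratio and allocator efficiency. This is a sketch only: it assumes
+awk is available and zram0 is initialised, and the efficiency formula
+(compr_data_size divided by mem_used_total) is one plausible reading of the
+description above rather than an official definition::
+
+	awk '$2 > 0 && $3 > 0 {
+		printf "compression ratio: %.2f\n", $1 / $2
+		printf "allocator efficiency: %.2f\n", $2 / $3
+	}' /sys/block/zram0/mm_stat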
+
+File /sys/block/zram<id>/bd_stat
+
+The stat file represents device's backing device statistics. It consists of
+a single line of text and contains the following stats separated by whitespace:
+
+ ============== =============================================================
+ bd_count	size of data written in backing device.
+		Unit: 4K bytes
+ bd_reads	the number of reads from backing device
+		Unit: 4K bytes
+ bd_writes	the number of writes to backing device
+		Unit: 4K bytes
+ ============== =============================================================
+
+9) Deactivate
+=============
+
+::
+
+	swapoff /dev/zram0
+	umount /dev/zram1
+
+10) Reset
+=========
+
+	Write any positive value to 'reset' sysfs node::
+
+		echo 1 > /sys/block/zram0/reset
+		echo 1 > /sys/block/zram1/reset
+
+	This frees all the memory allocated for the given device and
+	resets the disksize to zero. You must set the disksize again
+	before reusing the device.
+
+Optional Feature
+================
+
+writeback
+---------
+
+With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible pages
+to backing storage rather than keeping them in memory.
+To use the feature, the admin should set up the backing device via::
+
+	echo /dev/sda5 > /sys/block/zramX/backing_dev
+
+before setting the disksize. Only partitions are supported at the moment.
+To write back incompressible pages, the admin can do::
+
+	echo huge > /sys/block/zramX/writeback
+
+To use idle page writeback, the user first needs to declare zram pages
+as idle::
+
+	echo all > /sys/block/zramX/idle
+
+From now on, all pages on zram are idle pages. The idle mark
+is removed only when someone requests access to the block.
+IOW, as long as there is no access request, those pages remain idle pages.
+
+The admin can request writeback of those idle pages at a suitable time via::
+
+	echo idle > /sys/block/zramX/writeback
+
+With this command, zram writes back idle pages from memory to the storage.
+
+If a flash device sees lots of write IO, it may suffer from flash
+wearout, so the admin needs to design a write limitation
+to guarantee storage health for the entire product life.
+
+To address this concern, zram supports the "writeback_limit" feature.
+The default value of "writeback_limit_enable" is 0, so writeback is not
+limited. IOW, if the admin wants to apply a writeback budget, they should
+enable writeback_limit_enable via::
+
+	$ echo 1 > /sys/block/zramX/writeback_limit_enable
+
+Once writeback_limit_enable is set, zram doesn't allow any writeback
+until the admin sets the budget via /sys/block/zramX/writeback_limit.
+
+(If the admin doesn't enable writeback_limit_enable, the value
+assigned via /sys/block/zramX/writeback_limit is meaningless.)
+
+If the admin wants to limit writeback to 400M per day, they could do it
+like below::
+
+	$ MB_SHIFT=20
+	$ SHIFT_4K=12
+	$ echo $((400<<MB_SHIFT>>SHIFT_4K)) > \
+		/sys/block/zram0/writeback_limit
+	$ echo 1 > /sys/block/zram0/writeback_limit_enable
+
+If the admin wants to allow further writes once the budget is exhausted,
+they could do it like below::
+
+	$ echo $((400<<MB_SHIFT>>SHIFT_4K)) > \
+		/sys/block/zram0/writeback_limit
+
+To see the remaining writeback budget since it was last set::
+
+	$ cat /sys/block/zramX/writeback_limit
+
+To disable the writeback limit, the admin can do::
+
+	$ echo 0 > /sys/block/zramX/writeback_limit_enable
+
+The writeback_limit count is reset whenever zram is reset (e.g.,
+system reboot, echo 1 > /sys/block/zramX/reset), so it is the user's job
+to keep track of how much writeback happened before the reset in order to
+allocate the extra writeback budget in the next setting.
+
+To measure the writeback count over a certain period, the admin can
+read the 3rd column of /sys/block/zram0/bd_stat.
+
+memory tracking
+===============
+
+With CONFIG_ZRAM_MEMORY_TRACKING, the user can get information about the
+zram block. It could be useful to catch cold or incompressible
+pages of the process with pagemap.
+
+If you enable the feature, you can see the block state via
+/sys/kernel/debug/zram/zram0/block_state. The output is as follows::
+
+	  300    75.033841 .wh.
+	  301    63.806904 s...
+	  302    63.806919 ..hi
+
+First column
+	zram's block index.
+Second column
+	access time since the system was booted
+Third column
+	state of the block:
+
+	s:
+		same page
+	w:
+		written page to backing store
+	h:
+		huge page
+	i:
+		idle page
+
+The first line of the above example says that the 300th block was accessed at
+75.033841 sec and that the block is huge and has been written back to the
+backing storage. It's a debugging feature, so nobody should rely on it to work
+properly.
+
+Nitin Gupta
+ngupta@vflare.org
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 5b63182ceb5f..9228fbf5ce4e 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -73,6 +73,7 @@ configure specific aspects of kernel behavior to your liking.
    java
    ras
    bcache
+   blockdev/index
    ext4
    binderfs
    pm/index
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e645b3ab4b6f..78576aa45cce 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1249,7 +1249,7 @@
 			See also Documentation/fault-injection/.
 
 	floppy=		[HW]
-			See Documentation/blockdev/floppy.rst.
+			See Documentation/admin-guide/blockdev/floppy.rst.
 
 	force_pal_cache_flush
 			[IA-64] Avoid check_sal_cache_flush which may hang on
@@ -2234,7 +2234,7 @@
 	memblock=debug	[KNL] Enable memblock debug messages.
 
 	load_ramdisk=	[RAM] List of ramdisks to load from floppy
-			See Documentation/blockdev/ramdisk.rst.
+			See Documentation/admin-guide/blockdev/ramdisk.rst.
 
 	lockd.nlm_grace_period=P  [NFS] Assign grace period.
 			Format: <integer>
@@ -3268,7 +3268,7 @@
 
 	pcd.		[PARIDE]
 			See header of drivers/block/paride/pcd.c.
-			See also Documentation/blockdev/paride.rst.
+			See also Documentation/admin-guide/blockdev/paride.rst.
 
 	pci=option[,option...]	[PCI] various PCI subsystem options.
 
@@ -3512,7 +3512,7 @@
 			needed on a platform with proper driver support.
 
 	pd.		[PARIDE]
-			See Documentation/blockdev/paride.rst.
+			See Documentation/admin-guide/blockdev/paride.rst.
 
 	pdcchassis=	[PARISC,HW] Disable/Enable PDC Chassis Status codes at
 			boot time.
@@ -3527,10 +3527,10 @@
 			and performance comparison.
 
 	pf.		[PARIDE]
-			See Documentation/blockdev/paride.rst.
+			See Documentation/admin-guide/blockdev/paride.rst.
 
 	pg.		[PARIDE]
-			See Documentation/blockdev/paride.rst.
+			See Documentation/admin-guide/blockdev/paride.rst.
 
 	pirq=		[SMP,APIC] Manual mp-table setup
 			See Documentation/x86/i386/IO-APIC.rst.
@@ -3642,7 +3642,7 @@
 
 	prompt_ramdisk=	[RAM] List of RAM disks to prompt for floppy disk
 			before loading.
-			See Documentation/blockdev/ramdisk.rst.
+			See Documentation/admin-guide/blockdev/ramdisk.rst.
 
 	psi=		[KNL] Enable or disable pressure stall information
 			tracking.
@@ -3664,7 +3664,7 @@
 	pstore.backend=	Specify the name of the pstore backend to use
 
 	pt.		[PARIDE]
-			See Documentation/blockdev/paride.rst.
+			See Documentation/admin-guide/blockdev/paride.rst.
 
 	pti=		[X86_64] Control Page Table Isolation of user and
 			kernel address spaces.  Disabling this feature
@@ -3693,7 +3693,7 @@
 			See Documentation/admin-guide/md.rst.
 
 	ramdisk_size=	[RAM] Sizes of RAM disks in kilobytes
-			See Documentation/blockdev/ramdisk.rst.
+			See Documentation/admin-guide/blockdev/ramdisk.rst.
 
 	random.trust_cpu={on,off}
 			[KNL] Enable or disable trusting the use of the
diff --git a/Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg b/Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg
deleted file mode 100644
index f87cfa0dc2fb..000000000000
--- a/Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg
+++ /dev/null
@@ -1,588 +0,0 @@
-<?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-<svg
-   xmlns:svg="http://www.w3.org/2000/svg"
-   xmlns="http://www.w3.org/2000/svg"
-   version="1.0"
-   width="210mm"
-   height="297mm"
-   viewBox="0 0 21000 29700"
-   id="svg2"
-   style="fill-rule:evenodd">
-  <defs
-     id="defs4" />
-  <g
-     id="Default"
-     style="visibility:visible">
-    <desc
-       id="desc180">Master slide</desc>
-  </g>
-  <path
-     d="M 11999,8601 L 11899,8301 L 12099,8301 L 11999,8601 z"
-     id="path193"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 11999,7801 L 11999,8361"
-     id="path197"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 7999,10401 L 7899,10101 L 8099,10101 L 7999,10401 z"
-     id="path209"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 7999,9601 L 7999,10161"
-     id="path213"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 11999,7801 L 11685,7840 L 11724,7644 L 11999,7801 z"
-     id="path225"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 7999,7001 L 11764,7754"
-     id="path229"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <g
-     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,-1244.4792,1416.5139)"
-     id="g245"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text247">
-      <tspan
-         x="9139 9368 9579 9808 9986 10075 10252 10481 10659 10837 10909"
-         y="9284"
-         id="tspan249">RSDataReply</tspan>
-    </text>
-  </g>
-  <path
-     d="M 7999,9601 L 8281,9458 L 8311,9655 L 7999,9601 z"
-     id="path259"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 11999,9001 L 8236,9565"
-     id="path263"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <g
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,1620.9382,-1639.4947)"
-     id="g279"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text281">
-      <tspan
-         x="8743 8972 9132 9310 9573 9801 10013 10242 10419 10597 10775 10953 11114"
-         y="7023"
-         id="tspan283">CsumRSRequest</tspan>
-    </text>
-  </g>
-  <text
-     id="text297"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4034 4263 4440 4703 4881 5042 5219 5397 5503 5681 5842 6003 6180 6341 6519 6625 6803 6980 7158 7336 7497 7586 7692"
-       y="5707"
-       id="tspan299">w_make_resync_request()</tspan>
-  </text>
-  <text
-     id="text313"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12305 12483 12644 12821 12893 13054 13232 13410 13638 13816 13905 14083 14311 14489 14667 14845 15023 15184 15272 15378"
-       y="7806"
-       id="tspan315">receive_DataRequest()</tspan>
-  </text>
-  <text
-     id="text329"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12377 12483 12660 12838 13016 13194 13372 13549 13621 13799 13977 14083 14261 14438 14616 14794 14955 15133 15294 15399"
-       y="8606"
-       id="tspan331">drbd_endio_read_sec()</tspan>
-  </text>
-  <text
-     id="text345"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12191 12420 12597 12775 12953 13131 13309 13486 13664 13825 13986 14164 14426 14604 14710 14871 15049 15154 15332 15510 15616"
-       y="9007"
-       id="tspan347">w_e_end_csum_rs_req()</tspan>
-  </text>
-  <text
-     id="text361"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4444 4550 4728 4889 5066 5138 5299 5477 5655 5883 6095 6324 6501 6590 6768 6997 7175 7352 7424 7585 7691"
-       y="9507"
-       id="tspan363">receive_RSDataReply()</tspan>
-  </text>
-  <text
-     id="text377"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4457 4635 4741 4918 5096 5274 5452 5630 5807 5879 6057 6235 6464 6569 6641 6730 6908 7086 7247 7425 7585 7691"
-       y="10407"
-       id="tspan379">drbd_endio_write_sec()</tspan>
-  </text>
-  <text
-     id="text393"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4647 4825 5003 5180 5358 5536 5714 5820 5997 6158 6319 6497 6658 6836 7013 7085 7263 7424 7585 7691"
-       y="10907"
-       id="tspan395">e_end_resync_block()</tspan>
-  </text>
-  <path
-     d="M 11999,11601 L 11685,11640 L 11724,11444 L 11999,11601 z"
-     id="path405"
-     style="fill:#000080;visibility:visible" />
-  <path
-     d="M 7999,10801 L 11764,11554"
-     id="path409"
-     style="fill:none;stroke:#000080;visibility:visible" />
-  <g
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,2434.7562,-1674.649)"
-     id="g425"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text427">
-      <tspan
-         x="9320 9621 9726 9798 9887 10065 10277 10438"
-         y="10943"
-         id="tspan429">WriteAck</tspan>
-    </text>
-  </g>
-  <text
-     id="text443"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12377 12555 12644 12821 13033 13105 13283 13444 13604 13816 13977 14138 14244"
-       y="11559"
-       id="tspan445">got_BlockAck()</tspan>
-  </text>
-  <text
-     id="text459"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="7999 8304 8541 8778 8990 9201 9413 9650 10001 10120 10357 10594 10806 11043 11280 11398 11703 11940 12152 12364 12601 12812 12931 13049 13261 13498 13710 13947 14065 14302 14540 14658 14777 14870 15107 15225 15437 15649 15886"
-       y="4877"
-       id="tspan461">Checksum based Resync, case not in sync</tspan>
-  </text>
-  <text
-     id="text475"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="6961 7266 7571 7854 8159 8299 8536 8654 8891 9010 9247 9484 9603 9840 9958 10077 10170 10407"
-       y="2806"
-       id="tspan477">DRBD-8.3 data flow</tspan>
-  </text>
-  <text
-     id="text491"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="5190 5419 5596 5774 5952 6113 6291 6468 6646 6824 6985 7146 7324 7586 7692"
-       y="7005"
-       id="tspan493">w_e_send_csum()</tspan>
-  </text>
-  <path
-     d="M 11999,17601 L 11899,17301 L 12099,17301 L 11999,17601 z"
-     id="path503"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 11999,16801 L 11999,17361"
-     id="path507"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 11999,16801 L 11685,16840 L 11724,16644 L 11999,16801 z"
-     id="path519"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 7999,16001 L 11764,16754"
-     id="path523"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <g
-     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,-2539.5806,1529.3491)"
-     id="g539"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text541">
-      <tspan
-         x="9269 9498 9709 9798 9959 10048 10226 10437 10598 10776"
-         y="18265"
-         id="tspan543">RSIsInSync</tspan>
-    </text>
-  </g>
-  <path
-     d="M 7999,18601 L 8281,18458 L 8311,18655 L 7999,18601 z"
-     id="path553"
-     style="fill:#000080;visibility:visible" />
-  <path
-     d="M 11999,18001 L 8236,18565"
-     id="path557"
-     style="fill:none;stroke:#000080;visibility:visible" />
-  <g
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,3461.4027,-1449.3012)"
-     id="g573"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text575">
-      <tspan
-         x="8743 8972 9132 9310 9573 9801 10013 10242 10419 10597 10775 10953 11114"
-         y="16023"
-         id="tspan577">CsumRSRequest</tspan>
-    </text>
-  </g>
-  <text
-     id="text591"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12305 12483 12644 12821 12893 13054 13232 13410 13638 13816 13905 14083 14311 14489 14667 14845 15023 15184 15272 15378"
-       y="16806"
-       id="tspan593">receive_DataRequest()</tspan>
-  </text>
-  <text
-     id="text607"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12377 12483 12660 12838 13016 13194 13372 13549 13621 13799 13977 14083 14261 14438 14616 14794 14955 15133 15294 15399"
-       y="17606"
-       id="tspan609">drbd_endio_read_sec()</tspan>
-  </text>
-  <text
-     id="text623"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12191 12420 12597 12775 12953 13131 13309 13486 13664 13825 13986 14164 14426 14604 14710 14871 15049 15154 15332 15510 15616"
-       y="18007"
-       id="tspan625">w_e_end_csum_rs_req()</tspan>
-  </text>
-  <text
-     id="text639"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="5735 5913 6091 6180 6357 6446 6607 6696 6874 7085 7246 7424 7585 7691"
-       y="18507"
-       id="tspan641">got_IsInSync()</tspan>
-  </text>
-  <text
-     id="text655"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="7999 8304 8541 8778 8990 9201 9413 9650 10001 10120 10357 10594 10806 11043 11280 11398 11703 11940 12152 12364 12601 12812 12931 13049 13261 13498 13710 13947 14065 14159 14396 14514 14726 14937 15175"
-       y="13877"
-       id="tspan657">Checksum based Resync, case in sync</tspan>
-  </text>
-  <path
-     d="M 12000,24601 L 11900,24301 L 12100,24301 L 12000,24601 z"
-     id="path667"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 12000,23801 L 12000,24361"
-     id="path671"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 8000,26401 L 7900,26101 L 8100,26101 L 8000,26401 z"
-     id="path683"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 8000,25601 L 8000,26161"
-     id="path687"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 12000,23801 L 11686,23840 L 11725,23644 L 12000,23801 z"
-     id="path699"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 8000,23001 L 11765,23754"
-     id="path703"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <g
-     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,-3543.8452,1630.5143)"
-     id="g719"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text721">
-      <tspan
-         x="9464 9710 9921 10150 10328 10505 10577"
-         y="25236"
-         id="tspan723">OVReply</tspan>
-    </text>
-  </g>
-  <path
-     d="M 8000,25601 L 8282,25458 L 8312,25655 L 8000,25601 z"
-     id="path733"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 12000,25001 L 8237,25565"
-     id="path737"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <g
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,4918.2801,-1381.2128)"
-     id="g753"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text755">
-      <tspan
-         x="9142 9388 9599 9828 10006 10183 10361 10539 10700"
-         y="23106"
-         id="tspan757">OVRequest</tspan>
-    </text>
-  </g>
-  <text
-     id="text771"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12200 12306 12484 12645 12822 12894 13055 13233 13411 13656 13868 14097 14274 14452 14630 14808 14969 15058 15163"
-       y="23806"
-       id="tspan773">receive_OVRequest()</tspan>
-  </text>
-  <text
-     id="text787"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12200 12378 12484 12661 12839 13017 13195 13373 13550 13622 13800 13978 14084 14262 14439 14617 14795 14956 15134 15295 15400"
-       y="24606"
-       id="tspan789">drbd_endio_read_sec()</tspan>
-  </text>
-  <text
-     id="text803"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12192 12421 12598 12776 12954 13132 13310 13487 13665 13843 14004 14182 14288 14465 14643 14749"
-       y="25007"
-       id="tspan805">w_e_end_ov_req()</tspan>
-  </text>
-  <text
-     id="text819"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="5101 5207 5385 5546 5723 5795 5956 6134 6312 6557 6769 6998 7175 7353 7425 7586 7692"
-       y="25507"
-       id="tspan821">receive_OVReply()</tspan>
-  </text>
-  <text
-     id="text835"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4492 4670 4776 4953 5131 5309 5487 5665 5842 5914 6092 6270 6376 6554 6731 6909 7087 7248 7426 7587 7692"
-       y="26407"
-       id="tspan837">drbd_endio_read_sec()</tspan>
-  </text>
-  <text
-     id="text851"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4902 5131 5308 5486 5664 5842 6020 6197 6375 6553 6714 6892 6998 7175 7353 7425 7586 7692"
-       y="26907"
-       id="tspan853">w_e_end_ov_reply()</tspan>
-  </text>
-  <path
-     d="M 12000,27601 L 11686,27640 L 11725,27444 L 12000,27601 z"
-     id="path863"
-     style="fill:#000080;visibility:visible" />
-  <path
-     d="M 8000,26801 L 11765,27554"
-     id="path867"
-     style="fill:none;stroke:#000080;visibility:visible" />
-  <g
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,5704.1907,-1328.312)"
-     id="g883"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <text
-       id="text885">
-      <tspan
-         x="9279 9525 9736 9965 10143 10303 10481 10553"
-         y="26935"
-         id="tspan887">OVResult</tspan>
-    </text>
-  </g>
-  <text
-     id="text901"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12200 12378 12556 12645 12822 13068 13280 13508 13686 13847 14025 14097 14185 14291"
-       y="27559"
-       id="tspan903">got_OVResult()</tspan>
-  </text>
-  <text
-     id="text917"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="8000 8330 8567 8660 8754 8991 9228 9346 9558 9795 9935 10028 10146"
-       y="21877"
-       id="tspan919">Online verify</tspan>
-  </text>
-  <text
-     id="text933"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4641 4870 5047 5310 5488 5649 5826 6004 6182 6343 6521 6626 6804 6982 7160 7338 7499 7587 7693"
-       y="23005"
-       id="tspan935">w_make_ov_request()</tspan>
-  </text>
-  <path
-     d="M 8000,6500 L 7900,6200 L 8100,6200 L 8000,6500 z"
-     id="path945"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 8000,5700 L 8000,6260"
-     id="path949"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 3900,5500 L 3700,5500 L 3700,11000 L 3900,11000"
-     id="path961"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <path
-     d="M 3900,14500 L 3700,14500 L 3700,18600 L 3900,18600"
-     id="path973"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <path
-     d="M 3900,22800 L 3700,22800 L 3700,26900 L 3900,26900"
-     id="path985"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <text
-     id="text1001"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4492 4670 4776 4953 5131 5309 5487 5665 5842 5914 6092 6270 6376 6554 6731 6909 7087 7248 7426 7587 7692"
-       y="6506"
-       id="tspan1003">drbd_endio_read_sec()</tspan>
-  </text>
-  <text
-     id="text1017"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4034 4263 4440 4703 4881 5042 5219 5397 5503 5681 5842 6003 6180 6341 6519 6625 6803 6980 7158 7336 7497 7586 7692"
-       y="14708"
-       id="tspan1019">w_make_resync_request()</tspan>
-  </text>
-  <text
-     id="text1033"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="5190 5419 5596 5774 5952 6113 6291 6468 6646 6824 6985 7146 7324 7586 7692"
-       y="16006"
-       id="tspan1035">w_e_send_csum()</tspan>
-  </text>
-  <path
-     d="M 8000,15501 L 7900,15201 L 8100,15201 L 8000,15501 z"
-     id="path1045"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 8000,14701 L 8000,15261"
-     id="path1049"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <text
-     id="text1065"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4492 4670 4776 4953 5131 5309 5487 5665 5842 5914 6092 6270 6376 6554 6731 6909 7087 7248 7426 7587 7692"
-       y="15507"
-       id="tspan1067">drbd_endio_read_sec()</tspan>
-  </text>
-  <path
-     d="M 16100,9000 L 16300,9000 L 16300,7500 L 16100,7500"
-     id="path1077"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <path
-     d="M 16100,18000 L 16300,18000 L 16300,16500 L 16100,16500"
-     id="path1089"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <path
-     d="M 16100,25000 L 16300,25000 L 16300,23500 L 16100,23500"
-     id="path1101"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <text
-     id="text1117"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="2026 2132 2293 2471 2648 2826 3004 3076 3254 3431 3503 3681 3787"
-       y="5402"
-       id="tspan1119">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text1133"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="2027 2133 2294 2472 2649 2827 3005 3077 3255 3432 3504 3682 3788"
-       y="14402"
-       id="tspan1135">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text1149"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="2026 2132 2293 2471 2648 2826 3004 3076 3254 3431 3503 3681 3787"
-       y="22602"
-       id="tspan1151">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text1165"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="1426 1532 1693 1871 2031 2209 2472 2649 2721 2899 2988 3166 3344 3416 3593 3699"
-       y="11302"
-       id="tspan1167">rs_complete_io()</tspan>
-  </text>
-  <text
-     id="text1181"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="1526 1632 1793 1971 2131 2309 2572 2749 2821 2999 3088 3266 3444 3516 3693 3799"
-       y="18931"
-       id="tspan1183">rs_complete_io()</tspan>
-  </text>
-  <text
-     id="text1197"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="1526 1632 1793 1971 2131 2309 2572 2749 2821 2999 3088 3266 3444 3516 3693 3799"
-       y="27231"
-       id="tspan1199">rs_complete_io()</tspan>
-  </text>
-  <text
-     id="text1213"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16126 16232 16393 16571 16748 16926 17104 17176 17354 17531 17603 17781 17887"
-       y="7402"
-       id="tspan1215">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text1229"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16127 16233 16394 16572 16749 16927 17105 17177 17355 17532 17604 17782 17888"
-       y="16331"
-       id="tspan1231">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text1245"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16127 16233 16394 16572 16749 16927 17105 17177 17355 17532 17604 17782 17888"
-       y="23302"
-       id="tspan1247">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text1261"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16115 16221 16382 16560 16720 16898 17161 17338 17410 17588 17677 17855 18033 18105 18282 18388"
-       y="9302"
-       id="tspan1263">rs_complete_io()</tspan>
-  </text>
-  <text
-     id="text1277"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16115 16221 16382 16560 16720 16898 17161 17338 17410 17588 17677 17855 18033 18105 18282 18388"
-       y="18331"
-       id="tspan1279">rs_complete_io()</tspan>
-  </text>
-  <text
-     id="text1293"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16126 16232 16393 16571 16731 16909 17172 17349 17421 17599 17688 17866 18044 18116 18293 18399"
-       y="25302"
-       id="tspan1295">rs_complete_io()</tspan>
-  </text>
-</svg>
diff --git a/Documentation/blockdev/drbd/DRBD-data-packets.svg b/Documentation/blockdev/drbd/DRBD-data-packets.svg
deleted file mode 100644
index 48a1e2165fec..000000000000
--- a/Documentation/blockdev/drbd/DRBD-data-packets.svg
+++ /dev/null
@@ -1,459 +0,0 @@
-<?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-<svg
-   xmlns:svg="http://www.w3.org/2000/svg"
-   xmlns="http://www.w3.org/2000/svg"
-   version="1.0"
-   width="210mm"
-   height="297mm"
-   viewBox="0 0 21000 29700"
-   id="svg2"
-   style="fill-rule:evenodd">
-  <defs
-     id="defs4" />
-  <g
-     id="Default"
-     style="visibility:visible">
-    <desc
-       id="desc176">Master slide</desc>
-  </g>
-  <path
-     d="M 11999,19601 L 11899,19301 L 12099,19301 L 11999,19601 z"
-     id="path189"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 11999,18801 L 11999,19361"
-     id="path193"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 7999,21401 L 7899,21101 L 8099,21101 L 7999,21401 z"
-     id="path205"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 7999,20601 L 7999,21161"
-     id="path209"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 11999,18801 L 11685,18840 L 11724,18644 L 11999,18801 z"
-     id="path221"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 7999,18001 L 11764,18754"
-     id="path225"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <text
-     x="-3023.845"
-     y="1106.8124"
-     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,0,0)"
-     id="text243"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="6115.1553 6344.1553 6555.1553 6784.1553 6962.1553 7051.1553 7228.1553 7457.1553 7635.1553 7813.1553 7885.1553"
-       y="21390.812"
-       id="tspan245">RSDataReply</tspan>
-  </text>
-  <path
-     d="M 7999,20601 L 8281,20458 L 8311,20655 L 7999,20601 z"
-     id="path255"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 11999,20001 L 8236,20565"
-     id="path259"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <text
-     x="3502.5356"
-     y="-2184.6621"
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
-     id="text277"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12321.536 12550.536 12761.536 12990.536 13168.536 13257.536 13434.536 13663.536 13841.536 14019.536 14196.536 14374.536 14535.536"
-       y="15854.338"
-       id="tspan279">RSDataRequest</tspan>
-  </text>
-  <text
-     id="text293"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4034 4263 4440 4703 4881 5042 5219 5397 5503 5681 5842 6003 6180 6341 6519 6625 6803 6980 7158 7336 7497 7586 7692"
-       y="17807"
-       id="tspan295">w_make_resync_request()</tspan>
-  </text>
-  <text
-     id="text309"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12305 12483 12644 12821 12893 13054 13232 13410 13638 13816 13905 14083 14311 14489 14667 14845 15023 15184 15272 15378"
-       y="18806"
-       id="tspan311">receive_DataRequest()</tspan>
-  </text>
-  <text
-     id="text325"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12377 12483 12660 12838 13016 13194 13372 13549 13621 13799 13977 14083 14261 14438 14616 14794 14955 15133 15294 15399"
-       y="19606"
-       id="tspan327">drbd_endio_read_sec()</tspan>
-  </text>
-  <text
-     id="text341"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12191 12420 12597 12775 12953 13131 13309 13486 13664 13770 13931 14109 14287 14375 14553 14731 14837 15015 15192 15298"
-       y="20007"
-       id="tspan343">w_e_end_rsdata_req()</tspan>
-  </text>
-  <text
-     id="text357"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4444 4550 4728 4889 5066 5138 5299 5477 5655 5883 6095 6324 6501 6590 6768 6997 7175 7352 7424 7585 7691"
-       y="20507"
-       id="tspan359">receive_RSDataReply()</tspan>
-  </text>
-  <text
-     id="text373"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4457 4635 4741 4918 5096 5274 5452 5630 5807 5879 6057 6235 6464 6569 6641 6730 6908 7086 7247 7425 7585 7691"
-       y="21407"
-       id="tspan375">drbd_endio_write_sec()</tspan>
-  </text>
-  <text
-     id="text389"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4647 4825 5003 5180 5358 5536 5714 5820 5997 6158 6319 6497 6658 6836 7013 7085 7263 7424 7585 7691"
-       y="21907"
-       id="tspan391">e_end_resync_block()</tspan>
-  </text>
-  <path
-     d="M 11999,22601 L 11685,22640 L 11724,22444 L 11999,22601 z"
-     id="path401"
-     style="fill:#000080;visibility:visible" />
-  <path
-     d="M 7999,21801 L 11764,22554"
-     id="path405"
-     style="fill:none;stroke:#000080;visibility:visible" />
-  <text
-     x="4290.3008"
-     y="-2369.6162"
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
-     id="text423"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="13610.301 13911.301 14016.301 14088.301 14177.301 14355.301 14567.301 14728.301"
-       y="19573.385"
-       id="tspan425">WriteAck</tspan>
-  </text>
-  <text
-     id="text439"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12199 12377 12555 12644 12821 13033 13105 13283 13444 13604 13816 13977 14138 14244"
-       y="22559"
-       id="tspan441">got_BlockAck()</tspan>
-  </text>
-  <text
-     id="text455"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="7999 8304 8541 8753 8964 9201 9413 9531 9769 9862 10099 10310 10522 10734 10852 10971 11208 11348 11585 11822"
-       y="16877"
-       id="tspan457">Resync blocks, 4-32K</tspan>
-  </text>
-  <path
-     d="M 12000,7601 L 11900,7301 L 12100,7301 L 12000,7601 z"
-     id="path467"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 12000,6801 L 12000,7361"
-     id="path471"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 12000,6801 L 11686,6840 L 11725,6644 L 12000,6801 z"
-     id="path483"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 8000,6001 L 11765,6754"
-     id="path487"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <text
-     x="-1288.1796"
-     y="1279.7666"
-     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,0,0)"
-     id="text505"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="8174.8208 8475.8203 8580.8203 8652.8203 8741.8203 8919.8203 9131.8203 9292.8203"
-       y="9516.7666"
-       id="tspan507">WriteAck</tspan>
-  </text>
-  <path
-     d="M 8000,8601 L 8282,8458 L 8312,8655 L 8000,8601 z"
-     id="path517"
-     style="fill:#000080;visibility:visible" />
-  <path
-     d="M 12000,8001 L 8237,8565"
-     id="path521"
-     style="fill:none;stroke:#000080;visibility:visible" />
-  <text
-     x="1065.6655"
-     y="-2097.7664"
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
-     id="text539"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="10682.666 10911.666 11088.666 11177.666"
-       y="4107.2339"
-       id="tspan541">Data</tspan>
-  </text>
-  <text
-     id="text555"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4746 4924 5030 5207 5385 5563 5826 6003 6164 6342 6520 6626 6803 6981 7159 7337 7498 7587 7692"
-       y="5505"
-       id="tspan557">drbd_make_request()</tspan>
-  </text>
-  <text
-     id="text571"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12200 12306 12484 12645 12822 12894 13055 13233 13411 13639 13817 13906 14084 14190"
-       y="6806"
-       id="tspan573">receive_Data()</tspan>
-  </text>
-  <text
-     id="text587"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12200 12378 12484 12661 12839 13017 13195 13373 13550 13622 13800 13978 14207 14312 14384 14473 14651 14829 14990 15168 15328 15434"
-       y="7606"
-       id="tspan589">drbd_endio_write_sec()</tspan>
-  </text>
-  <text
-     id="text603"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12192 12370 12548 12725 12903 13081 13259 13437 13509 13686 13847 14008 14114"
-       y="8007"
-       id="tspan605">e_end_block()</tspan>
-  </text>
-  <text
-     id="text619"
-     style="font-size:318px;font-weight:400;fill:#000080;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="5647 5825 6003 6092 6269 6481 6553 6731 6892 7052 7264 7425 7586 7692"
-       y="8606"
-       id="tspan621">got_BlockAck()</tspan>
-  </text>
-  <text
-     id="text635"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="8000 8305 8542 8779 9016 9109 9346 9486 9604 9956 10049 10189 10328 10565 10705 10942 11179 11298 11603 11742 11835 11954 12191 12310 12428 12665 12902 13139 13279 13516 13753"
-       y="4877"
-       id="tspan637">Regular mirrored write, 512-32K</tspan>
-  </text>
-  <text
-     id="text651"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="5381 5610 5787 5948 6126 6304 6482 6659 6837 7015 7087 7265 7426 7587 7692"
-       y="6003"
-       id="tspan653">w_send_dblock()</tspan>
-  </text>
-  <path
-     d="M 8000,6800 L 7900,6500 L 8100,6500 L 8000,6800 z"
-     id="path663"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 8000,6000 L 8000,6560"
-     id="path667"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <text
-     id="text683"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4602 4780 4886 5063 5241 5419 5597 5775 5952 6024 6202 6380 6609 6714 6786 6875 7053 7231 7409 7515 7587 7692"
-       y="6905"
-       id="tspan685">drbd_endio_write_pri()</tspan>
-  </text>
-  <path
-     d="M 12000,13602 L 11900,13302 L 12100,13302 L 12000,13602 z"
-     id="path695"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 12000,12802 L 12000,13362"
-     id="path699"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <path
-     d="M 12000,12802 L 11686,12841 L 11725,12645 L 12000,12802 z"
-     id="path711"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 8000,12002 L 11765,12755"
-     id="path715"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <text
-     x="-2155.5266"
-     y="1201.5964"
-     transform="matrix(0.9895258,-0.1443562,0.1443562,0.9895258,0,0)"
-     id="text733"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="7202.4736 7431.4736 7608.4736 7697.4736 7875.4736 8104.4736 8282.4736 8459.4736 8531.4736"
-       y="15454.597"
-       id="tspan735">DataReply</tspan>
-  </text>
-  <path
-     d="M 8000,14602 L 8282,14459 L 8312,14656 L 8000,14602 z"
-     id="path745"
-     style="fill:#008000;visibility:visible" />
-  <path
-     d="M 12000,14002 L 8237,14566"
-     id="path749"
-     style="fill:none;stroke:#008000;visibility:visible" />
-  <text
-     x="2280.3804"
-     y="-2103.2141"
-     transform="matrix(0.9788674,0.2044961,-0.2044961,0.9788674,0,0)"
-     id="text767"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="11316.381 11545.381 11722.381 11811.381 11989.381 12218.381 12396.381 12573.381 12751.381 12929.381 13090.381"
-       y="9981.7861"
-       id="tspan769">DataRequest</tspan>
-  </text>
-  <text
-     id="text783"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="4746 4924 5030 5207 5385 5563 5826 6003 6164 6342 6520 6626 6803 6981 7159 7337 7498 7587 7692"
-       y="11506"
-       id="tspan785">drbd_make_request()</tspan>
-  </text>
-  <text
-     id="text799"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12200 12306 12484 12645 12822 12894 13055 13233 13411 13639 13817 13906 14084 14312 14490 14668 14846 15024 15185 15273 15379"
-       y="12807"
-       id="tspan801">receive_DataRequest()</tspan>
-  </text>
-  <text
-     id="text815"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12200 12378 12484 12661 12839 13017 13195 13373 13550 13622 13800 13978 14084 14262 14439 14617 14795 14956 15134 15295 15400"
-       y="13607"
-       id="tspan817">drbd_endio_read_sec()</tspan>
-  </text>
-  <text
-     id="text831"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="12192 12421 12598 12776 12954 13132 13310 13487 13665 13843 14021 14110 14288 14465 14571 14749 14927 15033"
-       y="14008"
-       id="tspan833">w_e_end_data_req()</tspan>
-  </text>
-  <g
-     id="g835"
-     style="visibility:visible">
-    <desc
-       id="desc837">Drawing</desc>
-    <text
-       id="text847"
-       style="font-size:318px;font-weight:400;fill:#008000;font-family:Helvetica embedded">
-      <tspan
-         x="4885 4991 5169 5330 5507 5579 5740 5918 6096 6324 6502 6591 6769 6997 7175 7353 7425 7586 7692"
-         y="14607"
-         id="tspan849">receive_DataReply()</tspan>
-    </text>
-  </g>
-  <text
-     id="text863"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="8000 8305 8398 8610 8821 8914 9151 9363 9575 9693 9833 10070 10307 10544 10663 10781 11018 11255 11493 11632 11869 12106"
-       y="10878"
-       id="tspan865">Diskless read, 512-32K</tspan>
-  </text>
-  <text
-     id="text879"
-     style="font-size:318px;font-weight:400;fill:#008000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="5029 5258 5435 5596 5774 5952 6130 6307 6413 6591 6769 6947 7125 7230 7408 7586 7692"
-       y="12004"
-       id="tspan881">w_send_read_req()</tspan>
-  </text>
-  <text
-     id="text895"
-     style="font-size:423px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="6961 7266 7571 7854 8159 8278 8515 8633 8870 9107 9226 9463 9581 9700 9793 10030"
-       y="2806"
-       id="tspan897">DRBD 8 data flow</tspan>
-  </text>
-  <path
-     d="M 3900,5300 L 3700,5300 L 3700,7000 L 3900,7000"
-     id="path907"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <path
-     d="M 3900,17600 L 3700,17600 L 3700,22000 L 3900,22000"
-     id="path919"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <path
-     d="M 16100,20000 L 16300,20000 L 16300,18500 L 16100,18500"
-     id="path931"
-     style="fill:none;stroke:#000000;visibility:visible" />
-  <text
-     id="text947"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="2126 2304 2376 2554 2731 2909 3087 3159 3337 3515 3587 3764 3870"
-       y="5202"
-       id="tspan949">al_begin_io()</tspan>
-  </text>
-  <text
-     id="text963"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="1632 1810 1882 2060 2220 2398 2661 2839 2910 3088 3177 3355 3533 3605 3783 3888"
-       y="7331"
-       id="tspan965">al_complete_io()</tspan>
-  </text>
-  <text
-     id="text979"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="2126 2232 2393 2571 2748 2926 3104 3176 3354 3531 3603 3781 3887"
-       y="17431"
-       id="tspan981">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text995"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="1626 1732 1893 2071 2231 2409 2672 2849 2921 3099 3188 3366 3544 3616 3793 3899"
-       y="22331"
-       id="tspan997">rs_complete_io()</tspan>
-  </text>
-  <text
-     id="text1011"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16027 16133 16294 16472 16649 16827 17005 17077 17255 17432 17504 17682 17788"
-       y="18402"
-       id="tspan1013">rs_begin_io()</tspan>
-  </text>
-  <text
-     id="text1027"
-     style="font-size:318px;font-weight:400;fill:#000000;visibility:visible;font-family:Helvetica embedded">
-    <tspan
-       x="16115 16221 16382 16560 16720 16898 17161 17338 17410 17588 17677 17855 18033 18105 18282 18388"
-       y="20331"
-       id="tspan1029">rs_complete_io()</tspan>
-  </text>
-</svg>
diff --git a/Documentation/blockdev/drbd/conn-states-8.dot b/Documentation/blockdev/drbd/conn-states-8.dot
deleted file mode 100644
index 025e8cf5e64a..000000000000
--- a/Documentation/blockdev/drbd/conn-states-8.dot
+++ /dev/null
@@ -1,18 +0,0 @@
-digraph conn_states {
-	StandAlone   -> WFConnection   [ label = "ioctl_set_net()" ]
-	WFConnection -> Unconnected    [ label = "unable to bind()" ]
-	WFConnection -> WFReportParams [ label = "in connect() after accept" ]
-	WFReportParams -> StandAlone   [ label = "checks in receive_param()" ]
-	WFReportParams -> Connected    [ label = "in receive_param()" ]
-	WFReportParams -> WFBitMapS    [ label = "sync_handshake()" ]
-	WFReportParams -> WFBitMapT    [ label = "sync_handshake()" ]
-	WFBitMapS -> SyncSource        [ label = "receive_bitmap()" ]
-	WFBitMapT -> SyncTarget        [ label = "receive_bitmap()" ]
-	SyncSource -> Connected
-	SyncTarget -> Connected
-	SyncSource -> PausedSyncS
-	SyncTarget -> PausedSyncT
-	PausedSyncS -> SyncSource
-	PausedSyncT -> SyncTarget
-	Connected   -> WFConnection    [ label = "* on network error" ]
-}
diff --git a/Documentation/blockdev/drbd/data-structure-v9.rst b/Documentation/blockdev/drbd/data-structure-v9.rst
deleted file mode 100644
index 66036b901644..000000000000
--- a/Documentation/blockdev/drbd/data-structure-v9.rst
+++ /dev/null
@@ -1,42 +0,0 @@
-================================
-kernel data structure for DRBD-9
-================================
-
-This describes the in-kernel data structure for DRBD-9. Starting with
-Linux v3.14 we are reorganizing DRBD to use this data structure.
-
-Basic Data Structure
-====================
-
-A node has a number of DRBD resources.  Each such resource has a number of
-devices (aka volumes) and connections to other nodes ("peer nodes"). Each DRBD
-device is represented by a block device locally.
-
-The DRBD objects are interconnected to form a matrix as depicted below; a
-drbd_peer_device object sits at each intersection between a drbd_device and a
-drbd_connection::
-
-  /--------------+---------------+.....+---------------\
-  |   resource   |    device     |     |    device     |
-  +--------------+---------------+.....+---------------+
-  |  connection  |  peer_device  |     |  peer_device  |
-  +--------------+---------------+.....+---------------+
-  :              :               :     :               :
-  :              :               :     :               :
-  +--------------+---------------+.....+---------------+
-  |  connection  |  peer_device  |     |  peer_device  |
-  \--------------+---------------+.....+---------------/
-
-In this table, horizontally, devices can be accessed from resources by their
-volume number.  Likewise, peer_devices can be accessed from connections by
-their volume number.  Objects in the vertical direction are connected by doubly
-linked lists.  There are back pointers from peer_devices to their connections
-and devices, and from connections and devices to their resource.
-
-All resources are in the drbd_resources doubly linked list.  In addition, all
-devices can be accessed by their minor device number via the drbd_devices idr.
-
-The drbd_resource, drbd_connection, and drbd_device objects are reference
-counted.  The peer_device objects only serve to establish the links between
-devices and connections; their lifetime is determined by the lifetime of the
-device and connection which they reference.
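Reviewer note on data-structure-v9.rst: the matrix described in the removed
text can be illustrated with a small userspace model. This is only a sketch in
Python under stated assumptions; the class and function names (Resource,
Device, Connection, PeerDevice, build_resource) are hypothetical and are not
the kernel's struct definitions. It shows one peer_device at each
device/connection intersection, lookup by volume number, and the back pointers.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Resource:
    name: str
    devices: dict = field(default_factory=dict)      # volume number -> Device
    connections: list = field(default_factory=list)  # one Connection per peer node


@dataclass(eq=False)
class Device:
    resource: Resource                                # back pointer to the resource
    vnr: int                                          # volume number
    minor: int                                        # minor device number
    peer_devices: list = field(default_factory=list)  # the "vertical" list


@dataclass(eq=False)
class Connection:
    resource: Resource                                # back pointer to the resource
    peer_name: str
    peer_devices: dict = field(default_factory=dict)  # volume number -> PeerDevice


@dataclass(eq=False)
class PeerDevice:
    device: Device          # back pointer to the device
    connection: Connection  # back pointer to the connection


def build_resource(name, volumes, peers):
    """Build the matrix; volumes maps volume number -> minor (hypothetical input)."""
    resource = Resource(name)
    for vnr, minor in volumes.items():
        resource.devices[vnr] = Device(resource, vnr, minor)
    for peer in peers:
        conn = Connection(resource, peer)
        resource.connections.append(conn)
        # One peer_device sits at each device/connection intersection.
        for vnr, dev in resource.devices.items():
            pd = PeerDevice(dev, conn)
            conn.peer_devices[vnr] = pd
            dev.peer_devices.append(pd)
    return resource
```

In this model a peer_device holds no state of its own, which mirrors the
statement that its lifetime follows the device and connection it references.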
diff --git a/Documentation/blockdev/drbd/disk-states-8.dot b/Documentation/blockdev/drbd/disk-states-8.dot
deleted file mode 100644
index d06cfb46fb98..000000000000
--- a/Documentation/blockdev/drbd/disk-states-8.dot
+++ /dev/null
@@ -1,16 +0,0 @@
-digraph disk_states {
-	Diskless -> Inconsistent       [ label = "ioctl_set_disk()" ]
-	Diskless -> Consistent         [ label = "ioctl_set_disk()" ]
-	Diskless -> Outdated           [ label = "ioctl_set_disk()" ]
-	Consistent -> Outdated         [ label = "receive_param()" ]
-	Consistent -> UpToDate         [ label = "receive_param()" ]
-	Consistent -> Inconsistent     [ label = "start resync" ]
-	Outdated   -> Inconsistent     [ label = "start resync" ]
-	UpToDate   -> Inconsistent     [ label = "ioctl_replicate" ]
-	Inconsistent -> UpToDate       [ label = "resync completed" ]
-	Consistent -> Failed           [ label = "io completion error" ]
-	Outdated   -> Failed           [ label = "io completion error" ]
-	UpToDate   -> Failed           [ label = "io completion error" ]
-	Inconsistent -> Failed         [ label = "io completion error" ]
-	Failed -> Diskless             [ label = "sending notify to peer" ]
-}
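Reviewer note on disk-states-8.dot: the edges of the removed graph can be read
as a transition table. The sketch below copies the edges and labels from the
dot file into a Python dictionary; the helper names (disk_transition_label,
next_states) are hypothetical and exist only for this illustration, not in
DRBD.

```python
# Edges copied from disk-states-8.dot: (from_state, to_state) -> edge label.
DISK_TRANSITIONS = {
    ("Diskless", "Inconsistent"): "ioctl_set_disk()",
    ("Diskless", "Consistent"): "ioctl_set_disk()",
    ("Diskless", "Outdated"): "ioctl_set_disk()",
    ("Consistent", "Outdated"): "receive_param()",
    ("Consistent", "UpToDate"): "receive_param()",
    ("Consistent", "Inconsistent"): "start resync",
    ("Outdated", "Inconsistent"): "start resync",
    ("UpToDate", "Inconsistent"): "ioctl_replicate",
    ("Inconsistent", "UpToDate"): "resync completed",
    ("Consistent", "Failed"): "io completion error",
    ("Outdated", "Failed"): "io completion error",
    ("UpToDate", "Failed"): "io completion error",
    ("Inconsistent", "Failed"): "io completion error",
    ("Failed", "Diskless"): "sending notify to peer",
}


def disk_transition_label(src, dst):
    """Return the label of the edge src -> dst, or None if the graph has no such edge."""
    return DISK_TRANSITIONS.get((src, dst))


def next_states(src):
    """Return the set of states directly reachable from src."""
    return {dst for (s, dst) in DISK_TRANSITIONS if s == src}
```

For example, next_states("Diskless") yields the three states that the graph
reaches through ioctl_set_disk().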
diff --git a/Documentation/blockdev/drbd/drbd-connection-state-overview.dot b/Documentation/blockdev/drbd/drbd-connection-state-overview.dot
deleted file mode 100644
index 6d9cf0a7b11d..000000000000
--- a/Documentation/blockdev/drbd/drbd-connection-state-overview.dot
+++ /dev/null
@@ -1,85 +0,0 @@
-// vim: set sw=2 sts=2 :
-digraph {
-  rankdir=BT
-  bgcolor=white
-
-  node [shape=plaintext]
-  node [fontcolor=black]
-
-  StandAlone     [ style=filled,fillcolor=gray,label=StandAlone ]
-
-  node [fontcolor=lightgray]
-
-  Unconnected    [ label=Unconnected ]
-
-  CommTrouble [ shape=record,
-    label="{communication loss|{Timeout|BrokenPipe|NetworkFailure}}" ]
-
-  node [fontcolor=gray]
-
-  subgraph cluster_try_connect {
-    label="try to connect, handshake"
-    rank=max
-    WFConnection   [ label=WFConnection ]
-    WFReportParams [ label=WFReportParams ]
-  }
-
-  TearDown       [ label=TearDown ]
-
-  Connected      [ label=Connected,style=filled,fillcolor=green,fontcolor=black ]
-
-  node [fontcolor=lightblue]
-
-  StartingSyncS  [ label=StartingSyncS ]
-  StartingSyncT  [ label=StartingSyncT ]
-
-  subgraph cluster_bitmap_exchange {
-    node [fontcolor=red]
-    fontcolor=red
-    label="new application (WRITE?) requests blocked\lwhile bitmap is exchanged"
-
-    WFBitMapT      [ label=WFBitMapT ]
-    WFSyncUUID     [ label=WFSyncUUID ]
-    WFBitMapS      [ label=WFBitMapS ]
-  }
-
-  node [fontcolor=blue]
-
-  cluster_resync [ shape=record,label="{<any>resynchronisation process running\l'concurrent' application requests allowed|{{<T>PausedSyncT\nSyncTarget}|{<S>PausedSyncS\nSyncSource}}}" ]
-
-  node [shape=box,fontcolor=black]
-
-  // drbdadm [label="drbdadm connect"]
-  // handshake [label="drbd_connect()\ndrbd_do_handshake\ndrbd_sync_handshake() etc."]
-  // comm_error [label="communication trouble"]
-
-  //
-  // edges
-  // --------------------------------------
-
-  StandAlone -> Unconnected [ label="drbdadm connect" ]
-  Unconnected -> StandAlone  [ label="drbdadm disconnect\lor serious communication trouble" ]
-  Unconnected -> WFConnection [ label="receiver thread is started" ]
-  WFConnection -> WFReportParams [ headlabel="accept()\land/or                        \lconnect()\l" ]
-
-  WFReportParams -> StandAlone [ label="during handshake\lpeers do not agree\labout something essential" ]
-  WFReportParams -> Connected [ label="data identical\lno sync needed",color=green,fontcolor=green ]
-
-    WFReportParams -> WFBitMapS
-    WFReportParams -> WFBitMapT
-    WFBitMapT -> WFSyncUUID [minlen=0.1,constraint=false]
-
-      WFBitMapS -> cluster_resync:S
-      WFSyncUUID -> cluster_resync:T
-
-  edge [color=green]
-  cluster_resync:any -> Connected [ label="resync done",fontcolor=green ]
-
-  edge [color=red]
-  WFReportParams -> CommTrouble
-  Connected -> CommTrouble
-  cluster_resync:any -> CommTrouble
-  edge [color=black]
-  CommTrouble -> Unconnected [label="receiver thread is stopped" ]
-
-}
diff --git a/Documentation/blockdev/drbd/figures.rst b/Documentation/blockdev/drbd/figures.rst
deleted file mode 100644
index 3e3fd4b8a478..000000000000
--- a/Documentation/blockdev/drbd/figures.rst
+++ /dev/null
@@ -1,28 +0,0 @@
-.. The files included here are intended to help understand the implementation
-
-Data flows that relate some functions, and write packets
-========================================================
-
-.. kernel-figure:: DRBD-8.3-data-packets.svg
-    :alt:   DRBD-8.3-data-packets.svg
-    :align: center
-
-.. kernel-figure:: DRBD-data-packets.svg
-    :alt:   DRBD-data-packets.svg
-    :align: center
-
-
-Sub graphs of DRBD's state transitions
-======================================
-
-.. kernel-figure:: conn-states-8.dot
-    :alt:   conn-states-8.dot
-    :align: center
-
-.. kernel-figure:: disk-states-8.dot
-    :alt:   disk-states-8.dot
-    :align: center
-
-.. kernel-figure:: node-states-8.dot
-    :alt:   node-states-8.dot
-    :align: center
diff --git a/Documentation/blockdev/drbd/index.rst b/Documentation/blockdev/drbd/index.rst
deleted file mode 100644
index 68ecd5c113e9..000000000000
--- a/Documentation/blockdev/drbd/index.rst
+++ /dev/null
@@ -1,19 +0,0 @@
-==========================================
-Distributed Replicated Block Device - DRBD
-==========================================
-
-Description
-===========
-
-  DRBD is a shared-nothing, synchronously replicated block device. It
-  is designed to serve as a building block for high availability
-  clusters and in this context, is a "drop-in" replacement for shared
-  storage. Simplistically, you could see it as a network RAID 1.
-
-  Please visit http://www.drbd.org to find out more.
-
-.. toctree::
-   :maxdepth: 1
-
-   data-structure-v9
-   figures
diff --git a/Documentation/blockdev/drbd/node-states-8.dot b/Documentation/blockdev/drbd/node-states-8.dot
deleted file mode 100644
index 4a2b00c23547..000000000000
--- a/Documentation/blockdev/drbd/node-states-8.dot
+++ /dev/null
@@ -1,14 +0,0 @@
-digraph node_states {
-	Secondary -> Primary           [ label = "ioctl_set_state()" ]
-	Primary   -> Secondary 	       [ label = "ioctl_set_state()" ]
-}
-
-digraph peer_states {
-	Secondary -> Primary           [ label = "recv state packet" ]
-	Primary   -> Secondary 	       [ label = "recv state packet" ]
-	Primary   -> Unknown 	       [ label = "connection lost" ]
-	Secondary  -> Unknown  	       [ label = "connection lost" ]
-	Unknown   -> Primary           [ label = "connected" ]
-	Unknown   -> Secondary         [ label = "connected" ]
-}
-
diff --git a/Documentation/blockdev/floppy.rst b/Documentation/blockdev/floppy.rst
deleted file mode 100644
index 4a8f31cf4139..000000000000
--- a/Documentation/blockdev/floppy.rst
+++ /dev/null
@@ -1,255 +0,0 @@
-=============
-Floppy Driver
-=============
-
-FAQ list:
-=========
-
-A FAQ list may be found in the fdutils package (see below), and also
-at <http://fdutils.linux.lu/faq.html>.
-
-
-LILO configuration options (Thinkpad users, read this)
-======================================================
-
-The floppy driver is configured using the 'floppy=' option in
-lilo. This option can be typed at the boot prompt, or entered in the
-lilo configuration file.
-
-Example: If your kernel is called linux-2.6.9, type the following line
-at the lilo boot prompt (if you have a thinkpad)::
-
- linux-2.6.9 floppy=thinkpad
-
-You may also enter the following line in /etc/lilo.conf, in the description
-of linux-2.6.9::
-
- append = "floppy=thinkpad"
-
-Several floppy-related options may be given, for example::
-
- linux-2.6.9 floppy=daring floppy=two_fdc
- append = "floppy=daring floppy=two_fdc"
-
-If you give options both in the lilo config file and on the boot
-prompt, the option strings of both places are concatenated, the boot
-prompt options coming last. That's why there are also options to
-restore the default behavior.
-
-
-Module configuration options
-============================
-
-If you use the floppy driver as a module, use the following syntax::
-
-	modprobe floppy floppy="<options>"
-
-Example::
-
-	modprobe floppy floppy="omnibook messages"
-
-If you need certain options enabled every time you load the floppy driver,
-you can put::
-
-	options floppy floppy="omnibook messages"
-
-in a configuration file in /etc/modprobe.d/.
-
-
-The floppy driver related options are:
-
- floppy=asus_pci
-	Sets the bit mask to allow only units 0 and 1. (default)
-
- floppy=daring
-	Tells the floppy driver that you have a well behaved floppy controller.
-	This allows more efficient and smoother operation, but may fail on
-	certain controllers. This may speed up certain operations.
-
- floppy=0,daring
-	Tells the floppy driver that your floppy controller should be used
-	with caution.
-
- floppy=one_fdc
-	Tells the floppy driver that you have only one floppy controller.
-	(default)
-
- floppy=two_fdc / floppy=<address>,two_fdc
-	Tells the floppy driver that you have two floppy controllers.
-	The second floppy controller is assumed to be at <address>.
-	This option is not needed if the second controller is at address
-	0x370, and if you use the 'cmos' option.
-
- floppy=thinkpad
-	Tells the floppy driver that you have a Thinkpad. Thinkpads use an
-	inverted convention for the disk change line.
-
- floppy=0,thinkpad
-	Tells the floppy driver that you don't have a Thinkpad.
-
- floppy=omnibook / floppy=nodma
-	Tells the floppy driver not to use DMA for data transfers.
-	This is needed on HP Omnibooks, which don't have a workable
-	DMA channel for the floppy driver. This option is also useful
-	if you frequently get "Unable to allocate DMA memory" messages.
-	Indeed, DMA memory needs to be contiguous in physical memory,
-	and is thus harder to find, whereas non-DMA buffers may be
-	allocated in virtual memory. However, I advise against this if
-	you have an FDC without a FIFO (8272A or 82072). 82072A and
-	later are OK. You also need at least a 486 to use nodma.
-	If you use nodma mode, I suggest you also set the FIFO
-	threshold to 10 or lower, in order to limit the number of data
-	transfer interrupts.
-
-	If you have a FIFO-able FDC, the floppy driver automatically
-	falls back on non-DMA mode if no DMA-able memory can be found.
-	If you want to avoid this, explicitly ask for 'yesdma'.
-
- floppy=yesdma
-	Tells the floppy driver that a workable DMA channel is available.
-	(default)
-
- floppy=nofifo
-	Disables the FIFO entirely. This is needed if you get "Bus
-	master arbitration error" messages from your Ethernet card (or
-	from other devices) while accessing the floppy.
-
- floppy=usefifo
-	Enables the FIFO. (default)
-
- floppy=<threshold>,fifo_depth
-	Sets the FIFO threshold. This is mostly relevant in DMA
-	mode. If this is higher, the floppy driver tolerates more
-	interrupt latency, but it triggers more interrupts (i.e. it
-	imposes more load on the rest of the system). If this is
-	lower, the interrupt latency should be lower too (faster
-	processor). The benefit of a lower threshold is fewer
-	interrupts.
-
-	To tune the FIFO threshold, switch on over/underrun messages
-	using 'floppycontrol --messages'. Then access a floppy
-	disk. If you get a huge number of "Over/Underrun - retrying"
-	messages, then the FIFO threshold is too low. Try with a
-	higher value, until you only get an occasional Over/Underrun.
-	It is a good idea to compile the floppy driver as a module
-	when doing this tuning. Indeed, it allows you to try different
-	FIFO values without rebooting the machine for each test. Note
-	that you need to do 'floppycontrol --messages' every time you
-	re-insert the module.
-
-	Usually, tuning the FIFO threshold should not be needed, as
-	the default (0xa) is reasonable.
-
- floppy=<drive>,<type>,cmos
-	Sets the CMOS type of <drive> to <type>. This is mandatory if
-	you have more than two floppy drives (only two can be
-	described in the physical CMOS), or if your BIOS uses
-	non-standard CMOS types. The CMOS types are:
-
-	       ==  ==================================
-		0  Use the value of the physical CMOS
-		1  5 1/4 DD
-		2  5 1/4 HD
-		3  3 1/2 DD
-		4  3 1/2 HD
-		5  3 1/2 ED
-		6  3 1/2 ED
-	       16  unknown or not installed
-	       ==  ==================================
-
-	(Note: there are two valid types for ED drives. This is because 5 was
-	initially chosen to represent floppy *tapes*, and 6 for ED drives.
-	AMI ignored this, and used 5 for ED drives. That's why the floppy
-	driver handles both.)
-
- floppy=unexpected_interrupts
-	Print a warning message when an unexpected interrupt is received.
-	(default)
-
- floppy=no_unexpected_interrupts / floppy=L40SX
-	Don't print a message when an unexpected interrupt is received. This
-	is needed on IBM L40SX laptops in certain video modes. (There seems
-	to be an interaction between video and floppy. The unexpected
-	interrupts affect only performance, and can be safely ignored.)
-
- floppy=broken_dcl
-	Don't use the disk change line, but assume that the disk was
-	changed whenever the device node is reopened. Needed on some
-	boxes where the disk change line is broken or unsupported.
-	This should be regarded as a stopgap measure; indeed, it makes
-	floppy operation less efficient due to unneeded cache
-	flushes, and slightly less reliable. Please verify your
-	cable, connection and jumper settings if you have any DCL
-	problems. However, some older drives, and also some laptops
-	are known not to have a DCL.
-
- floppy=debug
-	Print debugging messages.
-
- floppy=messages
-	Print informational messages for some operations (disk change
-	notifications, warnings about over and underruns, and about
-	autodetection).
-
- floppy=silent_dcl_clear
-	Uses a less noisy way to clear the disk change line (which
-	doesn't involve seeks). Implied by 'daring' option.
-
- floppy=<nr>,irq
-	Sets the floppy IRQ to <nr> instead of 6.
-
- floppy=<nr>,dma
-	Sets the floppy DMA channel to <nr> instead of 2.
-
- floppy=slow
-	Use PS/2 stepping rate::
-
-	   PS/2 floppies have much slower step rates than regular floppies.
-	   It has been recommended to use about 1/4 of the default speed
-	   in some more extreme cases.
-
-
-Supporting utilities and additional documentation:
-==================================================
-
-Additional parameters of the floppy driver can be configured at
-runtime. Utilities which do this can be found in the fdutils package.
-This package also contains a new version of mtools which allows you to
-access high capacity disks (up to 1992K on a high density 3 1/2 disk!).
-It also contains additional documentation about the floppy driver.
-
-The latest version can be found at the fdutils homepage:
-
- http://fdutils.linux.lu
-
-The fdutils releases can be found at:
-
- http://fdutils.linux.lu/download.html
-
- http://www.tux.org/pub/knaff/fdutils/
-
- ftp://metalab.unc.edu/pub/Linux/utils/disk-management/
-
-Reporting problems about the floppy driver
-==========================================
-
-If you have a question or a bug report about the floppy driver, mail
-me at Alain.Knaff@poboxes.com . If you post to Usenet, preferably use
-comp.os.linux.hardware. As the volume in these groups is rather high,
-be sure to include the word "floppy" (or "FLOPPY") in the subject
-line.  If the reported problem happens when mounting floppy disks, be
-sure to mention also the type of the filesystem in the subject line.
-
-Be sure to read the FAQ before mailing/posting any bug reports!
-
-Alain
-
-Changelog
-=========
-
-10-30-2004 :
-		Cleanup, updating, add reference to module configuration.
-		James Nelson <james4765@gmail.com>
-
-6-3-2000 :
-		Original Document
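Reviewer note on floppy.rst: the 'floppy=<drive>,<type>,cmos' option and its
CMOS type table can be illustrated with a short userspace sketch. This is not
the driver's parser; the function name describe_cmos_option is hypothetical,
and the sketch assumes decimal drive and type numbers. The table values are
copied from the removed text.

```python
# CMOS type numbers and meanings, copied from the table in floppy.rst.
CMOS_TYPES = {
    0: "Use the value of the physical CMOS",
    1: "5 1/4 DD",
    2: "5 1/4 HD",
    3: "3 1/2 DD",
    4: "3 1/2 HD",
    5: "3 1/2 ED",
    6: "3 1/2 ED",
    16: "unknown or not installed",
}


def describe_cmos_option(option):
    """Split 'floppy=<drive>,<type>,cmos' into (drive, type description)."""
    prefix = "floppy="
    if not option.startswith(prefix):
        raise ValueError("not a floppy= option: %r" % option)
    parts = option[len(prefix):].split(",")
    if len(parts) != 3 or parts[2] != "cmos":
        raise ValueError("expected floppy=<drive>,<type>,cmos: %r" % option)
    drive, cmos_type = int(parts[0]), int(parts[1])
    if cmos_type not in CMOS_TYPES:
        raise ValueError("unknown CMOS type: %d" % cmos_type)
    return drive, CMOS_TYPES[cmos_type]
```

Types 5 and 6 map to the same description on purpose, matching the note that
the driver accepts both values for ED drives.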
diff --git a/Documentation/blockdev/index.rst b/Documentation/blockdev/index.rst
deleted file mode 100644
index a9af6ed8b4aa..000000000000
--- a/Documentation/blockdev/index.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-:orphan:
-
-=============
-Block Devices
-=============
-
-.. toctree::
-   :maxdepth: 1
-
-   floppy
-   nbd
-   paride
-   ramdisk
-   zram
-
-   drbd/index
diff --git a/Documentation/blockdev/nbd.rst b/Documentation/blockdev/nbd.rst
deleted file mode 100644
index d78dfe559dcf..000000000000
--- a/Documentation/blockdev/nbd.rst
+++ /dev/null
@@ -1,31 +0,0 @@
-==================================
-Network Block Device (TCP version)
-==================================
-
-1) Overview
------------
-
-What is it: With this compiled into the kernel (or as a module), Linux
-can use a remote server as one of its block devices. So every time
-the client computer wants to read, e.g., /dev/nb0, it sends a
-request over TCP to the server, which will reply with the data read.
-This can be used for stations with low disk space (or even diskless)
-to borrow disk space from another computer.
-Unlike NFS, it is possible to put any filesystem on it, etc.
-
-For more information, or to download the nbd-client and nbd-server
-tools, go to http://nbd.sf.net/.
-
-The nbd kernel module need only be installed on the client
-system, as the nbd-server is completely in userspace. In fact,
-the nbd-server has been successfully ported to other operating
-systems, including Windows.
-
-A) NBD parameters
------------------
-
-max_part
-	Number of partitions per device (default: 0).
-
-nbds_max
-	Number of block devices that should be initialized (default: 16).
diff --git a/Documentation/blockdev/paride.rst b/Documentation/blockdev/paride.rst
deleted file mode 100644
index 87b4278bf314..000000000000
--- a/Documentation/blockdev/paride.rst
+++ /dev/null
@@ -1,439 +0,0 @@
-===================================
-Linux and parallel port IDE devices
-===================================
-
-PARIDE v1.03   (c) 1997-8  Grant Guenther <grant@torque.net>
-
-1. Introduction
-===============
-
-Owing to the simplicity and near universality of the parallel port interface
-to personal computers, many external devices such as portable hard-disk,
-CD-ROM, LS-120 and tape drives use the parallel port to connect to their
-host computer.  While some devices (notably scanners) use ad-hoc methods
-to pass commands and data through the parallel port interface, most
-external devices are actually identical to an internal model, but with
-a parallel-port adapter chip added in.  Some of the original parallel port
-adapters were little more than mechanisms for multiplexing a SCSI bus.
-(The Iomega PPA-3 adapter used in the ZIP drives is an example of this
-approach).  Most current designs, however, take a different approach.
-The adapter chip reproduces a small ISA or IDE bus in the external device
-and the communication protocol provides operations for reading and writing
-device registers, as well as data block transfer functions.  Sometimes,
-the device being addressed via the parallel cable is a standard SCSI
-controller like an NCR 5380.  The "ditto" family of external tape
-drives use the ISA replicator to interface a floppy disk controller,
-which is then connected to a floppy-tape mechanism.  The vast majority
-of external parallel port devices, however, are now based on standard
-IDE type devices, which require no intermediate controller.  If one
-were to open up a parallel port CD-ROM drive, for instance, one would
-find a standard ATAPI CD-ROM drive, a power supply, and a single adapter
-that interconnected a standard PC parallel port cable and a standard
-IDE cable.  It is usually possible to exchange the CD-ROM device with
-any other device using the IDE interface.
-
-This document describes the support in Linux for parallel port IDE
-devices.  It does not cover parallel port SCSI devices, "ditto" tape
-drives or scanners.  Many different devices are supported by the
-parallel port IDE subsystem, including:
-
-	- MicroSolutions backpack CD-ROM
-	- MicroSolutions backpack PD/CD
-	- MicroSolutions backpack hard-drives
-	- MicroSolutions backpack 8000t tape drive
-	- SyQuest EZ-135, EZ-230 & SparQ drives
-	- Avatar Shark
-	- Imation Superdisk LS-120
-	- Maxell Superdisk LS-120
-	- FreeCom Power CD
-	- Hewlett-Packard 5GB and 8GB tape drives
-	- Hewlett-Packard 7100 and 7200 CD-RW drives
-
-as well as most of the clone and no-name products on the market.
-
-To support such a wide range of devices, PARIDE, the parallel port IDE
-subsystem, is actually structured in three parts.   There is a base
-paride module which provides a registry and some common methods for
-accessing the parallel ports.  The second component is a set of
-high-level drivers for each of the different types of supported devices:
-
-	===	=============
-	pd	IDE disk
-	pcd	ATAPI CD-ROM
-	pf	ATAPI disk
-	pt	ATAPI tape
-	pg	ATAPI generic
-	===	=============
-
-(Currently, the pg driver is only used with CD-R drives).
-
-The high-level drivers function according to the relevant standards.
-The third component of PARIDE is a set of low-level protocol drivers
-for each of the parallel port IDE adapter chips.  Thanks to the interest
-and encouragement of Linux users from many parts of the world,
-support is available for almost all known adapter protocols:
-
-	====    ====================================== ====
-        aten    ATEN EH-100                            (HK)
-        bpck    Microsolutions backpack                (US)
-        comm    DataStor (old-type) "commuter" adapter (TW)
-        dstr    DataStor EP-2000                       (TW)
-        epat    Shuttle EPAT                           (UK)
-        epia    Shuttle EPIA                           (UK)
-	fit2    FIT TD-2000			       (US)
-	fit3    FIT TD-3000			       (US)
-	friq    Freecom IQ cable                       (DE)
-        frpw    Freecom Power                          (DE)
-        kbic    KingByte KBIC-951A and KBIC-971A       (TW)
-	ktti    KT Technology PHd adapter              (SG)
-        on20    OnSpec 90c20                           (US)
-        on26    OnSpec 90c26                           (US)
-	====    ====================================== ====
-
-
-2. Using the PARIDE subsystem
-=============================
-
-While configuring the Linux kernel, you may choose either to build
-the PARIDE drivers into your kernel, or to build them as modules.
-
-In either case, you will need to select "Parallel port IDE device support"
-as well as at least one of the high-level drivers and at least one
-of the parallel port communication protocols.  If you do not know
-what kind of parallel port adapter is used in your drive, you could
-begin by checking the file names and any text files on your DOS
-installation floppy.  Alternatively, you can look at the markings on
-the adapter chip itself.  That's usually sufficient to identify the
-correct device.
-
-You can actually select all the protocol modules, and allow the PARIDE
-subsystem to try them all for you.
-
-For the "brand-name" products listed above, here are the protocol
-and high-level drivers that you would use:
-
-	================	============	======	========
-	Manufacturer		Model		Driver	Protocol
-	================	============	======	========
-	MicroSolutions		CD-ROM		pcd	bpck
-	MicroSolutions		PD drive	pf	bpck
-	MicroSolutions		hard-drive	pd	bpck
-	MicroSolutions          8000t tape      pt      bpck
-	SyQuest			EZ, SparQ	pd	epat
-	Imation			Superdisk	pf	epat
-	Maxell                  Superdisk       pf      friq
-	Avatar			Shark		pd	epat
-	FreeCom			CD-ROM		pcd	frpw
-	Hewlett-Packard		5GB Tape	pt	epat
-	Hewlett-Packard		7200e (CD)	pcd	epat
-	Hewlett-Packard		7200e (CD-R)	pg	epat
-	================	============	======	========
-
-2.1  Configuring built-in drivers
----------------------------------
-
-We recommend that you get to know how the drivers work and how to
-configure them as loadable modules, before attempting to compile a
-kernel with the drivers built-in.
-
-If you built all of your PARIDE support directly into your kernel,
-and you have just a single parallel port IDE device, your kernel should
-locate it automatically for you.  If you have more than one device,
-you may need to give some command line options to your bootloader
-(e.g. LILO); how to do that is beyond the scope of this document.
-
-The high-level drivers accept a number of command line parameters, all
-of which are documented in the source files in linux/drivers/block/paride.
-By default, each driver will automatically try all parallel ports it
-can find, and all protocol types that have been installed, until it finds
-a parallel port IDE adapter.  Once it finds one, the probe stops.  So,
-if you have more than one device, you will need to tell the drivers
-how to identify them.  This requires specifying the port address, the
-protocol identification number and, for some devices, the drive's
-chain ID.  While your system is booting, a number of messages are
-displayed on the console.  Like all such messages, they can be
-reviewed with the 'dmesg' command.  Among those messages will be
-some lines like::
-
-	paride: bpck registered as protocol 0
-	paride: epat registered as protocol 1
-
-The numbers will always be the same until you build a new kernel with
-different protocol selections.  You should note these numbers as you
-will need them to identify the devices.
-
-If you happen to be using a MicroSolutions backpack device, you will
-also need to know the unit ID number for each drive.  This is usually
-the last two digits of the drive's serial number (but read MicroSolutions'
-documentation about this).
-
-As an example, let's assume that you have a MicroSolutions PD/CD drive
-with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
-EZ-135 connected to the chained port on the PD/CD drive and also an
-Imation Superdisk connected to port 0x278.  You could give the following
-options on your boot command::
-
-	pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
-
-In the last option, pf.drive1 configures device /dev/pf1, the 0x378
-is the parallel port base address, the 0 is the protocol registration
-number and 36 is the chain ID.
-
-Please note:  while PARIDE will work both with and without the
-PARPORT parallel port sharing system that is included by the
-"Parallel port support" option, PARPORT must be included and enabled
-if you want to use chains of devices on the same parallel port.
-
-2.2  Loading and configuring PARIDE as modules
-----------------------------------------------
-
-It is much faster and simpler to get to understand the PARIDE drivers
-if you use them as loadable kernel modules.
-
-Note 1:
-	using these drivers with the "kerneld" automatic module loading
-	system is not recommended for beginners, and is not documented here.
-
-Note 2:
-	if you build PARPORT support as a loadable module, PARIDE must
-	also be built as loadable modules, and PARPORT must be loaded before
-	the PARIDE modules.
-
-To use PARIDE, you must begin by::
-
-	insmod paride
-
-this loads a base module which provides a registry for the protocols,
-among other tasks.
-
-Then, load as many of the protocol modules as you think you might need.
-As you load each module, it will register the protocols that it supports,
-and print a log message to your kernel log file and your console. For
-example::
-
-	# insmod epat
-	paride: epat registered as protocol 0
-	# insmod kbic
-	paride: k951 registered as protocol 1
-	paride: k971 registered as protocol 2
-
-Finally, you can load high-level drivers for each kind of device that
-you have connected.  By default, each driver will autoprobe for a single
-device, but you can support up to four similar devices by giving their
-individual co-ordinates when you load the driver.
-
-For example, if you had two no-name CD-ROM drives both using the
-KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
-you could give the following command::
-
-	# insmod pcd drive0=0x378,1 drive1=0x3bc,1
-
-For most adapters, giving a port address and protocol number is sufficient,
-but check the source files in linux/drivers/block/paride for more
-information.  (Hopefully someone will write some man pages one day!)
-
-As another example, here's what happens when PARPORT is installed, and
-a SyQuest EZ-135 is attached to port 0x378::
-
-	# insmod paride
-	paride: version 1.0 installed
-	# insmod epat
-	paride: epat registered as protocol 0
-	# insmod pd
-	pd: pd version 1.0, major 45, cluster 64, nice 0
-	pda: Sharing parport1 at 0x378
-	pda: epat 1.0, Shuttle EPAT chip c3 at 0x378, mode 5 (EPP-32), delay 1
-	pda: SyQuest EZ135A, 262144 blocks [128M], (512/16/32), removable media
-	 pda: pda1
-
-Note that the last line is the output from the generic partition table
-scanner - in this case it reports that it has found a disk with one partition.
-
-2.3  Using a PARIDE device
---------------------------
-
-Once the drivers have been loaded, you can access PARIDE devices in the
-same way as their traditional counterparts.  You will probably need to
-create the device "special files".  Here is a simple script that you can
-cut to a file and execute::
-
-  #!/bin/bash
-  #
-  # mkd -- a script to create the device special files for the PARIDE subsystem
-  #
-  function mkdev {
-    mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
-  }
-  #
-  function pd {
-    D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
-    mkdev pd$D b 45 $[ $1 * 16 ]
-    for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-    do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
-    done
-  }
-  #
-  cd /dev
-  #
-  for u in 0 1 2 3 ; do pd $u ; done
-  for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
-  for u in 0 1 2 3 ; do mkdev pf$u  b 47 $u ; done
-  for u in 0 1 2 3 ; do mkdev pt$u  c 96 $u ; done
-  for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
-  for u in 0 1 2 3 ; do mkdev pg$u  c 97 $u ; done
-  #
-  # end of mkd
-
-With the device files and drivers in place, you can access PARIDE devices
-like any other Linux device.   For example, to mount a CD-ROM in pcd0, use::
-
-	mount /dev/pcd0 /cdrom
-
-If you have a fresh Avatar Shark cartridge, and the drive is pda, you
-might do something like::
-
-	fdisk /dev/pda		-- make a new partition table with
-				   partition 1 of type 83
-
-	mke2fs /dev/pda1	-- to build the file system
-
-	mkdir /shark		-- make a place to mount the disk
-
-	mount /dev/pda1 /shark
-
-Devices like the Imation superdisk work in the same way, except that
-they do not have a partition table.  For example to make a 120MB
-floppy that you could share with a DOS system::
-
-	mkdosfs /dev/pf0
-	mount /dev/pf0 /mnt
-
-
-2.4  The pf driver
-------------------
-
-The pf driver is intended for use with parallel port ATAPI disk
-devices.  The most common devices in this category are PD drives
-and LS-120 drives.  Traditionally, media for these devices are not
-partitioned.  Consequently, the pf driver does not support partitioned
-media.  This may be changed in a future version of the driver.
-
-2.5  Using the pt driver
-------------------------
-
-The pt driver for parallel port ATAPI tape drives is a minimal driver.
-It does not yet support many of the standard tape ioctl operations.
-For best performance, a block size of 32KB should be used.  You will
-probably want to set the parallel port delay to 0, if you can.
-
-2.6  Using the pg driver
-------------------------
-
-The pg driver can be used in conjunction with the cdrecord program
-to create CD-ROMs.  Please get cdrecord version 1.6.1 or later
-from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ .  To record CD-R media
-your parallel port should ideally be set to EPP mode, and the "port delay"
-should be set to 0.  With those settings it is possible to record at 2x
-speed without any buffer underruns.  If you cannot get the driver to work
-in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
-
-
-3. Troubleshooting
-==================
-
-3.1  Use EPP mode if you can
-----------------------------
-
-The most common problems that people report with the PARIDE drivers
-concern the parallel port CMOS settings.  At this time, none of the
-PARIDE protocol modules support ECP mode, or any ECP combination modes.
-If you are able to do so, please set your parallel port into EPP mode
-using your CMOS setup procedure.
-
-3.2  Check the port delay
--------------------------
-
-Some parallel ports cannot reliably transfer data at full speed.  To
-offset the errors, the PARIDE protocol modules introduce a "port
-delay" between each access to the i/o ports.  Each protocol sets
-a default value for this delay.  In most cases, the user can override
-the default and set it to 0 - resulting in somewhat higher transfer
-rates.  In some rare cases (especially with older 486 systems) the
-default delays are not long enough.  If you experience corrupt data
-transfers, or unexpected failures, you may wish to increase the
-port delay.   The delay can be programmed using the "driveN" parameters
-to each of the high-level drivers.  Please see the notes above, or
-read the comments at the beginning of the driver source files in
-linux/drivers/block/paride.
-
-3.3  Some drives need a printer reset
--------------------------------------
-
-There appear to be a number of "noname" external drives on the market
-that do not always power up correctly.  We have noticed this with some
-drives based on OnSpec and older Freecom adapters.  In these rare cases,
-the adapter can often be reinitialised by issuing a "printer reset" on
-the parallel port.  As the reset operation is potentially disruptive in
-multiple device environments, the PARIDE drivers will not do it
-automatically.  You can, however, force a printer reset by doing::
-
-	insmod lp reset=1
-	rmmod lp
-
-If you have one of these marginal cases, you should probably build
-your paride drivers as modules, and arrange to do the printer reset
-before loading the PARIDE drivers.
-
-3.4  Use the verbose option and dmesg if you need help
-------------------------------------------------------
-
-While a lot of testing has gone into these drivers to make them work
-as smoothly as possible, problems will arise.  If you do have problems,
-please check all the obvious things first:  does the drive work in
-DOS with the manufacturer's drivers?  If that doesn't yield any useful
-clues, then please make sure that only one drive is hooked to your system,
-and that either (a) PARPORT is enabled or (b) no other device driver
-is using your parallel port (check in /proc/ioports).  Then, load the
-appropriate drivers (you can load several protocol modules if you want)
-as in::
-
-	# insmod paride
-	# insmod epat
-	# insmod bpck
-	# insmod kbic
-	...
-	# insmod pd verbose=1
-
-(using the correct driver for the type of device you have, of course).
-The verbose=1 parameter will cause the drivers to log a trace of their
-activity as they attempt to locate your drive.
-
-Use 'dmesg' to capture a log of all the PARIDE messages (any messages
-beginning with paride:, a protocol module's name or a driver's name) and
-include that with your bug report.  You can submit a bug report in one
-of two ways.  Either send it directly to the author of the PARIDE suite,
-by e-mail to grant@torque.net, or join the linux-parport mailing list
-and post your report there.
-
-3.5  For more information or help
----------------------------------
-
-You can join the linux-parport mailing list by sending a mail message
-to:
-
-		linux-parport-request@torque.net
-
-with the single word::
-
-		subscribe
-
-in the body of the mail message (not in the subject line).   Please be
-sure that your mail program is correctly set up when you do this,  as
-the list manager is a robot that will subscribe you using the reply
-address in your mail headers.  REMOVE any anti-spam gimmicks you may
-have in your mail headers, when sending mail to the list server.
-
-You might also find some useful information on the linux-parport
-web pages (although they are not always up to date) at
-
-	http://web.archive.org/web/%2E/http://www.torque.net/parport/
diff --git a/Documentation/blockdev/ramdisk.rst b/Documentation/blockdev/ramdisk.rst
deleted file mode 100644
index b7c2268f8dec..000000000000
--- a/Documentation/blockdev/ramdisk.rst
+++ /dev/null
@@ -1,177 +0,0 @@
-==========================================
-Using the RAM disk block device with Linux
-==========================================
-
-.. Contents:
-
-	1) Overview
-	2) Kernel Command Line Parameters
-	3) Using "rdev -r"
-	4) An Example of Creating a Compressed RAM Disk
-
-
-1) Overview
------------
-
-The RAM disk driver is a way to use main system memory as a block device.  It
-is required for initrd, an initial filesystem used if you need to load modules
-in order to access the root filesystem (see Documentation/admin-guide/initrd.rst).  It can
-also be used for a temporary filesystem for crypto work, since the contents
-are erased on reboot.
-
-The RAM disk dynamically grows as more space is required. It does this by using
-RAM from the buffer cache. The driver marks the buffers it is using as dirty
-so that the VM subsystem does not try to reclaim them later.
-
-The RAM disk driver supports up to 16 RAM disks by default, and can be reconfigured
-to support an unlimited number of RAM disks (at your own risk).  Just change
-the configuration symbol BLK_DEV_RAM_COUNT in the Block drivers config menu
-and (re)build the kernel.
-
-To use RAM disk support with your system, run './MAKEDEV ram' from the /dev
-directory.  RAM disks are all major number 1, and start with minor number 0
-for /dev/ram0, etc.  If used, modern kernels use /dev/ram0 for an initrd.
-
-The new RAM disk also has the ability to load compressed RAM disk images,
-allowing one to squeeze more programs onto an average installation or
-rescue floppy disk.
-
-
-2) Parameters
----------------------------------
-
-2a) Kernel Command Line Parameters
-
-	ramdisk_size=N
-		Size of the ramdisk.
-
-This parameter tells the RAM disk driver to set up RAM disks of N kB in size.  The
-default is 4096 (4 MB).
-
-2b) Module parameters
-
-	rd_nr
-		/dev/ramX devices created.
-
-	max_part
-		Maximum partition number.
-
-	rd_size
-		See ramdisk_size.
-
-3) Using "rdev -r"
-------------------
-
-The usage of the word (two bytes) that "rdev -r" sets in the kernel image is
-as follows. The low 11 bits (0 -> 10) specify an offset (in 1 k blocks) of up
-to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
-14 indicates that a RAM disk is to be loaded, and bit 15 indicates whether a
-prompt/wait sequence is to be given before trying to read the RAM disk. Since
-the RAM disk dynamically grows as data is being written into it, a size field
-is not required. Bits 11 to 13 are not currently used and may as well be zero.
-These numbers are no magical secrets, as seen below::
-
-  ./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK     0x07FF
-  ./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG          0x8000
-  ./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG            0x4000
-
-Consider a typical two floppy disk setup, where you will have the
-kernel on disk one, and have already put a RAM disk image onto disk #2.
-
-Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
-starts at an offset of 0 kB from the beginning of the floppy.
-The command line equivalent is: "ramdisk_start=0"
-
-You want bit 14 as one, indicating that a RAM disk is to be loaded.
-The command line equivalent is: "load_ramdisk=1"
-
-You want bit 15 as one, indicating that you want a prompt/keypress
-sequence so that you have a chance to switch floppy disks.
-The command line equivalent is: "prompt_ramdisk=1"
-
-Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
-So to create disk one of the set, you would do::
-
-	/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
-	/usr/src/linux# rdev /dev/fd0 /dev/fd0
-	/usr/src/linux# rdev -r /dev/fd0 49152
-
-If you make a boot disk that has LILO, then for the above, you would use::
-
-	append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
-
-Since the default start = 0 and the default prompt = 1, you could use::
-
-	append = "load_ramdisk=1"
-
-
-4) An Example of Creating a Compressed RAM Disk
------------------------------------------------
-
-To create a RAM disk image, you will need a spare block device to
-construct it on. This can be the RAM disk device itself, or an
-unused disk partition (such as an unmounted swap partition). For this
-example, we will use the RAM disk device, "/dev/ram0".
-
-Note: This technique should not be done on a machine with less than 8 MB
-of RAM. If using a spare disk partition instead of /dev/ram0, then this
-restriction does not apply.
-
-a) Decide on the RAM disk size that you want. Say 2 MB for this example.
-   Create it by writing to the RAM disk device. (This step is not currently
-   required, but may be in the future.) It is wise to zero out the
-   area (esp. for disks) so that maximal compression is achieved for
-   the unused blocks of the image that you are about to create::
-
-	dd if=/dev/zero of=/dev/ram0 bs=1k count=2048
-
-b) Make a filesystem on it. Say ext2fs for this example::
-
-	mke2fs -vm0 /dev/ram0 2048
-
-c) Mount it, copy the files you want to it (eg: /etc/* /dev/* ...)
-   and unmount it again.
-
-d) Compress the contents of the RAM disk. The level of compression
-   will be approximately 50% of the space used by the files. Unused
-   space on the RAM disk will compress to almost nothing::
-
-	dd if=/dev/ram0 bs=1k count=2048 | gzip -v9 > /tmp/ram_image.gz
-
-e) Put the kernel onto the floppy::
-
-	dd if=zImage of=/dev/fd0 bs=1k
-
-f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
-   that is slightly larger than the kernel, so that you can put another
-   (possibly larger) kernel onto the same floppy later without overlapping
-   the RAM disk image. An offset of 400 kB for kernels about 350 kB in
-   size would be reasonable. Make sure offset+size of ram_image.gz is
-   not larger than the total space on your floppy (usually 1440 kB)::
-
-	dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
-
-g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
-   For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
-   have 2^15 + 2^14 + 400 = 49552::
-
-	rdev /dev/fd0 /dev/fd0
-	rdev -r /dev/fd0 49552
-
-That is it. You now have your boot/root compressed RAM disk floppy. Some
-users may wish to combine steps (d) and (f) by using a pipe.
-
-
-						Paul Gortmaker 12/95
-
-Changelog:
-----------
-
-10-22-04 :
-		Updated to reflect changes in command line options, remove
-		obsolete references, general cleanup.
-		James Nelson (james4765@gmail.com)
-
-
-12-95 :
-		Original Document
diff --git a/Documentation/blockdev/zram.rst b/Documentation/blockdev/zram.rst
deleted file mode 100644
index 6eccf13219ff..000000000000
--- a/Documentation/blockdev/zram.rst
+++ /dev/null
@@ -1,422 +0,0 @@
-========================================
-zram: Compressed RAM based block devices
-========================================
-
-Introduction
-============
-
-The zram module creates RAM based block devices named /dev/zram<id>
-(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
-in memory itself. These disks allow very fast I/O and compression provides
-good amounts of memory savings. Some of the usecases include /tmp storage,
-use as swap disks, various caches under /var and maybe many more :)
-
-Statistics for individual zram devices are exported through sysfs nodes at
-/sys/block/zram<id>/
-
-Usage
-=====
-
-There are several ways to configure and manage zram device(-s):
-
-a) using zram and zram_control sysfs attributes
-b) using zramctl utility, provided by util-linux (util-linux@vger.kernel.org).
-
-In this document we will describe only 'manual' zram configuration steps,
-IOW, zram and zram_control sysfs attributes.
-
-For more information about zramctl, please consult the util-linux
-documentation, the zramctl man page or `zramctl --help`. Please note that
-the zram maintainers do not develop or maintain util-linux or zramctl; should
-you have any questions, please contact util-linux@vger.kernel.org.
-
-The following shows a typical sequence of steps for using zram.
-
-WARNING
-=======
-
-For the sake of simplicity we skip error checking parts in most of the
-examples below. However, it is your sole responsibility to handle errors.
-
-zram sysfs attributes always return negative values in case of errors.
-The list of possible return codes:
-
-========  =============================================================
--EBUSY	  an attempt to modify an attribute that cannot be changed once
-	  the device has been initialised. Please reset device first;
--ENOMEM	  zram was not able to allocate enough memory to fulfil your
-	  needs;
--EINVAL	  invalid input has been provided.
-========  =============================================================
-
-If you use 'echo', the returned value is the one set by the 'echo' utility
-and, in the general case, something like::
-
-	echo 3 > /sys/block/zram0/max_comp_streams
-	if [ $? -ne 0 ]; then
-		handle_error
-	fi
-
-should suffice.
-
-1) Load Module
-==============
-
-::
-
-	modprobe zram num_devices=4
-	# This creates 4 devices: /dev/zram{0,1,2,3}
-
-num_devices parameter is optional and tells zram how many devices should be
-pre-created. Default: 1.
-
-2) Set max number of compression streams
-========================================
-
-Regardless of the value passed to this attribute, ZRAM will always
-allocate multiple compression streams - one per online CPU - thus
-allowing several concurrent compression operations. The number of
-allocated compression streams goes down when some of the CPUs
-go offline. There is no single-compression-stream mode anymore,
-unless you are running a UP system or have only 1 CPU online.
-
-To find out how many streams are currently available::
-
-	cat /sys/block/zram0/max_comp_streams
-
-3) Select compression algorithm
-===============================
-
-Using the comp_algorithm device attribute one can see the available and
-currently selected (shown in square brackets) compression algorithms,
-and change the selected compression algorithm (once the device is
-initialised there is no way to change the compression algorithm).
-
-Examples::
-
-	#show supported compression algorithms
-	cat /sys/block/zram0/comp_algorithm
-	lzo [lz4]
-
-	#select lzo compression algorithm
-	echo lzo > /sys/block/zram0/comp_algorithm
-
-For the time being, the `comp_algorithm` content does not necessarily
-show every compression algorithm supported by the kernel. We keep this
-list primarily to simplify device configuration and one can configure
-a new device with a compression algorithm that is not listed in
-`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
-and, if some of the algorithms were built as modules, it's impossible
-to list all of them using, for instance, /proc/crypto or any other
-method. This, however, has an advantage of permitting the usage of
-custom crypto compression modules (implementing S/W or H/W compression).
-
-4) Set Disksize
-===============
-
-Set disk size by writing the value to sysfs node 'disksize'.
-The value can be either in bytes or you can use mem suffixes.
-Examples::
-
-	# Initialize /dev/zram0 with 50MB disksize
-	echo $((50*1024*1024)) > /sys/block/zram0/disksize
-
-	# Using mem suffixes
-	echo 256K > /sys/block/zram0/disksize
-	echo 512M > /sys/block/zram0/disksize
-	echo 1G > /sys/block/zram0/disksize
-
-Note:
-There is little point creating a zram device larger than twice the size of
-memory since we expect a 2:1 compression ratio. Note that zram uses about 0.1%
-of the size of the disk when not in use so a huge zram device is wasteful.
-
-5) Set memory limit: Optional
-=============================
-
-Set memory limit by writing the value to sysfs node 'mem_limit'.
-The value can be either in bytes or you can use mem suffixes.
-In addition, you can change the value at runtime.
-Examples::
-
-	# limit /dev/zram0 with 50MB memory
-	echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
-
-	# Using mem suffixes
-	echo 256K > /sys/block/zram0/mem_limit
-	echo 512M > /sys/block/zram0/mem_limit
-	echo 1G > /sys/block/zram0/mem_limit
-
-	# To disable memory limit
-	echo 0 > /sys/block/zram0/mem_limit
-
-6) Activate
-===========
-
-::
-
-	mkswap /dev/zram0
-	swapon /dev/zram0
-
-	mkfs.ext4 /dev/zram1
-	mount /dev/zram1 /tmp
-
-7) Add/remove zram devices
-==========================
-
-zram provides a control interface, which enables dynamic (on-demand) device
-addition and removal.
-
-In order to add a new /dev/zramX device, perform a read operation on the
-hot_add attribute. This will return either the new device's id (meaning that
-you can use /dev/zram<id>) or an error code.
-
-Example::
-
-	cat /sys/class/zram-control/hot_add
-	1
-
-To remove the existing /dev/zramX device (where X is a device id)
-execute::
-
-	echo X > /sys/class/zram-control/hot_remove
-
-8) Stats
-========
-
-Per-device statistics are exported as various nodes under /sys/block/zram<id>/
-
-A brief description of exported device attributes. For more details please
-read Documentation/ABI/testing/sysfs-block-zram.
-
-======================  ======  ===============================================
-Name            	access            description
-======================  ======  ===============================================
-disksize          	RW	show and set the device's disk size
-initstate         	RO	shows the initialization state of the device
-reset             	WO	trigger device reset
-mem_used_max      	WO	reset the `mem_used_max` counter (see later)
-mem_limit         	WO	specifies the maximum amount of memory ZRAM can
-				use to store the compressed data
-writeback_limit   	RW	show and set the maximum amount of write IO zram
-				can write out to backing device, in 4KB units
-writeback_limit_enable  RW	show and set writeback_limit feature
-max_comp_streams  	RW	the number of possible concurrent compress
-				operations
-comp_algorithm    	RW	show and change the compression algorithm
-compact           	WO	trigger memory compaction
-debug_stat        	RO	this file is used for zram debugging purposes
-backing_dev	  	RW	set up backend storage for zram to write out
-idle		  	WO	mark allocated slot as idle
-======================  ======  ===============================================
-
-
-User space is advised to use the following files to read the device statistics.
-
-File /sys/block/zram<id>/stat
-
-Represents block layer statistics. Read Documentation/block/stat.rst for
-details.
-
-File /sys/block/zram<id>/io_stat
-
-The io_stat file represents the device's I/O statistics not accounted for by
-the block layer and, thus, not available in the zram<id>/stat file. It consists of a
-single line of text and contains the following stats separated by
-whitespace:
-
- =============    =============================================================
- failed_reads     The number of failed reads
- failed_writes    The number of failed writes
- invalid_io       The number of non-page-size-aligned I/O requests
- notify_free      Depending on device usage scenario it may account
-
-                  a) the number of pages freed because of swap slot free
-                     notifications
-                  b) the number of pages freed because of
-                     REQ_OP_DISCARD requests sent by bio. The former ones are
-                     sent to a swap block device when a swap slot is freed,
-                     which implies that this disk is being used as a swap disk.
-
-                  The latter ones are sent by filesystem mounted with
-                  discard option, whenever some data blocks are getting
-                  discarded.
- =============    =============================================================
-
-File /sys/block/zram<id>/mm_stat
-
-The mm_stat file represents the device's mm statistics. It consists of a single
-line of text and contains the following stats separated by whitespace:
-
- ================ =============================================================
- orig_data_size   uncompressed size of data stored in this disk.
-		  This excludes same-element-filled pages (same_pages) since
-		  no memory is allocated for them.
-                  Unit: bytes
- compr_data_size  compressed size of data stored in this disk
- mem_used_total   the amount of memory allocated for this disk. This
-                  includes allocator fragmentation and metadata overhead,
-                  allocated for this disk. So, allocator space efficiency
-                  can be calculated using compr_data_size and this statistic.
-                  Unit: bytes
- mem_limit        the maximum amount of memory ZRAM can use to store
-                  the compressed data
- mem_used_max     the maximum amount of memory zram has consumed to
-                  store the data
- same_pages       the number of same element filled pages written to this disk.
-                  No memory is allocated for such pages.
- pages_compacted  the number of pages freed during compaction
- huge_pages	  the number of incompressible pages
- ================ =============================================================
-
-File /sys/block/zram<id>/bd_stat
-
-The bd_stat file represents the device's backing device statistics. It consists of
-a single line of text and contains the following stats separated by whitespace:
-
- ============== =============================================================
- bd_count	size of data written in backing device.
-		Unit: 4K bytes
- bd_reads	the number of reads from backing device
-		Unit: 4K bytes
- bd_writes	the number of writes to backing device
-		Unit: 4K bytes
- ============== =============================================================
-
-9) Deactivate
-=============
-
-::
-
-	swapoff /dev/zram0
-	umount /dev/zram1
-
-10) Reset
-=========
-
-	Write any positive value to 'reset' sysfs node::
-
-		echo 1 > /sys/block/zram0/reset
-		echo 1 > /sys/block/zram1/reset
-
-	This frees all the memory allocated for the given device and
-	resets the disksize to zero. You must set the disksize again
-	before reusing the device.
-
-Optional Feature
-================
-
-writeback
----------
-
-With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible pages
-to backing storage rather than keeping them in memory.
-To use the feature, the admin should set up a backing device via::
-
-	echo /dev/sda5 > /sys/block/zramX/backing_dev
-
-before setting the disksize. Only partitions are supported at the moment.
-If the admin wants to use incompressible page writeback, this can be done via::
-
-	echo huge > /sys/block/zramX/writeback
-
-To use idle page writeback, the user first needs to declare zram pages
-as idle::
-
-	echo all > /sys/block/zramX/idle
-
-From now on, all pages on zram are idle pages. The idle mark
-is removed only when someone requests access to the block.
-IOW, unless there is an access request, those pages remain idle pages.
-
-The admin can request writeback of those idle pages at the right time via::
-
-	echo idle > /sys/block/zramX/writeback
-
-With this command, zram writes back idle pages from memory to the storage.
-
-A large amount of write IO to a flash device can cause flash wearout,
-so the admin needs to design a write limit that guarantees storage
-health for the entire product life.
-
-To address this concern, zram supports the "writeback_limit" feature.
-The default value of "writeback_limit_enable" is 0, so writeback is not
-limited. IOW, to apply a writeback budget, the admin should
-enable writeback_limit_enable via::
-
-	$ echo 1 > /sys/block/zramX/writeback_limit_enable
-
-Once writeback_limit_enable is set, zram doesn't allow any writeback
-until the admin sets the budget via /sys/block/zramX/writeback_limit.
-
-(If the admin doesn't enable writeback_limit_enable, the value assigned
-via /sys/block/zramX/writeback_limit is meaningless.)
-
-If the admin wants to limit writeback to 400M per day, it could be done
-like below::
-
-	$ MB_SHIFT=20
-	$ SHIFT_4K=12
-	$ echo $((400<<MB_SHIFT>>SHIFT_4K)) > \
-		/sys/block/zram0/writeback_limit
-	$ echo 1 > /sys/block/zram0/writeback_limit_enable
-
-If the admin wants to allow further writes once the budget is exhausted,
-it could be done like below::
-
-	$ echo $((400<<MB_SHIFT>>SHIFT_4K)) > \
-		/sys/block/zram0/writeback_limit
-
-If the admin wants to see the remaining writeback budget since it was set::
-
-	$ cat /sys/block/zramX/writeback_limit
-
-If the admin wants to disable the writeback limit, this can be done via::
-
-	$ echo 0 > /sys/block/zramX/writeback_limit_enable
-
-The writeback_limit count is reset whenever you reset zram (e.g.,
-system reboot, echo 1 > /sys/block/zramX/reset), so it is the user's job
-to keep track of how much writeback happened before the reset in order
-to allocate the extra writeback budget in the next setting.
-
-To measure the writeback count over a certain period, the admin can
-read the 3rd column of /sys/block/zram0/bd_stat.
-
-memory tracking
-===============
-
-With CONFIG_ZRAM_MEMORY_TRACKING, the user can get information about
-zram blocks. It can be useful for catching cold or incompressible
-pages of a process with pagemap.
-
-If you enable the feature, you can see the block state via
-/sys/kernel/debug/zram/zram0/block_state. The output is as follows::
-
-	  300    75.033841 .wh.
-	  301    63.806904 s...
-	  302    63.806919 ..hi
-
-First column
-	zram's block index.
-Second column
-	access time since the system was booted
-Third column
-	state of the block:
-
-	s:
-		same page
-	w:
-		written page to backing store
-	h:
-		huge page
-	i:
-		idle page
-
-The first line of the example above says that block 300 was accessed
-75.033841 seconds after boot and that it is a huge page which has been
-written back to the backing storage. This is a debugging feature, so
-nobody should rely on it to work properly.
-
-Nitin Gupta
-ngupta@vflare.org
diff --git a/MAINTAINERS b/MAINTAINERS
index b36028f43192..699596d931c1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5006,7 +5006,7 @@ T:	git git://git.linbit.com/drbd-8.4.git
 S:	Supported
 F:	drivers/block/drbd/
 F:	lib/lru_cache.c
-F:	Documentation/blockdev/drbd/
+F:	Documentation/admin-guide/blockdev/
 
 DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS
 M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@@ -11076,7 +11076,7 @@ M:	Josef Bacik <josef@toxicpanda.com>
 S:	Maintained
 L:	linux-block@vger.kernel.org
 L:	nbd@other.debian.org
-F:	Documentation/blockdev/nbd.rst
+F:	Documentation/admin-guide/blockdev/nbd.rst
 F:	drivers/block/nbd.c
 F:	include/trace/events/nbd.h
 F:	include/uapi/linux/nbd.h
@@ -12086,7 +12086,7 @@ PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
 M:	Tim Waugh <tim@cyberelk.net>
 L:	linux-parport@lists.infradead.org (subscribers-only)
 S:	Maintained
-F:	Documentation/blockdev/paride.rst
+F:	Documentation/admin-guide/blockdev/paride.rst
 F:	drivers/block/paride/
 
 PARISC ARCHITECTURE
@@ -13367,7 +13367,7 @@ F:	drivers/net/wireless/ralink/rt2x00/
 RAMDISK RAM BLOCK DEVICE DRIVER
 M:	Jens Axboe <axboe@kernel.dk>
 S:	Maintained
-F:	Documentation/blockdev/ramdisk.rst
+F:	Documentation/admin-guide/blockdev/ramdisk.rst
 F:	drivers/block/brd.c
 
 RANCHU VIRTUAL BOARD FOR MIPS
@@ -17723,7 +17723,7 @@ R:	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
 L:	linux-kernel@vger.kernel.org
 S:	Maintained
 F:	drivers/block/zram/
-F:	Documentation/blockdev/zram.rst
+F:	Documentation/admin-guide/blockdev/zram.rst
 
 ZS DECSTATION Z85C30 SERIAL DRIVER
 M:	"Maciej W. Rozycki" <macro@linux-mips.org>
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index c43690b973d8..1bb8ec575352 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -31,7 +31,7 @@ config BLK_DEV_FD
 	  If you want to use the floppy disk drive(s) of your PC under Linux,
 	  say Y. Information about this driver, especially important for IBM
 	  Thinkpad users, is contained in
-	  <file:Documentation/blockdev/floppy.rst>.
+	  <file:Documentation/admin-guide/blockdev/floppy.rst>.
 	  That file also contains the location of the Floppy driver FAQ as
 	  well as location of the fdutils package used to configure additional
 	  parameters of the driver at run time.
@@ -96,7 +96,7 @@ config PARIDE
 	  your computer's parallel port. Most of them are actually IDE devices
 	  using a parallel port IDE adapter. This option enables the PARIDE
 	  subsystem which contains drivers for many of these external drives.
-	  Read <file:Documentation/blockdev/paride.rst> for more information.
+	  Read <file:Documentation/admin-guide/blockdev/paride.rst> for more information.
 
 	  If you have said Y to the "Parallel-port support" configuration
 	  option, you may share a single port between your printer and other
@@ -261,7 +261,7 @@ config BLK_DEV_NBD
 	  userland (making server and client physically the same computer,
 	  communicating using the loopback network device).
 
-	  Read <file:Documentation/blockdev/nbd.rst> for more information,
+	  Read <file:Documentation/admin-guide/blockdev/nbd.rst> for more information,
 	  especially about where to find the server code, which runs in user
 	  space and does not need special kernel support.
 
@@ -303,7 +303,7 @@ config BLK_DEV_RAM
 	  during the initial install of Linux.
 
 	  Note that the kernel command line option "ramdisk=XX" is now obsolete.
-	  For details, read <file:Documentation/blockdev/ramdisk.rst>.
+	  For details, read <file:Documentation/admin-guide/blockdev/ramdisk.rst>.
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called brd. An alias "rd" has been defined
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 5c99e52f9dc1..f652c1ac3ae9 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4424,7 +4424,7 @@ static int __init floppy_setup(char *str)
 		pr_cont("\n");
 	} else
 		DPRINT("botched floppy option\n");
-	DPRINT("Read Documentation/blockdev/floppy.rst\n");
+	DPRINT("Read Documentation/admin-guide/blockdev/floppy.rst\n");
 	return 0;
 }
 
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index e06b99d54816..fe7a4b7d30cf 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -12,7 +12,7 @@ config ZRAM
 	  It has several use cases, for example: /tmp storage, use as swap
 	  disks and maybe many more.
 
-	  See Documentation/blockdev/zram.rst for more information.
+	  See Documentation/admin-guide/blockdev/zram.rst for more information.
 
 config ZRAM_WRITEBACK
        bool "Write back incompressible or idle page to backing device"
@@ -26,7 +26,7 @@ config ZRAM_WRITEBACK
 	 With /sys/block/zramX/{idle,writeback}, application could ask
 	 idle page's writeback to the backing device to save in memory.
 
-	 See Documentation/blockdev/zram.rst for more information.
+	 See Documentation/admin-guide/blockdev/zram.rst for more information.
 
 config ZRAM_MEMORY_TRACKING
 	bool "Track zRam block status"
@@ -36,4 +36,4 @@ config ZRAM_MEMORY_TRACKING
 	  of zRAM. Admin could see the information via
 	  /sys/kernel/debug/zram/zramX/block_state.
 
-	  See Documentation/blockdev/zram.rst for more information.
+	  See Documentation/admin-guide/blockdev/zram.rst for more information.
diff --git a/tools/testing/selftests/zram/README b/tools/testing/selftests/zram/README
index 5fa378391d3b..110b34834a6f 100644
--- a/tools/testing/selftests/zram/README
+++ b/tools/testing/selftests/zram/README
@@ -37,4 +37,4 @@ Commands required for testing:
  - mkfs/ mkfs.ext4
 
 For more information please refer:
-kernel-source-tree/Documentation/blockdev/zram.rst
+kernel-source-tree/Documentation/admin-guide/blockdev/zram.rst
-- 
cgit v1.2.3-55-g7522


From 4d3beaa06d3536aa8968d1828a66bd5ccb5036ac Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 19 Apr 2019 21:39:29 -0300
Subject: docs: security: move some books to it and update

The following files belong to security:

  Documentation/security/LSM.rst -> Documentation/security/lsm-development.rst
  Documentation/lsm.txt -> Documentation/security/lsm.rst
  Documentation/SAK.txt -> Documentation/security/sak.rst
  Documentation/siphash.txt -> Documentation/security/siphash.rst

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/SAK.txt                       |  91 -------------
 Documentation/lsm.txt                       | 201 ----------------------------
 Documentation/security/LSM.rst              |  17 ---
 Documentation/security/index.rst            |   5 +-
 Documentation/security/lsm-development.rst  |  17 +++
 Documentation/security/lsm.rst              | 201 ++++++++++++++++++++++++++++
 Documentation/security/sak.rst              |  91 +++++++++++++
 Documentation/security/siphash.rst          | 189 ++++++++++++++++++++++++++
 Documentation/security/tpm/index.rst        |   1 +
 Documentation/security/tpm/xen-tpmfront.rst |   2 -
 Documentation/siphash.txt                   | 189 --------------------------
 11 files changed, 503 insertions(+), 501 deletions(-)
 delete mode 100644 Documentation/SAK.txt
 delete mode 100644 Documentation/lsm.txt
 delete mode 100644 Documentation/security/LSM.rst
 create mode 100644 Documentation/security/lsm-development.rst
 create mode 100644 Documentation/security/lsm.rst
 create mode 100644 Documentation/security/sak.rst
 create mode 100644 Documentation/security/siphash.rst
 delete mode 100644 Documentation/siphash.txt

diff --git a/Documentation/SAK.txt b/Documentation/SAK.txt
deleted file mode 100644
index 260e1d3687bd..000000000000
--- a/Documentation/SAK.txt
+++ /dev/null
@@ -1,91 +0,0 @@
-=========================================
-Linux Secure Attention Key (SAK) handling
-=========================================
-
-:Date: 18 March 2001
-:Author: Andrew Morton
-
-An operating system's Secure Attention Key is a security tool which is
-provided as protection against trojan password capturing programs.  It
-is an undefeatable way of killing all programs which could be
-masquerading as login applications.  Users need to be taught to enter
-this key sequence before they log in to the system.
-
-From the PC keyboard, Linux has two similar but different ways of
-providing SAK.  One is the ALT-SYSRQ-K sequence.  You shouldn't use
-this sequence.  It is only available if the kernel was compiled with
-sysrq support.
-
-The proper way of generating a SAK is to define the key sequence using
-``loadkeys``.  This will work whether or not sysrq support is compiled
-into the kernel.
-
-SAK works correctly when the keyboard is in raw mode.  This means that
-once defined, SAK will kill a running X server.  If the system is in
-run level 5, the X server will restart.  This is what you want to
-happen.
-
-What key sequence should you use? Well, CTRL-ALT-DEL is used to reboot
-the machine.  CTRL-ALT-BACKSPACE is magical to the X server.  We'll
-choose CTRL-ALT-PAUSE.
-
-In your rc.sysinit (or rc.local) file, add the command::
-
-	echo "control alt keycode 101 = SAK" | /bin/loadkeys
-
-And that's it!  Only the superuser may reprogram the SAK key.
-
-
-.. note::
-
-  1. Linux SAK is said to be not a "true SAK" as is required by
-     systems which implement C2 level security.  This author does not
-     know why.
-
-
-  2. On the PC keyboard, SAK kills all applications which have
-     /dev/console opened.
-
-     Unfortunately this includes a number of things which you don't
-     actually want killed.  This is because these applications are
-     incorrectly holding /dev/console open.  Be sure to complain to your
-     Linux distributor about this!
-
-     You can identify processes which will be killed by SAK with the
-     command::
-
-	# ls -l /proc/[0-9]*/fd/* | grep console
-	l-wx------    1 root     root           64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console
-
-     Then::
-
-	# ps aux|grep 579
-	root       579  0.0  0.1  1088  436 ?        S    00:43   0:00 gpm -t ps/2
-
-     So ``gpm`` will be killed by SAK.  This is a bug in gpm.  It should
-     be closing standard input.  You can work around this by finding the
-     initscript which launches gpm and changing it thusly:
-
-     Old::
-
-	daemon gpm
-
-     New::
-
-	daemon gpm < /dev/null
-
-     Vixie cron also seems to have this problem, and needs the same treatment.
-
-     Also, one prominent Linux distribution has the following three
-     lines in its rc.sysinit and rc scripts::
-
-	exec 3<&0
-	exec 4>&1
-	exec 5>&2
-
-     These commands cause **all** daemons which are launched by the
-     initscripts to have file descriptors 3, 4 and 5 attached to
-     /dev/console.  So SAK kills them all.  A workaround is to simply
-     delete these lines, but this may cause system management
-     applications to malfunction - test everything well.
-
diff --git a/Documentation/lsm.txt b/Documentation/lsm.txt
deleted file mode 100644
index ad4dfd020e0d..000000000000
--- a/Documentation/lsm.txt
+++ /dev/null
@@ -1,201 +0,0 @@
-========================================================
-Linux Security Modules: General Security Hooks for Linux
-========================================================
-
-:Author: Stephen Smalley
-:Author: Timothy Fraser
-:Author: Chris Vance
-
-.. note::
-
-   The APIs described in this book are outdated.
-
-Introduction
-============
-
-In March 2001, the National Security Agency (NSA) gave a presentation
-about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel Summit.
-SELinux is an implementation of flexible and fine-grained
-nondiscretionary access controls in the Linux kernel, originally
-implemented as its own particular kernel patch. Several other security
-projects (e.g. RSBAC, Medusa) have also developed flexible access
-control architectures for the Linux kernel, and various projects have
-developed particular access control models for Linux (e.g. LIDS, DTE,
-SubDomain). Each project has developed and maintained its own kernel
-patch to support its security needs.
-
-In response to the NSA presentation, Linus Torvalds made a set of
-remarks that described a security framework he would be willing to
-consider for inclusion in the mainstream Linux kernel. He described a
-general framework that would provide a set of security hooks to control
-operations on kernel objects and a set of opaque security fields in
-kernel data structures for maintaining security attributes. This
-framework could then be used by loadable kernel modules to implement any
-desired model of security. Linus also suggested the possibility of
-migrating the Linux capabilities code into such a module.
-
-The Linux Security Modules (LSM) project was started by WireX to develop
-such a framework. LSM is a joint development effort by several security
-projects, including Immunix, SELinux, SGI and Janus, and several
-individuals, including Greg Kroah-Hartman and James Morris, to develop a
-Linux kernel patch that implements this framework. The patch is
-currently tracking the 2.4 series and is targeted for integration into
-the 2.5 development series. This technical report provides an overview
-of the framework and the example capabilities security module provided
-by the LSM kernel patch.
-
-LSM Framework
-=============
-
-The LSM kernel patch provides a general kernel framework to support
-security modules. In particular, the LSM framework is primarily focused
-on supporting access control modules, although future development is
-likely to address other security needs such as auditing. By itself, the
-framework does not provide any additional security; it merely provides
-the infrastructure to support security modules. The LSM kernel patch
-also moves most of the capabilities logic into an optional security
-module, with the system defaulting to the traditional superuser logic.
-This capabilities module is discussed further in
-`LSM Capabilities Module <#cap>`__.
-
-The LSM kernel patch adds security fields to kernel data structures and
-inserts calls to hook functions at critical points in the kernel code to
-manage the security fields and to perform access control. It also adds
-functions for registering and unregistering security modules, and adds a
-general :c:func:`security()` system call to support new system calls
-for security-aware applications.
-
-The LSM security fields are simply ``void*`` pointers. For process and
-program execution security information, security fields were added to
-:c:type:`struct task_struct <task_struct>` and
-:c:type:`struct linux_binprm <linux_binprm>`. For filesystem
-security information, a security field was added to :c:type:`struct
-super_block <super_block>`. For pipe, file, and socket security
-information, security fields were added to :c:type:`struct inode
-<inode>` and :c:type:`struct file <file>`. For packet and
-network device security information, security fields were added to
-:c:type:`struct sk_buff <sk_buff>` and :c:type:`struct
-net_device <net_device>`. For System V IPC security information,
-security fields were added to :c:type:`struct kern_ipc_perm
-<kern_ipc_perm>` and :c:type:`struct msg_msg
-<msg_msg>`; additionally, the definitions for :c:type:`struct
-msg_msg <msg_msg>`, struct msg_queue, and struct shmid_kernel
-were moved to header files (``include/linux/msg.h`` and
-``include/linux/shm.h`` as appropriate) to allow the security modules to
-use these definitions.
-
-Each LSM hook is a function pointer in a global table, security_ops.
-This table is a :c:type:`struct security_operations
-<security_operations>` structure as defined by
-``include/linux/security.h``. Detailed documentation for each hook is
-included in this header file. At present, this structure consists of a
-collection of substructures that group related hooks based on the kernel
-object (e.g. task, inode, file, sk_buff, etc) as well as some top-level
-hook function pointers for system operations. This structure is likely
-to be flattened in the future for performance. The placement of the hook
-calls in the kernel code is described by the "called:" lines in the
-per-hook documentation in the header file. The hook calls can also be
-easily found in the kernel code by looking for the string
-"security_ops->".
-
-Linus mentioned per-process security hooks in his original remarks as a
-possible alternative to global security hooks. However, if LSM were to
-start from the perspective of per-process hooks, then the base framework
-would have to deal with how to handle operations that involve multiple
-processes (e.g. kill), since each process might have its own hook for
-controlling the operation. This would require a general mechanism for
-composing hooks in the base framework. Additionally, LSM would still
-need global hooks for operations that have no process context (e.g.
-network input operations). Consequently, LSM provides global security
-hooks, but a security module is free to implement per-process hooks
-(where that makes sense) by storing a security_ops table in each
-process' security field and then invoking these per-process hooks from
-the global hooks. The problem of composition is thus deferred to the
-module.
-
-The global security_ops table is initialized to a set of hook functions
-provided by a dummy security module that provides traditional superuser
-logic. A :c:func:`register_security()` function (in
-``security/security.c``) is provided to allow a security module to set
-security_ops to refer to its own hook functions, and an
-:c:func:`unregister_security()` function is provided to revert
-security_ops to the dummy module hooks. This mechanism is used to set
-the primary security module, which is responsible for making the final
-decision for each hook.
-
-LSM also provides a simple mechanism for stacking additional security
-modules with the primary security module. It defines
-:c:func:`register_security()` and
-:c:func:`unregister_security()` hooks in the :c:type:`struct
-security_operations <security_operations>` structure and
-provides :c:func:`mod_reg_security()` and
-:c:func:`mod_unreg_security()` functions that invoke these hooks
-after performing some sanity checking. A security module can call these
-functions in order to stack with other modules. However, the actual
-details of how this stacking is handled are deferred to the module,
-which can implement these hooks in any way it wishes (including always
-returning an error if it does not wish to support stacking). In this
-manner, LSM again defers the problem of composition to the module.
-
-Although the LSM hooks are organized into substructures based on kernel
-object, all of the hooks can be viewed as falling into two major
-categories: hooks that are used to manage the security fields and hooks
-that are used to perform access control. Examples of the first category
-of hooks include the :c:func:`alloc_security()` and
-:c:func:`free_security()` hooks defined for each kernel data
-structure that has a security field. These hooks are used to allocate
-and free security structures for kernel objects. The first category of
-hooks also includes hooks that set information in the security field
-after allocation, such as the :c:func:`post_lookup()` hook in
-:c:type:`struct inode_security_ops <inode_security_ops>`.
-This hook is used to set security information for inodes after
-successful lookup operations. An example of the second category of hooks
-is the :c:func:`permission()` hook in :c:type:`struct
-inode_security_ops <inode_security_ops>`. This hook checks
-permission when accessing an inode.
-
-LSM Capabilities Module
-=======================
-
-The LSM kernel patch moves most of the existing POSIX.1e capabilities
-logic into an optional security module stored in the file
-``security/capability.c``. This change allows users who do not want to
-use capabilities to omit this code entirely from their kernel, instead
-using the dummy module for traditional superuser logic or any other
-module that they desire. This change also allows the developers of the
-capabilities logic to maintain and enhance their code more freely,
-without needing to integrate patches back into the base kernel.
-
-In addition to moving the capabilities logic, the LSM kernel patch could
-move the capability-related fields from the kernel data structures into
-the new security fields managed by the security modules. However, at
-present, the LSM kernel patch leaves the capability fields in the kernel
-data structures. In his original remarks, Linus suggested that this
-might be preferable so that other security modules can be easily stacked
-with the capabilities module without needing to chain multiple security
-structures on the security field. It also avoids imposing extra overhead
-on the capabilities module to manage the security fields. However, the
-LSM framework could certainly support such a move if it is determined to
-be desirable, with only a few additional changes described below.
-
-At present, the capabilities logic for computing process capabilities on
-:c:func:`execve()` and :c:func:`set\*uid()`, checking
-capabilities for a particular process, saving and checking capabilities
-for netlink messages, and handling the :c:func:`capget()` and
-:c:func:`capset()` system calls have been moved into the
-capabilities module. There are still a few locations in the base kernel
-where capability-related fields are directly examined or modified, but
-the current version of the LSM patch does allow a security module to
-completely replace the assignment and testing of capabilities. These few
-locations would need to be changed if the capability-related fields were
-moved into the security field. The following is a list of known
-locations that still perform such direct examination or modification of
-capability-related fields:
-
--  ``fs/open.c``::c:func:`sys_access()`
-
--  ``fs/lockd/host.c``::c:func:`nlm_bind_host()`
-
--  ``fs/nfsd/auth.c``::c:func:`nfsd_setuser()`
-
--  ``fs/proc/array.c``::c:func:`task_cap()`
diff --git a/Documentation/security/LSM.rst b/Documentation/security/LSM.rst
deleted file mode 100644
index 31d92bc5fdd2..000000000000
--- a/Documentation/security/LSM.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-=================================
-Linux Security Module Development
-=================================
-
-Based on https://lkml.org/lkml/2007/10/26/215,
-a new LSM is accepted into the kernel when its intent (a description of
-what it tries to protect against and in what cases one would expect to
-use it) has been appropriately documented in ``Documentation/admin-guide/LSM/``.
-This allows an LSM's code to be easily compared to its goals, and so
-that end users and distros can make a more informed decision about which
-LSMs suit their requirements.
-
-For extensive documentation on the available LSM hook interfaces, please
-see ``include/linux/lsm_hooks.h`` and associated structures:
-
-.. kernel-doc:: include/linux/lsm_hooks.h
-   :internal:
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index aad6d92ffe31..fc503dd689a7 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -8,7 +8,10 @@ Security Documentation
    credentials
    IMA-templates
    keys/index
-   LSM
+   lsm
+   lsm-development
+   sak
    SCTP
    self-protection
+   siphash
    tpm/index
diff --git a/Documentation/security/lsm-development.rst b/Documentation/security/lsm-development.rst
new file mode 100644
index 000000000000..31d92bc5fdd2
--- /dev/null
+++ b/Documentation/security/lsm-development.rst
@@ -0,0 +1,17 @@
+=================================
+Linux Security Module Development
+=================================
+
+Based on https://lkml.org/lkml/2007/10/26/215,
+a new LSM is accepted into the kernel when its intent (a description of
+what it tries to protect against and in what cases one would expect to
+use it) has been appropriately documented in ``Documentation/admin-guide/LSM/``.
+This allows an LSM's code to be easily compared to its goals, and so
+that end users and distros can make a more informed decision about which
+LSMs suit their requirements.
+
+For extensive documentation on the available LSM hook interfaces, please
+see ``include/linux/lsm_hooks.h`` and associated structures:
+
+.. kernel-doc:: include/linux/lsm_hooks.h
+   :internal:
diff --git a/Documentation/security/lsm.rst b/Documentation/security/lsm.rst
new file mode 100644
index 000000000000..ad4dfd020e0d
--- /dev/null
+++ b/Documentation/security/lsm.rst
@@ -0,0 +1,201 @@
+========================================================
+Linux Security Modules: General Security Hooks for Linux
+========================================================
+
+:Author: Stephen Smalley
+:Author: Timothy Fraser
+:Author: Chris Vance
+
+.. note::
+
+   The APIs described in this book are outdated.
+
+Introduction
+============
+
+In March 2001, the National Security Agency (NSA) gave a presentation
+about Security-Enhanced Linux (SELinux) at the 2.5 Linux Kernel Summit.
+SELinux is an implementation of flexible and fine-grained
+nondiscretionary access controls in the Linux kernel, originally
+implemented as its own particular kernel patch. Several other security
+projects (e.g. RSBAC, Medusa) have also developed flexible access
+control architectures for the Linux kernel, and various projects have
+developed particular access control models for Linux (e.g. LIDS, DTE,
+SubDomain). Each project has developed and maintained its own kernel
+patch to support its security needs.
+
+In response to the NSA presentation, Linus Torvalds made a set of
+remarks that described a security framework he would be willing to
+consider for inclusion in the mainstream Linux kernel. He described a
+general framework that would provide a set of security hooks to control
+operations on kernel objects and a set of opaque security fields in
+kernel data structures for maintaining security attributes. This
+framework could then be used by loadable kernel modules to implement any
+desired model of security. Linus also suggested the possibility of
+migrating the Linux capabilities code into such a module.
+
+The Linux Security Modules (LSM) project was started by WireX to develop
+such a framework. LSM is a joint development effort by several security
+projects, including Immunix, SELinux, SGI and Janus, and several
+individuals, including Greg Kroah-Hartman and James Morris, to develop a
+Linux kernel patch that implements this framework. The patch is
+currently tracking the 2.4 series and is targeted for integration into
+the 2.5 development series. This technical report provides an overview
+of the framework and the example capabilities security module provided
+by the LSM kernel patch.
+
+LSM Framework
+=============
+
+The LSM kernel patch provides a general kernel framework to support
+security modules. In particular, the LSM framework is primarily focused
+on supporting access control modules, although future development is
+likely to address other security needs such as auditing. By itself, the
+framework does not provide any additional security; it merely provides
+the infrastructure to support security modules. The LSM kernel patch
+also moves most of the capabilities logic into an optional security
+module, with the system defaulting to the traditional superuser logic.
+This capabilities module is discussed further in
+`LSM Capabilities Module <#cap>`__.
+
+The LSM kernel patch adds security fields to kernel data structures and
+inserts calls to hook functions at critical points in the kernel code to
+manage the security fields and to perform access control. It also adds
+functions for registering and unregistering security modules, and adds a
+general :c:func:`security()` system call to support new system calls
+for security-aware applications.
+
+The LSM security fields are simply ``void*`` pointers. For process and
+program execution security information, security fields were added to
+:c:type:`struct task_struct <task_struct>` and
+:c:type:`struct linux_binprm <linux_binprm>`. For filesystem
+security information, a security field was added to :c:type:`struct
+super_block <super_block>`. For pipe, file, and socket security
+information, security fields were added to :c:type:`struct inode
+<inode>` and :c:type:`struct file <file>`. For packet and
+network device security information, security fields were added to
+:c:type:`struct sk_buff <sk_buff>` and :c:type:`struct
+net_device <net_device>`. For System V IPC security information,
+security fields were added to :c:type:`struct kern_ipc_perm
+<kern_ipc_perm>` and :c:type:`struct msg_msg
+<msg_msg>`; additionally, the definitions for :c:type:`struct
+msg_msg <msg_msg>`, struct msg_queue, and struct shmid_kernel
+were moved to header files (``include/linux/msg.h`` and
+``include/linux/shm.h`` as appropriate) to allow the security modules to
+use these definitions.
+
+Each LSM hook is a function pointer in a global table, security_ops.
+This table is a :c:type:`struct security_operations
+<security_operations>` structure as defined by
+``include/linux/security.h``. Detailed documentation for each hook is
+included in this header file. At present, this structure consists of a
+collection of substructures that group related hooks based on the kernel
+object (e.g. task, inode, file, sk_buff, etc) as well as some top-level
+hook function pointers for system operations. This structure is likely
+to be flattened in the future for performance. The placement of the hook
+calls in the kernel code is described by the "called:" lines in the
+per-hook documentation in the header file. The hook calls can also be
+easily found in the kernel code by looking for the string
+"security_ops->".
+
+Linus mentioned per-process security hooks in his original remarks as a
+possible alternative to global security hooks. However, if LSM were to
+start from the perspective of per-process hooks, then the base framework
+would have to deal with how to handle operations that involve multiple
+processes (e.g. kill), since each process might have its own hook for
+controlling the operation. This would require a general mechanism for
+composing hooks in the base framework. Additionally, LSM would still
+need global hooks for operations that have no process context (e.g.
+network input operations). Consequently, LSM provides global security
+hooks, but a security module is free to implement per-process hooks
+(where that makes sense) by storing a security_ops table in each
+process' security field and then invoking these per-process hooks from
+the global hooks. The problem of composition is thus deferred to the
+module.
+
+The global security_ops table is initialized to a set of hook functions
+provided by a dummy security module that provides traditional superuser
+logic. A :c:func:`register_security()` function (in
+``security/security.c``) is provided to allow a security module to set
+security_ops to refer to its own hook functions, and an
+:c:func:`unregister_security()` function is provided to revert
+security_ops to the dummy module hooks. This mechanism is used to set
+the primary security module, which is responsible for making the final
+decision for each hook.
+
+LSM also provides a simple mechanism for stacking additional security
+modules with the primary security module. It defines
+:c:func:`register_security()` and
+:c:func:`unregister_security()` hooks in the :c:type:`struct
+security_operations <security_operations>` structure and
+provides :c:func:`mod_reg_security()` and
+:c:func:`mod_unreg_security()` functions that invoke these hooks
+after performing some sanity checking. A security module can call these
+functions in order to stack with other modules. However, the actual
+details of how this stacking is handled are deferred to the module,
+which can implement these hooks in any way it wishes (including always
+returning an error if it does not wish to support stacking). In this
+manner, LSM again defers the problem of composition to the module.
+
+Although the LSM hooks are organized into substructures based on kernel
+object, all of the hooks can be viewed as falling into two major
+categories: hooks that are used to manage the security fields and hooks
+that are used to perform access control. Examples of the first category
+of hooks include the :c:func:`alloc_security()` and
+:c:func:`free_security()` hooks defined for each kernel data
+structure that has a security field. These hooks are used to allocate
+and free security structures for kernel objects. The first category of
+hooks also includes hooks that set information in the security field
+after allocation, such as the :c:func:`post_lookup()` hook in
+:c:type:`struct inode_security_ops <inode_security_ops>`.
+This hook is used to set security information for inodes after
+successful lookup operations. An example of the second category of hooks
+is the :c:func:`permission()` hook in :c:type:`struct
+inode_security_ops <inode_security_ops>`. This hook checks
+permission when accessing an inode.
+
+LSM Capabilities Module
+=======================
+
+The LSM kernel patch moves most of the existing POSIX.1e capabilities
+logic into an optional security module stored in the file
+``security/capability.c``. This change allows users who do not want to
+use capabilities to omit this code entirely from their kernel, instead
+using the dummy module for traditional superuser logic or any other
+module that they desire. This change also allows the developers of the
+capabilities logic to maintain and enhance their code more freely,
+without needing to integrate patches back into the base kernel.
+
+In addition to moving the capabilities logic, the LSM kernel patch could
+move the capability-related fields from the kernel data structures into
+the new security fields managed by the security modules. However, at
+present, the LSM kernel patch leaves the capability fields in the kernel
+data structures. In his original remarks, Linus suggested that this
+might be preferable so that other security modules can be easily stacked
+with the capabilities module without needing to chain multiple security
+structures on the security field. It also avoids imposing extra overhead
+on the capabilities module to manage the security fields. However, the
+LSM framework could certainly support such a move if it is determined to
+be desirable, with only a few additional changes described below.
+
+At present, the capabilities logic for computing process capabilities on
+:c:func:`execve()` and :c:func:`set\*uid()`, checking
+capabilities for a particular process, saving and checking capabilities
+for netlink messages, and handling the :c:func:`capget()` and
+:c:func:`capset()` system calls have been moved into the
+capabilities module. There are still a few locations in the base kernel
+where capability-related fields are directly examined or modified, but
+the current version of the LSM patch does allow a security module to
+completely replace the assignment and testing of capabilities. These few
+locations would need to be changed if the capability-related fields were
+moved into the security field. The following is a list of known
+locations that still perform such direct examination or modification of
+capability-related fields:
+
+-  ``fs/open.c``::c:func:`sys_access()`
+
+-  ``fs/lockd/host.c``::c:func:`nlm_bind_host()`
+
+-  ``fs/nfsd/auth.c``::c:func:`nfsd_setuser()`
+
+-  ``fs/proc/array.c``::c:func:`task_cap()`
diff --git a/Documentation/security/sak.rst b/Documentation/security/sak.rst
new file mode 100644
index 000000000000..260e1d3687bd
--- /dev/null
+++ b/Documentation/security/sak.rst
@@ -0,0 +1,91 @@
+=========================================
+Linux Secure Attention Key (SAK) handling
+=========================================
+
+:Date: 18 March 2001
+:Author: Andrew Morton
+
+An operating system's Secure Attention Key is a security tool which is
+provided as protection against trojan password capturing programs.  It
+is an undefeatable way of killing all programs which could be
+masquerading as login applications.  Users need to be taught to enter
+this key sequence before they log in to the system.
+
+From the PC keyboard, Linux has two similar but different ways of
+providing SAK.  One is the ALT-SYSRQ-K sequence.  You shouldn't use
+this sequence.  It is only available if the kernel was compiled with
+sysrq support.
+
+The proper way of generating a SAK is to define the key sequence using
+``loadkeys``.  This will work whether or not sysrq support is compiled
+into the kernel.
+
+SAK works correctly when the keyboard is in raw mode.  This means that
+once defined, SAK will kill a running X server.  If the system is in
+run level 5, the X server will restart.  This is what you want to
+happen.
+
+What key sequence should you use? Well, CTRL-ALT-DEL is used to reboot
+the machine.  CTRL-ALT-BACKSPACE is magical to the X server.  We'll
+choose CTRL-ALT-PAUSE.
+
+In your rc.sysinit (or rc.local) file, add the command::
+
+	echo "control alt keycode 101 = SAK" | /bin/loadkeys
+
+And that's it!  Only the superuser may reprogram the SAK key.
+
+
+.. note::
+
+  1. Linux SAK is said to be not a "true SAK" as is required by
+     systems which implement C2 level security.  This author does not
+     know why.
+
+
+  2. On the PC keyboard, SAK kills all applications which have
+     /dev/console opened.
+
+     Unfortunately this includes a number of things which you don't
+     actually want killed.  This is because these applications are
+     incorrectly holding /dev/console open.  Be sure to complain to your
+     Linux distributor about this!
+
+     You can identify processes which will be killed by SAK with the
+     command::
+
+	# ls -l /proc/[0-9]*/fd/* | grep console
+	l-wx------    1 root     root           64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console
+
+     Then::
+
+	# ps aux|grep 579
+	root       579  0.0  0.1  1088  436 ?        S    00:43   0:00 gpm -t ps/2
+
+     So ``gpm`` will be killed by SAK.  This is a bug in gpm.  It should
+     be closing standard input.  You can work around this by finding the
+     initscript which launches gpm and changing it thusly:
+
+     Old::
+
+	daemon gpm
+
+     New::
+
+	daemon gpm < /dev/null
+
+     Vixie cron also seems to have this problem, and needs the same treatment.
+
+     Also, one prominent Linux distribution has the following three
+     lines in its rc.sysinit and rc scripts::
+
+	exec 3<&0
+	exec 4>&1
+	exec 5>&2
+
+     These commands cause **all** daemons which are launched by the
+     initscripts to have file descriptors 3, 4 and 5 attached to
+     /dev/console.  So SAK kills them all.  A workaround is to simply
+     delete these lines, but this may cause system management
+     applications to malfunction - test everything well.
+
diff --git a/Documentation/security/siphash.rst b/Documentation/security/siphash.rst
new file mode 100644
index 000000000000..9965821ab333
--- /dev/null
+++ b/Documentation/security/siphash.rst
@@ -0,0 +1,189 @@
+===========================
+SipHash - a short input PRF
+===========================
+
+:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>
+
+SipHash is a cryptographically secure PRF -- a keyed hash function -- that
+performs very well for short inputs, hence the name. It was designed by
+cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended
+as a replacement for some uses of: `jhash`, `md5_transform`, `sha_transform`,
+and so forth.
+
+SipHash takes a secret key filled with randomly generated numbers and either
+an input buffer or several input integers. It spits out an integer that is
+indistinguishable from random. You may then use that integer as part of secure
+sequence numbers, secure cookies, or mask it off for use in a hash table.
+
+Generating a key
+================
+
+Keys should always be generated from a cryptographically secure source of
+random numbers, either using get_random_bytes or get_random_once::
+
+	siphash_key_t key;
+	get_random_bytes(&key, sizeof(key));
+
+If you're not deriving your key from here, you're doing it wrong.
+
+Using the functions
+===================
+
+There are two variants of the function, one that takes a list of integers, and
+one that takes a buffer::
+
+	u64 siphash(const void *data, size_t len, const siphash_key_t *key);
+
+And::
+
+	u64 siphash_1u64(u64, const siphash_key_t *key);
+	u64 siphash_2u64(u64, u64, const siphash_key_t *key);
+	u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key);
+	u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key);
+	u64 siphash_1u32(u32, const siphash_key_t *key);
+	u64 siphash_2u32(u32, u32, const siphash_key_t *key);
+	u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key);
+	u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key);
+
+If you pass the generic siphash function something of a constant length, it
+will constant fold at compile-time and automatically choose one of the
+optimized functions.
+
+Hashtable key function usage::
+
+	struct some_hashtable {
+		DECLARE_HASHTABLE(hashtable, 8);
+		siphash_key_t key;
+	};
+
+	void init_hashtable(struct some_hashtable *table)
+	{
+		get_random_bytes(&table->key, sizeof(table->key));
+	}
+
+	static inline struct hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
+	{
+		return &table->hashtable[siphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
+	}
+
+You may then iterate like usual over the returned hash bucket.
+
+Security
+========
+
+SipHash has a very high security margin, with its 128-bit key. So long as the
+key is kept secret, it is impossible for an attacker to guess the outputs of
+the function, even when able to observe many outputs, since 2^128 outputs
+is significant.
+
+Linux implements the "2-4" variant of SipHash.
+
+Struct-passing Pitfalls
+=======================
+
+Oftentimes the XuY functions will not be large enough, and instead you'll
+want to pass a pre-filled struct to siphash. When doing this, it's important
+to always ensure the struct has no padding holes. The easiest way to do this
+is to simply arrange the members of the struct in descending order of size,
+and to use offsetofend() instead of sizeof() for getting the size. For
+performance reasons, if possible, it's probably a good thing to align the
+struct to the right boundary. Here's an example::
+
+	const struct {
+		struct in6_addr saddr;
+		u32 counter;
+		u16 dport;
+	} __aligned(SIPHASH_ALIGNMENT) combined = {
+		.saddr = *(struct in6_addr *)saddr,
+		.counter = counter,
+		.dport = dport
+	};
+	u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret);
+
+Resources
+=========
+
+Read the SipHash paper if you're interested in learning more:
+https://131002.net/siphash/siphash.pdf
+
+-------------------------------------------------------------------------------
+
+===============================================
+HalfSipHash - SipHash's insecure younger cousin
+===============================================
+
+:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>
+
+On the off-chance that SipHash is not fast enough for your needs, you might be
+able to justify using HalfSipHash, a terrifying but potentially useful
+possibility. HalfSipHash cuts SipHash's rounds down from "2-4" to "1-3" and,
+even scarier, uses an easily brute-forceable 64-bit key (with a 32-bit output)
+instead of SipHash's 128-bit key. However, this may appeal to some
+high-performance `jhash` users.
+
+Danger!
+
+Do not ever use HalfSipHash except as a hashtable key function, and only
+then when you can be absolutely certain that the outputs will never be
+transmitted out of the kernel. This is only remotely useful over `jhash` as a
+means of mitigating hashtable flooding denial of service attacks.
+
+Generating a key
+================
+
+Keys should always be generated from a cryptographically secure source of
+random numbers, either using get_random_bytes or get_random_once::
+
+	hsiphash_key_t key;
+	get_random_bytes(&key, sizeof(key));
+
+If you're not deriving your key from here, you're doing it wrong.
+
+Using the functions
+===================
+
+There are two variants of the function, one that takes a list of integers, and
+one that takes a buffer::
+
+	u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key);
+
+And::
+
+	u32 hsiphash_1u32(u32, const hsiphash_key_t *key);
+	u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key);
+	u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key);
+	u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key);
+
+If you pass the generic hsiphash function something of a constant length, it
+will constant fold at compile-time and automatically choose one of the
+optimized functions.
+
+Hashtable key function usage
+============================
+
+::
+
+	struct some_hashtable {
+		DECLARE_HASHTABLE(hashtable, 8);
+		hsiphash_key_t key;
+	};
+
+	void init_hashtable(struct some_hashtable *table)
+	{
+		get_random_bytes(&table->key, sizeof(table->key));
+	}
+
+	static inline struct hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
+	{
+		return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
+	}
+
+You may then iterate like usual over the returned hash bucket.
+
+Performance
+===========
+
+HalfSipHash is roughly 3 times slower than JenkinsHash. For many replacements,
+this will not be a problem, as the hashtable lookup isn't the bottleneck. And
+in general, this is probably a good sacrifice to make for the security and DoS
+resistance of HalfSipHash.
diff --git a/Documentation/security/tpm/index.rst b/Documentation/security/tpm/index.rst
index af77a7bbb070..3296533e54cf 100644
--- a/Documentation/security/tpm/index.rst
+++ b/Documentation/security/tpm/index.rst
@@ -5,3 +5,4 @@ Trusted Platform Module documentation
 .. toctree::
 
    tpm_vtpm_proxy
+   xen-tpmfront
diff --git a/Documentation/security/tpm/xen-tpmfront.rst b/Documentation/security/tpm/xen-tpmfront.rst
index 98a16ab87360..00d5b1db227d 100644
--- a/Documentation/security/tpm/xen-tpmfront.rst
+++ b/Documentation/security/tpm/xen-tpmfront.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ﻿=============================
 Virtual TPM interface for Xen
 =============================
diff --git a/Documentation/siphash.txt b/Documentation/siphash.txt
deleted file mode 100644
index 9965821ab333..000000000000
--- a/Documentation/siphash.txt
+++ /dev/null
@@ -1,189 +0,0 @@
-===========================
-SipHash - a short input PRF
-===========================
-
-:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>
-
-SipHash is a cryptographically secure PRF -- a keyed hash function -- that
-performs very well for short inputs, hence the name. It was designed by
-cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended
-as a replacement for some uses of: `jhash`, `md5_transform`, `sha_transform`,
-and so forth.
-
-SipHash takes a secret key filled with randomly generated numbers and either
-an input buffer or several input integers. It spits out an integer that is
-indistinguishable from random. You may then use that integer as part of secure
-sequence numbers, secure cookies, or mask it off for use in a hash table.
-
-Generating a key
-================
-
-Keys should always be generated from a cryptographically secure source of
-random numbers, either using get_random_bytes or get_random_once::
-
-	siphash_key_t key;
-	get_random_bytes(&key, sizeof(key));
-
-If you're not deriving your key from here, you're doing it wrong.
-
-Using the functions
-===================
-
-There are two variants of the function, one that takes a list of integers, and
-one that takes a buffer::
-
-	u64 siphash(const void *data, size_t len, const siphash_key_t *key);
-
-And::
-
-	u64 siphash_1u64(u64, const siphash_key_t *key);
-	u64 siphash_2u64(u64, u64, const siphash_key_t *key);
-	u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key);
-	u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key);
-	u64 siphash_1u32(u32, const siphash_key_t *key);
-	u64 siphash_2u32(u32, u32, const siphash_key_t *key);
-	u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key);
-	u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key);
-
-If you pass the generic siphash function something of a constant length, it
-will constant fold at compile-time and automatically choose one of the
-optimized functions.
-
-Hashtable key function usage::
-
-	struct some_hashtable {
-		DECLARE_HASHTABLE(hashtable, 8);
-		siphash_key_t key;
-	};
-
-	void init_hashtable(struct some_hashtable *table)
-	{
-		get_random_bytes(&table->key, sizeof(table->key));
-	}
-
-	static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
-	{
-		return &table->hashtable[siphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
-	}
-
-You may then iterate like usual over the returned hash bucket.
-
-Security
-========
-
-SipHash has a very high security margin, with its 128-bit key. So long as the
-key is kept secret, it is impossible for an attacker to guess the outputs of
-the function, even if being able to observe many outputs, since 2^128 outputs
-is significant.
-
-Linux implements the "2-4" variant of SipHash.
-
-Struct-passing Pitfalls
-=======================
-
-Often times the XuY functions will not be large enough, and instead you'll
-want to pass a pre-filled struct to siphash. When doing this, it's important
-to always ensure the struct has no padding holes. The easiest way to do this
-is to simply arrange the members of the struct in descending order of size,
-and to use offsetendof() instead of sizeof() for getting the size. For
-performance reasons, if possible, it's probably a good thing to align the
-struct to the right boundary. Here's an example::
-
-	const struct {
-		struct in6_addr saddr;
-		u32 counter;
-		u16 dport;
-	} __aligned(SIPHASH_ALIGNMENT) combined = {
-		.saddr = *(struct in6_addr *)saddr,
-		.counter = counter,
-		.dport = dport
-	};
-	u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret);
-
-Resources
-=========
-
-Read the SipHash paper if you're interested in learning more:
-https://131002.net/siphash/siphash.pdf
-
--------------------------------------------------------------------------------
-
-===============================================
-HalfSipHash - SipHash's insecure younger cousin
-===============================================
-
-:Author: Written by Jason A. Donenfeld <jason@zx2c4.com>
-
-On the off-chance that SipHash is not fast enough for your needs, you might be
-able to justify using HalfSipHash, a terrifying but potentially useful
-possibility. HalfSipHash cuts SipHash's rounds down from "2-4" to "1-3" and,
-even scarier, uses an easily brute-forcable 64-bit key (with a 32-bit output)
-instead of SipHash's 128-bit key. However, this may appeal to some
-high-performance `jhash` users.
-
-Danger!
-
-Do not ever use HalfSipHash except for as a hashtable key function, and only
-then when you can be absolutely certain that the outputs will never be
-transmitted out of the kernel. This is only remotely useful over `jhash` as a
-means of mitigating hashtable flooding denial of service attacks.
-
-Generating a key
-================
-
-Keys should always be generated from a cryptographically secure source of
-random numbers, either using get_random_bytes or get_random_once:
-
-hsiphash_key_t key;
-get_random_bytes(&key, sizeof(key));
-
-If you're not deriving your key from here, you're doing it wrong.
-
-Using the functions
-===================
-
-There are two variants of the function, one that takes a list of integers, and
-one that takes a buffer::
-
-	u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key);
-
-And::
-
-	u32 hsiphash_1u32(u32, const hsiphash_key_t *key);
-	u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key);
-	u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key);
-	u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key);
-
-If you pass the generic hsiphash function something of a constant length, it
-will constant fold at compile-time and automatically choose one of the
-optimized functions.
-
-Hashtable key function usage
-============================
-
-::
-
-	struct some_hashtable {
-		DECLARE_HASHTABLE(hashtable, 8);
-		hsiphash_key_t key;
-	};
-
-	void init_hashtable(struct some_hashtable *table)
-	{
-		get_random_bytes(&table->key, sizeof(table->key));
-	}
-
-	static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
-	{
-		return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
-	}
-
-You may then iterate like usual over the returned hash bucket.
-
-Performance
-===========
-
-HalfSipHash is roughly 3 times slower than JenkinsHash. For many replacements,
-this will not be a problem, as the hashtable lookup isn't the bottleneck. And
-in general, this is probably a good sacrifice to make for the security and DoS
-resistance of HalfSipHash.
-- 
cgit v1.2.3-55-g7522


From e8d776f20f92b9c679bcdcbdf3aee5026d5265f5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sat, 20 Apr 2019 09:20:52 -0300
Subject: docs: x86: move two x86-specific files to x86 arch dir

Those two docs belong to the x86 architecture:

   Documentation/Intel-IOMMU.txt -> Documentation/x86/intel-iommu.rst
   Documentation/intel_txt.txt -> Documentation/x86/intel_txt.rst

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/Intel-IOMMU.txt     | 114 -------------------
 Documentation/intel_txt.txt       | 227 --------------------------------------
 Documentation/x86/index.rst       |   2 +
 Documentation/x86/intel-iommu.rst | 114 +++++++++++++++++++
 Documentation/x86/intel_txt.rst   | 227 ++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                       |   2 +-
 security/Kconfig                  |   2 +-
 7 files changed, 345 insertions(+), 343 deletions(-)
 delete mode 100644 Documentation/Intel-IOMMU.txt
 delete mode 100644 Documentation/intel_txt.txt
 create mode 100644 Documentation/x86/intel-iommu.rst
 create mode 100644 Documentation/x86/intel_txt.rst

diff --git a/Documentation/Intel-IOMMU.txt b/Documentation/Intel-IOMMU.txt
deleted file mode 100644
index 9dae6b47e398..000000000000
--- a/Documentation/Intel-IOMMU.txt
+++ /dev/null
@@ -1,114 +0,0 @@
-===================
-Linux IOMMU Support
-===================
-
-The architecture spec can be obtained from the below location.
-
-http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
-
-This guide gives a quick cheat sheet for some basic understanding.
-
-Some Keywords
-
-- DMAR - DMA remapping
-- DRHD - DMA Remapping Hardware Unit Definition
-- RMRR - Reserved memory Region Reporting Structure
-- ZLR  - Zero length reads from PCI devices
-- IOVA - IO Virtual address.
-
-Basic stuff
------------
-
-ACPI enumerates and lists the different DMA engines in the platform, and
-device scope relationships between PCI devices and which DMA engine  controls
-them.
-
-What is RMRR?
--------------
-
-There are some devices the BIOS controls, for e.g USB devices to perform
-PS2 emulation. The regions of memory used for these devices are marked
-reserved in the e820 map. When we turn on DMA translation, DMA to those
-regions will fail. Hence BIOS uses RMRR to specify these regions along with
-devices that need to access these regions. OS is expected to setup
-unity mappings for these regions for these devices to access these regions.
-
-How is IOVA generated?
-----------------------
-
-Well behaved drivers call pci_map_*() calls before sending command to device
-that needs to perform DMA. Once DMA is completed and mapping is no longer
-required, device performs a pci_unmap_*() calls to unmap the region.
-
-The Intel IOMMU driver allocates a virtual address per domain. Each PCIE
-device has its own domain (hence protection). Devices under p2p bridges
-share the virtual address with all devices under the p2p bridge due to
-transaction id aliasing for p2p bridges.
-
-IOVA generation is pretty generic. We used the same technique as vmalloc()
-but these are not global address spaces, but separate for each domain.
-Different DMA engines may support different number of domains.
-
-We also allocate guard pages with each mapping, so we can attempt to catch
-any overflow that might happen.
-
-
-Graphics Problems?
-------------------
-If you encounter issues with graphics devices, you can try adding
-option intel_iommu=igfx_off to turn off the integrated graphics engine.
-If this fixes anything, please ensure you file a bug reporting the problem.
-
-Some exceptions to IOVA
------------------------
-Interrupt ranges are not address translated, (0xfee00000 - 0xfeefffff).
-The same is true for peer to peer transactions. Hence we reserve the
-address from PCI MMIO ranges so they are not allocated for IOVA addresses.
-
-
-Fault reporting
----------------
-When errors are reported, the DMA engine signals via an interrupt. The fault
-reason and device that caused it with fault reason is printed on console.
-
-See below for sample.
-
-
-Boot Message Sample
--------------------
-
-Something like this gets printed indicating presence of DMAR tables
-in ACPI.
-
-ACPI: DMAR (v001 A M I  OEMDMAR  0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
-
-When DMAR is being processed and initialized by ACPI, prints DMAR locations
-and any RMRR's processed::
-
-	ACPI DMAR:Host address width 36
-	ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
-	ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
-	ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
-	ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
-	ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
-
-When DMAR is enabled for use, you will notice..
-
-PCI-DMA: Using DMAR IOMMU
-
-Fault reporting
----------------
-
-::
-
-	DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
-	DMAR:[fault reason 05] PTE Write access is not set
-	DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
-	DMAR:[fault reason 05] PTE Write access is not set
-
-TBD
-----
-
-- For compatibility testing, could use unity map domain for all devices, just
-  provide a 1-1 for all useful memory under a single domain for all devices.
-- API for paravirt ops for abstracting functionality for VMM folks.
diff --git a/Documentation/intel_txt.txt b/Documentation/intel_txt.txt
deleted file mode 100644
index d83c1a2122c9..000000000000
--- a/Documentation/intel_txt.txt
+++ /dev/null
@@ -1,227 +0,0 @@
-=====================
-Intel(R) TXT Overview
-=====================
-
-Intel's technology for safer computing, Intel(R) Trusted Execution
-Technology (Intel(R) TXT), defines platform-level enhancements that
-provide the building blocks for creating trusted platforms.
-
-Intel TXT was formerly known by the code name LaGrande Technology (LT).
-
-Intel TXT in Brief:
-
--  Provides dynamic root of trust for measurement (DRTM)
--  Data protection in case of improper shutdown
--  Measurement and verification of launched environment
-
-Intel TXT is part of the vPro(TM) brand and is also available some
-non-vPro systems.  It is currently available on desktop systems
-based on the Q35, X38, Q45, and Q43 Express chipsets (e.g. Dell
-Optiplex 755, HP dc7800, etc.) and mobile systems based on the GM45,
-PM45, and GS45 Express chipsets.
-
-For more information, see http://www.intel.com/technology/security/.
-This site also has a link to the Intel TXT MLE Developers Manual,
-which has been updated for the new released platforms.
-
-Intel TXT has been presented at various events over the past few
-years, some of which are:
-
-      - LinuxTAG 2008:
-          http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag.html
-
-      - TRUST2008:
-          http://www.trust-conference.eu/downloads/Keynote-Speakers/
-          3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf
-
-      - IDF, Shanghai:
-          http://www.prcidf.com.cn/index_en.html
-
-      - IDFs 2006, 2007
-	  (I'm not sure if/where they are online)
-
-Trusted Boot Project Overview
-=============================
-
-Trusted Boot (tboot) is an open source, pre-kernel/VMM module that
-uses Intel TXT to perform a measured and verified launch of an OS
-kernel/VMM.
-
-It is hosted on SourceForge at http://sourceforge.net/projects/tboot.
-The mercurial source repo is available at http://www.bughost.org/
-repos.hg/tboot.hg.
-
-Tboot currently supports launching Xen (open source VMM/hypervisor
-w/ TXT support since v3.2), and now Linux kernels.
-
-
-Value Proposition for Linux or "Why should you care?"
-=====================================================
-
-While there are many products and technologies that attempt to
-measure or protect the integrity of a running kernel, they all
-assume the kernel is "good" to begin with.  The Integrity
-Measurement Architecture (IMA) and Linux Integrity Module interface
-are examples of such solutions.
-
-To get trust in the initial kernel without using Intel TXT, a
-static root of trust must be used.  This bases trust in BIOS
-starting at system reset and requires measurement of all code
-executed between system reset through the completion of the kernel
-boot as well as data objects used by that code.  In the case of a
-Linux kernel, this means all of BIOS, any option ROMs, the
-bootloader and the boot config.  In practice, this is a lot of
-code/data, much of which is subject to change from boot to boot
-(e.g. changing NICs may change option ROMs).  Without reference
-hashes, these measurement changes are difficult to assess or
-confirm as benign.  This process also does not provide DMA
-protection, memory configuration/alias checks and locks, crash
-protection, or policy support.
-
-By using the hardware-based root of trust that Intel TXT provides,
-many of these issues can be mitigated.  Specifically: many
-pre-launch components can be removed from the trust chain, DMA
-protection is provided to all launched components, a large number
-of platform configuration checks are performed and values locked,
-protection is provided for any data in the event of an improper
-shutdown, and there is support for policy-based execution/verification.
-This provides a more stable measurement and a higher assurance of
-system configuration and initial state than would be otherwise
-possible.  Since the tboot project is open source, source code for
-almost all parts of the trust chain is available (excepting SMM and
-Intel-provided firmware).
-
-How Does it Work?
-=================
-
--  Tboot is an executable that is launched by the bootloader as
-   the "kernel" (the binary the bootloader executes).
--  It performs all of the work necessary to determine if the
-   platform supports Intel TXT and, if so, executes the GETSEC[SENTER]
-   processor instruction that initiates the dynamic root of trust.
-
-   -  If tboot determines that the system does not support Intel TXT
-      or is not configured correctly (e.g. the SINIT AC Module was
-      incorrect), it will directly launch the kernel with no changes
-      to any state.
-   -  Tboot will output various information about its progress to the
-      terminal, serial port, and/or an in-memory log; the output
-      locations can be configured with a command line switch.
-
--  The GETSEC[SENTER] instruction will return control to tboot and
-   tboot then verifies certain aspects of the environment (e.g. TPM NV
-   lock, e820 table does not have invalid entries, etc.).
--  It will wake the APs from the special sleep state the GETSEC[SENTER]
-   instruction had put them in and place them into a wait-for-SIPI
-   state.
-
-   -  Because the processors will not respond to an INIT or SIPI when
-      in the TXT environment, it is necessary to create a small VT-x
-      guest for the APs.  When they run in this guest, they will
-      simply wait for the INIT-SIPI-SIPI sequence, which will cause
-      VMEXITs, and then disable VT and jump to the SIPI vector.  This
-      approach seemed like a better choice than having to insert
-      special code into the kernel's MP wakeup sequence.
-
--  Tboot then applies an (optional) user-defined launch policy to
-   verify the kernel and initrd.
-
-   -  This policy is rooted in TPM NV and is described in the tboot
-      project.  The tboot project also contains code for tools to
-      create and provision the policy.
-   -  Policies are completely under user control and if not present
-      then any kernel will be launched.
-   -  Policy action is flexible and can include halting on failures
-      or simply logging them and continuing.
-
--  Tboot adjusts the e820 table provided by the bootloader to reserve
-   its own location in memory as well as to reserve certain other
-   TXT-related regions.
--  As part of its launch, tboot DMA protects all of RAM (using the
-   VT-d PMRs).  Thus, the kernel must be booted with 'intel_iommu=on'
-   in order to remove this blanket protection and use VT-d's
-   page-level protection.
--  Tboot will populate a shared page with some data about itself and
-   pass this to the Linux kernel as it transfers control.
-
-   -  The location of the shared page is passed via the boot_params
-      struct as a physical address.
-
--  The kernel will look for the tboot shared page address and, if it
-   exists, map it.
--  As one of the checks/protections provided by TXT, it makes a copy
-   of the VT-d DMARs in a DMA-protected region of memory and verifies
-   them for correctness.  The VT-d code will detect if the kernel was
-   launched with tboot and use this copy instead of the one in the
-   ACPI table.
--  At this point, tboot and TXT are out of the picture until a
-   shutdown (S<n>)
--  In order to put a system into any of the sleep states after a TXT
-   launch, TXT must first be exited.  This is to prevent attacks that
-   attempt to crash the system to gain control on reboot and steal
-   data left in memory.
-
-   -  The kernel will perform all of its sleep preparation and
-      populate the shared page with the ACPI data needed to put the
-      platform in the desired sleep state.
-   -  Then the kernel jumps into tboot via the vector specified in the
-      shared page.
-   -  Tboot will clean up the environment and disable TXT, then use the
-      kernel-provided ACPI information to actually place the platform
-      into the desired sleep state.
-   -  In the case of S3, tboot will also register itself as the resume
-      vector.  This is necessary because it must re-establish the
-      measured environment upon resume.  Once the TXT environment
-      has been restored, it will restore the TPM PCRs and then
-      transfer control back to the kernel's S3 resume vector.
-      In order to preserve system integrity across S3, the kernel
-      provides tboot with a set of memory ranges (RAM and RESERVED_KERN
-      in the e820 table, but not any memory that BIOS might alter over
-      the S3 transition) that tboot will calculate a MAC (message
-      authentication code) over and then seal with the TPM. On resume
-      and once the measured environment has been re-established, tboot
-      will re-calculate the MAC and verify it against the sealed value.
-      Tboot's policy determines what happens if the verification fails.
-      Note that the c/s 194 of tboot which has the new MAC code supports
-      this.
-
-That's pretty much it for TXT support.
-
-
-Configuring the System
-======================
-
-This code works with 32bit, 32bit PAE, and 64bit (x86_64) kernels.
-
-In BIOS, the user must enable:  TPM, TXT, VT-x, VT-d.  Not all BIOSes
-allow these to be individually enabled/disabled and the screens in
-which to find them are BIOS-specific.
-
-grub.conf needs to be modified as follows::
-
-        title Linux 2.6.29-tip w/ tboot
-          root (hd0,0)
-                kernel /tboot.gz logging=serial,vga,memory
-                module /vmlinuz-2.6.29-tip intel_iommu=on ro
-                       root=LABEL=/ rhgb console=ttyS0,115200 3
-                module /initrd-2.6.29-tip.img
-                module /Q35_SINIT_17.BIN
-
-The kernel option for enabling Intel TXT support is found under the
-Security top-level menu and is called "Enable Intel(R) Trusted
-Execution Technology (TXT)".  It is considered EXPERIMENTAL and
-depends on the generic x86 support (to allow maximum flexibility in
-kernel build options), since the tboot code will detect whether the
-platform actually supports Intel TXT and thus whether any of the
-kernel code is executed.
-
-The Q35_SINIT_17.BIN file is what Intel TXT refers to as an
-Authenticated Code Module.  It is specific to the chipset in the
-system and can also be found on the Trusted Boot site.  It is an
-(unencrypted) module signed by Intel that is used as part of the
-DRTM process to verify and configure the system.  It is signed
-because it operates at a higher privilege level in the system than
-any other macrocode and its correct operation is critical to the
-establishment of the DRTM.  The process for determining the correct
-SINIT ACM for a system is documented in the SINIT-guide.txt file
-that is on the tboot SourceForge site under the SINIT ACM downloads.
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f2de1b2d3ac7..af64c4bb4447 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -20,6 +20,8 @@ x86-specific Documentation
    mtrr
    pat
    intel_mpx
+   intel-iommu
+   intel_txt
    amd-memory-encryption
    pti
    mds
diff --git a/Documentation/x86/intel-iommu.rst b/Documentation/x86/intel-iommu.rst
new file mode 100644
index 000000000000..9dae6b47e398
--- /dev/null
+++ b/Documentation/x86/intel-iommu.rst
@@ -0,0 +1,114 @@
+===================
+Linux IOMMU Support
+===================
+
+The architecture spec can be obtained from the location below.
+
+http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
+
+This guide gives a quick cheat sheet for some basic understanding.
+
+Some Keywords
+
+- DMAR - DMA remapping
+- DRHD - DMA Remapping Hardware Unit Definition
+- RMRR - Reserved Memory Region Reporting Structure
+- ZLR  - Zero length reads from PCI devices
+- IOVA - IO Virtual Address
+
+Basic stuff
+-----------
+
+ACPI enumerates the different DMA engines in the platform, along with the
+device scope relationships between PCI devices and the DMA engine that
+controls them.
+
+What is RMRR?
+-------------
+
+Some devices are controlled by the BIOS, e.g. USB devices used to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence the BIOS uses RMRR to specify these regions along
+with the devices that need to access them. The OS is expected to set up
+unity mappings so that these devices can access these regions.
+
+How is IOVA generated?
+----------------------
+
+Well-behaved drivers call pci_map_*() before sending a command to a device
+that needs to perform DMA. Once DMA is completed and the mapping is no longer
+required, the driver calls pci_unmap_*() to unmap the region.
+
+The Intel IOMMU driver allocates a virtual address per domain. Each PCIE
+device has its own domain (hence protection). Devices under p2p bridges
+share the virtual address with all devices under the p2p bridge due to
+transaction id aliasing for p2p bridges.
+
+IOVA generation is pretty generic. We use the same technique as vmalloc(),
+except that the address spaces are not global but separate for each domain.
+Different DMA engines may support different numbers of domains.
+
+We also allocate guard pages with each mapping, so we can attempt to catch
+any overflow that might happen.
+
+
+Graphics Problems?
+------------------
+If you encounter issues with graphics devices, you can try adding
+option intel_iommu=igfx_off to turn off the integrated graphics engine.
+If this fixes anything, please ensure you file a bug reporting the problem.
+
+Some exceptions to IOVA
+-----------------------
+Interrupt ranges (0xfee00000 - 0xfeefffff) are not address translated.
+The same is true for peer-to-peer transactions. Hence we reserve the
+addresses from PCI MMIO ranges so they are not allocated as IOVA addresses.
+
+
+Fault reporting
+---------------
+When errors are reported, the DMA engine signals via an interrupt. The fault
+reason and the device that caused it are printed on the console.
+
+See below for sample.
+
+
+Boot Message Sample
+-------------------
+
+Something like this gets printed indicating presence of DMAR tables
+in ACPI.
+
+ACPI: DMAR (v001 A M I  OEMDMAR  0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
+
+When DMAR is being processed and initialized by ACPI, it prints the DMAR
+locations and any RMRRs processed::
+
+	ACPI DMAR:Host address width 36
+	ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
+	ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
+	ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
+	ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
+	ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
+
+When DMAR is enabled for use, you will notice:
+
+PCI-DMA: Using DMAR IOMMU
+
+Fault Message Sample
+--------------------
+
+::
+
+	DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+	DMAR:[fault reason 05] PTE Write access is not set
+	DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
+	DMAR:[fault reason 05] PTE Write access is not set
+
+TBD
+----
+
+- For compatibility testing, could use unity map domain for all devices, just
+  provide a 1-1 for all useful memory under a single domain for all devices.
+- API for paravirt ops for abstracting functionality for VMM folks.
diff --git a/Documentation/x86/intel_txt.rst b/Documentation/x86/intel_txt.rst
new file mode 100644
index 000000000000..d83c1a2122c9
--- /dev/null
+++ b/Documentation/x86/intel_txt.rst
@@ -0,0 +1,227 @@
+=====================
+Intel(R) TXT Overview
+=====================
+
+Intel's technology for safer computing, Intel(R) Trusted Execution
+Technology (Intel(R) TXT), defines platform-level enhancements that
+provide the building blocks for creating trusted platforms.
+
+Intel TXT was formerly known by the code name LaGrande Technology (LT).
+
+Intel TXT in Brief:
+
+-  Provides dynamic root of trust for measurement (DRTM)
+-  Data protection in case of improper shutdown
+-  Measurement and verification of launched environment
+
+Intel TXT is part of the vPro(TM) brand and is also available on some
+non-vPro systems.  It is currently available on desktop systems
+based on the Q35, X38, Q45, and Q43 Express chipsets (e.g. Dell
+Optiplex 755, HP dc7800, etc.) and mobile systems based on the GM45,
+PM45, and GS45 Express chipsets.
+
+For more information, see http://www.intel.com/technology/security/.
+This site also has a link to the Intel TXT MLE Developers Manual,
+which has been updated for the newly released platforms.
+
+Intel TXT has been presented at various events over the past few
+years, some of which are:
+
+      - LinuxTAG 2008:
+          http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag.html
+
+      - TRUST2008:
+          http://www.trust-conference.eu/downloads/Keynote-Speakers/
+          3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf
+
+      - IDF, Shanghai:
+          http://www.prcidf.com.cn/index_en.html
+
+      - IDFs 2006, 2007
+	  (I'm not sure if/where they are online)
+
+Trusted Boot Project Overview
+=============================
+
+Trusted Boot (tboot) is an open source, pre-kernel/VMM module that
+uses Intel TXT to perform a measured and verified launch of an OS
+kernel/VMM.
+
+It is hosted on SourceForge at http://sourceforge.net/projects/tboot.
+The mercurial source repo is available at http://www.bughost.org/
+repos.hg/tboot.hg.
+
+Tboot currently supports launching Xen (open source VMM/hypervisor
+w/ TXT support since v3.2), and now Linux kernels.
+
+
+Value Proposition for Linux or "Why should you care?"
+=====================================================
+
+While there are many products and technologies that attempt to
+measure or protect the integrity of a running kernel, they all
+assume the kernel is "good" to begin with.  The Integrity
+Measurement Architecture (IMA) and Linux Integrity Module interface
+are examples of such solutions.
+
+To get trust in the initial kernel without using Intel TXT, a
+static root of trust must be used.  This bases trust in BIOS
+starting at system reset and requires measurement of all code
+executed between system reset through the completion of the kernel
+boot as well as data objects used by that code.  In the case of a
+Linux kernel, this means all of BIOS, any option ROMs, the
+bootloader and the boot config.  In practice, this is a lot of
+code/data, much of which is subject to change from boot to boot
+(e.g. changing NICs may change option ROMs).  Without reference
+hashes, these measurement changes are difficult to assess or
+confirm as benign.  This process also does not provide DMA
+protection, memory configuration/alias checks and locks, crash
+protection, or policy support.
+
+By using the hardware-based root of trust that Intel TXT provides,
+many of these issues can be mitigated.  Specifically: many
+pre-launch components can be removed from the trust chain, DMA
+protection is provided to all launched components, a large number
+of platform configuration checks are performed and values locked,
+protection is provided for any data in the event of an improper
+shutdown, and there is support for policy-based execution/verification.
+This provides a more stable measurement and a higher assurance of
+system configuration and initial state than would be otherwise
+possible.  Since the tboot project is open source, source code for
+almost all parts of the trust chain is available (excepting SMM and
+Intel-provided firmware).
+
+How Does it Work?
+=================
+
+-  Tboot is an executable that is launched by the bootloader as
+   the "kernel" (the binary the bootloader executes).
+-  It performs all of the work necessary to determine if the
+   platform supports Intel TXT and, if so, executes the GETSEC[SENTER]
+   processor instruction that initiates the dynamic root of trust.
+
+   -  If tboot determines that the system does not support Intel TXT
+      or is not configured correctly (e.g. the SINIT AC Module was
+      incorrect), it will directly launch the kernel with no changes
+      to any state.
+   -  Tboot will output various information about its progress to the
+      terminal, serial port, and/or an in-memory log; the output
+      locations can be configured with a command line switch.
+
+-  The GETSEC[SENTER] instruction will return control to tboot and
+   tboot then verifies certain aspects of the environment (e.g. TPM NV
+   lock, e820 table does not have invalid entries, etc.).
+-  It will wake the APs from the special sleep state the GETSEC[SENTER]
+   instruction had put them in and place them into a wait-for-SIPI
+   state.
+
+   -  Because the processors will not respond to an INIT or SIPI when
+      in the TXT environment, it is necessary to create a small VT-x
+      guest for the APs.  When they run in this guest, they will
+      simply wait for the INIT-SIPI-SIPI sequence, which will cause
+      VMEXITs, and then disable VT and jump to the SIPI vector.  This
+      approach seemed like a better choice than having to insert
+      special code into the kernel's MP wakeup sequence.
+
+-  Tboot then applies an (optional) user-defined launch policy to
+   verify the kernel and initrd.
+
+   -  This policy is rooted in TPM NV and is described in the tboot
+      project.  The tboot project also contains code for tools to
+      create and provision the policy.
+   -  Policies are completely under user control and if not present
+      then any kernel will be launched.
+   -  Policy action is flexible and can include halting on failures
+      or simply logging them and continuing.
+
+-  Tboot adjusts the e820 table provided by the bootloader to reserve
+   its own location in memory as well as to reserve certain other
+   TXT-related regions.
+-  As part of its launch, tboot DMA protects all of RAM (using the
+   VT-d PMRs).  Thus, the kernel must be booted with 'intel_iommu=on'
+   in order to remove this blanket protection and use VT-d's
+   page-level protection.
+-  Tboot will populate a shared page with some data about itself and
+   pass this to the Linux kernel as it transfers control.
+
+   -  The location of the shared page is passed via the boot_params
+      struct as a physical address.
+
+-  The kernel will look for the tboot shared page address and, if it
+   exists, map it.
+-  As one of the checks/protections provided by TXT, it makes a copy
+   of the VT-d DMARs in a DMA-protected region of memory and verifies
+   them for correctness.  The VT-d code will detect if the kernel was
+   launched with tboot and use this copy instead of the one in the
+   ACPI table.
+-  At this point, tboot and TXT are out of the picture until a
+   shutdown (S<n>).
+-  In order to put a system into any of the sleep states after a TXT
+   launch, TXT must first be exited.  This is to prevent attacks that
+   attempt to crash the system to gain control on reboot and steal
+   data left in memory.
+
+   -  The kernel will perform all of its sleep preparation and
+      populate the shared page with the ACPI data needed to put the
+      platform in the desired sleep state.
+   -  Then the kernel jumps into tboot via the vector specified in the
+      shared page.
+   -  Tboot will clean up the environment and disable TXT, then use the
+      kernel-provided ACPI information to actually place the platform
+      into the desired sleep state.
+   -  In the case of S3, tboot will also register itself as the resume
+      vector.  This is necessary because it must re-establish the
+      measured environment upon resume.  Once the TXT environment
+      has been restored, it will restore the TPM PCRs and then
+      transfer control back to the kernel's S3 resume vector.
+      In order to preserve system integrity across S3, the kernel
+      provides tboot with a set of memory ranges (RAM and RESERVED_KERN
+      in the e820 table, but not any memory that BIOS might alter over
+      the S3 transition) that tboot will calculate a MAC (message
+      authentication code) over and then seal with the TPM. On resume
+      and once the measured environment has been re-established, tboot
+      will re-calculate the MAC and verify it against the sealed value.
+      Tboot's policy determines what happens if the verification fails.
+      Note that c/s 194 of tboot, which has the new MAC code, supports
+      this.
+
+That's pretty much it for TXT support.
+
+
+Configuring the System
+======================
+
+This code works with 32-bit, 32-bit PAE, and 64-bit (x86_64) kernels.
+
+In BIOS, the user must enable:  TPM, TXT, VT-x, VT-d.  Not all BIOSes
+allow these to be individually enabled/disabled and the screens in
+which to find them are BIOS-specific.
+
+grub.conf needs to be modified as follows::
+
+        title Linux 2.6.29-tip w/ tboot
+          root (hd0,0)
+                kernel /tboot.gz logging=serial,vga,memory
+                module /vmlinuz-2.6.29-tip intel_iommu=on ro
+                       root=LABEL=/ rhgb console=ttyS0,115200 3
+                module /initrd-2.6.29-tip.img
+                module /Q35_SINIT_17.BIN
+
+The kernel option for enabling Intel TXT support is found under the
+Security top-level menu and is called "Enable Intel(R) Trusted
+Execution Technology (TXT)".  It is considered EXPERIMENTAL and
+depends on the generic x86 support (to allow maximum flexibility in
+kernel build options), since the tboot code will detect whether the
+platform actually supports Intel TXT and thus whether any of the
+kernel code is executed.
+
+The Q35_SINIT_17.BIN file is what Intel TXT refers to as an
+Authenticated Code Module.  It is specific to the chipset in the
+system and can also be found on the Trusted Boot site.  It is an
+(unencrypted) module signed by Intel that is used as part of the
+DRTM process to verify and configure the system.  It is signed
+because it operates at a higher privilege level in the system than
+any other macrocode and its correct operation is critical to the
+establishment of the DRTM.  The process for determining the correct
+SINIT ACM for a system is documented in the SINIT-guide.txt file
+that is on the tboot SourceForge site under the SINIT ACM downloads.
diff --git a/MAINTAINERS b/MAINTAINERS
index 699596d931c1..f33487dabafd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8312,7 +8312,7 @@ L:	tboot-devel@lists.sourceforge.net
 W:	http://tboot.sourceforge.net
 T:	hg http://tboot.hg.sourceforge.net:8000/hgroot/tboot/tboot
 S:	Supported
-F:	Documentation/intel_txt.txt
+F:	Documentation/x86/intel_txt.rst
 F:	include/linux/tboot.h
 F:	arch/x86/kernel/tboot.c
 
diff --git a/security/Kconfig b/security/Kconfig
index 06a30851511a..0d65594b5196 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -121,7 +121,7 @@ config INTEL_TXT
 	  See <http://www.intel.com/technology/security/> for more information
 	  about Intel(R) TXT.
 	  See <http://tboot.sourceforge.net> for more information about tboot.
-	  See Documentation/intel_txt.txt for a description of how to enable
+	  See Documentation/x86/intel_txt.rst for a description of how to enable
 	  Intel TXT support in a kernel boot.
 
 	  If you are unsure as to whether this is required, answer N.
-- 
cgit v1.2.3-55-g7522


From 2dbc0838bcf24ca59cabc3130cf3b1d6809cdcd4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 11:39:21 -0300
Subject: docs: ocxl.rst: add it to the uAPI book

The content of this file is user-facing.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
---
 Documentation/accelerators/ocxl.rst               | 178 ----------------------
 Documentation/userspace-api/accelerators/ocxl.rst | 176 +++++++++++++++++++++
 Documentation/userspace-api/index.rst             |   1 +
 MAINTAINERS                                       |   2 +-
 4 files changed, 178 insertions(+), 179 deletions(-)
 delete mode 100644 Documentation/accelerators/ocxl.rst
 create mode 100644 Documentation/userspace-api/accelerators/ocxl.rst

diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
deleted file mode 100644
index b1cea19a90f5..000000000000
--- a/Documentation/accelerators/ocxl.rst
+++ /dev/null
@@ -1,178 +0,0 @@
-:orphan:
-
-========================================================
-OpenCAPI (Open Coherent Accelerator Processor Interface)
-========================================================
-
-OpenCAPI is an interface between processors and accelerators. It aims
-at being low-latency and high-bandwidth. The specification is
-developed by the `OpenCAPI Consortium <http://opencapi.org/>`_.
-
-It allows an accelerator (which could be a FPGA, ASICs, ...) to access
-the host memory coherently, using virtual addresses. An OpenCAPI
-device can also host its own memory, that can be accessed from the
-host.
-
-OpenCAPI is known in linux as 'ocxl', as the open, processor-agnostic
-evolution of 'cxl' (the driver for the IBM CAPI interface for
-powerpc), which was named that way to avoid confusion with the ISDN
-CAPI subsystem.
-
-
-High-level view
-===============
-
-OpenCAPI defines a Data Link Layer (DL) and Transaction Layer (TL), to
-be implemented on top of a physical link. Any processor or device
-implementing the DL and TL can start sharing memory.
-
-::
-
-  +-----------+                         +-------------+
-  |           |                         |             |
-  |           |                         | Accelerated |
-  | Processor |                         |  Function   |
-  |           |  +--------+             |    Unit     |  +--------+
-  |           |--| Memory |             |    (AFU)    |--| Memory |
-  |           |  +--------+             |             |  +--------+
-  +-----------+                         +-------------+
-       |                                       |
-  +-----------+                         +-------------+
-  |    TL     |                         |    TLX      |
-  +-----------+                         +-------------+
-       |                                       |
-  +-----------+                         +-------------+
-  |    DL     |                         |    DLX      |
-  +-----------+                         +-------------+
-       |                                       |
-       |                   PHY                 |
-       +---------------------------------------+
-
-
-
-Device discovery
-================
-
-OpenCAPI relies on a PCI-like configuration space, implemented on the
-device. So the host can discover AFUs by querying the config space.
-
-OpenCAPI devices in Linux are treated like PCI devices (with a few
-caveats). The firmware is expected to abstract the hardware as if it
-was a PCI link. A lot of the existing PCI infrastructure is reused:
-devices are scanned and BARs are assigned during the standard PCI
-enumeration. Commands like 'lspci' can therefore be used to see what
-devices are available.
-
-The configuration space defines the AFU(s) that can be found on the
-physical adapter, such as its name, how many memory contexts it can
-work with, the size of its MMIO areas, ...
-
-
-
-MMIO
-====
-
-OpenCAPI defines two MMIO areas for each AFU:
-
-* the global MMIO area, with registers pertinent to the whole AFU.
-* a per-process MMIO area, which has a fixed size for each context.
-
-
-
-AFU interrupts
-==============
-
-OpenCAPI includes the possibility for an AFU to send an interrupt to a
-host process. It is done through a 'intrp_req' defined in the
-Transaction Layer, specifying a 64-bit object handle which defines the
-interrupt.
-
-The driver allows a process to allocate an interrupt and obtain its
-64-bit object handle, that can be passed to the AFU.
-
-
-
-char devices
-============
-
-The driver creates one char device per AFU found on the physical
-device. A physical device may have multiple functions and each
-function can have multiple AFUs. At the time of this writing though,
-it has only been tested with devices exporting only one AFU.
-
-Char devices can be found in /dev/ocxl/ and are named as:
-/dev/ocxl/<AFU name>.<location>.<index>
-
-where <AFU name> is a max 20-character long name, as found in the
-config space of the AFU.
-<location> is added by the driver and can help distinguish devices
-when a system has more than one instance of the same OpenCAPI device.
-<index> is also to help distinguish AFUs in the unlikely case where a
-device carries multiple copies of the same AFU.
-
-
-
-Sysfs class
-===========
-
-An ocxl class is added for the devices representing the AFUs. See
-/sys/class/ocxl. The layout is described in
-Documentation/ABI/testing/sysfs-class-ocxl
-
-
-
-User API
-========
-
-open
-----
-
-Based on the AFU definition found in the config space, an AFU may
-support working with more than one memory context, in which case the
-associated char device may be opened multiple times by different
-processes.
-
-
-ioctl
------
-
-OCXL_IOCTL_ATTACH:
-
-  Attach the memory context of the calling process to the AFU so that
-  the AFU can access its memory.
-
-OCXL_IOCTL_IRQ_ALLOC:
-
-  Allocate an AFU interrupt and return an identifier.
-
-OCXL_IOCTL_IRQ_FREE:
-
-  Free a previously allocated AFU interrupt.
-
-OCXL_IOCTL_IRQ_SET_FD:
-
-  Associate an event fd to an AFU interrupt so that the user process
-  can be notified when the AFU sends an interrupt.
-
-OCXL_IOCTL_GET_METADATA:
-
-  Obtains configuration information from the card, such at the size of
-  MMIO areas, the AFU version, and the PASID for the current context.
-
-OCXL_IOCTL_ENABLE_P9_WAIT:
-
-  Allows the AFU to wake a userspace thread executing 'wait'. Returns
-  information to userspace to allow it to configure the AFU. Note that
-  this is only available on POWER9.
-
-OCXL_IOCTL_GET_FEATURES:
-
-  Reports on which CPU features that affect OpenCAPI are usable from
-  userspace.
-
-
-mmap
-----
-
-A process can mmap the per-process MMIO area for interactions with the
-AFU.
diff --git a/Documentation/userspace-api/accelerators/ocxl.rst b/Documentation/userspace-api/accelerators/ocxl.rst
new file mode 100644
index 000000000000..14cefc020e2d
--- /dev/null
+++ b/Documentation/userspace-api/accelerators/ocxl.rst
@@ -0,0 +1,176 @@
+========================================================
+OpenCAPI (Open Coherent Accelerator Processor Interface)
+========================================================
+
+OpenCAPI is an interface between processors and accelerators. It aims
+at being low-latency and high-bandwidth. The specification is
+developed by the `OpenCAPI Consortium <http://opencapi.org/>`_.
+
+It allows an accelerator (which could be an FPGA, an ASIC, ...) to access
+the host memory coherently, using virtual addresses. An OpenCAPI
+device can also host its own memory, which can be accessed from the
+host.
+
+OpenCAPI is known in Linux as 'ocxl', as the open, processor-agnostic
+evolution of 'cxl' (the driver for the IBM CAPI interface for
+powerpc), which was named that way to avoid confusion with the ISDN
+CAPI subsystem.
+
+
+High-level view
+===============
+
+OpenCAPI defines a Data Link Layer (DL) and Transaction Layer (TL), to
+be implemented on top of a physical link. Any processor or device
+implementing the DL and TL can start sharing memory.
+
+::
+
+  +-----------+                         +-------------+
+  |           |                         |             |
+  |           |                         | Accelerated |
+  | Processor |                         |  Function   |
+  |           |  +--------+             |    Unit     |  +--------+
+  |           |--| Memory |             |    (AFU)    |--| Memory |
+  |           |  +--------+             |             |  +--------+
+  +-----------+                         +-------------+
+       |                                       |
+  +-----------+                         +-------------+
+  |    TL     |                         |    TLX      |
+  +-----------+                         +-------------+
+       |                                       |
+  +-----------+                         +-------------+
+  |    DL     |                         |    DLX      |
+  +-----------+                         +-------------+
+       |                                       |
+       |                   PHY                 |
+       +---------------------------------------+
+
+
+
+Device discovery
+================
+
+OpenCAPI relies on a PCI-like configuration space, implemented on the
+device. So the host can discover AFUs by querying the config space.
+
+OpenCAPI devices in Linux are treated like PCI devices (with a few
+caveats). The firmware is expected to abstract the hardware as if it
+were a PCI link. A lot of the existing PCI infrastructure is reused:
+devices are scanned and BARs are assigned during the standard PCI
+enumeration. Commands like 'lspci' can therefore be used to see what
+devices are available.
+
+The configuration space describes the AFU(s) found on the physical
+adapter: the name of each AFU, how many memory contexts it can work
+with, the size of its MMIO areas, ...
+
+
+
+MMIO
+====
+
+OpenCAPI defines two MMIO areas for each AFU:
+
+* the global MMIO area, with registers pertinent to the whole AFU.
+* a per-process MMIO area, which has a fixed size for each context.
+
+
+
+AFU interrupts
+==============
+
+OpenCAPI includes the possibility for an AFU to send an interrupt to a
+host process. It is done through an 'intrp_req' defined in the
+Transaction Layer, specifying a 64-bit object handle which defines the
+interrupt.
+
+The driver allows a process to allocate an interrupt and obtain its
+64-bit object handle, which can be passed to the AFU.
+
+
+
+char devices
+============
+
+The driver creates one char device per AFU found on the physical
+device. A physical device may have multiple functions and each
+function can have multiple AFUs. At the time of this writing though,
+it has only been tested with devices exporting only one AFU.
+
+Char devices can be found in /dev/ocxl/ and are named as:
+/dev/ocxl/<AFU name>.<location>.<index>
+
+where <AFU name> is a name of at most 20 characters, as found in the
+config space of the AFU.
+<location> is added by the driver and can help distinguish devices
+when a system has more than one instance of the same OpenCAPI device.
+<index> is also to help distinguish AFUs in the unlikely case where a
+device carries multiple copies of the same AFU.
+
+
+
+Sysfs class
+===========
+
+An ocxl class is added for the devices representing the AFUs. See
+/sys/class/ocxl. The layout is described in
+Documentation/ABI/testing/sysfs-class-ocxl.
+
+
+
+User API
+========
+
+open
+----
+
+Based on the AFU definition found in the config space, an AFU may
+support working with more than one memory context, in which case the
+associated char device may be opened multiple times by different
+processes.
+
+
+ioctl
+-----
+
+OCXL_IOCTL_ATTACH:
+
+  Attach the memory context of the calling process to the AFU so that
+  the AFU can access its memory.
+
+OCXL_IOCTL_IRQ_ALLOC:
+
+  Allocate an AFU interrupt and return an identifier.
+
+OCXL_IOCTL_IRQ_FREE:
+
+  Free a previously allocated AFU interrupt.
+
+OCXL_IOCTL_IRQ_SET_FD:
+
+  Associate an event fd to an AFU interrupt so that the user process
+  can be notified when the AFU sends an interrupt.
+
+OCXL_IOCTL_GET_METADATA:
+
+  Obtains configuration information from the card, such as the size of
+  MMIO areas, the AFU version, and the PASID for the current context.
+
+OCXL_IOCTL_ENABLE_P9_WAIT:
+
+  Allows the AFU to wake a userspace thread executing 'wait'. Returns
+  information to userspace to allow it to configure the AFU. Note that
+  this is only available on POWER9.
+
+OCXL_IOCTL_GET_FEATURES:
+
+  Reports which CPU features that affect OpenCAPI are usable from
+  userspace.
+
+
+mmap
+----
+
+A process can mmap the per-process MMIO area for interactions with the
+AFU.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index a3233da7fa88..ad494da40009 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -20,6 +20,7 @@ place where this information is gathered.
    seccomp_filter
    unshare
    spec_ctrl
+   accelerators/ocxl
 
 .. only::  subproject and html
 
diff --git a/MAINTAINERS b/MAINTAINERS
index f33487dabafd..8f496d76bb53 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11535,7 +11535,7 @@ F:	arch/powerpc/include/asm/pnv-ocxl.h
 F:	drivers/misc/ocxl/
 F:	include/misc/ocxl*
 F:	include/uapi/misc/ocxl.h
-F:	Documentation/accelerators/ocxl.rst
+F:	Documentation/userspace-api/accelerators/ocxl.rst
 
 OMAP AUDIO SUPPORT
 M:	Peter Ujfalusi <peter.ujfalusi@ti.com>
-- 
cgit v1.2.3-55-g7522


From 56198359b64125dd0f9fa991972b61e4bc4fc6b5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 11:44:24 -0300
Subject: docs: lp855x-driver.rst: add it to the driver-api book

The content of this file is intended for backlight Kernel
developers.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/backlight/lp855x-driver.rst          | 83 ----------------------
 .../driver-api/backlight/lp855x-driver.rst         | 81 +++++++++++++++++++++
 Documentation/driver-api/index.rst                 |  1 +
 MAINTAINERS                                        |  2 +-
 4 files changed, 83 insertions(+), 84 deletions(-)
 delete mode 100644 Documentation/backlight/lp855x-driver.rst
 create mode 100644 Documentation/driver-api/backlight/lp855x-driver.rst

diff --git a/Documentation/backlight/lp855x-driver.rst b/Documentation/backlight/lp855x-driver.rst
deleted file mode 100644
index 62b7ed847a77..000000000000
--- a/Documentation/backlight/lp855x-driver.rst
+++ /dev/null
@@ -1,83 +0,0 @@
-:orphan:
-
-====================
-Kernel driver lp855x
-====================
-
-Backlight driver for LP855x ICs
-
-Supported chips:
-
-	Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8555, LP8556 and
-	LP8557
-
-Author: Milo(Woogyom) Kim <milo.kim@ti.com>
-
-Description
------------
-
-* Brightness control
-
-  Brightness can be controlled by the pwm input or the i2c command.
-  The lp855x driver supports both cases.
-
-* Device attributes
-
-  1) bl_ctl_mode
-
-  Backlight control mode.
-
-  Value: pwm based or register based
-
-  2) chip_id
-
-  The lp855x chip id.
-
-  Value: lp8550/lp8551/lp8552/lp8553/lp8555/lp8556/lp8557
-
-Platform data for lp855x
-------------------------
-
-For supporting platform specific data, the lp855x platform data can be used.
-
-* name:
-	Backlight driver name. If it is not defined, default name is set.
-* device_control:
-	Value of DEVICE CONTROL register.
-* initial_brightness:
-	Initial value of backlight brightness.
-* period_ns:
-	Platform specific PWM period value. unit is nano.
-	Only valid when brightness is pwm input mode.
-* size_program:
-	Total size of lp855x_rom_data.
-* rom_data:
-	List of new eeprom/eprom registers.
-
-Examples
-========
-
-1) lp8552 platform data: i2c register mode with new eeprom data::
-
-    #define EEPROM_A5_ADDR	0xA5
-    #define EEPROM_A5_VAL	0x4f	/* EN_VSYNC=0 */
-
-    static struct lp855x_rom_data lp8552_eeprom_arr[] = {
-	{EEPROM_A5_ADDR, EEPROM_A5_VAL},
-    };
-
-    static struct lp855x_platform_data lp8552_pdata = {
-	.name = "lcd-bl",
-	.device_control = I2C_CONFIG(LP8552),
-	.initial_brightness = INITIAL_BRT,
-	.size_program = ARRAY_SIZE(lp8552_eeprom_arr),
-	.rom_data = lp8552_eeprom_arr,
-    };
-
-2) lp8556 platform data: pwm input mode with default rom data::
-
-    static struct lp855x_platform_data lp8556_pdata = {
-	.device_control = PWM_CONFIG(LP8556),
-	.initial_brightness = INITIAL_BRT,
-	.period_ns = 1000000,
-    };
diff --git a/Documentation/driver-api/backlight/lp855x-driver.rst b/Documentation/driver-api/backlight/lp855x-driver.rst
new file mode 100644
index 000000000000..1e0b224fc397
--- /dev/null
+++ b/Documentation/driver-api/backlight/lp855x-driver.rst
@@ -0,0 +1,81 @@
+====================
+Kernel driver lp855x
+====================
+
+Backlight driver for LP855x ICs
+
+Supported chips:
+
+	Texas Instruments LP8550, LP8551, LP8552, LP8553, LP8555, LP8556 and
+	LP8557
+
+Author: Milo(Woogyom) Kim <milo.kim@ti.com>
+
+Description
+-----------
+
+* Brightness control
+
+  Brightness can be controlled by the pwm input or the i2c command.
+  The lp855x driver supports both cases.
+
+* Device attributes
+
+  1) bl_ctl_mode
+
+  Backlight control mode.
+
+  Value: pwm based or register based
+
+  2) chip_id
+
+  The lp855x chip id.
+
+  Value: lp8550/lp8551/lp8552/lp8553/lp8555/lp8556/lp8557
+
+Platform data for lp855x
+------------------------
+
+For supporting platform specific data, the lp855x platform data can be used.
+
+* name:
+	Backlight driver name. If it is not defined, default name is set.
+* device_control:
+	Value of DEVICE CONTROL register.
+* initial_brightness:
+	Initial value of backlight brightness.
+* period_ns:
+	Platform-specific PWM period value, in nanoseconds.
+	Only valid when brightness is in pwm input mode.
+* size_program:
+	Total size of lp855x_rom_data.
+* rom_data:
+	List of new eeprom/eprom registers.
+
+Examples
+========
+
+1) lp8552 platform data: i2c register mode with new eeprom data::
+
+    #define EEPROM_A5_ADDR	0xA5
+    #define EEPROM_A5_VAL	0x4f	/* EN_VSYNC=0 */
+
+    static struct lp855x_rom_data lp8552_eeprom_arr[] = {
+	{EEPROM_A5_ADDR, EEPROM_A5_VAL},
+    };
+
+    static struct lp855x_platform_data lp8552_pdata = {
+	.name = "lcd-bl",
+	.device_control = I2C_CONFIG(LP8552),
+	.initial_brightness = INITIAL_BRT,
+	.size_program = ARRAY_SIZE(lp8552_eeprom_arr),
+	.rom_data = lp8552_eeprom_arr,
+    };
+
+2) lp8556 platform data: pwm input mode with default rom data::
+
+    static struct lp855x_platform_data lp8556_pdata = {
+	.device_control = PWM_CONFIG(LP8556),
+	.initial_brightness = INITIAL_BRT,
+	.period_ns = 1000000,
+    };
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 0f281f4f648f..b4c993ff7655 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -66,6 +66,7 @@ available subsections can be seen below.
    soundwire/index
    fpga/index
    acpi/index
+   backlight/lp855x-driver.rst
    generic-counter
 
 .. only::  subproject and html
diff --git a/MAINTAINERS b/MAINTAINERS
index 8f496d76bb53..3feb318e1433 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15964,7 +15964,7 @@ F:	sound/soc/codecs/isabelle*
 TI LP855x BACKLIGHT DRIVER
 M:	Milo Kim <milo.kim@ti.com>
 S:	Maintained
-F:	Documentation/backlight/lp855x-driver.rst
+F:	Documentation/driver-api/backlight/lp855x-driver.rst
 F:	drivers/video/backlight/lp855x_bl.c
 F:	include/linux/platform_data/lp855x.h
 
-- 
cgit v1.2.3-55-g7522


From fe34c89d25429e079ba67416529514120dd715f8 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 12:34:59 -0300
Subject: docs: driver-model: move it to the driver-api book

The audience for the Kernel driver-model is clearly Kernel hackers.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> # ice driver changes
---
 Documentation/driver-api/driver-model/binding.rst  |  98 +++++
 Documentation/driver-api/driver-model/bus.rst      | 146 +++++++
 Documentation/driver-api/driver-model/class.rst    | 149 +++++++
 .../driver-api/driver-model/design-patterns.rst    | 116 ++++++
 Documentation/driver-api/driver-model/device.rst   | 109 +++++
 Documentation/driver-api/driver-model/devres.rst   | 414 +++++++++++++++++++
 Documentation/driver-api/driver-model/driver.rst   | 223 ++++++++++
 Documentation/driver-api/driver-model/index.rst    |  24 ++
 Documentation/driver-api/driver-model/overview.rst | 124 ++++++
 Documentation/driver-api/driver-model/platform.rst | 246 +++++++++++
 Documentation/driver-api/driver-model/porting.rst  | 448 +++++++++++++++++++++
 Documentation/driver-api/gpio/driver.rst           |   2 +-
 Documentation/driver-api/index.rst                 |   1 +
 Documentation/driver-model/binding.rst             |  98 -----
 Documentation/driver-model/bus.rst                 | 146 -------
 Documentation/driver-model/class.rst               | 149 -------
 Documentation/driver-model/design-patterns.rst     | 116 ------
 Documentation/driver-model/device.rst              | 109 -----
 Documentation/driver-model/devres.rst              | 414 -------------------
 Documentation/driver-model/driver.rst              | 223 ----------
 Documentation/driver-model/index.rst               |  26 --
 Documentation/driver-model/overview.rst            | 124 ------
 Documentation/driver-model/platform.rst            | 246 -----------
 Documentation/driver-model/porting.rst             | 448 ---------------------
 Documentation/eisa.txt                             |   4 +-
 Documentation/filesystems/sysfs.txt                |   2 +-
 Documentation/hwmon/submitting-patches.rst         |   2 +-
 .../translations/zh_CN/filesystems/sysfs.txt       |   2 +-
 drivers/base/platform.c                            |   2 +-
 drivers/gpio/gpio-cs5535.c                         |   2 +-
 drivers/net/ethernet/intel/ice/ice_main.c          |   2 +-
 drivers/staging/unisys/Documentation/overview.txt  |   4 +-
 include/linux/device.h                             |   2 +-
 include/linux/platform_device.h                    |   2 +-
 scripts/coccinelle/free/devm_free.cocci            |   2 +-
 35 files changed, 2112 insertions(+), 2113 deletions(-)
 create mode 100644 Documentation/driver-api/driver-model/binding.rst
 create mode 100644 Documentation/driver-api/driver-model/bus.rst
 create mode 100644 Documentation/driver-api/driver-model/class.rst
 create mode 100644 Documentation/driver-api/driver-model/design-patterns.rst
 create mode 100644 Documentation/driver-api/driver-model/device.rst
 create mode 100644 Documentation/driver-api/driver-model/devres.rst
 create mode 100644 Documentation/driver-api/driver-model/driver.rst
 create mode 100644 Documentation/driver-api/driver-model/index.rst
 create mode 100644 Documentation/driver-api/driver-model/overview.rst
 create mode 100644 Documentation/driver-api/driver-model/platform.rst
 create mode 100644 Documentation/driver-api/driver-model/porting.rst
 delete mode 100644 Documentation/driver-model/binding.rst
 delete mode 100644 Documentation/driver-model/bus.rst
 delete mode 100644 Documentation/driver-model/class.rst
 delete mode 100644 Documentation/driver-model/design-patterns.rst
 delete mode 100644 Documentation/driver-model/device.rst
 delete mode 100644 Documentation/driver-model/devres.rst
 delete mode 100644 Documentation/driver-model/driver.rst
 delete mode 100644 Documentation/driver-model/index.rst
 delete mode 100644 Documentation/driver-model/overview.rst
 delete mode 100644 Documentation/driver-model/platform.rst
 delete mode 100644 Documentation/driver-model/porting.rst

diff --git a/Documentation/driver-api/driver-model/binding.rst b/Documentation/driver-api/driver-model/binding.rst
new file mode 100644
index 000000000000..7ea1d7a41e1d
--- /dev/null
+++ b/Documentation/driver-api/driver-model/binding.rst
@@ -0,0 +1,98 @@
+==============
+Driver Binding
+==============
+
+Driver binding is the process of associating a device with a device
+driver that can control it. Bus drivers have typically handled this
+because there have been bus-specific structures to represent the
+devices and the drivers. With generic device and device driver
+structures, most of the binding can take place using common code.
+
+
+Bus
+~~~
+
+The bus type structure contains a list of all devices that are on that bus
+type in the system. When device_register is called for a device, it is
+inserted into the end of this list. The bus object also contains a
+list of all drivers of that bus type. When driver_register is called
+for a driver, it is inserted at the end of this list. These are the
+two events which trigger driver binding.
+
+
+device_register
+~~~~~~~~~~~~~~~
+
+When a new device is added, the bus's list of drivers is iterated over
+to find one that supports it. In order to determine that, the device
+ID of the device must match one of the device IDs that the driver
+supports. The format and semantics for comparing IDs are bus-specific.
+Instead of trying to derive a complex state machine and matching
+algorithm, it is up to the bus driver to provide a callback to compare
+a device against the IDs of a driver. The bus returns 1 if a match was
+found; 0 otherwise::
+
+  int match(struct device * dev, struct device_driver * drv);
+
+If a match is found, the device's driver field is set to the driver
+and the driver's probe callback is called. This gives the driver a
+chance to verify that it really does support the hardware, and that
+it's in a working state.
+
+Device Class
+~~~~~~~~~~~~
+
+Upon the successful completion of probe, the device is registered with
+the class to which it belongs. Device drivers belong to one and only one
+class, and that is set in the driver's devclass field.
+devclass_add_device is called to enumerate the device within the class
+and actually register it with the class, which happens with the
+class's register_dev callback.
+
+
+Driver
+~~~~~~
+
+When a driver is attached to a device, the device is inserted into the
+driver's list of devices.
+
+
+sysfs
+~~~~~
+
+A symlink is created in the bus's 'devices' directory that points to
+the device's directory in the physical hierarchy.
+
+A symlink is created in the driver's 'devices' directory that points
+to the device's directory in the physical hierarchy.
+
+A directory for the device is created in the class's directory. A
+symlink is created in that directory that points to the device's
+physical location in the sysfs tree.
+
+A symlink can be created (though this isn't done yet) in the device's
+physical directory to either its class directory, or the class's
+top-level directory. One can also be created to point to its driver's
+directory.
+
+
+driver_register
+~~~~~~~~~~~~~~~
+
+The process is almost identical when a new driver is added.
+The bus's list of devices is iterated over to find a match. Devices
+that already have a driver are skipped. All the devices are iterated
+over, to bind as many devices as possible to the driver.
+
+
+Removal
+~~~~~~~
+
+When a device is removed, the reference count for it will eventually
+go to 0. When it does, the remove callback of the driver is called. It
+is removed from the driver's list of devices and the reference count
+of the driver is decremented. All symlinks between the two are removed.
+
+When a driver is removed, the list of devices that it supports is
+iterated over, and the driver's remove callback is called for each
+one. The device is removed from that list and the symlinks removed.
diff --git a/Documentation/driver-api/driver-model/bus.rst b/Documentation/driver-api/driver-model/bus.rst
new file mode 100644
index 000000000000..016b15a6e8ea
--- /dev/null
+++ b/Documentation/driver-api/driver-model/bus.rst
@@ -0,0 +1,146 @@
+=========
+Bus Types
+=========
+
+Definition
+~~~~~~~~~~
+See the kerneldoc for the struct bus_type.
+
+int bus_register(struct bus_type * bus);
+
+
+Declaration
+~~~~~~~~~~~
+
+Each bus type in the kernel (PCI, USB, etc) should declare one static
+object of this type. They must initialize the name field, and may
+optionally initialize the match callback::
+
+   struct bus_type pci_bus_type = {
+          .name	= "pci",
+          .match	= pci_bus_match,
+   };
+
+The structure should be exported to drivers in a header file::
+
+  extern struct bus_type pci_bus_type;
+
+
+Registration
+~~~~~~~~~~~~
+
+When a bus driver is initialized, it calls bus_register. This
+initializes the rest of the fields in the bus object and inserts it
+into a global list of bus types. Once the bus object is registered,
+the fields in it are usable by the bus driver.
+
+
+Callbacks
+~~~~~~~~~
+
+match(): Attaching Drivers to Devices
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The format of device ID structures and the semantics for comparing
+them are inherently bus-specific. Drivers typically declare an array
+of device IDs of devices they support that reside in a bus-specific
+driver structure.
+
+The purpose of the match callback is to give the bus an opportunity to
+determine if a particular driver supports a particular device by
+comparing the device IDs the driver supports with the device ID of a
+particular device, without sacrificing bus-specific functionality or
+type-safety.
+
+When a driver is registered with the bus, the bus's list of devices is
+iterated over, and the match callback is called for each device that
+does not have a driver associated with it.
+
+
+
+Device and Driver Lists
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The lists of devices and drivers are intended to replace the local
+lists that many buses keep. They are lists of struct devices and
+struct device_drivers, respectively. Bus drivers are free to use the
+lists as they please, but conversion to the bus-specific type may be
+necessary.
+
+The LDM core provides helper functions for iterating over each list::
+
+  int bus_for_each_dev(struct bus_type * bus, struct device * start,
+		       void * data,
+		       int (*fn)(struct device *, void *));
+
+  int bus_for_each_drv(struct bus_type * bus, struct device_driver * start,
+		       void * data, int (*fn)(struct device_driver *, void *));
+
+These helpers iterate over the respective list, and call the callback
+for each device or driver in the list. All list accesses are
+synchronized by taking the bus's lock (read currently). The reference
+count on each object in the list is incremented before the callback is
+called; it is decremented after the next object has been obtained. The
+lock is not held when calling the callback.
+
+
+sysfs
+~~~~~
+There is a top-level directory named 'bus'.
+
+Each bus gets a directory in the bus directory, along with two default
+directories::
+
+	/sys/bus/pci/
+	|-- devices
+	`-- drivers
+
+Drivers registered with the bus get a directory in the bus's drivers
+directory::
+
+	/sys/bus/pci/
+	|-- devices
+	`-- drivers
+	    |-- Intel ICH
+	    |-- Intel ICH Joystick
+	    |-- agpgart
+	    `-- e100
+
+Each device that is discovered on a bus of that type gets a symlink in
+the bus's devices directory to the device's directory in the physical
+hierarchy::
+
+	/sys/bus/pci/
+	|-- devices
+	|   |-- 00:00.0 -> ../../../root/pci0/00:00.0
+	|   |-- 00:01.0 -> ../../../root/pci0/00:01.0
+	|   `-- 00:02.0 -> ../../../root/pci0/00:02.0
+	`-- drivers
+
+
+Exporting Attributes
+~~~~~~~~~~~~~~~~~~~~
+
+::
+
+  struct bus_attribute {
+	struct attribute	attr;
+	ssize_t (*show)(struct bus_type *, char * buf);
+	ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
+  };
+
+Bus drivers can export attributes using the BUS_ATTR_RW macro that works
+similarly to the DEVICE_ATTR_RW macro for devices. For example, a
+definition like this::
+
+	static BUS_ATTR_RW(debug);
+
+is equivalent to declaring::
+
+	static struct bus_attribute bus_attr_debug;
+
+This can then be used to add and remove the attribute from the bus's
+sysfs directory using::
+
+	int bus_create_file(struct bus_type *, struct bus_attribute *);
+	void bus_remove_file(struct bus_type *, struct bus_attribute *);
diff --git a/Documentation/driver-api/driver-model/class.rst b/Documentation/driver-api/driver-model/class.rst
new file mode 100644
index 000000000000..fff55b80e86a
--- /dev/null
+++ b/Documentation/driver-api/driver-model/class.rst
@@ -0,0 +1,149 @@
+==============
+Device Classes
+==============
+
+Introduction
+~~~~~~~~~~~~
+A device class describes a type of device, like an audio or network
+device. The following device classes have been identified:
+
+<Insert List of Device Classes Here>
+
+
+Each device class defines a set of semantics and a programming interface
+that devices of that class adhere to. Device drivers are the
+implementation of that programming interface for a particular device on
+a particular bus.
+
+Device classes are agnostic with respect to what bus a device resides
+on.
+
+
+Programming Interface
+~~~~~~~~~~~~~~~~~~~~~
+The device class structure looks like::
+
+
+  typedef int (*devclass_add)(struct device *);
+  typedef void (*devclass_remove)(struct device *);
+
+See the kerneldoc for the struct class.
+
+A typical device class definition would look like::
+
+  struct device_class input_devclass = {
+        .name		= "input",
+        .add_device	= input_add_device,
+        .remove_device	= input_remove_device,
+  };
+
+Each device class structure should be exported in a header file so it
+can be used by drivers, extensions and interfaces.
+
+Device classes are registered and unregistered with the core using::
+
+  int devclass_register(struct device_class * cls);
+  void devclass_unregister(struct device_class * cls);
+
+
+Devices
+~~~~~~~
+As devices are bound to drivers, they are added to the device class
+that the driver belongs to. Before the driver model core, this would
+typically happen during the driver's probe() callback, once the device
+has been initialized. It now happens after the probe() callback
+finishes from the core.
+
+The device is enumerated in the class. Each time a device is added to
+the class, the class's devnum field is incremented and assigned to the
+device. The field is never decremented, so if the device is removed
+from the class and re-added, it will receive a different enumerated
+value.
+
+The class is allowed to create a class-specific structure for the
+device and store it in the device's class_data pointer.
+
+There is no list of devices in the device class. Each driver has a
+list of devices that it supports. The device class has a list of
+drivers of that particular class. To access all of the devices in the
+class, iterate over the device lists of each driver in the class.
+
+
+Device Drivers
+~~~~~~~~~~~~~~
+Device drivers are added to device classes when they are registered
+with the core. A driver specifies the class it belongs to by setting
+the struct device_driver::devclass field.
+
+
+sysfs directory structure
+~~~~~~~~~~~~~~~~~~~~~~~~~
+There is a top-level sysfs directory named 'class'.
+
+Each class gets a directory in the class directory, along with two
+default subdirectories::
+
+        class/
+        `-- input
+            |-- devices
+            `-- drivers
+
+
+Drivers registered with the class get a symlink in the drivers/ directory
+that points to the driver's directory (under its bus directory)::
+
+   class/
+   `-- input
+       |-- devices
+       `-- drivers
+           `-- usb:usb_mouse -> ../../../bus/drivers/usb_mouse/
+
+
+Each device gets a symlink in the devices/ directory that points to the
+device's directory in the physical hierarchy::
+
+   class/
+   `-- input
+       |-- devices
+       |   `-- 1 -> ../../../root/pci0/00:1f.0/usb_bus/00:1f.2-1:0/
+       `-- drivers
+
+
+Exporting Attributes
+~~~~~~~~~~~~~~~~~~~~
+
+::
+
+  struct devclass_attribute {
+        struct attribute        attr;
+        ssize_t (*show)(struct device_class *, char * buf, size_t count, loff_t off);
+        ssize_t (*store)(struct device_class *, const char * buf, size_t count, loff_t off);
+  };
+
+Class drivers can export attributes using the DEVCLASS_ATTR macro that works
+similarly to the DEVICE_ATTR macro for devices. For example, a definition
+like this::
+
+  static DEVCLASS_ATTR(debug,0644,show_debug,store_debug);
+
+is equivalent to declaring::
+
+  static struct devclass_attribute devclass_attr_debug;
+
+The bus driver can add and remove the attribute from the class's
+sysfs directory using::
+
+  int devclass_create_file(struct device_class *, struct devclass_attribute *);
+  void devclass_remove_file(struct device_class *, struct devclass_attribute *);
+
+In the example above, the file will be named 'debug' and placed in the
+class's directory in sysfs.
+
+
+Interfaces
+~~~~~~~~~~
+There may exist multiple mechanisms for accessing the same device of a
+particular class type. Device interfaces describe these mechanisms.
+
+When a device is added to a device class, the core attempts to add it
+to every interface that is registered with the device class.
diff --git a/Documentation/driver-api/driver-model/design-patterns.rst b/Documentation/driver-api/driver-model/design-patterns.rst
new file mode 100644
index 000000000000..41eb8f41f7dd
--- /dev/null
+++ b/Documentation/driver-api/driver-model/design-patterns.rst
@@ -0,0 +1,116 @@
+=============================
+Device Driver Design Patterns
+=============================
+
+This document describes a few common design patterns found in device drivers.
+It is likely that subsystem maintainers will ask driver developers to
+conform to these design patterns.
+
+1. State Container
+2. container_of()
+
+
+1. State Container
+~~~~~~~~~~~~~~~~~~
+
+While the kernel contains a few device drivers that assume that they will
+only be probed() once on a certain system (singletons), it is customary to assume
+that the device the driver binds to will appear in several instances. This
+means that the probe() function and all callbacks need to be reentrant.
+
+The most common way to achieve this is to use the state container design
+pattern. It usually has this form::
+
+  struct foo {
+      spinlock_t lock; /* Example member */
+      (...)
+  };
+
+  static int foo_probe(...)
+  {
+      struct foo *foo;
+
+      foo = devm_kzalloc(dev, sizeof(*foo), GFP_KERNEL);
+      if (!foo)
+          return -ENOMEM;
+      spin_lock_init(&foo->lock);
+      (...)
+  }
+
+This will create an instance of struct foo in memory every time probe() is
+called. This is our state container for this instance of the device driver.
+Of course it is then necessary to always pass this instance of the
+state around to all functions that need access to the state and its members.
+
+For example, if the driver is registering an interrupt handler, you would
+pass around a pointer to struct foo like this::
+
+  static irqreturn_t foo_handler(int irq, void *arg)
+  {
+      struct foo *foo = arg;
+      (...)
+  }
+
+  static int foo_probe(...)
+  {
+      struct foo *foo;
+
+      (...)
+      ret = request_irq(irq, foo_handler, 0, "foo", foo);
+  }
+
+This way you always get a pointer back to the correct instance of foo in
+your interrupt handler.
+
+
+2. container_of()
+~~~~~~~~~~~~~~~~~
+
+Continuing the above example, we add offloaded work::
+
+  struct foo {
+      spinlock_t lock;
+      struct workqueue_struct *wq;
+      struct work_struct offload;
+      (...)
+  };
+
+  static void foo_work(struct work_struct *work)
+  {
+      struct foo *foo = container_of(work, struct foo, offload);
+
+      (...)
+  }
+
+  static irqreturn_t foo_handler(int irq, void *arg)
+  {
+      struct foo *foo = arg;
+
+      queue_work(foo->wq, &foo->offload);
+      (...)
+  }
+
+  static int foo_probe(...)
+  {
+      struct foo *foo;
+      (...) /* allocate foo as in the State Container example */
+      foo->wq = create_singlethread_workqueue("foo-wq");
+      INIT_WORK(&foo->offload, foo_work);
+      (...)
+  }
+
+The design pattern is the same for an hrtimer or something similar that will
+return a single argument which is a pointer to a struct member in the
+callback.
+
+container_of() is a macro defined in <linux/kernel.h>.
+
+container_of() obtains a pointer to the containing struct from a pointer
+to one of its members by subtracting the member's offset, computed with the
+offsetof() macro from standard C. This allows something similar to
+object-oriented behaviour. Notice that the contained member must not be a
+pointer, but an actual member, for this to work.
+
+We can see here that we avoid having global pointers to our struct foo *
+instance this way, while still keeping the number of parameters passed to the
+work function to a single pointer.
diff --git a/Documentation/driver-api/driver-model/device.rst b/Documentation/driver-api/driver-model/device.rst
new file mode 100644
index 000000000000..2b868d49d349
--- /dev/null
+++ b/Documentation/driver-api/driver-model/device.rst
@@ -0,0 +1,109 @@
+==========================
+The Basic Device Structure
+==========================
+
+See the kerneldoc for the struct device.
+
+
+Programming Interface
+~~~~~~~~~~~~~~~~~~~~~
+The bus driver that discovers the device uses this to register the
+device with the core::
+
+  int device_register(struct device * dev);
+
+The bus should initialize the following fields:
+
+    - parent
+    - name
+    - bus_id
+    - bus
+
+A device is removed from the core when its reference count goes to
+0. The reference count can be adjusted using::
+
+  struct device * get_device(struct device * dev);
+  void put_device(struct device * dev);
+
+get_device() will return a pointer to the struct device passed to it
+if the reference count is not already 0 (that is, if the device is not
+already in the process of being removed).
+
+A driver can access the lock in the device structure using::
+
+  void lock_device(struct device * dev);
+  void unlock_device(struct device * dev);
+
+
+Attributes
+~~~~~~~~~~
+
+::
+
+  struct device_attribute {
+	struct attribute	attr;
+	ssize_t (*show)(struct device *dev, struct device_attribute *attr,
+			char *buf);
+	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
+			 const char *buf, size_t count);
+  };
+
+Attributes of devices can be exported by a device driver through sysfs.
+
+Please see Documentation/filesystems/sysfs.txt for more information
+on how sysfs works.
+
+As explained in Documentation/kobject.txt, device attributes must be
+created before the KOBJ_ADD uevent is generated. The only way to realize
+that is by defining an attribute group.
+
+Attributes are declared using a macro called DEVICE_ATTR::
+
+  #define DEVICE_ATTR(name,mode,show,store)
+
+Example::
+
+  static DEVICE_ATTR(type, 0444, show_type, NULL);
+  static DEVICE_ATTR(power, 0644, show_power, store_power);
+
+This declares two structures of type struct device_attribute with respective
+names 'dev_attr_type' and 'dev_attr_power'. These two attributes can be
+organized as follows into a group::
+
+  static struct attribute *dev_attrs[] = {
+	&dev_attr_type.attr,
+	&dev_attr_power.attr,
+	NULL,
+  };
+
+  static struct attribute_group dev_attr_group = {
+	.attrs = dev_attrs,
+  };
+
+  static const struct attribute_group *dev_attr_groups[] = {
+	&dev_attr_group,
+	NULL,
+  };
+
+This array of groups can then be associated with a device by setting the
+group pointer in struct device before device_register() is invoked::
+
+        dev->groups = dev_attr_groups;
+        device_register(dev);
+
+The device_register() function will use the 'groups' pointer to create the
+device attributes and the device_unregister() function will use this pointer
+to remove the device attributes.
+
+Word of warning:  While the kernel allows device_create_file() and
+device_remove_file() to be called on a device at any time, userspace has
+strict expectations on when attributes get created.  When a new device is
+registered in the kernel, a uevent is generated to notify userspace (like
+udev) that a new device is available.  If attributes are added after the
+device is registered, then userspace won't get notified and userspace will
+not know about the new attributes.
+
+This is important for device drivers that need to publish additional
+attributes for a device at driver probe time.  If the device driver simply
+calls device_create_file() on the device structure passed to it, then
+userspace will never be notified of the new attributes.
diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst
new file mode 100644
index 000000000000..4ac99122b5f1
--- /dev/null
+++ b/Documentation/driver-api/driver-model/devres.rst
@@ -0,0 +1,414 @@
+================================
+Devres - Managed Device Resource
+================================
+
+Tejun Heo	<teheo@suse.de>
+
+First draft	10 January 2007
+
+.. contents
+
+   1. Intro			: Huh? Devres?
+   2. Devres			: Devres in a nutshell
+   3. Devres Group		: Group devres'es and release them together
+   4. Details			: Life time rules, calling context, ...
+   5. Overhead			: How much do we have to pay for this?
+   6. List of managed interfaces: Currently implemented managed interfaces
+
+
+1. Intro
+--------
+
+devres came up while trying to convert libata to use iomap.  Each
+iomapped address should be kept and unmapped on driver detach.  For
+example, a plain SFF ATA controller (that is, good old PCI IDE) in
+native mode makes use of 5 PCI BARs and all of them should be
+maintained.
+
+As with many other device drivers, libata low level drivers have
+plenty of bugs in ->remove and ->probe failure paths.  Well, yes,
+that's probably because libata low level driver developers are a lazy
+bunch, but aren't all low level driver developers?  After spending a
+day fiddling with braindamaged hardware with no documentation or
+braindamaged documentation, if it's finally working, well, it's working.
+
+For one reason or another, low level drivers don't receive as much
+attention or testing as core code, and bugs on driver detach or
+initialization failure don't happen often enough to be noticeable.
+The init failure path is worse because it's much less travelled yet
+needs to handle multiple entry points.
+
+So, many low level drivers end up leaking resources on driver detach
+and having a half-broken failure path implementation in ->probe(), which
+would leak resources or even cause an oops when a failure occurs.  iomap
+adds more to this mix.  So do msi and msix.
+
+
+2. Devres
+---------
+
+devres is basically a linked list of arbitrarily sized memory areas
+associated with a struct device.  Each devres entry is associated with
+a release function.  A devres can be released in several ways.  No
+matter what, all devres entries are released on driver detach.  On
+release, the associated release function is invoked and then the
+devres entry is freed.
+
+Managed interfaces are created for resources commonly used by device
+drivers using devres.  For example, coherent DMA memory is acquired
+using dma_alloc_coherent().  The managed version is called
+dmam_alloc_coherent().  It is identical to dma_alloc_coherent() except
+that the DMA memory allocated using it is managed and will be
+automatically released on driver detach.  The implementation looks like
+the following::
+
+  struct dma_devres {
+	size_t		size;
+	void		*vaddr;
+	dma_addr_t	dma_handle;
+  };
+
+  static void dmam_coherent_release(struct device *dev, void *res)
+  {
+	struct dma_devres *this = res;
+
+	dma_free_coherent(dev, this->size, this->vaddr, this->dma_handle);
+  }
+
+  dmam_alloc_coherent(dev, size, dma_handle, gfp)
+  {
+	struct dma_devres *dr;
+	void *vaddr;
+
+	dr = devres_alloc(dmam_coherent_release, sizeof(*dr), gfp);
+	...
+
+	/* alloc DMA memory as usual */
+	vaddr = dma_alloc_coherent(...);
+	...
+
+	/* record size, vaddr, dma_handle in dr */
+	dr->vaddr = vaddr;
+	...
+
+	devres_add(dev, dr);
+
+	return vaddr;
+  }
+
+If a driver uses dmam_alloc_coherent(), the area is guaranteed to be
+freed whether initialization fails half-way or the device gets
+detached.  If most resources are acquired using managed interfaces, a
+driver can have much simpler init and exit code.  The init path basically
+looks like the following::
+
+  my_init_one()
+  {
+	struct mydev *d;
+
+	d = devm_kzalloc(dev, sizeof(*d), GFP_KERNEL);
+	if (!d)
+		return -ENOMEM;
+
+	d->ring = dmam_alloc_coherent(...);
+	if (!d->ring)
+		return -ENOMEM;
+
+	if (check something)
+		return -EINVAL;
+	...
+
+	return register_to_upper_layer(d);
+  }
+
+And exit path::
+
+  my_remove_one()
+  {
+	unregister_from_upper_layer(d);
+	shutdown_my_hardware();
+  }
+
+As shown above, low level drivers can be simplified a lot by using
+devres.  Complexity is shifted from less maintained low level drivers
+to the better maintained higher layer.  Also, as the init failure path is
+shared with the exit path, both can get more testing.
+
+Note though that when converting current calls or assignments to
+managed devm_* versions it is up to you to check whether internal operations,
+like allocating memory, have failed. Managed resources pertain to the
+freeing of these resources *only* - all other checks needed are still
+on you. In some cases this may mean introducing checks that were not
+necessary before moving to the managed devm_* calls.
+
+
+3. Devres group
+---------------
+
+Devres entries can be grouped using a devres group.  When a group is
+released, all contained normal devres entries and properly nested
+groups are released.  One usage is to roll back a series of acquired
+resources on failure.  For example::
+
+  if (!devres_open_group(dev, NULL, GFP_KERNEL))
+	return -ENOMEM;
+
+  acquire A;
+  if (failed)
+	goto err;
+
+  acquire B;
+  if (failed)
+	goto err;
+  ...
+
+  devres_remove_group(dev, NULL);
+  return 0;
+
+ err:
+  devres_release_group(dev, NULL);
+  return err_code;
+
+As resource acquisition failure usually means probe failure, constructs
+like the above are usually useful in a midlayer driver (e.g. libata core
+layer) where an interface function shouldn't have side effects on failure.
+For LLDs, just returning an error code suffices in most cases.
+
+Each group is identified by `void *id`.  It can either be explicitly
+specified by @id argument to devres_open_group() or automatically
+created by passing NULL as @id as in the above example.  In both
+cases, devres_open_group() returns the group's id.  The returned id
+can be passed to other devres functions to select the target group.
+If NULL is given to those functions, the latest open group is
+selected.
+
+For example, you can do something like the following::
+
+  int my_midlayer_create_something()
+  {
+	if (!devres_open_group(dev, my_midlayer_create_something, GFP_KERNEL))
+		return -ENOMEM;
+
+	...
+
+	devres_close_group(dev, my_midlayer_create_something);
+	return 0;
+  }
+
+  void my_midlayer_destroy_something()
+  {
+	devres_release_group(dev, my_midlayer_create_something);
+  }
+
+
+4. Details
+----------
+
+Lifetime of a devres entry begins on devres allocation and finishes
+when it is released or destroyed (removed and freed) - no reference
+counting.
+
+devres core guarantees atomicity to all basic devres operations and
+has support for single-instance devres types (atomic
+lookup-and-add-if-not-found).  Other than that, synchronizing
+concurrent accesses to allocated devres data is the caller's
+responsibility.  This is usually a non-issue because bus ops and
+resource allocations already do the job.
+
+For an example of single-instance devres type, read pcim_iomap_table()
+in lib/devres.c.
+
+All devres interface functions can be called without context if the
+right gfp mask is given.
+
+
+5. Overhead
+-----------
+
+Each devres bookkeeping info is allocated together with the requested data
+area.  With debug option turned off, bookkeeping info occupies 16
+bytes on 32bit machines and 24 bytes on 64bit (three pointers rounded
+up to ull alignment).  If a singly linked list is used, it can be
+reduced to two pointers (8 bytes on 32bit, 16 bytes on 64bit).
+
+Each devres group occupies 8 pointers.  It can be reduced to 6 if
+a singly linked list is used.
+
+Memory space overhead on an ahci controller with two ports is between 300
+and 400 bytes on a 32bit machine after naive conversion (we can
+certainly invest a bit more effort into libata core layer).
+
+
+6. List of managed interfaces
+-----------------------------
+
+CLOCK
+  devm_clk_get()
+  devm_clk_get_optional()
+  devm_clk_put()
+  devm_clk_hw_register()
+  devm_of_clk_add_hw_provider()
+  devm_clk_hw_register_clkdev()
+
+DMA
+  dmaenginem_async_device_register()
+  dmam_alloc_coherent()
+  dmam_alloc_attrs()
+  dmam_free_coherent()
+  dmam_pool_create()
+  dmam_pool_destroy()
+
+DRM
+  devm_drm_dev_init()
+
+GPIO
+  devm_gpiod_get()
+  devm_gpiod_get_index()
+  devm_gpiod_get_index_optional()
+  devm_gpiod_get_optional()
+  devm_gpiod_put()
+  devm_gpiod_unhinge()
+  devm_gpiochip_add_data()
+  devm_gpio_request()
+  devm_gpio_request_one()
+  devm_gpio_free()
+
+I2C
+  devm_i2c_new_dummy_device()
+
+IIO
+  devm_iio_device_alloc()
+  devm_iio_device_free()
+  devm_iio_device_register()
+  devm_iio_device_unregister()
+  devm_iio_kfifo_allocate()
+  devm_iio_kfifo_free()
+  devm_iio_triggered_buffer_setup()
+  devm_iio_triggered_buffer_cleanup()
+  devm_iio_trigger_alloc()
+  devm_iio_trigger_free()
+  devm_iio_trigger_register()
+  devm_iio_trigger_unregister()
+  devm_iio_channel_get()
+  devm_iio_channel_release()
+  devm_iio_channel_get_all()
+  devm_iio_channel_release_all()
+
+INPUT
+  devm_input_allocate_device()
+
+IO region
+  devm_release_mem_region()
+  devm_release_region()
+  devm_release_resource()
+  devm_request_mem_region()
+  devm_request_region()
+  devm_request_resource()
+
+IOMAP
+  devm_ioport_map()
+  devm_ioport_unmap()
+  devm_ioremap()
+  devm_ioremap_nocache()
+  devm_ioremap_wc()
+  devm_ioremap_resource() : checks resource, requests memory region, ioremaps
+  devm_iounmap()
+  pcim_iomap()
+  pcim_iomap_regions()	: do request_region() and iomap() on multiple BARs
+  pcim_iomap_table()	: array of mapped addresses indexed by BAR
+  pcim_iounmap()
+
+IRQ
+  devm_free_irq()
+  devm_request_any_context_irq()
+  devm_request_irq()
+  devm_request_threaded_irq()
+  devm_irq_alloc_descs()
+  devm_irq_alloc_desc()
+  devm_irq_alloc_desc_at()
+  devm_irq_alloc_desc_from()
+  devm_irq_alloc_descs_from()
+  devm_irq_alloc_generic_chip()
+  devm_irq_setup_generic_chip()
+  devm_irq_sim_init()
+
+LED
+  devm_led_classdev_register()
+  devm_led_classdev_unregister()
+
+MDIO
+  devm_mdiobus_alloc()
+  devm_mdiobus_alloc_size()
+  devm_mdiobus_free()
+
+MEM
+  devm_free_pages()
+  devm_get_free_pages()
+  devm_kasprintf()
+  devm_kcalloc()
+  devm_kfree()
+  devm_kmalloc()
+  devm_kmalloc_array()
+  devm_kmemdup()
+  devm_kstrdup()
+  devm_kvasprintf()
+  devm_kzalloc()
+
+MFD
+  devm_mfd_add_devices()
+
+MUX
+  devm_mux_chip_alloc()
+  devm_mux_chip_register()
+  devm_mux_control_get()
+
+PER-CPU MEM
+  devm_alloc_percpu()
+  devm_free_percpu()
+
+PCI
+  devm_pci_alloc_host_bridge()  : managed PCI host bridge allocation
+  devm_pci_remap_cfgspace()	: ioremap PCI configuration space
+  devm_pci_remap_cfg_resource()	: ioremap PCI configuration space resource
+  pcim_enable_device()		: after success, all PCI ops become managed
+  pcim_pin_device()		: keep PCI device enabled after release
+
+PHY
+  devm_usb_get_phy()
+  devm_usb_put_phy()
+
+PINCTRL
+  devm_pinctrl_get()
+  devm_pinctrl_put()
+  devm_pinctrl_register()
+  devm_pinctrl_unregister()
+
+POWER
+  devm_reboot_mode_register()
+  devm_reboot_mode_unregister()
+
+PWM
+  devm_pwm_get()
+  devm_pwm_put()
+
+REGULATOR
+  devm_regulator_bulk_get()
+  devm_regulator_get()
+  devm_regulator_put()
+  devm_regulator_register()
+
+RESET
+  devm_reset_control_get()
+  devm_reset_controller_register()
+
+SERDEV
+  devm_serdev_device_open()
+
+SLAVE DMA ENGINE
+  devm_acpi_dma_controller_register()
+
+SPI
+  devm_spi_register_master()
+
+WATCHDOG
+  devm_watchdog_register_device()
diff --git a/Documentation/driver-api/driver-model/driver.rst b/Documentation/driver-api/driver-model/driver.rst
new file mode 100644
index 000000000000..11d281506a04
--- /dev/null
+++ b/Documentation/driver-api/driver-model/driver.rst
@@ -0,0 +1,223 @@
+==============
+Device Drivers
+==============
+
+See the kerneldoc for the struct device_driver.
+
+
+Allocation
+~~~~~~~~~~
+
+Device drivers are statically allocated structures. Though there may
+be multiple devices in a system that a driver supports, struct
+device_driver represents the driver as a whole (not a particular
+device instance).
+
+Initialization
+~~~~~~~~~~~~~~
+
+The driver must initialize at least the name and bus fields. It should
+also initialize the devclass field (when it arrives), so it may obtain
+the proper linkage internally. It should also initialize as many of
+the callbacks as possible, though each is optional.
+
+Declaration
+~~~~~~~~~~~
+
+As stated above, struct device_driver objects are statically
+allocated. Below is an example declaration of the eepro100
+driver. This declaration is hypothetical only; it relies on the driver
+being converted completely to the new model::
+
+  static struct device_driver eepro100_driver = {
+         .name		= "eepro100",
+         .bus		= &pci_bus_type,
+
+         .probe		= eepro100_probe,
+         .remove		= eepro100_remove,
+         .suspend		= eepro100_suspend,
+         .resume		= eepro100_resume,
+  };
+
+Most drivers will not be able to be converted completely to the new
+model because the bus they belong to has a bus-specific structure with
+bus-specific fields that cannot be generalized.
+
+The most common example of this is the device ID structure. A driver
+typically defines an array of device IDs that it supports. The format
+of these structures and the semantics for comparing device IDs are
+completely bus-specific. Defining them as generic entities would
+sacrifice type-safety, so we keep bus-specific structures around.
+
+Bus-specific drivers should include a generic struct device_driver in
+the definition of the bus-specific driver. Like this::
+
+  struct pci_driver {
+         const struct pci_device_id *id_table;
+         struct device_driver	  driver;
+  };
+
+A definition that included bus-specific fields would look like
+(using the eepro100 driver again)::
+
+  static struct pci_driver eepro100_driver = {
+         .id_table       = eepro100_pci_tbl,
+         .driver	       = {
+		.name		= "eepro100",
+		.bus		= &pci_bus_type,
+		.probe		= eepro100_probe,
+		.remove		= eepro100_remove,
+		.suspend	= eepro100_suspend,
+		.resume		= eepro100_resume,
+         },
+  };
+
+Some may find the syntax of embedded struct initialization awkward or
+even a bit ugly. So far, it's the best way we've found to do what we want...
+
+Registration
+~~~~~~~~~~~~
+
+::
+
+  int driver_register(struct device_driver *drv);
+
+The driver registers the structure on startup. Drivers that have
+no bus-specific fields (i.e. don't have a bus-specific driver
+structure) would use driver_register() and pass a pointer to their
+struct device_driver object.
+
+Most drivers, however, will have a bus-specific structure and will
+need to register with the bus using something like pci_register_driver().
+
+It is important that drivers register their driver structure as early as
+possible. Registration with the core initializes several fields in the
+struct device_driver object, including the reference count and the
+lock. These fields are assumed to be valid at all times and may be
+used by the device model core or the bus driver.
+
+
+Transition Bus Drivers
+~~~~~~~~~~~~~~~~~~~~~~
+
+By defining wrapper functions, the transition to the new model can be
+made easier. Drivers can ignore the generic structure altogether and
+let the bus wrapper fill in the fields. For the callbacks, the bus can
+define generic callbacks that forward the call to the bus-specific
+callbacks of the drivers.
+
+This solution is intended to be only temporary. In order to get class
+information in the driver, the drivers must be modified anyway. Since
+converting drivers to the new model should reduce some infrastructural
+complexity and code size, it is recommended that they are converted as
+class information is added.
+
+Access
+~~~~~~
+
+Once the driver object has been registered, the driver may access its
+common fields, like the lock and the list of devices::
+
+  int driver_for_each_dev(struct device_driver *drv, void *data,
+			  int (*callback)(struct device *dev, void *data));
+
+The devices field is a list of all the devices that have been bound to
+the driver. The LDM core provides a helper function to operate on all
+the devices a driver controls. This helper locks the driver on each
+node access, and does proper reference counting on each device as it
+accesses it.
+
+
+sysfs
+~~~~~
+
+When a driver is registered, a sysfs directory is created in its
+bus's directory. In this directory, the driver can export an interface
+to userspace to control operation of the driver on a global basis;
+e.g. toggling debugging output in the driver.
+
+A future feature of this directory will be a 'devices' directory. This
+directory will contain symlinks to the directories of devices it
+supports.
+
+
+
+Callbacks
+~~~~~~~~~
+
+::
+
+	int	(*probe)	(struct device *dev);
+
+The probe() entry is called in task context, with the bus's rwsem locked
+and the driver partially bound to the device.  Drivers commonly use
+container_of() to convert "dev" to a bus-specific type, both in probe()
+and other routines.  That type often provides device resource data, such
+as pci_dev.resource[] or platform_device.resources, which is used in
+addition to dev->platform_data to initialize the driver.
+
+This callback holds the driver-specific logic to bind the driver to a
+given device.  That includes verifying that the device is present, that
+it's a version the driver can handle, that driver data structures can
+be allocated and initialized, and that any hardware can be initialized.
+Drivers often store a pointer to their state with dev_set_drvdata().
+When the driver has successfully bound itself to that device, then probe()
+returns zero and the driver model code will finish its part of binding
+the driver to that device.
+
+A driver's probe() may return a negative errno value to indicate that
+the driver did not bind to this device, in which case it should have
+released all resources it allocated::
+
+	int 	(*remove)	(struct device *dev);
+
+remove is called to unbind a driver from a device. This may be
+called if a device is physically removed from the system, if the
+driver module is being unloaded, during a reboot sequence, or
+in other cases.
+
+It is up to the driver to determine if the device is present or
+not. It should free any resources allocated specifically for the
+device; i.e. anything in the device's driver_data field.
+
+If the device is still present, it should quiesce the device and place
+it into a supported low-power state::
+
+	int	(*suspend)	(struct device *dev, pm_message_t state);
+
+suspend is called to put the device in a low power state::
+
+	int	(*resume)	(struct device *dev);
+
+Resume is used to bring a device back from a low power state.
+
+
+Attributes
+~~~~~~~~~~
+
+::
+
+  struct driver_attribute {
+          struct attribute        attr;
+          ssize_t (*show)(struct device_driver *driver, char *buf);
+          ssize_t (*store)(struct device_driver *, const char *buf, size_t count);
+  };
+
+Device drivers can export attributes via their sysfs directories.
+Drivers can declare attributes using the DRIVER_ATTR_RW and DRIVER_ATTR_RO
+macros, which work identically to the DEVICE_ATTR_RW and DEVICE_ATTR_RO
+macros.
+
+Example::
+
+	DRIVER_ATTR_RW(debug);
+
+This is equivalent to declaring::
+
+	struct driver_attribute driver_attr_debug;
+
+This can then be used to add and remove the attribute from the
+driver's directory using::
+
+  int driver_create_file(struct device_driver *, const struct driver_attribute *);
+  void driver_remove_file(struct device_driver *, const struct driver_attribute *);
diff --git a/Documentation/driver-api/driver-model/index.rst b/Documentation/driver-api/driver-model/index.rst
new file mode 100644
index 000000000000..755016422269
--- /dev/null
+++ b/Documentation/driver-api/driver-model/index.rst
@@ -0,0 +1,24 @@
+============
+Driver Model
+============
+
+.. toctree::
+   :maxdepth: 1
+
+   binding
+   bus
+   class
+   design-patterns
+   device
+   devres
+   driver
+   overview
+   platform
+   porting
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/driver-model/overview.rst b/Documentation/driver-api/driver-model/overview.rst
new file mode 100644
index 000000000000..d4d1e9b40e0c
--- /dev/null
+++ b/Documentation/driver-api/driver-model/overview.rst
@@ -0,0 +1,124 @@
+=============================
+The Linux Kernel Device Model
+=============================
+
+Patrick Mochel	<mochel@digitalimplant.org>
+
+Drafted 26 August 2002
+Updated 31 January 2006
+
+
+Overview
+~~~~~~~~
+
+The Linux Kernel Driver Model is a unification of all the disparate driver
+models that were previously used in the kernel. It is intended to augment the
+bus-specific drivers for bridges and devices by consolidating a set of data
+and operations into globally accessible data structures.
+
+Traditional driver models implemented some sort of tree-like structure
+(sometimes just a list) for the devices they control. There wasn't any
+uniformity across the different bus types.
+
+The current driver model provides a common, uniform data model for describing
+a bus and the devices that can appear under the bus. The unified bus
+model includes a set of common attributes which all busses carry, and a set
+of common callbacks, such as device discovery during bus probing, bus
+shutdown, bus power management, etc.
+
+The common device and bridge interface reflects the goals of the modern
+computer: namely the ability to do seamless device "plug and play", power
+management, and hot plug. In particular, the model dictated by Intel and
+Microsoft (namely ACPI) ensures that almost every device on almost any bus
+on an x86-compatible system can work within this paradigm.  Of course,
+not every bus is able to support all such operations, although most
+buses support most of those operations.
+
+
+Downstream Access
+~~~~~~~~~~~~~~~~~
+
+Common data fields have been moved out of individual bus layers into a common
+data structure. These fields must still be accessed by the bus layers,
+and sometimes by the device-specific drivers.
+
+Other bus layers are encouraged to do what has been done for the PCI layer.
+struct pci_dev now looks like this::
+
+  struct pci_dev {
+	...
+
+	struct device dev;     /* Generic device interface */
+	...
+  };
+
+Note first that the struct device dev within the struct pci_dev is
+statically allocated. This means only one allocation on device discovery.
+
+Note also that the struct device dev is not necessarily defined at the
+front of the pci_dev structure.  This is to make people think about what
+they're doing when switching between the bus driver and the global driver,
+and to discourage meaningless and incorrect casts between the two.
+
+The PCI bus layer freely accesses the fields of struct device. It knows about
+the structure of struct pci_dev, and it should know the structure of struct
+device. Individual PCI device drivers that have been converted to the current
+driver model generally do not and should not touch the fields of struct device,
+unless there is a compelling reason to do so.
+
+The above abstraction prevents unnecessary pain during transitional phases.
+If it were not done this way, then when a field was renamed or removed, every
+downstream driver would break.  On the other hand, if only the bus layer
+(and not the device layer) accesses the struct device, it is only the bus
+layer that needs to change.
+
+
+User Interface
+~~~~~~~~~~~~~~
+
+By virtue of having a complete hierarchical view of all the devices in the
+system, exporting a complete hierarchical view to userspace becomes relatively
+easy. This has been accomplished by implementing a special purpose virtual
+file system named sysfs.
+
+Almost all mainstream Linux distros mount this filesystem automatically; you
+can see some variation of the following in the output of the "mount" command::
+
+  $ mount
+  ...
+  none on /sys type sysfs (rw,noexec,nosuid,nodev)
+  ...
+  $
+
+The auto-mounting of sysfs is typically accomplished by an entry similar to
+the following in the /etc/fstab file::
+
+  none     	/sys	sysfs    defaults	  	0 0
+
+or something similar in the /lib/init/fstab file on Debian-based systems::
+
+  none            /sys    sysfs    nodev,noexec,nosuid    0 0
+
+If sysfs is not automatically mounted, you can always do it manually with::
+
+	# mount -t sysfs sysfs /sys
+
+Whenever a device is inserted into the tree, a directory is created for it.
+This directory may be populated at each layer of discovery - the global layer,
+the bus layer, or the device layer.
+
+The global layer currently creates two files - 'name' and 'power'. The
+former only reports the name of the device. The latter reports the
+current power state of the device. It will also be used to set the current
+power state.
+
+The bus layer may also create files for the devices it finds while probing the
+bus. For example, the PCI layer currently creates 'irq' and 'resource' files
+for each PCI device.
+
+A device-specific driver may also export files in its directory to expose
+device-specific data or tunable interfaces.
+
+More information about the sysfs directory layout can be found in
+the other documents in this directory and in the file
+Documentation/filesystems/sysfs.txt.
diff --git a/Documentation/driver-api/driver-model/platform.rst b/Documentation/driver-api/driver-model/platform.rst
new file mode 100644
index 000000000000..334dd4071ae4
--- /dev/null
+++ b/Documentation/driver-api/driver-model/platform.rst
@@ -0,0 +1,246 @@
+============================
+Platform Devices and Drivers
+============================
+
+See <linux/platform_device.h> for the driver model interface to the
+platform bus:  platform_device, and platform_driver.  This pseudo-bus
+is used to connect devices on busses with minimal infrastructure,
+like those used to integrate peripherals on many system-on-chip
+processors, or some "legacy" PC interconnects; as opposed to large
+formally specified ones like PCI or USB.
+
+
+Platform devices
+~~~~~~~~~~~~~~~~
+Platform devices are devices that typically appear as autonomous
+entities in the system. This includes legacy port-based devices and
+host bridges to peripheral buses, and most controllers integrated
+into system-on-chip platforms.  What they usually have in common
+is direct addressing from a CPU bus.  Rarely, a platform_device will
+be connected through a segment of some other kind of bus; but its
+registers will still be directly addressable.
+
+Platform devices are given a name, used in driver binding, and a
+list of resources such as addresses and IRQs::
+
+  struct platform_device {
+	const char	*name;
+	u32		id;
+	struct device	dev;
+	u32		num_resources;
+	struct resource	*resource;
+  };
+
+
+Platform drivers
+~~~~~~~~~~~~~~~~
+Platform drivers follow the standard driver model convention, where
+discovery/enumeration is handled outside the drivers, and drivers
+provide probe() and remove() methods.  They support power management
+and shutdown notifications using the standard conventions::
+
+  struct platform_driver {
+	int (*probe)(struct platform_device *);
+	int (*remove)(struct platform_device *);
+	void (*shutdown)(struct platform_device *);
+	int (*suspend)(struct platform_device *, pm_message_t state);
+	int (*suspend_late)(struct platform_device *, pm_message_t state);
+	int (*resume_early)(struct platform_device *);
+	int (*resume)(struct platform_device *);
+	struct device_driver driver;
+  };
+
+Note that probe() should in general verify that the specified device hardware
+actually exists; sometimes platform setup code can't be sure.  The probing
+can use device resources, including clocks, and device platform_data.
+
+Platform drivers register themselves the normal way::
+
+	int platform_driver_register(struct platform_driver *drv);
+
+Or, in common situations where the device is known not to be hot-pluggable,
+the probe() routine can live in an init section to reduce the driver's
+runtime memory footprint::
+
+	int platform_driver_probe(struct platform_driver *drv,
+			  int (*probe)(struct platform_device *));
+
+Kernel modules can be composed of several platform drivers. The platform core
+provides helpers to register and unregister an array of drivers::
+
+	int __platform_register_drivers(struct platform_driver * const *drivers,
+				      unsigned int count, struct module *owner);
+	void platform_unregister_drivers(struct platform_driver * const *drivers,
+					 unsigned int count);
+
+If one of the drivers fails to register, all drivers registered up to that
+point will be unregistered in reverse order. Note that there is a convenience
+macro that passes THIS_MODULE as the owner parameter::
+
+	#define platform_register_drivers(drivers, count)
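+
+As a rough illustration of that unwind behaviour, here is a userspace
+model.  It is not kernel code: struct fake_driver, fake_register(),
+fake_unregister() and register_all() are hypothetical names invented
+for this sketch::
+
+	#include <stdio.h>
+
+	/* Hypothetical stand-in for struct platform_driver. */
+	struct fake_driver {
+		const char *name;
+		int fail;		/* non-zero: registration fails */
+		int registered;
+	};
+
+	static int fake_register(struct fake_driver *drv)
+	{
+		if (drv->fail)
+			return -1;
+		drv->registered = 1;
+		return 0;
+	}
+
+	static void fake_unregister(struct fake_driver *drv)
+	{
+		drv->registered = 0;
+	}
+
+	/* Register all drivers or none: unwind in reverse order on error. */
+	static int register_all(struct fake_driver * const *drivers,
+				unsigned int count)
+	{
+		unsigned int i;
+		int err;
+
+		for (i = 0; i < count; i++) {
+			err = fake_register(drivers[i]);
+			if (err)
+				goto error;
+		}
+		return 0;
+
+	error:
+		while (i--)
+			fake_unregister(drivers[i]);
+		return err;
+	}
+
+	int main(void)
+	{
+		struct fake_driver a = { "a", 0, 0 }, b = { "b", 1, 0 };
+		struct fake_driver * const drivers[] = { &a, &b };
+		int ret = register_all(drivers, 2);
+
+		/* "b" fails, so "a" has been unregistered again. */
+		printf("ret=%d a.registered=%d\n", ret, a.registered);
+		return 0;
+	}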
+
+
+Device Enumeration
+~~~~~~~~~~~~~~~~~~
+As a rule, platform specific (and often board-specific) setup code will
+register platform devices::
+
+	int platform_device_register(struct platform_device *pdev);
+
+	int platform_add_devices(struct platform_device **pdevs, int ndev);
+
+The general rule is to register only those devices that actually exist,
+but in some cases extra devices might be registered.  For example, a kernel
+might be configured to work with an external network adapter that might not
+be populated on all boards, or likewise to work with an integrated controller
+that some boards might not hook up to any peripherals.
+
+In some cases, boot firmware will export tables describing the devices
+that are populated on a given board.   Without such tables, often the
+only way for system setup code to set up the correct devices is to build
+a kernel for a specific target board.  Such board-specific kernels are
+common with embedded and custom systems development.
+
+In many cases, the memory and IRQ resources associated with the platform
+device are not enough to let the device's driver work.  Board setup code
+will often provide additional information through the device's
+platform_data field.
+
+Embedded systems frequently need one or more clocks for platform devices,
+which are normally kept off until they're actively needed (to save power).
+System setup also associates those clocks with the device, so that
+calls to clk_get(&pdev->dev, clock_name) return them as needed.
+
+
+Legacy Drivers:  Device Probing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Some drivers are not fully converted to the driver model, because they take
+on a non-driver role:  the driver registers its platform device, rather than
+leaving that for system infrastructure.  Such drivers can't be hotplugged
+or coldplugged, since those mechanisms require device creation to be in a
+different system component than the driver.
+
+The only "good" reason for this is to handle older system designs which, like
+original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware
+configuration.  Newer systems have largely abandoned that model, in favor of
+bus-level support for dynamic configuration (PCI, USB), or device tables
+provided by the boot firmware (e.g. PNPACPI on x86).  There are too many
+conflicting options about what might be where, and even educated guesses by
+an operating system will be wrong often enough to make trouble.
+
+This style of driver is discouraged.  If you're updating such a driver,
+please try to move the device enumeration to a more appropriate location,
+outside the driver.  This will usually be cleanup, since such drivers
+tend to already have "normal" modes, such as ones using device nodes that
+were created by PNP or by platform device setup.
+
+Nonetheless, there are some APIs to support such legacy drivers.  Avoid
+using these calls except with such hotplug-deficient drivers::
+
+	struct platform_device *platform_device_alloc(
+			const char *name, int id);
+
+You can use platform_device_alloc() to dynamically allocate a device, which
+you will then initialize with resources and platform_device_register().
+A better solution is usually::
+
+	struct platform_device *platform_device_register_simple(
+			const char *name, int id,
+			struct resource *res, unsigned int nres);
+
+You can use platform_device_register_simple() as a one-step call to allocate
+and register a device.
+
+
+Device Naming and Driver Binding
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The platform_device.dev.bus_id is the canonical name for the devices.
+It's built from two components:
+
+    * platform_device.name ... which is also used for driver matching.
+
+    * platform_device.id ... the device instance number, or else "-1"
+      to indicate there's only one.
+
+These are concatenated, so name/id "serial"/0 indicates bus_id "serial.0", and
+"serial"/3 indicates bus_id "serial.3"; both would use the platform_driver
+named "serial".  While "my_rtc"/-1 would be bus_id "my_rtc" (no instance id)
+and use the platform_driver called "my_rtc".
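+
+A minimal userspace sketch of that naming rule follows.  It is an
+illustration only: make_bus_id() is a hypothetical helper written for
+this example, not a kernel function::
+
+	#include <stdio.h>
+
+	/* Build "name.id", or just "name" when id is -1. */
+	static void make_bus_id(char *buf, size_t len, const char *name, int id)
+	{
+		if (id == -1)
+			snprintf(buf, len, "%s", name);
+		else
+			snprintf(buf, len, "%s.%d", name, id);
+	}
+
+	int main(void)
+	{
+		char buf[32];
+
+		make_bus_id(buf, sizeof(buf), "serial", 3);
+		puts(buf);		/* serial.3 */
+		make_bus_id(buf, sizeof(buf), "my_rtc", -1);
+		puts(buf);		/* my_rtc */
+		return 0;
+	}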
+
+Driver binding is performed automatically by the driver core, invoking
+driver probe() after finding a match between device and driver.  If the
+probe() succeeds, the driver and device are bound as usual.  There are
+three different ways to find such a match:
+
+    - Whenever a device is registered, the drivers for that bus are
+      checked for matches.  Platform devices should be registered very
+      early during system boot.
+
+    - When a driver is registered using platform_driver_register(), all
+      unbound devices on that bus are checked for matches.  Drivers
+      usually register later during booting, or by module loading.
+
+    - Registering a driver using platform_driver_probe() works just like
+      using platform_driver_register(), except that the driver won't
+      be probed later if another device registers.  (Which is OK, since
+      this interface is only for use with non-hotpluggable devices.)
+
+
+Early Platform Devices and Drivers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The early platform interfaces provide platform data to platform device
+drivers early on during the system boot. The code is built on top of the
+early_param() command line parsing and can be executed very early on.
+
+Example: "earlyprintk" class early serial console in 6 steps
+
+1. Registering early platform device data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The architecture code registers platform device data using the function
+early_platform_add_devices(). In the case of early serial console this
+should be hardware configuration for the serial port. Devices registered
+at this point will later on be matched against early platform drivers.
+
+2. Parsing kernel command line
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The architecture code calls parse_early_param() to parse the kernel
+command line. This will execute all matching early_param() callbacks.
+User specified early platform devices will be registered at this point.
+For the early serial console case the user can specify the port on the
+kernel command line as "earlyprintk=serial.0" where "earlyprintk" is
+the class string, "serial" is the name of the platform driver and
+0 is the platform device id. If the id is -1 then the dot and the
+id can be omitted.
+
+3. Installing early platform drivers belonging to a certain class
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The architecture code may optionally force registration of all early
+platform drivers belonging to a certain class using the function
+early_platform_driver_register_all(). User specified devices from
+step 2 have priority over these. This step is omitted by the serial
+driver example since the early serial driver code should be disabled
+unless the user has specified the port on the kernel command line.
+
+4. Early platform driver registration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Compiled-in platform drivers making use of early_platform_init() are
+automatically registered during step 2 or 3. The serial driver example
+should use early_platform_init("earlyprintk", &platform_driver).
+
+5. Probing of early platform drivers belonging to a certain class
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The architecture code calls early_platform_driver_probe() to match
+registered early platform devices associated with a certain class with
+registered early platform drivers. Matched devices will have probe() called.
+This step can be executed at any point during the early boot. As soon
+as possible may be good for the serial port case.
+
+6. Inside the early platform driver probe()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The driver code needs to take special care during early boot, especially
+when it comes to memory allocation and interrupt registration. The code
+in the probe() function can use is_early_platform_device() to check if
+it is called at early platform device or at the regular platform device
+time. The early serial driver performs register_console() at this point.
+
+For further information, see <linux/platform_device.h>.
diff --git a/Documentation/driver-api/driver-model/porting.rst b/Documentation/driver-api/driver-model/porting.rst
new file mode 100644
index 000000000000..931ea879af3f
--- /dev/null
+++ b/Documentation/driver-api/driver-model/porting.rst
@@ -0,0 +1,448 @@
+=======================================
+Porting Drivers to the New Driver Model
+=======================================
+
+Patrick Mochel
+
+7 January 2003
+
+
+Overview
+
+Please refer to `Documentation/driver-api/driver-model/*.rst` for definitions of
+various driver types and concepts.
+
+Most of the work of porting device drivers to the new model happens
+at the bus driver layer. This was intentional, to minimize the
+negative effect on kernel drivers, and to allow a gradual transition
+of bus drivers.
+
+In a nutshell, the driver model consists of a set of objects that can
+be embedded in larger, bus-specific objects. Fields in these generic
+objects can replace fields in the bus-specific objects.
+
+The generic objects must be registered with the driver model core. By
+doing so, they will be exported via the sysfs filesystem. sysfs can be
+mounted by doing::
+
+	# mount -t sysfs sysfs /sys
+
+
+
+The Process
+
+Step 0: Read include/linux/device.h for object and function definitions.
+
+Step 1: Registering the bus driver.
+
+
+- Define a struct bus_type for the bus driver::
+
+    struct bus_type pci_bus_type = {
+          .name           = "pci",
+    };
+
+
+- Register the bus type.
+
+  This should be done in the initialization function for the bus type,
+  which is usually the module_init(), or equivalent, function::
+
+    static int __init pci_driver_init(void)
+    {
+            return bus_register(&pci_bus_type);
+    }
+
+    subsys_initcall(pci_driver_init);
+
+
+  The bus type may be unregistered (if the bus driver may be compiled
+  as a module) by doing::
+
+     bus_unregister(&pci_bus_type);
+
+
+- Export the bus type for others to use.
+
+  Other code may wish to reference the bus type, so declare it in a
+  shared header file and export the symbol.
+
+From include/linux/pci.h::
+
+  extern struct bus_type pci_bus_type;
+
+
+From the file the above code appears in::
+
+  EXPORT_SYMBOL(pci_bus_type);
+
+
+
+- This will cause the bus to show up in /sys/bus/pci/ with two
+  subdirectories: 'devices' and 'drivers'::
+
+    # tree -d /sys/bus/pci/
+    /sys/bus/pci/
+    |-- devices
+    `-- drivers
+
+
+
+Step 2: Registering Devices.
+
+struct device represents a single device. It mainly contains metadata
+describing the relationship the device has to other entities.
+
+
+- Embed a struct device in the bus-specific device type::
+
+
+    struct pci_dev {
+           ...
+           struct  device  dev;            /* Generic device interface */
+           ...
+    };
+
+  It is recommended that the generic device not be the first item in
+  the struct to discourage programmers from doing mindless casts
+  between the object types. Instead macros, or inline functions,
+  should be created to convert from the generic object type::
+
+
+    #define to_pci_dev(n) container_of(n, struct pci_dev, dev)
+
+    /* or */
+
+    static inline struct pci_dev *to_pci_dev(struct device *n)
+    {
+            return container_of(n, struct pci_dev, dev);
+    }
+
+  This allows the compiler to verify type-safety of the operations
+  that are performed (which is Good).
+
+
+- Initialize the device on registration.
+
+  When devices are discovered or registered with the bus type, the
+  bus driver should initialize the generic device. The most important
+  things to initialize are the bus_id, parent, and bus fields.
+
+  The bus_id is an ASCII string that contains the device's address on
+  the bus. The format of this string is bus-specific. This is
+  necessary for representing devices in sysfs.
+
+  parent is the physical parent of the device. It is important that
+  the bus driver sets this field correctly.
+
+  The driver model maintains an ordered list of devices that it uses
+  for power management. This list must be kept ordered to guarantee that
+  devices are shut down before their physical parents, and vice versa.
+  The order of this list is determined by the parent of registered
+  devices.
+
+  Also, the location of the device's sysfs directory depends on a
+  device's parent. sysfs exports a directory structure that mirrors
+  the device hierarchy. Accurately setting the parent guarantees that
+  sysfs will accurately represent the hierarchy.
+
+  The device's bus field is a pointer to the bus type the device
+  belongs to. This should be set to the bus_type that was declared
+  and initialized before.
+
+  Optionally, the bus driver may set the device's name and release
+  fields.
+
+  The name field is an ASCII string describing the device, like
+
+     "ATI Technologies Inc Radeon QD"
+
+  The release field is a callback that the driver model core calls
+  when the device has been removed, and all references to it have
+  been released. More on this in a moment.
+
+
+- Register the device.
+
+  Once the generic device has been initialized, it can be registered
+  with the driver model core by doing::
+
+       device_register(&dev->dev);
+
+  It can later be unregistered by doing::
+
+       device_unregister(&dev->dev);
+
+  This should happen on buses that support hotpluggable devices.
+  If a bus driver unregisters a device, it should not immediately free
+  it. It should instead wait for the driver model core to call the
+  device's release method, then free the bus-specific object.
+  (There may be other code that is currently referencing the device
+  structure, and it would be rude to free the device while that is
+  happening).
+
+
+  When the device is registered, a directory in sysfs is created.
+  The PCI tree in sysfs looks like::
+
+    /sys/devices/pci0/
+    |-- 00:00.0
+    |-- 00:01.0
+    |   `-- 01:00.0
+    |-- 00:02.0
+    |   `-- 02:1f.0
+    |       `-- 03:00.0
+    |-- 00:1e.0
+    |   `-- 04:04.0
+    |-- 00:1f.0
+    |-- 00:1f.1
+    |   |-- ide0
+    |   |   |-- 0.0
+    |   |   `-- 0.1
+    |   `-- ide1
+    |       `-- 1.0
+    |-- 00:1f.2
+    |-- 00:1f.3
+    `-- 00:1f.5
+
+  Also, symlinks are created in the bus's 'devices' directory
+  that point to the device's directory in the physical hierarchy::
+
+    /sys/bus/pci/devices/
+    |-- 00:00.0 -> ../../../devices/pci0/00:00.0
+    |-- 00:01.0 -> ../../../devices/pci0/00:01.0
+    |-- 00:02.0 -> ../../../devices/pci0/00:02.0
+    |-- 00:1e.0 -> ../../../devices/pci0/00:1e.0
+    |-- 00:1f.0 -> ../../../devices/pci0/00:1f.0
+    |-- 00:1f.1 -> ../../../devices/pci0/00:1f.1
+    |-- 00:1f.2 -> ../../../devices/pci0/00:1f.2
+    |-- 00:1f.3 -> ../../../devices/pci0/00:1f.3
+    |-- 00:1f.5 -> ../../../devices/pci0/00:1f.5
+    |-- 01:00.0 -> ../../../devices/pci0/00:01.0/01:00.0
+    |-- 02:1f.0 -> ../../../devices/pci0/00:02.0/02:1f.0
+    |-- 03:00.0 -> ../../../devices/pci0/00:02.0/02:1f.0/03:00.0
+    `-- 04:04.0 -> ../../../devices/pci0/00:1e.0/04:04.0
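+
+As a userspace illustration of the to_pci_dev() conversion described at
+the start of this step, the sketch below redefines container_of() with
+offsetof() and uses cut-down stand-ins for struct device and struct
+pci_dev. Both structures are hypothetical simplifications for this
+example, not the kernel definitions::
+
+    #include <stddef.h>
+    #include <stdio.h>
+
+    /* Simplified container_of(); the kernel version adds type checks. */
+    #define container_of(ptr, type, member) \
+            ((type *)((char *)(ptr) - offsetof(type, member)))
+
+    struct device {
+            const char *bus_id;
+    };
+
+    struct pci_dev {
+            unsigned int devfn;     /* deliberately placed before dev */
+            struct device dev;      /* generic device interface */
+    };
+
+    static inline struct pci_dev *to_pci_dev(struct device *n)
+    {
+            return container_of(n, struct pci_dev, dev);
+    }
+
+    int main(void)
+    {
+            struct pci_dev pdev = {
+                    .devfn = 8,
+                    .dev = { .bus_id = "00:01.0" },
+            };
+            struct device *generic = &pdev.dev;
+
+            /* Recover the bus-specific object from the generic one. */
+            printf("%s devfn=%u same=%d\n", generic->bus_id,
+                   to_pci_dev(generic)->devfn, to_pci_dev(generic) == &pdev);
+            return 0;
+    }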
+
+
+
+Step 3: Registering Drivers.
+
+struct device_driver is a simple driver structure that contains a set
+of operations that the driver model core may call.
+
+
+- Embed a struct device_driver in the bus-specific driver.
+
+  Just like with devices, do something like::
+
+    struct pci_driver {
+           ...
+           struct device_driver    driver;
+    };
+
+
+- Initialize the generic driver structure.
+
+  When the driver registers with the bus (e.g. doing pci_register_driver()),
+  initialize the necessary fields of the driver: the name and bus
+  fields.
+
+
+- Register the driver.
+
+  After the generic driver has been initialized, call::
+
+	driver_register(&drv->driver);
+
+  to register the driver with the core.
+
+  When the driver is unregistered from the bus, unregister it from the
+  core by doing::
+
+        driver_unregister(&drv->driver);
+
+  Note that this will block until all references to the driver have
+  gone away. Normally, there will not be any.
+
+
+- Sysfs representation.
+
+  Drivers are exported via sysfs in their bus's 'drivers' directory.
+  For example::
+
+    /sys/bus/pci/drivers/
+    |-- 3c59x
+    |-- Ensoniq AudioPCI
+    |-- agpgart-amdk7
+    |-- e100
+    `-- serial
+
+
+Step 4: Define Generic Methods for Drivers.
+
+struct device_driver defines a set of operations that the driver model
+core calls. Most of these operations are probably similar to
+operations the bus already defines for drivers, but taking different
+parameters.
+
+It would be difficult and tedious to convert every driver on a bus to
+the generic format simultaneously. Instead, the bus driver should
+define single instances of the generic methods that forward calls to
+the bus-specific drivers. For instance::
+
+
+  static int pci_device_remove(struct device * dev)
+  {
+          struct pci_dev * pci_dev = to_pci_dev(dev);
+          struct pci_driver * drv = pci_dev->driver;
+
+          if (drv) {
+                  if (drv->remove)
+                          drv->remove(pci_dev);
+                  pci_dev->driver = NULL;
+          }
+          return 0;
+  }
+
+
+The generic driver should be initialized with these methods before it
+is registered::
+
+        /* initialize common driver fields */
+        drv->driver.name = drv->name;
+        drv->driver.bus = &pci_bus_type;
+        drv->driver.probe = pci_device_probe;
+        drv->driver.resume = pci_device_resume;
+        drv->driver.suspend = pci_device_suspend;
+        drv->driver.remove = pci_device_remove;
+
+        /* register with core */
+        driver_register(&drv->driver);
+
+
+Ideally, the bus should only initialize the fields if they are not
+already set. This allows the drivers to implement their own generic
+methods.
+
+
+Step 5: Support generic driver binding.
+
+The model assumes that a device or driver can be dynamically
+registered with the bus at any time. When registration happens,
+devices must be bound to a driver, or drivers must be bound to all
+devices that they support.
+
+A driver typically contains a list of device IDs that it supports. The
+bus driver compares these IDs to the IDs of devices registered with it.
+The format of the device IDs, and the semantics for comparing them are
+bus-specific, so the generic model does not attempt to generalize them.
+
+Instead, a bus may supply a method in struct bus_type that does the
+comparison::
+
+  int (*match)(struct device * dev, struct device_driver * drv);
+
+match should return a positive value if the driver supports the device,
+and zero otherwise. It may also return an error code (for example
+-EPROBE_DEFER) if it cannot determine whether the given driver supports
+the device.
+
+When a device is registered, the bus's list of drivers is iterated
+over. bus->match() is called for each one until a match is found.
+
+When a driver is registered, the bus's list of devices is iterated
+over. bus->match() is called for each device that is not already
+claimed by a driver.
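+
+The device-registration case can be modelled in userspace as follows.
+This is only a sketch: the cut-down structures, the string-comparing
+match() and bind_device() are hypothetical simplifications made up for
+this example, not the driver core implementation::
+
+  #include <stdio.h>
+  #include <string.h>
+
+  struct device_driver { const char *name; };
+  struct device { const char *id; struct device_driver *driver; };
+
+  /* Bus-specific comparison: positive on match, zero otherwise. */
+  static int match(struct device *dev, struct device_driver *drv)
+  {
+          return strcmp(dev->id, drv->name) == 0;
+  }
+
+  /* Walk the bus's driver list until one driver claims the device. */
+  static void bind_device(struct device *dev,
+                          struct device_driver *drivers, size_t n)
+  {
+          size_t i;
+
+          for (i = 0; i < n && !dev->driver; i++)
+                  if (match(dev, &drivers[i]) > 0)
+                          dev->driver = &drivers[i];
+  }
+
+  int main(void)
+  {
+          struct device_driver drivers[] = { { "e100" }, { "serial" } };
+          struct device dev = { "serial", NULL };
+
+          bind_device(&dev, drivers, 2);
+          printf("%s\n", dev.driver ? dev.driver->name : "(unbound)");
+          return 0;
+  }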
+
+When a device is successfully bound to a driver, device->driver is
+set, the device is added to a per-driver list of devices, and a
+symlink is created in the driver's sysfs directory that points to the
+device's physical directory::
+
+  /sys/bus/pci/drivers/
+  |-- 3c59x
+  |   `-- 00:0b.0 -> ../../../../devices/pci0/00:0b.0
+  |-- Ensoniq AudioPCI
+  |-- agpgart-amdk7
+  |   `-- 00:00.0 -> ../../../../devices/pci0/00:00.0
+  |-- e100
+  |   `-- 00:0c.0 -> ../../../../devices/pci0/00:0c.0
+  `-- serial
+
+
+This driver binding should replace the existing driver binding
+mechanism the bus currently uses.
+
+
+Step 6: Supply a hotplug callback.
+
+Whenever a device is registered with the driver model core, the
+userspace program /sbin/hotplug is called to notify userspace.
+Users can define actions to perform when a device is inserted or
+removed.
+
+The driver model core passes several arguments to userspace via
+environment variables, including
+
+- ACTION: set to 'add' or 'remove'
+- DEVPATH: set to the device's physical path in sysfs.
+
+A bus driver may also supply additional parameters for userspace to
+consume. To do this, a bus must implement the 'hotplug' method in
+struct bus_type::
+
+     int (*hotplug) (struct device *dev, char **envp,
+                     int num_envp, char *buffer, int buffer_size);
+
+This is called immediately before /sbin/hotplug is executed.
+
+
+Step 7: Cleaning up the bus driver.
+
+The generic bus, device, and driver structures provide several fields
+that can replace those defined privately to the bus driver.
+
+- Device list.
+
+struct bus_type contains a list of all devices registered with the bus
+type. This includes all devices on all instances of that bus type.
+An internal list that the bus uses may be removed, in favor of using
+this one.
+
+The core provides an iterator to access these devices::
+
+  int bus_for_each_dev(struct bus_type * bus, struct device * start,
+                       void * data, int (*fn)(struct device *, void *));
+
+
+- Driver list.
+
+struct bus_type also contains a list of all drivers registered with
+it. An internal list of drivers that the bus driver maintains may
+be removed in favor of using the generic one.
+
+The drivers may be iterated over, like devices::
+
+  int bus_for_each_drv(struct bus_type * bus, struct device_driver * start,
+                       void * data, int (*fn)(struct device_driver *, void *));
+
+
+Please see drivers/base/bus.c for more information.
+
+
+- rwsem
+
+struct bus_type contains an rwsem that protects all core accesses to
+the device and driver lists. This can be used by the bus driver
+internally, and should be used when accessing the device or driver
+lists the bus maintains.
+
+
+- Device and driver fields.
+
+Some of the fields in struct device and struct device_driver duplicate
+fields in the bus-specific representations of these objects. Feel free
+to remove the bus-specific ones and favor the generic ones. Note
+though, that this will likely mean fixing up all the drivers that
+reference the bus-specific fields (though those should all be 1-line
+changes).
diff --git a/Documentation/driver-api/gpio/driver.rst b/Documentation/driver-api/gpio/driver.rst
index 349f2dc33029..921c71a3d683 100644
--- a/Documentation/driver-api/gpio/driver.rst
+++ b/Documentation/driver-api/gpio/driver.rst
@@ -399,7 +399,7 @@ symbol:
   will pass the struct gpio_chip* for the chip to all IRQ callbacks, so the
   callbacks need to embed the gpio_chip in its state container and obtain a
   pointer to the container using container_of().
-  (See Documentation/driver-model/design-patterns.rst)
+  (See Documentation/driver-api/driver-model/design-patterns.rst)
 
 - gpiochip_irqchip_add_nested(): adds a nested cascaded irqchip to a gpiochip,
   as discussed above regarding different types of cascaded irqchips. The
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index b4c993ff7655..9fb03b7bdeb1 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -14,6 +14,7 @@ available subsections can be seen below.
 .. toctree::
    :maxdepth: 2
 
+   driver-model/index
    basics
    infrastructure
    early-userspace/index
diff --git a/Documentation/driver-model/binding.rst b/Documentation/driver-model/binding.rst
deleted file mode 100644
index 7ea1d7a41e1d..000000000000
--- a/Documentation/driver-model/binding.rst
+++ /dev/null
@@ -1,98 +0,0 @@
-==============
-Driver Binding
-==============
-
-Driver binding is the process of associating a device with a device
-driver that can control it. Bus drivers have typically handled this
-because there have been bus-specific structures to represent the
-devices and the drivers. With generic device and device driver
-structures, most of the binding can take place using common code.
-
-
-Bus
-~~~
-
-The bus type structure contains a list of all devices that are on that bus
-type in the system. When device_register is called for a device, it is
-inserted into the end of this list. The bus object also contains a
-list of all drivers of that bus type. When driver_register is called
-for a driver, it is inserted at the end of this list. These are the
-two events which trigger driver binding.
-
-
-device_register
-~~~~~~~~~~~~~~~
-
-When a new device is added, the bus's list of drivers is iterated over
-to find one that supports it. In order to determine that, the device
-ID of the device must match one of the device IDs that the driver
-supports. The format and semantics for comparing IDs is bus-specific.
-Instead of trying to derive a complex state machine and matching
-algorithm, it is up to the bus driver to provide a callback to compare
-a device against the IDs of a driver. The bus returns 1 if a match was
-found; 0 otherwise.
-
-int match(struct device * dev, struct device_driver * drv);
-
-If a match is found, the device's driver field is set to the driver
-and the driver's probe callback is called. This gives the driver a
-chance to verify that it really does support the hardware, and that
-it's in a working state.
-
-Device Class
-~~~~~~~~~~~~
-
-Upon the successful completion of probe, the device is registered with
-the class to which it belongs. Device drivers belong to one and only one
-class, and that is set in the driver's devclass field.
-devclass_add_device is called to enumerate the device within the class
-and actually register it with the class, which happens with the
-class's register_dev callback.
-
-
-Driver
-~~~~~~
-
-When a driver is attached to a device, the device is inserted into the
-driver's list of devices.
-
-
-sysfs
-~~~~~
-
-A symlink is created in the bus's 'devices' directory that points to
-the device's directory in the physical hierarchy.
-
-A symlink is created in the driver's 'devices' directory that points
-to the device's directory in the physical hierarchy.
-
-A directory for the device is created in the class's directory. A
-symlink is created in that directory that points to the device's
-physical location in the sysfs tree.
-
-A symlink can be created (though this isn't done yet) in the device's
-physical directory to either its class directory, or the class's
-top-level directory. One can also be created to point to its driver's
-directory also.
-
-
-driver_register
-~~~~~~~~~~~~~~~
-
-The process is almost identical for when a new driver is added.
-The bus's list of devices is iterated over to find a match. Devices
-that already have a driver are skipped. All the devices are iterated
-over, to bind as many devices as possible to the driver.
-
-
-Removal
-~~~~~~~
-
-When a device is removed, the reference count for it will eventually
-go to 0. When it does, the remove callback of the driver is called. It
-is removed from the driver's list of devices and the reference count
-of the driver is decremented. All symlinks between the two are removed.
-
-When a driver is removed, the list of devices that it supports is
-iterated over, and the driver's remove callback is called for each
-one. The device is removed from that list and the symlinks removed.
diff --git a/Documentation/driver-model/bus.rst b/Documentation/driver-model/bus.rst
deleted file mode 100644
index 016b15a6e8ea..000000000000
--- a/Documentation/driver-model/bus.rst
+++ /dev/null
@@ -1,146 +0,0 @@
-=========
-Bus Types
-=========
-
-Definition
-~~~~~~~~~~
-See the kerneldoc for the struct bus_type.
-
-int bus_register(struct bus_type * bus);
-
-
-Declaration
-~~~~~~~~~~~
-
-Each bus type in the kernel (PCI, USB, etc) should declare one static
-object of this type. They must initialize the name field, and may
-optionally initialize the match callback::
-
-   struct bus_type pci_bus_type = {
-          .name	= "pci",
-          .match	= pci_bus_match,
-   };
-
-The structure should be exported to drivers in a header file:
-
-extern struct bus_type pci_bus_type;
-
-
-Registration
-~~~~~~~~~~~~
-
-When a bus driver is initialized, it calls bus_register. This
-initializes the rest of the fields in the bus object and inserts it
-into a global list of bus types. Once the bus object is registered,
-the fields in it are usable by the bus driver.
-
-
-Callbacks
-~~~~~~~~~
-
-match(): Attaching Drivers to Devices
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The format of device ID structures and the semantics for comparing
-them are inherently bus-specific. Drivers typically declare an array
-of device IDs of devices they support that reside in a bus-specific
-driver structure.
-
-The purpose of the match callback is to give the bus an opportunity to
-determine if a particular driver supports a particular device by
-comparing the device IDs the driver supports with the device ID of a
-particular device, without sacrificing bus-specific functionality or
-type-safety.
-
-When a driver is registered with the bus, the bus's list of devices is
-iterated over, and the match callback is called for each device that
-does not have a driver associated with it.
-
-
-
-Device and Driver Lists
-~~~~~~~~~~~~~~~~~~~~~~~
-
-The lists of devices and drivers are intended to replace the local
-lists that many buses keep. They are lists of struct devices and
-struct device_drivers, respectively. Bus drivers are free to use the
-lists as they please, but conversion to the bus-specific type may be
-necessary.
-
-The LDM core provides helper functions for iterating over each list::
-
-  int bus_for_each_dev(struct bus_type * bus, struct device * start,
-		       void * data,
-		       int (*fn)(struct device *, void *));
-
-  int bus_for_each_drv(struct bus_type * bus, struct device_driver * start,
-		       void * data, int (*fn)(struct device_driver *, void *));
-
-These helpers iterate over the respective list, and call the callback
-for each device or driver in the list. All list accesses are
-synchronized by taking the bus's lock (read currently). The reference
-count on each object in the list is incremented before the callback is
-called; it is decremented after the next object has been obtained. The
-lock is not held when calling the callback.
-
-
-sysfs
-~~~~~~~~
-There is a top-level directory named 'bus'.
-
-Each bus gets a directory in the bus directory, along with two default
-directories::
-
-	/sys/bus/pci/
-	|-- devices
-	`-- drivers
-
-Drivers registered with the bus get a directory in the bus's drivers
-directory::
-
-	/sys/bus/pci/
-	|-- devices
-	`-- drivers
-	    |-- Intel ICH
-	    |-- Intel ICH Joystick
-	    |-- agpgart
-	    `-- e100
-
-Each device that is discovered on a bus of that type gets a symlink in
-the bus's devices directory to the device's directory in the physical
-hierarchy::
-
-	/sys/bus/pci/
-	|-- devices
-	|   |-- 00:00.0 -> ../../../root/pci0/00:00.0
-	|   |-- 00:01.0 -> ../../../root/pci0/00:01.0
-	|   `-- 00:02.0 -> ../../../root/pci0/00:02.0
-	`-- drivers
-
-
-Exporting Attributes
-~~~~~~~~~~~~~~~~~~~~
-
-::
-
-  struct bus_attribute {
-	struct attribute	attr;
-	ssize_t (*show)(struct bus_type *, char * buf);
-	ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
-  };
-
-Bus drivers can export attributes using the BUS_ATTR_RW macro that works
-similarly to the DEVICE_ATTR_RW macro for devices. For example, a
-definition like this::
-
-	static BUS_ATTR_RW(debug);
-
-is equivalent to declaring::
-
-	static struct bus_attribute bus_attr_debug;
-
-This can then be used to add and remove the attribute from the bus's
-sysfs directory using::
-
-	int bus_create_file(struct bus_type *, struct bus_attribute *);
-	void bus_remove_file(struct bus_type *, struct bus_attribute *);
diff --git a/Documentation/driver-model/class.rst b/Documentation/driver-model/class.rst
deleted file mode 100644
index fff55b80e86a..000000000000
--- a/Documentation/driver-model/class.rst
+++ /dev/null
@@ -1,149 +0,0 @@
-==============
-Device Classes
-==============
-
-Introduction
-~~~~~~~~~~~~
-A device class describes a type of device, like an audio or network
-device. The following device classes have been identified:
-
-<Insert List of Device Classes Here>
-
-
-Each device class defines a set of semantics and a programming interface
-that devices of that class adhere to. Device drivers are the
-implementation of that programming interface for a particular device on
-a particular bus.
-
-Device classes are agnostic with respect to what bus a device resides
-on.
-
-
-Programming Interface
-~~~~~~~~~~~~~~~~~~~~~
-The device class structure looks like::
-
-
-  typedef int (*devclass_add)(struct device *);
-  typedef void (*devclass_remove)(struct device *);
-
-See the kerneldoc for the struct class.
-
-A typical device class definition would look like::
-
-  struct device_class input_devclass = {
-        .name		= "input",
-        .add_device	= input_add_device,
-        .remove_device	= input_remove_device,
-  };
-
-Each device class structure should be exported in a header file so it
-can be used by drivers, extensions and interfaces.
-
-Device classes are registered and unregistered with the core using::
-
-  int devclass_register(struct device_class * cls);
-  void devclass_unregister(struct device_class * cls);
-
-
-Devices
-~~~~~~~
-As devices are bound to drivers, they are added to the device class
-that the driver belongs to. Before the driver model core, this would
-typically happen during the driver's probe() callback, once the device
-has been initialized. It is now done by the core after the probe()
-callback finishes.
-
-The device is enumerated in the class. Each time a device is added to
-the class, the class's devnum field is incremented and assigned to the
-device. The field is never decremented, so if the device is removed
-from the class and re-added, it will receive a different enumerated
-value.
-
-The class is allowed to create a class-specific structure for the
-device and store it in the device's class_data pointer.
-
-There is no list of devices in the device class. Each driver has a
-list of devices that it supports. The device class has a list of
-drivers of that particular class. To access all of the devices in the
-class, iterate over the device lists of each driver in the class.
-
-
-Device Drivers
-~~~~~~~~~~~~~~
-Device drivers are added to device classes when they are registered
-with the core. A driver specifies the class it belongs to by setting
-the struct device_driver::devclass field.
-
-
-sysfs directory structure
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-There is a top-level sysfs directory named 'class'.
-
-Each class gets a directory in the class directory, along with two
-default subdirectories::
-
-        class/
-        `-- input
-            |-- devices
-            `-- drivers
-
-
-Drivers registered with the class get a symlink in the drivers/ directory
-that points to the driver's directory (under its bus directory)::
-
-   class/
-   `-- input
-       |-- devices
-       `-- drivers
-           `-- usb:usb_mouse -> ../../../bus/drivers/usb_mouse/
-
-
-Each device gets a symlink in the devices/ directory that points to the
-device's directory in the physical hierarchy::
-
-   class/
-   `-- input
-       |-- devices
-       |   `-- 1 -> ../../../root/pci0/00:1f.0/usb_bus/00:1f.2-1:0/
-       `-- drivers
-
-
-Exporting Attributes
-~~~~~~~~~~~~~~~~~~~~
-
-::
-
-  struct devclass_attribute {
-        struct attribute        attr;
-        ssize_t (*show)(struct device_class *, char * buf, size_t count, loff_t off);
-        ssize_t (*store)(struct device_class *, const char * buf, size_t count, loff_t off);
-  };
-
-Class drivers can export attributes using the DEVCLASS_ATTR macro that works
-similarly to the DEVICE_ATTR macro for devices. For example, a definition
-like this::
-
-  static DEVCLASS_ATTR(debug,0644,show_debug,store_debug);
-
-is equivalent to declaring::
-
-  static struct devclass_attribute devclass_attr_debug;
-
-The bus driver can add and remove the attribute from the class's
-sysfs directory using::
-
-  int devclass_create_file(struct device_class *, struct devclass_attribute *);
-  void devclass_remove_file(struct device_class *, struct devclass_attribute *);
-
-In the example above, the file will be named 'debug' and placed in the
-class's directory in sysfs.
-
-
-Interfaces
-~~~~~~~~~~
-There may exist multiple mechanisms for accessing the same device of a
-particular class type. Device interfaces describe these mechanisms.
-
-When a device is added to a device class, the core attempts to add it
-to every interface that is registered with the device class.
diff --git a/Documentation/driver-model/design-patterns.rst b/Documentation/driver-model/design-patterns.rst
deleted file mode 100644
index 41eb8f41f7dd..000000000000
--- a/Documentation/driver-model/design-patterns.rst
+++ /dev/null
@@ -1,116 +0,0 @@
-=============================
-Device Driver Design Patterns
-=============================
-
-This document describes a few common design patterns found in device drivers.
-It is likely that subsystem maintainers will ask driver developers to
-conform to these design patterns.
-
-1. State Container
-2. container_of()
-
-
-1. State Container
-~~~~~~~~~~~~~~~~~~
-
-While the kernel contains a few device drivers that assume that they will
-only be probed once on a given system (singletons), it is customary to assume
-that the device the driver binds to will appear in several instances. This
-means that the probe() function and all callbacks need to be reentrant.
-
-The most common way to achieve this is to use the state container design
-pattern. It usually has this form::
-
-  struct foo {
-      spinlock_t lock; /* Example member */
-      (...)
-  };
-
-  static int foo_probe(...)
-  {
-      struct foo *foo;
-
-      foo = devm_kzalloc(dev, sizeof(*foo), GFP_KERNEL);
-      if (!foo)
-          return -ENOMEM;
-      spin_lock_init(&foo->lock);
-      (...)
-  }
-
-This will create an instance of struct foo in memory every time probe() is
-called. This is our state container for this instance of the device driver.
-Of course it is then necessary to always pass this instance of the
-state around to all functions that need access to the state and its members.
-
-For example, if the driver is registering an interrupt handler, you would
-pass around a pointer to struct foo like this::
-
-  static irqreturn_t foo_handler(int irq, void *arg)
-  {
-      struct foo *foo = arg;
-      (...)
-  }
-
-  static int foo_probe(...)
-  {
-      struct foo *foo;
-
-      (...)
-      ret = request_irq(irq, foo_handler, 0, "foo", foo);
-  }
-
-This way you always get a pointer back to the correct instance of foo in
-your interrupt handler.
-
-
-2. container_of()
-~~~~~~~~~~~~~~~~~
-
-Continuing the above example, we add an offloaded work item::
-
-  struct foo {
-      spinlock_t lock;
-      struct workqueue_struct *wq;
-      struct work_struct offload;
-      (...)
-  };
-
-  static void foo_work(struct work_struct *work)
-  {
-      struct foo *foo = container_of(work, struct foo, offload);
-
-      (...)
-  }
-
-  static irqreturn_t foo_handler(int irq, void *arg)
-  {
-      struct foo *foo = arg;
-
-      queue_work(foo->wq, &foo->offload);
-      (...)
-  }
-
-  static int foo_probe(...)
-  {
-      struct foo *foo; /* allocated as in the state container example */
-
-      foo->wq = create_singlethread_workqueue("foo-wq");
-      INIT_WORK(&foo->offload, foo_work);
-      (...)
-  }
-
-The design pattern is the same for an hrtimer or something similar that will
-return a single argument which is a pointer to a struct member in the
-callback.
-
-container_of() is a macro defined in <linux/kernel.h>.
-
-What container_of() does is obtain a pointer to the containing struct from
-a pointer to a member by a simple subtraction using the offsetof() macro from
-standard C, which allows something similar to object-oriented behaviour.
-Notice that the contained member must not be a pointer, but an actual member
-for this to work.
-
-We can see here that we avoid having global pointers to our struct foo *
-instance this way, while still keeping the number of parameters passed to the
-work function to a single pointer.
diff --git a/Documentation/driver-model/device.rst b/Documentation/driver-model/device.rst
deleted file mode 100644
index 2b868d49d349..000000000000
--- a/Documentation/driver-model/device.rst
+++ /dev/null
@@ -1,109 +0,0 @@
-==========================
-The Basic Device Structure
-==========================
-
-See the kerneldoc for the struct device.
-
-
-Programming Interface
-~~~~~~~~~~~~~~~~~~~~~
-The bus driver that discovers the device uses this to register the
-device with the core::
-
-  int device_register(struct device * dev);
-
-The bus should initialize the following fields:
-
-    - parent
-    - name
-    - bus_id
-    - bus
-
-A device is removed from the core when its reference count goes to
-0. The reference count can be adjusted using::
-
-  struct device * get_device(struct device * dev);
-  void put_device(struct device * dev);
-
-get_device() will return a pointer to the struct device passed to it
-if the reference count is not already 0 (that is, if the device is not
-already in the process of being removed).
-
-A driver can access the lock in the device structure using::
-
-  void lock_device(struct device * dev);
-  void unlock_device(struct device * dev);
-
-
-Attributes
-~~~~~~~~~~
-
-::
-
-  struct device_attribute {
-	struct attribute	attr;
-	ssize_t (*show)(struct device *dev, struct device_attribute *attr,
-			char *buf);
-	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
-			 const char *buf, size_t count);
-  };
-
-Attributes of devices can be exported by a device driver through sysfs.
-
-Please see Documentation/filesystems/sysfs.txt for more information
-on how sysfs works.
-
-As explained in Documentation/kobject.txt, device attributes must be
-created before the KOBJ_ADD uevent is generated. The only way to realize
-that is by defining an attribute group.
-
-Attributes are declared using a macro called DEVICE_ATTR::
-
-  #define DEVICE_ATTR(name,mode,show,store)
-
-Example::
-
-  static DEVICE_ATTR(type, 0444, show_type, NULL);
-  static DEVICE_ATTR(power, 0644, show_power, store_power);
-
-This declares two structures of type struct device_attribute with respective
-names 'dev_attr_type' and 'dev_attr_power'. These two attributes can be
-organized as follows into a group::
-
-  static struct attribute *dev_attrs[] = {
-	&dev_attr_type.attr,
-	&dev_attr_power.attr,
-	NULL,
-  };
-
-  static struct attribute_group dev_attr_group = {
-	.attrs = dev_attrs,
-  };
-
-  static const struct attribute_group *dev_attr_groups[] = {
-	&dev_attr_group,
-	NULL,
-  };
-
-This array of groups can then be associated with a device by setting the
-groups pointer in struct device before device_register() is invoked::
-
-        dev->groups = dev_attr_groups;
-        device_register(dev);
-
-The device_register() function will use the 'groups' pointer to create the
-device attributes and the device_unregister() function will use this pointer
-to remove the device attributes.
-
-Word of warning:  While the kernel allows device_create_file() and
-device_remove_file() to be called on a device at any time, userspace has
-strict expectations on when attributes get created.  When a new device is
-registered in the kernel, a uevent is generated to notify userspace (like
-udev) that a new device is available.  If attributes are added after the
-device is registered, then userspace won't get notified and userspace will
-not know about the new attributes.
-
-This is important for device drivers that need to publish additional
-attributes for a device at driver probe time.  If the device driver simply
-calls device_create_file() on the device structure passed to it, then
-userspace will never be notified of the new attributes.
diff --git a/Documentation/driver-model/devres.rst b/Documentation/driver-model/devres.rst
deleted file mode 100644
index 4ac99122b5f1..000000000000
--- a/Documentation/driver-model/devres.rst
+++ /dev/null
@@ -1,414 +0,0 @@
-================================
-Devres - Managed Device Resource
-================================
-
-Tejun Heo	<teheo@suse.de>
-
-First draft	10 January 2007
-
-.. contents
-
-   1. Intro			: Huh? Devres?
-   2. Devres			: Devres in a nutshell
-   3. Devres Group		: Group devres'es and release them together
-   4. Details			: Life time rules, calling context, ...
-   5. Overhead			: How much do we have to pay for this?
-   6. List of managed interfaces: Currently implemented managed interfaces
-
-
-1. Intro
---------
-
-devres came up while trying to convert libata to use iomap.  Each
-iomapped address should be kept and unmapped on driver detach.  For
-example, a plain SFF ATA controller (that is, good old PCI IDE) in
-native mode makes use of 5 PCI BARs and all of them should be
-maintained.
-
-As with many other device drivers, libata low level drivers have
-sufficient bugs in the ->remove and ->probe failure paths.  Well, yes,
-that's probably because libata low level driver developers are a lazy
-bunch, but aren't all low level driver developers?  After spending a
-day fiddling with braindamaged hardware with no documentation or
-braindamaged documentation, if it's finally working, well, it's working.
-
-For one reason or another, low level drivers don't receive as much
-attention or testing as core code, and bugs on driver detach or
-initialization failure don't happen often enough to be noticeable.
-The init failure path is worse because it's much less travelled yet
-needs to handle multiple entry points.
-
-So, many low level drivers end up leaking resources on driver detach
-and having a half-broken failure path implementation in ->probe() which
-would leak resources or even cause an oops when failure occurs.  iomap
-adds more to this mix.  So do msi and msix.
-
-
-2. Devres
----------
-
-devres is basically a linked list of arbitrarily sized memory areas
-associated with a struct device.  Each devres entry is associated with
-a release function.  A devres can be released in several ways.  No
-matter what, all devres entries are released on driver detach.  On
-release, the associated release function is invoked and then the
-devres entry is freed.
-
-Managed interfaces are created for resources commonly used by device
-drivers using devres.  For example, coherent DMA memory is acquired
-using dma_alloc_coherent().  The managed version is called
-dmam_alloc_coherent().  It is identical to dma_alloc_coherent() except
-that the DMA memory allocated using it is managed and will be
-automatically released on driver detach.  The implementation looks like
-the following::
-
-  struct dma_devres {
-	size_t		size;
-	void		*vaddr;
-	dma_addr_t	dma_handle;
-  };
-
-  static void dmam_coherent_release(struct device *dev, void *res)
-  {
-	struct dma_devres *this = res;
-
-	dma_free_coherent(dev, this->size, this->vaddr, this->dma_handle);
-  }
-
-  dmam_alloc_coherent(dev, size, dma_handle, gfp)
-  {
-	struct dma_devres *dr;
-	void *vaddr;
-
-	dr = devres_alloc(dmam_coherent_release, sizeof(*dr), gfp);
-	...
-
-	/* alloc DMA memory as usual */
-	vaddr = dma_alloc_coherent(...);
-	...
-
-	/* record size, vaddr, dma_handle in dr */
-	dr->vaddr = vaddr;
-	...
-
-	devres_add(dev, dr);
-
-	return vaddr;
-  }
-
-If a driver uses dmam_alloc_coherent(), the area is guaranteed to be
-freed whether initialization fails half-way or the device gets
-detached.  If most resources are acquired using the managed interface, a
-driver can have much simpler init and exit code.  The init path basically
-looks like the following::
-
-  my_init_one()
-  {
-	struct mydev *d;
-
-	d = devm_kzalloc(dev, sizeof(*d), GFP_KERNEL);
-	if (!d)
-		return -ENOMEM;
-
-	d->ring = dmam_alloc_coherent(...);
-	if (!d->ring)
-		return -ENOMEM;
-
-	if (check something)
-		return -EINVAL;
-	...
-
-	return register_to_upper_layer(d);
-  }
-
-And exit path::
-
-  my_remove_one()
-  {
-	unregister_from_upper_layer(d);
-	shutdown_my_hardware();
-  }
-
-As shown above, low level drivers can be simplified a lot by using
-devres.  Complexity is shifted from less maintained low level drivers
-to the better maintained higher layer.  Also, as the init failure path
-is shared with the exit path, both can get more testing.
-
-Note though that when converting current calls or assignments to
-managed devm_* versions it is up to you to check whether internal
-operations, like allocating memory, have failed. Managed resources
-pertain to the freeing of these resources *only* - all other checks
-needed are still on you. In some cases this may mean introducing checks
-that were not necessary before moving to the managed devm_* calls.
-
-
-3. Devres group
----------------
-
-Devres entries can be grouped using a devres group.  When a group is
-released, all contained normal devres entries and properly nested
-groups are released.  One use is to roll back a series of acquired
-resources on failure.  For example::
-
-  if (!devres_open_group(dev, NULL, GFP_KERNEL))
-	return -ENOMEM;
-
-  acquire A;
-  if (failed)
-	goto err;
-
-  acquire B;
-  if (failed)
-	goto err;
-  ...
-
-  devres_remove_group(dev, NULL);
-  return 0;
-
- err:
-  devres_release_group(dev, NULL);
-  return err_code;
-
-As resource acquisition failure usually means probe failure, constructs
-like the above are usually useful in a midlayer driver (e.g. libata core
-layer) where an interface function shouldn't have side effects on failure.
-For LLDs, just returning an error code suffices in most cases.
-
-Each group is identified by `void *id`.  It can either be explicitly
-specified by @id argument to devres_open_group() or automatically
-created by passing NULL as @id as in the above example.  In both
-cases, devres_open_group() returns the group's id.  The returned id
-can be passed to other devres functions to select the target group.
-If NULL is given to those functions, the latest open group is
-selected.
-
-For example, you can do something like the following::
-
-  int my_midlayer_create_something()
-  {
-	if (!devres_open_group(dev, my_midlayer_create_something, GFP_KERNEL))
-		return -ENOMEM;
-
-	...
-
-	devres_close_group(dev, my_midlayer_create_something);
-	return 0;
-  }
-
-  void my_midlayer_destroy_something()
-  {
-	devres_release_group(dev, my_midlayer_create_something);
-  }
-
-
-4. Details
-----------
-
-Lifetime of a devres entry begins on devres allocation and finishes
-when it is released or destroyed (removed and freed) - no reference
-counting.
-
-devres core guarantees atomicity to all basic devres operations and
-has support for single-instance devres types (atomic
-lookup-and-add-if-not-found).  Other than that, synchronizing
-concurrent accesses to allocated devres data is the caller's
-responsibility.  This is usually a non-issue because bus ops and
-resource allocations already do the job.
-
-For an example of a single-instance devres type, read pcim_iomap_table()
-in lib/devres.c.
-
-All devres interface functions can be called from any context if the
-right gfp mask is given.
-
-
-5. Overhead
------------
-
-The bookkeeping info for each devres is allocated together with the
-requested data area.  With the debug option turned off, bookkeeping info
-occupies 16 bytes on 32bit machines and 24 bytes on 64bit (three pointers
-rounded up to ull alignment).  If a singly linked list is used, it can
-be reduced to two pointers (8 bytes on 32bit, 16 bytes on 64bit).
-
-Each devres group occupies 8 pointers.  It can be reduced to 6 if a
-singly linked list is used.
-
-Memory space overhead on an ahci controller with two ports is between 300
-and 400 bytes on a 32bit machine after naive conversion (we can
-certainly invest a bit more effort into libata core layer).
-
-
-6. List of managed interfaces
------------------------------
-
-CLOCK
-  devm_clk_get()
-  devm_clk_get_optional()
-  devm_clk_put()
-  devm_clk_hw_register()
-  devm_of_clk_add_hw_provider()
-  devm_clk_hw_register_clkdev()
-
-DMA
-  dmaenginem_async_device_register()
-  dmam_alloc_coherent()
-  dmam_alloc_attrs()
-  dmam_free_coherent()
-  dmam_pool_create()
-  dmam_pool_destroy()
-
-DRM
-  devm_drm_dev_init()
-
-GPIO
-  devm_gpiod_get()
-  devm_gpiod_get_index()
-  devm_gpiod_get_index_optional()
-  devm_gpiod_get_optional()
-  devm_gpiod_put()
-  devm_gpiod_unhinge()
-  devm_gpiochip_add_data()
-  devm_gpio_request()
-  devm_gpio_request_one()
-  devm_gpio_free()
-
-I2C
-  devm_i2c_new_dummy_device()
-
-IIO
-  devm_iio_device_alloc()
-  devm_iio_device_free()
-  devm_iio_device_register()
-  devm_iio_device_unregister()
-  devm_iio_kfifo_allocate()
-  devm_iio_kfifo_free()
-  devm_iio_triggered_buffer_setup()
-  devm_iio_triggered_buffer_cleanup()
-  devm_iio_trigger_alloc()
-  devm_iio_trigger_free()
-  devm_iio_trigger_register()
-  devm_iio_trigger_unregister()
-  devm_iio_channel_get()
-  devm_iio_channel_release()
-  devm_iio_channel_get_all()
-  devm_iio_channel_release_all()
-
-INPUT
-  devm_input_allocate_device()
-
-IO region
-  devm_release_mem_region()
-  devm_release_region()
-  devm_release_resource()
-  devm_request_mem_region()
-  devm_request_region()
-  devm_request_resource()
-
-IOMAP
-  devm_ioport_map()
-  devm_ioport_unmap()
-  devm_ioremap()
-  devm_ioremap_nocache()
-  devm_ioremap_wc()
-  devm_ioremap_resource() : checks resource, requests memory region, ioremaps
-  devm_iounmap()
-  pcim_iomap()
-  pcim_iomap_regions()	: do request_region() and iomap() on multiple BARs
-  pcim_iomap_table()	: array of mapped addresses indexed by BAR
-  pcim_iounmap()
-
-IRQ
-  devm_free_irq()
-  devm_request_any_context_irq()
-  devm_request_irq()
-  devm_request_threaded_irq()
-  devm_irq_alloc_descs()
-  devm_irq_alloc_desc()
-  devm_irq_alloc_desc_at()
-  devm_irq_alloc_desc_from()
-  devm_irq_alloc_descs_from()
-  devm_irq_alloc_generic_chip()
-  devm_irq_setup_generic_chip()
-  devm_irq_sim_init()
-
-LED
-  devm_led_classdev_register()
-  devm_led_classdev_unregister()
-
-MDIO
-  devm_mdiobus_alloc()
-  devm_mdiobus_alloc_size()
-  devm_mdiobus_free()
-
-MEM
-  devm_free_pages()
-  devm_get_free_pages()
-  devm_kasprintf()
-  devm_kcalloc()
-  devm_kfree()
-  devm_kmalloc()
-  devm_kmalloc_array()
-  devm_kmemdup()
-  devm_kstrdup()
-  devm_kvasprintf()
-  devm_kzalloc()
-
-MFD
-  devm_mfd_add_devices()
-
-MUX
-  devm_mux_chip_alloc()
-  devm_mux_chip_register()
-  devm_mux_control_get()
-
-PER-CPU MEM
-  devm_alloc_percpu()
-  devm_free_percpu()
-
-PCI
-  devm_pci_alloc_host_bridge()  : managed PCI host bridge allocation
-  devm_pci_remap_cfgspace()	: ioremap PCI configuration space
-  devm_pci_remap_cfg_resource()	: ioremap PCI configuration space resource
-  pcim_enable_device()		: after success, all PCI ops become managed
-  pcim_pin_device()		: keep PCI device enabled after release
-
-PHY
-  devm_usb_get_phy()
-  devm_usb_put_phy()
-
-PINCTRL
-  devm_pinctrl_get()
-  devm_pinctrl_put()
-  devm_pinctrl_register()
-  devm_pinctrl_unregister()
-
-POWER
-  devm_reboot_mode_register()
-  devm_reboot_mode_unregister()
-
-PWM
-  devm_pwm_get()
-  devm_pwm_put()
-
-REGULATOR
-  devm_regulator_bulk_get()
-  devm_regulator_get()
-  devm_regulator_put()
-  devm_regulator_register()
-
-RESET
-  devm_reset_control_get()
-  devm_reset_controller_register()
-
-SERDEV
-  devm_serdev_device_open()
-
-SLAVE DMA ENGINE
-  devm_acpi_dma_controller_register()
-
-SPI
-  devm_spi_register_master()
-
-WATCHDOG
-  devm_watchdog_register_device()
diff --git a/Documentation/driver-model/driver.rst b/Documentation/driver-model/driver.rst
deleted file mode 100644
index 11d281506a04..000000000000
--- a/Documentation/driver-model/driver.rst
+++ /dev/null
@@ -1,223 +0,0 @@
-==============
-Device Drivers
-==============
-
-See the kerneldoc for the struct device_driver.
-
-
-Allocation
-~~~~~~~~~~
-
-Device drivers are statically allocated structures. Though there may
-be multiple devices in a system that a driver supports, struct
-device_driver represents the driver as a whole (not a particular
-device instance).
-
-Initialization
-~~~~~~~~~~~~~~
-
-The driver must initialize at least the name and bus fields. It should
-also initialize the devclass field (when it arrives), so it may obtain
-the proper linkage internally. It should also initialize as many of
-the callbacks as possible, though each is optional.
-
-Declaration
-~~~~~~~~~~~
-
-As stated above, struct device_driver objects are statically
-allocated. Below is an example declaration of the eepro100
-driver. This declaration is hypothetical only; it relies on the driver
-being converted completely to the new model::
-
-  static struct device_driver eepro100_driver = {
-         .name		= "eepro100",
-         .bus		= &pci_bus_type,
-
-         .probe		= eepro100_probe,
-         .remove		= eepro100_remove,
-         .suspend		= eepro100_suspend,
-         .resume		= eepro100_resume,
-  };
-
-Most drivers will not be able to be converted completely to the new
-model because the bus they belong to has a bus-specific structure with
-bus-specific fields that cannot be generalized.
-
-The most common example of this is device ID structures. A driver
-typically defines an array of device IDs that it supports. The format
-of these structures and the semantics for comparing device IDs are
-completely bus-specific. Defining them as generic entities would
-sacrifice type-safety, so we keep bus-specific structures around.
-
-Bus-specific drivers should include a generic struct device_driver in
-the definition of the bus-specific driver. Like this::
-
-  struct pci_driver {
-         const struct pci_device_id *id_table;
-         struct device_driver	  driver;
-  };
-
-A definition that included bus-specific fields would look like
-(using the eepro100 driver again)::
-
-  static struct pci_driver eepro100_driver = {
-         .id_table       = eepro100_pci_tbl,
-         .driver	       = {
-		.name		= "eepro100",
-		.bus		= &pci_bus_type,
-		.probe		= eepro100_probe,
-		.remove		= eepro100_remove,
-		.suspend	= eepro100_suspend,
-		.resume		= eepro100_resume,
-         },
-  };
-
-Some may find the syntax of embedded struct initialization awkward or
-even a bit ugly. So far, it's the best way we've found to do what we want...
-
-Registration
-~~~~~~~~~~~~
-
-::
-
-  int driver_register(struct device_driver *drv);
-
-The driver registers the structure on startup. Drivers that have
-no bus-specific fields (i.e. don't have a bus-specific driver
-structure) use driver_register() and pass a pointer to their
-struct device_driver object.
-
-Most drivers, however, will have a bus-specific structure and will
-need to register with the bus using something like pci_register_driver().
-
-It is important that drivers register their driver structure as early as
-possible. Registration with the core initializes several fields in the
-struct device_driver object, including the reference count and the
-lock. These fields are assumed to be valid at all times and may be
-used by the device model core or the bus driver.
-
-
-Transition Bus Drivers
-~~~~~~~~~~~~~~~~~~~~~~
-
-By defining wrapper functions, the transition to the new model can be
-made easier. Drivers can ignore the generic structure altogether and
-let the bus wrapper fill in the fields. For the callbacks, the bus can
-define generic callbacks that forward the call to the bus-specific
-callbacks of the drivers.
-
-This solution is intended to be only temporary. In order to get class
-information in the driver, the drivers must be modified anyway. Since
-converting drivers to the new model should reduce some infrastructural
-complexity and code size, it is recommended that they are converted as
-class information is added.
-
-Access
-~~~~~~
-
-Once the object has been registered, the driver may access the common
-fields of the object, like the lock and the list of devices::
-
-  int driver_for_each_dev(struct device_driver *drv, void *data,
-			  int (*callback)(struct device *dev, void *data));
-
-The devices field is a list of all the devices that have been bound to
-the driver. The LDM core provides a helper function to operate on all
-the devices a driver controls. This helper locks the driver on each
-node access, and does proper reference counting on each device as it
-accesses it.
-
-
-sysfs
-~~~~~
-
-When a driver is registered, a sysfs directory is created in its
-bus's directory. In this directory, the driver can export an interface
-to userspace to control operation of the driver on a global basis;
-e.g. toggling debugging output in the driver.
-
-A future feature of this directory will be a 'devices' directory. This
-directory will contain symlinks to the directories of devices it
-supports.
-
-
-
-Callbacks
-~~~~~~~~~
-
-::
-
-	int	(*probe)	(struct device *dev);
-
-The probe() entry is called in task context, with the bus's rwsem locked
-and the driver partially bound to the device.  Drivers commonly use
-container_of() to convert "dev" to a bus-specific type, both in probe()
-and other routines.  That type often provides device resource data, such
-as pci_dev.resource[] or platform_device.resources, which is used in
-addition to dev->platform_data to initialize the driver.
-
-This callback holds the driver-specific logic to bind the driver to a
-given device.  That includes verifying that the device is present, that
-it's a version the driver can handle, that driver data structures can
-be allocated and initialized, and that any hardware can be initialized.
-Drivers often store a pointer to their state with dev_set_drvdata().
-When the driver has successfully bound itself to that device, then probe()
-returns zero and the driver model code will finish its part of binding
-the driver to that device.
-
-A driver's probe() may return a negative errno value to indicate that
-the driver did not bind to this device, in which case it should have
-released all resources it allocated. ::
-
-	int	(*remove)	(struct device *dev);
-
-remove is called to unbind a driver from a device. This may be
-called if a device is physically removed from the system, if the
-driver module is being unloaded, during a reboot sequence, or
-in other cases.
-
-It is up to the driver to determine if the device is present or
-not. It should free any resources allocated specifically for the
-device; i.e. anything in the device's driver_data field.
-
-If the device is still present, it should quiesce the device and place
-it into a supported low-power state. ::
-
-	int	(*suspend)	(struct device *dev, pm_message_t state);
-
-suspend is called to put the device in a low power state. ::
-
-	int	(*resume)	(struct device *dev);
-
-Resume is used to bring a device back from a low power state.
-
-
-Attributes
-~~~~~~~~~~
-
-::
-
-  struct driver_attribute {
-          struct attribute        attr;
-          ssize_t (*show)(struct device_driver *driver, char *buf);
-          ssize_t (*store)(struct device_driver *, const char *buf, size_t count);
-  };
-
-Device drivers can export attributes via their sysfs directories.
-Drivers can declare attributes using the DRIVER_ATTR_RW and DRIVER_ATTR_RO
-macros, which work identically to the DEVICE_ATTR_RW and DEVICE_ATTR_RO
-macros.
-
-Example::
-
-	DRIVER_ATTR_RW(debug);
-
-This is equivalent to declaring::
-
-	struct driver_attribute driver_attr_debug;
-
-This can then be used to add and remove the attribute from the
-driver's directory using::
-
-  int driver_create_file(struct device_driver *, const struct driver_attribute *);
-  void driver_remove_file(struct device_driver *, const struct driver_attribute *);
diff --git a/Documentation/driver-model/index.rst b/Documentation/driver-model/index.rst
deleted file mode 100644
index 9f85d579ce56..000000000000
--- a/Documentation/driver-model/index.rst
+++ /dev/null
@@ -1,26 +0,0 @@
-:orphan:
-
-============
-Driver Model
-============
-
-.. toctree::
-   :maxdepth: 1
-
-   binding
-   bus
-   class
-   design-patterns
-   device
-   devres
-   driver
-   overview
-   platform
-   porting
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/driver-model/overview.rst b/Documentation/driver-model/overview.rst
deleted file mode 100644
index d4d1e9b40e0c..000000000000
--- a/Documentation/driver-model/overview.rst
+++ /dev/null
@@ -1,124 +0,0 @@
-=============================
-The Linux Kernel Device Model
-=============================
-
-Patrick Mochel	<mochel@digitalimplant.org>
-
-Drafted 26 August 2002
-Updated 31 January 2006
-
-
-Overview
-~~~~~~~~
-
-The Linux Kernel Driver Model is a unification of all the disparate driver
-models that were previously used in the kernel. It is intended to augment the
-bus-specific drivers for bridges and devices by consolidating a set of data
-and operations into globally accessible data structures.
-
-Traditional driver models implemented some sort of tree-like structure
-(sometimes just a list) for the devices they control. There wasn't any
-uniformity across the different bus types.
-
-The current driver model provides a common, uniform data model for describing
-a bus and the devices that can appear under the bus. The unified bus
-model includes a set of common attributes which all busses carry, and a set
-of common callbacks, such as device discovery during bus probing, bus
-shutdown, bus power management, etc.
-
-The common device and bridge interface reflects the goals of the modern
-computer: namely the ability to do seamless device "plug and play", power
-management, and hot plug. In particular, the model dictated by Intel and
-Microsoft (namely ACPI) ensures that almost every device on almost any bus
-on an x86-compatible system can work within this paradigm.  Of course,
-not every bus is able to support all such operations, although most
-buses support most of those operations.
-
-
-Downstream Access
-~~~~~~~~~~~~~~~~~
-
-Common data fields have been moved out of individual bus layers into a common
-data structure. These fields must still be accessed by the bus layers,
-and sometimes by the device-specific drivers.
-
-Other bus layers are encouraged to do what has been done for the PCI layer.
-struct pci_dev now looks like this::
-
-  struct pci_dev {
-	...
-
-	struct device dev;     /* Generic device interface */
-	...
-  };
-
-Note first that the struct device dev within the struct pci_dev is
-statically allocated. This means only one allocation on device discovery.
-
-Note also that struct device dev is not necessarily defined at the
-front of the pci_dev structure.  This is to make people think about what
-they're doing when switching between the bus driver and the global driver,
-and to discourage meaningless and incorrect casts between the two.
-
-The PCI bus layer freely accesses the fields of struct device. It knows about
-the structure of struct pci_dev, and it should know the structure of struct
-device. Individual PCI device drivers that have been converted to the current
-driver model generally do not and should not touch the fields of struct device,
-unless there is a compelling reason to do so.
-
-The above abstraction prevents unnecessary pain during transitional phases.
-If it were not done this way, then when a field was renamed or removed, every
-downstream driver would break.  On the other hand, if only the bus layer
-(and not the device layer) accesses the struct device, it is only the bus
-layer that needs to change.
-
-
-User Interface
-~~~~~~~~~~~~~~
-
-By virtue of having a complete hierarchical view of all the devices in the
-system, exporting a complete hierarchical view to userspace becomes relatively
-easy. This has been accomplished by implementing a special purpose virtual
-file system named sysfs.
-
-Almost all mainstream Linux distros mount this filesystem automatically; you
-can see some variation of the following in the output of the "mount" command::
-
-  $ mount
-  ...
-  none on /sys type sysfs (rw,noexec,nosuid,nodev)
-  ...
-  $
-
-The auto-mounting of sysfs is typically accomplished by an entry similar to
-the following in the /etc/fstab file::
-
-  none     	/sys	sysfs    defaults	  	0 0
-
-or something similar in the /lib/init/fstab file on Debian-based systems::
-
-  none            /sys    sysfs    nodev,noexec,nosuid    0 0
-
-If sysfs is not automatically mounted, you can always do it manually with::
-
-	# mount -t sysfs sysfs /sys
-
-Whenever a device is inserted into the tree, a directory is created for it.
-This directory may be populated at each layer of discovery - the global layer,
-the bus layer, or the device layer.
-
-The global layer currently creates two files - 'name' and 'power'. The
-former only reports the name of the device. The latter reports the
-current power state of the device. It will also be used to set the current
-power state.
-
-The bus layer may also create files for the devices it finds while probing the
-bus. For example, the PCI layer currently creates 'irq' and 'resource' files
-for each PCI device.
-
-A device-specific driver may also export files in its directory to expose
-device-specific data or tunable interfaces.
-
-More information about the sysfs directory layout can be found in
-the other documents in this directory and in the file
-Documentation/filesystems/sysfs.txt.
diff --git a/Documentation/driver-model/platform.rst b/Documentation/driver-model/platform.rst
deleted file mode 100644
index 334dd4071ae4..000000000000
--- a/Documentation/driver-model/platform.rst
+++ /dev/null
@@ -1,246 +0,0 @@
-============================
-Platform Devices and Drivers
-============================
-
-See <linux/platform_device.h> for the driver model interface to the
-platform bus:  platform_device, and platform_driver.  This pseudo-bus
-is used to connect devices on busses with minimal infrastructure,
-like those used to integrate peripherals on many system-on-chip
-processors, or some "legacy" PC interconnects; as opposed to large
-formally specified ones like PCI or USB.
-
-
-Platform devices
-~~~~~~~~~~~~~~~~
-Platform devices are devices that typically appear as autonomous
-entities in the system. This includes legacy port-based devices and
-host bridges to peripheral buses, and most controllers integrated
-into system-on-chip platforms.  What they usually have in common
-is direct addressing from a CPU bus.  Rarely, a platform_device will
-be connected through a segment of some other kind of bus; but its
-registers will still be directly addressable.
-
-Platform devices are given a name, used in driver binding, and a
-list of resources such as addresses and IRQs::
-
-  struct platform_device {
-	const char	*name;
-	u32		id;
-	struct device	dev;
-	u32		num_resources;
-	struct resource	*resource;
-  };
-
-
-Platform drivers
-~~~~~~~~~~~~~~~~
-Platform drivers follow the standard driver model convention, where
-discovery/enumeration is handled outside the drivers, and drivers
-provide probe() and remove() methods.  They support power management
-and shutdown notifications using the standard conventions::
-
-  struct platform_driver {
-	int (*probe)(struct platform_device *);
-	int (*remove)(struct platform_device *);
-	void (*shutdown)(struct platform_device *);
-	int (*suspend)(struct platform_device *, pm_message_t state);
-	int (*suspend_late)(struct platform_device *, pm_message_t state);
-	int (*resume_early)(struct platform_device *);
-	int (*resume)(struct platform_device *);
-	struct device_driver driver;
-  };
-
-Note that probe() should in general verify that the specified device hardware
-actually exists; sometimes platform setup code can't be sure.  The probing
-can use device resources, including clocks, and device platform_data.
-
-Platform drivers register themselves the normal way::
-
-	int platform_driver_register(struct platform_driver *drv);
-
-Or, in common situations where the device is known not to be hot-pluggable,
-the probe() routine can live in an init section to reduce the driver's
-runtime memory footprint::
-
-	int platform_driver_probe(struct platform_driver *drv,
-			  int (*probe)(struct platform_device *))
-
-Kernel modules can be composed of several platform drivers. The platform core
-provides helpers to register and unregister an array of drivers::
-
-	int __platform_register_drivers(struct platform_driver * const *drivers,
-				      unsigned int count, struct module *owner);
-	void platform_unregister_drivers(struct platform_driver * const *drivers,
-					 unsigned int count);
-
-If one of the drivers fails to register, all drivers registered up to that
-point will be unregistered in reverse order. Note that there is a convenience
-macro that passes THIS_MODULE as owner parameter::
-
-	#define platform_register_drivers(drivers, count)
-
-
-Device Enumeration
-~~~~~~~~~~~~~~~~~~
-As a rule, platform specific (and often board-specific) setup code will
-register platform devices::
-
-	int platform_device_register(struct platform_device *pdev);
-
-	int platform_add_devices(struct platform_device **pdevs, int ndev);
-
-The general rule is to register only those devices that actually exist,
-but in some cases extra devices might be registered.  For example, a kernel
-might be configured to work with an external network adapter that might not
-be populated on all boards, or likewise to work with an integrated controller
-that some boards might not hook up to any peripherals.
-
-In some cases, boot firmware will export tables describing the devices
-that are populated on a given board.   Without such tables, often the
-only way for system setup code to set up the correct devices is to build
-a kernel for a specific target board.  Such board-specific kernels are
-common with embedded and custom systems development.
-
-In many cases, the memory and IRQ resources associated with the platform
-device are not enough to let the device's driver work.  Board setup code
-will often provide additional information through the device's
-platform_data field.
-
-Embedded systems frequently need one or more clocks for platform devices,
-which are normally kept off until they're actively needed (to save power).
-System setup also associates those clocks with the device, so that
-calls to clk_get(&pdev->dev, clock_name) return them as needed.
-
-
-Legacy Drivers:  Device Probing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Some drivers are not fully converted to the driver model, because they take
-on a non-driver role:  the driver registers its platform device, rather than
-leaving that for system infrastructure.  Such drivers can't be hotplugged
-or coldplugged, since those mechanisms require device creation to be in a
-different system component than the driver.
-
-The only "good" reason for this is to handle older system designs which, like
-original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware
-configuration.  Newer systems have largely abandoned that model, in favor of
-bus-level support for dynamic configuration (PCI, USB), or device tables
-provided by the boot firmware (e.g. PNPACPI on x86).  There are too many
-conflicting options about what might be where, and even educated guesses by
-an operating system will be wrong often enough to make trouble.
-
-This style of driver is discouraged.  If you're updating such a driver,
-please try to move the device enumeration to a more appropriate location,
-outside the driver.  This will usually be cleanup, since such drivers
-tend to already have "normal" modes, such as ones using device nodes that
-were created by PNP or by platform device setup.
-
-None the less, there are some APIs to support such legacy drivers.  Avoid
-using these calls except with such hotplug-deficient drivers::
-
-	struct platform_device *platform_device_alloc(
-			const char *name, int id);
-
-You can use platform_device_alloc() to dynamically allocate a device, which
-you will then initialize with resources and platform_device_register().
-A better solution is usually::
-
-	struct platform_device *platform_device_register_simple(
-			const char *name, int id,
-			struct resource *res, unsigned int nres);
-
-You can use platform_device_register_simple() as a one-step call to allocate
-and register a device.
-
-
-Device Naming and Driver Binding
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The platform_device.dev.bus_id is the canonical name for the devices.
-It's built from two components:
-
-    * platform_device.name ... which is also used for driver matching.
-
-    * platform_device.id ... the device instance number, or else "-1"
-      to indicate there's only one.
-
-These are concatenated, so name/id "serial"/0 indicates bus_id "serial.0", and
-"serial"/3 indicates bus_id "serial.3"; both would use the platform_driver
-named "serial".  While "my_rtc"/-1 would be bus_id "my_rtc" (no instance id)
-and use the platform_driver called "my_rtc".
-
-Driver binding is performed automatically by the driver core, invoking
-driver probe() after finding a match between device and driver.  If the
-probe() succeeds, the driver and device are bound as usual.  There are
-three different ways to find such a match:
-
-    - Whenever a device is registered, the drivers for that bus are
-      checked for matches.  Platform devices should be registered very
-      early during system boot.
-
-    - When a driver is registered using platform_driver_register(), all
-      unbound devices on that bus are checked for matches.  Drivers
-      usually register later during booting, or by module loading.
-
-    - Registering a driver using platform_driver_probe() works just like
-      using platform_driver_register(), except that the driver won't
-      be probed later if another device registers.  (Which is OK, since
-      this interface is only for use with non-hotpluggable devices.)
-
-
-Early Platform Devices and Drivers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The early platform interfaces provide platform data to platform device
-drivers early on during the system boot. The code is built on top of the
-early_param() command line parsing and can be executed very early on.
-
-Example: "earlyprintk" class early serial console in 6 steps
-
-1. Registering early platform device data
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The architecture code registers platform device data using the function
-early_platform_add_devices(). In the case of early serial console this
-should be hardware configuration for the serial port. Devices registered
-at this point will later on be matched against early platform drivers.
-
-2. Parsing kernel command line
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The architecture code calls parse_early_param() to parse the kernel
-command line. This will execute all matching early_param() callbacks.
-User specified early platform devices will be registered at this point.
-For the early serial console case the user can specify the port on the
-kernel command line as "earlyprintk=serial.0" where "earlyprintk" is
-the class string, "serial" is the name of the platform driver and
-0 is the platform device id. If the id is -1 then the dot and the
-id can be omitted.
-
-3. Installing early platform drivers belonging to a certain class
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The architecture code may optionally force registration of all early
-platform drivers belonging to a certain class using the function
-early_platform_driver_register_all(). User specified devices from
-step 2 have priority over these. This step is omitted by the serial
-driver example since the early serial driver code should be disabled
-unless the user has specified a port on the kernel command line.
-
-4. Early platform driver registration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Compiled-in platform drivers making use of early_platform_init() are
-automatically registered during step 2 or 3. The serial driver example
-should use early_platform_init("earlyprintk", &platform_driver).
-
-5. Probing of early platform drivers belonging to a certain class
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The architecture code calls early_platform_driver_probe() to match
-registered early platform devices associated with a certain class with
-registered early platform drivers. Matched devices will get probed().
-This step can be executed at any point during the early boot. As soon
-as possible may be good for the serial port case.
-
-6. Inside the early platform driver probe()
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The driver code needs to take special care during early boot, especially
-when it comes to memory allocation and interrupt registration. The code
-in the probe() function can use is_early_platform_device() to check if
-it is called at early platform device time or at regular platform device
-time. The early serial driver performs register_console() at this point.
-
-For further information, see <linux/platform_device.h>.
diff --git a/Documentation/driver-model/porting.rst b/Documentation/driver-model/porting.rst
deleted file mode 100644
index ae4bf843c1d6..000000000000
--- a/Documentation/driver-model/porting.rst
+++ /dev/null
@@ -1,448 +0,0 @@
-=======================================
-Porting Drivers to the New Driver Model
-=======================================
-
-Patrick Mochel
-
-7 January 2003
-
-
-Overview
-
-Please refer to `Documentation/driver-model/*.rst` for definitions of
-various driver types and concepts.
-
-Most of the work of porting device drivers to the new model happens
-at the bus driver layer. This was intentional, to minimize the
-negative effect on kernel drivers, and to allow a gradual transition
-of bus drivers.
-
-In a nutshell, the driver model consists of a set of objects that can
-be embedded in larger, bus-specific objects. Fields in these generic
-objects can replace fields in the bus-specific objects.
-
-The generic objects must be registered with the driver model core. By
-doing so, they will be exported via the sysfs filesystem. sysfs can be
-mounted by doing::
-
-	# mount -t sysfs sysfs /sys
-
-
-
-The Process
-
-Step 0: Read include/linux/device.h for object and function definitions.
-
-Step 1: Registering the bus driver.
-
-
-- Define a struct bus_type for the bus driver::
-
-    struct bus_type pci_bus_type = {
-          .name           = "pci",
-    };
-
-
-- Register the bus type.
-
-  This should be done in the initialization function for the bus type,
-  which is usually the module_init(), or equivalent, function::
-
-    static int __init pci_driver_init(void)
-    {
-            return bus_register(&pci_bus_type);
-    }
-
-    subsys_initcall(pci_driver_init);
-
-
-  The bus type may be unregistered (if the bus driver may be compiled
-  as a module) by doing::
-
-     bus_unregister(&pci_bus_type);
-
-
-- Export the bus type for others to use.
-
-  Other code may wish to reference the bus type, so declare it in a
-  shared header file and export the symbol.
-
-From include/linux/pci.h::
-
-  extern struct bus_type pci_bus_type;
-
-
-From the file the above code appears in::
-
-  EXPORT_SYMBOL(pci_bus_type);
-
-
-
-- This will cause the bus to show up in /sys/bus/pci/ with two
-  subdirectories: 'devices' and 'drivers'::
-
-    # tree -d /sys/bus/pci/
-    /sys/bus/pci/
-    |-- devices
-    `-- drivers
-
-
-
-Step 2: Registering Devices.
-
-struct device represents a single device. It mainly contains metadata
-describing the relationship the device has to other entities.
-
-
-- Embed a struct device in the bus-specific device type::
-
-
-    struct pci_dev {
-           ...
-           struct  device  dev;            /* Generic device interface */
-           ...
-    };
-
-  It is recommended that the generic device not be the first item in
-  the struct to discourage programmers from doing mindless casts
-  between the object types. Instead macros, or inline functions,
-  should be created to convert from the generic object type::
-
-
-    #define to_pci_dev(n) container_of(n, struct pci_dev, dev)
-
-    or
-
-    static inline struct pci_dev * to_pci_dev(struct device * n)
-    {
-	return container_of(n, struct pci_dev, dev);
-    }
-
-  This allows the compiler to verify type-safety of the operations
-  that are performed (which is Good).
-
-
-- Initialize the device on registration.
-
-  When devices are discovered or registered with the bus type, the
-  bus driver should initialize the generic device. The most important
-  things to initialize are the bus_id, parent, and bus fields.
-
-  The bus_id is an ASCII string that contains the device's address on
-  the bus. The format of this string is bus-specific. This is
-  necessary for representing devices in sysfs.
-
-  parent is the physical parent of the device. It is important that
-  the bus driver sets this field correctly.
-
-  The driver model maintains an ordered list of devices that it uses
-  for power management. This list must be kept in order to guarantee that
-  devices are shut down before their physical parents, and vice versa.
-  The order of this list is determined by the parent of registered
-  devices.
-
-  Also, the location of the device's sysfs directory depends on a
-  device's parent. sysfs exports a directory structure that mirrors
-  the device hierarchy. Accurately setting the parent guarantees that
-  sysfs will accurately represent the hierarchy.
-
-  The device's bus field is a pointer to the bus type the device
-  belongs to. This should be set to the bus_type that was declared
-  and initialized before.
-
-  Optionally, the bus driver may set the device's name and release
-  fields.
-
-  The name field is an ASCII string describing the device, like
-
-     "ATI Technologies Inc Radeon QD"
-
-  The release field is a callback that the driver model core calls
-  when the device has been removed, and all references to it have
-  been released. More on this in a moment.
-
-
-- Register the device.
-
-  Once the generic device has been initialized, it can be registered
-  with the driver model core by doing::
-
-       device_register(&dev->dev);
-
-  It can later be unregistered by doing::
-
-       device_unregister(&dev->dev);
-
-  This should happen on buses that support hotpluggable devices.
-  If a bus driver unregisters a device, it should not immediately free
-  it. It should instead wait for the driver model core to call the
-  device's release method, then free the bus-specific object.
-  (There may be other code that is currently referencing the device
-  structure, and it would be rude to free the device while that is
-  happening).
-
-
-  When the device is registered, a directory in sysfs is created.
-  The PCI tree in sysfs looks like::
-
-    /sys/devices/pci0/
-    |-- 00:00.0
-    |-- 00:01.0
-    |   `-- 01:00.0
-    |-- 00:02.0
-    |   `-- 02:1f.0
-    |       `-- 03:00.0
-    |-- 00:1e.0
-    |   `-- 04:04.0
-    |-- 00:1f.0
-    |-- 00:1f.1
-    |   |-- ide0
-    |   |   |-- 0.0
-    |   |   `-- 0.1
-    |   `-- ide1
-    |       `-- 1.0
-    |-- 00:1f.2
-    |-- 00:1f.3
-    `-- 00:1f.5
-
-  Also, symlinks are created in the bus's 'devices' directory
-  that point to the device's directory in the physical hierarchy::
-
-    /sys/bus/pci/devices/
-    |-- 00:00.0 -> ../../../devices/pci0/00:00.0
-    |-- 00:01.0 -> ../../../devices/pci0/00:01.0
-    |-- 00:02.0 -> ../../../devices/pci0/00:02.0
-    |-- 00:1e.0 -> ../../../devices/pci0/00:1e.0
-    |-- 00:1f.0 -> ../../../devices/pci0/00:1f.0
-    |-- 00:1f.1 -> ../../../devices/pci0/00:1f.1
-    |-- 00:1f.2 -> ../../../devices/pci0/00:1f.2
-    |-- 00:1f.3 -> ../../../devices/pci0/00:1f.3
-    |-- 00:1f.5 -> ../../../devices/pci0/00:1f.5
-    |-- 01:00.0 -> ../../../devices/pci0/00:01.0/01:00.0
-    |-- 02:1f.0 -> ../../../devices/pci0/00:02.0/02:1f.0
-    |-- 03:00.0 -> ../../../devices/pci0/00:02.0/02:1f.0/03:00.0
-    `-- 04:04.0 -> ../../../devices/pci0/00:1e.0/04:04.0
-
-
-
-Step 3: Registering Drivers.
-
-struct device_driver is a simple driver structure that contains a set
-of operations that the driver model core may call.
-
-
-- Embed a struct device_driver in the bus-specific driver.
-
-  Just like with devices, do something like::
-
-    struct pci_driver {
-           ...
-           struct device_driver    driver;
-    };
-
-
-- Initialize the generic driver structure.
-
-  When the driver registers with the bus (e.g. doing pci_register_driver()),
-  initialize the necessary fields of the driver: the name and bus
-  fields.
-
-
-- Register the driver.
-
-  After the generic driver has been initialized, call::
-
-	driver_register(&drv->driver);
-
-  to register the driver with the core.
-
-  When the driver is unregistered from the bus, unregister it from the
-  core by doing::
-
-        driver_unregister(&drv->driver);
-
-  Note that this will block until all references to the driver have
-  gone away. Normally, there will not be any.
-
-
-- Sysfs representation.
-
-  Drivers are exported via sysfs in their bus's 'drivers' directory.
-  For example::
-
-    /sys/bus/pci/drivers/
-    |-- 3c59x
-    |-- Ensoniq AudioPCI
-    |-- agpgart-amdk7
-    |-- e100
-    `-- serial
-
-
-Step 4: Define Generic Methods for Drivers.
-
-struct device_driver defines a set of operations that the driver model
-core calls. Most of these operations are probably similar to
-operations the bus already defines for drivers, but taking different
-parameters.
-
-It would be difficult and tedious to force every driver on a bus to
-convert to the generic format simultaneously. Instead, the
-bus driver should define single instances of the generic methods that
-forward calls to the bus-specific drivers. For instance::
-
-
-  static int pci_device_remove(struct device * dev)
-  {
-          struct pci_dev * pci_dev = to_pci_dev(dev);
-          struct pci_driver * drv = pci_dev->driver;
-
-          if (drv) {
-                  if (drv->remove)
-                          drv->remove(pci_dev);
-                  pci_dev->driver = NULL;
-          }
-          return 0;
-  }
-
-
-The generic driver should be initialized with these methods before it
-is registered::
-
-        /* initialize common driver fields */
-        drv->driver.name = drv->name;
-        drv->driver.bus = &pci_bus_type;
-        drv->driver.probe = pci_device_probe;
-        drv->driver.resume = pci_device_resume;
-        drv->driver.suspend = pci_device_suspend;
-        drv->driver.remove = pci_device_remove;
-
-        /* register with core */
-        driver_register(&drv->driver);
-
-
-Ideally, the bus should only initialize the fields if they are not
-already set. This allows the drivers to implement their own generic
-methods.
-
-
-Step 5: Support generic driver binding.
-
-The model assumes that a device or driver can be dynamically
-registered with the bus at any time. When registration happens,
-devices must be bound to a driver, or a driver must be bound to all
-devices that it supports.
-
-A driver typically contains a list of device IDs that it supports. The
-bus driver compares these IDs to the IDs of devices registered with it.
-The format of the device IDs, and the semantics for comparing them are
-bus-specific, so the generic model does not attempt to generalize them.
-
-Instead, a bus may supply a method in struct bus_type that does the
-comparison::
-
-  int (*match)(struct device * dev, struct device_driver * drv);
-
-match should return a positive value if the driver supports the device,
-and zero otherwise. It may also return an error code (for example
--EPROBE_DEFER) if it cannot determine whether the given driver supports
-the device.
-
-When a device is registered, the bus's list of drivers is iterated
-over. bus->match() is called for each one until a match is found.
-
-When a driver is registered, the bus's list of devices is iterated
-over. bus->match() is called for each device that is not already
-claimed by a driver.
-
-When a device is successfully bound to a driver, device->driver is
-set, the device is added to a per-driver list of devices, and a
-symlink is created in the driver's sysfs directory that points to the
-device's physical directory::
-
-  /sys/bus/pci/drivers/
-  |-- 3c59x
-  |   `-- 00:0b.0 -> ../../../../devices/pci0/00:0b.0
-  |-- Ensoniq AudioPCI
-  |-- agpgart-amdk7
-  |   `-- 00:00.0 -> ../../../../devices/pci0/00:00.0
-  |-- e100
-  |   `-- 00:0c.0 -> ../../../../devices/pci0/00:0c.0
-  `-- serial
-
-
-This driver binding should replace the existing driver binding
-mechanism the bus currently uses.
-
-
-Step 6: Supply a hotplug callback.
-
-Whenever a device is registered with the driver model core, the
-userspace program /sbin/hotplug is called to notify userspace.
-Users can define actions to perform when a device is inserted or
-removed.
-
-The driver model core passes several arguments to userspace via
-environment variables, including
-
-- ACTION: set to 'add' or 'remove'
-- DEVPATH: set to the device's physical path in sysfs.
-
-A bus driver may also supply additional parameters for userspace to
-consume. To do this, a bus must implement the 'hotplug' method in
-struct bus_type::
-
-     int (*hotplug) (struct device *dev, char **envp,
-                     int num_envp, char *buffer, int buffer_size);
-
-This is called immediately before /sbin/hotplug is executed.
-
-
-Step 7: Cleaning up the bus driver.
-
-The generic bus, device, and driver structures provide several fields
-that can replace those defined privately to the bus driver.
-
-- Device list.
-
-struct bus_type contains a list of all devices registered with the bus
-type. This includes all devices on all instances of that bus type.
-An internal list that the bus uses may be removed, in favor of using
-this one.
-
-The core provides an iterator to access these devices::
-
-  int bus_for_each_dev(struct bus_type * bus, struct device * start,
-                       void * data, int (*fn)(struct device *, void *));
-
-
-- Driver list.
-
-struct bus_type also contains a list of all drivers registered with
-it. An internal list of drivers that the bus driver maintains may
-be removed in favor of using the generic one.
-
-The drivers may be iterated over, like devices::
-
-  int bus_for_each_drv(struct bus_type * bus, struct device_driver * start,
-                       void * data, int (*fn)(struct device_driver *, void *));
-
-
-Please see drivers/base/bus.c for more information.
-
-
-- rwsem
-
-struct bus_type contains an rwsem that protects all core accesses to
-the device and driver lists. This can be used by the bus driver
-internally, and should be used when accessing the device or driver
-lists the bus maintains.
-
-
-- Device and driver fields.
-
-Some of the fields in struct device and struct device_driver duplicate
-fields in the bus-specific representations of these objects. Feel free
-to remove the bus-specific ones and favor the generic ones. Note
-though, that this will likely mean fixing up all the drivers that
-reference the bus-specific fields (though those should all be 1-line
-changes).
diff --git a/Documentation/eisa.txt b/Documentation/eisa.txt
index f388545a85a7..c07565ba57da 100644
--- a/Documentation/eisa.txt
+++ b/Documentation/eisa.txt
@@ -103,7 +103,7 @@ id_table	an array of NULL terminated EISA id strings,
 		(driver_data).
 
 driver		a generic driver, such as described in
-		Documentation/driver-model/driver.rst. Only .name,
+		Documentation/driver-api/driver-model/driver.rst. Only .name,
 		.probe and .remove members are mandatory.
 =============== ====================================================
 
@@ -152,7 +152,7 @@ state    set of flags indicating the state of the device. Current
 	 flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED.
 res	 set of four 256 bytes I/O regions allocated to this device
 dma_mask DMA mask set from the parent device.
-dev	 generic device (see Documentation/driver-model/device.rst)
+dev	 generic device (see Documentation/driver-api/driver-model/device.rst)
 ======== ============================================================
 
 You can get the 'struct eisa_device' from 'struct device' using the
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index 5b5311f9358d..ddf15b1b0d5a 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -319,7 +319,7 @@ quick way to lookup the sysfs interface for a device from the result of
 a stat(2) operation.
 
 More information can driver-model specific features can be found in
-Documentation/driver-model/. 
+Documentation/driver-api/driver-model/.
 
 
 TODO: Finish this section.
diff --git a/Documentation/hwmon/submitting-patches.rst b/Documentation/hwmon/submitting-patches.rst
index d5b05d3e54ba..452fc28d8e0b 100644
--- a/Documentation/hwmon/submitting-patches.rst
+++ b/Documentation/hwmon/submitting-patches.rst
@@ -89,7 +89,7 @@ increase the chances of your change being accepted.
   console. Excessive logging can seriously affect system performance.
 
 * Use devres functions whenever possible to allocate resources. For rationale
-  and supported functions, please see Documentation/driver-model/devres.rst.
+  and supported functions, please see Documentation/driver-api/driver-model/devres.rst.
   If a function is not supported by devres, consider using devm_add_action().
 
 * If the driver has a detect function, make sure it is silent. Debug messages
diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt b/Documentation/translations/zh_CN/filesystems/sysfs.txt
index 452271dda141..ee1f37da5b23 100644
--- a/Documentation/translations/zh_CN/filesystems/sysfs.txt
+++ b/Documentation/translations/zh_CN/filesystems/sysfs.txt
@@ -288,7 +288,7 @@ dev/ 包含两个子目录： char/ 和 block/。在这两个子目录中，有
 中相应的设备。/sys/dev 提供一个通过一个 stat(2) 操作结果，查找
 设备 sysfs 接口快捷的方法。
 
-更多有关 driver-model 的特性信息可以在 Documentation/driver-model/
+更多有关 driver-model 的特性信息可以在 Documentation/driver-api/driver-model/
 中找到。
 
 
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 713903290385..506a0175a5a7 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2002-3 Patrick Mochel
  * Copyright (c) 2002-3 Open Source Development Labs
  *
- * Please see Documentation/driver-model/platform.rst for more
+ * Please see Documentation/driver-api/driver-model/platform.rst for more
  * information.
  */
 
diff --git a/drivers/gpio/gpio-cs5535.c b/drivers/gpio/gpio-cs5535.c
index 3611a0571667..53b24e3ae7de 100644
--- a/drivers/gpio/gpio-cs5535.c
+++ b/drivers/gpio/gpio-cs5535.c
@@ -41,7 +41,7 @@ MODULE_PARM_DESC(mask, "GPIO channel mask.");
 
 /*
  * FIXME: convert this singleton driver to use the state container
- * design pattern, see Documentation/driver-model/design-patterns.rst
+ * design pattern, see Documentation/driver-api/driver-model/design-patterns.rst
  */
 static struct cs5535_gpio_chip {
 	struct gpio_chip chip;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 41c90f2ddb31..63db08d9bafa 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2286,7 +2286,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	struct ice_hw *hw;
 	int err;
 
-	/* this driver uses devres, see Documentation/driver-model/devres.rst */
+	/* this driver uses devres, see Documentation/driver-api/driver-model/devres.rst */
 	err = pcim_enable_device(pdev);
 	if (err)
 		return err;
diff --git a/drivers/staging/unisys/Documentation/overview.txt b/drivers/staging/unisys/Documentation/overview.txt
index 9ab30af265a5..f8a4144b239c 100644
--- a/drivers/staging/unisys/Documentation/overview.txt
+++ b/drivers/staging/unisys/Documentation/overview.txt
@@ -15,7 +15,7 @@ normally be unsharable, specifically:
 * visorinput - keyboard and mouse
 
 These drivers conform to the standard Linux bus/device model described
-within Documentation/driver-model/, and utilize a driver named visorbus to
+within Documentation/driver-api/driver-model/, and utilize a driver named visorbus to
 present the virtual busses involved. Drivers in the 'visor*' driver set are
 commonly referred to as "guest drivers" or "client drivers".  All drivers
 except visorbus expose a device of a specific usable class to the Linux guest
@@ -141,7 +141,7 @@ called automatically by the visorbus driver at appropriate times:
 -----------------------------------
 
 Because visorbus is a standard Linux bus driver in the model described in
-Documentation/driver-model/, the hierarchy of s-Par virtual devices is
+Documentation/driver-api/driver-model/, the hierarchy of s-Par virtual devices is
 published in the sysfs tree beneath /bus/visorbus/, e.g.,
 /sys/bus/visorbus/devices/ might look like:
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 5eabfa0c4dee..c330b75c6c57 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -6,7 +6,7 @@
  * Copyright (c) 2004-2009 Greg Kroah-Hartman <gregkh@suse.de>
  * Copyright (c) 2008-2009 Novell Inc.
  *
- * See Documentation/driver-model/ for more information.
+ * See Documentation/driver-api/driver-model/ for more information.
  */
 
 #ifndef _DEVICE_H_
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index beb25f277889..9bc36b589827 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -4,7 +4,7 @@
  *
  * Copyright (c) 2001-2003 Patrick Mochel <mochel@osdl.org>
  *
- * See Documentation/driver-model/ for more information.
+ * See Documentation/driver-api/driver-model/ for more information.
  */
 
 #ifndef _PLATFORM_DEVICE_H_
diff --git a/scripts/coccinelle/free/devm_free.cocci b/scripts/coccinelle/free/devm_free.cocci
index fefd0331a2de..441799b5359b 100644
--- a/scripts/coccinelle/free/devm_free.cocci
+++ b/scripts/coccinelle/free/devm_free.cocci
@@ -3,7 +3,7 @@
 /// functions.  Values allocated using the devm_functions are freed when
 /// the device is detached, and thus the use of the standard freeing
 /// function would cause a double free.
-/// See Documentation/driver-model/devres.rst for more information.
+/// See Documentation/driver-api/driver-model/devres.rst for more information.
 ///
 /// A difficulty of detecting this problem is that the standard freeing
 /// function might be called from a different function than the one
-- 
cgit v1.2.3-55-g7522


From df1b7ce784c220373d202ea9f8bc0c424f2c9f7c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 17:16:23 -0300
Subject: docs: add some documentation dirs to the driver-api book

Those are subsystem docs, with a mix of kABI and user-facing
docs. Until they're split, keep the dirs where they are and
just add a pointer to each of them in the main index.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/accounting/index.rst | 2 +-
 Documentation/block/index.rst      | 2 +-
 Documentation/hid/index.rst        | 2 +-
 Documentation/iio/index.rst        | 2 +-
 Documentation/index.rst            | 4 ++++
 5 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/accounting/index.rst b/Documentation/accounting/index.rst
index e1f6284b5ff3..9369d8bf32be 100644
--- a/Documentation/accounting/index.rst
+++ b/Documentation/accounting/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ==========
 Accounting
diff --git a/Documentation/block/index.rst b/Documentation/block/index.rst
index 8cd226a0e86e..3fa7a52fafa4 100644
--- a/Documentation/block/index.rst
+++ b/Documentation/block/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 =====
 Block
diff --git a/Documentation/hid/index.rst b/Documentation/hid/index.rst
index af4324902622..737d66dc16a1 100644
--- a/Documentation/hid/index.rst
+++ b/Documentation/hid/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 =============================
 Human Interface Devices (HID)
diff --git a/Documentation/iio/index.rst b/Documentation/iio/index.rst
index 0593dca89a94..58b7a4ebac51 100644
--- a/Documentation/iio/index.rst
+++ b/Documentation/iio/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ==============
 Industrial I/O
diff --git a/Documentation/index.rst b/Documentation/index.rst
index a322c8721d13..dcdaaff71633 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -92,6 +92,10 @@ needed).
 
    driver-api/index
    core-api/index
+   accounting/index
+   block/index
+   hid/index
+   iio/index
    leds/index
    media/index
    networking/index
-- 
cgit v1.2.3-55-g7522


From 83bbf6e103544d65f17f4b2ccea1c6a51c0b0769 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 27 Jun 2019 12:59:40 -0300
Subject: docs: aoe: add it to the driver-api book

Those files belong to the admin guide, so move them to
Documentation/admin-guide/aoe/ and add them to its index.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Justin Sanders <justin@coraid.com>
---
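Reviewer note (not part of the commit message): the moved aoe.rst
documents the block device naming scheme e{shelf}.{slot} and
e{shelf}.{slot}p{part}. As a rough illustration of that scheme only,
the sketch below splits such a name into its fields. The function name
aoe_parse_name is hypothetical; it is not part of the aoe driver, of
the aoetools, or of the scripts moved by this patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of the driver or aoetools): split an AoE
# block device name of the form e{shelf}.{slot}[p{part}] into its fields.
aoe_parse_name() {
	name=$1

	# Reject anything that does not start with e<digit>...<dot><digit>.
	case $name in
	e[0-9]*.[0-9]*) ;;
	*) return 1 ;;
	esac

	rest=${name#e}        # drop the leading "e"
	shelf=${rest%%.*}     # everything before the first "."
	rest=${rest#*.}       # everything after the first "."

	# A "p" suffix, when present, carries the partition number.
	case $rest in
	*p*) slot=${rest%%p*}; part=${rest#*p} ;;
	*)   slot=$rest; part= ;;
	esac

	echo "shelf=$shelf slot=$slot part=${part:-none}"
}

# Whole disk in slot 2 of shelf 0, then its first partition.
aoe_parse_name e0.2
aoe_parse_name e0.2p1
```

With the two calls above, the whole disk reports "part=none" and the
partition reports "part=1". The pattern check is deliberately loose; a
stricter validator would also reject non-numeric characters inside the
fields.
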
 Documentation/admin-guide/aoe/aoe.rst         | 150 ++++++++++++++++++++++++++
 Documentation/admin-guide/aoe/autoload.sh     |  17 +++
 Documentation/admin-guide/aoe/examples.rst    |  23 ++++
 Documentation/admin-guide/aoe/index.rst       |  17 +++
 Documentation/admin-guide/aoe/status.sh       |  30 ++++++
 Documentation/admin-guide/aoe/todo.rst        |  17 +++
 Documentation/admin-guide/aoe/udev-install.sh |  33 ++++++
 Documentation/admin-guide/aoe/udev.txt        |  26 +++++
 Documentation/admin-guide/index.rst           |   1 +
 Documentation/aoe/aoe.rst                     | 150 --------------------------
 Documentation/aoe/autoload.sh                 |  17 ---
 Documentation/aoe/examples.rst                |  23 ----
 Documentation/aoe/index.rst                   |  19 ----
 Documentation/aoe/status.sh                   |  30 ------
 Documentation/aoe/todo.rst                    |  17 ---
 Documentation/aoe/udev-install.sh             |  33 ------
 Documentation/aoe/udev.txt                    |  26 -----
 MAINTAINERS                                   |   2 +-
 18 files changed, 315 insertions(+), 316 deletions(-)
 create mode 100644 Documentation/admin-guide/aoe/aoe.rst
 create mode 100644 Documentation/admin-guide/aoe/autoload.sh
 create mode 100644 Documentation/admin-guide/aoe/examples.rst
 create mode 100644 Documentation/admin-guide/aoe/index.rst
 create mode 100644 Documentation/admin-guide/aoe/status.sh
 create mode 100644 Documentation/admin-guide/aoe/todo.rst
 create mode 100644 Documentation/admin-guide/aoe/udev-install.sh
 create mode 100644 Documentation/admin-guide/aoe/udev.txt
 delete mode 100644 Documentation/aoe/aoe.rst
 delete mode 100644 Documentation/aoe/autoload.sh
 delete mode 100644 Documentation/aoe/examples.rst
 delete mode 100644 Documentation/aoe/index.rst
 delete mode 100644 Documentation/aoe/status.sh
 delete mode 100644 Documentation/aoe/todo.rst
 delete mode 100644 Documentation/aoe/udev-install.sh
 delete mode 100644 Documentation/aoe/udev.txt

diff --git a/Documentation/admin-guide/aoe/aoe.rst b/Documentation/admin-guide/aoe/aoe.rst
new file mode 100644
index 000000000000..a05e751363a0
--- /dev/null
+++ b/Documentation/admin-guide/aoe/aoe.rst
@@ -0,0 +1,150 @@
+Introduction
+============
+
+ATA over Ethernet is a network protocol that provides simple access to
+block storage on the LAN.
+
+  http://support.coraid.com/documents/AoEr11.txt
+
+The EtherDrive (R) HOWTO for 2.6 and 3.x kernels is found at ...
+
+  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html
+
+It has many tips and hints!  Please see, especially, recommended
+tunings for virtual memory:
+
+  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19
+
+The aoetools are userland programs that are designed to work with this
+driver.  The aoetools are on sourceforge.
+
+  http://aoetools.sourceforge.net/
+
+The scripts in this Documentation/admin-guide/aoe directory are intended to
+document the use of the driver and are not necessary if you install
+the aoetools.
+
+
+Creating Device Nodes
+=====================
+
+  Users of udev should find the block device nodes created
+  automatically, but to create all the necessary device nodes, use the
+  udev configuration rules provided in udev.txt (in this directory).
+
+  There is a udev-install.sh script that shows how to install these
+  rules on your system.
+
+  There is also an autoload script that shows how to edit
+  /etc/modprobe.d/aoe.conf to ensure that the aoe module is loaded when
+  necessary.  Preloading the aoe module is preferable to autoloading,
+  however, because AoE discovery takes a few seconds.  It can be
+  confusing when an AoE device is not present the first time a
+  command is run but appears a second later.
+
+Using Device Nodes
+==================
+
+  "cat /dev/etherd/err" blocks, waiting for error diagnostic output,
+  like any retransmitted packets.
+
+  "echo eth2 eth4 > /dev/etherd/interfaces" tells the aoe driver to
+  limit ATA over Ethernet traffic to eth2 and eth4.  AoE traffic from
+  untrusted networks should be ignored as a matter of security.  See
+  also the aoe_iflist driver option described below.
+
+  "echo > /dev/etherd/discover" tells the driver to find out what AoE
+  devices are available.
+
+  In the future these character devices may disappear and be replaced
+  by sysfs counterparts.  Using the commands in aoetools insulates
+  users from these implementation details.
+
+  The block devices are named like this::
+
+	e{shelf}.{slot}
+	e{shelf}.{slot}p{part}
+
+  ... so that "e0.2" is the third blade from the left (slot 2) in the
+  first shelf (shelf address zero).  That's the whole disk.  The first
+  partition on that disk would be "e0.2p1".
+
+Using sysfs
+===========
+
+  Each aoe block device in /sys/block has the extra attributes of
+  state, mac, and netif.  The state attribute is "up" when the device
+  is ready for I/O and "down" if detected but unusable.  The
+  "down,closewait" state shows that the device is still open and
+  cannot come up again until it has been closed.
+
+  The mac attribute is the ethernet address of the remote AoE device.
+  The netif attribute is the network interface on the localhost
+  through which we are communicating with the remote AoE device.
+
+  There is a script in this directory that formats this information in
+  a convenient way.  Users with aoetools should use the aoe-stat
+  command::
+
+    root@makki root# sh Documentation/admin-guide/aoe/status.sh
+       e10.0            eth3              up
+       e10.1            eth3              up
+       e10.2            eth3              up
+       e10.3            eth3              up
+       e10.4            eth3              up
+       e10.5            eth3              up
+       e10.6            eth3              up
+       e10.7            eth3              up
+       e10.8            eth3              up
+       e10.9            eth3              up
+        e4.0            eth1              up
+        e4.1            eth1              up
+        e4.2            eth1              up
+        e4.3            eth1              up
+        e4.4            eth1              up
+        e4.5            eth1              up
+        e4.6            eth1              up
+        e4.7            eth1              up
+        e4.8            eth1              up
+        e4.9            eth1              up
+
+  Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver
+  option discussed below) instead of /dev/etherd/interfaces to limit
+  AoE traffic to the network interfaces in the given
+  whitespace-separated list.  Unlike the old character device, the
+  sysfs entry can be read from as well as written to.
+
+  It's helpful to trigger discovery after setting the list of allowed
+  interfaces.  The aoetools package provides an aoe-discover script
+  for this purpose.  You can also directly use the
+  /dev/etherd/discover special file described above.
+
+Driver Options
+==============
+
+  There is a boot option for the built-in aoe driver and a
+  corresponding module parameter, aoe_iflist.  Without this option,
+  all network interfaces may be used for ATA over Ethernet.  Here is a
+  usage example for the module parameter::
+
+    modprobe aoe aoe_iflist="eth1 eth3"
+
+  The aoe_deadsecs module parameter determines the maximum number of
+  seconds that the driver will wait for an AoE device to provide a
+  response to an AoE command.  After aoe_deadsecs seconds have
+  elapsed, the AoE device will be marked as "down".  A value of zero
+  is supported for testing purposes and makes the aoe driver keep
+  trying AoE commands forever.
+
+  The aoe_maxout module parameter has a default of 128.  This is the
+  maximum number of unresponded packets that will be sent to an AoE
+  target at one time.
+
+  The aoe_dyndevs module parameter defaults to 1, meaning that the
+  driver will assign a block device minor number to a discovered AoE
+  target based on the order of its discovery.  With dynamic minor
+  device numbers in use, a greater range of AoE shelf and slot
+  addresses can be supported.  Users with udev will never have to
+  think about minor numbers.  Using aoe_dyndevs=0 allows device nodes
+  to be pre-created using a static minor-number scheme with the
+  aoe-mkshelf script in the aoetools.
diff --git a/Documentation/admin-guide/aoe/autoload.sh b/Documentation/admin-guide/aoe/autoload.sh
new file mode 100644
index 000000000000..815dff4691c9
--- /dev/null
+++ b/Documentation/admin-guide/aoe/autoload.sh
@@ -0,0 +1,17 @@
+#!/bin/sh
+# set aoe to autoload by installing the
+# aliases in /etc/modprobe.d/
+
+f=/etc/modprobe.d/aoe.conf
+
+if test ! -r $f || test ! -w $f; then
+	echo "cannot configure $f for module autoloading" 1>&2
+	exit 1
+fi
+
+grep major-152 $f >/dev/null
+if [ $? = 1 ]; then
+	echo alias block-major-152 aoe >> $f
+	echo alias char-major-152 aoe >> $f
+fi
+
diff --git a/Documentation/admin-guide/aoe/examples.rst b/Documentation/admin-guide/aoe/examples.rst
new file mode 100644
index 000000000000..91f3198e52c1
--- /dev/null
+++ b/Documentation/admin-guide/aoe/examples.rst
@@ -0,0 +1,23 @@
+Example of udev rules
+---------------------
+
+ .. include:: udev.txt
+    :literal:
+
+Example of udev install rules script
+------------------------------------
+
+ .. literalinclude:: udev-install.sh
+    :language: shell
+
+Example script to get status
+----------------------------
+
+ .. literalinclude:: status.sh
+    :language: shell
+
+Example of AoE autoload script
+------------------------------
+
+ .. literalinclude:: autoload.sh
+    :language: shell
diff --git a/Documentation/admin-guide/aoe/index.rst b/Documentation/admin-guide/aoe/index.rst
new file mode 100644
index 000000000000..d71c5df15922
--- /dev/null
+++ b/Documentation/admin-guide/aoe/index.rst
@@ -0,0 +1,17 @@
+=======================
+ATA over Ethernet (AoE)
+=======================
+
+.. toctree::
+    :maxdepth: 1
+
+    aoe
+    todo
+    examples
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/admin-guide/aoe/status.sh b/Documentation/admin-guide/aoe/status.sh
new file mode 100644
index 000000000000..eeec7baae57a
--- /dev/null
+++ b/Documentation/admin-guide/aoe/status.sh
@@ -0,0 +1,30 @@
+#! /bin/sh
+# collate and present sysfs information about AoE storage
+#
+# A more complete version of this script is aoe-stat, in the
+# aoetools.
+
+set -e
+format="%8s\t%8s\t%8s\n"
+me=`basename $0`
+sysd=${sysfs_dir:-/sys}
+
+# printf "$format" device mac netif state
+
+# Suse 9.1 Pro doesn't put /sys in /etc/mtab
+#test -z "`mount | grep sysfs`" && {
+test ! -d "$sysd/block" && {
+	echo "$me Error: sysfs is not mounted" 1>&2
+	exit 1
+}
+
+for d in `ls -d $sysd/block/etherd* 2>/dev/null | grep -v p` end; do
+	# maybe ls comes up empty, so we use "end"
+	test $d = end && continue
+
+	dev=`echo "$d" | sed 's/.*!//'`
+	printf "$format" \
+		"$dev" \
+		"`cat \"$d/netif\"`" \
+		"`cat \"$d/state\"`"
+done | sort
diff --git a/Documentation/admin-guide/aoe/todo.rst b/Documentation/admin-guide/aoe/todo.rst
new file mode 100644
index 000000000000..dea8db5a33e1
--- /dev/null
+++ b/Documentation/admin-guide/aoe/todo.rst
@@ -0,0 +1,17 @@
+TODO
+====
+
+There is a potential for deadlock when allocating a struct sk_buff for
+data that needs to be written out to aoe storage.  If the data is
+being written from a dirty page in order to free that page, and if
+there are no other pages available, then deadlock may occur when a
+free page is needed for the sk_buff allocation.  This situation has
+not been observed, but it would be nice to eliminate any potential for
+deadlock under memory pressure.
+
+Because ATA over Ethernet is not fragmented by the kernel's IP code,
+the destructor member of the struct sk_buff is available to the aoe
+driver.  By using a mempool for allocating all but the first few
+sk_buffs, and by registering a destructor, we should be able to
+efficiently allocate sk_buffs without introducing any potential for
+deadlock.
diff --git a/Documentation/admin-guide/aoe/udev-install.sh b/Documentation/admin-guide/aoe/udev-install.sh
new file mode 100644
index 000000000000..15e86f58c036
--- /dev/null
+++ b/Documentation/admin-guide/aoe/udev-install.sh
@@ -0,0 +1,33 @@
+# install the aoe-specific udev rules from udev.txt into 
+# the system's udev configuration
+# 
+
+me="`basename $0`"
+
+# find udev.conf, often /etc/udev/udev.conf
+# (or environment can specify where to find udev.conf)
+#
+if test -z "$conf"; then
+	if test -r /etc/udev/udev.conf; then
+		conf=/etc/udev/udev.conf
+	else
+		conf="`find /etc -type f -name udev.conf 2> /dev/null`"
+		if test -z "$conf" || test ! -r "$conf"; then
+			echo "$me Error: no udev.conf found" 1>&2
+			exit 1
+		fi
+	fi
+fi
+
+# find the directory where udev rules are stored, often
+# /etc/udev/rules.d
+#
+rules_d="`sed -n '/^udev_rules=/{ s!udev_rules=!!; s!\"!!g; p; }' $conf`"
+if test -z "$rules_d" ; then
+	rules_d=/etc/udev/rules.d
+fi
+if test ! -d "$rules_d"; then
+	echo "$me Error: cannot find udev rules directory" 1>&2
+	exit 1
+fi
+sh -xc "cp `dirname $0`/udev.txt $rules_d/60-aoe.rules"
diff --git a/Documentation/admin-guide/aoe/udev.txt b/Documentation/admin-guide/aoe/udev.txt
new file mode 100644
index 000000000000..5fb756466bc7
--- /dev/null
+++ b/Documentation/admin-guide/aoe/udev.txt
@@ -0,0 +1,26 @@
+# These rules tell udev what device nodes to create for aoe support.
+# They may be installed along the following lines.  Check the section
+# 8 udev manpage to see whether your udev supports SUBSYSTEM, and
+# whether it uses one or two equal signs for SUBSYSTEM and KERNEL.
+# 
+#   ecashin@makki ~$ su
+#   Password:
+#   bash# find /etc -type f -name udev.conf
+#   /etc/udev/udev.conf
+#   bash# grep udev_rules= /etc/udev/udev.conf
+#   udev_rules="/etc/udev/rules.d/"
+#   bash# ls /etc/udev/rules.d/
+#   10-wacom.rules  50-udev.rules
+#   bash# cp /path/to/linux/Documentation/admin-guide/aoe/udev.txt \
+#           /etc/udev/rules.d/60-aoe.rules
+#  
+
+# aoe char devices
+SUBSYSTEM=="aoe", KERNEL=="discover",	NAME="etherd/%k", GROUP="disk", MODE="0220"
+SUBSYSTEM=="aoe", KERNEL=="err",	NAME="etherd/%k", GROUP="disk", MODE="0440"
+SUBSYSTEM=="aoe", KERNEL=="interfaces",	NAME="etherd/%k", GROUP="disk", MODE="0220"
+SUBSYSTEM=="aoe", KERNEL=="revalidate",	NAME="etherd/%k", GROUP="disk", MODE="0220"
+SUBSYSTEM=="aoe", KERNEL=="flush",	NAME="etherd/%k", GROUP="disk", MODE="0220"
+
+# aoe block devices     
+KERNEL=="etherd*",       GROUP="disk"
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 9228fbf5ce4e..1f0d9b939311 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -83,6 +83,7 @@ configure specific aspects of kernel behavior to your liking.
    namespaces/index
    perf-security
    acpi/index
+   aoe/index
    device-mapper/index
    laptops/index
 
diff --git a/Documentation/aoe/aoe.rst b/Documentation/aoe/aoe.rst
deleted file mode 100644
index 58747ecec71d..000000000000
--- a/Documentation/aoe/aoe.rst
+++ /dev/null
@@ -1,150 +0,0 @@
-Introduction
-============
-
-ATA over Ethernet is a network protocol that provides simple access to
-block storage on the LAN.
-
-  http://support.coraid.com/documents/AoEr11.txt
-
-The EtherDrive (R) HOWTO for 2.6 and 3.x kernels is found at ...
-
-  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html
-
-It has many tips and hints!  Please see, especially, recommended
-tunings for virtual memory:
-
-  http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.19
-
-The aoetools are userland programs that are designed to work with this
-driver.  The aoetools are on sourceforge.
-
-  http://aoetools.sourceforge.net/
-
-The scripts in this Documentation/aoe directory are intended to
-document the use of the driver and are not necessary if you install
-the aoetools.
-
-
-Creating Device Nodes
-=====================
-
-  Users of udev should find the block device nodes created
-  automatically, but to create all the necessary device nodes, use the
-  udev configuration rules provided in udev.txt (in this directory).
-
-  There is a udev-install.sh script that shows how to install these
-  rules on your system.
-
-  There is also an autoload script that shows how to edit
-  /etc/modprobe.d/aoe.conf to ensure that the aoe module is loaded when
-  necessary.  Preloading the aoe module is preferable to autoloading,
-  however, because AoE discovery takes a few seconds.  It can be
-  confusing when an AoE device is not present the first time the a
-  command is run but appears a second later.
-
-Using Device Nodes
-==================
-
-  "cat /dev/etherd/err" blocks, waiting for error diagnostic output,
-  like any retransmitted packets.
-
-  "echo eth2 eth4 > /dev/etherd/interfaces" tells the aoe driver to
-  limit ATA over Ethernet traffic to eth2 and eth4.  AoE traffic from
-  untrusted networks should be ignored as a matter of security.  See
-  also the aoe_iflist driver option described below.
-
-  "echo > /dev/etherd/discover" tells the driver to find out what AoE
-  devices are available.
-
-  In the future these character devices may disappear and be replaced
-  by sysfs counterparts.  Using the commands in aoetools insulates
-  users from these implementation details.
-
-  The block devices are named like this::
-
-	e{shelf}.{slot}
-	e{shelf}.{slot}p{part}
-
-  ... so that "e0.2" is the third blade from the left (slot 2) in the
-  first shelf (shelf address zero).  That's the whole disk.  The first
-  partition on that disk would be "e0.2p1".
-
-Using sysfs
-===========
-
-  Each aoe block device in /sys/block has the extra attributes of
-  state, mac, and netif.  The state attribute is "up" when the device
-  is ready for I/O and "down" if detected but unusable.  The
-  "down,closewait" state shows that the device is still open and
-  cannot come up again until it has been closed.
-
-  The mac attribute is the ethernet address of the remote AoE device.
-  The netif attribute is the network interface on the localhost
-  through which we are communicating with the remote AoE device.
-
-  There is a script in this directory that formats this information in
-  a convenient way.  Users with aoetools should use the aoe-stat
-  command::
-
-    root@makki root# sh Documentation/aoe/status.sh
-       e10.0            eth3              up
-       e10.1            eth3              up
-       e10.2            eth3              up
-       e10.3            eth3              up
-       e10.4            eth3              up
-       e10.5            eth3              up
-       e10.6            eth3              up
-       e10.7            eth3              up
-       e10.8            eth3              up
-       e10.9            eth3              up
-        e4.0            eth1              up
-        e4.1            eth1              up
-        e4.2            eth1              up
-        e4.3            eth1              up
-        e4.4            eth1              up
-        e4.5            eth1              up
-        e4.6            eth1              up
-        e4.7            eth1              up
-        e4.8            eth1              up
-        e4.9            eth1              up
-
-  Use /sys/module/aoe/parameters/aoe_iflist (or better, the driver
-  option discussed below) instead of /dev/etherd/interfaces to limit
-  AoE traffic to the network interfaces in the given
-  whitespace-separated list.  Unlike the old character device, the
-  sysfs entry can be read from as well as written to.
-
-  It's helpful to trigger discovery after setting the list of allowed
-  interfaces.  The aoetools package provides an aoe-discover script
-  for this purpose.  You can also directly use the
-  /dev/etherd/discover special file described above.
-
-Driver Options
-==============
-
-  There is a boot option for the built-in aoe driver and a
-  corresponding module parameter, aoe_iflist.  Without this option,
-  all network interfaces may be used for ATA over Ethernet.  Here is a
-  usage example for the module parameter::
-
-    modprobe aoe_iflist="eth1 eth3"
-
-  The aoe_deadsecs module parameter determines the maximum number of
-  seconds that the driver will wait for an AoE device to provide a
-  response to an AoE command.  After aoe_deadsecs seconds have
-  elapsed, the AoE device will be marked as "down".  A value of zero
-  is supported for testing purposes and makes the aoe driver keep
-  trying AoE commands forever.
-
-  The aoe_maxout module parameter has a default of 128.  This is the
-  maximum number of unresponded packets that will be sent to an AoE
-  target at one time.
-
-  The aoe_dyndevs module parameter defaults to 1, meaning that the
-  driver will assign a block device minor number to a discovered AoE
-  target based on the order of its discovery.  With dynamic minor
-  device numbers in use, a greater range of AoE shelf and slot
-  addresses can be supported.  Users with udev will never have to
-  think about minor numbers.  Using aoe_dyndevs=0 allows device nodes
-  to be pre-created using a static minor-number scheme with the
-  aoe-mkshelf script in the aoetools.
diff --git a/Documentation/aoe/autoload.sh b/Documentation/aoe/autoload.sh
deleted file mode 100644
index 815dff4691c9..000000000000
--- a/Documentation/aoe/autoload.sh
+++ /dev/null
@@ -1,17 +0,0 @@
-#!/bin/sh
-# set aoe to autoload by installing the
-# aliases in /etc/modprobe.d/
-
-f=/etc/modprobe.d/aoe.conf
-
-if test ! -r $f || test ! -w $f; then
-	echo "cannot configure $f for module autoloading" 1>&2
-	exit 1
-fi
-
-grep major-152 $f >/dev/null
-if [ $? = 1 ]; then
-	echo alias block-major-152 aoe >> $f
-	echo alias char-major-152 aoe >> $f
-fi
-
diff --git a/Documentation/aoe/examples.rst b/Documentation/aoe/examples.rst
deleted file mode 100644
index 91f3198e52c1..000000000000
--- a/Documentation/aoe/examples.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-Example of udev rules
----------------------
-
- .. include:: udev.txt
-    :literal:
-
-Example of udev install rules script
-------------------------------------
-
- .. literalinclude:: udev-install.sh
-    :language: shell
-
-Example script to get status
-----------------------------
-
- .. literalinclude:: status.sh
-    :language: shell
-
-Example of AoE autoload script
-------------------------------
-
- .. literalinclude:: autoload.sh
-    :language: shell
diff --git a/Documentation/aoe/index.rst b/Documentation/aoe/index.rst
deleted file mode 100644
index 4394b9b7913c..000000000000
--- a/Documentation/aoe/index.rst
+++ /dev/null
@@ -1,19 +0,0 @@
-:orphan:
-
-=======================
-ATA over Ethernet (AoE)
-=======================
-
-.. toctree::
-    :maxdepth: 1
-
-    aoe
-    todo
-    examples
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/aoe/status.sh b/Documentation/aoe/status.sh
deleted file mode 100644
index eeec7baae57a..000000000000
--- a/Documentation/aoe/status.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#! /bin/sh
-# collate and present sysfs information about AoE storage
-#
-# A more complete version of this script is aoe-stat, in the
-# aoetools.
-
-set -e
-format="%8s\t%8s\t%8s\n"
-me=`basename $0`
-sysd=${sysfs_dir:-/sys}
-
-# printf "$format" device mac netif state
-
-# Suse 9.1 Pro doesn't put /sys in /etc/mtab
-#test -z "`mount | grep sysfs`" && {
-test ! -d "$sysd/block" && {
-	echo "$me Error: sysfs is not mounted" 1>&2
-	exit 1
-}
-
-for d in `ls -d $sysd/block/etherd* 2>/dev/null | grep -v p` end; do
-	# maybe ls comes up empty, so we use "end"
-	test $d = end && continue
-
-	dev=`echo "$d" | sed 's/.*!//'`
-	printf "$format" \
-		"$dev" \
-		"`cat \"$d/netif\"`" \
-		"`cat \"$d/state\"`"
-done | sort
diff --git a/Documentation/aoe/todo.rst b/Documentation/aoe/todo.rst
deleted file mode 100644
index dea8db5a33e1..000000000000
--- a/Documentation/aoe/todo.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-TODO
-====
-
-There is a potential for deadlock when allocating a struct sk_buff for
-data that needs to be written out to aoe storage.  If the data is
-being written from a dirty page in order to free that page, and if
-there are no other pages available, then deadlock may occur when a
-free page is needed for the sk_buff allocation.  This situation has
-not been observed, but it would be nice to eliminate any potential for
-deadlock under memory pressure.
-
-Because ATA over Ethernet is not fragmented by the kernel's IP code,
-the destructor member of the struct sk_buff is available to the aoe
-driver.  By using a mempool for allocating all but the first few
-sk_buffs, and by registering a destructor, we should be able to
-efficiently allocate sk_buffs without introducing any potential for
-deadlock.
diff --git a/Documentation/aoe/udev-install.sh b/Documentation/aoe/udev-install.sh
deleted file mode 100644
index 15e86f58c036..000000000000
--- a/Documentation/aoe/udev-install.sh
+++ /dev/null
@@ -1,33 +0,0 @@
-# install the aoe-specific udev rules from udev.txt into 
-# the system's udev configuration
-# 
-
-me="`basename $0`"
-
-# find udev.conf, often /etc/udev/udev.conf
-# (or environment can specify where to find udev.conf)
-#
-if test -z "$conf"; then
-	if test -r /etc/udev/udev.conf; then
-		conf=/etc/udev/udev.conf
-	else
-		conf="`find /etc -type f -name udev.conf 2> /dev/null`"
-		if test -z "$conf" || test ! -r "$conf"; then
-			echo "$me Error: no udev.conf found" 1>&2
-			exit 1
-		fi
-	fi
-fi
-
-# find the directory where udev rules are stored, often
-# /etc/udev/rules.d
-#
-rules_d="`sed -n '/^udev_rules=/{ s!udev_rules=!!; s!\"!!g; p; }' $conf`"
-if test -z "$rules_d" ; then
-	rules_d=/etc/udev/rules.d
-fi
-if test ! -d "$rules_d"; then
-	echo "$me Error: cannot find udev rules directory" 1>&2
-	exit 1
-fi
-sh -xc "cp `dirname $0`/udev.txt $rules_d/60-aoe.rules"
diff --git a/Documentation/aoe/udev.txt b/Documentation/aoe/udev.txt
deleted file mode 100644
index 54feda5a0772..000000000000
--- a/Documentation/aoe/udev.txt
+++ /dev/null
@@ -1,26 +0,0 @@
-# These rules tell udev what device nodes to create for aoe support.
-# They may be installed along the following lines.  Check the section
-# 8 udev manpage to see whether your udev supports SUBSYSTEM, and
-# whether it uses one or two equal signs for SUBSYSTEM and KERNEL.
-# 
-#   ecashin@makki ~$ su
-#   Password:
-#   bash# find /etc -type f -name udev.conf
-#   /etc/udev/udev.conf
-#   bash# grep udev_rules= /etc/udev/udev.conf
-#   udev_rules="/etc/udev/rules.d/"
-#   bash# ls /etc/udev/rules.d/
-#   10-wacom.rules  50-udev.rules
-#   bash# cp /path/to/linux/Documentation/aoe/udev.txt \
-#           /etc/udev/rules.d/60-aoe.rules
-#  
-
-# aoe char devices
-SUBSYSTEM=="aoe", KERNEL=="discover",	NAME="etherd/%k", GROUP="disk", MODE="0220"
-SUBSYSTEM=="aoe", KERNEL=="err",	NAME="etherd/%k", GROUP="disk", MODE="0440"
-SUBSYSTEM=="aoe", KERNEL=="interfaces",	NAME="etherd/%k", GROUP="disk", MODE="0220"
-SUBSYSTEM=="aoe", KERNEL=="revalidate",	NAME="etherd/%k", GROUP="disk", MODE="0220"
-SUBSYSTEM=="aoe", KERNEL=="flush",	NAME="etherd/%k", GROUP="disk", MODE="0220"
-
-# aoe block devices     
-KERNEL=="etherd*",       GROUP="disk"
diff --git a/MAINTAINERS b/MAINTAINERS
index 3feb318e1433..0c603ea73034 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2689,7 +2689,7 @@ ATA OVER ETHERNET (AOE) DRIVER
 M:	"Justin Sanders" <justin@coraid.com>
 W:	http://www.openaoe.org/
 S:	Supported
-F:	Documentation/aoe/
+F:	Documentation/admin-guide/aoe/
 F:	drivers/block/aoe/
 
 ATHEROS 71XX/9XXX GPIO DRIVER
-- 
cgit v1.2.3-55-g7522


From da82c92f1150f66afabf78d2c85ef9ac18dc6d38 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 27 Jun 2019 13:08:35 -0300
Subject: docs: cgroup-v1: add it to the admin-guide book

Those files belong to the admin guide, so add them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 .../admin-guide/cgroup-v1/blkio-controller.rst     |  302 ++++++
 Documentation/admin-guide/cgroup-v1/cgroups.rst    |  695 ++++++++++++++
 Documentation/admin-guide/cgroup-v1/cpuacct.rst    |   50 +
 Documentation/admin-guide/cgroup-v1/cpusets.rst    |  866 +++++++++++++++++
 Documentation/admin-guide/cgroup-v1/devices.rst    |  132 +++
 .../admin-guide/cgroup-v1/freezer-subsystem.rst    |  127 +++
 Documentation/admin-guide/cgroup-v1/hugetlb.rst    |   50 +
 Documentation/admin-guide/cgroup-v1/index.rst      |   28 +
 Documentation/admin-guide/cgroup-v1/memcg_test.rst |  355 +++++++
 Documentation/admin-guide/cgroup-v1/memory.rst     | 1003 ++++++++++++++++++++
 Documentation/admin-guide/cgroup-v1/net_cls.rst    |   44 +
 Documentation/admin-guide/cgroup-v1/net_prio.rst   |   57 ++
 Documentation/admin-guide/cgroup-v1/pids.rst       |   92 ++
 Documentation/admin-guide/cgroup-v1/rdma.rst       |  117 +++
 Documentation/admin-guide/cgroup-v2.rst            |    2 +-
 Documentation/admin-guide/index.rst                |    1 +
 Documentation/admin-guide/kernel-parameters.txt    |    4 +-
 .../admin-guide/mm/numa_memory_policy.rst          |    2 +-
 Documentation/block/bfq-iosched.rst                |    2 +-
 Documentation/cgroup-v1/blkio-controller.rst       |  302 ------
 Documentation/cgroup-v1/cgroups.rst                |  695 --------------
 Documentation/cgroup-v1/cpuacct.rst                |   50 -
 Documentation/cgroup-v1/cpusets.rst                |  866 -----------------
 Documentation/cgroup-v1/devices.rst                |  132 ---
 Documentation/cgroup-v1/freezer-subsystem.rst      |  127 ---
 Documentation/cgroup-v1/hugetlb.rst                |   50 -
 Documentation/cgroup-v1/index.rst                  |   30 -
 Documentation/cgroup-v1/memcg_test.rst             |  355 -------
 Documentation/cgroup-v1/memory.rst                 | 1003 --------------------
 Documentation/cgroup-v1/net_cls.rst                |   44 -
 Documentation/cgroup-v1/net_prio.rst               |   57 --
 Documentation/cgroup-v1/pids.rst                   |   92 --
 Documentation/cgroup-v1/rdma.rst                   |  117 ---
 Documentation/filesystems/tmpfs.txt                |    2 +-
 Documentation/kernel-per-CPU-kthreads.txt          |    2 +-
 Documentation/scheduler/sched-deadline.rst         |    2 +-
 Documentation/scheduler/sched-design-CFS.rst       |    2 +-
 Documentation/scheduler/sched-rt-group.rst         |    2 +-
 Documentation/vm/numa.rst                          |    4 +-
 Documentation/vm/page_migration.rst                |    2 +-
 Documentation/vm/unevictable-lru.rst               |    2 +-
 Documentation/x86/x86_64/fake-numa-for-cpusets.rst |    4 +-
 MAINTAINERS                                        |    4 +-
 block/Kconfig                                      |    2 +-
 include/linux/cgroup-defs.h                        |    2 +-
 include/uapi/linux/bpf.h                           |    2 +-
 init/Kconfig                                       |    4 +-
 kernel/cgroup/cpuset.c                             |    2 +-
 security/device_cgroup.c                           |    2 +-
 tools/include/uapi/linux/bpf.h                     |    2 +-
 50 files changed, 3945 insertions(+), 3946 deletions(-)
 create mode 100644 Documentation/admin-guide/cgroup-v1/blkio-controller.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/cgroups.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/cpuacct.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/cpusets.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/devices.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/hugetlb.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/index.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/memcg_test.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/memory.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/net_cls.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/net_prio.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/pids.rst
 create mode 100644 Documentation/admin-guide/cgroup-v1/rdma.rst
 delete mode 100644 Documentation/cgroup-v1/blkio-controller.rst
 delete mode 100644 Documentation/cgroup-v1/cgroups.rst
 delete mode 100644 Documentation/cgroup-v1/cpuacct.rst
 delete mode 100644 Documentation/cgroup-v1/cpusets.rst
 delete mode 100644 Documentation/cgroup-v1/devices.rst
 delete mode 100644 Documentation/cgroup-v1/freezer-subsystem.rst
 delete mode 100644 Documentation/cgroup-v1/hugetlb.rst
 delete mode 100644 Documentation/cgroup-v1/index.rst
 delete mode 100644 Documentation/cgroup-v1/memcg_test.rst
 delete mode 100644 Documentation/cgroup-v1/memory.rst
 delete mode 100644 Documentation/cgroup-v1/net_cls.rst
 delete mode 100644 Documentation/cgroup-v1/net_prio.rst
 delete mode 100644 Documentation/cgroup-v1/pids.rst
 delete mode 100644 Documentation/cgroup-v1/rdma.rst

diff --git a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
new file mode 100644
index 000000000000..1d7d962933be
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
@@ -0,0 +1,302 @@
+===================
+Block IO Controller
+===================
+
+Overview
+========
+The cgroup subsystem "blkio" implements the block IO controller. Various
+kinds of IO control policies (such as proportional BW and max BW) are needed
+both at leaf nodes and at intermediate nodes in a storage hierarchy. The
+plan is to use the same cgroup-based management interface for the blkio
+controller and to switch IO policies in the background based on user options.
+
+One IO control policy is the throttling policy, which can be used to
+specify upper IO rate limits on devices. This policy is implemented in
+the generic block layer and can be used on leaf nodes as well as on
+higher-level logical devices such as device mapper.
+
+HOWTO
+=====
+Throttling/Upper Limit policy
+-----------------------------
+- Enable Block IO controller::
+
+	CONFIG_BLK_CGROUP=y
+
+- Enable throttling in block layer::
+
+	CONFIG_BLK_DEV_THROTTLING=y
+
+- Mount blkio controller (see cgroups.rst, Why are cgroups needed?)::
+
+        mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
+
+- Specify a bandwidth rate on a particular device for the root group. The
+  format for the policy is "<major>:<minor>  <bytes_per_second>"::
+
+        echo "8:16  1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
+
+  The above puts a limit of 1MB/second on reads issued by the root group
+  on the device with major/minor number 8:16.
+
+- Run dd to read a file and check whether the rate is throttled to 1MB/s::
+
+        # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
+        1024+0 records in
+        1024+0 records out
+        4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
+
+  Limits for writes can be set using the blkio.throttle.write_bps_device file.
+
+Hierarchical Cgroups
+====================
+
+Throttling implements hierarchy support; however,
+that support is enabled only if "sane_behavior" is
+enabled on the cgroup side, which is currently a development option and
+not publicly available.
+
+If somebody creates a hierarchy as follows::
+
+			root
+			/  \
+		     test1 test2
+			|
+		     test3
+
+Throttling with "sane_behavior" will handle the
+hierarchy correctly. For throttling, all limits apply
+to the whole subtree while all statistics are local to the IOs
+directly generated by tasks in that cgroup.
+
+Throttling without "sane_behavior" enabled on the cgroup side will
+in practice treat all groups as being at the same level, as if the
+hierarchy looked like the following::
+
+				pivot
+			     /  /   \  \
+			root  test1 test2  test3
+
+Various user visible config options
+===================================
+CONFIG_BLK_CGROUP
+	- Block IO controller.
+
+CONFIG_BFQ_CGROUP_DEBUG
+	- Debug help. Right now some additional stats files show up in cgroup
+	  if this option is enabled.
+
+CONFIG_BLK_DEV_THROTTLING
+	- Enable block device throttling support in block layer.
+
+Details of cgroup files
+=======================
+Proportional weight policy files
+--------------------------------
+- blkio.weight
+	- Specifies per cgroup weight. This is the default weight of the group
+	  on all the devices unless overridden by a per device rule
+	  (see blkio.weight_device).
+	  Currently allowed range of weights is from 10 to 1000.
+
+- blkio.weight_device
+	- One can specify per cgroup per device rules using this interface.
+	  These rules override the default value of group weight as specified
+	  by blkio.weight.
+
+	  Following is the format::
+
+	    # echo dev_maj:dev_minor weight > blkio.weight_device
+
+	  Configure weight=300 on /dev/sdb (8:16) in this cgroup::
+
+	    # echo 8:16 300 > blkio.weight_device
+	    # cat blkio.weight_device
+	    dev     weight
+	    8:16    300
+
+	  Configure weight=500 on /dev/sda (8:0) in this cgroup::
+
+	    # echo 8:0 500 > blkio.weight_device
+	    # cat blkio.weight_device
+	    dev     weight
+	    8:0     500
+	    8:16    300
+
+	  Remove specific weight for /dev/sda in this cgroup::
+
+	    # echo 8:0 0 > blkio.weight_device
+	    # cat blkio.weight_device
+	    dev     weight
+	    8:16    300
+
+- blkio.leaf_weight[_device]
+	- Equivalents of blkio.weight[_device] for the purpose of
+          deciding how much weight tasks in the given cgroup have while
+          competing with the cgroup's child cgroups. For details,
+          please refer to Documentation/block/cfq-iosched.txt.
+
+- blkio.time
+	- disk time allocated to cgroup per device in milliseconds. First
+	  two fields specify the major and minor number of the device and
+	  third field specifies the disk time allocated to group in
+	  milliseconds.
+
+- blkio.sectors
+	- number of sectors transferred to/from disk by the group. First
+	  two fields specify the major and minor number of the device and
+	  third field specifies the number of sectors transferred by the
+	  group to/from the device.
+
+- blkio.io_service_bytes
+	- Number of bytes transferred to/from the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of bytes.
+
+- blkio.io_serviced
+	- Number of IOs (bio) issued to the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of IOs.
+
+- blkio.io_service_time
+	- Total amount of time between request dispatch and request completion
+	  for the IOs done by this cgroup. This is in nanoseconds to make it
+	  meaningful for flash devices too. For devices with queue depth of 1,
+	  this time represents the actual service time. When queue_depth > 1,
+	  that is no longer true as requests may be served out of order. This
+	  may cause the service time for a given IO to include the service time
+	  of multiple IOs when served out of order which may result in total
+	  io_service_time > actual time elapsed. This time is further divided by
+	  the type of operation - read or write, sync or async. First two fields
+	  specify the major and minor number of the device, third field
+	  specifies the operation type and the fourth field specifies the
+	  io_service_time in ns.
+
+- blkio.io_wait_time
+	- Total amount of time the IOs for this cgroup spent waiting in the
+	  scheduler queues for service. This can be greater than the total time
+	  elapsed since it is cumulative io_wait_time for all IOs. It is not a
+	  measure of total time the cgroup spent waiting but rather a measure of
+	  the wait_time for its individual IOs. For devices with queue_depth > 1
+	  this metric does not include the time spent waiting for service once
+	  the IO is dispatched to the device but till it actually gets serviced
+	  (there might be a time lag here due to re-ordering of requests by the
+	  device). This is in nanoseconds to make it meaningful for flash
+	  devices too. This time is further divided by the type of operation -
+	  read or write, sync or async. First two fields specify the major and
+	  minor number of the device, third field specifies the operation type
+	  and the fourth field specifies the io_wait_time in ns.
+
+- blkio.io_merged
+	- Total number of bios/requests merged into requests belonging to this
+	  cgroup. This is further divided by the type of operation - read or
+	  write, sync or async.
+
+- blkio.io_queued
+	- Total number of requests queued up at any given instant for this
+	  cgroup. This is further divided by the type of operation - read or
+	  write, sync or async.
+
+- blkio.avg_queue_size
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+	  The average queue size for this cgroup over the entire time of this
+	  cgroup's existence. Queue size samples are taken each time one of the
+	  queues of this cgroup gets a timeslice.
+
+- blkio.group_wait_time
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+	  This is the amount of time the cgroup had to wait since it became busy
+	  (i.e., went from 0 to 1 request queued) to get a timeslice for one of
+	  its queues. This is different from the io_wait_time which is the
+	  cumulative total of the amount of time spent by each IO in that cgroup
+	  waiting in the scheduler queue. This is in nanoseconds. If this is
+	  read when the cgroup is in a waiting (for timeslice) state, the stat
+	  will only report the group_wait_time accumulated till the last time it
+	  got a timeslice and will not include the current delta.
+
+- blkio.empty_time
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+	  This is the amount of time a cgroup spends without any pending
+	  requests when not being served, i.e., it does not include any time
+	  spent idling for one of the queues of the cgroup. This is in
+	  nanoseconds. If this is read when the cgroup is in an empty state,
+	  the stat will only report the empty_time accumulated till the last
+	  time it had a pending request and will not include the current delta.
+
+- blkio.idle_time
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+	  This is the amount of time spent by the IO scheduler idling for a
+	  given cgroup in anticipation of a better request than the existing ones
+	  from other queues/cgroups. This is in nanoseconds. If this is read
+	  when the cgroup is in an idling state, the stat will only report the
+	  idle_time accumulated till the last idle period and will not include
+	  the current delta.
+
+- blkio.dequeue
+	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
+	  gives the statistics about how many times a group was dequeued
+	  from service tree of the device. First two fields specify the major
+	  and minor number of the device and third field specifies the number
+	  of times a group was dequeued from a particular device.
+
+- blkio.*_recursive
+	- Recursive version of various stats. These files show the
+          same information as their non-recursive counterparts but
+          include stats from all the descendant cgroups.
+
+Throttling/Upper limit policy files
+-----------------------------------
+- blkio.throttle.read_bps_device
+	- Specifies upper limit on READ rate from the device. IO rate is
+	  specified in bytes per second. Rules are per device. Following is
+	  the format::
+
+	    echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device
+
+- blkio.throttle.write_bps_device
+	- Specifies upper limit on WRITE rate to the device. IO rate is
+	  specified in bytes per second. Rules are per device. Following is
+	  the format::
+
+	    echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
+
+- blkio.throttle.read_iops_device
+	- Specifies upper limit on READ rate from the device. IO rate is
+	  specified in IO per second. Rules are per device. Following is
+	  the format::
+
+	    echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device
+
+- blkio.throttle.write_iops_device
+	- Specifies upper limit on WRITE rate to the device. IO rate is
+	  specified in IO per second. Rules are per device. Following is
+	  the format::
+
+	    echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device
+
+Note: If both BW and IOPS rules are specified for a device, then IO is
+      subject to both constraints.
+
+- blkio.throttle.io_serviced
+	- Number of IOs (bio) issued to the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of IOs.
+
+- blkio.throttle.io_service_bytes
+	- Number of bytes transferred to/from the disk by the group. These
+	  are further divided by the type of operation - read or write, sync
+	  or async. First two fields specify the major and minor number of the
+	  device, third field specifies the operation type and the fourth field
+	  specifies the number of bytes.
+
+Common files among various policies
+-----------------------------------
+- blkio.reset_stats
+	- Writing an int to this file will result in resetting all the stats
+	  for that cgroup.
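
Reviewer note on blkio-controller.rst (placed between file diffs, so it is
not part of any hunk): the throttle rule format and the four-field statistics
layout described in that file can be sketched in a few lines of Python. This
is only an illustration under stated assumptions. The helper names
(format_throttle_rule, expected_seconds, parse_blkio_stat) are hypothetical,
the sample statistics text is made up, and nothing here touches a real cgroup
mount.

```python
def format_throttle_rule(major, minor, rate):
    """Build a "<major>:<minor> <rate>" rule string for a throttle file.

    Hypothetical helper: the rate is bytes per second for the *_bps_device
    files and IOs per second for the *_iops_device files.
    """
    for name, value in (("major", major), ("minor", minor), ("rate", rate)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    return f"{major}:{minor} {rate}"


def expected_seconds(total_bytes, bytes_per_second):
    """Minimum time a transfer should take under a bytes-per-second limit."""
    return total_bytes / bytes_per_second


def parse_blkio_stat(text):
    """Parse "<major>:<minor> <operation> <value>" lines into nested dicts.

    Lines that do not have exactly that shape are skipped (an assumption
    made for this sketch, so that blank or summary lines do not raise).
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or ":" not in fields[0]:
            continue
        device, operation, value = fields
        stats.setdefault(device, {})[operation] = int(value)
    return stats


# The rule from the HOWTO: limit reads on device 8:16 to 1 MiB per second.
rule = format_throttle_rule(8, 16, 1048576)

# The dd run in the HOWTO moves 1024 blocks of 4K, i.e. 4194304 bytes.
seconds = expected_seconds(1024 * 4096, 1048576)

# Made-up sample in the layout described for blkio.throttle.io_service_bytes.
sample = "8:16 Read 4194304\n8:16 Write 0\n"
parsed = parse_blkio_stat(sample)
```

The string held in rule is what would be written to
blkio.throttle.read_bps_device, and the four seconds computed for the dd
transfer agree with the "4.0001 s, 1.0 MB/s" shown in the HOWTO output.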
diff --git a/Documentation/admin-guide/cgroup-v1/cgroups.rst b/Documentation/admin-guide/cgroup-v1/cgroups.rst
new file mode 100644
index 000000000000..b0688011ed06
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/cgroups.rst
@@ -0,0 +1,695 @@
+==============
+Control Groups
+==============
+
+Written by Paul Menage <menage@google.com> based on
+Documentation/admin-guide/cgroup-v1/cpusets.rst
+
+Original copyright statements from cpusets.txt:
+
+Portions Copyright (C) 2004 BULL SA.
+
+Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
+
+Modified by Paul Jackson <pj@sgi.com>
+
+Modified by Christoph Lameter <cl@linux.com>
+
+.. CONTENTS:
+
+	1. Control Groups
+	1.1 What are cgroups ?
+	1.2 Why are cgroups needed ?
+	1.3 How are cgroups implemented ?
+	1.4 What does notify_on_release do ?
+	1.5 What does clone_children do ?
+	1.6 How do I use cgroups ?
+	2. Usage Examples and Syntax
+	2.1 Basic Usage
+	2.2 Attaching processes
+	2.3 Mounting hierarchies by name
+	3. Kernel API
+	3.1 Overview
+	3.2 Synchronization
+	3.3 Subsystem API
+	4. Extended attributes usage
+	5. Questions
+
+1. Control Groups
+=================
+
+1.1 What are cgroups ?
+----------------------
+
+Control Groups provide a mechanism for aggregating/partitioning sets of
+tasks, and all their future children, into hierarchical groups with
+specialized behaviour.
+
+Definitions:
+
+A *cgroup* associates a set of tasks with a set of parameters for one
+or more subsystems.
+
+A *subsystem* is a module that makes use of the task grouping
+facilities provided by cgroups to treat groups of tasks in
+particular ways. A subsystem is typically a "resource controller" that
+schedules a resource or applies per-cgroup limits, but it may be
+anything that wants to act on a group of processes, e.g. a
+virtualization subsystem.
+
+A *hierarchy* is a set of cgroups arranged in a tree, such that
+every task in the system is in exactly one of the cgroups in the
+hierarchy, and a set of subsystems; each subsystem has system-specific
+state attached to each cgroup in the hierarchy.  Each hierarchy has
+an instance of the cgroup virtual filesystem associated with it.
+
+At any one time there may be multiple active hierarchies of task
+cgroups. Each hierarchy is a partition of all tasks in the system.
+
+User-level code may create and destroy cgroups by name in an
+instance of the cgroup virtual file system, specify and query to
+which cgroup a task is assigned, and list the task PIDs assigned to
+a cgroup. Those creations and assignments only affect the hierarchy
+associated with that instance of the cgroup file system.
+
+On their own, the only use for cgroups is for simple job
+tracking. The intention is that other subsystems hook into the generic
+cgroup support to provide new attributes for cgroups, such as
+accounting/limiting the resources which processes in a cgroup can
+access. For example, cpusets (see Documentation/admin-guide/cgroup-v1/cpusets.rst) allow
+you to associate a set of CPUs and a set of memory nodes with the
+tasks in each cgroup.
+
+1.2 Why are cgroups needed ?
+----------------------------
+
+There are multiple efforts to provide process aggregations in the
+Linux kernel, mainly for resource-tracking purposes. Such efforts
+include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
+namespaces. These all require the basic notion of a
+grouping/partitioning of processes, with newly forked processes ending
+up in the same group (cgroup) as their parent process.
+
+The kernel cgroup patch provides the minimum essential kernel
+mechanisms required to efficiently implement such groups. It has
+minimal impact on the system fast paths, and provides hooks for
+specific subsystems such as cpusets to provide additional behaviour as
+desired.
+
+Multiple hierarchy support is provided to allow for situations where
+the division of tasks into cgroups is distinctly different for
+different subsystems - having parallel hierarchies allows each
+hierarchy to be a natural division of tasks, without having to handle
+complex combinations of tasks that would be present if several
+unrelated subsystems needed to be forced into the same tree of
+cgroups.
+
+At one extreme, each resource controller or subsystem could be in a
+separate hierarchy; at the other extreme, all subsystems
+would be attached to the same hierarchy.
+
+As an example of a scenario (originally proposed by vatsa@in.ibm.com)
+that can benefit from multiple hierarchies, consider a large
+university server with various users - students, professors, system
+tasks etc. The resource planning for this server could be along the
+following lines::
+
+       CPU :          "Top cpuset"
+                       /       \
+               CPUSet1         CPUSet2
+                  |               |
+               (Professors)    (Students)
+
+               In addition (system tasks) are attached to topcpuset (so
+               that they can run anywhere) with a limit of 20%
+
+       Memory : Professors (50%), Students (30%), system (20%)
+
+       Disk : Professors (50%), Students (30%), system (20%)
+
+       Network : WWW browsing (20%), Network File System (60%), others (20%)
+                               / \
+               Professors (15%)  students (5%)
+
+Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
+into the NFS network class.
+
+At the same time Firefox/Lynx will share an appropriate CPU/Memory class
+depending on who launched it (prof/student).
+
+With the ability to classify tasks differently for different resources
+(by putting those resource subsystems in different hierarchies),
+the admin can easily set up a script which receives exec notifications
+and depending on who is launching the browser he can::
+
+    # echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks
+
+With only a single hierarchy, he would potentially have to create
+a separate cgroup for every browser launched and associate it with the
+appropriate network and other resource classes.  This may lead to a
+proliferation of such cgroups.
+
+Also let's say that the administrator would like to give enhanced network
+access temporarily to a student's browser (since it is night and the user
+wants to do online gaming :))  OR give one of the student's simulation
+apps enhanced CPU power.
+
+With the ability to write PIDs directly to resource classes, it's just a
+matter of::
+
+       # echo pid > /sys/fs/cgroup/network/<new_class>/tasks
+       (after some time)
+       # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks
+
+Without this ability, the administrator would have to split the cgroup into
+multiple separate ones and then associate the new cgroups with the
+new resource classes.
+
+
+
+1.3 How are cgroups implemented ?
+---------------------------------
+
+Control Groups extends the kernel as follows:
+
+ - Each task in the system has a reference-counted pointer to a
+   css_set.
+
+ - A css_set contains a set of reference-counted pointers to
+   cgroup_subsys_state objects, one for each cgroup subsystem
+   registered in the system. There is no direct link from a task to
+   the cgroup of which it's a member in each hierarchy, but this
+   can be determined by following pointers through the
+   cgroup_subsys_state objects. This is because accessing the
+   subsystem state is something that's expected to happen frequently
+   and in performance-critical code, whereas operations that require a
+   task's actual cgroup assignments (in particular, moving between
+   cgroups) are less common. A linked list runs through the cg_list
+   field of each task_struct using the css_set, anchored at
+   css_set->tasks.
+
+ - A cgroup hierarchy filesystem can be mounted for browsing and
+   manipulation from user space.
+
+ - You can list all the tasks (by PID) attached to any cgroup.
+
+The implementation of cgroups requires a few, simple hooks
+into the rest of the kernel, none in performance-critical paths:
+
+ - in init/main.c, to initialize the root cgroups and initial
+   css_set at system boot.
+
+ - in fork and exit, to attach and detach a task from its css_set.
+
+In addition, a new file system of type "cgroup" may be mounted, to
+enable browsing and modifying the cgroups presently known to the
+kernel.  When mounting a cgroup hierarchy, you may specify a
+comma-separated list of subsystems to mount as the filesystem mount
+options.  By default, mounting the cgroup filesystem attempts to
+mount a hierarchy containing all registered subsystems.
+
+If an active hierarchy with exactly the same set of subsystems already
+exists, it will be reused for the new mount. If no existing hierarchy
+matches, and any of the requested subsystems are in use in an existing
+hierarchy, the mount will fail with -EBUSY. Otherwise, a new hierarchy
+is activated, associated with the requested subsystems.
+
+It's not currently possible to bind a new subsystem to an active
+cgroup hierarchy, or to unbind a subsystem from an active cgroup
+hierarchy. This may be possible in future, but is fraught with nasty
+error-recovery issues.
+
+When a cgroup filesystem is unmounted, if there are any
+child cgroups created below the top-level cgroup, that hierarchy
+will remain active even though unmounted; if there are no
+child cgroups then the hierarchy will be deactivated.
+
+No new system calls are added for cgroups - all support for
+querying and modifying cgroups is via this cgroup file system.
+
+Each task under /proc has an added file named 'cgroup' displaying,
+for each active hierarchy, the subsystem names and the cgroup name
+as the path relative to the root of the cgroup file system.
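+
+As a rough sketch of how a script might read this file, the following
+shell function prints the cgroup path of a task for one subsystem.  It
+assumes each line has the form "hierarchy-ID:subsystem-list:path"; the
+function name cgroup_path_for is only illustrative::
+
+  # Usage: cgroup_path_for SUBSYSTEM [FILE]
+  # FILE defaults to /proc/self/cgroup.
+  cgroup_path_for() {
+      awk -F: -v want="$1" '
+          {
+              n = split($2, subsys, ",")
+              for (i = 1; i <= n; i++)
+                  if (subsys[i] == want) {
+                      # The path is everything after the second colon.
+                      print substr($0, length($1) + length($2) + 3)
+                      exit
+                  }
+          }' "${2:-/proc/self/cgroup}"
+  }
+
+For the "Charlie" example in section 1.6, "cgroup_path_for cpuset" run
+from the subshell would be expected to print /Charlie.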
+
+Each cgroup is represented by a directory in the cgroup file system
+containing the following files describing that cgroup:
+
+ - tasks: list of tasks (by PID) attached to that cgroup.  This list
+   is not guaranteed to be sorted.  Writing a thread ID into this file
+   moves the thread into this cgroup.
+ - cgroup.procs: list of thread group IDs in the cgroup.  This list is
+   not guaranteed to be sorted or free of duplicate TGIDs, and userspace
+   should sort/uniquify the list if this property is required.
+   Writing a thread group ID into this file moves all threads in that
+   group into this cgroup.
+ - notify_on_release flag: run the release agent on exit?
+ - release_agent: the path to use for release notifications (this file
+   exists in the top cgroup only)
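+
+Because cgroup.procs may list a TGID more than once, a reader that
+needs a clean list can sort and de-duplicate it.  A minimal sketch (the
+function name is hypothetical)::
+
+  # Usage: unique_tgids /path/to/cgroup/cgroup.procs
+  unique_tgids() {
+      # -n: numeric order, -u: drop repeated lines
+      sort -n -u "$1"
+  }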
+
+Other subsystems such as cpusets may add additional files in each
+cgroup dir.
+
+New cgroups are created using the mkdir system call or shell
+command.  The properties of a cgroup, such as its flags, are
+modified by writing to the appropriate file in that cgroup's
+directory, as listed above.
+
+The named hierarchical structure of nested cgroups allows partitioning
+a large system into nested, dynamically changeable, "soft-partitions".
+
+The attachment of each task, automatically inherited at fork by any
+children of that task, to a cgroup allows organizing the work load
+on a system into related sets of tasks.  A task may be re-attached to
+any other cgroup, if allowed by the permissions on the necessary
+cgroup file system directories.
+
+When a task is moved from one cgroup to another, it gets a new
+css_set pointer - if there's an already existing css_set with the
+desired collection of cgroups then that css_set is reused, otherwise a new
+css_set is allocated. The appropriate existing css_set is located by
+looking into a hash table.
+
+To allow access from a cgroup to the css_sets (and hence tasks)
+that comprise it, a set of cg_cgroup_link objects form a lattice;
+each cg_cgroup_link is linked into a list of cg_cgroup_links for
+a single cgroup on its cgrp_link_list field, and a list of
+cg_cgroup_links for a single css_set on its cg_link_list.
+
+Thus the set of tasks in a cgroup can be listed by iterating over
+each css_set that references the cgroup, and sub-iterating over
+each css_set's task set.
+
+The use of a Linux virtual file system (vfs) to represent the
+cgroup hierarchy provides for a familiar permission and name space
+for cgroups, with a minimum of additional kernel code.
+
+1.4 What does notify_on_release do ?
+------------------------------------
+
+If the notify_on_release flag is enabled (1) in a cgroup, then
+whenever the last task in the cgroup leaves (exits or attaches to
+some other cgroup) and the last child cgroup of that cgroup
+is removed, then the kernel runs the command specified by the contents
+of the "release_agent" file in that hierarchy's root directory,
+supplying the pathname (relative to the mount point of the cgroup
+file system) of the abandoned cgroup.  This enables automatic
+removal of abandoned cgroups.  The default value of
+notify_on_release in the root cgroup at system boot is disabled
+(0).  The default value of other cgroups at creation is the current
+value of their parents' notify_on_release settings. The default value of
+a cgroup hierarchy's release_agent path is empty.
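+
+A release agent can be as small as a script that removes the abandoned
+cgroup.  The sketch below assumes the kernel passes the relative
+pathname as the first argument and that the hierarchy is mounted at
+the directory given to the (hypothetical) helper::
+
+  # Usage: remove_abandoned MOUNTPOINT RELATIVE_PATH
+  remove_abandoned() {
+      # RELATIVE_PATH starts with '/', e.g. /Charlie
+      rmdir "$1$2"
+  }
+
+  # In an agent installed for a hierarchy mounted on /sys/fs/cgroup/rg1:
+  #   remove_abandoned /sys/fs/cgroup/rg1 "$1"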
+
+1.5 What does clone_children do ?
+---------------------------------
+
+This flag only affects the cpuset controller. If the clone_children
+flag is enabled (1) in a cgroup, a new cpuset cgroup will copy its
+configuration from the parent during initialization.
+
+1.6 How do I use cgroups ?
+--------------------------
+
+To start a new job that is to be contained within a cgroup, using
+the "cpuset" cgroup subsystem, the steps are something like::
+
+ 1) mount -t tmpfs cgroup_root /sys/fs/cgroup
+ 2) mkdir /sys/fs/cgroup/cpuset
+ 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
+ 4) Create the new cgroup by doing mkdir's and write's (or echo's) in
+    the /sys/fs/cgroup/cpuset virtual file system.
+ 5) Start a task that will be the "founding father" of the new job.
+ 6) Attach that task to the new cgroup by writing its PID to the
+    /sys/fs/cgroup/cpuset tasks file for that cgroup.
+ 7) fork, exec or clone the job tasks from this founding father task.
+
+For example, the following sequence of commands will set up a cgroup
+named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
+and then start a subshell 'sh' in that cgroup::
+
+  mount -t tmpfs cgroup_root /sys/fs/cgroup
+  mkdir /sys/fs/cgroup/cpuset
+  mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset
+  cd /sys/fs/cgroup/cpuset
+  mkdir Charlie
+  cd Charlie
+  /bin/echo 2-3 > cpuset.cpus
+  /bin/echo 1 > cpuset.mems
+  /bin/echo $$ > tasks
+  sh
+  # The subshell 'sh' is now running in cgroup Charlie
+  # The next line should display '/Charlie'
+  cat /proc/self/cgroup
+
+2. Usage Examples and Syntax
+============================
+
+2.1 Basic Usage
+---------------
+
+Creating, modifying, and using cgroups can be done through the cgroup
+virtual filesystem.
+
+To mount a cgroup hierarchy with all available subsystems, type::
+
+  # mount -t cgroup xxx /sys/fs/cgroup
+
+The "xxx" is not interpreted by the cgroup code, but will appear in
+/proc/mounts so may be any useful identifying string that you like.
+
+Note: Some subsystems do not work without some user input first.  For instance,
+if cpusets are enabled the user will have to populate the cpus and mems files
+for each new cgroup created before that group can be used.
+
+As explained in section `1.2 Why are cgroups needed?` you should create
+different hierarchies of cgroups for each single resource or group of
+resources you want to control. Therefore, you should mount a tmpfs on
+/sys/fs/cgroup and create directories for each cgroup resource or resource
+group::
+
+  # mount -t tmpfs cgroup_root /sys/fs/cgroup
+  # mkdir /sys/fs/cgroup/rg1
+
+To mount a cgroup hierarchy with just the cpuset and memory
+subsystems, type::
+
+  # mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1
+
+While remounting cgroups is currently supported, using it is not
+recommended. Remounting allows changing bound subsystems and
+release_agent. Rebinding is hardly useful as it only works when the
+hierarchy is empty and release_agent itself should be replaced with
+conventional fsnotify. The support for remounting will be removed in
+the future.
+
+To specify a hierarchy's release_agent::
+
+  # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
+    xxx /sys/fs/cgroup/rg1
+
+Note that specifying 'release_agent' more than once will return failure.
+
+Note that changing the set of subsystems is currently only supported
+when the hierarchy consists of a single (root) cgroup. Supporting
+the ability to arbitrarily bind/unbind subsystems from an existing
+cgroup hierarchy is intended to be implemented in the future.
+
+Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the
+tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1
+is the cgroup that holds the whole system.
+
+If you want to change the value of release_agent::
+
+  # echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent
+
+It can also be changed via remount.
+
+If you want to create a new cgroup under /sys/fs/cgroup/rg1::
+
+  # cd /sys/fs/cgroup/rg1
+  # mkdir my_cgroup
+
+Now you want to do something with this cgroup::
+
+  # cd my_cgroup
+
+In this directory you can find several files::
+
+  # ls
+  cgroup.procs notify_on_release tasks
+  (plus whatever files are added by the attached subsystems)
+
+Now attach your shell to this cgroup::
+
+  # /bin/echo $$ > tasks
+
+You can also create cgroups inside your cgroup by using mkdir in this
+directory::
+
+  # mkdir my_sub_cs
+
+To remove a cgroup, just use rmdir::
+
+  # rmdir my_sub_cs
+
+This will fail if the cgroup is in use (has cgroups inside, or
+has processes attached, or is held alive by other subsystem-specific
+reference).
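+
+Since rmdir fails on a cgroup that still has child cgroups, a subtree
+has to be removed bottom-up.  A sketch, assuming no tasks remain
+attached anywhere in the subtree (remove_cgroup_tree is an illustrative
+name)::
+
+  # Usage: remove_cgroup_tree /sys/fs/cgroup/rg1/my_cgroup
+  remove_cgroup_tree() {
+      # -depth visits children before their parent directory
+      find "$1" -depth -type d -exec rmdir {} \;
+  }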
+
+2.2 Attaching processes
+-----------------------
+
+::
+
+  # /bin/echo PID > tasks
+
+Note that it is PID, not PIDs. You can only attach ONE task at a time.
+If you have several tasks to attach, you have to do it one after another::
+
+  # /bin/echo PID1 > tasks
+  # /bin/echo PID2 > tasks
+	  ...
+  # /bin/echo PIDn > tasks
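+
+In a script, this is typically a loop that issues one write per PID.
+A sketch (attach_tasks is a hypothetical helper)::
+
+  # Usage: attach_tasks /path/to/cgroup/tasks PID...
+  attach_tasks() {
+      tasks_file=$1
+      shift
+      for pid in "$@"; do
+          # One write() per PID, so each failure is reported separately.
+          /bin/echo "$pid" > "$tasks_file" || return 1
+      done
+  }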
+
+You can attach the current shell task by echoing 0::
+
+  # echo 0 > tasks
+
+You can use the cgroup.procs file instead of the tasks file to move all
+threads in a threadgroup at once. Echoing the PID of any task in a
+threadgroup to cgroup.procs causes all tasks in that threadgroup to be
+attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
+in the writing task's threadgroup.
+
+Note: Since every task is always a member of exactly one cgroup in each
+mounted hierarchy, to remove a task from its current cgroup you must
+move it into a new cgroup (possibly the root cgroup) by writing to the
+new cgroup's tasks file.
+
+Note: Due to some restrictions enforced by some cgroup subsystems, moving
+a process to another cgroup can fail.
+
+2.3 Mounting hierarchies by name
+--------------------------------
+
+Passing the name=<x> option when mounting a cgroups hierarchy
+associates the given name with the hierarchy.  This can be used when
+mounting a pre-existing hierarchy, in order to refer to it by name
+rather than by its set of active subsystems.  Each hierarchy is either
+nameless, or has a unique name.
+
+The name should match ``[\w.-]+``.
+
+When passing a name=<x> option for a new hierarchy, you need to
+specify subsystems manually; the legacy behaviour of mounting all
+subsystems when none are explicitly specified is not supported when
+you give a hierarchy a name.
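+
+A script that mounts named hierarchies may want to check the name
+first.  The sketch below spells out the pattern above with an explicit
+character class (the function name is illustrative)::
+
+  # Succeeds if NAME consists only of letters, digits, '_', '.' and '-'.
+  valid_hierarchy_name() {
+      case $1 in
+          ''|*[!A-Za-z0-9_.-]*) return 1 ;;
+          *) return 0 ;;
+      esac
+  }
+
+  # valid_hierarchy_name hier1 &&
+  #     mount -t cgroup -o cpuset,name=hier1 hier1 /sys/fs/cgroup/rg1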
+
+The name of the hierarchy appears as part of the hierarchy description
+in /proc/mounts and /proc/<pid>/cgroup.
+
+
+3. Kernel API
+=============
+
+3.1 Overview
+------------
+
+Each kernel subsystem that wants to hook into the generic cgroup
+system needs to create a cgroup_subsys object. This contains
+various methods, which are callbacks from the cgroup system, along
+with a subsystem ID which will be assigned by the cgroup system.
+
+Other fields in the cgroup_subsys object include:
+
+- subsys_id: a unique array index for the subsystem, indicating which
+  entry in cgroup->subsys[] this subsystem should be managing.
+
+- name: should be initialized to a unique subsystem name. Should be
+  no longer than MAX_CGROUP_TYPE_NAMELEN.
+
+- early_init: indicate if the subsystem needs early initialization
+  at system boot.
+
+Each cgroup object created by the system has an array of pointers,
+indexed by subsystem ID; this pointer is entirely managed by the
+subsystem; the generic cgroup code will never touch this pointer.
+
+3.2 Synchronization
+-------------------
+
+There is a global mutex, cgroup_mutex, used by the cgroup
+system. This should be taken by anything that wants to modify a
+cgroup. It may also be taken to prevent cgroups from being
+modified, but more specific locks may be more appropriate in that
+situation.
+
+See kernel/cgroup.c for more details.
+
+Subsystems can take/release the cgroup_mutex via the functions
+cgroup_lock()/cgroup_unlock().
+
+Accessing a task's cgroup pointer may be done in the following ways:
+
+- while holding cgroup_mutex
+- while holding the task's alloc_lock (via task_lock())
+- inside an rcu_read_lock() section via rcu_dereference()
+
+3.3 Subsystem API
+-----------------
+
+Each subsystem should:
+
+- add an entry in linux/cgroup_subsys.h
+- define a cgroup_subsys object called <name>_cgrp_subsys
+
+Each subsystem may export the following methods. The only mandatory
+methods are css_alloc/free. Any others that are null are presumed to
+be successful no-ops.
+
+``struct cgroup_subsys_state *css_alloc(struct cgroup *cgrp)``
+(cgroup_mutex held by caller)
+
+Called to allocate a subsystem state object for a cgroup. The
+subsystem should allocate its subsystem state object for the passed
+cgroup, returning a pointer to the new object on success or an
+ERR_PTR() value. On success, the subsystem pointer should point to
+a structure of type cgroup_subsys_state (typically embedded in a
+larger subsystem-specific object), which will be initialized by the
+cgroup system. Note that this will be called at initialization to
+create the root subsystem state for this subsystem; this case can be
+identified by the passed cgroup object having a NULL parent (since
+it's the root of the hierarchy) and may be an appropriate place for
+initialization code.
+
+``int css_online(struct cgroup *cgrp)``
+(cgroup_mutex held by caller)
+
+Called after @cgrp successfully completed all allocations and was made
+visible to cgroup_for_each_child/descendant_*() iterators. The
+subsystem may choose to fail creation by returning -errno. This
+callback can be used to implement reliable state sharing and
+propagation along the hierarchy. See the comment on
+cgroup_for_each_descendant_pre() for details.
+
+``void css_offline(struct cgroup *cgrp)``
+(cgroup_mutex held by caller)
+
+This is the counterpart of css_online() and called iff css_online()
+has succeeded on @cgrp. This signifies the beginning of the end of
+@cgrp. @cgrp is being removed and the subsystem should start dropping
+all references it's holding on @cgrp. When all references are dropped,
+cgroup removal will proceed to the next step - css_free(). After this
+callback, @cgrp should be considered dead to the subsystem.
+
+``void css_free(struct cgroup *cgrp)``
+(cgroup_mutex held by caller)
+
+The cgroup system is about to free @cgrp; the subsystem should free
+its subsystem state object. By the time this method is called, @cgrp
+is completely unused; @cgrp->parent is still valid. (Note - can also
+be called for a newly-created cgroup if an error occurs after this
+subsystem's css_alloc() method has been called for the new cgroup).
+
+``int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
+(cgroup_mutex held by caller)
+
+Called prior to moving one or more tasks into a cgroup; if the
+subsystem returns an error, this will abort the attach operation.
+@tset contains the tasks to be attached and is guaranteed to have at
+least one task in it.
+
+If there are multiple tasks in the taskset, then:
+  - it's guaranteed that all are from the same thread group
+  - @tset contains all tasks from the thread group whether or not
+    they're switching cgroups
+  - the first task is the leader
+
+Each @tset entry also contains the task's old cgroup and tasks which
+aren't switching cgroup can be skipped easily using the
+cgroup_taskset_for_each() iterator. Note that this isn't called on a
+fork. If this method returns 0 (success) then this should remain valid
+while the caller holds cgroup_mutex and it is ensured that either
+attach() or cancel_attach() will be called in future.
+
+``void css_reset(struct cgroup_subsys_state *css)``
+(cgroup_mutex held by caller)
+
+An optional operation which should restore @css's configuration to the
+initial state.  This is currently only used on the unified hierarchy
+when a subsystem is disabled on a cgroup through
+"cgroup.subtree_control" but should remain enabled because other
+subsystems depend on it.  cgroup core makes such a css invisible by
+removing the associated interface files and invokes this callback so
+that the hidden subsystem can return to the initial neutral state.
+This prevents unexpected resource control from a hidden css and
+ensures that the configuration is in the initial state when it is made
+visible again later.
+
+``void cancel_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
+(cgroup_mutex held by caller)
+
+Called when a task attach operation has failed after can_attach() has succeeded.
+A subsystem whose can_attach() has some side-effects should provide this
+function, so that the subsystem can implement a rollback; otherwise it is
+not necessary. This will be called only for subsystems whose can_attach()
+operation has succeeded. The parameters are identical to can_attach().
+
+``void attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
+(cgroup_mutex held by caller)
+
+Called after the task has been attached to the cgroup, to allow any
+post-attachment activity that requires memory allocations or blocking.
+The parameters are identical to can_attach().
+
+``void fork(struct task_struct *task)``
+
+Called when a task is forked into a cgroup.
+
+``void exit(struct task_struct *task)``
+
+Called during task exit.
+
+``void free(struct task_struct *task)``
+
+Called when the task_struct is freed.
+
+``void bind(struct cgroup *root)``
+(cgroup_mutex held by caller)
+
+Called when a cgroup subsystem is rebound to a different hierarchy
+and root cgroup. Currently this will only involve movement between
+the default hierarchy (which never has sub-cgroups) and a hierarchy
+that is being created/destroyed (and hence has no sub-cgroups).
+
+4. Extended attribute usage
+===========================
+
+cgroup filesystem supports certain types of extended attributes in its
+directories and files.  The currently supported types are:
+
+	- Trusted (XATTR_TRUSTED)
+	- Security (XATTR_SECURITY)
+
+Both require CAP_SYS_ADMIN capability to set.
+
+Like in tmpfs, the extended attributes in cgroup filesystem are stored
+using kernel memory and it's advised to keep the usage to a minimum.  This
+is the reason why user defined extended attributes are not supported, since
+any user can do it and there's no limit in the value size.
+
+The current known users for this feature are SELinux to limit cgroup usage
+in containers and systemd for assorted meta data like main PID in a cgroup
+(systemd creates a cgroup per service).
+
+5. Questions
+============
+
+::
+
+  Q: what's up with this '/bin/echo' ?
+  A: bash's builtin 'echo' command does not check calls to write() against
+     errors. If you use it in the cgroup file system, you won't be
+     able to tell whether a command succeeded or failed.
+
+  Q: When I attach processes, only the first of the line gets really attached !
+  A: We can only return one error code per call to write(). So you should also
+     put only ONE PID.
diff --git a/Documentation/admin-guide/cgroup-v1/cpuacct.rst b/Documentation/admin-guide/cgroup-v1/cpuacct.rst
new file mode 100644
index 000000000000..d30ed81d2ad7
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/cpuacct.rst
@@ -0,0 +1,50 @@
+=========================
+CPU Accounting Controller
+=========================
+
+The CPU accounting controller is used to group tasks using cgroups and
+account the CPU usage of these groups of tasks.
+
+The CPU accounting controller supports multi-hierarchy groups. An accounting
+group accumulates the CPU usage of all of its child groups and the tasks
+directly present in its group.
+
+Accounting groups can be created by first mounting the cgroup filesystem::
+
+  # mount -t cgroup -ocpuacct none /sys/fs/cgroup
+
+With the above step, the initial or the parent accounting group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained
+by this group which is essentially the CPU time obtained by all the tasks
+in the system.
+
+New accounting groups can be created under the parent group /sys/fs/cgroup::
+
+  # cd /sys/fs/cgroup
+  # mkdir g1
+  # echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it. CPU time consumed by this bash and its children
+can be obtained from g1/cpuacct.usage and the same is accumulated in
+/sys/fs/cgroup/cpuacct.usage also.
+
+cpuacct.stat file lists a few statistics which further divide the
+CPU time obtained by the cgroup into user and system times. Currently
+the following statistics are supported:
+
+- user: Time spent by tasks of the cgroup in user mode.
+- system: Time spent by tasks of the cgroup in kernel mode.
+
+user and system are in USER_HZ units.
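+
+To report these values in seconds, divide by the USER_HZ value, which
+user space can usually obtain with "getconf CLK_TCK".  A sketch,
+assuming cpuacct.stat holds one "name value" pair per line (the
+function name is hypothetical)::
+
+  # Usage: cpuacct_stat_seconds FILE [TICKS_PER_SECOND]
+  cpuacct_stat_seconds() {
+      awk -v hz="${2:-$(getconf CLK_TCK)}" \
+          '{ printf "%s %.2f\n", $1, $2 / hz }' "$1"
+  }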
+
+cpuacct controller uses percpu_counter interface to collect user and
+system times. This has two side effects:
+
+- It is theoretically possible to see wrong values for user and system times.
+  This is because percpu_counter_read() on 32-bit systems isn't safe
+  against concurrent writes.
+- It is possible to see slightly outdated values for user and system times
+  due to the batch processing nature of percpu_counter.
diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
new file mode 100644
index 000000000000..86a6ae995d54
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -0,0 +1,866 @@
+=======
+CPUSETS
+=======
+
+Copyright (C) 2004 BULL SA.
+
+Written by Simon.Derr@bull.net
+
+- Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
+- Modified by Paul Jackson <pj@sgi.com>
+- Modified by Christoph Lameter <cl@linux.com>
+- Modified by Paul Menage <menage@google.com>
+- Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
+
+.. CONTENTS:
+
+   1. Cpusets
+     1.1 What are cpusets ?
+     1.2 Why are cpusets needed ?
+     1.3 How are cpusets implemented ?
+     1.4 What are exclusive cpusets ?
+     1.5 What is memory_pressure ?
+     1.6 What is memory spread ?
+     1.7 What is sched_load_balance ?
+     1.8 What is sched_relax_domain_level ?
+     1.9 How do I use cpusets ?
+   2. Usage Examples and Syntax
+     2.1 Basic Usage
+     2.2 Adding/removing cpus
+     2.3 Setting flags
+     2.4 Attaching processes
+   3. Questions
+   4. Contact
+
+1. Cpusets
+==========
+
+1.1 What are cpusets ?
+----------------------
+
+Cpusets provide a mechanism for assigning a set of CPUs and Memory
+Nodes to a set of tasks.   In this document "Memory Node" refers to
+an on-line node that contains memory.
+
+Cpusets constrain the CPU and Memory placement of tasks to only
+the resources within a task's current cpuset.  They form a nested
+hierarchy visible in a virtual file system.  These are the essential
+hooks, beyond what is already present, required to manage dynamic
+job placement on large systems.
+
+Cpusets use the generic cgroup subsystem described in
+Documentation/admin-guide/cgroup-v1/cgroups.rst.
+
+Requests by a task, using the sched_setaffinity(2) system call to
+include CPUs in its CPU affinity mask, and using the mbind(2) and
+set_mempolicy(2) system calls to include Memory Nodes in its memory
+policy, are both filtered through that task's cpuset, filtering out any
+CPUs or Memory Nodes not in that cpuset.  The scheduler will not
+schedule a task on a CPU that is not allowed in its cpus_allowed
+vector, and the kernel page allocator will not allocate a page on a
+node that is not allowed in the requesting task's mems_allowed vector.
+
+User level code may create and destroy cpusets by name in the cgroup
+virtual file system, manage the attributes and permissions of these
+cpusets and which CPUs and Memory Nodes are assigned to each cpuset,
+specify and query to which cpuset a task is assigned, and list the
+task pids assigned to a cpuset.
+
+
+1.2 Why are cpusets needed ?
+----------------------------
+
+The management of large computer systems, with many processors (CPUs),
+complex memory cache hierarchies and multiple Memory Nodes having
+non-uniform access times (NUMA) presents additional challenges for
+the efficient scheduling and memory placement of processes.
+
+Frequently more modest sized systems can be operated with adequate
+efficiency just by letting the operating system automatically share
+the available CPU and Memory resources amongst the requesting tasks.
+
+But larger systems, which benefit more from careful processor and
+memory placement to reduce memory access times and contention,
+and which typically represent a larger investment for the customer,
+can benefit from explicitly placing jobs on properly sized subsets of
+the system.
+
+This can be especially valuable on:
+
+    * Web Servers running multiple instances of the same web application,
+    * Servers running different applications (for instance, a web server
+      and a database), or
+    * NUMA systems running large HPC applications with demanding
+      performance characteristics.
+
+These subsets, or "soft partitions" must be able to be dynamically
+adjusted, as the job mix changes, without impacting other concurrently
+executing jobs. The location of the running jobs' pages may also be moved
+when the memory locations are changed.
+
+The kernel cpuset patch provides the minimum essential kernel
+mechanisms required to efficiently implement such subsets.  It
+leverages existing CPU and Memory Placement facilities in the Linux
+kernel to avoid any additional impact on the critical scheduler or
+memory allocator code.
+
+
+1.3 How are cpusets implemented ?
+---------------------------------
+
+Cpusets provide a Linux kernel mechanism to constrain which CPUs and
+Memory Nodes are used by a process or set of processes.
+
+The Linux kernel already has a pair of mechanisms to specify on which
+CPUs a task may be scheduled (sched_setaffinity) and on which Memory
+Nodes it may obtain memory (mbind, set_mempolicy).
+
+Cpusets extend these two mechanisms as follows:
+
+ - Cpusets are sets of allowed CPUs and Memory Nodes, known to the
+   kernel.
+ - Each task in the system is attached to a cpuset, via a pointer
+   in the task structure to a reference counted cgroup structure.
+ - Calls to sched_setaffinity are filtered to just those CPUs
+   allowed in that task's cpuset.
+ - Calls to mbind and set_mempolicy are filtered to just
+   those Memory Nodes allowed in that task's cpuset.
+ - The root cpuset contains all the system's CPUs and Memory
+   Nodes.
+ - For any cpuset, one can define child cpusets containing a subset
+   of the parent's CPU and Memory Node resources.
+ - The hierarchy of cpusets can be mounted at /dev/cpuset, for
+   browsing and manipulation from user space.
+ - A cpuset may be marked exclusive, which ensures that no other
+   cpuset (except direct ancestors and descendants) may contain
+   any overlapping CPUs or Memory Nodes.
+ - You can list all the tasks (by pid) attached to any cpuset.
+
+The implementation of cpusets requires a few, simple hooks
+into the rest of the kernel, none in performance critical paths:
+
+ - in init/main.c, to initialize the root cpuset at system boot.
+ - in fork and exit, to attach and detach a task from its cpuset.
+ - in sched_setaffinity, to mask the requested CPUs by what's
+   allowed in that task's cpuset.
+ - in sched.c migrate_live_tasks(), to keep migrating tasks within
+   the CPUs allowed by their cpuset, if possible.
+ - in the mbind and set_mempolicy system calls, to mask the requested
+   Memory Nodes by what's allowed in that task's cpuset.
+ - in page_alloc.c, to restrict memory to allowed nodes.
+ - in vmscan.c, to restrict page recovery to the current cpuset.
+
+You should mount the "cgroup" filesystem type in order to enable
+browsing and modifying the cpusets presently known to the kernel.  No
+new system calls are added for cpusets - all support for querying and
+modifying cpusets is via this cpuset file system.
+
+The /proc/<pid>/status file for each task has four added lines,
+displaying the task's cpus_allowed (on which CPUs it may be scheduled)
+and mems_allowed (on which Memory Nodes it may obtain memory),
+in the two formats seen in the following example::
+
+  Cpus_allowed:   ffffffff,ffffffff,ffffffff,ffffffff
+  Cpus_allowed_list:      0-127
+  Mems_allowed:   ffffffff,ffffffff
+  Mems_allowed_list:      0-63
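+
+The "_list" lines are the easier ones to consume from scripts.  As a
+sketch, the following counts the CPUs (or nodes) named by such a list;
+cpulist_count is an illustrative name::
+
+  # Usage: cpulist_count 0-3,8   (prints 5)
+  cpulist_count() {
+      printf '%s\n' "$1" | tr ',' '\n' | awk -F- '
+          NF == 1 { n += 1 }            # single entry, e.g. "8"
+          NF == 2 { n += $2 - $1 + 1 }  # range, e.g. "0-3"
+          END     { print n + 0 }'
+  }
+
+  # cpulist_count "$(awk '/^Cpus_allowed_list/ { print $2 }' /proc/self/status)"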
+
+Each cpuset is represented by a directory in the cgroup file system
+containing (on top of the standard cgroup files) the following
+files describing that cpuset:
+
+ - cpuset.cpus: list of CPUs in that cpuset
+ - cpuset.mems: list of Memory Nodes in that cpuset
+ - cpuset.memory_migrate flag: if set, move pages to cpuset's nodes
+ - cpuset.cpu_exclusive flag: is cpu placement exclusive?
+ - cpuset.mem_exclusive flag: is memory placement exclusive?
+ - cpuset.mem_hardwall flag:  is memory allocation hardwalled?
+ - cpuset.memory_pressure: measure of how much paging pressure in cpuset
+ - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes
+ - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
+ - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset
+ - cpuset.sched_relax_domain_level: the searching range when migrating tasks
+
+In addition, only the root cpuset has the following file:
+
+ - cpuset.memory_pressure_enabled flag: compute memory_pressure?
+
+New cpusets are created using the mkdir system call or shell
+command.  The properties of a cpuset, such as its flags, allowed
+CPUs and Memory Nodes, and attached tasks, are modified by writing
+to the appropriate file in that cpuset's directory, as listed above.
+
+The named hierarchical structure of nested cpusets allows partitioning
+a large system into nested, dynamically changeable, "soft-partitions".
+
+The attachment of each task, automatically inherited at fork by any
+children of that task, to a cpuset allows organizing the work load
+on a system into related sets of tasks such that each set is constrained
+to using the CPUs and Memory Nodes of a particular cpuset.  A task
+may be re-attached to any other cpuset, if allowed by the permissions
+on the necessary cpuset file system directories.
+
+Such management of a system "in the large" integrates smoothly with
+the detailed placement done on individual tasks and memory regions
+using the sched_setaffinity, mbind and set_mempolicy system calls.
+
+The following rules apply to each cpuset:
+
+ - Its CPUs and Memory Nodes must be a subset of its parent's.
+ - It can't be marked exclusive unless its parent is.
+ - If its cpu or memory is exclusive, they may not overlap any sibling.
+
+These rules, and the natural hierarchy of cpusets, enable efficient
+enforcement of the exclusive guarantee, without having to scan all
+cpusets every time any of them change to ensure nothing overlaps an
+exclusive cpuset.  Also, the use of a Linux virtual file system (vfs)
+to represent the cpuset hierarchy provides for a familiar permission
+and name space for cpusets, with a minimum of additional kernel code.
+
+The cpus and mems files in the root (top_cpuset) cpuset are
+read-only.  The cpus file automatically tracks the value of
+cpu_online_mask using a CPU hotplug notifier, and the mems file
+automatically tracks the value of node_states[N_MEMORY]--i.e.,
+nodes with memory--using the cpuset_track_online_nodes() hook.
+
+
+1.4 What are exclusive cpusets ?
+--------------------------------
+
+If a cpuset is cpu or mem exclusive, no other cpuset, other than
+a direct ancestor or descendant, may share any of the same CPUs or
+Memory Nodes.
+
+A cpuset that is cpuset.mem_exclusive *or* cpuset.mem_hardwall is "hardwalled",
+i.e. it restricts kernel allocations for page, buffer and other data
+commonly shared by the kernel across multiple users.  All cpusets,
+whether hardwalled or not, restrict allocations of memory for user
+space.  This enables configuring a system so that several independent
+jobs can share common kernel data, such as file system pages, while
+isolating each job's user allocation in its own cpuset.  To do this,
+construct a large mem_exclusive cpuset to hold all the jobs, and
+construct child, non-mem_exclusive cpusets for each individual job.
+Only a small amount of typical kernel memory, such as requests from
+interrupt handlers, is allowed to be taken outside even a
+mem_exclusive cpuset.
+
+
+1.5 What is memory_pressure ?
+-----------------------------
+The memory_pressure of a cpuset provides a simple per-cpuset metric
+of the rate at which the tasks in a cpuset are attempting to free up
+in-use memory on the nodes of the cpuset to satisfy additional memory
+requests.
+
+This enables batch managers monitoring jobs running in dedicated
+cpusets to efficiently detect what level of memory pressure that job
+is causing.
+
+This is useful both on tightly managed systems running a wide mix of
+submitted jobs, which may choose to terminate or re-prioritize jobs that
+are trying to use more memory than allowed on the nodes assigned to them,
+and with tightly coupled, long running, massively parallel scientific
+computing jobs that will dramatically fail to meet required performance
+goals if they start to use more memory than allowed to them.
+
+This mechanism provides a very economical way for the batch manager
+to monitor a cpuset for signs of memory pressure.  It's up to the
+batch manager or other user code to decide what to do about it and
+take action.
+
+==>
+    Unless this feature is enabled by writing "1" to the special file
+    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
+    code of __alloc_pages() for this metric reduces to simply noticing
+    that the cpuset_memory_pressure_enabled flag is zero.  So only
+    systems that enable this feature will compute the metric.
+
+Why a per-cpuset, running average:
+
+    Because this meter is per-cpuset, rather than per-task or mm,
+    the system load imposed by a batch scheduler monitoring this
+    metric is sharply reduced on large systems, because a scan of
+    the tasklist can be avoided on each set of queries.
+
+    Because this meter is a running average, instead of an accumulating
+    counter, a batch scheduler can detect memory pressure with a
+    single read, instead of having to read and accumulate results
+    for a period of time.
+
+    Because this meter is per-cpuset rather than per-task or mm,
+    the batch scheduler can obtain the key information, memory
+    pressure in a cpuset, with a single read, rather than having to
+    query and accumulate results over all the (dynamically changing)
+    set of tasks in the cpuset.
+
+A per-cpuset simple digital filter (requires a spinlock and 3 words
+of data per-cpuset) is kept, and updated by any task attached to that
+cpuset, if it enters the synchronous (direct) page reclaim code.
+
+A per-cpuset file provides an integer number representing the recent
+(half-life of 10 seconds) rate of direct page reclaims caused by
+the tasks in the cpuset, in units of reclaims attempted per second,
+times 1000.
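+
+As a rough illustration of these units, the shell sketch below converts
+the raw value into reclaims per second.  It assumes the metric is exposed
+as the file cpuset.memory_pressure in the cpuset's directory, as in the
+directory listing of section 2.1; the function name memory_pressure_rate
+is hypothetical and not part of any cpuset tool.
+
```shell
# memory_pressure_rate DIR
# Print the recent direct-reclaim rate of the cpuset at DIR in reclaims
# per second.  The file holds the rate multiplied by 1000, so the value
# is split into an integer part and three decimal places.
# (Hypothetical helper; assumes DIR contains cpuset.memory_pressure.)
memory_pressure_rate() (
    value=$(cat "$1/cpuset.memory_pressure") || exit 1
    printf '%d.%03d\n' "$((value / 1000))" "$((value % 1000))"
)

# Usage, assuming a cpuset named Charlie exists:
#   memory_pressure_rate /sys/fs/cgroup/cpuset/Charlie
```
+
+With this sketch, a file value of 2500 would be reported as 2.500 direct
+reclaims attempted per second.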
+
+
+1.6 What is memory spread ?
+---------------------------
+There are two boolean flag files per cpuset that control where the
+kernel allocates pages for the file system buffers and related
+in-kernel data structures.  They are called 'cpuset.memory_spread_page' and
+'cpuset.memory_spread_slab'.
+
+If the per-cpuset boolean flag file 'cpuset.memory_spread_page' is set, then
+the kernel will spread the file system buffers (page cache) evenly
+over all the nodes that the faulting task is allowed to use, instead
+of preferring to put those pages on the node where the task is running.
+
+If the per-cpuset boolean flag file 'cpuset.memory_spread_slab' is set,
+then the kernel will spread some file system related slab caches,
+such as for inodes and dentries, evenly over all the nodes that the
+faulting task is allowed to use, instead of preferring to put those
+pages on the node where the task is running.
+
+The setting of these flags does not affect anonymous data segment or
+stack segment pages of a task.
+
+By default, both kinds of memory spreading are off, and memory
+pages are allocated on the node local to where the task is running,
+except perhaps as modified by the task's NUMA mempolicy or cpuset
+configuration, so long as sufficient free memory pages are available.
+
+When new cpusets are created, they inherit the memory spread settings
+of their parent.
+
+Setting memory spreading causes allocations for the affected page
+or slab caches to ignore the task's NUMA mempolicy and be spread
+instead.  Tasks using mbind() or set_mempolicy() calls to set NUMA
+mempolicies will not notice any change in these calls as a result of
+their containing task's memory spread settings.  If memory spreading
+is turned off, then the currently specified NUMA mempolicy once again
+applies to memory page allocations.
+
+Both 'cpuset.memory_spread_page' and 'cpuset.memory_spread_slab' are boolean flag
+files.  By default they contain "0", meaning that the feature is off
+for that cpuset.  If a "1" is written to that file, then that turns
+the named feature on.
+
+The implementation is simple.
+
+Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
+PFA_SPREAD_PAGE for each task that is in that cpuset or subsequently
+joins that cpuset.  The page allocation calls for the page cache
+are modified to perform an inline check for this PFA_SPREAD_PAGE task
+flag, and if set, a call to a new routine cpuset_mem_spread_node()
+returns the node to prefer for the allocation.
+
+Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
+PFA_SPREAD_SLAB, and appropriately marked slab caches will allocate
+pages from the node returned by cpuset_mem_spread_node().
+
+The cpuset_mem_spread_node() routine is also simple.  It uses the
+value of a per-task rotor cpuset_mem_spread_rotor to select the next
+node in the current task's mems_allowed to prefer for the allocation.
+
+This memory placement policy is also known (in other contexts) as
+round-robin or interleave.
+
+This policy can provide substantial improvements for jobs that need
+to place thread local data on the corresponding node, but that need
+to access large file system data sets that need to be spread across
+the several nodes in the job's cpuset in order to fit.  Without this
+policy, especially for jobs that might have one thread reading in the
+data set, the memory allocation across the nodes in the job's cpuset
+can become very uneven.
+
+1.7 What is sched_load_balance ?
+--------------------------------
+
+The kernel scheduler (kernel/sched/core.c) automatically load balances
+tasks.  If one CPU is underutilized, kernel code running on that
+CPU will look for tasks on other more overloaded CPUs and move those
+tasks to itself, within the constraints of such placement mechanisms
+as cpusets and sched_setaffinity.
+
+The algorithmic cost of load balancing and its impact on key shared
+kernel data structures such as the task list increases more than
+linearly with the number of CPUs being balanced.  So the scheduler
+has support to partition the system's CPUs into a number of sched
+domains such that it only load balances within each sched domain.
+Each sched domain covers some subset of the CPUs in the system;
+no two sched domains overlap; some CPUs might not be in any sched
+domain and hence won't be load balanced.
+
+Put simply, it costs less to balance between two smaller sched domains
+than one big one, but doing so means that overloads in one of the
+two domains won't be load balanced to the other one.
+
+By default, there is one sched domain covering all CPUs, including those
+marked isolated using the kernel boot time "isolcpus=" argument. However,
+the isolated CPUs will not participate in load balancing, and will not
+have tasks running on them unless explicitly assigned.
+
+This default load balancing across all CPUs is not well suited for
+the following two situations:
+
+ 1) On large systems, load balancing across many CPUs is expensive.
+    If the system is managed using cpusets to place independent jobs
+    on separate sets of CPUs, full load balancing is unnecessary.
+ 2) Systems supporting realtime on some CPUs need to minimize
+    system overhead on those CPUs, including avoiding task load
+    balancing if that is not needed.
+
+When the per-cpuset flag "cpuset.sched_load_balance" is enabled (the default
+setting), it requests that all the CPUs in that cpuset's allowed 'cpuset.cpus'
+be contained in a single sched domain, ensuring that load balancing
+can move a task (not otherwise pinned, as by sched_setaffinity)
+from any CPU in that cpuset to any other.
+
+When the per-cpuset flag "cpuset.sched_load_balance" is disabled, then the
+scheduler will avoid load balancing across the CPUs in that cpuset,
+--except-- in so far as is necessary because some overlapping cpuset
+has "sched_load_balance" enabled.
+
+So, for example, if the top cpuset has the flag "cpuset.sched_load_balance"
+enabled, then the scheduler will have one sched domain covering all
+CPUs, and the setting of the "cpuset.sched_load_balance" flag in any other
+cpusets won't matter, as we're already fully load balancing.
+
+Therefore in the above two situations, the top cpuset flag
+"cpuset.sched_load_balance" should be disabled, and only some of the smaller,
+child cpusets have this flag enabled.
+
+When doing this, you don't usually want to leave any unpinned tasks in
+the top cpuset that might use non-trivial amounts of CPU, as such tasks
+may be artificially constrained to some subset of CPUs, depending on
+the particulars of this flag setting in descendant cpusets.  Even if
+such a task could use spare CPU cycles in some other CPUs, the kernel
+scheduler might not consider the possibility of load balancing that
+task to that underused CPU.
+
+Of course, tasks pinned to a particular CPU can be left in a cpuset
+that disables "cpuset.sched_load_balance" as those tasks aren't going anywhere
+else anyway.
+
+There is an impedance mismatch here, between cpusets and sched domains.
+Cpusets are hierarchical and nest.  Sched domains are flat; they don't
+overlap and each CPU is in at most one sched domain.
+
+It is necessary for sched domains to be flat because load balancing
+across partially overlapping sets of CPUs would risk unstable dynamics
+that would be beyond our understanding.  So if each of two partially
+overlapping cpusets enables the flag 'cpuset.sched_load_balance', then we
+form a single sched domain that is a superset of both.  We won't move
+a task to a CPU outside its cpuset, but the scheduler load balancing
+code might waste some compute cycles considering that possibility.
+
+This mismatch is why there is not a simple one-to-one relation
+between which cpusets have the flag "cpuset.sched_load_balance" enabled,
+and the sched domain configuration.  If a cpuset enables the flag, it
+will get balancing across all its CPUs, but if it disables the flag,
+it will only be assured of no load balancing if no other overlapping
+cpuset enables the flag.
+
+If two cpusets have partially overlapping 'cpuset.cpus' allowed, and only
+one of them has this flag enabled, then the other may find its
+tasks only partially load balanced, just on the overlapping CPUs.
+This is just the general case of the top_cpuset example given a few
+paragraphs above.  In the general case, as in the top cpuset case,
+don't leave tasks that might use non-trivial amounts of CPU in
+such partially load balanced cpusets, as they may be artificially
+constrained to some subset of the CPUs allowed to them, for lack of
+load balancing to the other CPUs.
+
+CPUs in "cpuset.isolcpus" were excluded from load balancing by the
+isolcpus= kernel boot option, and will never be load balanced regardless
+of the value of "cpuset.sched_load_balance" in any cpuset.
+
+1.7.1 sched_load_balance implementation details.
+------------------------------------------------
+
+The per-cpuset flag 'cpuset.sched_load_balance' defaults to enabled (contrary
+to most cpuset flags.)  When enabled for a cpuset, the kernel will
+ensure that it can load balance across all the CPUs in that cpuset
+(makes sure that all the CPUs in the cpus_allowed of that cpuset are
+in the same sched domain.)
+
+If two overlapping cpusets both have 'cpuset.sched_load_balance' enabled,
+then they will be (must be) both in the same sched domain.
+
+If, as is the default, the top cpuset has 'cpuset.sched_load_balance' enabled,
+then by the above that means there is a single sched domain covering
+the whole system, regardless of any other cpuset settings.
+
+The kernel commits to user space that it will avoid load balancing
+where it can.  It will pick as fine a granularity partition of sched
+domains as it can while still providing load balancing for any set
+of CPUs allowed to a cpuset having 'cpuset.sched_load_balance' enabled.
+
+The internal kernel cpuset to scheduler interface passes from the
+cpuset code to the scheduler code a partition of the load balanced
+CPUs in the system. This partition is a set of subsets (represented
+as an array of struct cpumask) of CPUs, pairwise disjoint, that cover
+all the CPUs that must be load balanced.
+
+The cpuset code builds a new such partition and passes it to the
+scheduler sched domain setup code, to have the sched domains rebuilt
+as necessary, whenever:
+
+ - the 'cpuset.sched_load_balance' flag of a cpuset with non-empty CPUs changes,
+ - or CPUs come or go from a cpuset with this flag enabled,
+ - or 'cpuset.sched_relax_domain_level' value of a cpuset with non-empty CPUs
+   and with this flag enabled changes,
+ - or a cpuset with non-empty CPUs and with this flag enabled is removed,
+ - or a cpu is offlined/onlined.
+
+This partition exactly defines what sched domains the scheduler should
+set up - one sched domain for each element (struct cpumask) in the
+partition.
+
+The scheduler remembers the currently active sched domain partitions.
+When the scheduler routine partition_sched_domains() is invoked from
+the cpuset code to update these sched domains, it compares the new
+partition requested with the current, and updates its sched domains,
+removing the old and adding the new, for each change.
+
+
+1.8 What is sched_relax_domain_level ?
+--------------------------------------
+
+Within a sched domain, the scheduler migrates tasks in two ways: periodic
+load balancing on the tick, and at the time of certain scheduling events.
+
+When a task is woken up, the scheduler tries to move the task to an idle
+CPU.  For example, if task A running on CPU X activates another task B
+on the same CPU X, and if CPU Y is X's sibling and is idle, then the
+scheduler migrates task B to CPU Y so that task B can start on CPU Y
+without waiting for task A on CPU X.
+
+And if a CPU runs out of tasks in its runqueue, the CPU tries to pull
+extra tasks from other busy CPUs to help them before it goes idle.
+
+Of course, finding movable tasks and/or idle CPUs has a searching cost,
+so the scheduler might not search all CPUs in the domain every time.
+In fact, on some architectures, the searching range on events is
+limited to the same socket or node where the CPU is located, while the
+load balance on tick searches all.
+
+For example, assume CPU Z is relatively far from CPU X.  Even if CPU Z
+is idle while CPU X and its siblings are busy, the scheduler can't migrate
+woken task B from X to Z since Z is out of its searching range.
+As a result, task B on CPU X needs to wait for task A or for the load
+balance on the next tick.  For some applications in special situations,
+waiting 1 tick may be too long.
+
+The 'cpuset.sched_relax_domain_level' file allows you to request changing
+this searching range as you like.  This file takes an int value which,
+ideally, indicates the size of the searching range in levels as follows;
+the initial value -1 indicates that the cpuset has no request.
+
+====== ===========================================================
+  -1   no request. use system default or follow request of others.
+   0   no search.
+   1   search siblings (hyperthreads in a core).
+   2   search cores in a package.
+   3   search cpus in a node [= system wide on non-NUMA system]
+   4   search nodes in a chunk of node [on NUMA system]
+   5   search system wide [on NUMA system]
+====== ===========================================================
+
+The system default is architecture dependent.  The system default
+can be changed using the relax_domain_level= boot parameter.
+
+This file is per-cpuset and affects the sched domain to which the cpuset
+belongs.  Therefore if the flag 'cpuset.sched_load_balance' of a cpuset
+is disabled, then 'cpuset.sched_relax_domain_level' has no effect since
+there is no sched domain belonging to the cpuset.
+
+If multiple cpusets are overlapping and hence they form a single sched
+domain, the largest value among those is used.  Be careful, if one
+requests 0 and others are -1 then 0 is used.
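+
+The rule above amounts to taking a maximum.  The following shell sketch
+only models that rule for a list of per-cpuset values; the function name
+effective_relax_level is hypothetical, and the sketch does not read any
+kernel state.
+
```shell
# effective_relax_level LEVEL...
# Print the largest of the given sched_relax_domain_level values, which
# is the value the text says is used when overlapping cpusets form a
# single sched domain.  With no arguments, print -1 ("no request").
# (Hypothetical helper modelling the documented rule only.)
effective_relax_level() (
    max=-1
    for level in "$@"; do
        if [ "$level" -gt "$max" ]; then
            max=$level
        fi
    done
    printf '%s\n' "$max"
)

# Usage: one cpuset requests 0, two others have no request (-1).
#   effective_relax_level 0 -1 -1
```
+
+In the usage line, the explicit request of 0 ("no search") wins over the
+two -1 values, which is the case the text warns about.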
+
+Note that modifying this file will have both good and bad effects,
+and whether it is acceptable or not depends on your situation.
+Don't modify this file if you are not sure.
+
+If your situation is:
+
+ - The migration costs between CPUs can be assumed to be considerably
+   small (for you) due to your special application's behavior or
+   special hardware support for CPU cache etc.
+ - The searching cost doesn't have an impact (for you) or you can make
+   the searching cost small enough by managing cpusets to be compact etc.
+ - Low latency is required even if it sacrifices cache hit rate etc.
+
+then increasing 'sched_relax_domain_level' would benefit you.
+
+
+1.9 How do I use cpusets ?
+--------------------------
+
+In order to minimize the impact of cpusets on critical kernel
+code, such as the scheduler, and due to the fact that the kernel
+does not support one task updating the memory placement of another
+task directly, the impact on a task of changing its cpuset CPU
+or Memory Node placement, or of changing to which cpuset a task
+is attached, is subtle.
+
+If a cpuset has its Memory Nodes modified, then for each task attached
+to that cpuset, the next time that the kernel attempts to allocate
+a page of memory for that task, the kernel will notice the change
+in the task's cpuset, and update its per-task memory placement to
+remain within the new cpuset's memory placement.  If the task was using
+mempolicy MPOL_BIND, and the nodes to which it was bound overlap with
+its new cpuset, then the task will continue to use whatever subset
+of MPOL_BIND nodes are still allowed in the new cpuset.  If the task
+was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
+in the new cpuset, then the task will be essentially treated as if it
+was MPOL_BIND bound to the new cpuset (even though its NUMA placement,
+as queried by get_mempolicy(), doesn't change).  If a task is moved
+from one cpuset to another, then the kernel will adjust the task's
+memory placement, as above, the next time that the kernel attempts
+to allocate a page of memory for that task.
+
+If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset
+will have its allowed CPU placement changed immediately.  Similarly,
+if a task's pid is written to another cpuset's 'tasks' file, then its
+allowed CPU placement is changed immediately.  If such a task had been
+bound to some subset of its cpuset using the sched_setaffinity() call,
+the task will be allowed to run on any CPU allowed in its new cpuset,
+negating the effect of the prior sched_setaffinity() call.
+
+In summary, the memory placement of a task whose cpuset is changed is
+updated by the kernel, on the next allocation of a page for that task,
+and the processor placement is updated immediately.
+
+Normally, once a page is allocated (given a physical page
+of main memory) then that page stays on whatever node it
+was allocated, so long as it remains allocated, even if the
+cpuset's memory placement policy 'cpuset.mems' subsequently changes.
+If the cpuset flag file 'cpuset.memory_migrate' is set true, then when
+tasks are attached to that cpuset, any pages that task had
+allocated to it on nodes in its previous cpuset are migrated
+to the task's new cpuset. The relative placement of the page within
+the cpuset is preserved during these migration operations if possible.
+For example if the page was on the second valid node of the prior cpuset
+then the page will be placed on the second valid node of the new cpuset.
+
+Also if 'cpuset.memory_migrate' is set true, then if that cpuset's
+'cpuset.mems' file is modified, pages allocated to tasks in that
+cpuset, that were on nodes in the previous setting of 'cpuset.mems',
+will be moved to nodes in the new setting of 'cpuset.mems'.
+Pages that were not in the task's prior cpuset, or in the cpuset's
+prior 'cpuset.mems' setting, will not be moved.
+
+There is an exception to the above.  If hotplug functionality is used
+to remove all the CPUs that are currently assigned to a cpuset,
+then all the tasks in that cpuset will be moved to the nearest ancestor
+with non-empty cpus.  But the moving of some (or all) tasks might fail if
+cpuset is bound with another cgroup subsystem which has some restrictions
+on task attaching.  In this failing case, those tasks will stay
+in the original cpuset, and the kernel will automatically update
+their cpus_allowed to allow all online CPUs.  When memory hotplug
+functionality for removing Memory Nodes is available, a similar exception
+is expected to apply there as well.  In general, the kernel prefers to
+violate cpuset placement, over starving a task that has had all
+its allowed CPUs or Memory Nodes taken offline.
+
+There is a second exception to the above.  GFP_ATOMIC requests are
+kernel internal allocations that must be satisfied, immediately.
+The kernel may drop some request, in rare cases even panic, if a
+GFP_ATOMIC alloc fails.  If the request cannot be satisfied within
+the current task's cpuset, then we relax the cpuset, and look for
+memory anywhere we can find it.  It's better to violate the cpuset
+than stress the kernel.
+
+To start a new job that is to be contained within a cpuset, the steps are:
+
+ 1) mkdir /sys/fs/cgroup/cpuset
+ 2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
+ 3) Create the new cpuset by doing mkdir's and write's (or echo's) in
+    the /sys/fs/cgroup/cpuset virtual file system.
+ 4) Start a task that will be the "founding father" of the new job.
+ 5) Attach that task to the new cpuset by writing its pid to the
+    /sys/fs/cgroup/cpuset tasks file for that cpuset.
+ 6) fork, exec or clone the job tasks from this founding father task.
+
+For example, the following sequence of commands will set up a cpuset
+named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
+and then start a subshell 'sh' in that cpuset::
+
+  mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
+  cd /sys/fs/cgroup/cpuset
+  mkdir Charlie
+  cd Charlie
+  /bin/echo 2-3 > cpuset.cpus
+  /bin/echo 1 > cpuset.mems
+  /bin/echo $$ > tasks
+  sh
+  # The subshell 'sh' is now running in cpuset Charlie
+  # The next line should display '/Charlie'
+  cat /proc/self/cpuset
+
+There are ways to query or modify cpusets:
+
+ - via the cpuset file system directly, using the various cd, mkdir, echo,
+   cat, rmdir commands from the shell, or their equivalent from C.
+ - via the C library libcpuset.
+ - via the C library libcgroup.
+   (http://sourceforge.net/projects/libcg/)
+ - via the python application cset.
+   (http://code.google.com/p/cpuset/)
+
+The sched_setaffinity calls can also be done at the shell prompt using
+SGI's runon or Robert Love's taskset.  The mbind and set_mempolicy
+calls can be done at the shell prompt using the numactl command
+(part of Andi Kleen's numa package).
+
+2. Usage Examples and Syntax
+============================
+
+2.1 Basic Usage
+---------------
+
+Creating, modifying, and using cpusets can be done through the cpuset
+virtual filesystem.
+
+To mount it, type::
+
+  # mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset
+
+Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the
+tree of the cpusets in the system. For instance, /sys/fs/cgroup/cpuset
+is the cpuset that holds the whole system.
+
+If you want to create a new cpuset under /sys/fs/cgroup/cpuset::
+
+  # cd /sys/fs/cgroup/cpuset
+  # mkdir my_cpuset
+
+Now you want to do something with this cpuset::
+
+  # cd my_cpuset
+
+In this directory you can find several files::
+
+  # ls
+  cgroup.clone_children  cpuset.memory_pressure
+  cgroup.event_control   cpuset.memory_spread_page
+  cgroup.procs           cpuset.memory_spread_slab
+  cpuset.cpu_exclusive   cpuset.mems
+  cpuset.cpus            cpuset.sched_load_balance
+  cpuset.mem_exclusive   cpuset.sched_relax_domain_level
+  cpuset.mem_hardwall    notify_on_release
+  cpuset.memory_migrate  tasks
+
+Reading them will give you information about the state of this cpuset:
+the CPUs and Memory Nodes it can use, the processes that are using
+it, its properties.  By writing to these files you can manipulate
+the cpuset.
+
+Set some flags::
+
+  # /bin/echo 1 > cpuset.cpu_exclusive
+
+Add some cpus::
+
+  # /bin/echo 0-7 > cpuset.cpus
+
+Add some mems::
+
+  # /bin/echo 0-7 > cpuset.mems
+
+Now attach your shell to this cpuset::
+
+  # /bin/echo $$ > tasks
+
+You can also create cpusets inside your cpuset by using mkdir in this
+directory::
+
+  # mkdir my_sub_cs
+
+To remove a cpuset, just use rmdir::
+
+  # rmdir my_sub_cs
+
+This will fail if the cpuset is in use (has cpusets inside, or has
+processes attached).
+
+Note that for legacy reasons, the "cpuset" filesystem exists as a
+wrapper around the cgroup filesystem.
+
+The command::
+
+  mount -t cpuset X /sys/fs/cgroup/cpuset
+
+is equivalent to::
+
+  mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset
+  echo "/sbin/cpuset_release_agent" > /sys/fs/cgroup/cpuset/release_agent
+
+2.2 Adding/removing cpus
+------------------------
+
+This is the syntax to use when writing in the cpus or mems files
+in cpuset directories::
+
+  # /bin/echo 1-4 > cpuset.cpus		-> set cpus list to cpus 1,2,3,4
+  # /bin/echo 1,2,3,4 > cpuset.cpus	-> set cpus list to cpus 1,2,3,4
+
+To add a CPU to a cpuset, write the new list of CPUs including the
+CPU to be added. To add 6 to the above cpuset::
+
+  # /bin/echo 1-4,6 > cpuset.cpus	-> set cpus list to cpus 1,2,3,4,6
+
+Similarly to remove a CPU from a cpuset, write the new list of CPUs
+without the CPU to be removed.
+
+To remove all the CPUs::
+
+  # /bin/echo "" > cpuset.cpus		-> clear cpus list
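+
+The list syntax above can be expanded mechanically.  The shell sketch
+below handles only comma-separated CPU numbers and N-M ranges as shown in
+this section; the helper name expand_cpulist is hypothetical, and the
+kernel's own list parser is not reproduced here.
+
```shell
# expand_cpulist LIST
# Print the individual CPU numbers named by LIST, separated by spaces.
# LIST is a comma-separated list of single CPUs and N-M ranges, for
# example "1-4,6".  The body runs in a subshell, so changing IFS here
# does not affect the caller.
# (Hypothetical helper; covers only the syntax shown in this section.)
expand_cpulist() (
    out=""
    IFS=,
    for part in $1; do
        case $part in
            *-*) first=${part%-*}; last=${part#*-} ;;
            *)   first=$part; last=$part ;;
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            out="$out${out:+ }$cpu"
            cpu=$((cpu + 1))
        done
    done
    printf '%s\n' "$out"
)

# Usage: list the CPUs that "1-4,6" would assign.
#   expand_cpulist 1-4,6
```
+
+Both forms shown at the start of this section, 1-4 and 1,2,3,4, expand
+to the same four CPUs.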
+
+2.3 Setting flags
+-----------------
+
+The syntax is very simple::
+
+  # /bin/echo 1 > cpuset.cpu_exclusive 	-> set flag 'cpuset.cpu_exclusive'
+  # /bin/echo 0 > cpuset.cpu_exclusive 	-> unset flag 'cpuset.cpu_exclusive'
+
+2.4 Attaching processes
+-----------------------
+
+::
+
+  # /bin/echo PID > tasks
+
+Note that it is PID, not PIDs. You can only attach ONE task at a time.
+If you have several tasks to attach, you have to do it one after another::
+
+  # /bin/echo PID1 > tasks
+  # /bin/echo PID2 > tasks
+	...
+  # /bin/echo PIDn > tasks
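+
+Because each write attaches a single task, attaching several tasks is
+naturally a loop.  A minimal sketch, assuming the path of the target
+cpuset's tasks file is passed as the first argument; the function name
+attach_pids is hypothetical.  It uses /bin/echo, as the other examples
+in this document do, so that a failed write is reported.
+
```shell
# attach_pids TASKS_FILE PID...
# Write each PID to TASKS_FILE with a separate write, one PID per
# /bin/echo invocation.  Report every PID that could not be written and
# return non-zero if any write failed.
# (Hypothetical helper; TASKS_FILE is e.g. .../my_cpuset/tasks.)
attach_pids() (
    tasks=$1
    shift
    rc=0
    for pid in "$@"; do
        if ! /bin/echo "$pid" > "$tasks"; then
            echo "failed to attach $pid" >&2
            rc=1
        fi
    done
    exit "$rc"
)

# Usage, assuming a cpuset named my_cpuset exists:
#   attach_pids /sys/fs/cgroup/cpuset/my_cpuset/tasks 1234 1235 1236
```
+
+Continuing after a failed write is a design choice of this sketch: one
+task that has already exited should not prevent the rest from being
+attached.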
+
+
+3. Questions
+============
+
+Q:
+   What's up with this '/bin/echo'?
+
+A:
+   bash's builtin 'echo' command does not check calls to write() against
+   errors. If you use it in the cpuset file system, you won't be
+   able to tell whether a command succeeded or failed.
+
+Q:
+   When I attach processes, only the first one on the line really gets attached!
+
+A:
+   We can only return one error code per call to write(). So you should also
+   put only ONE pid.
+
+4. Contact
+==========
+
+Web: http://www.bullopensource.org/cpuset
diff --git a/Documentation/admin-guide/cgroup-v1/devices.rst b/Documentation/admin-guide/cgroup-v1/devices.rst
new file mode 100644
index 000000000000..e1886783961e
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/devices.rst
@@ -0,0 +1,132 @@
+===========================
+Device Whitelist Controller
+===========================
+
+1. Description
+==============
+
+Implement a cgroup to track and enforce open and mknod restrictions
+on device files.  A device cgroup associates a device access
+whitelist with each cgroup.  A whitelist entry has 4 fields.
+'type' is a (all), c (char), or b (block).  'all' means it applies
+to all types and all major and minor numbers.  Major and minor are
+either an integer or * for all.  Access is a composition of r
+(read), w (write), and m (mknod).
+
+The root device cgroup starts with rwm to 'all'.  A child device
+cgroup gets a copy of the parent.  Administrators can then remove
+devices from the whitelist or add new entries.  A child cgroup can
+never receive a device access which is denied by its parent.
+
+2. User Interface
+=================
+
+An entry is added using devices.allow, and removed using
+devices.deny.  For instance::
+
+	echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow
+
+allows cgroup 1 to read and mknod the device usually known as
+/dev/null.  Doing::
+
+	echo a > /sys/fs/cgroup/1/devices.deny
+
+will remove the default 'a *:* rwm' entry. Doing::
+
+	echo a > /sys/fs/cgroup/1/devices.allow
+
+will add the 'a *:* rwm' entry to the whitelist.
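+
+The entry format described in section 1 can be checked before writing
+it.  The shell sketch below is only a model of that description (a type,
+major:minor with integers or ``*``, and access letters from r, w, m, plus
+the bare ``a`` form used above); the function name valid_devices_entry
+is hypothetical, and the kernel's own parser may accept or reject other
+inputs.
+
```shell
# valid_devices_entry ENTRY
# Return 0 if ENTRY looks like a whitelist entry as described in this
# document: "TYPE MAJOR:MINOR ACCESS", or the bare shorthand "a".
# (Hypothetical helper; a model of the documented format only.)
valid_devices_entry() (
    set -f                      # keep '*' literal while splitting
    set -- $1                   # split ENTRY into fields on whitespace
    if [ "$#" -eq 1 ] && [ "$1" = a ]; then
        exit 0
    fi
    [ "$#" -eq 3 ] || exit 1
    type=$1 devnum=$2 access=$3
    case $type in a|b|c) ;; *) exit 1 ;; esac
    major=${devnum%%:*}
    minor=${devnum#*:}
    # Reject a device number without a colon.
    [ "$major:$minor" = "$devnum" ] || exit 1
    for n in "$major" "$minor"; do
        case $n in
            '*') ;;                     # wildcard: all numbers
            ''|*[!0-9]*) exit 1 ;;      # empty or not an integer
        esac
    done
    # Access must be a non-empty string of r, w and m only.
    case $access in ''|*[!rwm]*) exit 1 ;; esac
)

# Usage: check an entry before writing it to devices.allow.
#   valid_devices_entry 'c 1:3 mr' && echo 'c 1:3 mr' > devices.allow
```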
+
+3. Security
+===========
+
+Any task can move itself between cgroups.  This clearly won't
+suffice, but we can decide the best way to adequately restrict
+movement as people get some experience with this.  We may just want
+to require CAP_SYS_ADMIN, which at least is a separate bit from
+CAP_MKNOD.  We may want to just refuse moving to a cgroup which
+isn't a descendant of the current one.  Or we may want to use
+CAP_MAC_ADMIN, since we really are trying to lock down root.
+
+CAP_SYS_ADMIN is needed to modify the whitelist or move another
+task to a new cgroup.  (Again we'll probably want to change that).
+
+A cgroup may not be granted more permissions than the cgroup's
+parent has.
+
+4. Hierarchy
+============
+
+device cgroups maintain hierarchy by making sure a cgroup never has more
+access permissions than its parent.  Every time an entry is written to
+a cgroup's devices.deny file, all its children will have that entry removed
+from their whitelist and all the locally set whitelist entries will be
+re-evaluated.  In case one of the locally set whitelist entries would provide
+more access than the cgroup's parent, it'll be removed from the whitelist.
+
+Example::
+
+      A
+     / \
+        B
+
+    group        behavior	exceptions
+    A            allow		"b 8:* rwm", "c 116:1 rw"
+    B            deny		"c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"
+
+If a device is denied in group A::
+
+	# echo "c 116:* r" > A/devices.deny
+
+it'll propagate down and after revalidating B's entries, the whitelist entry
+"c 116:2 rwm" will be removed::
+
+    group        whitelist entries                        denied devices
+    A            all                                      "b 8:* rwm", "c 116:* rw"
+    B            "c 1:3 rwm", "b 3:* rwm"                 all the rest
+
+In case parent's exceptions change and local exceptions are not allowed
+anymore, they'll be deleted.
+
+Notice that new whitelist entries will not be propagated::
+
+      A
+     / \
+        B
+
+    group        whitelist entries                        denied devices
+    A            "c 1:3 rwm", "c 1:5 r"                   all the rest
+    B            "c 1:3 rwm", "c 1:5 r"                   all the rest
+
+when adding ``c *:3 rwm``::
+
+	# echo "c *:3 rwm" >A/devices.allow
+
+the result::
+
+    group        whitelist entries                        denied devices
+    A            "c *:3 rwm", "c 1:5 r"                   all the rest
+    B            "c 1:3 rwm", "c 1:5 r"                   all the rest
+
+but now it'll be possible to add new entries to B::
+
+	# echo "c 2:3 rwm" >B/devices.allow
+	# echo "c 50:3 r" >B/devices.allow
+
+or even::
+
+	# echo "c *:3 rwm" >B/devices.allow
+
+Allowing or denying all by writing 'a' to devices.allow or devices.deny will
+not be possible once the device cgroup has children.
+
+4.1 Hierarchy (internal implementation)
+---------------------------------------
+
+Device cgroups are implemented internally using a behavior (ALLOW, DENY) and a
+list of exceptions.  The internal state is controlled using the same user
+interface to preserve compatibility with the previous whitelist-only
+implementation.  Removal or addition of exceptions that will reduce the access
+to devices will be propagated down the hierarchy.
+For every propagated exception, the effective rules will be re-evaluated based
+on current parent's access rules.
diff --git a/Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst b/Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst
new file mode 100644
index 000000000000..582d3427de3f
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst
@@ -0,0 +1,127 @@
+==============
+Cgroup Freezer
+==============
+
+The cgroup freezer is useful to batch job management systems which start
+and stop sets of tasks in order to schedule the resources of a machine
+according to the desires of a system administrator. This sort of program
+is often used on HPC clusters to schedule access to the cluster as a
+whole. The cgroup freezer uses cgroups to describe the set of tasks to
+be started/stopped by the batch job management system. It also provides
+a means to start and stop the tasks composing the job.
+
+The cgroup freezer will also be useful for checkpointing running groups
+of tasks. The freezer allows the checkpoint code to obtain a consistent
+image of the tasks by attempting to force the tasks in a cgroup into a
+quiescent state. Once the tasks are quiescent another task can
+walk /proc or invoke a kernel interface to gather information about the
+quiesced tasks. Checkpointed tasks can be restarted later should a
+recoverable error occur. This also allows the checkpointed tasks to be
+migrated between nodes in a cluster by copying the gathered information
+to another node and restarting the tasks there.
+
+Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
+and resuming tasks in userspace. Both of these signals are observable
+from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
+blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
+SIGCONT is especially unsuitable since it can be caught by the task. Any
+programs designed to watch for SIGSTOP and SIGCONT could be broken by
+attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
+demonstrate this problem using nested bash shells::
+
+	$ echo $$
+	16644
+	$ bash
+	$ echo $$
+	16690
+
+	From a second, unrelated bash shell:
+	$ kill -SIGSTOP 16690
+	$ kill -SIGCONT 16690
+
+	<at this point 16690 exits and causes 16644 to exit too>
+
+This happens because bash can observe both signals and choose how it
+responds to them.
+
+Another example of a program which catches and responds to these
+signals is gdb. In fact any program designed to use ptrace is likely to
+have a problem with this method of stopping and resuming tasks.
+
+In contrast, the cgroup freezer uses the kernel freezer code to
+prevent the freeze/unfreeze cycle from becoming visible to the tasks
+being frozen. This allows the bash example above and gdb to run as
+expected.
+
+The cgroup freezer is hierarchical. Freezing a cgroup freezes all
+tasks belonging to the cgroup and all its descendant cgroups. Each
+cgroup has its own state (self-state) and the state inherited from the
+parent (parent-state). Iff both states are THAWED, the cgroup is
+THAWED.
+
+The following cgroupfs files are created by cgroup freezer.
+
+* freezer.state: Read-write.
+
+  When read, returns the effective state of the cgroup - "THAWED",
+  "FREEZING" or "FROZEN". This is the combined self and parent-states.
+  If any is freezing, the cgroup is freezing (FREEZING or FROZEN).
+
+  FREEZING cgroup transitions into FROZEN state when all tasks
+  belonging to the cgroup and its descendants become frozen. Note that
+  a cgroup reverts to FREEZING from FROZEN after a new task is added
+  to the cgroup or one of its descendant cgroups until the new task is
+  frozen.
+
+  When written, sets the self-state of the cgroup. Two values are
+  allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
+  if not already freezing, enters FREEZING state along with all its
+  descendant cgroups.
+
+  If THAWED is written, the self-state of the cgroup is changed to
+  THAWED.  Note that the effective state may not change to THAWED if
+  the parent-state is still freezing. If a cgroup's effective state
+  becomes THAWED, all its descendants which are freezing because of
+  the cgroup also leave the freezing state.
+
+* freezer.self_freezing: Read only.
+
+  Shows the self-state. 0 if the self-state is THAWED; otherwise, 1.
+  This value is 1 iff the last write to freezer.state was "FROZEN".
+
+* freezer.parent_freezing: Read only.
+
+  Shows the parent-state.  0 if none of the cgroup's ancestors is
+  frozen; otherwise, 1.
+
+The root cgroup is non-freezable and the above interface files don't
+exist there.
+
+* Examples of usage::
+
+   # mkdir /sys/fs/cgroup/freezer
+   # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
+   # mkdir /sys/fs/cgroup/freezer/0
+   # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks
+
+to get status of the freezer subsystem::
+
+   # cat /sys/fs/cgroup/freezer/0/freezer.state
+   THAWED
+
+to freeze all tasks in the container::
+
+   # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
+   # cat /sys/fs/cgroup/freezer/0/freezer.state
+   FREEZING
+   # cat /sys/fs/cgroup/freezer/0/freezer.state
+   FROZEN
+
+to unfreeze all tasks in the container::
+
+   # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
+   # cat /sys/fs/cgroup/freezer/0/freezer.state
+   THAWED
+
+This is the basic mechanism which should do the right thing for user space tasks
+in a simple scenario.
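+
+Because the freezer is hierarchical, the two read-only files can be used to
+tell why a cgroup is freezing. A minimal sketch, assuming the mount point and
+the cgroup "0" from the examples above and a hypothetical child cgroup named
+"child" whose freezer.state has never been written::
+
+   # mkdir /sys/fs/cgroup/freezer/0/child
+   # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
+   # cat /sys/fs/cgroup/freezer/0/child/freezer.self_freezing
+   0
+   # cat /sys/fs/cgroup/freezer/0/child/freezer.parent_freezing
+   1
+
+Here the child is freezing only because of its parent, so its effective
+state should return to THAWED once THAWED is written to the freezer.state
+of "0".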
diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
new file mode 100644
index 000000000000..a3902aa253a9
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
@@ -0,0 +1,50 @@
+==================
+HugeTLB Controller
+==================
+
+The HugeTLB controller allows limiting HugeTLB usage per control group and
+enforces the controller limit during page fault. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that
+the application will get a SIGBUS signal if it tries to access HugeTLB pages
+beyond its limit. This requires the application to know beforehand how many
+HugeTLB pages it would require for its use.
+
+The HugeTLB controller can be created by first mounting the cgroup filesystem::
+
+  # mount -t cgroup -o hugetlb none /sys/fs/cgroup
+
+With the above step, the initial or the parent HugeTLB group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+
+New groups can be created under the parent group /sys/fs/cgroup::
+
+  # cd /sys/fs/cgroup
+  # mkdir g1
+  # echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it.
+
+Brief summary of control files::
+
+ hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes     # show current usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt            # show the number of allocation failures due to HugeTLB limit
+
+For a system supporting three hugepage sizes (64k, 32M and 1G), the control
+files include::
+
+  hugetlb.1GB.limit_in_bytes
+  hugetlb.1GB.max_usage_in_bytes
+  hugetlb.1GB.usage_in_bytes
+  hugetlb.1GB.failcnt
+  hugetlb.64KB.limit_in_bytes
+  hugetlb.64KB.max_usage_in_bytes
+  hugetlb.64KB.usage_in_bytes
+  hugetlb.64KB.failcnt
+  hugetlb.32MB.limit_in_bytes
+  hugetlb.32MB.max_usage_in_bytes
+  hugetlb.32MB.usage_in_bytes
+  hugetlb.32MB.failcnt
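+
+As a hedged sketch of how these files might be used, assuming the group g1
+created above and a system that provides the 32MB hugepage size from the
+listing (substitute a size your system supports), a limit can be set and the
+counters inspected like this::
+
+  # cd /sys/fs/cgroup/g1
+  # echo 1073741824 > hugetlb.32MB.limit_in_bytes
+  # cat hugetlb.32MB.limit_in_bytes
+  # cat hugetlb.32MB.usage_in_bytes
+  # cat hugetlb.32MB.failcnt
+
+The value written is 1 GiB expressed in bytes. A non-zero failcnt would
+indicate that allocations were refused because of the limit.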
diff --git a/Documentation/admin-guide/cgroup-v1/index.rst b/Documentation/admin-guide/cgroup-v1/index.rst
new file mode 100644
index 000000000000..10bf48bae0b0
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/index.rst
@@ -0,0 +1,28 @@
+========================
+Control Groups version 1
+========================
+
+.. toctree::
+    :maxdepth: 1
+
+    cgroups
+
+    blkio-controller
+    cpuacct
+    cpusets
+    devices
+    freezer-subsystem
+    hugetlb
+    memcg_test
+    memory
+    net_cls
+    net_prio
+    pids
+    rdma
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
new file mode 100644
index 000000000000..3f7115e07b5d
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -0,0 +1,355 @@
+=====================================================
+Memory Resource Controller(Memcg) Implementation Memo
+=====================================================
+
+Last Updated: 2010/2
+
+Base Kernel Version: based on 2.6.33-rc7-mm(candidate for 34).
+
+Because VM is getting complex (one of the reasons is memcg...), memcg's behavior
+is complex. This is a document for memcg's internal behavior.
+Please note that implementation details can be changed.
+
+(*) Topics on API should be in Documentation/admin-guide/cgroup-v1/memory.rst
+
+0. How to record usage ?
+========================
+
+   2 objects are used.
+
+   page_cgroup ....an object per page.
+
+	Allocated at boot or memory hotplug. Freed at memory hot removal.
+
+   swap_cgroup ... an entry per swp_entry.
+
+	Allocated at swapon(). Freed at swapoff().
+
+   The page_cgroup has a USED bit, so double counting against a page_cgroup
+   never occurs. swap_cgroup is used only when a charged page is swapped-out.
+
+1. Charge
+=========
+
+   a page/swp_entry may be charged (usage += PAGE_SIZE) at
+
+	mem_cgroup_try_charge()
+
+2. Uncharge
+===========
+
+  a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
+
+	mem_cgroup_uncharge()
+	  Called when a page's refcount goes down to 0.
+
+	mem_cgroup_uncharge_swap()
+	  Called when swp_entry's refcnt goes down to 0. A charge against swap
+	  disappears.
+
+3. charge-commit-cancel
+=======================
+
+	Memcg pages are charged in two steps:
+
+		- mem_cgroup_try_charge()
+		- mem_cgroup_commit_charge() or mem_cgroup_cancel_charge()
+
+	At try_charge(), there are no flags to say "this page is charged".
+	At this point, usage += PAGE_SIZE.
+
+	At commit(), the page is associated with the memcg.
+
+	At cancel(), simply usage -= PAGE_SIZE.
+
+In the explanation below, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
+
+4. Anonymous
+============
+
+	Anonymous page is newly allocated at
+		  - page fault into MAP_ANONYMOUS mapping.
+		  - Copy-On-Write.
+
+	4.1 Swap-in.
+	At swap-in, the page is taken from swap-cache. There are 2 cases.
+
+	(a) If the SwapCache is newly allocated and read, it has no charges.
+	(b) If the SwapCache has been mapped by processes, it has been
+	    charged already.
+
+	4.2 Swap-out.
+	At swap-out, typical state transition is below.
+
+	(a) add to swap cache. (marked as SwapCache)
+	    swp_entry's refcnt += 1.
+	(b) fully unmapped.
+	    swp_entry's refcnt += # of ptes.
+	(c) write back to swap.
+	(d) delete from swap cache. (remove from SwapCache)
+	    swp_entry's refcnt -= 1.
+
+
+	Finally, at task exit,
+	(e) zap_pte() is called and swp_entry's refcnt -=1 -> 0.
+
+5. Page Cache
+=============
+
+	Page Cache is charged at
+	- add_to_page_cache_locked().
+
+	The logic is very clear. (About migration, see below)
+
+	Note:
+	  __remove_from_page_cache() is called by remove_from_page_cache()
+	  and __remove_mapping().
+
+6. Shmem(tmpfs) Page Cache
+===========================
+
+	The best way to understand shmem's page state transition is to read
+	mm/shmem.c.
+
+	But brief explanation of the behavior of memcg around shmem will be
+	helpful to understand the logic.
+
+	Shmem's page (just leaf page, not direct/indirect block) can be on
+
+		- radix-tree of shmem's inode.
+		- SwapCache.
+		- Both on radix-tree and SwapCache. This happens at swap-in
+		  and swap-out.
+
+	It's charged when...
+
+	- A new page is added to shmem's radix-tree.
+	- A swp page is read. (move a charge from swap_cgroup to page_cgroup)
+
+7. Page Migration
+=================
+
+	mem_cgroup_migrate()
+
+8. LRU
+======
+	Each memcg has its own private LRU. Now, its handling is under global
+	VM's control (means that it's handled under global pgdat->lru_lock).
+	Almost all routines around memcg's LRU are called by global LRU's
+	list management functions under pgdat->lru_lock.
+
+	A special function is mem_cgroup_isolate_pages(). This scans
+	memcg's private LRU and calls __isolate_lru_page() to extract a page
+	from LRU.
+
+	(By __isolate_lru_page(), the page is removed from both of global and
+	private LRU.)
+
+
+9. Typical Tests.
+=================
+
+ Tests for racy cases.
+
+9.1 Small limit to memcg.
+-------------------------
+
+	When testing racy cases, it's a good idea to set memcg's limit
+	to be very small rather than GB. Many races were found in tests under
+	xKB or xxMB limits.
+
+	(Memory behavior under a GB limit and under a MB limit shows very
+	different situations.)
+
+9.2 Shmem
+---------
+
+	Historically, memcg's shmem handling was poor and we saw some amount
+	of trouble here. This is because shmem is page-cache but can be
+	SwapCache. Testing with shmem/tmpfs is always a good test.
+
+9.3 Migration
+-------------
+
+	For NUMA, migration is another special case. To do an easy test, cpuset
+	is useful. The following is a sample script to do migration::
+
+		mount -t cgroup -o cpuset none /opt/cpuset
+
+		mkdir /opt/cpuset/01
+		echo 1 > /opt/cpuset/01/cpuset.cpus
+		echo 0 > /opt/cpuset/01/cpuset.mems
+		echo 1 > /opt/cpuset/01/cpuset.memory_migrate
+		mkdir /opt/cpuset/02
+		echo 1 > /opt/cpuset/02/cpuset.cpus
+		echo 1 > /opt/cpuset/02/cpuset.mems
+		echo 1 > /opt/cpuset/02/cpuset.memory_migrate
+
+	In the above set, when you move a task from 01 to 02, page migration from
+	node 0 to node 1 will occur. The following is a script to migrate all
+	tasks under a cpuset::
+
+		--
+		move_task()
+		{
+		for pid in $1
+		do
+			/bin/echo $pid >$2/tasks 2>/dev/null
+			echo -n $pid
+			echo -n " "
+		done
+		echo END
+		}
+
+		G1_TASK=`cat ${G1}/tasks`
+		G2_TASK=`cat ${G2}/tasks`
+		move_task "${G1_TASK}" ${G2} &
+		--
+
+9.4 Memory hotplug
+------------------
+
+	Memory hotplug is another good test.
+
+	To offline memory, do the following::
+
+		# echo offline > /sys/devices/system/memory/memoryXXX/state
+
+	(XXX is the number of the memory block)
+
+	This is an easy way to test page migration, too.
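+
+	A minimal sketch of a full offline/online cycle, assuming the chosen
+	block can be offlined at all (memoryXXX is a placeholder, as above).
+	Reading the state file before and after each write shows whether the
+	request took effect::
+
+		# cat /sys/devices/system/memory/memoryXXX/state
+		# echo offline > /sys/devices/system/memory/memoryXXX/state
+		# cat /sys/devices/system/memory/memoryXXX/state
+		# echo online > /sys/devices/system/memory/memoryXXX/state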
+
+9.5 mkdir/rmdir
+---------------
+
+	When using hierarchy, mkdir/rmdir test should be done.
+	Use tests like the following::
+
+		echo 1 >/opt/cgroup/01/memory.use_hierarchy
+		mkdir /opt/cgroup/01/child_a
+		mkdir /opt/cgroup/01/child_b
+
+		set limit to 01.
+		add limit to 01/child_b
+		run jobs under child_a and child_b
+
+	create/delete following groups at random while jobs are running::
+
+		/opt/cgroup/01/child_a/child_aa
+		/opt/cgroup/01/child_b/child_bb
+		/opt/cgroup/01/child_c
+
+	running new jobs in new group is also good.
+
+9.6 Mount with other subsystems
+-------------------------------
+
+	Mounting with other subsystems is a good test because there is a
+	race and lock dependency with other cgroup subsystems.
+
+	example::
+
+		# mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices
+
+	and do task move, mkdir, rmdir etc...under this.
+
+9.7 swapoff
+-----------
+
+	Besides swap management being one of the complicated parts of memcg,
+	the call path of swap-in at swapoff is not the same as the usual swap-in
+	path. It's worth testing explicitly.
+
+	For example, a test like the following is good:
+
+	(Shell-A)::
+
+		# mount -t cgroup none /cgroup -o memory
+		# mkdir /cgroup/test
+		# echo 40M > /cgroup/test/memory.limit_in_bytes
+		# echo 0 > /cgroup/test/tasks
+
+	Run malloc(100M) program under this. You'll see 60M of swaps.
+
+	(Shell-B)::
+
+		# move all tasks in /cgroup/test to /cgroup
+		# /sbin/swapoff -a
+		# rmdir /cgroup/test
+		# kill malloc task.
+
+	Of course, tmpfs vs. swapoff should be tested, too.
+
+9.8 OOM-Killer
+--------------
+
+	Out-of-memory caused by memcg's limit will kill tasks under
+	the memcg. When hierarchy is used, a task under hierarchy
+	will be killed by the kernel.
+
+	In this case, panic_on_oom shouldn't be invoked and tasks
+	in other groups shouldn't be killed.
+
+	It's not difficult to cause OOM under memcg as following.
+
+	Case A) when you can swapoff::
+
+		#swapoff -a
+		#echo 50M > /memory.limit_in_bytes
+
+	run 51M of malloc
+
+	Case B) when you use mem+swap limitation::
+
+		#echo 50M > memory.limit_in_bytes
+		#echo 50M > memory.memsw.limit_in_bytes
+
+	run 51M of malloc
+
+9.9 Move charges at task migration
+----------------------------------
+
+	Charges associated with a task can be moved along with task migration.
+
+	(Shell-A)::
+
+		#mkdir /cgroup/A
+		#echo $$ >/cgroup/A/tasks
+
+	run some programs which use some amount of memory in /cgroup/A.
+
+	(Shell-B)::
+
+		#mkdir /cgroup/B
+		#echo 1 >/cgroup/B/memory.move_charge_at_immigrate
+		#echo "pid of the program running in group A" >/cgroup/B/tasks
+
+	You can see charges have been moved by reading ``*.usage_in_bytes`` or
+	memory.stat of both A and B.
+
+	See 8.2 of Documentation/admin-guide/cgroup-v1/memory.rst to see what value should
+	be written to move_charge_at_immigrate.
+
+9.10 Memory thresholds
+----------------------
+
+	Memory controller implements memory thresholds using cgroups notification
+	API. You can use tools/cgroup/cgroup_event_listener.c to test it.
+
+	(Shell-A) Create cgroup and run event listener::
+
+		# mkdir /cgroup/A
+		# ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M
+
+	(Shell-B) Add task to cgroup and try to allocate and free memory::
+
+		# echo $$ >/cgroup/A/tasks
+		# a="$(dd if=/dev/zero bs=1M count=10)"
+		# a=
+
+	You will see a message from cgroup_event_listener every time you cross
+	the thresholds.
+
+	Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.
+
+	It's a good idea to test the root cgroup as well.
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
new file mode 100644
index 000000000000..41bdc038dad9
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -0,0 +1,1003 @@
+==========================
+Memory Resource Controller
+==========================
+
+NOTE:
+      This document is hopelessly outdated and it asks for a complete
+      rewrite. It still contains useful information so we are keeping it
+      here but make sure to check the current code if you need a deeper
+      understanding.
+
+NOTE:
+      The Memory Resource Controller has generically been referred to as the
+      memory controller in this document. Do not confuse memory controller
+      used here with the memory controller that is used in hardware.
+
+(For editors) In this document:
+      When we mention a cgroup (cgroupfs's directory) with memory controller,
+      we call it "memory cgroup". When you see git-log and source code, you'll
+      see patch's title and function names tend to use "memcg".
+      In this document, we avoid using it.
+
+Benefits and Purpose of the memory controller
+=============================================
+
+The memory controller isolates the memory behaviour of a group of tasks
+from the rest of the system. The article on LWN [12] mentions some probable
+uses of the memory controller. The memory controller can be used to
+
+a. Isolate an application or a group of applications
+   Memory-hungry applications can be isolated and limited to a smaller
+   amount of memory.
+b. Create a cgroup with a limited amount of memory; this can be used
+   as a good alternative to booting with mem=XXXX.
+c. Virtualization solutions can control the amount of memory they want
+   to assign to a virtual machine instance.
+d. A CD/DVD burner could control the amount of memory used by the
+   rest of the system to ensure that burning does not fail due to lack
+   of available memory.
+e. There are several other use cases; find one or use the controller just
+   for fun (to learn and hack on the VM subsystem).
+
+Current Status: linux-2.6.34-mmotm(development version of 2010/April)
+
+Features:
+
+ - accounting anonymous pages, file caches, swap caches usage and limiting them.
+ - pages are linked to per-memcg LRU exclusively, and there is no global LRU.
+ - optionally, memory+swap usage can be accounted and limited.
+ - hierarchical accounting
+ - soft limit
+ - moving (recharging) account at moving a task is selectable.
+ - usage threshold notifier
+ - memory pressure notifier
+ - oom-killer disable knob and oom-notifier
+ - Root cgroup has no limit controls.
+
+ Kernel memory support is a work in progress, and the current version provides
+ basic functionality. (See Section 2.7)
+
+Brief summary of control files.
+
+==================================== ==========================================
+ tasks				     attach a task(thread) and show list of
+				     threads
+ cgroup.procs			     show list of processes
+ cgroup.event_control		     an interface for event_fd()
+ memory.usage_in_bytes		     show current usage for memory
+				     (See 5.5 for details)
+ memory.memsw.usage_in_bytes	     show current usage for memory+Swap
+				     (See 5.5 for details)
+ memory.limit_in_bytes		     set/show limit of memory usage
+ memory.memsw.limit_in_bytes	     set/show limit of memory+Swap usage
+ memory.failcnt			     show the number of memory usage hits limits
+ memory.memsw.failcnt		     show the number of memory+Swap hits limits
+ memory.max_usage_in_bytes	     show max memory usage recorded
+ memory.memsw.max_usage_in_bytes     show max memory+Swap usage recorded
+ memory.soft_limit_in_bytes	     set/show soft limit of memory usage
+ memory.stat			     show various statistics
+ memory.use_hierarchy		     set/show hierarchical account enabled
+ memory.force_empty		     trigger forced page reclaim
+ memory.pressure_level		     set memory pressure notifications
+ memory.swappiness		     set/show swappiness parameter of vmscan
+				     (See sysctl's vm.swappiness)
+ memory.move_charge_at_immigrate     set/show controls of moving charges
+ memory.oom_control		     set/show oom controls.
+ memory.numa_stat		     show the number of memory usage per numa
+				     node
+
+ memory.kmem.limit_in_bytes          set/show hard limit for kernel memory
+ memory.kmem.usage_in_bytes          show current kernel memory allocation
+ memory.kmem.failcnt                 show the number of kernel memory usage
+				     hits limits
+ memory.kmem.max_usage_in_bytes      show max kernel memory usage recorded
+
+ memory.kmem.tcp.limit_in_bytes      set/show hard limit for tcp buf memory
+ memory.kmem.tcp.usage_in_bytes      show current tcp buf memory allocation
+ memory.kmem.tcp.failcnt             show the number of tcp buf memory usage
+				     hits limits
+ memory.kmem.tcp.max_usage_in_bytes  show max tcp buf memory usage recorded
+==================================== ==========================================
+
+1. History
+==========
+
+The memory controller has a long history. A request for comments for the memory
+controller was posted by Balbir Singh [1]. At the time the RFC was posted
+there were several implementations for memory control. The goal of the
+RFC was to build consensus and agreement for the minimal features required
+for memory control. The first RSS controller was posted by Balbir Singh [2]
+in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of the
+RSS controller. At OLS, at the resource management BoF, everyone suggested
+that we handle both page cache and RSS together. Another request was raised
+to allow user space handling of OOM. The current memory controller is
+at version 6; it combines both mapped (RSS) and unmapped Page
+Cache Control [11].
+
+2. Memory Control
+=================
+
+Memory is a unique resource in the sense that it is present in a limited
+amount. If a task requires a lot of CPU processing, the task can spread
+its processing over a period of hours, days, months or years, but with
+memory, the same physical memory needs to be reused to accomplish the task.
+
+The memory controller implementation has been divided into phases. These
+are:
+
+1. Memory controller
+2. mlock(2) controller
+3. Kernel user memory accounting and slab control
+4. user mappings length controller
+
+The memory controller is the first controller developed.
+
+2.1. Design
+-----------
+
+The core of the design is a counter called the page_counter. The
+page_counter tracks the current memory usage and limit of the group of
+processes associated with the controller. Each cgroup has a memory controller
+specific data structure (mem_cgroup) associated with it.
+
+2.2. Accounting
+---------------
+
+::
+
+		+--------------------+
+		|  mem_cgroup        |
+		|  (page_counter)    |
+		+--------------------+
+		 /            ^      \
+		/             |       \
+           +---------------+  |        +---------------+
+           | mm_struct     |  |....    | mm_struct     |
+           |               |  |        |               |
+           +---------------+  |        +---------------+
+                              |
+                              + --------------+
+                                              |
+           +---------------+           +------+--------+
+           | page          +---------->| page_cgroup   |
+           |               |           |               |
+           +---------------+           +---------------+
+
+             (Figure 1: Hierarchy of Accounting)
+
+
+Figure 1 shows the important aspects of the controller:
+
+1. Accounting happens per cgroup
+2. Each mm_struct knows about which cgroup it belongs to
+3. Each page has a pointer to the page_cgroup, which in turn knows the
+   cgroup it belongs to
+
+The accounting is done as follows: mem_cgroup_charge_common() is invoked to
+set up the necessary data structures and check if the cgroup that is being
+charged is over its limit. If it is, then reclaim is invoked on the cgroup.
+More details can be found in the reclaim section of this document.
+If everything goes well, a page meta-data-structure called page_cgroup is
+updated. page_cgroup has its own LRU on cgroup.
+(*) page_cgroup structure is allocated at boot/memory-hotplug time.
+
+2.2.1 Accounting details
+------------------------
+
+All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
+Some pages which are never reclaimable and will not be on the LRU
+are not accounted. We just account pages under usual VM management.
+
+RSS pages are accounted at page_fault unless they've already been accounted
+for earlier. A file page will be accounted for as Page Cache when it's
+inserted into inode (radix-tree). While it's mapped into the page tables of
+processes, duplicate accounting is carefully avoided.
+
+An RSS page is unaccounted when it's fully unmapped. A PageCache page is
+unaccounted when it's removed from radix-tree. Even if RSS pages are fully
+unmapped (by kswapd), they may exist as SwapCache in the system until they
+are really freed. Such SwapCaches are also accounted.
+A swapped-in page is not accounted until it's mapped.
+
+Note: The kernel does swapin-readahead and reads multiple swaps at once.
+This means swapped-in pages may contain pages for other tasks than a task
+causing page fault. So, we avoid accounting at swap-in I/O.
+
+At page migration, accounting information is kept.
+
+Note: we just account pages-on-LRU because our purpose is to control amount
+of used pages; not-on-LRU pages tend to be out-of-control from VM view.
+
+2.3 Shared Page Accounting
+--------------------------
+
+Shared pages are accounted on the basis of the first touch approach. The
+cgroup that first touches a page is accounted for the page. The principle
+behind this approach is that a cgroup that aggressively uses a shared
+page will eventually get charged for it (once it is uncharged from
+the cgroup that brought it in -- this will happen on memory pressure).
+
+But see section 8.2: when moving a task to another cgroup, its pages may
+be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
+
+Exception: If CONFIG_MEMCG_SWAP is not used.
+When you do swapoff and force swapped-out pages of shmem(tmpfs) to
+be brought back into memory, charges for pages are accounted against the
+caller of swapoff rather than the users of shmem.
+
+2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+--------------------------------------
+
+Swap Extension allows you to record charge for swap. A swapped-in page is
+charged back to original page allocator if possible.
+
+When swap is accounted, the following files are added.
+
+ - memory.memsw.usage_in_bytes.
+ - memory.memsw.limit_in_bytes.
+
+memsw means memory+swap. Usage of memory+swap is limited by
+memsw.limit_in_bytes.
+
+Example: Assume a system with 4G of swap. A task which allocates 6G of memory
+(by mistake) under 2G memory limitation will use all swap.
+In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
+By using the memsw limit, you can avoid system OOM which can be caused by swap
+shortage.
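+
+A minimal sketch of the example above, assuming the memory controller is
+mounted at /sys/fs/cgroup/memory and using a hypothetical cgroup named "0"
+(both the mount point and the cgroup name are assumptions made for
+illustration)::
+
+	# echo 2G > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
+	# echo 3G > /sys/fs/cgroup/memory/0/memory.memsw.limit_in_bytes
+	# cat /sys/fs/cgroup/memory/0/memory.memsw.usage_in_bytes
+
+With these values the combined memory and swap usage of the group is capped
+at 3G, so a runaway 6G allocation can no longer consume all 4G of swap.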
+
+**why 'memory+swap' rather than swap**
+
+The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
+to move account from memory to swap...there is no change in usage of
+memory+swap. In other words, when we want to limit the usage of swap without
+affecting global LRU, memory+swap limit is better than just limiting swap from
+an OS point of view.
+
+**What happens when a cgroup hits memory.memsw.limit_in_bytes**
+
+When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
+in this cgroup. Then, swap-out will not be done by cgroup routine and file
+caches are dropped. But as mentioned above, global LRU can still swap out memory
+from it for sanity of the system's memory management state. You can't forbid
+it by cgroup.
+
+2.5 Reclaim
+-----------
+
+Each cgroup maintains a per cgroup LRU which has the same structure as
+global VM. When a cgroup goes over its limit, we first try
+to reclaim memory from the cgroup so as to make space for the new
+pages that the cgroup has touched. If the reclaim is unsuccessful,
+an OOM routine is invoked to select and kill the bulkiest task in the
+cgroup. (See 10. OOM Control below.)
+
+The reclaim algorithm has not been modified for cgroups, except that
+pages that are selected for reclaiming come from the per-cgroup LRU
+list.
+
+NOTE:
+  Reclaim does not work for the root cgroup, since we cannot set any
+  limits on the root cgroup.
+
+Note2:
+  When panic_on_oom is set to "2", the whole system will panic.
+
+When oom event notifier is registered, event will be delivered.
+(See oom_control section)
+
+2.6 Locking
+-----------
+
+   lock_page_cgroup()/unlock_page_cgroup() should not be called under
+   the i_pages lock.
+
+   The other lock order is as follows:
+
+   PG_locked.
+     mm->page_table_lock
+         pgdat->lru_lock
+	   lock_page_cgroup.
+
+  In many cases, just lock_page_cgroup() is called.
+
+  per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
+  pgdat->lru_lock, it has no lock of its own.
+
+2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
+-----------------------------------------------
+
+With the Kernel memory extension, the Memory Controller is able to limit
+the amount of kernel memory used by the system. Kernel memory is fundamentally
+different from user memory, since it can't be swapped out, which makes it
+possible to DoS the system by consuming too much of this precious resource.
+
+Kernel memory accounting is enabled for all memory cgroups by default. But
+it can be disabled system-wide by passing cgroup.memory=nokmem to the kernel
+at boot time. In this case, kernel memory will not be accounted at all.
+
+Kernel memory limits are not imposed for the root cgroup. Usage for the root
+cgroup may or may not be accounted. The memory used is accumulated into
+memory.kmem.usage_in_bytes, or in a separate counter when it makes sense
+(currently only for tcp).
+
+The main "kmem" counter is fed into the main counter, so kmem charges will
+also be visible from the user counter.
+
+Currently no soft limit is implemented for kernel memory. It is future work
+to trigger slab reclaim when those limits are reached.
+
+2.7.1 Current Kernel Memory resources accounted
+-----------------------------------------------
+
+stack pages:
+  every process consumes some stack pages. By accounting into
+  kernel memory, we prevent new processes from being created when the kernel
+  memory usage is too high.
+
+slab pages:
+  pages allocated by the SLAB or SLUB allocator are tracked. A copy
+  of each kmem_cache is created the first time the cache is touched
+  from inside the memcg. The creation is done lazily, so some objects can still be
+  skipped while the cache is being created. All objects in a slab page should
+  belong to the same memcg. This only fails to hold when a task is migrated to a
+  different memcg during the page allocation by the cache.
+
+sockets memory pressure:
+  some socket protocols have memory pressure
+  thresholds. The Memory Controller allows them to be controlled individually
+  per cgroup, instead of globally.
+
+tcp memory pressure:
+  sockets memory pressure for the tcp protocol.
+
+2.7.2 Common use cases
+----------------------
+
+Because the "kmem" counter is fed to the main user counter, kernel memory can
+never be limited completely independently of user memory. Say "U" is the user
+limit, and "K" the kernel limit. There are three possible ways limits can be
+set:
+
+U != 0, K = unlimited:
+    This is the standard memcg limitation mechanism already present before kmem
+    accounting. Kernel memory is completely ignored.
+
+U != 0, K < U:
+    Kernel memory is a subset of the user memory. This setup is useful in
+    deployments where the total amount of memory per-cgroup is overcommitted.
+    Overcommitting kernel memory limits is definitely not recommended, since the
+    box can still run out of non-reclaimable memory.
+    In this case, the admin could set up K so that the sum of all groups is
+    never greater than the total memory, and freely set U at the cost of his
+    QoS.
+
+WARNING:
+    In the current implementation, memory reclaim will NOT be
+    triggered for a cgroup when it hits K while staying below U, which makes
+    this setup impractical.
+
+U != 0, K >= U:
+    Since kmem charges will also be fed to the user counter, reclaim will be
+    triggered for the cgroup for both kinds of memory. This setup gives the
+    admin a unified view of memory, and it is also useful for people who just
+    want to track kernel memory usage.
+
+3. User Interface
+=================
+
+3.0. Configuration
+------------------
+
+a. Enable CONFIG_CGROUPS
+b. Enable CONFIG_MEMCG
+c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
+d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
+
+3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
+-------------------------------------------------------------------
+
+::
+
+	# mount -t tmpfs none /sys/fs/cgroup
+	# mkdir /sys/fs/cgroup/memory
+	# mount -t cgroup none /sys/fs/cgroup/memory -o memory
+
+3.2. Make the new group and move bash into it::
+
+	# mkdir /sys/fs/cgroup/memory/0
+	# echo $$ > /sys/fs/cgroup/memory/0/tasks
+
+Now that we're in the 0 cgroup, we can alter the memory limit::
+
+	# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
+
+NOTE:
+  We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
+  mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
+  Gibibytes.)
+
+NOTE:
+  We can write "-1" to reset ``*.limit_in_bytes`` (unlimited).
+
+NOTE:
+  We cannot set limits on the root cgroup any more.
+
+::
+
+  # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
+  4194304
+
+We can check the usage::
+
+  # cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
+  1216512
+
+A successful write to this file does not guarantee a successful setting of
+this limit to the value written into the file. This can be due to a
+number of factors, such as rounding up to page boundaries or the total
+availability of memory on the system. The user is required to re-read
+this file after a write to guarantee the value committed by the kernel::
+
+  # echo 1 > memory.limit_in_bytes
+  # cat memory.limit_in_bytes
+  4096
+
+The memory.failcnt field gives the number of times that the cgroup limit was
+exceeded.
+
+The memory.stat file gives accounting information. Now, the number of
+caches, RSS and Active pages/Inactive pages are shown.
+
+4. Testing
+==========
+
+For testing features and implementation, see memcg_test.txt.
+
+Performance testing is also important. To see the pure overhead of the memory
+controller, testing on tmpfs will give you good measurements of small overheads.
+Example: do kernel make on tmpfs.
+
+Page-fault scalability is also important. When measuring parallel
+page faults, a multi-process test may be better than a multi-thread
+test, because the latter has noise from shared objects/status.
+
+But the above two are testing extreme situations.
+Trying the usual tests under the memory controller is always helpful.
+
+4.1 Troubleshooting
+-------------------
+
+Sometimes a user might find that the application under a cgroup is
+terminated by the OOM killer. There are several causes for this:
+
+1. The cgroup limit is too low (just too low to do anything useful)
+2. The user is using anonymous memory and swap is turned off or too low
+
+A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
+some of the pages cached in the cgroup (page cache pages).
+
+To find out what happens, disable the OOM killer as per "10. OOM Control"
+(below) and observe the result.
+
+4.2 Task migration
+------------------
+
+When a task migrates from one cgroup to another, its charge is not
+carried forward by default. The pages allocated from the original cgroup still
+remain charged to it; the charge is dropped when the page is freed or
+reclaimed.
+
+You can move charges of a task along with task migration.
+See 8. "Move charges at task migration"
+
+4.3 Removing a cgroup
+---------------------
+
+A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
+cgroup might have some charge associated with it, even though all
+tasks have migrated away from it. (because we charge against pages, not
+against tasks.)
+
+We move the stats to root (if use_hierarchy==0) or parent (if
+use_hierarchy==1), and no change on the charge except uncharging
+from the child.
+
+Charges recorded in swap information are not updated at removal of cgroup.
+Recorded information is discarded and a cgroup which uses swap (swapcache)
+will be charged as a new owner of it.
+
+About use_hierarchy, see Section 6.
+
+5. Misc. interfaces
+===================
+
+5.1 force_empty
+---------------
+  memory.force_empty interface is provided to make cgroup's memory usage empty.
+  When writing anything to this::
+
+    # echo 0 > memory.force_empty
+
+  the cgroup will be reclaimed and as many pages reclaimed as possible.
+
+  The typical use case for this interface is before calling rmdir().
+  Though rmdir() offlines the memcg, the memcg may still stay there due to
+  charged file caches. Some out-of-use page caches may stay charged until
+  memory pressure happens. If you want to avoid that, force_empty will be useful.
+
+  Also, note that when memory.kmem.limit_in_bytes is set the charges due to
+  kernel pages will still be seen. This is not considered a failure and the
+  write will still return success. In this case, it is expected that
+  memory.kmem.usage_in_bytes == memory.usage_in_bytes.
+
+  About use_hierarchy, see Section 6.
+
+5.2 stat file
+-------------
+
+The memory.stat file includes the following statistics:
+
+per-memory cgroup local status
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+=============== ===============================================================
+cache		# of bytes of page cache memory.
+rss		# of bytes of anonymous and swap cache memory (includes
+		transparent hugepages).
+rss_huge	# of bytes of anonymous transparent hugepages.
+mapped_file	# of bytes of mapped file (includes tmpfs/shmem)
+pgpgin		# of charging events to the memory cgroup. The charging
+		event happens each time a page is accounted as either mapped
+		anon page(RSS) or cache page(Page Cache) to the cgroup.
+pgpgout		# of uncharging events to the memory cgroup. The uncharging
+		event happens each time a page is unaccounted from the cgroup.
+swap		# of bytes of swap usage
+dirty		# of bytes that are waiting to get written back to the disk.
+writeback	# of bytes of file/anon cache that are queued for syncing to
+		disk.
+inactive_anon	# of bytes of anonymous and swap cache memory on inactive
+		LRU list.
+active_anon	# of bytes of anonymous and swap cache memory on active
+		LRU list.
+inactive_file	# of bytes of file-backed memory on inactive LRU list.
+active_file	# of bytes of file-backed memory on active LRU list.
+unevictable	# of bytes of memory that cannot be reclaimed (mlocked etc).
+=============== ===============================================================
+
+status considering hierarchy (see memory.use_hierarchy settings)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+========================= ===================================================
+hierarchical_memory_limit # of bytes of memory limit with regard to hierarchy
+			  under which the memory cgroup is
+hierarchical_memsw_limit  # of bytes of memory+swap limit with regard to
+			  hierarchy under which memory cgroup is.
+
+total_<counter>		  # hierarchical version of <counter>, which in
+			  addition to the cgroup's own value includes the
+			  sum of all hierarchical children's values of
+			  <counter>, e.g. total_cache
+========================= ===================================================
+
+The following additional stats are dependent on CONFIG_DEBUG_VM
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+========================= ========================================
+recent_rotated_anon	  VM internal parameter. (see mm/vmscan.c)
+recent_rotated_file	  VM internal parameter. (see mm/vmscan.c)
+recent_scanned_anon	  VM internal parameter. (see mm/vmscan.c)
+recent_scanned_file	  VM internal parameter. (see mm/vmscan.c)
+========================= ========================================
+
+Memo:
+	recent_rotated means recent frequency of LRU rotation.
+	recent_scanned means recent # of scans to LRU.
+	These are shown to aid debugging; please see the code for their meanings.
+
+Note:
+	Only anonymous and swap cache memory is listed as part of 'rss' stat.
+	This should not be confused with the true 'resident set size' or the
+	amount of physical memory used by the cgroup.
+
+	'rss + mapped_file' will give you resident set size of cgroup.
+
+	(Note: file and shmem may be shared among other cgroups. In that case,
+	mapped_file is accounted only when the memory cgroup is owner of page
+	cache.)
+
+5.3 swappiness
+--------------
+
+Overrides /proc/sys/vm/swappiness for the particular group. The tunable
+in the root cgroup corresponds to the global swappiness setting.
+
+Please note that unlike during the global reclaim, limit reclaim
+enforces that 0 swappiness really prevents any swapping even if
+there is swap storage available. This might lead to the memcg OOM killer
+being invoked if there are no file pages to reclaim.
+
+5.4 failcnt
+-----------
+
+A memory cgroup provides memory.failcnt and memory.memsw.failcnt files.
+This failcnt(== failure count) shows the number of times that a usage counter
+hit its limit. When a memory cgroup hits a limit, failcnt increases and
+memory under it will be reclaimed.
+
+You can reset failcnt by writing 0 to failcnt file::
+
+	# echo 0 > .../memory.failcnt
+
+5.5 usage_in_bytes
+------------------
+
+For efficiency, like other kernel components, the memory cgroup uses some
+optimization to avoid unnecessary cacheline false sharing. usage_in_bytes is
+affected by this method and doesn't show the 'exact' value of memory (and swap)
+usage; it's a fuzzy value for efficient access. (Of course, when necessary, it's
+synchronized.) If you want to know the more exact memory usage, you should use
+the RSS+CACHE(+SWAP) value in memory.stat (see 5.2).
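+
+As a sketch, the RSS+CACHE(+SWAP) sum mentioned above can be computed from
+memory.stat with awk. The command is run from inside the cgroup's directory;
+if no "swap" line is present, only rss and cache are summed::
+
+	# awk '$1 == "rss" || $1 == "cache" || $1 == "swap" { s += $2 } END { print s }' memory.stat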
+
+5.6 numa_stat
+-------------
+
+This is similar to numa_maps but operates on a per-memcg basis.  This is
+useful for providing visibility into the numa locality information within
+a memcg since the pages are allowed to be allocated from any physical
+node.  One of the use cases is evaluating application performance by
+combining this information with the application's CPU allocation.
+
+Each memcg's numa_stat file includes "total", "file", "anon" and "unevictable"
+per-node page counts including "hierarchical_<counter>" which sums up all
+hierarchical children's values in addition to the memcg's own value.
+
+The output format of memory.numa_stat is::
+
+  total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
+  file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
+  anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
+  unevictable=<total unevictable pages> N0=<node 0 pages> N1=<node 1 pages> ...
+  hierarchical_<counter>=<counter pages> N0=<node 0 pages> N1=<node 1 pages> ...
+
+The "total" count is the sum of file + anon + unevictable.
+
+6. Hierarchy support
+====================
+
+The memory controller supports a deep hierarchy and hierarchical accounting.
+The hierarchy is created by creating the appropriate cgroups in the
+cgroup filesystem. Consider for example, the following cgroup filesystem
+hierarchy::
+
+	       root
+	     /  |   \
+            /	|    \
+	   a	b     c
+		      | \
+		      |  \
+		      d   e
+
+In the diagram above, with hierarchical accounting enabled, all memory
+usage of e is accounted to its ancestors up to the root (i.e., c and root)
+that have memory.use_hierarchy enabled. If one of the ancestors goes over its
+limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
+children of the ancestor.
+
+6.1 Enabling hierarchical accounting and reclaim
+------------------------------------------------
+
+A memory cgroup by default disables the hierarchy feature. Support
+can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup::
+
+	# echo 1 > memory.use_hierarchy
+
+The feature can be disabled by::
+
+	# echo 0 > memory.use_hierarchy
+
+NOTE1:
+       Enabling/disabling will fail if either the cgroup already has other
+       cgroups created below it, or if the parent cgroup has use_hierarchy
+       enabled.
+
+NOTE2:
+       When panic_on_oom is set to "2", the whole system will panic in
+       case of an OOM event in any cgroup.
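+
+A sketch of building the example tree from the diagram in section 6 with
+hierarchy enabled. It assumes the controller is mounted at
+/sys/fs/cgroup/memory as in section 3.1 and that the root cgroup has no
+children yet, since per NOTE1 the write to memory.use_hierarchy fails
+otherwise::
+
+	# cd /sys/fs/cgroup/memory
+	# echo 1 > memory.use_hierarchy
+	# mkdir a b c
+	# mkdir c/d c/e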
+
+7. Soft limits
+==============
+
+Soft limits allow for greater sharing of memory. The idea behind soft limits
+is to allow control groups to use as much of the memory as needed, provided
+
+a. There is no memory contention
+b. They do not exceed their hard limit
+
+When the system detects memory contention or low memory, control groups
+are pushed back to their soft limits. If the soft limit of each control
+group is very high, they are pushed back as much as possible to make
+sure that one control group does not starve the others of memory.
+
+Please note that soft limits are a best-effort feature; they come with
+no guarantees, but the feature does its best to make sure that when memory is
+heavily contended for, memory is allocated based on the soft limit
+hints/setup. Currently soft limit based reclaim is set up such that
+it gets invoked from balance_pgdat (kswapd).
+
+7.1 Interface
+-------------
+
+Soft limits can be set up by using the following commands (in this example we
+assume a soft limit of 256 MiB)::
+
+	# echo 256M > memory.soft_limit_in_bytes
+
+If we want to change this to 1G, we can at any time use::
+
+	# echo 1G > memory.soft_limit_in_bytes
+
+NOTE1:
+       Soft limits take effect over a long period of time, since they involve
+       reclaiming memory for balancing between memory cgroups.
+NOTE2:
+       It is recommended to set the soft limit always below the hard limit,
+       otherwise the hard limit will take precedence.
+
+8. Move charges at task migration
+=================================
+
+Users can move charges associated with a task along with task migration, that
+is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
+This feature is not supported in !CONFIG_MMU environments because of lack of
+page tables.
+
+8.1 Interface
+-------------
+
+This feature is disabled by default. It can be enabled (and disabled again) by
+writing to memory.move_charge_at_immigrate of the destination cgroup.
+
+If you want to enable it::
+
+	# echo (some positive value) > memory.move_charge_at_immigrate
+
+Note:
+      Each bit of move_charge_at_immigrate has its own meaning about what type
+      of charges should be moved. See 8.2 for details.
+Note:
+      Charges are moved only when you move mm->owner, in other words,
+      a leader of a thread group.
+Note:
+      If we cannot find enough space for the task in the destination cgroup, we
+      try to make space by reclaiming memory. Task migration may fail if we
+      cannot make enough space.
+Note:
+      It can take several seconds if you move many charges.
+
+And if you want to disable it again::
+
+	# echo 0 > memory.move_charge_at_immigrate
+
+8.2 Type of charges which can be moved
+--------------------------------------
+
+Each bit in move_charge_at_immigrate has its own meaning about what type of
+charges should be moved. But in any case, it must be noted that an account of
+a page or a swap can be moved only when it is charged to the task's current
+(old) memory cgroup.
+
++---+--------------------------------------------------------------------------+
+|bit| what type of charges would be moved ?                                    |
++===+==========================================================================+
+| 0 | A charge of an anonymous page (or swap of it) used by the target task.   |
+|   | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
++---+--------------------------------------------------------------------------+
+| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
+|   | and swaps of tmpfs file) mmapped by the target task. Unlike the case of  |
+|   | anonymous pages, file pages (and swaps) in the range mmapped by the task |
+|   | will be moved even if the task hasn't done page fault, i.e. they might   |
+|   | not be the task's "RSS", but other task's "RSS" that maps the same file. |
+|   | And mapcount of the page is ignored (the page can be moved even if       |
+|   | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to    |
+|   | enable move of swap charges.                                             |
++---+--------------------------------------------------------------------------+
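+
+Since each bit selects one type of charge, the value written is the bitwise
+OR of the wanted bits: 1 (bit 0) moves anonymous page charges, 2 (bit 1)
+moves file page charges, and 3 moves both. A short sketch, run in the
+destination cgroup's directory::
+
+	# echo 3 > memory.move_charge_at_immigrate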
+
+8.3 TODO
+--------
+
+- All charge-moving operations are done under cgroup_mutex. It's not good
+  behavior to hold the mutex too long, so we may need some trick.
+
+9. Memory thresholds
+====================
+
+Memory cgroup implements memory thresholds using the cgroups notification
+API (see cgroups.txt). It allows an application to register multiple memory
+and memsw thresholds and get a notification when usage crosses one of them.
+
+To register a threshold, an application must:
+
+- create an eventfd using eventfd(2);
+- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
+- write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
+  cgroup.event_control.
+
+The application will be notified through eventfd when memory usage crosses
+the threshold in any direction.
+
+This is applicable to both root and non-root cgroups.
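+
+A sketch using the cgroup_event_listener helper that the test in section 11
+also uses; the helper creates the eventfd, opens the control file and writes
+the registration string to cgroup.event_control. It is assumed to be built
+and in PATH. The cgroup path and the threshold (5 MiB, given in bytes) are
+only examples::
+
+	# cd /sys/fs/cgroup/memory/0
+	# cgroup_event_listener memory.usage_in_bytes 5242880 &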
+
+10. OOM Control
+===============
+
+memory.oom_control file is for OOM notification and other controls.
+
+Memory cgroup implements an OOM notifier using the cgroup notification
+API (See cgroups.txt). It allows an application to register multiple OOM
+notification deliveries and get a notification when OOM happens.
+
+To register a notifier, an application must:
+
+ - create an eventfd using eventfd(2)
+ - open memory.oom_control file
+ - write string like "<event_fd> <fd of memory.oom_control>" to
+   cgroup.event_control
+
+The application will be notified through eventfd when OOM happens.
+OOM notification doesn't work for the root cgroup.
+
+You can disable the OOM-killer by writing "1" to memory.oom_control file, as::
+
+	# echo 1 > memory.oom_control
+
+If the OOM-killer is disabled, tasks under the cgroup will hang/sleep
+in the memory cgroup's OOM-waitqueue when they request accountable memory.
+
+To run them again, you have to relax the memory cgroup's OOM status by
+
+	* enlarging the limit or reducing usage.
+
+To reduce usage,
+
+	* kill some tasks.
+	* move some tasks to another group with account migration.
+	* remove some files (on tmpfs?)
+
+Then, stopped tasks will work again.
+
+On reading, the current status of OOM is shown.
+
+	- oom_kill_disable 0 or 1
+	  (if 1, oom-killer is disabled)
+	- under_oom	   0 or 1
+	  (if 1, the memory cgroup is under OOM, tasks may be stopped.)
+
+11. Memory Pressure
+===================
+
+The pressure level notifications can be used to monitor the memory
+allocation cost; based on the pressure, applications can implement
+different strategies of managing their memory resources. The pressure
+levels are defined as follows:
+
+The "low" level means that the system is reclaiming memory for new
+allocations. Monitoring this reclaiming activity might be useful for
+maintaining cache level. Upon notification, the program (typically
+"Activity Manager") might analyze vmstat and act in advance (e.g.
+shut down unimportant services early).
+
+The "medium" level means that the system is experiencing medium memory
+pressure, the system might be making swap, paging out active file caches,
+etc. Upon this event applications may decide to further analyze
+vmstat/zoneinfo/memcg or internal memory usage statistics and free any
+resources that can be easily reconstructed or re-read from a disk.
+
+The "critical" level means that the system is actively thrashing, it is
+about to run out of memory (OOM) or even the in-kernel OOM killer is on its
+way to trigger. Applications should do whatever they can to help the
+system. It might be too late to consult with vmstat or any other
+statistics, so it's advisable to take an immediate action.
+
+By default, events are propagated upward until the event is handled, i.e. the
+events are not pass-through. For example, you have three cgroups: A->B->C. Now
+you set up an event listener on cgroups A, B and C, and suppose group C
+experiences some pressure. In this situation, only group C will receive the
+notification, i.e. groups A and B will not receive it. This is done to avoid
+excessive "broadcasting" of messages, which disturbs the system and which is
+especially bad if we are low on memory or thrashing. Group B will receive
+notification only if there are no event listeners for group C.
+
+There are three optional modes that specify different propagation behavior:
+
+ - "default": this is the default behavior specified above. This mode is the
+   same as omitting the optional mode parameter, preserved for backwards
+   compatibility.
+
+ - "hierarchy": events always propagate up to the root, similar to the default
+   behavior, except that propagation continues regardless of whether there are
+   event listeners at each level, with the "hierarchy" mode. In the above
+   example, groups A, B, and C will receive notification of memory pressure.
+
+ - "local": events are pass-through, i.e. they only receive notifications when
+   memory pressure is experienced in the memcg for which the notification is
+   registered. In the above example, group C will receive notification if
+   registered for "local" notification and the group experiences memory
+   pressure. However, group B will never receive notification, regardless if
+   there is an event listener for group C or not, if group B is registered for
+   local notification.
+
+The level and event notification mode ("hierarchy" or "local", if necessary) are
+specified by a comma-delimited string, e.g. "low,hierarchy" specifies
+hierarchical, pass-through notification for all ancestor memcgs. The default,
+non pass-through behavior is selected by not specifying a mode.
+"medium,local" specifies pass-through notification for the medium level.
+
+The file memory.pressure_level is only used to set up an eventfd. To
+register a notification, an application must:
+
+- create an eventfd using eventfd(2);
+- open memory.pressure_level;
+- write string as "<event_fd> <fd of memory.pressure_level> <level[,mode]>"
+  to cgroup.event_control.
+
+The application will be notified through eventfd when memory pressure is at
+the specific level (or higher). Read/write operations to
+memory.pressure_level are not implemented.
+
+Test:
+
+   Here is a small script example that makes a new cgroup, sets up a
+   memory limit, sets up a notification in the cgroup and then makes child
+   cgroup experience a critical pressure::
+
+	# cd /sys/fs/cgroup/memory/
+	# mkdir foo
+	# cd foo
+	# cgroup_event_listener memory.pressure_level low,hierarchy &
+	# echo 8000000 > memory.limit_in_bytes
+	# echo 8000000 > memory.memsw.limit_in_bytes
+	# echo $$ > tasks
+	# dd if=/dev/zero | read x
+
+   (Expect a bunch of notifications, and eventually, the oom-killer will
+   trigger.)
+
+12. TODO
+========
+
+1. Make per-cgroup scanner reclaim not-shared pages first
+2. Teach controller to account for shared-pages
+3. Start reclamation in the background when the limit is
+   not yet hit but the usage is getting closer
+
+Summary
+=======
+
+Overall, the memory controller has been a stable controller and has been
+commented and discussed quite extensively in the community.
+
+References
+==========
+
+1. Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
+2. Singh, Balbir. Memory Controller (RSS Control),
+   http://lwn.net/Articles/222762/
+3. Emelianov, Pavel. Resource controllers based on process cgroups
+   http://lkml.org/lkml/2007/3/6/198
+4. Emelianov, Pavel. RSS controller based on process cgroups (v2)
+   http://lkml.org/lkml/2007/4/9/78
+5. Emelianov, Pavel. RSS controller based on process cgroups (v3)
+   http://lkml.org/lkml/2007/5/30/244
+6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
+7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
+   subsystem (v3), http://lwn.net/Articles/235534/
+8. Singh, Balbir. RSS controller v2 test results (lmbench),
+   http://lkml.org/lkml/2007/5/17/232
+9. Singh, Balbir. RSS controller v2 AIM9 results
+   http://lkml.org/lkml/2007/5/18/1
+10. Singh, Balbir. Memory controller v6 test results,
+    http://lkml.org/lkml/2007/8/19/36
+11. Singh, Balbir. Memory controller introduction (v6),
+    http://lkml.org/lkml/2007/8/17/69
+12. Corbet, Jonathan, Controlling memory use in cgroups,
+    http://lwn.net/Articles/243795/
diff --git a/Documentation/admin-guide/cgroup-v1/net_cls.rst b/Documentation/admin-guide/cgroup-v1/net_cls.rst
new file mode 100644
index 000000000000..a2cf272af7a0
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/net_cls.rst
@@ -0,0 +1,44 @@
+=========================
+Network classifier cgroup
+=========================
+
+The Network classifier cgroup provides an interface to
+tag network packets with a class identifier (classid).
+
+The Traffic Controller (tc) can be used to assign
+different priorities to packets from different cgroups.
+Also, Netfilter (iptables) can use this tag to perform
+actions on such packets.
+
+Creating a net_cls cgroups instance creates a net_cls.classid file.
+This net_cls.classid value is initialized to 0.
+
+You can write hexadecimal values to net_cls.classid; the format for these
+values is 0xAAAABBBB; AAAA is the major handle number and BBBB
+is the minor handle number.
+Reading net_cls.classid yields a decimal result.
+
+Example, setting a 10:1 handle::
+
+	mkdir /sys/fs/cgroup/net_cls
+	mount -t cgroup -onet_cls net_cls /sys/fs/cgroup/net_cls
+	mkdir /sys/fs/cgroup/net_cls/0
+	echo 0x100001 >  /sys/fs/cgroup/net_cls/0/net_cls.classid
+
+- reading the classid back (decimal)::
+
+	cat /sys/fs/cgroup/net_cls/0/net_cls.classid
+	1048577
+
+- configuring tc and creating traffic class 10:1::
+
+	tc qdisc add dev eth0 root handle 10: htb
+	tc class add dev eth0 parent 10: classid 10:1 htb rate 40mbit
+
+- attaching the cgroup filter::
+
+	tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup
+
+configuring iptables, basic example::
+
+	iptables -A OUTPUT -m cgroup ! --cgroup 0x100001 -j DROP
diff --git a/Documentation/admin-guide/cgroup-v1/net_prio.rst b/Documentation/admin-guide/cgroup-v1/net_prio.rst
new file mode 100644
index 000000000000..b40905871c64
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/net_prio.rst
@@ -0,0 +1,57 @@
+=======================
+Network priority cgroup
+=======================
+
+The Network priority cgroup provides an interface to allow an administrator to
+dynamically set the priority of network traffic generated by various
+applications.
+
+Nominally, an application would set the priority of its traffic via the
+SO_PRIORITY socket option.  This however, is not always possible because:
+
+1) The application may not have been coded to set this value
+2) The priority of application traffic is often a site-specific administrative
+   decision rather than an application defined one.
+
+This cgroup allows an administrator to assign a process to a group which defines
+the priority of egress traffic on a given interface. Network priority groups can
+be created by first mounting the cgroup filesystem::
+
+	# mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio
+
+With the above step, the initial group acting as the parent accounting group
+becomes visible at '/sys/fs/cgroup/net_prio'.  This group includes all tasks in
+the system. '/sys/fs/cgroup/net_prio/tasks' lists the tasks in this cgroup.
+
+Each net_prio cgroup contains two files that are subsystem specific:
+
+net_prio.prioidx
+  This file is read-only, and is simply informative.  It contains a unique
+  integer value that the kernel uses as an internal representation of this
+  cgroup.
+
+net_prio.ifpriomap
+  This file contains a map of the priorities assigned to traffic originating
+  from processes in this group and egressing the system on various interfaces.
+  It contains a list of tuples in the form <ifname priority>.  Contents of this
+  file can be modified by echoing a string into the file using the same tuple
+  format. For example::
+
+	echo "eth0 5" > /sys/fs/cgroup/net_prio/iscsi/net_prio.ifpriomap
+
+This command would force any traffic originating from processes belonging to the
+iscsi net_prio cgroup and egressing on interface eth0 to have the priority of
+said traffic set to the value 5. The parent accounting group also has a
+writeable 'net_prio.ifpriomap' file that can be used to set a system default
+priority.
+
+Priorities are set immediately prior to queueing a frame to the device
+queueing discipline (qdisc) so priorities will be assigned prior to the hardware
+queue selection being made.
+
+One use of the net_prio cgroup is with the mqprio qdisc, which allows
+application traffic to be steered to hardware/driver-based traffic classes.
+These mappings can then be managed by administrators or other networking
+protocols such as DCBX.
+
+A new net_prio cgroup inherits the parent's configuration.
diff --git a/Documentation/admin-guide/cgroup-v1/pids.rst b/Documentation/admin-guide/cgroup-v1/pids.rst
new file mode 100644
index 000000000000..6acebd9e72c8
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/pids.rst
@@ -0,0 +1,92 @@
+=========================
+Process Number Controller
+=========================
+
+Abstract
+--------
+
+The process number controller is used to allow a cgroup hierarchy to stop any
+new tasks from being fork()'d or clone()'d after a certain limit is reached.
+
+Since it is trivial to hit the task limit without hitting any kmemcg limits in
+place, PIDs are a fundamental resource. As such, PID exhaustion must be
+preventable in the scope of a cgroup hierarchy by allowing resource limiting of
+the number of tasks in a cgroup.
+
+Usage
+-----
+
+In order to use the `pids` controller, set the maximum number of tasks in
+pids.max (this is not available in the root cgroup for obvious reasons). The
+number of processes currently in the cgroup is given by pids.current.
+
+Organisational operations are not blocked by cgroup policies, so it is possible
+to have pids.current > pids.max. This can be done by either setting the limit to
+be smaller than pids.current, or attaching enough processes to the cgroup such
+that pids.current > pids.max. However, it is not possible to violate a cgroup
+policy through fork() or clone(). fork() and clone() will return -EAGAIN if the
+creation of a new process would cause a cgroup policy to be violated.
+
+To set a cgroup to have no limit, set pids.max to "max". This is the default for
+all new cgroups (N.B. PID limits are hierarchical, so the most stringent
+limit in the hierarchy is followed).
+
+pids.current tracks all child cgroup hierarchies, so parent/pids.current is a
+superset of parent/child/pids.current.
+
+The pids.events file contains event counters:
+
+  - max: Number of times fork failed because the limit was hit.
+
+Example
+-------
+
+First, we mount the pids controller::
+
+	# mkdir -p /sys/fs/cgroup/pids
+	# mount -t cgroup -o pids none /sys/fs/cgroup/pids
+
+Then we create a hierarchy, set limits and attach processes to it::
+
+	# mkdir -p /sys/fs/cgroup/pids/parent/child
+	# echo 2 > /sys/fs/cgroup/pids/parent/pids.max
+	# echo $$ > /sys/fs/cgroup/pids/parent/cgroup.procs
+	# cat /sys/fs/cgroup/pids/parent/pids.current
+	2
+	#
+
+It should be noted that attempts to overcome the set limit (2 in this case) will
+fail::
+
+	# cat /sys/fs/cgroup/pids/parent/pids.current
+	2
+	# ( /bin/echo "Here's some processes for you." | cat )
+	sh: fork: Resource temporarily unavailable
+	#
+
+Even if we migrate to a child cgroup (which doesn't have a set limit), we will
+not be able to overcome the most stringent limit in the hierarchy (in this case,
+parent's)::
+
+	# echo $$ > /sys/fs/cgroup/pids/parent/child/cgroup.procs
+	# cat /sys/fs/cgroup/pids/parent/pids.current
+	2
+	# cat /sys/fs/cgroup/pids/parent/child/pids.current
+	2
+	# cat /sys/fs/cgroup/pids/parent/child/pids.max
+	max
+	# ( /bin/echo "Here's some processes for you." | cat )
+	sh: fork: Resource temporarily unavailable
+	#
+
+We can set a limit that is smaller than pids.current, which will stop any new
+processes from being forked at all (note that the shell itself counts towards
+pids.current)::
+
+	# echo 1 > /sys/fs/cgroup/pids/parent/pids.max
+	# /bin/echo "We can't even spawn a single process now."
+	sh: fork: Resource temporarily unavailable
+	# echo 0 > /sys/fs/cgroup/pids/parent/pids.max
+	# /bin/echo "We can't even spawn a single process now."
+	sh: fork: Resource temporarily unavailable
+	#
diff --git a/Documentation/admin-guide/cgroup-v1/rdma.rst b/Documentation/admin-guide/cgroup-v1/rdma.rst
new file mode 100644
index 000000000000..2fcb0a9bf790
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/rdma.rst
@@ -0,0 +1,117 @@
+===============
+RDMA Controller
+===============
+
+.. Contents
+
+   1. Overview
+     1-1. What is RDMA controller?
+     1-2. Why is RDMA controller needed?
+     1-3. How is RDMA controller implemented?
+   2. Usage Examples
+
+1. Overview
+===========
+
+1-1. What is RDMA controller?
+-----------------------------
+
+The RDMA controller allows users to limit the RDMA/IB-specific resources that
+a given set of processes can use. Processes are grouped using the controller.
+
+The RDMA controller defines two resources which can be limited for the
+processes of a cgroup.
+
+1-2. Why is RDMA controller needed?
+-----------------------------------
+
+Currently, user space applications can easily consume all the rdma verb
+specific resources such as AH, CQ, QP, MR etc. As a result, other applications
+in other cgroups or kernel space ULPs may not even get a chance to allocate
+any rdma resources. This can lead to service unavailability.
+
+Therefore, an RDMA controller is needed through which the resource consumption
+of processes can be limited. Through this controller, different rdma
+resources can be accounted.
+
+1-3. How is RDMA controller implemented?
+----------------------------------------
+
+RDMA cgroup allows limit configuration of resources. Rdma cgroup maintains
+resource accounting per cgroup, per device using a resource pool structure.
+Each such resource pool is limited by rdma cgroup to at most 64 resources,
+which can be extended later if required.
+
+This resource pool object is linked to the cgroup css. Typically there
+are 0 to 4 resource pool instances per cgroup, per device in most use cases,
+but nothing prevents having more. At present hundreds of RDMA devices per
+single cgroup may not be handled optimally; however, there is no
+known use case or requirement for such a configuration either.
+
+Since RDMA resources can be allocated from any process and can be freed by any
+of the child processes which share the address space, rdma resources are
+always owned by the creator cgroup css. This allows process migration from one
+cgroup to another without the major complexity of transferring resource
+ownership, because such ownership is not really present due to the shared
+nature of rdma resources. Linking resources around css also ensures that
+cgroups can be deleted after processes have migrated. This also allows process
+migration with active resources, even though that is not a primary use case.
+
+Whenever RDMA resource charging occurs, the owner rdma cgroup is returned to
+the caller. The same rdma cgroup should be passed while uncharging the resource.
+This also allows a process migrated with active RDMA resources to charge
+new resources to its new owner cgroup. It also allows the resources of a
+process that has migrated to a new cgroup to be uncharged from the previously
+charged cgroup, even though that is not a primary use case.
+
+A resource pool object is created in the following situations.
+(a) The user sets a limit and no previous resource pool exists for the device
+of interest for the cgroup.
+(b) No resource limits were configured, but the IB/RDMA stack tries to
+charge the resource. The pool is created so that resources charged while
+running without limits are uncharged correctly once limits are enforced later;
+otherwise the usage count would drop below zero.
+
+A resource pool is destroyed if all the resource limits are set to max and
+it is the last resource getting deallocated.
+
+The user should set all the limits to the max value if they intend to
+remove/unconfigure the resource pool for a particular device.
+
+The IB stack honors limits enforced by the rdma controller. When an application
+queries the maximum resource limits of an IB device, it returns the minimum of
+what is configured by the user for a given cgroup and what is supported by
+the IB device.
+
+The following resources can be accounted by the rdma controller.
+
+  ==========    =============================
+  hca_handle    Maximum number of HCA Handles
+  hca_object    Maximum number of HCA Objects
+  ==========    =============================
+
+2. Usage Examples
+=================
+
+(a) Configure resource limit::
+
+	echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
+	echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max
+
+(b) Query resource limit::
+
+	cat /sys/fs/cgroup/rdma/2/rdma.max
+	#Output:
+	mlx4_0 hca_handle=2 hca_object=2000
+	ocrdma1 hca_handle=3 hca_object=max
+
+(c) Query current usage::
+
+	cat /sys/fs/cgroup/rdma/2/rdma.current
+	#Output:
+	mlx4_0 hca_handle=1 hca_object=20
+	ocrdma1 hca_handle=1 hca_object=23
+
+(d) Delete resource limit::
+
+	echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 080b18ce2a5d..ed4c5977d6e1 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -9,7 +9,7 @@ This is the authoritative documentation on the design, interface and
 conventions of cgroup v2.  It describes all userland-visible aspects
 of cgroup including core and specific controller behaviors.  All
 future changes must be reflected in this document.  Documentation for
-v1 is available under Documentation/cgroup-v1/.
+v1 is available under Documentation/admin-guide/cgroup-v1/.
 
 .. CONTENTS
 
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 1f0d9b939311..a5fdb1a846ce 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -59,6 +59,7 @@ configure specific aspects of kernel behavior to your liking.
 
    initrd
    cgroup-v2
+   cgroup-v1/index
    serial-console
    braille-console
    parport
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 78576aa45cce..a571a67e0c85 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4089,7 +4089,7 @@
 
 	relax_domain_level=
 			[KNL, SMP] Set scheduler's default relax_domain_level.
-			See Documentation/cgroup-v1/cpusets.rst.
+			See Documentation/admin-guide/cgroup-v1/cpusets.rst.
 
 	reserve=	[KNL,BUGS] Force kernel to ignore I/O ports or memory
 			Format: <base1>,<size1>[,<base2>,<size2>,...]
@@ -4599,7 +4599,7 @@
 	swapaccount=[0|1]
 			[KNL] Enable accounting of swap in memory resource
 			controller if no parameter or 1 is given or disable
-			it if 0 is given (See Documentation/cgroup-v1/memory.rst)
+			it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
 
 	swiotlb=	[ARM,IA-64,PPC,MIPS,X86]
 			Format: { <int> | force | noforce }
diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 546f174e5d6a..8463f5538fda 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -15,7 +15,7 @@ document attempts to describe the concepts and APIs of the 2.6 memory policy
 support.
 
 Memory policies should not be confused with cpusets
-(``Documentation/cgroup-v1/cpusets.rst``)
+(``Documentation/admin-guide/cgroup-v1/cpusets.rst``)
 which is an administrative mechanism for restricting the nodes from which
 memory may be allocated by a set of processes. Memory policies are a
 programming interface that a NUMA-aware application can take advantage of.  When
diff --git a/Documentation/block/bfq-iosched.rst b/Documentation/block/bfq-iosched.rst
index 2c13b2fc1888..0d237d402860 100644
--- a/Documentation/block/bfq-iosched.rst
+++ b/Documentation/block/bfq-iosched.rst
@@ -547,7 +547,7 @@ As for cgroups-v1 (blkio controller), the exact set of stat files
 created, and kept up-to-date by bfq, depends on whether
 CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
 the stat files documented in
-Documentation/cgroup-v1/blkio-controller.rst. If, instead,
+Documentation/admin-guide/cgroup-v1/blkio-controller.rst. If, instead,
 CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files::
 
   blkio.bfq.io_service_bytes
diff --git a/Documentation/cgroup-v1/blkio-controller.rst b/Documentation/cgroup-v1/blkio-controller.rst
deleted file mode 100644
index 1d7d962933be..000000000000
--- a/Documentation/cgroup-v1/blkio-controller.rst
+++ /dev/null
@@ -1,302 +0,0 @@
-===================
-Block IO Controller
-===================
-
-Overview
-========
-cgroup subsys "blkio" implements the block io controller. There seems to be
-a need of various kinds of IO control policies (like proportional BW, max BW)
-both at leaf nodes as well as at intermediate nodes in a storage hierarchy.
-Plan is to use the same cgroup based management interface for blkio controller
-and based on user options switch IO policies in the background.
-
-One IO control policy is throttling policy which can be used to
-specify upper IO rate limits on devices. This policy is implemented in
-generic block layer and can be used on leaf nodes as well as higher
-level logical devices like device mapper.
-
-HOWTO
-=====
-Throttling/Upper Limit policy
------------------------------
-- Enable Block IO controller::
-
-	CONFIG_BLK_CGROUP=y
-
-- Enable throttling in block layer::
-
-	CONFIG_BLK_DEV_THROTTLING=y
-
-- Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::
-
-        mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
-
-- Specify a bandwidth rate on particular device for root group. The format
-  for policy is "<major>:<minor>  <bytes_per_second>"::
-
-        echo "8:16  1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
-
-  Above will put a limit of 1MB/second on reads happening for root group
-  on device having major/minor number 8:16.
-
-- Run dd to read a file and see if rate is throttled to 1MB/s or not::
-
-        # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
-        1024+0 records in
-        1024+0 records out
-        4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
-
- Limits for writes can be put using blkio.throttle.write_bps_device file.
-
-Hierarchical Cgroups
-====================
-
-Throttling implements hierarchy support; however,
-throttling's hierarchy support is enabled iff "sane_behavior" is
-enabled from cgroup side, which currently is a development option and
-not publicly available.
-
-If somebody created a hierarchy like as follows::
-
-			root
-			/  \
-		     test1 test2
-			|
-		     test3
-
-Throttling with "sane_behavior" will handle the
-hierarchy correctly. For throttling, all limits apply
-to the whole subtree while all statistics are local to the IOs
-directly generated by tasks in that cgroup.
-
-Throttling without "sane_behavior" enabled from cgroup side will
-practically treat all groups at same level as if it looks like the
-following::
-
-				pivot
-			     /  /   \  \
-			root  test1 test2  test3
-
-Various user visible config options
-===================================
-CONFIG_BLK_CGROUP
-	- Block IO controller.
-
-CONFIG_BFQ_CGROUP_DEBUG
-	- Debug help. Right now some additional stats file show up in cgroup
-	  if this option is enabled.
-
-CONFIG_BLK_DEV_THROTTLING
-	- Enable block device throttling support in block layer.
-
-Details of cgroup files
-=======================
-Proportional weight policy files
---------------------------------
-- blkio.weight
-	- Specifies per cgroup weight. This is default weight of the group
-	  on all the devices until and unless overridden by per device rule.
-	  (See blkio.weight_device).
-	  Currently allowed range of weights is from 10 to 1000.
-
-- blkio.weight_device
-	- One can specify per cgroup per device rules using this interface.
-	  These rules override the default value of group weight as specified
-	  by blkio.weight.
-
-	  Following is the format::
-
-	    # echo dev_maj:dev_minor weight > blkio.weight_device
-
-	  Configure weight=300 on /dev/sdb (8:16) in this cgroup::
-
-	    # echo 8:16 300 > blkio.weight_device
-	    # cat blkio.weight_device
-	    dev     weight
-	    8:16    300
-
-	  Configure weight=500 on /dev/sda (8:0) in this cgroup::
-
-	    # echo 8:0 500 > blkio.weight_device
-	    # cat blkio.weight_device
-	    dev     weight
-	    8:0     500
-	    8:16    300
-
-	  Remove specific weight for /dev/sda in this cgroup::
-
-	    # echo 8:0 0 > blkio.weight_device
-	    # cat blkio.weight_device
-	    dev     weight
-	    8:16    300
-
-- blkio.leaf_weight[_device]
-	- Equivalents of blkio.weight[_device] for the purpose of
-          deciding how much weight tasks in the given cgroup has while
-          competing with the cgroup's child cgroups. For details,
-          please refer to Documentation/block/cfq-iosched.txt.
-
-- blkio.time
-	- disk time allocated to cgroup per device in milliseconds. First
-	  two fields specify the major and minor number of the device and
-	  third field specifies the disk time allocated to group in
-	  milliseconds.
-
-- blkio.sectors
-	- number of sectors transferred to/from disk by the group. First
-	  two fields specify the major and minor number of the device and
-	  third field specifies the number of sectors transferred by the
-	  group to/from the device.
-
-- blkio.io_service_bytes
-	- Number of bytes transferred to/from the disk by the group. These
-	  are further divided by the type of operation - read or write, sync
-	  or async. First two fields specify the major and minor number of the
-	  device, third field specifies the operation type and the fourth field
-	  specifies the number of bytes.
-
-- blkio.io_serviced
-	- Number of IOs (bio) issued to the disk by the group. These
-	  are further divided by the type of operation - read or write, sync
-	  or async. First two fields specify the major and minor number of the
-	  device, third field specifies the operation type and the fourth field
-	  specifies the number of IOs.
-
-- blkio.io_service_time
-	- Total amount of time between request dispatch and request completion
-	  for the IOs done by this cgroup. This is in nanoseconds to make it
-	  meaningful for flash devices too. For devices with queue depth of 1,
-	  this time represents the actual service time. When queue_depth > 1,
-	  that is no longer true as requests may be served out of order. This
-	  may cause the service time for a given IO to include the service time
-	  of multiple IOs when served out of order which may result in total
-	  io_service_time > actual time elapsed. This time is further divided by
-	  the type of operation - read or write, sync or async. First two fields
-	  specify the major and minor number of the device, third field
-	  specifies the operation type and the fourth field specifies the
-	  io_service_time in ns.
-
-- blkio.io_wait_time
-	- Total amount of time the IOs for this cgroup spent waiting in the
-	  scheduler queues for service. This can be greater than the total time
-	  elapsed since it is cumulative io_wait_time for all IOs. It is not a
-	  measure of total time the cgroup spent waiting but rather a measure of
-	  the wait_time for its individual IOs. For devices with queue_depth > 1
-	  this metric does not include the time spent waiting for service once
-	  the IO is dispatched to the device but till it actually gets serviced
-	  (there might be a time lag here due to re-ordering of requests by the
-	  device). This is in nanoseconds to make it meaningful for flash
-	  devices too. This time is further divided by the type of operation -
-	  read or write, sync or async. First two fields specify the major and
-	  minor number of the device, third field specifies the operation type
-	  and the fourth field specifies the io_wait_time in ns.
-
-- blkio.io_merged
-	- Total number of bios/requests merged into requests belonging to this
-	  cgroup. This is further divided by the type of operation - read or
-	  write, sync or async.
-
-- blkio.io_queued
-	- Total number of requests queued up at any given instant for this
-	  cgroup. This is further divided by the type of operation - read or
-	  write, sync or async.
-
-- blkio.avg_queue_size
-	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
-	  The average queue size for this cgroup over the entire time of this
-	  cgroup's existence. Queue size samples are taken each time one of the
-	  queues of this cgroup gets a timeslice.
-
-- blkio.group_wait_time
-	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
-	  This is the amount of time the cgroup had to wait since it became busy
-	  (i.e., went from 0 to 1 request queued) to get a timeslice for one of
-	  its queues. This is different from the io_wait_time which is the
-	  cumulative total of the amount of time spent by each IO in that cgroup
-	  waiting in the scheduler queue. This is in nanoseconds. If this is
-	  read when the cgroup is in a waiting (for timeslice) state, the stat
-	  will only report the group_wait_time accumulated till the last time it
-	  got a timeslice and will not include the current delta.
-
-- blkio.empty_time
-	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
-	  This is the amount of time a cgroup spends without any pending
-	  requests when not being served, i.e., it does not include any time
-	  spent idling for one of the queues of the cgroup. This is in
-	  nanoseconds. If this is read when the cgroup is in an empty state,
-	  the stat will only report the empty_time accumulated till the last
-	  time it had a pending request and will not include the current delta.
-
-- blkio.idle_time
-	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
-	  This is the amount of time spent by the IO scheduler idling for a
-	  given cgroup in anticipation of a better request than the existing ones
-	  from other queues/cgroups. This is in nanoseconds. If this is read
-	  when the cgroup is in an idling state, the stat will only report the
-	  idle_time accumulated till the last idle period and will not include
-	  the current delta.
-
-- blkio.dequeue
-	- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
-	  gives the statistics about how many a times a group was dequeued
-	  from service tree of the device. First two fields specify the major
-	  and minor number of the device and third field specifies the number
-	  of times a group was dequeued from a particular device.
-
-- blkio.*_recursive
-	- Recursive version of various stats. These files show the
-          same information as their non-recursive counterparts but
-          include stats from all the descendant cgroups.
-
-Throttling/Upper limit policy files
------------------------------------
-- blkio.throttle.read_bps_device
-	- Specifies upper limit on READ rate from the device. IO rate is
-	  specified in bytes per second. Rules are per device. Following is
-	  the format::
-
-	    echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device
-
-- blkio.throttle.write_bps_device
-	- Specifies upper limit on WRITE rate to the device. IO rate is
-	  specified in bytes per second. Rules are per device. Following is
-	  the format::
-
-	    echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
-
-- blkio.throttle.read_iops_device
-	- Specifies upper limit on READ rate from the device. IO rate is
-	  specified in IO per second. Rules are per device. Following is
-	  the format::
-
-	   echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device
-
-- blkio.throttle.write_iops_device
-	- Specifies upper limit on WRITE rate to the device. IO rate is
-	  specified in io per second. Rules are per device. Following is
-	  the format::
-
-	    echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device
-
-Note: If both BW and IOPS rules are specified for a device, then IO is
-      subjected to both the constraints.
-
-- blkio.throttle.io_serviced
-	- Number of IOs (bio) issued to the disk by the group. These
-	  are further divided by the type of operation - read or write, sync
-	  or async. First two fields specify the major and minor number of the
-	  device, third field specifies the operation type and the fourth field
-	  specifies the number of IOs.
-
-- blkio.throttle.io_service_bytes
-	- Number of bytes transferred to/from the disk by the group. These
-	  are further divided by the type of operation - read or write, sync
-	  or async. First two fields specify the major and minor number of the
-	  device, third field specifies the operation type and the fourth field
-	  specifies the number of bytes.
-
-Common files among various policies
------------------------------------
-- blkio.reset_stats
-	- Writing an int to this file will result in resetting all the stats
-	  for that cgroup.
diff --git a/Documentation/cgroup-v1/cgroups.rst b/Documentation/cgroup-v1/cgroups.rst
deleted file mode 100644
index 46bbe7e022d4..000000000000
--- a/Documentation/cgroup-v1/cgroups.rst
+++ /dev/null
@@ -1,695 +0,0 @@
-==============
-Control Groups
-==============
-
-Written by Paul Menage <menage@google.com> based on
-Documentation/cgroup-v1/cpusets.rst
-
-Original copyright statements from cpusets.txt:
-
-Portions Copyright (C) 2004 BULL SA.
-
-Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
-
-Modified by Paul Jackson <pj@sgi.com>
-
-Modified by Christoph Lameter <cl@linux.com>
-
-.. CONTENTS:
-
-	1. Control Groups
-	1.1 What are cgroups ?
-	1.2 Why are cgroups needed ?
-	1.3 How are cgroups implemented ?
-	1.4 What does notify_on_release do ?
-	1.5 What does clone_children do ?
-	1.6 How do I use cgroups ?
-	2. Usage Examples and Syntax
-	2.1 Basic Usage
-	2.2 Attaching processes
-	2.3 Mounting hierarchies by name
-	3. Kernel API
-	3.1 Overview
-	3.2 Synchronization
-	3.3 Subsystem API
-	4. Extended attributes usage
-	5. Questions
-
-1. Control Groups
-=================
-
-1.1 What are cgroups ?
-----------------------
-
-Control Groups provide a mechanism for aggregating/partitioning sets of
-tasks, and all their future children, into hierarchical groups with
-specialized behaviour.
-
-Definitions:
-
-A *cgroup* associates a set of tasks with a set of parameters for one
-or more subsystems.
-
-A *subsystem* is a module that makes use of the task grouping
-facilities provided by cgroups to treat groups of tasks in
-particular ways. A subsystem is typically a "resource controller" that
-schedules a resource or applies per-cgroup limits, but it may be
-anything that wants to act on a group of processes, e.g. a
-virtualization subsystem.
-
-A *hierarchy* is a set of cgroups arranged in a tree, such that
-every task in the system is in exactly one of the cgroups in the
-hierarchy, and a set of subsystems; each subsystem has system-specific
-state attached to each cgroup in the hierarchy.  Each hierarchy has
-an instance of the cgroup virtual filesystem associated with it.
-
-At any one time there may be multiple active hierarchies of task
-cgroups. Each hierarchy is a partition of all tasks in the system.
-
-User-level code may create and destroy cgroups by name in an
-instance of the cgroup virtual file system, specify and query to
-which cgroup a task is assigned, and list the task PIDs assigned to
-a cgroup. Those creations and assignments only affect the hierarchy
-associated with that instance of the cgroup file system.
-
-On their own, the only use for cgroups is for simple job
-tracking. The intention is that other subsystems hook into the generic
-cgroup support to provide new attributes for cgroups, such as
-accounting/limiting the resources which processes in a cgroup can
-access. For example, cpusets (see Documentation/cgroup-v1/cpusets.rst) allow
-you to associate a set of CPUs and a set of memory nodes with the
-tasks in each cgroup.
-
-1.2 Why are cgroups needed ?
-----------------------------
-
-There are multiple efforts to provide process aggregations in the
-Linux kernel, mainly for resource-tracking purposes. Such efforts
-include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
-namespaces. These all require the basic notion of a
-grouping/partitioning of processes, with newly forked processes ending
-up in the same group (cgroup) as their parent process.
-
-The kernel cgroup patch provides the minimum essential kernel
-mechanisms required to efficiently implement such groups. It has
-minimal impact on the system fast paths, and provides hooks for
-specific subsystems such as cpusets to provide additional behaviour as
-desired.
-
-Multiple hierarchy support is provided to allow for situations where
-the division of tasks into cgroups is distinctly different for
-different subsystems - having parallel hierarchies allows each
-hierarchy to be a natural division of tasks, without having to handle
-complex combinations of tasks that would be present if several
-unrelated subsystems needed to be forced into the same tree of
-cgroups.
-
-At one extreme, each resource controller or subsystem could be in a
-separate hierarchy; at the other extreme, all subsystems
-would be attached to the same hierarchy.
-
-As an example of a scenario (originally proposed by vatsa@in.ibm.com)
-that can benefit from multiple hierarchies, consider a large
-university server with various users - students, professors, system
-tasks etc. The resource planning for this server could be along the
-following lines::
-
-       CPU :          "Top cpuset"
-                       /       \
-               CPUSet1         CPUSet2
-                  |               |
-               (Professors)    (Students)
-
-               In addition (system tasks) are attached to topcpuset (so
-               that they can run anywhere) with a limit of 20%
-
-       Memory : Professors (50%), Students (30%), system (20%)
-
-       Disk : Professors (50%), Students (30%), system (20%)
-
-       Network : WWW browsing (20%), Network File System (60%), others (20%)
-                               / \
-               Professors (15%)  students (5%)
-
-Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
-into the NFS network class.
-
-At the same time Firefox/Lynx will share an appropriate CPU/Memory class
-depending on who launched it (prof/student).
-
-With the ability to classify tasks differently for different resources
-(by putting those resource subsystems in different hierarchies),
-the admin can easily set up a script which receives exec notifications
-and depending on who is launching the browser he can::
-
-    # echo browser_pid > /sys/fs/cgroup/<restype>/<userclass>/tasks
-
-With only a single hierarchy, he now would potentially have to create
-a separate cgroup for every browser launched and associate it with
-appropriate network and other resource class.  This may lead to
-proliferation of such cgroups.
-
-Also let's say that the administrator would like to give enhanced network
-access temporarily to a student's browser (since it is night and the user
-wants to do online gaming :))  OR give one of the student's simulation
-apps enhanced CPU power.
-
-With ability to write PIDs directly to resource classes, it's just a
-matter of::
-
-       # echo pid > /sys/fs/cgroup/network/<new_class>/tasks
-       (after some time)
-       # echo pid > /sys/fs/cgroup/network/<orig_class>/tasks
-
-Without this ability, the administrator would have to split the cgroup into
-multiple separate ones and then associate the new cgroups with the
-new resource classes.
-
-
-
-1.3 How are cgroups implemented ?
----------------------------------
-
-Control Groups extends the kernel as follows:
-
- - Each task in the system has a reference-counted pointer to a
-   css_set.
-
- - A css_set contains a set of reference-counted pointers to
-   cgroup_subsys_state objects, one for each cgroup subsystem
-   registered in the system. There is no direct link from a task to
-   the cgroup of which it's a member in each hierarchy, but this
-   can be determined by following pointers through the
-   cgroup_subsys_state objects. This is because accessing the
-   subsystem state is something that's expected to happen frequently
-   and in performance-critical code, whereas operations that require a
-   task's actual cgroup assignments (in particular, moving between
-   cgroups) are less common. A linked list runs through the cg_list
-   field of each task_struct using the css_set, anchored at
-   css_set->tasks.
-
- - A cgroup hierarchy filesystem can be mounted for browsing and
-   manipulation from user space.
-
- - You can list all the tasks (by PID) attached to any cgroup.
-
-The implementation of cgroups requires a few, simple hooks
-into the rest of the kernel, none in performance-critical paths:
-
- - in init/main.c, to initialize the root cgroups and initial
-   css_set at system boot.
-
- - in fork and exit, to attach and detach a task from its css_set.
-
-In addition, a new file system of type "cgroup" may be mounted, to
-enable browsing and modifying the cgroups presently known to the
-kernel.  When mounting a cgroup hierarchy, you may specify a
-comma-separated list of subsystems to mount as the filesystem mount
-options.  By default, mounting the cgroup filesystem attempts to
-mount a hierarchy containing all registered subsystems.
-
-If an active hierarchy with exactly the same set of subsystems already
-exists, it will be reused for the new mount. If no existing hierarchy
-matches, and any of the requested subsystems are in use in an existing
-hierarchy, the mount will fail with -EBUSY. Otherwise, a new hierarchy
-is activated, associated with the requested subsystems.
-
-It's not currently possible to bind a new subsystem to an active
-cgroup hierarchy, or to unbind a subsystem from an active cgroup
-hierarchy. This may be possible in future, but is fraught with nasty
-error-recovery issues.
-
-When a cgroup filesystem is unmounted, if there are any
-child cgroups created below the top-level cgroup, that hierarchy
-will remain active even though unmounted; if there are no
-child cgroups then the hierarchy will be deactivated.
-
-No new system calls are added for cgroups - all support for
-querying and modifying cgroups is via this cgroup file system.
-
-Each task under /proc has an added file named 'cgroup' displaying,
-for each active hierarchy, the subsystem names and the cgroup name
-as the path relative to the root of the cgroup file system.
-
-Each cgroup is represented by a directory in the cgroup file system
-containing the following files describing that cgroup:
-
- - tasks: list of tasks (by PID) attached to that cgroup.  This list
-   is not guaranteed to be sorted.  Writing a thread ID into this file
-   moves the thread into this cgroup.
- - cgroup.procs: list of thread group IDs in the cgroup.  This list is
-   not guaranteed to be sorted or free of duplicate TGIDs, and userspace
-   should sort/uniquify the list if this property is required.
-   Writing a thread group ID into this file moves all threads in that
-   group into this cgroup.
- - notify_on_release flag: run the release agent on exit?
- - release_agent: the path to use for release notifications (this file
-   exists in the top cgroup only)
-
-Other subsystems such as cpusets may add additional files in each
-cgroup dir.
-
-New cgroups are created using the mkdir system call or shell
-command.  The properties of a cgroup, such as its flags, are
-modified by writing to the appropriate file in that cgroup's
-directory, as listed above.
-
-The named hierarchical structure of nested cgroups allows partitioning
-a large system into nested, dynamically changeable, "soft-partitions".
-
-The attachment of each task, automatically inherited at fork by any
-children of that task, to a cgroup allows organizing the work load
-on a system into related sets of tasks.  A task may be re-attached to
-any other cgroup, if allowed by the permissions on the necessary
-cgroup file system directories.
-
-When a task is moved from one cgroup to another, it gets a new
-css_set pointer - if there's an already existing css_set with the
-desired collection of cgroups then that group is reused, otherwise a new
-css_set is allocated. The appropriate existing css_set is located by
-looking into a hash table.
-
-To allow access from a cgroup to the css_sets (and hence tasks)
-that comprise it, a set of cg_cgroup_link objects form a lattice;
-each cg_cgroup_link is linked into a list of cg_cgroup_links for
-a single cgroup on its cgrp_link_list field, and a list of
-cg_cgroup_links for a single css_set on its cg_link_list.
-
-Thus the set of tasks in a cgroup can be listed by iterating over
-each css_set that references the cgroup, and sub-iterating over
-each css_set's task set.
-
-The use of a Linux virtual file system (vfs) to represent the
-cgroup hierarchy provides for a familiar permission and name space
-for cgroups, with a minimum of additional kernel code.
-
-1.4 What does notify_on_release do ?
-------------------------------------
-
-If the notify_on_release flag is enabled (1) in a cgroup, then
-whenever the last task in the cgroup leaves (exits or attaches to
-some other cgroup) and the last child cgroup of that cgroup
-is removed, then the kernel runs the command specified by the contents
-of the "release_agent" file in that hierarchy's root directory,
-supplying the pathname (relative to the mount point of the cgroup
-file system) of the abandoned cgroup.  This enables automatic
-removal of abandoned cgroups.  The default value of
-notify_on_release in the root cgroup at system boot is disabled
-(0).  The default value of other cgroups at creation is the current
-value of their parents' notify_on_release settings. The default value of
-a cgroup hierarchy's release_agent path is empty.
-
-1.5 What does clone_children do ?
----------------------------------
-
-This flag only affects the cpuset controller. If the clone_children
-flag is enabled (1) in a cgroup, a new cpuset cgroup will copy its
-configuration from the parent during initialization.
-
-1.6 How do I use cgroups ?
---------------------------
-
-To start a new job that is to be contained within a cgroup, using
-the "cpuset" cgroup subsystem, the steps are something like::
-
- 1) mount -t tmpfs cgroup_root /sys/fs/cgroup
- 2) mkdir /sys/fs/cgroup/cpuset
- 3) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
- 4) Create the new cgroup by doing mkdir's and write's (or echo's) in
-    the /sys/fs/cgroup/cpuset virtual file system.
- 5) Start a task that will be the "founding father" of the new job.
- 6) Attach that task to the new cgroup by writing its PID to the
-    /sys/fs/cgroup/cpuset tasks file for that cgroup.
- 7) fork, exec or clone the job tasks from this founding father task.
-
-For example, the following sequence of commands will set up a cgroup
-named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
-and then start a subshell 'sh' in that cgroup::
-
-  mount -t tmpfs cgroup_root /sys/fs/cgroup
-  mkdir /sys/fs/cgroup/cpuset
-  mount -t cgroup cpuset -ocpuset /sys/fs/cgroup/cpuset
-  cd /sys/fs/cgroup/cpuset
-  mkdir Charlie
-  cd Charlie
-  /bin/echo 2-3 > cpuset.cpus
-  /bin/echo 1 > cpuset.mems
-  /bin/echo $$ > tasks
-  sh
-  # The subshell 'sh' is now running in cgroup Charlie
-  # The next line should display '/Charlie'
-  cat /proc/self/cgroup
-
-2. Usage Examples and Syntax
-============================
-
-2.1 Basic Usage
----------------
-
-Creating, modifying, using cgroups can be done through the cgroup
-virtual filesystem.
-
-To mount a cgroup hierarchy with all available subsystems, type::
-
-  # mount -t cgroup xxx /sys/fs/cgroup
-
-The "xxx" is not interpreted by the cgroup code, but will appear in
-/proc/mounts so may be any useful identifying string that you like.
-
-Note: Some subsystems do not work without some user input first.  For instance,
-if cpusets are enabled the user will have to populate the cpus and mems files
-for each new cgroup created before that group can be used.
-
-As explained in section `1.2 Why are cgroups needed?` you should create
-different hierarchies of cgroups for each single resource or group of
-resources you want to control. Therefore, you should mount a tmpfs on
-/sys/fs/cgroup and create directories for each cgroup resource or resource
-group::
-
-  # mount -t tmpfs cgroup_root /sys/fs/cgroup
-  # mkdir /sys/fs/cgroup/rg1
-
-To mount a cgroup hierarchy with just the cpuset and memory
-subsystems, type::
-
-  # mount -t cgroup -o cpuset,memory hier1 /sys/fs/cgroup/rg1
-
-While remounting cgroups is currently supported, using it is not
-recommended. Remounting allows changing bound subsystems and
-release_agent. Rebinding is hardly useful as it only works when the
-hierarchy is empty and release_agent itself should be replaced with
-conventional fsnotify. The support for remounting will be removed in
-the future.
-
-To specify a hierarchy's release_agent::
-
-  # mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \
-    xxx /sys/fs/cgroup/rg1
-
-Note that specifying 'release_agent' more than once will return failure.
-
-Note that changing the set of subsystems is currently only supported
-when the hierarchy consists of a single (root) cgroup. Supporting
-the ability to arbitrarily bind/unbind subsystems from an existing
-cgroup hierarchy is intended to be implemented in the future.
-
-Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the
-tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1
-is the cgroup that holds the whole system.
-
-If you want to change the value of release_agent::
-
-  # echo "/sbin/new_release_agent" > /sys/fs/cgroup/rg1/release_agent
-
-It can also be changed via remount.
-
-If you want to create a new cgroup under /sys/fs/cgroup/rg1::
-
-  # cd /sys/fs/cgroup/rg1
-  # mkdir my_cgroup
-
-Now you want to do something with this cgroup::
-
-  # cd my_cgroup
-
-In this directory you can find several files::
-
-  # ls
-  cgroup.procs notify_on_release tasks
-  (plus whatever files are added by the attached subsystems)
-
-Now attach your shell to this cgroup::
-
-  # /bin/echo $$ > tasks
-
-You can also create cgroups inside your cgroup by using mkdir in this
-directory::
-
-  # mkdir my_sub_cs
-
-To remove a cgroup, just use rmdir::
-
-  # rmdir my_sub_cs
-
-This will fail if the cgroup is in use (has cgroups inside, or
-has processes attached, or is held alive by other subsystem-specific
-reference).
-
-2.2 Attaching processes
------------------------
-
-::
-
-  # /bin/echo PID > tasks
-
-Note that it is PID, not PIDs. You can only attach ONE task at a time.
-If you have several tasks to attach, you have to do it one after another::
-
-  # /bin/echo PID1 > tasks
-  # /bin/echo PID2 > tasks
-	  ...
-  # /bin/echo PIDn > tasks
-
-You can attach the current shell task by echoing 0::
-
-  # echo 0 > tasks
-
-You can use the cgroup.procs file instead of the tasks file to move all
-threads in a threadgroup at once. Echoing the PID of any task in a
-threadgroup to cgroup.procs causes all tasks in that threadgroup to be
-attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
-in the writing task's threadgroup.
-
-Note: Since every task is always a member of exactly one cgroup in each
-mounted hierarchy, to remove a task from its current cgroup you must
-move it into a new cgroup (possibly the root cgroup) by writing to the
-new cgroup's tasks file.
-
-Note: Due to some restrictions enforced by some cgroup subsystems, moving
-a process to another cgroup can fail.
-
-2.3 Mounting hierarchies by name
---------------------------------
-
-Passing the name=<x> option when mounting a cgroups hierarchy
-associates the given name with the hierarchy.  This can be used when
-mounting a pre-existing hierarchy, in order to refer to it by name
-rather than by its set of active subsystems.  Each hierarchy is either
-nameless, or has a unique name.
-
-The name should match [\w.-]+
-
-When passing a name=<x> option for a new hierarchy, you need to
-specify subsystems manually; the legacy behaviour of mounting all
-subsystems when none are explicitly specified is not supported when
-you give a hierarchy a name.
-
-The name of the hierarchy appears as part of the hierarchy description
-in /proc/mounts and /proc/<pid>/cgroup.
-
-
-3. Kernel API
-=============
-
-3.1 Overview
-------------
-
-Each kernel subsystem that wants to hook into the generic cgroup
-system needs to create a cgroup_subsys object. This contains
-various methods, which are callbacks from the cgroup system, along
-with a subsystem ID which will be assigned by the cgroup system.
-
-Other fields in the cgroup_subsys object include:
-
-- subsys_id: a unique array index for the subsystem, indicating which
-  entry in cgroup->subsys[] this subsystem should be managing.
-
-- name: should be initialized to a unique subsystem name. Should be
-  no longer than MAX_CGROUP_TYPE_NAMELEN.
-
-- early_init: indicate if the subsystem needs early initialization
-  at system boot.
-
-Each cgroup object created by the system has an array of pointers,
-indexed by subsystem ID; each pointer is entirely managed by its
-subsystem; the generic cgroup code will never touch it.
-
-3.2 Synchronization
--------------------
-
-There is a global mutex, cgroup_mutex, used by the cgroup
-system. This should be taken by anything that wants to modify a
-cgroup. It may also be taken to prevent cgroups from being
-modified, but more specific locks may be more appropriate in that
-situation.
-
-See kernel/cgroup.c for more details.
-
-Subsystems can take/release the cgroup_mutex via the functions
-cgroup_lock()/cgroup_unlock().
-
-Accessing a task's cgroup pointer may be done in the following ways:
-
-- while holding cgroup_mutex
-- while holding the task's alloc_lock (via task_lock())
-- inside an rcu_read_lock() section via rcu_dereference()
-
-3.3 Subsystem API
------------------
-
-Each subsystem should:
-
-- add an entry in linux/cgroup_subsys.h
-- define a cgroup_subsys object called <name>_cgrp_subsys
-
-Each subsystem may export the following methods. The only mandatory
-methods are css_alloc/free. Any others that are null are presumed to
-be successful no-ops.
-
-``struct cgroup_subsys_state *css_alloc(struct cgroup *cgrp)``
-(cgroup_mutex held by caller)
-
-Called to allocate a subsystem state object for a cgroup. The
-subsystem should allocate its subsystem state object for the passed
-cgroup, returning a pointer to the new object on success or an
-ERR_PTR() value on failure. On success, the subsystem pointer should point to
-a structure of type cgroup_subsys_state (typically embedded in a
-larger subsystem-specific object), which will be initialized by the
-cgroup system. Note that this will be called at initialization to
-create the root subsystem state for this subsystem; this case can be
-identified by the passed cgroup object having a NULL parent (since
-it's the root of the hierarchy) and may be an appropriate place for
-initialization code.
-
-``int css_online(struct cgroup *cgrp)``
-(cgroup_mutex held by caller)
-
-Called after @cgrp has successfully completed all allocations and been made
-visible to cgroup_for_each_child/descendant_*() iterators. The
-subsystem may choose to fail creation by returning -errno. This
-callback can be used to implement reliable state sharing and
-propagation along the hierarchy. See the comment on
-cgroup_for_each_descendant_pre() for details.
-
-``void css_offline(struct cgroup *cgrp)``
-(cgroup_mutex held by caller)
-
-This is the counterpart of css_online() and called iff css_online()
-has succeeded on @cgrp. This signifies the beginning of the end of
-@cgrp. @cgrp is being removed and the subsystem should start dropping
-all references it's holding on @cgrp. When all references are dropped,
-cgroup removal will proceed to the next step - css_free(). After this
-callback, @cgrp should be considered dead to the subsystem.
-
-``void css_free(struct cgroup *cgrp)``
-(cgroup_mutex held by caller)
-
-The cgroup system is about to free @cgrp; the subsystem should free
-its subsystem state object. By the time this method is called, @cgrp
-is completely unused; @cgrp->parent is still valid. (Note - can also
-be called for a newly-created cgroup if an error occurs after this
-subsystem's css_alloc() method has been called for the new cgroup).
-
-``int can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
-(cgroup_mutex held by caller)
-
-Called prior to moving one or more tasks into a cgroup; if the
-subsystem returns an error, this will abort the attach operation.
-@tset contains the tasks to be attached and is guaranteed to have at
-least one task in it.
-
-If there are multiple tasks in the taskset, then:
-  - it's guaranteed that all are from the same thread group
-  - @tset contains all tasks from the thread group whether or not
-    they're switching cgroups
-  - the first task is the leader
-
-Each @tset entry also contains the task's old cgroup and tasks which
-aren't switching cgroup can be skipped easily using the
-cgroup_taskset_for_each() iterator. Note that this isn't called on a
-fork. If this method returns 0 (success) then this should remain valid
-while the caller holds cgroup_mutex and it is ensured that either
-attach() or cancel_attach() will be called in future.
-
-``void css_reset(struct cgroup_subsys_state *css)``
-(cgroup_mutex held by caller)
-
-An optional operation which should restore @css's configuration to the
-initial state.  This is currently only used on the unified hierarchy
-when a subsystem is disabled on a cgroup through
-"cgroup.subtree_control" but should remain enabled because other
-subsystems depend on it.  cgroup core makes such a css invisible by
-removing the associated interface files and invokes this callback so
-that the hidden subsystem can return to the initial neutral state.
-This prevents unexpected resource control from a hidden css and
-ensures that the configuration is in the initial state when it is made
-visible again later.
-
-``void cancel_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
-(cgroup_mutex held by caller)
-
-Called when a task attach operation has failed after can_attach() has succeeded.
-A subsystem whose can_attach() has side effects should provide this function
-so that it can roll them back; otherwise it is not needed. It is called only
-for subsystems whose can_attach() operation has succeeded. The parameters are
-identical to can_attach().
-
-``void attach(struct cgroup *cgrp, struct cgroup_taskset *tset)``
-(cgroup_mutex held by caller)
-
-Called after the task has been attached to the cgroup, to allow any
-post-attachment activity that requires memory allocations or blocking.
-The parameters are identical to can_attach().
-
-``void fork(struct task_struct *task)``
-
-Called when a task is forked into a cgroup.
-
-``void exit(struct task_struct *task)``
-
-Called during task exit.
-
-``void free(struct task_struct *task)``
-
-Called when the task_struct is freed.
-
-``void bind(struct cgroup *root)``
-(cgroup_mutex held by caller)
-
-Called when a cgroup subsystem is rebound to a different hierarchy
-and root cgroup. Currently this will only involve movement between
-the default hierarchy (which never has sub-cgroups) and a hierarchy
-that is being created/destroyed (and hence has no sub-cgroups).
-
-4. Extended attribute usage
-===========================
-
-The cgroup filesystem supports certain types of extended attributes in its
-directories and files.  The currently supported types are:
-
-	- Trusted (XATTR_TRUSTED)
-	- Security (XATTR_SECURITY)
-
-Both require CAP_SYS_ADMIN capability to set.
-
-Like in tmpfs, the extended attributes in the cgroup filesystem are stored
-using kernel memory and it's advised to keep the usage to a minimum.  This
-is the reason why user-defined extended attributes are not supported, since
-any user can set them and there's no limit on the value size.
-
-The currently known users of this feature are SELinux, to limit cgroup usage
-in containers, and systemd, for assorted metadata such as the main PID in a
-cgroup (systemd creates a cgroup per service).
-
-5. Questions
-============
-
-::
-
-  Q: what's up with this '/bin/echo' ?
-  A: bash's builtin 'echo' command does not check calls to write() against
-     errors. If you use it in the cgroup file system, you won't be
-     able to tell whether a command succeeded or failed.
-
-  Q: When I attach processes, only the first of the line gets really attached !
-  A: We can only return one error code per call to write(). So you should also
-     put only ONE PID.
diff --git a/Documentation/cgroup-v1/cpuacct.rst b/Documentation/cgroup-v1/cpuacct.rst
deleted file mode 100644
index d30ed81d2ad7..000000000000
--- a/Documentation/cgroup-v1/cpuacct.rst
+++ /dev/null
@@ -1,50 +0,0 @@
-=========================
-CPU Accounting Controller
-=========================
-
-The CPU accounting controller is used to group tasks using cgroups and
-account the CPU usage of these groups of tasks.
-
-The CPU accounting controller supports multi-hierarchy groups. An accounting
-group accumulates the CPU usage of all of its child groups and the tasks
-directly present in its group.
-
-Accounting groups can be created by first mounting the cgroup filesystem::
-
-  # mount -t cgroup -ocpuacct none /sys/fs/cgroup
-
-With the above step, the initial or the parent accounting group becomes
-visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
-the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
-/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained
-by this group which is essentially the CPU time obtained by all the tasks
-in the system.
-
-New accounting groups can be created under the parent group /sys/fs/cgroup::
-
-  # cd /sys/fs/cgroup
-  # mkdir g1
-  # echo $$ > g1/tasks
-
-The above steps create a new group g1 and move the current shell
-process (bash) into it. CPU time consumed by this bash and its children
-can be obtained from g1/cpuacct.usage and is also accumulated in
-/sys/fs/cgroup/cpuacct.usage.
-
-The cpuacct.stat file lists a few statistics which further divide the
-CPU time obtained by the cgroup into user and system times. Currently
-the following statistics are supported:
-
-- user: Time spent by tasks of the cgroup in user mode.
-- system: Time spent by tasks of the cgroup in kernel mode.
-
-user and system are in USER_HZ units.
-
-The cpuacct controller uses the percpu_counter interface to collect user and
-system times. This has two side effects:
-
-- It is theoretically possible to see wrong values for user and system times.
-  This is because percpu_counter_read() on 32bit systems isn't safe
-  against concurrent writes.
-- It is possible to see slightly outdated values for user and system times
-  due to the batch processing nature of percpu_counter.
diff --git a/Documentation/cgroup-v1/cpusets.rst b/Documentation/cgroup-v1/cpusets.rst
deleted file mode 100644
index b6a42cdea72b..000000000000
--- a/Documentation/cgroup-v1/cpusets.rst
+++ /dev/null
@@ -1,866 +0,0 @@
-=======
-CPUSETS
-=======
-
-Copyright (C) 2004 BULL SA.
-
-Written by Simon.Derr@bull.net
-
-- Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
-- Modified by Paul Jackson <pj@sgi.com>
-- Modified by Christoph Lameter <cl@linux.com>
-- Modified by Paul Menage <menage@google.com>
-- Modified by Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
-
-.. CONTENTS:
-
-   1. Cpusets
-     1.1 What are cpusets ?
-     1.2 Why are cpusets needed ?
-     1.3 How are cpusets implemented ?
-     1.4 What are exclusive cpusets ?
-     1.5 What is memory_pressure ?
-     1.6 What is memory spread ?
-     1.7 What is sched_load_balance ?
-     1.8 What is sched_relax_domain_level ?
-     1.9 How do I use cpusets ?
-   2. Usage Examples and Syntax
-     2.1 Basic Usage
-     2.2 Adding/removing cpus
-     2.3 Setting flags
-     2.4 Attaching processes
-   3. Questions
-   4. Contact
-
-1. Cpusets
-==========
-
-1.1 What are cpusets ?
-----------------------
-
-Cpusets provide a mechanism for assigning a set of CPUs and Memory
-Nodes to a set of tasks.   In this document "Memory Node" refers to
-an on-line node that contains memory.
-
-Cpusets constrain the CPU and Memory placement of tasks to only
-the resources within a task's current cpuset.  They form a nested
-hierarchy visible in a virtual file system.  These are the essential
-hooks, beyond what is already present, required to manage dynamic
-job placement on large systems.
-
-Cpusets use the generic cgroup subsystem described in
-Documentation/cgroup-v1/cgroups.rst.
-
-Requests by a task, using the sched_setaffinity(2) system call to
-include CPUs in its CPU affinity mask, and using the mbind(2) and
-set_mempolicy(2) system calls to include Memory Nodes in its memory
-policy, are both filtered through that task's cpuset, filtering out any
-CPUs or Memory Nodes not in that cpuset.  The scheduler will not
-schedule a task on a CPU that is not allowed in its cpus_allowed
-vector, and the kernel page allocator will not allocate a page on a
-node that is not allowed in the requesting task's mems_allowed vector.
-
-User level code may create and destroy cpusets by name in the cgroup
-virtual file system, manage the attributes and permissions of these
-cpusets and which CPUs and Memory Nodes are assigned to each cpuset,
-specify and query to which cpuset a task is assigned, and list the
-task pids assigned to a cpuset.
-
-
-1.2 Why are cpusets needed ?
-----------------------------
-
-The management of large computer systems, with many processors (CPUs),
-complex memory cache hierarchies and multiple Memory Nodes having
-non-uniform access times (NUMA) presents additional challenges for
-the efficient scheduling and memory placement of processes.
-
-Frequently more modest sized systems can be operated with adequate
-efficiency just by letting the operating system automatically share
-the available CPU and Memory resources amongst the requesting tasks.
-
-But larger systems, which benefit more from careful processor and
-memory placement to reduce memory access times and contention,
-and which typically represent a larger investment for the customer,
-can benefit from explicitly placing jobs on properly sized subsets of
-the system.
-
-This can be especially valuable on:
-
-    * Web Servers running multiple instances of the same web application,
-    * Servers running different applications (for instance, a web server
-      and a database), or
-    * NUMA systems running large HPC applications with demanding
-      performance characteristics.
-
-These subsets, or "soft partitions", must be able to be dynamically
-adjusted, as the job mix changes, without impacting other concurrently
-executing jobs. The location of the running jobs' pages may also be moved
-when the memory locations are changed.
-
-The kernel cpuset patch provides the minimum essential kernel
-mechanisms required to efficiently implement such subsets.  It
-leverages existing CPU and Memory Placement facilities in the Linux
-kernel to avoid any additional impact on the critical scheduler or
-memory allocator code.
-
-
-1.3 How are cpusets implemented ?
----------------------------------
-
-Cpusets provide a Linux kernel mechanism to constrain which CPUs and
-Memory Nodes are used by a process or set of processes.
-
-The Linux kernel already has a pair of mechanisms to specify on which
-CPUs a task may be scheduled (sched_setaffinity) and on which Memory
-Nodes it may obtain memory (mbind, set_mempolicy).
-
-Cpusets extend these two mechanisms as follows:
-
- - Cpusets are sets of allowed CPUs and Memory Nodes, known to the
-   kernel.
- - Each task in the system is attached to a cpuset, via a pointer
-   in the task structure to a reference counted cgroup structure.
- - Calls to sched_setaffinity are filtered to just those CPUs
-   allowed in that task's cpuset.
- - Calls to mbind and set_mempolicy are filtered to just
-   those Memory Nodes allowed in that task's cpuset.
- - The root cpuset contains all the system's CPUs and Memory
-   Nodes.
- - For any cpuset, one can define child cpusets containing a subset
-   of the parent's CPU and Memory Node resources.
- - The hierarchy of cpusets can be mounted at /dev/cpuset, for
-   browsing and manipulation from user space.
- - A cpuset may be marked exclusive, which ensures that no other
-   cpuset (except direct ancestors and descendants) may contain
-   any overlapping CPUs or Memory Nodes.
- - You can list all the tasks (by pid) attached to any cpuset.
-
-The implementation of cpusets requires a few, simple hooks
-into the rest of the kernel, none in performance critical paths:
-
- - in init/main.c, to initialize the root cpuset at system boot.
- - in fork and exit, to attach and detach a task from its cpuset.
- - in sched_setaffinity, to mask the requested CPUs by what's
-   allowed in that task's cpuset.
- - in sched.c migrate_live_tasks(), to keep migrating tasks within
-   the CPUs allowed by their cpuset, if possible.
- - in the mbind and set_mempolicy system calls, to mask the requested
-   Memory Nodes by what's allowed in that task's cpuset.
- - in page_alloc.c, to restrict memory to allowed nodes.
- - in vmscan.c, to restrict page recovery to the current cpuset.
-
-You should mount the "cgroup" filesystem type in order to enable
-browsing and modifying the cpusets presently known to the kernel.  No
-new system calls are added for cpusets - all support for querying and
-modifying cpusets is via this cpuset file system.
-
-The /proc/<pid>/status file for each task has four added lines,
-displaying the task's cpus_allowed (on which CPUs it may be scheduled)
-and mems_allowed (on which Memory Nodes it may obtain memory),
-in the two formats seen in the following example::
-
-  Cpus_allowed:   ffffffff,ffffffff,ffffffff,ffffffff
-  Cpus_allowed_list:      0-127
-  Mems_allowed:   ffffffff,ffffffff
-  Mems_allowed_list:      0-63
-
-Each cpuset is represented by a directory in the cgroup file system
-containing (on top of the standard cgroup files) the following
-files describing that cpuset:
-
- - cpuset.cpus: list of CPUs in that cpuset
- - cpuset.mems: list of Memory Nodes in that cpuset
- - cpuset.memory_migrate flag: if set, move pages to cpuset's nodes
- - cpuset.cpu_exclusive flag: is cpu placement exclusive?
- - cpuset.mem_exclusive flag: is memory placement exclusive?
- - cpuset.mem_hardwall flag:  is memory allocation hardwalled?
- - cpuset.memory_pressure: measure of how much paging pressure in cpuset
- - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes
- - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
- - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset
- - cpuset.sched_relax_domain_level: the searching range when migrating tasks
-
-In addition, only the root cpuset has the following file:
-
- - cpuset.memory_pressure_enabled flag: compute memory_pressure?
-
-New cpusets are created using the mkdir system call or shell
-command.  The properties of a cpuset, such as its flags, allowed
-CPUs and Memory Nodes, and attached tasks, are modified by writing
-to the appropriate file in that cpuset's directory, as listed above.
-
-The named hierarchical structure of nested cpusets allows partitioning
-a large system into nested, dynamically changeable, "soft-partitions".
-
-The attachment of each task, automatically inherited at fork by any
-children of that task, to a cpuset allows organizing the work load
-on a system into related sets of tasks such that each set is constrained
-to using the CPUs and Memory Nodes of a particular cpuset.  A task
-may be re-attached to any other cpuset, if allowed by the permissions
-on the necessary cpuset file system directories.
-
-Such management of a system "in the large" integrates smoothly with
-the detailed placement done on individual tasks and memory regions
-using the sched_setaffinity, mbind and set_mempolicy system calls.
-
-The following rules apply to each cpuset:
-
- - Its CPUs and Memory Nodes must be a subset of its parent's.
- - It can't be marked exclusive unless its parent is.
- - If its cpu or memory is exclusive, they may not overlap any sibling.
-
-These rules, and the natural hierarchy of cpusets, enable efficient
-enforcement of the exclusive guarantee, without having to scan all
-cpusets every time any of them changes to ensure nothing overlaps an
-exclusive cpuset.  Also, the use of a Linux virtual file system (vfs)
-to represent the cpuset hierarchy provides for a familiar permission
-and name space for cpusets, with a minimum of additional kernel code.
-
-The cpus and mems files in the root (top_cpuset) cpuset are
-read-only.  The cpus file automatically tracks the value of
-cpu_online_mask using a CPU hotplug notifier, and the mems file
-automatically tracks the value of node_states[N_MEMORY]--i.e.,
-nodes with memory--using the cpuset_track_online_nodes() hook.
-
-
-1.4 What are exclusive cpusets ?
---------------------------------
-
-If a cpuset is cpu or mem exclusive, no other cpuset, other than
-a direct ancestor or descendant, may share any of the same CPUs or
-Memory Nodes.
-
-A cpuset that is cpuset.mem_exclusive *or* cpuset.mem_hardwall is "hardwalled",
-i.e. it restricts kernel allocations for page, buffer and other data
-commonly shared by the kernel across multiple users.  All cpusets,
-whether hardwalled or not, restrict allocations of memory for user
-space.  This enables configuring a system so that several independent
-jobs can share common kernel data, such as file system pages, while
-isolating each job's user allocation in its own cpuset.  To do this,
-construct a large mem_exclusive cpuset to hold all the jobs, and
-construct child, non-mem_exclusive cpusets for each individual job.
-Only a small amount of typical kernel memory, such as requests from
-interrupt handlers, is allowed to be taken outside even a
-mem_exclusive cpuset.
-
-
-1.5 What is memory_pressure ?
------------------------------
-The memory_pressure of a cpuset provides a simple per-cpuset metric
-of the rate at which the tasks in a cpuset are attempting to free up
-in-use memory on the nodes of the cpuset to satisfy additional memory
-requests.
-
-This enables batch managers monitoring jobs running in dedicated
-cpusets to efficiently detect what level of memory pressure that job
-is causing.
-
-This is useful both on tightly managed systems running a wide mix of
-submitted jobs, which may choose to terminate or re-prioritize jobs that
-are trying to use more memory than allowed on the nodes assigned to them,
-and with tightly coupled, long running, massively parallel scientific
-computing jobs that will dramatically fail to meet required performance
-goals if they start to use more memory than allowed to them.
-
-This mechanism provides a very economical way for the batch manager
-to monitor a cpuset for signs of memory pressure.  It's up to the
-batch manager or other user code to decide what to do about it and
-take action.
-
-==>
-    Unless this feature is enabled by writing "1" to the special file
-    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
-    code of __alloc_pages() for this metric reduces to simply noticing
-    that the cpuset_memory_pressure_enabled flag is zero.  So only
-    systems that enable this feature will compute the metric.
-
-Why a per-cpuset, running average:
-
-    Because this meter is per-cpuset, rather than per-task or mm,
-    the system load imposed by a batch scheduler monitoring this
-    metric is sharply reduced on large systems, because a scan of
-    the tasklist can be avoided on each set of queries.
-
-    Because this meter is a running average, instead of an accumulating
-    counter, a batch scheduler can detect memory pressure with a
-    single read, instead of having to read and accumulate results
-    for a period of time.
-
-    Because this meter is per-cpuset rather than per-task or mm,
-    the batch scheduler can obtain the key information, memory
-    pressure in a cpuset, with a single read, rather than having to
-    query and accumulate results over all the (dynamically changing)
-    set of tasks in the cpuset.
-
-A per-cpuset simple digital filter (requires a spinlock and 3 words
-of data per-cpuset) is kept, and updated by any task attached to that
-cpuset, if it enters the synchronous (direct) page reclaim code.
-
-A per-cpuset file provides an integer number representing the recent
-(half-life of 10 seconds) rate of direct page reclaims caused by
-the tasks in the cpuset, in units of reclaims attempted per second,
-times 1000.
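-
-A minimal sketch of how a batch manager might sample this value from
-the shell, assuming the hierarchy is mounted at /sys/fs/cgroup/cpuset
-and that a cpuset named "Charlie" (a name used only for illustration)
-already exists::
-
-  # Enable the metric once, in the root cpuset
-  /bin/echo 1 > /sys/fs/cgroup/cpuset/cpuset.memory_pressure_enabled
-  # Read the running average for one cpuset (reclaims/second times 1000)
-  cat /sys/fs/cgroup/cpuset/Charlie/cpuset.memory_pressure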
-
-
-1.6 What is memory spread ?
----------------------------
-There are two boolean flag files per cpuset that control where the
-kernel allocates pages for the file system buffers and related
-in-kernel data structures.  They are called 'cpuset.memory_spread_page' and
-'cpuset.memory_spread_slab'.
-
-If the per-cpuset boolean flag file 'cpuset.memory_spread_page' is set, then
-the kernel will spread the file system buffers (page cache) evenly
-over all the nodes that the faulting task is allowed to use, instead
-of preferring to put those pages on the node where the task is running.
-
-If the per-cpuset boolean flag file 'cpuset.memory_spread_slab' is set,
-then the kernel will spread some file system related slab caches,
-such as for inodes and dentries, evenly over all the nodes that the
-faulting task is allowed to use, instead of preferring to put those
-pages on the node where the task is running.
-
-The setting of these flags does not affect anonymous data segment or
-stack segment pages of a task.
-
-By default, both kinds of memory spreading are off, and memory
-pages are allocated on the node local to where the task is running,
-except perhaps as modified by the task's NUMA mempolicy or cpuset
-configuration, so long as sufficient free memory pages are available.
-
-When new cpusets are created, they inherit the memory spread settings
-of their parent.
-
-Setting memory spreading causes allocations for the affected page
-or slab caches to ignore the task's NUMA mempolicy and be spread
-instead.    Tasks using mbind() or set_mempolicy() calls to set NUMA
-mempolicies will not notice any change in these calls as a result of
-their containing task's memory spread settings.  If memory spreading
-is turned off, then the currently specified NUMA mempolicy once again
-applies to memory page allocations.
-
-Both 'cpuset.memory_spread_page' and 'cpuset.memory_spread_slab' are boolean flag
-files.  By default they contain "0", meaning that the feature is off
-for that cpuset.  If a "1" is written to that file, then that turns
-the named feature on.
-
-The implementation is simple.
-
-Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
-PFA_SPREAD_PAGE for each task that is in that cpuset or subsequently
-joins that cpuset.  The page allocation calls for the page cache
-are modified to perform an inline check for this PFA_SPREAD_PAGE task
-flag, and if set, a call to a new routine cpuset_mem_spread_node()
-returns the node to prefer for the allocation.
-
-Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
-PFA_SPREAD_SLAB, and appropriately marked slab caches will allocate
-pages from the node returned by cpuset_mem_spread_node().
-
-The cpuset_mem_spread_node() routine is also simple.  It uses the
-value of a per-task rotor cpuset_mem_spread_rotor to select the next
-node in the current task's mems_allowed to prefer for the allocation.
-
-This memory placement policy is also known (in other contexts) as
-round-robin or interleave.
-
-This policy can provide substantial improvements for jobs that need
-to place thread local data on the corresponding node, but that need
-to access large file system data sets that need to be spread across
-the several nodes in the job's cpuset in order to fit.  Without this
-policy, especially for jobs that might have one thread reading in the
-data set, the memory allocation across the nodes in the job's cpuset
-can become very uneven.
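-
-For instance, assuming the current directory is a cpuset directory in
-the mounted hierarchy, both kinds of spreading could be turned on and
-then checked as follows::
-
-  /bin/echo 1 > cpuset.memory_spread_page
-  /bin/echo 1 > cpuset.memory_spread_slab
-  cat cpuset.memory_spread_page cpuset.memory_spread_slab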
-
-1.7 What is sched_load_balance ?
---------------------------------
-
-The kernel scheduler (kernel/sched/core.c) automatically load balances
-tasks.  If one CPU is underutilized, kernel code running on that
-CPU will look for tasks on other more overloaded CPUs and move those
-tasks to itself, within the constraints of such placement mechanisms
-as cpusets and sched_setaffinity.
-
-The algorithmic cost of load balancing and its impact on key shared
-kernel data structures such as the task list increases more than
-linearly with the number of CPUs being balanced.  So the scheduler
-has support to partition the system's CPUs into a number of sched
-domains such that it only load balances within each sched domain.
-Each sched domain covers some subset of the CPUs in the system;
-no two sched domains overlap; some CPUs might not be in any sched
-domain and hence won't be load balanced.
-
-Put simply, it costs less to balance between two smaller sched domains
-than one big one, but doing so means that overloads in one of the
-two domains won't be load balanced to the other one.
-
-By default, there is one sched domain covering all CPUs, including those
-marked isolated using the kernel boot time "isolcpus=" argument. However,
-the isolated CPUs will not participate in load balancing, and will not
-have tasks running on them unless explicitly assigned.
-
-This default load balancing across all CPUs is not well suited for
-the following two situations:
-
- 1) On large systems, load balancing across many CPUs is expensive.
-    If the system is managed using cpusets to place independent jobs
-    on separate sets of CPUs, full load balancing is unnecessary.
- 2) Systems supporting realtime on some CPUs need to minimize
-    system overhead on those CPUs, including avoiding task load
-    balancing if that is not needed.
-
-When the per-cpuset flag "cpuset.sched_load_balance" is enabled (the default
-setting), it requests that all the CPUs in that cpuset's allowed 'cpuset.cpus'
-be contained in a single sched domain, ensuring that load balancing
-can move a task (not otherwise pinned, as by sched_setaffinity)
-from any CPU in that cpuset to any other.
-
-When the per-cpuset flag "cpuset.sched_load_balance" is disabled, then the
-scheduler will avoid load balancing across the CPUs in that cpuset,
---except-- in so far as is necessary because some overlapping cpuset
-has "sched_load_balance" enabled.
-
-So, for example, if the top cpuset has the flag "cpuset.sched_load_balance"
-enabled, then the scheduler will have one sched domain covering all
-CPUs, and the setting of the "cpuset.sched_load_balance" flag in any other
-cpusets won't matter, as we're already fully load balancing.
-
-Therefore in the above two situations, the top cpuset flag
-"cpuset.sched_load_balance" should be disabled, and only some of the smaller,
-child cpusets have this flag enabled.
-
-When doing this, you don't usually want to leave any unpinned tasks in
-the top cpuset that might use non-trivial amounts of CPU, as such tasks
-may be artificially constrained to some subset of CPUs, depending on
-the particulars of this flag setting in descendant cpusets.  Even if
-such a task could use spare CPU cycles in some other CPUs, the kernel
-scheduler might not consider the possibility of load balancing that
-task to that underused CPU.
-
-Of course, tasks pinned to a particular CPU can be left in a cpuset
-that disables "cpuset.sched_load_balance" as those tasks aren't going anywhere
-else anyway.
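-
-A sketch of the arrangement described above, assuming the hierarchy is
-mounted at /sys/fs/cgroup/cpuset and using two hypothetical child
-cpusets "jobA" and "jobB" with non-overlapping CPUs (the CPU numbers
-are placeholders)::
-
-  cd /sys/fs/cgroup/cpuset
-  mkdir jobA jobB
-  /bin/echo 0-3 > jobA/cpuset.cpus
-  /bin/echo 4-7 > jobB/cpuset.cpus
-  # Stop balancing across the whole system ...
-  /bin/echo 0 > cpuset.sched_load_balance
-  # ... while the children keep the default (enabled) setting
-  cat jobA/cpuset.sched_load_balance jobB/cpuset.sched_load_balance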
-
-There is an impedance mismatch here, between cpusets and sched domains.
-Cpusets are hierarchical and nest.  Sched domains are flat; they don't
-overlap and each CPU is in at most one sched domain.
-
-It is necessary for sched domains to be flat because load balancing
-across partially overlapping sets of CPUs would risk unstable dynamics
-that would be beyond our understanding.  So if each of two partially
-overlapping cpusets enables the flag 'cpuset.sched_load_balance', then we
-form a single sched domain that is a superset of both.  We won't move
-a task to a CPU outside its cpuset, but the scheduler load balancing
-code might waste some compute cycles considering that possibility.
-
-This mismatch is why there is not a simple one-to-one relation
-between which cpusets have the flag "cpuset.sched_load_balance" enabled,
-and the sched domain configuration.  If a cpuset enables the flag, it
-will get balancing across all its CPUs, but if it disables the flag,
-it will only be assured of no load balancing if no other overlapping
-cpuset enables the flag.
-
-If two cpusets have partially overlapping 'cpuset.cpus' allowed, and only
-one of them has this flag enabled, then the other may find its
-tasks only partially load balanced, just on the overlapping CPUs.
-This is just the general case of the top_cpuset example given a few
-paragraphs above.  In the general case, as in the top cpuset case,
-don't leave tasks that might use non-trivial amounts of CPU in
-such partially load balanced cpusets, as they may be artificially
-constrained to some subset of the CPUs allowed to them, for lack of
-load balancing to the other CPUs.
-
-CPUs in "cpuset.isolcpus" were excluded from load balancing by the
-isolcpus= kernel boot option, and will never be load balanced regardless
-of the value of "cpuset.sched_load_balance" in any cpuset.
-
-1.7.1 sched_load_balance implementation details.
-------------------------------------------------
-
-The per-cpuset flag 'cpuset.sched_load_balance' defaults to enabled (contrary
-to most cpuset flags.)  When enabled for a cpuset, the kernel will
-ensure that it can load balance across all the CPUs in that cpuset
-(makes sure that all the CPUs in the cpus_allowed of that cpuset are
-in the same sched domain.)
-
-If two overlapping cpusets both have 'cpuset.sched_load_balance' enabled,
-then they will be (must be) both in the same sched domain.
-
-If, as is the default, the top cpuset has 'cpuset.sched_load_balance' enabled,
-then by the above that means there is a single sched domain covering
-the whole system, regardless of any other cpuset settings.
-
-The kernel commits to user space that it will avoid load balancing
-where it can.  It will pick as fine a granularity partition of sched
-domains as it can while still providing load balancing for any set
-of CPUs allowed to a cpuset having 'cpuset.sched_load_balance' enabled.
-
-The internal kernel cpuset to scheduler interface passes from the
-cpuset code to the scheduler code a partition of the load balanced
-CPUs in the system. This partition is a set of subsets (represented
-as an array of struct cpumask) of CPUs, pairwise disjoint, that cover
-all the CPUs that must be load balanced.
-
-The cpuset code builds a new such partition and passes it to the
-scheduler sched domain setup code, to have the sched domains rebuilt
-as necessary, whenever:
-
- - the 'cpuset.sched_load_balance' flag of a cpuset with non-empty CPUs changes,
- - or CPUs come or go from a cpuset with this flag enabled,
- - or 'cpuset.sched_relax_domain_level' value of a cpuset with non-empty CPUs
-   and with this flag enabled changes,
- - or a cpuset with non-empty CPUs and with this flag enabled is removed,
- - or a cpu is offlined/onlined.
-
-This partition exactly defines what sched domains the scheduler should
-set up - one sched domain for each element (struct cpumask) in the
-partition.
-
-The scheduler remembers the currently active sched domain partitions.
-When the scheduler routine partition_sched_domains() is invoked from
-the cpuset code to update these sched domains, it compares the new
-partition requested with the current, and updates its sched domains,
-removing the old and adding the new, for each change.
-
-
-1.8 What is sched_relax_domain_level ?
---------------------------------------
-
-Within a sched domain, the scheduler migrates tasks in two ways: by
-periodic load balancing on the tick, and at the time of some schedule events.
-
-When a task is woken up, the scheduler tries to move it to an idle CPU.
-For example, if a task A running on CPU X activates another task B
-on the same CPU X, and if CPU Y is X's sibling and is idle,
-then the scheduler migrates task B to CPU Y so that task B can start on
-CPU Y without waiting for task A on CPU X.
-
-And if a CPU runs out of tasks in its runqueue, the CPU tries to pull
-extra tasks from other busy CPUs to help them before it goes
-idle.
-
-Of course, finding movable tasks and/or idle CPUs has a searching cost,
-so the scheduler might not search all CPUs in the domain
-every time.  In fact, on some architectures, the searching range on
-events is limited to the same socket or node where the CPU is located,
-while the load balance on tick searches all.
-
-For example, assume CPU Z is relatively far from CPU X.  Even if CPU Z
-is idle while CPU X and its siblings are busy, the scheduler can't migrate
-woken task B from X to Z since Z is out of its searching range.
-As a result, task B on CPU X needs to wait for task A or for load balancing
-on the next tick.  For some applications in special situations, waiting
-1 tick may be too long.
-
-The 'cpuset.sched_relax_domain_level' file allows you to request changing
-this searching range as you like.  This file takes an int value which
-indicates the size of the searching range in levels, ideally as follows;
-the initial value -1 indicates that the cpuset has no request.
-
-====== ===========================================================
-  -1   no request. use system default or follow request of others.
-   0   no search.
-   1   search siblings (hyperthreads in a core).
-   2   search cores in a package.
-   3   search cpus in a node [= system wide on non-NUMA system]
-   4   search nodes in a chunk of node [on NUMA system]
-   5   search system wide [on NUMA system]
-====== ===========================================================
-
-The system default is architecture dependent.  The system default
-can be changed using the relax_domain_level= boot parameter.
-
-This file is per-cpuset and affects the sched domain to which the cpuset
-belongs.  Therefore if the flag 'cpuset.sched_load_balance' of a cpuset
-is disabled, then 'cpuset.sched_relax_domain_level' has no effect since
-there is no sched domain belonging to the cpuset.
-
-If multiple cpusets are overlapping and hence they form a single sched
-domain, the largest value among those is used.  Be careful, if one
-requests 0 and others are -1 then 0 is used.
-
-Note that modifying this file will have both good and bad effects,
-and whether it is acceptable or not depends on your situation.
-Don't modify this file if you are not sure.
-
-If your situation is:
-
- - The migration costs between CPUs can be assumed to be considerably
-   small (for you) due to your special application's behavior or
-   special hardware support for CPU cache etc.
- - The searching cost has no impact (for you), or you can make
-   the searching cost small enough by managing cpusets to be compact etc.
- - The latency is required even if it sacrifices cache hit rate etc.
-
-then increasing 'cpuset.sched_relax_domain_level' would benefit you.
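-
-For example, assuming the current directory is a cpuset with
-'cpuset.sched_load_balance' enabled, the search range on wakeup could
-be widened to all cores in a package (level 2 in the table above)::
-
-  /bin/echo 2 > cpuset.sched_relax_domain_level
-  cat cpuset.sched_relax_domain_level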
-
-
-1.9 How do I use cpusets ?
---------------------------
-
-In order to minimize the impact of cpusets on critical kernel
-code, such as the scheduler, and due to the fact that the kernel
-does not support one task updating the memory placement of another
-task directly, the impact on a task of changing its cpuset CPU
-or Memory Node placement, or of changing to which cpuset a task
-is attached, is subtle.
-
-If a cpuset has its Memory Nodes modified, then for each task attached
-to that cpuset, the next time that the kernel attempts to allocate
-a page of memory for that task, the kernel will notice the change
-in the task's cpuset, and update its per-task memory placement to
-remain within the new cpuset's memory placement.  If the task was using
-mempolicy MPOL_BIND, and the nodes to which it was bound overlap with
-its new cpuset, then the task will continue to use whatever subset
-of MPOL_BIND nodes are still allowed in the new cpuset.  If the task
-was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed
-in the new cpuset, then the task will be essentially treated as if it
-was MPOL_BIND bound to the new cpuset (even though its NUMA placement,
-as queried by get_mempolicy(), doesn't change).  If a task is moved
-from one cpuset to another, then the kernel will adjust the task's
-memory placement, as above, the next time that the kernel attempts
-to allocate a page of memory for that task.
-
-If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset
-will have its allowed CPU placement changed immediately.  Similarly,
-if a task's pid is written to another cpuset's 'tasks' file, then its
-allowed CPU placement is changed immediately.  If such a task had been
-bound to some subset of its cpuset using the sched_setaffinity() call,
-the task will be allowed to run on any CPU allowed in its new cpuset,
-negating the effect of the prior sched_setaffinity() call.
-
-In summary, the memory placement of a task whose cpuset is changed is
-updated by the kernel, on the next allocation of a page for that task,
-and the processor placement is updated immediately.
-
-Normally, once a page is allocated (given a physical page
-of main memory) then that page stays on whatever node it
-was allocated, so long as it remains allocated, even if the
-cpuset's memory placement policy 'cpuset.mems' subsequently changes.
-If the cpuset flag file 'cpuset.memory_migrate' is set true, then when
-tasks are attached to that cpuset, any pages that task had
-allocated to it on nodes in its previous cpuset are migrated
-to the task's new cpuset. The relative placement of the page within
-the cpuset is preserved during these migration operations if possible.
-For example if the page was on the second valid node of the prior cpuset
-then the page will be placed on the second valid node of the new cpuset.
-
-Also if 'cpuset.memory_migrate' is set true, then if that cpuset's
-'cpuset.mems' file is modified, pages allocated to tasks in that
-cpuset, that were on nodes in the previous setting of 'cpuset.mems',
-will be moved to nodes in the new setting of 'cpuset.mems'.
-Pages that were not in the task's prior cpuset, or in the cpuset's
-prior 'cpuset.mems' setting, will not be moved.
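-
-A short sketch of the attach case, assuming a destination cpuset
-directory "Charlie" under /sys/fs/cgroup/cpuset and a task whose pid
-is held in the (hypothetical) shell variable $PID::
-
-  cd /sys/fs/cgroup/cpuset/Charlie
-  /bin/echo 1 > cpuset.memory_migrate
-  # Pages the task allocated on its previous cpuset's nodes are migrated
-  /bin/echo $PID > tasks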
-
-There is an exception to the above.  If hotplug functionality is used
-to remove all the CPUs that are currently assigned to a cpuset,
-then all the tasks in that cpuset will be moved to the nearest ancestor
-with non-empty cpus.  But the moving of some (or all) tasks might fail if
-cpuset is bound with another cgroup subsystem which has some restrictions
-on task attaching.  In this failing case, those tasks will stay
-in the original cpuset, and the kernel will automatically update
-their cpus_allowed to allow all online CPUs.  When memory hotplug
-functionality for removing Memory Nodes is available, a similar exception
-is expected to apply there as well.  In general, the kernel prefers to
-violate cpuset placement, over starving a task that has had all
-its allowed CPUs or Memory Nodes taken offline.
-
-There is a second exception to the above.  GFP_ATOMIC requests are
-kernel internal allocations that must be satisfied immediately.
-The kernel may drop some request, in rare cases even panic, if a
-GFP_ATOMIC alloc fails.  If the request cannot be satisfied within
-the current task's cpuset, then we relax the cpuset, and look for
-memory anywhere we can find it.  It's better to violate the cpuset
-than stress the kernel.
-
-To start a new job that is to be contained within a cpuset, the steps are:
-
- 1) mkdir /sys/fs/cgroup/cpuset
- 2) mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
- 3) Create the new cpuset by doing mkdir's and write's (or echo's) in
-    the /sys/fs/cgroup/cpuset virtual file system.
- 4) Start a task that will be the "founding father" of the new job.
- 5) Attach that task to the new cpuset by writing its pid to the
-    /sys/fs/cgroup/cpuset tasks file for that cpuset.
- 6) fork, exec or clone the job tasks from this founding father task.
-
-For example, the following sequence of commands will set up a cpuset
-named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
-and then start a subshell 'sh' in that cpuset::
-
-  mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
-  cd /sys/fs/cgroup/cpuset
-  mkdir Charlie
-  cd Charlie
-  /bin/echo 2-3 > cpuset.cpus
-  /bin/echo 1 > cpuset.mems
-  /bin/echo $$ > tasks
-  sh
-  # The subshell 'sh' is now running in cpuset Charlie
-  # The next line should display '/Charlie'
-  cat /proc/self/cpuset
-
-There are ways to query or modify cpusets:
-
- - via the cpuset file system directly, using the various cd, mkdir, echo,
-   cat, rmdir commands from the shell, or their equivalent from C.
- - via the C library libcpuset.
- - via the C library libcgroup.
-   (http://sourceforge.net/projects/libcg/)
- - via the python application cset.
-   (http://code.google.com/p/cpuset/)
-
-The sched_setaffinity calls can also be done at the shell prompt using
-SGI's runon or Robert Love's taskset.  The mbind and set_mempolicy
-calls can be done at the shell prompt using the numactl command
-(part of Andi Kleen's numa package).
-
-2. Usage Examples and Syntax
-============================
-
-2.1 Basic Usage
----------------
-
-Creating, modifying, using the cpusets can be done through the cpuset
-virtual filesystem.
-
-To mount it, type::
-
-  # mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset
-
-Then under /sys/fs/cgroup/cpuset you can find a tree that corresponds to the
-tree of the cpusets in the system. For instance, /sys/fs/cgroup/cpuset
-is the cpuset that holds the whole system.
-
-If you want to create a new cpuset under /sys/fs/cgroup/cpuset::
-
-  # cd /sys/fs/cgroup/cpuset
-  # mkdir my_cpuset
-
-Now you want to do something with this cpuset::
-
-  # cd my_cpuset
-
-In this directory you can find several files::
-
-  # ls
-  cgroup.clone_children  cpuset.memory_pressure
-  cgroup.event_control   cpuset.memory_spread_page
-  cgroup.procs           cpuset.memory_spread_slab
-  cpuset.cpu_exclusive   cpuset.mems
-  cpuset.cpus            cpuset.sched_load_balance
-  cpuset.mem_exclusive   cpuset.sched_relax_domain_level
-  cpuset.mem_hardwall    notify_on_release
-  cpuset.memory_migrate  tasks
-
-Reading them will give you information about the state of this cpuset:
-the CPUs and Memory Nodes it can use, the processes that are using
-it, its properties.  By writing to these files you can manipulate
-the cpuset.
-
-Set some flags::
-
-  # /bin/echo 1 > cpuset.cpu_exclusive
-
-Add some cpus::
-
-  # /bin/echo 0-7 > cpuset.cpus
-
-Add some mems::
-
-  # /bin/echo 0-7 > cpuset.mems
-
-Now attach your shell to this cpuset::
-
-  # /bin/echo $$ > tasks
-
-You can also create cpusets inside your cpuset by using mkdir in this
-directory::
-
-  # mkdir my_sub_cs
-
-To remove a cpuset, just use rmdir::
-
-  # rmdir my_sub_cs
-
-This will fail if the cpuset is in use (has cpusets inside, or has
-processes attached).
-
-Note that for legacy reasons, the "cpuset" filesystem exists as a
-wrapper around the cgroup filesystem.
-
-The command::
-
-  mount -t cpuset X /sys/fs/cgroup/cpuset
-
-is equivalent to::
-
-  mount -t cgroup -ocpuset,noprefix X /sys/fs/cgroup/cpuset
-  echo "/sbin/cpuset_release_agent" > /sys/fs/cgroup/cpuset/release_agent
-
-2.2 Adding/removing cpus
-------------------------
-
-This is the syntax to use when writing in the cpus or mems files
-in cpuset directories::
-
-  # /bin/echo 1-4 > cpuset.cpus		-> set cpus list to cpus 1,2,3,4
-  # /bin/echo 1,2,3,4 > cpuset.cpus	-> set cpus list to cpus 1,2,3,4
-
-To add a CPU to a cpuset, write the new list of CPUs including the
-CPU to be added. To add 6 to the above cpuset::
-
-  # /bin/echo 1-4,6 > cpuset.cpus	-> set cpus list to cpus 1,2,3,4,6
-
-Similarly to remove a CPU from a cpuset, write the new list of CPUs
-without the CPU to be removed.
-
-To remove all the CPUs::
-
-  # /bin/echo "" > cpuset.cpus		-> clear cpus list
-
-2.3 Setting flags
------------------
-
-The syntax is very simple::
-
-  # /bin/echo 1 > cpuset.cpu_exclusive 	-> set flag 'cpuset.cpu_exclusive'
-  # /bin/echo 0 > cpuset.cpu_exclusive 	-> unset flag 'cpuset.cpu_exclusive'
-
-2.4 Attaching processes
------------------------
-
-::
-
-  # /bin/echo PID > tasks
-
-Note that it is PID, not PIDs. You can only attach ONE task at a time.
-If you have several tasks to attach, you have to do it one after another::
-
-  # /bin/echo PID1 > tasks
-  # /bin/echo PID2 > tasks
-	...
-  # /bin/echo PIDn > tasks
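-
-From a script, the same thing can be written as a loop.  This is only
-a sketch; the pids 1234, 1235 and 1236 are placeholders::
-
-  # for pid in 1234 1235 1236; do /bin/echo $pid > tasks; done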
-
-
-3. Questions
-============
-
-Q:
-   what's up with this '/bin/echo' ?
-
-A:
-   bash's builtin 'echo' command does not check calls to write() against
-   errors. If you use it in the cpuset file system, you won't be
-   able to tell whether a command succeeded or failed.
-
-Q:
-   When I attach processes, only the first one on the line really gets attached!
-
-A:
-   We can only return one error code per call to write(). So you should also
-   put only ONE pid.
-
-4. Contact
-==========
-
-Web: http://www.bullopensource.org/cpuset
diff --git a/Documentation/cgroup-v1/devices.rst b/Documentation/cgroup-v1/devices.rst
deleted file mode 100644
index e1886783961e..000000000000
--- a/Documentation/cgroup-v1/devices.rst
+++ /dev/null
@@ -1,132 +0,0 @@
-===========================
-Device Whitelist Controller
-===========================
-
-1. Description
-==============
-
-Implement a cgroup to track and enforce open and mknod restrictions
-on device files.  A device cgroup associates a device access
-whitelist with each cgroup.  A whitelist entry has 4 fields.
-'type' is a (all), c (char), or b (block).  'all' means it applies
-to all types and all major and minor numbers.  Major and minor are
-either an integer or * for all.  Access is a composition of r
-(read), w (write), and m (mknod).
-
-The root device cgroup starts with rwm to 'all'.  A child device
-cgroup gets a copy of the parent.  Administrators can then remove
-devices from the whitelist or add new entries.  A child cgroup can
-never receive a device access which is denied by its parent.
-
-2. User Interface
-=================
-
-An entry is added using devices.allow, and removed using
-devices.deny.  For instance::
-
-	echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow
-
-allows cgroup 1 to read and mknod the device usually known as
-/dev/null.  Doing::
-
-	echo a > /sys/fs/cgroup/1/devices.deny
-
-will remove the default 'a *:* rwm' entry. Doing::
-
-	echo a > /sys/fs/cgroup/1/devices.allow
-
-will add the 'a *:* rwm' entry to the whitelist.
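-
-The resulting whitelist can be inspected through the devices.list
-file of the same cgroup (the path below assumes the example cgroup
-used above)::
-
-	cat /sys/fs/cgroup/1/devices.list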
-
-3. Security
-===========
-
-Any task can move itself between cgroups.  This clearly won't
-suffice, but we can decide the best way to adequately restrict
-movement as people get some experience with this.  We may just want
-to require CAP_SYS_ADMIN, which at least is a separate bit from
-CAP_MKNOD.  We may want to just refuse moving to a cgroup which
-isn't a descendant of the current one.  Or we may want to use
-CAP_MAC_ADMIN, since we really are trying to lock down root.
-
-CAP_SYS_ADMIN is needed to modify the whitelist or move another
-task to a new cgroup.  (Again we'll probably want to change that).
-
-A cgroup may not be granted more permissions than the cgroup's
-parent has.
-
-4. Hierarchy
-============
-
-Device cgroups maintain hierarchy by making sure a cgroup never has more
-access permissions than its parent.  Every time an entry is written to
-a cgroup's devices.deny file, all its children will have that entry removed
-from their whitelist and all the locally set whitelist entries will be
-re-evaluated.  In case one of the locally set whitelist entries would provide
-more access than the cgroup's parent, it'll be removed from the whitelist.
-
-Example::
-
-      A
-     / \
-        B
-
-    group        behavior	exceptions
-    A            allow		"b 8:* rwm", "c 116:1 rw"
-    B            deny		"c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"
-
-If a device is denied in group A::
-
-	# echo "c 116:* r" > A/devices.deny
-
-it'll propagate down and after revalidating B's entries, the whitelist entry
-"c 116:2 rwm" will be removed::
-
-    group        whitelist entries                        denied devices
-    A            all                                      "b 8:* rwm", "c 116:* rw"
-    B            "c 1:3 rwm", "b 3:* rwm"                 all the rest
-
-In case parent's exceptions change and local exceptions are not allowed
-anymore, they'll be deleted.
-
-Notice that new whitelist entries will not be propagated::
-
-      A
-     / \
-        B
-
-    group        whitelist entries                        denied devices
-    A            "c 1:3 rwm", "c 1:5 r"                   all the rest
-    B            "c 1:3 rwm", "c 1:5 r"                   all the rest
-
-when adding ``c *:3 rwm``::
-
-	# echo "c *:3 rwm" >A/devices.allow
-
-the result::
-
-    group        whitelist entries                        denied devices
-    A            "c *:3 rwm", "c 1:5 r"                   all the rest
-    B            "c 1:3 rwm", "c 1:5 r"                   all the rest
-
-but now it'll be possible to add new entries to B::
-
-	# echo "c 2:3 rwm" >B/devices.allow
-	# echo "c 50:3 r" >B/devices.allow
-
-or even::
-
-	# echo "c *:3 rwm" >B/devices.allow
-
-Allowing or denying all by writing 'a' to devices.allow or devices.deny will
-not be possible once the device cgroup has children.
-
-4.1 Hierarchy (internal implementation)
----------------------------------------
-
-Device cgroups are implemented internally using a behavior (ALLOW, DENY) and a
-list of exceptions.  The internal state is controlled using the same user
-interface to preserve compatibility with the previous whitelist-only
-implementation.  Removal or addition of exceptions that will reduce the access
-to devices will be propagated down the hierarchy.
-For every propagated exception, the effective rules will be re-evaluated based
-on current parent's access rules.
diff --git a/Documentation/cgroup-v1/freezer-subsystem.rst b/Documentation/cgroup-v1/freezer-subsystem.rst
deleted file mode 100644
index 582d3427de3f..000000000000
--- a/Documentation/cgroup-v1/freezer-subsystem.rst
+++ /dev/null
@@ -1,127 +0,0 @@
-==============
-Cgroup Freezer
-==============
-
-The cgroup freezer is useful to batch job management systems which start
-and stop sets of tasks in order to schedule the resources of a machine
-according to the desires of a system administrator. This sort of program
-is often used on HPC clusters to schedule access to the cluster as a
-whole. The cgroup freezer uses cgroups to describe the set of tasks to
-be started/stopped by the batch job management system. It also provides
-a means to start and stop the tasks composing the job.
-
-The cgroup freezer will also be useful for checkpointing running groups
-of tasks. The freezer allows the checkpoint code to obtain a consistent
-image of the tasks by attempting to force the tasks in a cgroup into a
-quiescent state. Once the tasks are quiescent another task can
-walk /proc or invoke a kernel interface to gather information about the
-quiesced tasks. Checkpointed tasks can be restarted later should a
-recoverable error occur. This also allows the checkpointed tasks to be
-migrated between nodes in a cluster by copying the gathered information
-to another node and restarting the tasks there.
-
-Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
-and resuming tasks in userspace. Both of these signals are observable
-from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
-blocked, or ignored it can be seen by waiting or ptracing parent tasks.
-SIGCONT is especially unsuitable since it can be caught by the task. Any
-programs designed to watch for SIGSTOP and SIGCONT could be broken by
-attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
-demonstrate this problem using nested bash shells::
-
-	$ echo $$
-	16644
-	$ bash
-	$ echo $$
-	16690
-
-	From a second, unrelated bash shell:
-	$ kill -SIGSTOP 16690
-	$ kill -SIGCONT 16690
-
-	<at this point 16690 exits and causes 16644 to exit too>
-
-This happens because bash can observe both signals and choose how it
-responds to them.
-
-Another example of a program which catches and responds to these
-signals is gdb. In fact any program designed to use ptrace is likely to
-have a problem with this method of stopping and resuming tasks.
-
-In contrast, the cgroup freezer uses the kernel freezer code to
-prevent the freeze/unfreeze cycle from becoming visible to the tasks
-being frozen. This allows the bash example above and gdb to run as
-expected.
-
-The cgroup freezer is hierarchical. Freezing a cgroup freezes all
-tasks belonging to the cgroup and all its descendant cgroups. Each
-cgroup has its own state (self-state) and the state inherited from the
-parent (parent-state). Iff both states are THAWED, the cgroup is
-THAWED.
-
-The following cgroupfs files are created by cgroup freezer.
-
-* freezer.state: Read-write.
-
-  When read, returns the effective state of the cgroup - "THAWED",
-  "FREEZING" or "FROZEN". This is the combined self and parent-states.
-  If any is freezing, the cgroup is freezing (FREEZING or FROZEN).
-
-  FREEZING cgroup transitions into FROZEN state when all tasks
-  belonging to the cgroup and its descendants become frozen. Note that
-  a cgroup reverts to FREEZING from FROZEN after a new task is added
-  to the cgroup or one of its descendant cgroups until the new task is
-  frozen.
-
-  When written, sets the self-state of the cgroup. Two values are
-  allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
-  if not already freezing, enters FREEZING state along with all its
-  descendant cgroups.
-
-  If THAWED is written, the self-state of the cgroup is changed to
-  THAWED.  Note that the effective state may not change to THAWED if
-  the parent-state is still freezing. If a cgroup's effective state
-  becomes THAWED, all its descendants which are freezing because of
-  the cgroup also leave the freezing state.
-
-* freezer.self_freezing: Read only.
-
-  Shows the self-state. 0 if the self-state is THAWED; otherwise, 1.
-  This value is 1 iff the last write to freezer.state was "FROZEN".
-
-* freezer.parent_freezing: Read only.
-
-  Shows the parent-state.  0 if none of the cgroup's ancestors is
-  frozen; otherwise, 1.
-
-The root cgroup is non-freezable and the above interface files don't
-exist.
-
-* Examples of usage::
-
-   # mkdir /sys/fs/cgroup/freezer
-   # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
-   # mkdir /sys/fs/cgroup/freezer/0
-   # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks
-
-to get status of the freezer subsystem::
-
-   # cat /sys/fs/cgroup/freezer/0/freezer.state
-   THAWED
-
-to freeze all tasks in the container::
-
-   # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
-   # cat /sys/fs/cgroup/freezer/0/freezer.state
-   FREEZING
-   # cat /sys/fs/cgroup/freezer/0/freezer.state
-   FROZEN
-
-to unfreeze all tasks in the container::
-
-   # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
-   # cat /sys/fs/cgroup/freezer/0/freezer.state
-   THAWED
-
-This is the basic mechanism which should do the right thing for user space tasks
-in a simple scenario.
diff --git a/Documentation/cgroup-v1/hugetlb.rst b/Documentation/cgroup-v1/hugetlb.rst
deleted file mode 100644
index a3902aa253a9..000000000000
--- a/Documentation/cgroup-v1/hugetlb.rst
+++ /dev/null
@@ -1,50 +0,0 @@
-==================
-HugeTLB Controller
-==================
-
-The HugeTLB controller limits HugeTLB usage per control group and
-enforces the controller limit during page fault. Since HugeTLB doesn't
-support page reclaim, enforcing the limit at page fault time implies that
-the application will get a SIGBUS signal if it tries to access HugeTLB pages
-beyond its limit. This requires the application to know beforehand how many
-HugeTLB pages it would require for its use.
-
-The HugeTLB controller can be created by first mounting the cgroup filesystem::
-
-  # mount -t cgroup -o hugetlb none /sys/fs/cgroup
-
-With the above step, the initial or the parent HugeTLB group becomes
-visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
-the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
-
-New groups can be created under the parent group /sys/fs/cgroup::
-
-  # cd /sys/fs/cgroup
-  # mkdir g1
-  # echo $$ > g1/tasks
-
-The above steps create a new group g1 and move the current shell
-process (bash) into it.
-
-Brief summary of control files::
-
- hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
- hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb  usage recorded
- hugetlb.<hugepagesize>.usage_in_bytes     # show current usage for "hugepagesize" hugetlb
- hugetlb.<hugepagesize>.failcnt            # show the number of allocation failures due to HugeTLB limit
-
-For a system supporting three hugepage sizes (64k, 32M and 1G), the control
-files include::
-
-  hugetlb.1GB.limit_in_bytes
-  hugetlb.1GB.max_usage_in_bytes
-  hugetlb.1GB.usage_in_bytes
-  hugetlb.1GB.failcnt
-  hugetlb.64KB.limit_in_bytes
-  hugetlb.64KB.max_usage_in_bytes
-  hugetlb.64KB.usage_in_bytes
-  hugetlb.64KB.failcnt
-  hugetlb.32MB.limit_in_bytes
-  hugetlb.32MB.max_usage_in_bytes
-  hugetlb.32MB.usage_in_bytes
-  hugetlb.32MB.failcnt
diff --git a/Documentation/cgroup-v1/index.rst b/Documentation/cgroup-v1/index.rst
deleted file mode 100644
index fe76d42edc11..000000000000
--- a/Documentation/cgroup-v1/index.rst
+++ /dev/null
@@ -1,30 +0,0 @@
-:orphan:
-
-========================
-Control Groups version 1
-========================
-
-.. toctree::
-    :maxdepth: 1
-
-    cgroups
-
-    blkio-controller
-    cpuacct
-    cpusets
-    devices
-    freezer-subsystem
-    hugetlb
-    memcg_test
-    memory
-    net_cls
-    net_prio
-    pids
-    rdma
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/cgroup-v1/memcg_test.rst b/Documentation/cgroup-v1/memcg_test.rst
deleted file mode 100644
index 91bd18c6a514..000000000000
--- a/Documentation/cgroup-v1/memcg_test.rst
+++ /dev/null
@@ -1,355 +0,0 @@
-=====================================================
-Memory Resource Controller(Memcg) Implementation Memo
-=====================================================
-
-Last Updated: 2010/2
-
-Base Kernel Version: based on 2.6.33-rc7-mm(candidate for 34).
-
-Because the VM is getting complex (one of the reasons is memcg...), memcg's behavior
-is complex. This is a document for memcg's internal behavior.
-Please note that implementation details can be changed.
-
-(*) Topics on API should be in Documentation/cgroup-v1/memory.rst
-
-0. How to record usage ?
-========================
-
-   2 objects are used.
-
-   page_cgroup ....an object per page.
-
-	Allocated at boot or memory hotplug. Freed at memory hot removal.
-
-   swap_cgroup ... an entry per swp_entry.
-
-	Allocated at swapon(). Freed at swapoff().
-
-   The page_cgroup has a USED bit, so double counting against a page_cgroup
-   never occurs. swap_cgroup is used only when a charged page is swapped-out.
-
-1. Charge
-=========
-
-   a page/swp_entry may be charged (usage += PAGE_SIZE) at
-
-	mem_cgroup_try_charge()
-
-2. Uncharge
-===========
-
-  a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
-
-	mem_cgroup_uncharge()
-	  Called when a page's refcount goes down to 0.
-
-	mem_cgroup_uncharge_swap()
-	  Called when swp_entry's refcnt goes down to 0. A charge against swap
-	  disappears.
-
-3. charge-commit-cancel
-=======================
-
-	Memcg pages are charged in two steps:
-
-		- mem_cgroup_try_charge()
-		- mem_cgroup_commit_charge() or mem_cgroup_cancel_charge()
-
-	At try_charge(), there are no flags to say "this page is charged".
-	At this point, usage += PAGE_SIZE.
-
-	At commit(), the page is associated with the memcg.
-
-	At cancel(), simply usage -= PAGE_SIZE.
-
-In the explanation below, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
-
-4. Anonymous
-============
-
-	Anonymous page is newly allocated at
-		  - page fault into MAP_ANONYMOUS mapping.
-		  - Copy-On-Write.
-
-	4.1 Swap-in.
-	At swap-in, the page is taken from swap-cache. There are 2 cases.
-
-	(a) If the SwapCache is newly allocated and read, it has no charges.
-	(b) If the SwapCache has been mapped by processes, it has been
-	    charged already.
-
-	4.2 Swap-out.
-	At swap-out, typical state transition is below.
-
-	(a) add to swap cache. (marked as SwapCache)
-	    swp_entry's refcnt += 1.
-	(b) fully unmapped.
-	    swp_entry's refcnt += # of ptes.
-	(c) write back to swap.
-	(d) delete from swap cache. (remove from SwapCache)
-	    swp_entry's refcnt -= 1.
-
-
-	Finally, at task exit,
-	(e) zap_pte() is called and swp_entry's refcnt -=1 -> 0.
-
-5. Page Cache
-=============
-
-	Page Cache is charged at
-	- add_to_page_cache_locked().
-
-	The logic is very clear. (About migration, see below)
-
-	Note:
-	  __remove_from_page_cache() is called by remove_from_page_cache()
-	  and __remove_mapping().
-
-6. Shmem(tmpfs) Page Cache
-===========================
-
-	The best way to understand shmem's page state transition is to read
-	mm/shmem.c.
-
-	But brief explanation of the behavior of memcg around shmem will be
-	helpful to understand the logic.
-
-	Shmem's page (just leaf page, not direct/indirect block) can be on
-
-		- radix-tree of shmem's inode.
-		- SwapCache.
-		- Both on radix-tree and SwapCache. This happens at swap-in
-		  and swap-out.
-
-	It's charged when...
-
-	- A new page is added to shmem's radix-tree.
-	- A swp page is read. (move a charge from swap_cgroup to page_cgroup)
-
-7. Page Migration
-=================
-
-	mem_cgroup_migrate()
-
-8. LRU
-======
-	Each memcg has its own private LRU. Now, its handling is under global
-	VM's control (meaning that it's handled under global pgdat->lru_lock).
-	Almost all routines around memcg's LRU are called by global LRU's
-	list management functions under pgdat->lru_lock.
-
-	A special function is mem_cgroup_isolate_pages(). This scans
-	memcg's private LRU and calls __isolate_lru_page() to extract a page
-	from LRU.
-
-	(By __isolate_lru_page(), the page is removed from both of global and
-	private LRU.)
-
-
-9. Typical Tests.
-=================
-
- Tests for racy cases.
-
-9.1 Small limit to memcg.
--------------------------
-
-	When testing racy cases, it's a good idea to set memcg's limit
-	to be very small rather than GB. Many races were found in tests under
-	xKB or xxMB limits.
-
-	(Memory behavior under GB limits and memory behavior under MB limits
-	are very different.)
-
-9.2 Shmem
----------
-
-	Historically, memcg's shmem handling was poor and we saw a fair amount
-	of trouble here. This is because shmem is page-cache but can be
-	SwapCache. Testing with shmem/tmpfs is always a good test.
-
-9.3 Migration
--------------
-
-	For NUMA, migration is another special case. cpuset is useful for
-	easy testing. The following is a sample script to do migration::
-
-		mount -t cgroup -o cpuset none /opt/cpuset
-
-		mkdir /opt/cpuset/01
-		echo 1 > /opt/cpuset/01/cpuset.cpus
-		echo 0 > /opt/cpuset/01/cpuset.mems
-		echo 1 > /opt/cpuset/01/cpuset.memory_migrate
-		mkdir /opt/cpuset/02
-		echo 1 > /opt/cpuset/02/cpuset.cpus
-		echo 1 > /opt/cpuset/02/cpuset.mems
-		echo 1 > /opt/cpuset/02/cpuset.memory_migrate
-
-	In the above setup, when you move a task from 01 to 02, page migration
-	from node 0 to node 1 will occur. The following is a script to migrate
-	all tasks under a cpuset::
-
-		--
-		move_task()
-		{
-		for pid in $1
-		do
-			/bin/echo $pid >$2/tasks 2>/dev/null
-			echo -n $pid
-			echo -n " "
-		done
-		echo END
-		}
-
-		G1_TASK=`cat ${G1}/tasks`
-		G2_TASK=`cat ${G2}/tasks`
-		move_task "${G1_TASK}" ${G2} &
-		--
-
-9.4 Memory hotplug
-------------------
-
-	Memory hotplug is another good test.
-
-	To offline memory, do the following::
-
-		# echo offline > /sys/devices/system/memory/memoryXXX/state
-
-	(XXX is the memory block number)
-
-	This is an easy way to test page migration, too.
-
-9.5 mkdir/rmdir
----------------
-
-	When using hierarchy, mkdir/rmdir test should be done.
-	Use tests like the following::
-
-		echo 1 >/opt/cgroup/01/memory/use_hierarchy
-		mkdir /opt/cgroup/01/child_a
-		mkdir /opt/cgroup/01/child_b
-
-		set limit to 01.
-		add limit to 01/child_b
-		run jobs under child_a and child_b
-
-	create/delete following groups at random while jobs are running::
-
-		/opt/cgroup/01/child_a/child_aa
-		/opt/cgroup/01/child_b/child_bb
-		/opt/cgroup/01/child_c
-
-	running new jobs in new group is also good.
-
-9.6 Mount with other subsystems
--------------------------------
-
-	Mounting with other subsystems is a good test because there is a
-	race and lock dependency with other cgroup subsystems.
-
-	example::
-
-		# mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices
-
-	and do task move, mkdir, rmdir etc...under this.
-
-9.7 swapoff
------------
-
-	Besides the management of swap being one of the complicated parts of
-	memcg, the call path of swap-in at swapoff is not the same as the usual
-	swap-in path. It's worth testing explicitly.
-
-	For example, a test like the following is good:
-
-	(Shell-A)::
-
-		# mount -t cgroup none /cgroup -o memory
-		# mkdir /cgroup/test
-		# echo 40M > /cgroup/test/memory.limit_in_bytes
-		# echo 0 > /cgroup/test/tasks
-
-	Run malloc(100M) program under this. You'll see 60M of swaps.
-
-	(Shell-B)::
-
-		# move all tasks in /cgroup/test to /cgroup
-		# /sbin/swapoff -a
-		# rmdir /cgroup/test
-		# kill malloc task.
-
-	Of course, a tmpfs vs. swapoff test should be run, too.
-
-9.8 OOM-Killer
---------------
-
-	Out-of-memory caused by memcg's limit will kill tasks under
-	the memcg. When hierarchy is used, a task under hierarchy
-	will be killed by the kernel.
-
-	In this case, panic_on_oom shouldn't be invoked and tasks
-	in other groups shouldn't be killed.
-
-	It's not difficult to cause OOM under memcg, as follows.
-
-	Case A) when you can swapoff::
-
-		#swapoff -a
-		#echo 50M > /memory.limit_in_bytes
-
-	run 51M of malloc
-
-	Case B) when you use mem+swap limitation::
-
-		#echo 50M > memory.limit_in_bytes
-		#echo 50M > memory.memsw.limit_in_bytes
-
-	run 51M of malloc
-
-9.9 Move charges at task migration
-----------------------------------
-
-	Charges associated with a task can be moved along with task migration.
-
-	(Shell-A)::
-
-		#mkdir /cgroup/A
-		#echo $$ >/cgroup/A/tasks
-
-	run some programs which use some amount of memory in /cgroup/A.
-
-	(Shell-B)::
-
-		#mkdir /cgroup/B
-		#echo 1 >/cgroup/B/memory.move_charge_at_immigrate
-		#echo "pid of the program running in group A" >/cgroup/B/tasks
-
-	You can see charges have been moved by reading ``*.usage_in_bytes`` or
-	memory.stat of both A and B.
-
-	See 8.2 of Documentation/cgroup-v1/memory.rst to see what value should
-	be written to move_charge_at_immigrate.
-
-9.10 Memory thresholds
-----------------------
-
-	Memory controller implements memory thresholds using cgroups notification
-	API. You can use tools/cgroup/cgroup_event_listener.c to test it.
-
-	(Shell-A) Create cgroup and run event listener::
-
-		# mkdir /cgroup/A
-		# ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M
-
-	(Shell-B) Add task to cgroup and try to allocate and free memory::
-
-		# echo $$ >/cgroup/A/tasks
-		# a="$(dd if=/dev/zero bs=1M count=10)"
-		# a=
-
-	You will see a message from cgroup_event_listener every time you cross
-	the thresholds.
-
-	Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.
-
-	It's a good idea to test the root cgroup as well.
diff --git a/Documentation/cgroup-v1/memory.rst b/Documentation/cgroup-v1/memory.rst
deleted file mode 100644
index 41bdc038dad9..000000000000
--- a/Documentation/cgroup-v1/memory.rst
+++ /dev/null
@@ -1,1003 +0,0 @@
-==========================
-Memory Resource Controller
-==========================
-
-NOTE:
-      This document is hopelessly outdated and it asks for a complete
-      rewrite. It still contains useful information so we are keeping it
-      here but make sure to check the current code if you need a deeper
-      understanding.
-
-NOTE:
-      The Memory Resource Controller has generically been referred to as the
-      memory controller in this document. Do not confuse memory controller
-      used here with the memory controller that is used in hardware.
-
-(For editors) In this document:
-      When we mention a cgroup (cgroupfs's directory) with memory controller,
-      we call it "memory cgroup". When you see git-log and source code, you'll
-      see patch's title and function names tend to use "memcg".
-      In this document, we avoid using it.
-
-Benefits and Purpose of the memory controller
-=============================================
-
-The memory controller isolates the memory behaviour of a group of tasks
-from the rest of the system. The article on LWN [12] mentions some probable
-uses of the memory controller. The memory controller can be used to
-
-a. Isolate an application or a group of applications
-   Memory-hungry applications can be isolated and limited to a smaller
-   amount of memory.
-b. Create a cgroup with a limited amount of memory; this can be used
-   as a good alternative to booting with mem=XXXX.
-c. Virtualization solutions can control the amount of memory they want
-   to assign to a virtual machine instance.
-d. A CD/DVD burner could control the amount of memory used by the
-   rest of the system to ensure that burning does not fail due to lack
-   of available memory.
-e. There are several other use cases; find one or use the controller just
-   for fun (to learn and hack on the VM subsystem).
-
-Current Status: linux-2.6.34-mmotm(development version of 2010/April)
-
-Features:
-
- - accounting anonymous pages, file caches, swap caches usage and limiting them.
- - pages are linked to per-memcg LRU exclusively, and there is no global LRU.
- - optionally, memory+swap usage can be accounted and limited.
- - hierarchical accounting
- - soft limit
- - moving (recharging) account at moving a task is selectable.
- - usage threshold notifier
- - memory pressure notifier
- - oom-killer disable knob and oom-notifier
- - Root cgroup has no limit controls.
-
- Kernel memory support is a work in progress, and the current version provides
- basic functionality. (See Section 2.7)
-
-Brief summary of control files.
-
-==================================== ==========================================
- tasks				     attach a task(thread) and show list of
-				     threads
- cgroup.procs			     show list of processes
- cgroup.event_control		     an interface for event_fd()
- memory.usage_in_bytes		     show current usage for memory
-				     (See 5.5 for details)
- memory.memsw.usage_in_bytes	     show current usage for memory+Swap
-				     (See 5.5 for details)
- memory.limit_in_bytes		     set/show limit of memory usage
- memory.memsw.limit_in_bytes	     set/show limit of memory+Swap usage
- memory.failcnt			     show the number of memory usage hits limits
- memory.memsw.failcnt		     show the number of memory+Swap hits limits
- memory.max_usage_in_bytes	     show max memory usage recorded
- memory.memsw.max_usage_in_bytes     show max memory+Swap usage recorded
- memory.soft_limit_in_bytes	     set/show soft limit of memory usage
- memory.stat			     show various statistics
- memory.use_hierarchy		     set/show hierarchical account enabled
- memory.force_empty		     trigger forced page reclaim
- memory.pressure_level		     set memory pressure notifications
- memory.swappiness		     set/show swappiness parameter of vmscan
-				     (See sysctl's vm.swappiness)
- memory.move_charge_at_immigrate     set/show controls of moving charges
- memory.oom_control		     set/show oom controls.
- memory.numa_stat		     show the number of memory usage per numa
-				     node
-
- memory.kmem.limit_in_bytes          set/show hard limit for kernel memory
- memory.kmem.usage_in_bytes          show current kernel memory allocation
- memory.kmem.failcnt                 show the number of kernel memory usage
-				     hits limits
- memory.kmem.max_usage_in_bytes      show max kernel memory usage recorded
-
- memory.kmem.tcp.limit_in_bytes      set/show hard limit for tcp buf memory
- memory.kmem.tcp.usage_in_bytes      show current tcp buf memory allocation
- memory.kmem.tcp.failcnt             show the number of tcp buf memory usage
-				     hits limits
- memory.kmem.tcp.max_usage_in_bytes  show max tcp buf memory usage recorded
-==================================== ==========================================
-
-1. History
-==========
-
-The memory controller has a long history. A request for comments for the memory
-controller was posted by Balbir Singh [1]. At the time the RFC was posted
-there were several implementations for memory control. The goal of the
-RFC was to build consensus and agreement for the minimal features required
-for memory control. The first RSS controller was posted by Balbir Singh[2]
-in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of the
-RSS controller. At OLS, at the resource management BoF, everyone suggested
-that we handle both page cache and RSS together. Another request was raised
-to allow user space handling of OOM. The current memory controller is
-at version 6; it combines both mapped (RSS) and unmapped Page
-Cache Control [11].
-
-2. Memory Control
-=================
-
-Memory is a unique resource in the sense that it is present in a limited
-amount. If a task requires a lot of CPU processing, the task can spread
-its processing over a period of hours, days, months or years, but with
-memory, the same physical memory needs to be reused to accomplish the task.
-
-The memory controller implementation has been divided into phases. These
-are:
-
-1. Memory controller
-2. mlock(2) controller
-3. Kernel user memory accounting and slab control
-4. user mappings length controller
-
-The memory controller is the first controller developed.
-
-2.1. Design
------------
-
-The core of the design is a counter called the page_counter. The
-page_counter tracks the current memory usage and limit of the group of
-processes associated with the controller. Each cgroup has a memory controller
-specific data structure (mem_cgroup) associated with it.
-
-2.2. Accounting
----------------
-
-::
-
-		+--------------------+
-		|  mem_cgroup        |
-		|  (page_counter)    |
-		+--------------------+
-		 /            ^      \
-		/             |       \
-           +---------------+  |        +---------------+
-           | mm_struct     |  |....    | mm_struct     |
-           |               |  |        |               |
-           +---------------+  |        +---------------+
-                              |
-                              + --------------+
-                                              |
-           +---------------+           +------+--------+
-           | page          +---------->| page_cgroup   |
-           |               |           |               |
-           +---------------+           +---------------+
-
-             (Figure 1: Hierarchy of Accounting)
-
-
-Figure 1 shows the important aspects of the controller:
-
-1. Accounting happens per cgroup
-2. Each mm_struct knows about which cgroup it belongs to
-3. Each page has a pointer to the page_cgroup, which in turn knows the
-   cgroup it belongs to
-
-The accounting is done as follows: mem_cgroup_charge_common() is invoked to
-set up the necessary data structures and check if the cgroup that is being
-charged is over its limit. If it is, then reclaim is invoked on the cgroup.
-More details can be found in the reclaim section of this document.
-If everything goes well, a page meta-data-structure called page_cgroup is
-updated. page_cgroup has its own LRU on cgroup.
-(*) page_cgroup structure is allocated at boot/memory-hotplug time.
-
-2.2.1 Accounting details
-------------------------
-
-All mapped anon pages (RSS) and cache pages (Page Cache) are accounted.
-Some pages which are never reclaimable and will not be on the LRU
-are not accounted. We just account pages under usual VM management.
-
-RSS pages are accounted at page_fault unless they've already been accounted
-for earlier. A file page will be accounted for as Page Cache when it's
-inserted into inode (radix-tree). While it's mapped into the page tables of
-processes, duplicate accounting is carefully avoided.
-
-An RSS page is unaccounted when it's fully unmapped. A PageCache page is
-unaccounted when it's removed from radix-tree. Even if RSS pages are fully
-unmapped (by kswapd), they may exist as SwapCache in the system until they
-are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
-
-Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
-
-At page migration, accounting information is kept.
-
-Note: we just account pages-on-LRU because our purpose is to control amount
-of used pages; not-on-LRU pages tend to be out-of-control from VM view.
-
-2.3 Shared Page Accounting
---------------------------
-
-Shared pages are accounted on the basis of the first touch approach. The
-cgroup that first touches a page is accounted for the page. The principle
-behind this approach is that a cgroup that aggressively uses a shared
-page will eventually get charged for it (once it is uncharged from
-the cgroup that brought it in -- this will happen on memory pressure).
-
-But see section 8.2: when moving a task to another cgroup, its pages may
-be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
-
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and force swapped-out pages of shmem(tmpfs) to
-be brought back into memory, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
-
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
---------------------------------------
-
-Swap Extension allows you to record charges for swap. A swapped-in page is
-charged back to the original page allocator if possible.
-
-When swap is accounted, the following files are added.
-
- - memory.memsw.usage_in_bytes.
- - memory.memsw.limit_in_bytes.
-
-memsw means memory+swap. Usage of memory+swap is limited by
-memsw.limit_in_bytes.
-
-Example: Assume a system with 4G of swap. A task which allocates 6G of memory
-(by mistake) under 2G memory limitation will use all swap.
-In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
-By using the memsw limit, you can avoid system OOM which can be caused by swap
-shortage.
-
-**Why 'memory+swap' rather than swap**
-
-The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
-to move the charge from memory to swap, so there is no change in usage of
-memory+swap. In other words, when we want to limit the usage of swap without
-affecting global LRU, memory+swap limit is better than just limiting swap from
-an OS point of view.
-
-**What happens when a cgroup hits memory.memsw.limit_in_bytes**
-
-When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
-in this cgroup. Then, swap-out will not be done by cgroup routine and file
-caches are dropped. But as mentioned above, global LRU can still swap out memory
-from it for sanity of the system's memory management state. You can't forbid
-it by cgroup.
-
-2.5 Reclaim
------------
-
-Each cgroup maintains a per cgroup LRU which has the same structure as
-global VM. When a cgroup goes over its limit, we first try
-to reclaim memory from the cgroup so as to make space for the new
-pages that the cgroup has touched. If the reclaim is unsuccessful,
-an OOM routine is invoked to select and kill the bulkiest task in the
-cgroup. (See 10. OOM Control below.)
-
-The reclaim algorithm has not been modified for cgroups, except that
-pages that are selected for reclaiming come from the per-cgroup LRU
-list.
-
-NOTE:
-  Reclaim does not work for the root cgroup, since we cannot set any
-  limits on the root cgroup.
-
-Note2:
-  When panic_on_oom is set to "2", the whole system will panic.
-
-When oom event notifier is registered, event will be delivered.
-(See oom_control section)
-
-2.6 Locking
------------
-
-   lock_page_cgroup()/unlock_page_cgroup() should not be called under
-   the i_pages lock.
-
-   Other lock order is following:
-
-   PG_locked.
-     mm->page_table_lock
-         pgdat->lru_lock
-	   lock_page_cgroup.
-
-  In many cases, just lock_page_cgroup() is called.
-
-  per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
-  pgdat->lru_lock, it has no lock of its own.
-
-2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
------------------------------------------------
-
-With the Kernel memory extension, the Memory Controller is able to limit
-the amount of kernel memory used by the system. Kernel memory is fundamentally
-different than user memory, since it can't be swapped out, which makes it
-possible to DoS the system by consuming too much of this precious resource.
-
-Kernel memory accounting is enabled for all memory cgroups by default. But
-it can be disabled system-wide by passing cgroup.memory=nokmem to the kernel
-at boot time. In this case, kernel memory will not be accounted at all.
-
-Kernel memory limits are not imposed for the root cgroup. Usage for the root
-cgroup may or may not be accounted. The memory used is accumulated into
-memory.kmem.usage_in_bytes, or in a separate counter when it makes sense
-(currently only for tcp).
-
-The main "kmem" counter is fed into the main counter, so kmem charges will
-also be visible from the user counter.
-
-Currently no soft limit is implemented for kernel memory. It is future work
-to trigger slab reclaim when those limits are reached.
-
-2.7.1 Current Kernel Memory resources accounted
------------------------------------------------
-
-stack pages:
-  every process consumes some stack pages. By accounting into
-  kernel memory, we prevent new processes from being created when the kernel
-  memory usage is too high.
-
-slab pages:
-  pages allocated by the SLAB or SLUB allocator are tracked. A copy
-  of each kmem_cache is created the first time the cache is touched
-  from inside the memcg. The creation is done lazily, so some objects can still be
-  skipped while the cache is being created. All objects in a slab page should
-  belong to the same memcg. This only fails to hold when a task is migrated to a
-  different memcg during the page allocation by the cache.
-
-sockets memory pressure:
-  some sockets protocols have memory pressure
-  thresholds. The Memory Controller allows them to be controlled individually
-  per cgroup, instead of globally.
-
-tcp memory pressure:
-  sockets memory pressure for the tcp protocol.
-
-2.7.2 Common use cases
-----------------------
-
-Because the "kmem" counter is fed to the main user counter, kernel memory can
-never be limited completely independently of user memory. Say "U" is the user
-limit, and "K" the kernel limit. There are three possible ways limits can be
-set:
-
-U != 0, K = unlimited:
-    This is the standard memcg limitation mechanism already present before kmem
-    accounting. Kernel memory is completely ignored.
-
-U != 0, K < U:
-    Kernel memory is a subset of the user memory. This setup is useful in
-    deployments where the total amount of memory per-cgroup is overcommitted.
-    Overcommitting kernel memory limits is definitely not recommended, since the
-    box can still run out of non-reclaimable memory.
-    In this case, the admin could set up K so that the sum of all groups is
-    never greater than the total memory, and freely set U at the cost of his
-    QoS.
-
-WARNING:
-    In the current implementation, memory reclaim will NOT be
-    triggered for a cgroup when it hits K while staying below U, which makes
-    this setup impractical.
-
-U != 0, K >= U:
-    Since kmem charges will also be fed to the user counter, reclaim will be
-    triggered for the cgroup for both kinds of memory. This setup gives the
-    admin a unified view of memory, and it is also useful for people who just
-    want to track kernel memory usage.
-
-3. User Interface
-=================
-
-3.0. Configuration
-------------------
-
-a. Enable CONFIG_CGROUPS
-b. Enable CONFIG_MEMCG
-c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
-d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
-
-3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
--------------------------------------------------------------------
-
-::
-
-	# mount -t tmpfs none /sys/fs/cgroup
-	# mkdir /sys/fs/cgroup/memory
-	# mount -t cgroup none /sys/fs/cgroup/memory -o memory
-
-3.2. Make the new group and move bash into it::
-
-	# mkdir /sys/fs/cgroup/memory/0
-	# echo $$ > /sys/fs/cgroup/memory/0/tasks
-
-Since now we're in the 0 cgroup, we can alter the memory limit::
-
-	# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
-
-NOTE:
-  We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
-  mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
-  Gibibytes.)
-
-NOTE:
-  We can write "-1" to reset ``*.limit_in_bytes`` to unlimited.
-
-NOTE:
-  We cannot set limits on the root cgroup any more.
-
-::
-
-  # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
-  4194304
-
-We can check the usage::
-
-  # cat /sys/fs/cgroup/memory/0/memory.usage_in_bytes
-  1216512
-
-A successful write to this file does not guarantee a successful setting of
-this limit to the value written into the file. This can be due to a
-number of factors, such as rounding up to page boundaries or the total
-availability of memory on the system. The user is required to re-read
-this file after a write to guarantee the value committed by the kernel::
-
-  # echo 1 > memory.limit_in_bytes
-  # cat memory.limit_in_bytes
-  4096
-
-The memory.failcnt field gives the number of times that the cgroup limit was
-exceeded.
-
-The memory.stat file gives accounting information. Now, the number of
-caches, RSS and Active pages/Inactive pages are shown.
-
-4. Testing
-==========
-
-For testing features and implementation, see memcg_test.txt.
-
-Performance test is also important. To see pure memory controller's overhead,
-testing on tmpfs will give you good numbers of small overheads.
-Example: do kernel make on tmpfs.
-
-Page-fault scalability is also important. When measuring parallel
-page faults, a multi-process test may be better than a multi-thread
-test, because the latter is subject to noise from shared objects/status.
-
-But the above two are testing extreme situations.
-Trying usual test under memory controller is always helpful.
-
-4.1 Troubleshooting
--------------------
-
-Sometimes a user might find that the application under a cgroup is
-terminated by the OOM killer. There are several causes for this:
-
-1. The cgroup limit is too low (just too low to do anything useful)
-2. The user is using anonymous memory and swap is turned off or too low
-
-A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
-some of the pages cached in the cgroup (page cache pages).
-
-To know what happens, disabling OOM_Kill as per "10. OOM Control" (below) and
-seeing what happens will be helpful.
-
-4.2 Task migration
-------------------
-
-When a task migrates from one cgroup to another, its charge is not
-carried forward by default. The pages allocated from the original cgroup still
-remain charged to it; the charge is dropped when the page is freed or
-reclaimed.
-
-You can move charges of a task along with task migration.
-See 8. "Move charges at task migration"
-
-4.3 Removing a cgroup
----------------------
-
-A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
-cgroup might have some charge associated with it, even though all
-tasks have migrated away from it. (because we charge against pages, not
-against tasks.)
-
-We move the stats to root (if use_hierarchy==0) or parent (if
-use_hierarchy==1), and no change on the charge except uncharging
-from the child.
-
-Charges recorded in swap information are not updated at removal of cgroup.
-Recorded information is discarded and a cgroup which uses swap (swapcache)
-will be charged as a new owner of it.
-
-About use_hierarchy, see Section 6.
-
-5. Misc. interfaces
-===================
-
-5.1 force_empty
----------------
-  memory.force_empty interface is provided to make cgroup's memory usage empty.
-  When writing anything to this::
-
-    # echo 0 > memory.force_empty
-
-  the cgroup will be reclaimed and as many pages reclaimed as possible.
-
-  The typical use case for this interface is before calling rmdir().
-  Though rmdir() offlines the memcg, the memcg may still stay there due to
-  charged file caches. Some out-of-use page caches may keep charged until
-  memory pressure happens. If you want to avoid that, force_empty will be useful.
-
-  Also, note that when memory.kmem.limit_in_bytes is set the charges due to
-  kernel pages will still be seen. This is not considered a failure and the
-  write will still return success. In this case, it is expected that
-  memory.kmem.usage_in_bytes == memory.usage_in_bytes.
-
-  About use_hierarchy, see Section 6.
-
-5.2 stat file
--------------
-
-memory.stat file includes following statistics
-
-per-memory cgroup local status
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-=============== ===============================================================
-cache		# of bytes of page cache memory.
-rss		# of bytes of anonymous and swap cache memory (includes
-		transparent hugepages).
-rss_huge	# of bytes of anonymous transparent hugepages.
-mapped_file	# of bytes of mapped file (includes tmpfs/shmem)
-pgpgin		# of charging events to the memory cgroup. The charging
-		event happens each time a page is accounted as either mapped
-		anon page(RSS) or cache page(Page Cache) to the cgroup.
-pgpgout		# of uncharging events to the memory cgroup. The uncharging
-		event happens each time a page is unaccounted from the cgroup.
-swap		# of bytes of swap usage
-dirty		# of bytes that are waiting to get written back to the disk.
-writeback	# of bytes of file/anon cache that are queued for syncing to
-		disk.
-inactive_anon	# of bytes of anonymous and swap cache memory on inactive
-		LRU list.
-active_anon	# of bytes of anonymous and swap cache memory on active
-		LRU list.
-inactive_file	# of bytes of file-backed memory on inactive LRU list.
-active_file	# of bytes of file-backed memory on active LRU list.
-unevictable	# of bytes of memory that cannot be reclaimed (mlocked etc).
-=============== ===============================================================
-
-status considering hierarchy (see memory.use_hierarchy settings)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-========================= ===================================================
-hierarchical_memory_limit # of bytes of memory limit with regard to hierarchy
-			  under which the memory cgroup is
-hierarchical_memsw_limit  # of bytes of memory+swap limit with regard to
-			  hierarchy under which memory cgroup is.
-
-total_<counter>		  # hierarchical version of <counter>, which in
-			  addition to the cgroup's own value includes the
-			  sum of all hierarchical children's values of
-			  <counter>, i.e. total_cache
-========================= ===================================================
-
-The following additional stats are dependent on CONFIG_DEBUG_VM
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-========================= ========================================
-recent_rotated_anon	  VM internal parameter. (see mm/vmscan.c)
-recent_rotated_file	  VM internal parameter. (see mm/vmscan.c)
-recent_scanned_anon	  VM internal parameter. (see mm/vmscan.c)
-recent_scanned_file	  VM internal parameter. (see mm/vmscan.c)
-========================= ========================================
-
-Memo:
-	recent_rotated means recent frequency of LRU rotation.
-	recent_scanned means recent # of scans to LRU.
-	These are shown for debugging; please see the code for their meanings.
-
-Note:
-	Only anonymous and swap cache memory is listed as part of 'rss' stat.
-	This should not be confused with the true 'resident set size' or the
-	amount of physical memory used by the cgroup.
-
-	'rss + mapped_file' will give you resident set size of cgroup.
-
-	(Note: file and shmem may be shared among other cgroups. In that case,
-	mapped_file is accounted only when the memory cgroup is owner of page
-	cache.)
-
-5.3 swappiness
---------------
-
-Overrides /proc/sys/vm/swappiness for the particular group. The tunable
-in the root cgroup corresponds to the global swappiness setting.
-
-Please note that unlike during the global reclaim, limit reclaim
-enforces that 0 swappiness really prevents from any swapping even if
-there is a swap storage available. This might lead to memcg OOM killer
-if there are no file pages to reclaim.
-
-5.4 failcnt
------------
-
-A memory cgroup provides memory.failcnt and memory.memsw.failcnt files.
-This failcnt(== failure count) shows the number of times that a usage counter
-hit its limit. When a memory cgroup hits a limit, failcnt increases and
-memory under it will be reclaimed.
-
-You can reset failcnt by writing 0 to failcnt file::
-
-	# echo 0 > .../memory.failcnt
-
-5.5 usage_in_bytes
-------------------
-
-For efficiency, like other kernel components, the memory cgroup uses some optimization
-to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by this
-method and doesn't show the 'exact' value of memory (and swap) usage; it's a fuzzy
-value for efficient access. (Of course, when necessary, it's synchronized.)
-If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
-value in memory.stat(see 5.2).
-
-5.6 numa_stat
--------------
-
-This is similar to numa_maps but operates on a per-memcg basis.  This is
-useful for providing visibility into the numa locality information within
-a memcg since the pages are allowed to be allocated from any physical
-node.  One of the use cases is evaluating application performance by
-combining this information with the application's CPU allocation.
-
-Each memcg's numa_stat file includes "total", "file", "anon" and "unevictable"
-per-node page counts including "hierarchical_<counter>" which sums up all
-hierarchical children's values in addition to the memcg's own value.
-
-The output format of memory.numa_stat is::
-
-  total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
-  file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
-  anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
-  unevictable=<total unevictable pages> N0=<node 0 pages> N1=<node 1 pages> ...
-  hierarchical_<counter>=<counter pages> N0=<node 0 pages> N1=<node 1 pages> ...
-
-The "total" count is sum of file + anon + unevictable.
-
-6. Hierarchy support
-====================
-
-The memory controller supports a deep hierarchy and hierarchical accounting.
-The hierarchy is created by creating the appropriate cgroups in the
-cgroup filesystem. Consider for example, the following cgroup filesystem
-hierarchy::
-
-	       root
-	     /  |   \
-            /	|    \
-	   a	b     c
-		      | \
-		      |  \
-		      d   e
-
-In the diagram above, with hierarchical accounting enabled, all memory
-usage of e is accounted to its ancestors up to the root (i.e., c and root),
-that has memory.use_hierarchy enabled. If one of the ancestors goes over its
-limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
-children of the ancestor.
-
-6.1 Enabling hierarchical accounting and reclaim
-------------------------------------------------
-
-A memory cgroup by default disables the hierarchy feature. Support
-can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup::
-
-	# echo 1 > memory.use_hierarchy
-
-The feature can be disabled by::
-
-	# echo 0 > memory.use_hierarchy
-
-NOTE1:
-       Enabling/disabling will fail if either the cgroup already has other
-       cgroups created below it, or if the parent cgroup has use_hierarchy
-       enabled.
-
-NOTE2:
-       When panic_on_oom is set to "2", the whole system will panic in
-       case of an OOM event in any cgroup.
-
-7. Soft limits
-==============
-
-Soft limits allow for greater sharing of memory. The idea behind soft limits
-is to allow control groups to use as much of the memory as needed, provided
-
-a. There is no memory contention
-b. They do not exceed their hard limit
-
-When the system detects memory contention or low memory, control groups
-are pushed back to their soft limits. If the soft limit of each control
-group is very high, they are pushed back as much as possible to make
-sure that one control group does not starve the others of memory.
-
-Please note that soft limits is a best-effort feature; it comes with
-no guarantees, but it does its best to make sure that when memory is
-heavily contended for, memory is allocated based on the soft limit
-hints/setup. Currently soft limit based reclaim is set up such that
-it gets invoked from balance_pgdat (kswapd).
-
-7.1 Interface
--------------
-
-Soft limits can be set up by using the following commands (in this example we
-assume a soft limit of 256 MiB)::
-
-	# echo 256M > memory.soft_limit_in_bytes
-
-If we want to change this to 1G, we can at any time use::
-
-	# echo 1G > memory.soft_limit_in_bytes
-
-NOTE1:
-       Soft limits take effect over a long period of time, since they involve
-       reclaiming memory for balancing between memory cgroups.
-NOTE2:
-       It is recommended to set the soft limit always below the hard limit,
-       otherwise the hard limit will take precedence.
-
-8. Move charges at task migration
-=================================
-
-Users can move charges associated with a task along with task migration, that
-is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
-This feature is not supported in !CONFIG_MMU environments because of lack of
-page tables.
-
-8.1 Interface
--------------
-
-This feature is disabled by default. It can be enabled (and disabled again) by
-writing to memory.move_charge_at_immigrate of the destination cgroup.
-
-If you want to enable it::
-
-	# echo (some positive value) > memory.move_charge_at_immigrate
-
-Note:
-      Each bit of move_charge_at_immigrate has its own meaning about what type
-      of charges should be moved. See 8.2 for details.
-Note:
-      Charges are moved only when you move mm->owner, in other words,
-      a leader of a thread group.
-Note:
-      If we cannot find enough space for the task in the destination cgroup, we
-      try to make space by reclaiming memory. Task migration may fail if we
-      cannot make enough space.
-Note:
-      It can take several seconds if you move many charges.
-
-And if you want to disable it again::
-
-	# echo 0 > memory.move_charge_at_immigrate
-
-8.2 Type of charges which can be moved
---------------------------------------
-
-Each bit in move_charge_at_immigrate has its own meaning about what type of
-charges should be moved. But in any case, it must be noted that an account of
-a page or a swap can be moved only when it is charged to the task's current
-(old) memory cgroup.
-
-+---+--------------------------------------------------------------------------+
-|bit| what type of charges would be moved ?                                    |
-+===+==========================================================================+
-| 0 | A charge of an anonymous page (or swap of it) used by the target task.   |
-|   | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
-+---+--------------------------------------------------------------------------+
-| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
-|   | and swaps of tmpfs file) mmapped by the target task. Unlike the case of  |
-|   | anonymous pages, file pages (and swaps) in the range mmapped by the task |
-|   | will be moved even if the task hasn't done page fault, i.e. they might   |
-|   | not be the task's "RSS", but other task's "RSS" that maps the same file. |
-|   | And mapcount of the page is ignored (the page can be moved even if       |
-|   | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to    |
-|   | enable move of swap charges.                                             |
-+---+--------------------------------------------------------------------------+
-
-8.3 TODO
---------
-
-- All of moving charge operations are done under cgroup_mutex. It's not good
-  behavior to hold the mutex too long, so we may need some trick.
-
-9. Memory thresholds
-====================
-
-Memory cgroup implements memory thresholds using the cgroups notification
-API (see cgroups.txt). It allows an application to register multiple memory
-and memsw thresholds and get notified when usage crosses them.
-
-To register a threshold, an application must:
-
-- create an eventfd using eventfd(2);
-- open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
-- write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
-  cgroup.event_control.
-
-Application will be notified through eventfd when memory usage crosses
-threshold in any direction.
-
-It's applicable for root and non-root cgroup.
-
-10. OOM Control
-===============
-
-memory.oom_control file is for OOM notification and other controls.
-
-Memory cgroup implements OOM notifier using the cgroup notification
-API (See cgroups.txt). It allows an application to register multiple OOM
-notification deliveries and get notified when OOM happens.
-
-To register a notifier, an application must:
-
- - create an eventfd using eventfd(2)
- - open memory.oom_control file
- - write string like "<event_fd> <fd of memory.oom_control>" to
-   cgroup.event_control
-
-The application will be notified through eventfd when OOM happens.
-OOM notification doesn't work for the root cgroup.
-
-You can disable the OOM-killer by writing "1" to memory.oom_control file, as::
-
-	# echo 1 > memory.oom_control
-
-If OOM-killer is disabled, tasks under cgroup will hang/sleep
-in memory cgroup's OOM-waitqueue when they request accountable memory.
-
-For running them, you have to relax the memory cgroup's OOM status by
-
-	* enlarge limit or reduce usage.
-
-To reduce usage,
-
-	* kill some tasks.
-	* move some tasks to other group with account migration.
-	* remove some files (on tmpfs?)
-
-Then, stopped tasks will work again.
-
-At reading, current status of OOM is shown.
-
-	- oom_kill_disable 0 or 1
-	  (if 1, oom-killer is disabled)
-	- under_oom	   0 or 1
-	  (if 1, the memory cgroup is under OOM, tasks may be stopped.)
-
-11. Memory Pressure
-===================
-
-The pressure level notifications can be used to monitor the memory
-allocation cost; based on the pressure, applications can implement
-different strategies of managing their memory resources. The pressure
-levels are defined as following:
-
-The "low" level means that the system is reclaiming memory for new
-allocations. Monitoring this reclaiming activity might be useful for
-maintaining cache level. Upon notification, the program (typically
-"Activity Manager") might analyze vmstat and act in advance (i.e.
-prematurely shutdown unimportant services).
-
-The "medium" level means that the system is experiencing medium memory
-pressure, the system might be making swap, paging out active file caches,
-etc. Upon this event applications may decide to further analyze
-vmstat/zoneinfo/memcg or internal memory usage statistics and free any
-resources that can be easily reconstructed or re-read from a disk.
-
-The "critical" level means that the system is actively thrashing, it is
-about to run out of memory (OOM) or even the in-kernel OOM killer is on its
-way to trigger. Applications should do whatever they can to help the
-system. It might be too late to consult with vmstat or any other
-statistics, so it's advisable to take an immediate action.
-
-By default, events are propagated upward until the event is handled, i.e. the
-events are not pass-through. For example, you have three cgroups: A->B->C. Now
-you set up an event listener on cgroups A, B and C, and suppose group C
-experiences some pressure. In this situation, only group C will receive the
-notification, i.e. groups A and B will not receive it. This is done to avoid
-excessive "broadcasting" of messages, which disturbs the system and which is
-especially bad if we are low on memory or thrashing. Group B will receive
-notification only if there are no event listeners for group C.
-
-There are three optional modes that specify different propagation behavior:
-
- - "default": this is the default behavior specified above. This mode is the
-   same as omitting the optional mode parameter, preserved for backwards
-   compatibility.
-
- - "hierarchy": events always propagate up to the root, similar to the default
-   behavior, except that propagation continues regardless of whether there are
-   event listeners at each level, with the "hierarchy" mode. In the above
-   example, groups A, B, and C will receive notification of memory pressure.
-
- - "local": events are pass-through, i.e. they only receive notifications when
-   memory pressure is experienced in the memcg for which the notification is
-   registered. In the above example, group C will receive notification if
-   registered for "local" notification and the group experiences memory
-   pressure. However, group B will never receive notification, regardless if
-   there is an event listener for group C or not, if group B is registered for
-   local notification.
-
-The level and event notification mode ("hierarchy" or "local", if necessary) are
-specified by a comma-delimited string, e.g. "low,hierarchy" specifies
-hierarchical, pass-through, notification for all ancestor memcgs. Notification
-that is the default, non pass-through behavior, does not specify a mode.
-"medium,local" specifies pass-through notification for the medium level.
-
-The file memory.pressure_level is only used to set up an eventfd. To
-register a notification, an application must:
-
-- create an eventfd using eventfd(2);
-- open memory.pressure_level;
-- write string as "<event_fd> <fd of memory.pressure_level> <level[,mode]>"
-  to cgroup.event_control.
-
-Application will be notified through eventfd when memory pressure is at
-the specific level (or higher). Read/write operations to
-memory.pressure_level are not implemented.
-
-Test:
-
-   Here is a small script example that makes a new cgroup, sets up a
-   memory limit, sets up a notification in the cgroup and then makes child
-   cgroup experience a critical pressure::
-
-	# cd /sys/fs/cgroup/memory/
-	# mkdir foo
-	# cd foo
-	# cgroup_event_listener memory.pressure_level low,hierarchy &
-	# echo 8000000 > memory.limit_in_bytes
-	# echo 8000000 > memory.memsw.limit_in_bytes
-	# echo $$ > tasks
-	# dd if=/dev/zero | read x
-
-   (Expect a bunch of notifications, and eventually, the oom-killer will
-   trigger.)
-
-12. TODO
-========
-
-1. Make per-cgroup scanner reclaim not-shared pages first
-2. Teach controller to account for shared-pages
-3. Start reclamation in the background when the limit is
-   not yet hit but the usage is getting closer
-
-Summary
-=======
-
-Overall, the memory controller has been a stable controller and has been
-commented and discussed quite extensively in the community.
-
-References
-==========
-
-1. Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
-2. Singh, Balbir. Memory Controller (RSS Control),
-   http://lwn.net/Articles/222762/
-3. Emelianov, Pavel. Resource controllers based on process cgroups
-   http://lkml.org/lkml/2007/3/6/198
-4. Emelianov, Pavel. RSS controller based on process cgroups (v2)
-   http://lkml.org/lkml/2007/4/9/78
-5. Emelianov, Pavel. RSS controller based on process cgroups (v3)
-   http://lkml.org/lkml/2007/5/30/244
-6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
-7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
-   subsystem (v3), http://lwn.net/Articles/235534/
-8. Singh, Balbir. RSS controller v2 test results (lmbench),
-   http://lkml.org/lkml/2007/5/17/232
-9. Singh, Balbir. RSS controller v2 AIM9 results
-   http://lkml.org/lkml/2007/5/18/1
-10. Singh, Balbir. Memory controller v6 test results,
-    http://lkml.org/lkml/2007/8/19/36
-11. Singh, Balbir. Memory controller introduction (v6),
-    http://lkml.org/lkml/2007/8/17/69
-12. Corbet, Jonathan, Controlling memory use in cgroups,
-    http://lwn.net/Articles/243795/
diff --git a/Documentation/cgroup-v1/net_cls.rst b/Documentation/cgroup-v1/net_cls.rst
deleted file mode 100644
index a2cf272af7a0..000000000000
--- a/Documentation/cgroup-v1/net_cls.rst
+++ /dev/null
@@ -1,44 +0,0 @@
-=========================
-Network classifier cgroup
-=========================
-
-The Network classifier cgroup provides an interface to
-tag network packets with a class identifier (classid).
-
-The Traffic Controller (tc) can be used to assign
-different priorities to packets from different cgroups.
-Also, Netfilter (iptables) can use this tag to perform
-actions on such packets.
-
-Creating a net_cls cgroups instance creates a net_cls.classid file.
-This net_cls.classid value is initialized to 0.
-
-You can write hexadecimal values to net_cls.classid; the format for these
-values is 0xAAAABBBB; AAAA is the major handle number and BBBB
-is the minor handle number.
-Reading net_cls.classid yields a decimal result.
-
-Example::
-
-	mkdir /sys/fs/cgroup/net_cls
-	mount -t cgroup -onet_cls net_cls /sys/fs/cgroup/net_cls
-	mkdir /sys/fs/cgroup/net_cls/0
-	echo 0x100001 >  /sys/fs/cgroup/net_cls/0/net_cls.classid
-
-- setting a 10:1 handle::
-
-	cat /sys/fs/cgroup/net_cls/0/net_cls.classid
-	1048577
-
-- configuring tc::
-
-	tc qdisc add dev eth0 root handle 10: htb
-	tc class add dev eth0 parent 10: classid 10:1 htb rate 40mbit
-
-- creating traffic class 10:1::
-
-	tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup
-
-configuring iptables, basic example::
-
-	iptables -A OUTPUT -m cgroup ! --cgroup 0x100001 -j DROP
diff --git a/Documentation/cgroup-v1/net_prio.rst b/Documentation/cgroup-v1/net_prio.rst
deleted file mode 100644
index b40905871c64..000000000000
--- a/Documentation/cgroup-v1/net_prio.rst
+++ /dev/null
@@ -1,57 +0,0 @@
-=======================
-Network priority cgroup
-=======================
-
-The Network priority cgroup provides an interface to allow an administrator to
-dynamically set the priority of network traffic generated by various
-applications.
-
-Nominally, an application would set the priority of its traffic via the
-SO_PRIORITY socket option.  This however, is not always possible because:
-
-1) The application may not have been coded to set this value
-2) The priority of application traffic is often a site-specific administrative
-   decision rather than an application defined one.
-
-This cgroup allows an administrator to assign a process to a group which defines
-the priority of egress traffic on a given interface. Network priority groups can
-be created by first mounting the cgroup filesystem::
-
-	# mount -t cgroup -onet_prio none /sys/fs/cgroup/net_prio
-
-With the above step, the initial group acting as the parent accounting group
-becomes visible at '/sys/fs/cgroup/net_prio'.  This group includes all tasks in
-the system. '/sys/fs/cgroup/net_prio/tasks' lists the tasks in this cgroup.
-
-Each net_prio cgroup contains two files that are subsystem specific
-
-net_prio.prioidx
-  This file is read-only, and is simply informative.  It contains a unique
-  integer value that the kernel uses as an internal representation of this
-  cgroup.
-
-net_prio.ifpriomap
-  This file contains a map of the priorities assigned to traffic originating
-  from processes in this group and egressing the system on various interfaces.
-  It contains a list of tuples in the form <ifname priority>.  Contents of this
-  file can be modified by echoing a string into the file using the same tuple
-  format. For example::
-
-	echo "eth0 5" > /sys/fs/cgroups/net_prio/iscsi/net_prio.ifpriomap
-
-This command would force any traffic originating from processes belonging to the
-iscsi net_prio cgroup and egressing on interface eth0 to have the priority of
-said traffic set to the value 5. The parent accounting group also has a
-writeable 'net_prio.ifpriomap' file that can be used to set a system default
-priority.
-
-Priorities are set immediately prior to queueing a frame to the device
-queueing discipline (qdisc) so priorities will be assigned prior to the hardware
-queue selection being made.
-
-One usage for the net_prio cgroup is with mqprio qdisc allowing application
-traffic to be steered to hardware/driver based traffic classes. These mappings
-can then be managed by administrators or other networking protocols such as
-DCBX.
-
-A new net_prio cgroup inherits the parent's configuration.
diff --git a/Documentation/cgroup-v1/pids.rst b/Documentation/cgroup-v1/pids.rst
deleted file mode 100644
index 6acebd9e72c8..000000000000
--- a/Documentation/cgroup-v1/pids.rst
+++ /dev/null
@@ -1,92 +0,0 @@
-=========================
-Process Number Controller
-=========================
-
-Abstract
---------
-
-The process number controller is used to allow a cgroup hierarchy to stop any
-new tasks from being fork()'d or clone()'d after a certain limit is reached.
-
-Since it is trivial to hit the task limit without hitting any kmemcg limits in
-place, PIDs are a fundamental resource. As such, PID exhaustion must be
-preventable in the scope of a cgroup hierarchy by allowing resource limiting of
-the number of tasks in a cgroup.
-
-Usage
------
-
-In order to use the `pids` controller, set the maximum number of tasks in
-pids.max (this is not available in the root cgroup for obvious reasons). The
-number of processes currently in the cgroup is given by pids.current.
-
-Organisational operations are not blocked by cgroup policies, so it is possible
-to have pids.current > pids.max. This can be done by either setting the limit to
-be smaller than pids.current, or attaching enough processes to the cgroup such
-that pids.current > pids.max. However, it is not possible to violate a cgroup
-policy through fork() or clone(). fork() and clone() will return -EAGAIN if the
-creation of a new process would cause a cgroup policy to be violated.
-
-To set a cgroup to have no limit, set pids.max to "max". This is the default for
-all new cgroups (N.B. that PID limits are hierarchical, so the most stringent
-limit in the hierarchy is followed).
-
-pids.current tracks all child cgroup hierarchies, so parent/pids.current is a
-superset of parent/child/pids.current.
-
-The pids.events file contains event counters:
-
-  - max: Number of times fork failed because limit was hit.
-
-Example
--------
-
-First, we mount the pids controller::
-
-	# mkdir -p /sys/fs/cgroup/pids
-	# mount -t cgroup -o pids none /sys/fs/cgroup/pids
-
-Then we create a hierarchy, set limits and attach processes to it::
-
-	# mkdir -p /sys/fs/cgroup/pids/parent/child
-	# echo 2 > /sys/fs/cgroup/pids/parent/pids.max
-	# echo $$ > /sys/fs/cgroup/pids/parent/cgroup.procs
-	# cat /sys/fs/cgroup/pids/parent/pids.current
-	2
-	#
-
-It should be noted that attempts to overcome the set limit (2 in this case) will
-fail::
-
-	# cat /sys/fs/cgroup/pids/parent/pids.current
-	2
-	# ( /bin/echo "Here's some processes for you." | cat )
-	sh: fork: Resource temporary unavailable
-	#
-
-Even if we migrate to a child cgroup (which doesn't have a set limit), we will
-not be able to overcome the most stringent limit in the hierarchy (in this case,
-parent's)::
-
-	# echo $$ > /sys/fs/cgroup/pids/parent/child/cgroup.procs
-	# cat /sys/fs/cgroup/pids/parent/pids.current
-	2
-	# cat /sys/fs/cgroup/pids/parent/child/pids.current
-	2
-	# cat /sys/fs/cgroup/pids/parent/child/pids.max
-	max
-	# ( /bin/echo "Here's some processes for you." | cat )
-	sh: fork: Resource temporary unavailable
-	#
-
-We can set a limit that is smaller than pids.current, which will stop any new
-processes from being forked at all (note that the shell itself counts towards
-pids.current)::
-
-	# echo 1 > /sys/fs/cgroup/pids/parent/pids.max
-	# /bin/echo "We can't even spawn a single process now."
-	sh: fork: Resource temporary unavailable
-	# echo 0 > /sys/fs/cgroup/pids/parent/pids.max
-	# /bin/echo "We can't even spawn a single process now."
-	sh: fork: Resource temporary unavailable
-	#
diff --git a/Documentation/cgroup-v1/rdma.rst b/Documentation/cgroup-v1/rdma.rst
deleted file mode 100644
index 2fcb0a9bf790..000000000000
--- a/Documentation/cgroup-v1/rdma.rst
+++ /dev/null
@@ -1,117 +0,0 @@
-===============
-RDMA Controller
-===============
-
-.. Contents
-
-   1. Overview
-     1-1. What is RDMA controller?
-     1-2. Why RDMA controller needed?
-     1-3. How is RDMA controller implemented?
-   2. Usage Examples
-
-1. Overview
-===========
-
-1-1. What is RDMA controller?
------------------------------
-
-RDMA controller allows user to limit RDMA/IB specific resources that a given
-set of processes can use. These processes are grouped using RDMA controller.
-
-RDMA controller defines two resources which can be limited for processes of a
-cgroup.
-
-1-2. Why RDMA controller needed?
---------------------------------
-
-Currently user space applications can easily take away all the rdma verb
-specific resources such as AH, CQ, QP, MR etc. Due to which other applications
-in other cgroup or kernel space ULPs may not even get chance to allocate any
-rdma resources. This can lead to service unavailability.
-
-Therefore RDMA controller is needed through which resource consumption
-of processes can be limited. Through this controller different rdma
-resources can be accounted.
-
-1-3. How is RDMA controller implemented?
-----------------------------------------
-
-RDMA cgroup allows limit configuration of resources. Rdma cgroup maintains
-resource accounting per cgroup, per device using resource pool structure.
-Each such resource pool is limited up to 64 resources in given resource pool
-by rdma cgroup, which can be extended later if required.
-
-This resource pool object is linked to the cgroup css. Typically there
-are 0 to 4 resource pool instances per cgroup, per device in most use cases.
-But nothing limits to have it more. At present hundreds of RDMA devices per
-single cgroup may not be handled optimally, however there is no
-known use case or requirement for such configuration either.
-
-Since RDMA resources can be allocated from any process and can be freed by any
-of the child processes which shares the address space, rdma resources are
-always owned by the creator cgroup css. This allows process migration from one
-to other cgroup without major complexity of transferring resource ownership;
-because such ownership is not really present due to shared nature of
-rdma resources. Linking resources around css also ensures that cgroups can be
-deleted after processes migrated. This allow progress migration as well with
-active resources, even though that is not a primary use case.
-
-Whenever RDMA resource charging occurs, owner rdma cgroup is returned to
-the caller. Same rdma cgroup should be passed while uncharging the resource.
-This also allows process migrated with active RDMA resource to charge
-to new owner cgroup for new resource. It also allows to uncharge resource of
-a process from previously charged cgroup which is migrated to new cgroup,
-even though that is not a primary use case.
-
-Resource pool object is created in following situations.
-(a) User sets the limit and no previous resource pool exist for the device
-of interest for the cgroup.
-(b) No resource limits were configured, but IB/RDMA stack tries to
-charge the resource. So that it correctly uncharge them when applications are
-running without limits and later on when limits are enforced during uncharging,
-otherwise usage count will drop to negative.
-
-Resource pool is destroyed if all the resource limits are set to max and
-it is the last resource getting deallocated.
-
-User should set all the limit to max value if it intents to remove/unconfigure
-the resource pool for a particular device.
-
-IB stack honors limits enforced by the rdma controller. When application
-query about maximum resource limits of IB device, it returns minimum of
-what is configured by user for a given cgroup and what is supported by
-IB device.
-
-Following resources can be accounted by rdma controller.
-
-  ==========    =============================
-  hca_handle	Maximum number of HCA Handles
-  hca_object 	Maximum number of HCA Objects
-  ==========    =============================
-
-2. Usage Examples
-=================
-
-(a) Configure resource limit::
-
-	echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
-	echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max
-
-(b) Query resource limit::
-
-	cat /sys/fs/cgroup/rdma/2/rdma.max
-	#Output:
-	mlx4_0 hca_handle=2 hca_object=2000
-	ocrdma1 hca_handle=3 hca_object=max
-
-(c) Query current usage::
-
-	cat /sys/fs/cgroup/rdma/2/rdma.current
-	#Output:
-	mlx4_0 hca_handle=1 hca_object=20
-	ocrdma1 hca_handle=1 hca_object=23
-
-(d) Delete resource limit::
-
-	echo echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index cad797a8a39e..5ecbc03e6b2f 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -98,7 +98,7 @@ A memory policy with a valid NodeList will be saved, as specified, for
 use at file creation time.  When a task allocates a file in the file
 system, the mount option memory policy will be applied with a NodeList,
 if any, modified by the calling task's cpuset constraints
-[See Documentation/cgroup-v1/cpusets.rst] and any optional flags, listed
+[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, listed
 below.  If the resulting NodeLists is the empty set, the effective memory
 policy for the file will revert to "default" policy.
 
diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
index 5623b9916411..4f18456dd3b1 100644
--- a/Documentation/kernel-per-CPU-kthreads.txt
+++ b/Documentation/kernel-per-CPU-kthreads.txt
@@ -12,7 +12,7 @@ References
 
 -	Documentation/IRQ-affinity.txt:  Binding interrupts to sets of CPUs.
 
--	Documentation/cgroup-v1:  Using cgroups to bind tasks to sets of CPUs.
+-	Documentation/admin-guide/cgroup-v1:  Using cgroups to bind tasks to sets of CPUs.
 
 -	man taskset:  Using the taskset command to bind tasks to sets
 	of CPUs.
diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index 3391e86d810c..14a2f7bf63fe 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -669,7 +669,7 @@ Deadline Task Scheduling
 
  -deadline tasks cannot have an affinity mask smaller that the entire
  root_domain they are created on. However, affinities can be specified
- through the cpuset facility (Documentation/cgroup-v1/cpusets.rst).
+ through the cpuset facility (Documentation/admin-guide/cgroup-v1/cpusets.rst).
 
 5.1 SCHED_DEADLINE and cpusets HOWTO
 ------------------------------------
diff --git a/Documentation/scheduler/sched-design-CFS.rst b/Documentation/scheduler/sched-design-CFS.rst
index 53b30d1967cf..a96c72651877 100644
--- a/Documentation/scheduler/sched-design-CFS.rst
+++ b/Documentation/scheduler/sched-design-CFS.rst
@@ -222,7 +222,7 @@ SCHED_BATCH) tasks.
 
    These options need CONFIG_CGROUPS to be defined, and let the administrator
    create arbitrary groups of tasks, using the "cgroup" pseudo filesystem.  See
-   Documentation/cgroup-v1/cgroups.rst for more information about this filesystem.
+   Documentation/admin-guide/cgroup-v1/cgroups.rst for more information about this filesystem.
 
 When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
 group created using the pseudo filesystem.  See example steps below to create
diff --git a/Documentation/scheduler/sched-rt-group.rst b/Documentation/scheduler/sched-rt-group.rst
index d27d3f3712fd..655a096ec8fb 100644
--- a/Documentation/scheduler/sched-rt-group.rst
+++ b/Documentation/scheduler/sched-rt-group.rst
@@ -133,7 +133,7 @@ This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
 to control the CPU time reserved for each control group.
 
 For more information on working with control groups, you should read
-Documentation/cgroup-v1/cgroups.rst as well.
+Documentation/admin-guide/cgroup-v1/cgroups.rst as well.
 
 Group settings are checked against the following limits in order to keep the
 configuration schedulable:
diff --git a/Documentation/vm/numa.rst b/Documentation/vm/numa.rst
index 130f3cfa1c19..99fdeca917ca 100644
--- a/Documentation/vm/numa.rst
+++ b/Documentation/vm/numa.rst
@@ -67,7 +67,7 @@ nodes.  Each emulated node will manage a fraction of the underlying cells'
 physical memory.  NUMA emluation is useful for testing NUMA kernel and
 application features on non-NUMA platforms, and as a sort of memory resource
 management mechanism when used together with cpusets.
-[see Documentation/cgroup-v1/cpusets.rst]
+[see Documentation/admin-guide/cgroup-v1/cpusets.rst]
 
 For each node with memory, Linux constructs an independent memory management
 subsystem, complete with its own free page lists, in-use page lists, usage
@@ -114,7 +114,7 @@ allocation behavior using Linux NUMA memory policy. [see
 
 System administrators can restrict the CPUs and nodes' memories that a non-
 privileged user can specify in the scheduling or NUMA commands and functions
-using control groups and CPUsets.  [see Documentation/cgroup-v1/cpusets.rst]
+using control groups and CPUsets.  [see Documentation/admin-guide/cgroup-v1/cpusets.rst]
 
 On architectures that do not hide memoryless nodes, Linux will include only
 zones [nodes] with memory in the zonelists.  This means that for a memoryless
diff --git a/Documentation/vm/page_migration.rst b/Documentation/vm/page_migration.rst
index 35bba27d5fff..1d6cd7db4e43 100644
--- a/Documentation/vm/page_migration.rst
+++ b/Documentation/vm/page_migration.rst
@@ -41,7 +41,7 @@ locations.
 Larger installations usually partition the system using cpusets into
 sections of nodes. Paul Jackson has equipped cpusets with the ability to
 move pages when a task is moved to another cpuset (See
-Documentation/cgroup-v1/cpusets.rst).
+Documentation/admin-guide/cgroup-v1/cpusets.rst).
 Cpusets allows the automation of process locality. If a task is moved to
 a new cpuset then also all its pages are moved with it so that the
 performance of the process does not sink dramatically. Also the pages
diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index 109052215bce..17d0861b0f1d 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -98,7 +98,7 @@ Memory Control Group Interaction
 --------------------------------
 
 The unevictable LRU facility interacts with the memory control group [aka
-memory controller; see Documentation/cgroup-v1/memory.rst] by extending the
+memory controller; see Documentation/admin-guide/cgroup-v1/memory.rst] by extending the
 lru_list enum.
 
 The memory controller data structure automatically gets a per-zone unevictable
diff --git a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
index 30108684ae87..ff9bcfd2cc14 100644
--- a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
+++ b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
@@ -15,7 +15,7 @@ assign them to cpusets and their attached tasks.  This is a way of limiting the
 amount of system memory that are available to a certain class of tasks.
 
 For more information on the features of cpusets, see
-Documentation/cgroup-v1/cpusets.rst.
+Documentation/admin-guide/cgroup-v1/cpusets.rst.
 There are a number of different configurations you can use for your needs.  For
 more information on the numa=fake command line option and its various ways of
 configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst.
@@ -40,7 +40,7 @@ A machine may be split as follows with "numa=fake=4*512," as reported by dmesg::
 	On node 3 totalpages: 131072
 
 Now following the instructions for mounting the cpusets filesystem from
-Documentation/cgroup-v1/cpusets.rst, you can assign fake nodes (i.e. contiguous memory
+Documentation/admin-guide/cgroup-v1/cpusets.rst, you can assign fake nodes (i.e. contiguous memory
 address spaces) to individual cpusets::
 
 	[root@xroads /]# mkdir exampleset
diff --git a/MAINTAINERS b/MAINTAINERS
index 0c603ea73034..c1593a668f80 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4158,7 +4158,7 @@ L:	cgroups@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
 S:	Maintained
 F:	Documentation/admin-guide/cgroup-v2.rst
-F:	Documentation/cgroup-v1/
+F:	Documentation/admin-guide/cgroup-v1/
 F:	include/linux/cgroup*
 F:	kernel/cgroup/
 
@@ -4169,7 +4169,7 @@ W:	http://www.bullopensource.org/cpuset/
 W:	http://oss.sgi.com/projects/cpusets/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
 S:	Maintained
-F:	Documentation/cgroup-v1/cpusets.rst
+F:	Documentation/admin-guide/cgroup-v1/cpusets.rst
 F:	include/linux/cpuset.h
 F:	kernel/cgroup/cpuset.c
 
diff --git a/block/Kconfig b/block/Kconfig
index b16b3e075d31..8b5f8e560eb4 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -89,7 +89,7 @@ config BLK_DEV_THROTTLING
 	one needs to mount and use blkio cgroup controller for creating
 	cgroups and specifying per device IO rate policies.
 
-	See Documentation/cgroup-v1/blkio-controller.rst for more information.
+	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
 config BLK_DEV_THROTTLING_LOW
 	bool "Block throttling .low limit interface support (EXPERIMENTAL)"
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index c5311935239d..430e219e3aba 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -624,7 +624,7 @@ struct cftype {
 
 /*
  * Control Group subsystem type.
- * See Documentation/cgroup-v1/cgroups.rst for details
+ * See Documentation/admin-guide/cgroup-v1/cgroups.rst for details
  */
 struct cgroup_subsys {
 	struct cgroup_subsys_state *(*css_alloc)(struct cgroup_subsys_state *parent_css);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6f68438aa4ed..82699845ef79 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -806,7 +806,7 @@ union bpf_attr {
  * 		based on a user-provided identifier for all traffic coming from
  * 		the tasks belonging to the related cgroup. See also the related
  * 		kernel documentation, available from the Linux sources in file
- * 		*Documentation/cgroup-v1/net_cls.rst*.
+ * 		*Documentation/admin-guide/cgroup-v1/net_cls.rst*.
  *
  * 		The Linux kernel has two versions for cgroups: there are
  * 		cgroups v1 and cgroups v2. Both are available to users, who can
diff --git a/init/Kconfig b/init/Kconfig
index 9eb92ee52d40..381cdfee6e0e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -821,7 +821,7 @@ menuconfig CGROUPS
 	  controls or device isolation.
 	  See
 		- Documentation/scheduler/sched-design-CFS.rst	(CFS)
-		- Documentation/cgroup-v1/ (features for grouping, isolation
+		- Documentation/admin-guide/cgroup-v1/ (features for grouping, isolation
 					  and resource control)
 
 	  Say N if unsure.
@@ -883,7 +883,7 @@ config BLK_CGROUP
 	CONFIG_CFQ_GROUP_IOSCHED=y; for enabling throttling policy, set
 	CONFIG_BLK_DEV_THROTTLING=y.
 
-	See Documentation/cgroup-v1/blkio-controller.rst for more information.
+	See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information.
 
 config CGROUP_WRITEBACK
 	bool
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b3b02b9c4405..863e434a6020 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -729,7 +729,7 @@ static inline int nr_cpusets(void)
  * load balancing domains (sched domains) as specified by that partial
  * partition.
  *
- * See "What is sched_load_balance" in Documentation/cgroup-v1/cpusets.rst
+ * See "What is sched_load_balance" in Documentation/admin-guide/cgroup-v1/cpusets.rst
  * for a background explanation of this.
  *
  * Does not return errors, on the theory that the callers of this
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index c07196502577..725674f3276d 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -509,7 +509,7 @@ static inline int may_allow_all(struct dev_cgroup *parent)
  * This is one of the three key functions for hierarchy implementation.
  * This function is responsible for re-evaluating all the cgroup's active
  * exceptions due to a parent's exception change.
- * Refer to Documentation/cgroup-v1/devices.rst for more details.
+ * Refer to Documentation/admin-guide/cgroup-v1/devices.rst for more details.
  */
 static void revalidate_active_exceptions(struct dev_cgroup *devcg)
 {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f506c68b2612..17e2b1713702 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -806,7 +806,7 @@ union bpf_attr {
  * 		based on a user-provided identifier for all traffic coming from
  * 		the tasks belonging to the related cgroup. See also the related
  * 		kernel documentation, available from the Linux sources in file
- * 		*Documentation/cgroup-v1/net_cls.rst*.
+ * 		*Documentation/admin-guide/cgroup-v1/net_cls.rst*.
  *
  * 		The Linux kernel has two versions for cgroups: there are
  * 		cgroups v1 and cgroups v2. Both are available to users, who can
-- 
cgit v1.2.3-55-g7522


From 4f4cfa6c560c93ba180c30675cf845e1597de44c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 27 Jun 2019 14:56:51 -0300
Subject: docs: admin-guide: add a series of orphaned documents

Many documents that belong to the admin-guide are scattered in
random places (most under the Documentation root directory).

Move them to the admin guide.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
---
 Documentation/ABI/stable/sysfs-devices-node        |   2 +-
 Documentation/ABI/testing/procfs-diskstats         |   2 +-
 Documentation/ABI/testing/sysfs-block              |   2 +-
 Documentation/ABI/testing/sysfs-devices-system-cpu |   4 +-
 Documentation/admin-guide/btmrvl.rst               | 124 +++++++
 Documentation/admin-guide/clearing-warn-once.rst   |   9 +
 Documentation/admin-guide/cpu-load.rst             | 114 +++++++
 Documentation/admin-guide/cputopology.rst          | 177 ++++++++++
 .../admin-guide/device-mapper/statistics.rst       |   4 +-
 Documentation/admin-guide/efi-stub.rst             | 100 ++++++
 Documentation/admin-guide/highuid.rst              |  80 +++++
 Documentation/admin-guide/hw-vuln/l1tf.rst         |   2 +-
 Documentation/admin-guide/hw_random.rst            | 105 ++++++
 Documentation/admin-guide/index.rst                |  17 +
 Documentation/admin-guide/iostats.rst              | 197 ++++++++++++
 Documentation/admin-guide/kernel-parameters.txt    |   2 +-
 .../admin-guide/kernel-per-CPU-kthreads.rst        | 356 +++++++++++++++++++++
 Documentation/admin-guide/lcd-panel-cgram.rst      |  27 ++
 Documentation/admin-guide/ldm.rst                  | 121 +++++++
 Documentation/admin-guide/lockup-watchdogs.rst     |  83 +++++
 Documentation/admin-guide/mm/cma_debugfs.rst       |  25 ++
 Documentation/admin-guide/mm/index.rst             |   1 +
 Documentation/admin-guide/numastat.rst             |  30 ++
 Documentation/admin-guide/pnp.rst                  | 292 +++++++++++++++++
 Documentation/admin-guide/rtc.rst                  | 140 ++++++++
 Documentation/admin-guide/svga.rst                 | 249 ++++++++++++++
 Documentation/admin-guide/sysctl/kernel.rst        |   2 +-
 Documentation/admin-guide/video-output.rst         |  34 ++
 Documentation/auxdisplay/lcd-panel-cgram.rst       |  29 --
 Documentation/btmrvl.txt                           | 124 -------
 Documentation/clearing-warn-once.txt               |   9 -
 Documentation/cma/debugfs.rst                      |  27 --
 Documentation/cpu-load.txt                         | 114 -------
 Documentation/cputopology.txt                      | 177 ----------
 Documentation/efi-stub.txt                         | 100 ------
 Documentation/fb/vesafb.rst                        |   2 +-
 Documentation/highuid.txt                          |  80 -----
 Documentation/hw_random.txt                        | 105 ------
 Documentation/iostats.txt                          | 197 ------------
 Documentation/kernel-per-CPU-kthreads.txt          | 356 ---------------------
 Documentation/ldm.txt                              | 121 -------
 Documentation/lockup-watchdogs.txt                 |  83 -----
 Documentation/numastat.txt                         |  30 --
 Documentation/pnp.txt                              | 292 -----------------
 Documentation/rtc.txt                              | 140 --------
 Documentation/svga.txt                             | 249 --------------
 Documentation/video-output.txt                     |  34 --
 Documentation/x86/topology.rst                     |   2 +-
 MAINTAINERS                                        |  12 +-
 arch/arm/Kconfig                                   |   2 +-
 arch/parisc/Kconfig                                |   2 +-
 arch/sh/Kconfig                                    |   2 +-
 arch/sparc/Kconfig                                 |   2 +-
 arch/x86/Kconfig                                   |   4 +-
 block/partitions/Kconfig                           |   2 +-
 drivers/char/Kconfig                               |   4 +-
 drivers/char/hw_random/core.c                      |   2 +-
 include/linux/hw_random.h                          |   2 +-
 58 files changed, 2310 insertions(+), 2296 deletions(-)
 create mode 100644 Documentation/admin-guide/btmrvl.rst
 create mode 100644 Documentation/admin-guide/clearing-warn-once.rst
 create mode 100644 Documentation/admin-guide/cpu-load.rst
 create mode 100644 Documentation/admin-guide/cputopology.rst
 create mode 100644 Documentation/admin-guide/efi-stub.rst
 create mode 100644 Documentation/admin-guide/highuid.rst
 create mode 100644 Documentation/admin-guide/hw_random.rst
 create mode 100644 Documentation/admin-guide/iostats.rst
 create mode 100644 Documentation/admin-guide/kernel-per-CPU-kthreads.rst
 create mode 100644 Documentation/admin-guide/lcd-panel-cgram.rst
 create mode 100644 Documentation/admin-guide/ldm.rst
 create mode 100644 Documentation/admin-guide/lockup-watchdogs.rst
 create mode 100644 Documentation/admin-guide/mm/cma_debugfs.rst
 create mode 100644 Documentation/admin-guide/numastat.rst
 create mode 100644 Documentation/admin-guide/pnp.rst
 create mode 100644 Documentation/admin-guide/rtc.rst
 create mode 100644 Documentation/admin-guide/svga.rst
 create mode 100644 Documentation/admin-guide/video-output.rst
 delete mode 100644 Documentation/auxdisplay/lcd-panel-cgram.rst
 delete mode 100644 Documentation/btmrvl.txt
 delete mode 100644 Documentation/clearing-warn-once.txt
 delete mode 100644 Documentation/cma/debugfs.rst
 delete mode 100644 Documentation/cpu-load.txt
 delete mode 100644 Documentation/cputopology.txt
 delete mode 100644 Documentation/efi-stub.txt
 delete mode 100644 Documentation/highuid.txt
 delete mode 100644 Documentation/hw_random.txt
 delete mode 100644 Documentation/iostats.txt
 delete mode 100644 Documentation/kernel-per-CPU-kthreads.txt
 delete mode 100644 Documentation/ldm.txt
 delete mode 100644 Documentation/lockup-watchdogs.txt
 delete mode 100644 Documentation/numastat.txt
 delete mode 100644 Documentation/pnp.txt
 delete mode 100644 Documentation/rtc.txt
 delete mode 100644 Documentation/svga.txt
 delete mode 100644 Documentation/video-output.txt

diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
index f7ce68fbd4b9..df8413cf1468 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -61,7 +61,7 @@ Date:		October 2002
 Contact:	Linux Memory Management list <linux-mm@kvack.org>
 Description:
 		The node's hit/miss statistics, in units of pages.
-		See Documentation/numastat.txt
+		See Documentation/admin-guide/numastat.rst
 
 What:		/sys/devices/system/node/nodeX/distance
 Date:		October 2002
diff --git a/Documentation/ABI/testing/procfs-diskstats b/Documentation/ABI/testing/procfs-diskstats
index abac31d216de..2c44b4f1b060 100644
--- a/Documentation/ABI/testing/procfs-diskstats
+++ b/Documentation/ABI/testing/procfs-diskstats
@@ -29,4 +29,4 @@ Description:
 		17 - sectors discarded
 		18 - time spent discarding
 
-		For more details refer to Documentation/iostats.txt
+		For more details refer to Documentation/admin-guide/iostats.rst
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index dfad7427817c..f8c7c7126bb1 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -15,7 +15,7 @@ Description:
 		 9 - I/Os currently in progress
 		10 - time spent doing I/Os (ms)
 		11 - weighted time spent doing I/Os (ms)
-		For more details refer Documentation/iostats.txt
+		For more details refer Documentation/admin-guide/iostats.rst
 
 
 What:		/sys/block/<disk>/<part>/stat
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index d404603c6b52..5f7d7b14fa44 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -34,7 +34,7 @@ Description:	CPU topology files that describe kernel limits related to
 		present: cpus that have been identified as being present in
 		the system.
 
-		See Documentation/cputopology.txt for more information.
+		See Documentation/admin-guide/cputopology.rst for more information.
 
 
 What:		/sys/devices/system/cpu/probe
@@ -103,7 +103,7 @@ Description:	CPU topology files that describe a logical CPU's relationship
 		thread_siblings_list: human-readable list of cpu#'s hardware
 		threads within the same core as cpu#
 
-		See Documentation/cputopology.txt for more information.
+		See Documentation/admin-guide/cputopology.rst for more information.
 
 
 What:		/sys/devices/system/cpu/cpuidle/current_driver
diff --git a/Documentation/admin-guide/btmrvl.rst b/Documentation/admin-guide/btmrvl.rst
new file mode 100644
index 000000000000..ec57740ead0c
--- /dev/null
+++ b/Documentation/admin-guide/btmrvl.rst
@@ -0,0 +1,124 @@
+=============
+btmrvl driver
+=============
+
+All commands are issued through the debugfs interface.
+
+Set/get driver configurations
+=============================
+
+Path:	/debug/btmrvl/config/
+
+gpiogap=[n], hscfgcmd
+	These commands are used to configure the host sleep parameters.
+	The value written to gpiogap holds Gap in bits 7:0 and GPIO in
+	bits 15:8.
+
+	where GPIO is the pin number of GPIO used to wake up the host.
+	It could be any valid GPIO pin# (e.g. 0-7) or 0xff (SDIO interface
+	wakeup will be used instead).
+
+	where Gap is the gap in milliseconds between the wakeup signal and
+	the wakeup event, or 0xff for a special host sleep setting.
+
+	Usage::
+
+		# Use SDIO interface to wake up the host and set GAP to 0x80:
+		echo 0xff80 > /debug/btmrvl/config/gpiogap
+		echo 1 > /debug/btmrvl/config/hscfgcmd
+
+		# Use GPIO pin #3 to wake up the host and set GAP to 0xff:
+		echo 0x03ff >  /debug/btmrvl/config/gpiogap
+		echo 1 > /debug/btmrvl/config/hscfgcmd
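+
+	The value written to gpiogap can also be composed from its two
+	fields. A minimal sketch, assuming (as the usage examples above
+	suggest) that GPIO occupies the high byte and Gap the low byte;
+	the shell variables ``gpio`` and ``gap`` are illustrative only::
+
+		# GPIO pin #3, Gap 0x80 -> writes 0x0380
+		gpio=0x03
+		gap=0x80
+		printf '0x%02x%02x\n' "$gpio" "$gap" > /debug/btmrvl/config/gpiogap
+		echo 1 > /debug/btmrvl/config/hscfgcmd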
+
+psmode=[n], pscmd
+	These commands are used to enable/disable auto sleep mode,
+
+	where the option is::
+
+			1 	-- Enable auto sleep mode
+			0 	-- Disable auto sleep mode
+
+	Usage::
+
+		# Enable auto sleep mode
+		echo 1 > /debug/btmrvl/config/psmode
+		echo 1 > /debug/btmrvl/config/pscmd
+
+		# Disable auto sleep mode
+		echo 0 > /debug/btmrvl/config/psmode
+		echo 1 > /debug/btmrvl/config/pscmd
+
+
+hsmode=[n], hscmd
+	These commands are used to enable host sleep or wake up firmware,
+
+	where the option is::
+
+			1	-- Enable host sleep
+			0	-- Wake up firmware
+
+	Usage::
+
+		# Enable host sleep
+		echo 1 > /debug/btmrvl/config/hsmode
+		echo 1 > /debug/btmrvl/config/hscmd
+
+		# Wake up firmware
+		echo 0 > /debug/btmrvl/config/hsmode
+		echo 1 > /debug/btmrvl/config/hscmd
+
+
+Get driver status
+=================
+
+Path:	/debug/btmrvl/status/
+
+Usage::
+
+	cat /debug/btmrvl/status/<args>
+
+where the args are:
+
+curpsmode
+	This command displays current auto sleep status.
+
+psstate
+	This command displays the power save state.
+
+hsstate
+	This command displays the host sleep state.
+
+txdnldrdy
+	This command displays the value of Tx download ready flag.
+
+Issuing a raw hci command
+=========================
+
+Use hcitool to issue a raw HCI command; refer to the hcitool manual.
+
+Usage::
+
+	hcitool cmd <ogf> <ocf> [Parameters]
+
+Interface Control Command::
+
+	hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00    --Enable All interface
+	hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01    --Enable Wlan interface
+	hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02    --Enable BT interface
+	hcitool cmd 0x3f 0x5b 0xf5 0x00 0x00    --Disable All interface
+	hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01    --Disable Wlan interface
+	hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02    --Disable BT interface
+
+SD8688 firmware
+===============
+
+Images:
+
+- /lib/firmware/sd8688_helper.bin
+- /lib/firmware/sd8688.bin
+
+
+The images can be downloaded from:
+
+git.infradead.org/users/dwmw2/linux-firmware.git/libertas/
diff --git a/Documentation/admin-guide/clearing-warn-once.rst b/Documentation/admin-guide/clearing-warn-once.rst
new file mode 100644
index 000000000000..211fd926cf00
--- /dev/null
+++ b/Documentation/admin-guide/clearing-warn-once.rst
@@ -0,0 +1,9 @@
+Clearing WARN_ONCE
+------------------
+
+WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once.
+
+``echo 1 > /sys/kernel/debug/clear_warn_once``
+
+clears the state and allows the warnings to print once again.
+This can be useful after test suite runs to reproduce problems.
diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst
new file mode 100644
index 000000000000..2d01ce43d2a2
--- /dev/null
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -0,0 +1,114 @@
+========
+CPU load
+========
+
+Linux exports various bits of information via ``/proc/stat`` and
+``/proc/uptime`` that userland tools, such as top(1), use to calculate
+the average time the system spent in a particular state, for example::
+
+    $ iostat
+    Linux 2.6.18.3-exp (linmac)     02/20/2007
+
+    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
+              10.01    0.00    2.92    5.44    0.00   81.63
+
+    ...
+
+Here the system thinks that over the default sampling period the
+system spent 10.01% of the time doing work in user space, 2.92% in the
+kernel, and was overall 81.63% of the time idle.
+
+In most cases the ``/proc/stat`` information reflects reality quite
+closely; however, due to the nature of how/when the kernel collects
+this data, sometimes it cannot be trusted at all.
+
+So how is this information collected?  Whenever a timer interrupt is
+signalled, the kernel looks at what kind of task was running at that
+moment and increments the counter that corresponds to this task's
+kind/state.  The problem with this is that the system could have
+switched between various states multiple times between two timer
+interrupts, yet the counter is incremented only for the last state.
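+
+The counters themselves can be read straight from the first line of
+``/proc/stat``. A minimal sketch, assuming the usual field order
+(user, nice, system, idle) and a one-second interval chosen purely
+for illustration; the variable names are arbitrary::
+
+    # Sample the aggregate "cpu" line twice and print the tick deltas.
+    read -r label u1 n1 s1 i1 rest < /proc/stat
+    sleep 1
+    read -r label u2 n2 s2 i2 rest < /proc/stat
+    echo "busy ticks: $(( (u2 - u1) + (n2 - n1) + (s2 - s1) ))"
+    echo "idle ticks: $(( i2 - i1 ))"
+
+Because every tick is charged to whatever was running at the moment
+of the interrupt, these deltas inherit the sampling error described
+above.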
+
+
+Example
+-------
+
+If we imagine the system with one task that periodically burns cycles
+in the following manner::
+
+     time line between two timer interrupts
+    |--------------------------------------|
+     ^                                    ^
+     |_ something begins working          |
+                                          |_ something goes to sleep
+                                         (only to be awakened quite soon)
+
+In the above situation the system will be 0% loaded according to
+``/proc/stat`` (since the timer interrupt will always happen when the
+system is executing the idle handler), but in reality the load is
+closer to 99%.
+
+One can imagine many more situations where this behavior of the kernel
+will lead to quite erratic information inside ``/proc/stat``::
+
+
+	/* gcc -o hog smallhog.c */
+	#include <time.h>
+	#include <limits.h>
+	#include <signal.h>
+	#include <sys/time.h>
+	#define HIST 10
+
+	static volatile sig_atomic_t stop;
+
+	static void sighandler (int signr)
+	{
+		(void) signr;
+		stop = 1;
+	}
+	static unsigned long hog (unsigned long niters)
+	{
+		stop = 0;
+		while (!stop && --niters);
+		return niters;
+	}
+	int main (void)
+	{
+		int i;
+		struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
+					.it_value = { .tv_sec = 0, .tv_usec = 1 } };
+		sigset_t set;
+		unsigned long v[HIST];
+		double tmp = 0.0;
+		unsigned long n;
+		signal (SIGALRM, &sighandler);
+		setitimer (ITIMER_REAL, &it, NULL);
+
+		hog (ULONG_MAX);	/* warm up: spin until the first SIGALRM */
+		for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
+		for (i = 0; i < HIST; ++i) tmp += v[i];
+		tmp /= HIST;		/* average iterations per timer period */
+		n = tmp - (tmp / 3.0);	/* about 2/3 of one timer period */
+
+		sigemptyset (&set);
+		sigaddset (&set, SIGALRM);
+
+		for (;;) {
+			hog (n);		/* burn cycles */
+			sigwait (&set, &i);	/* sleep until the next SIGALRM */
+		}
+		return 0;
+	}
+
+
+References
+----------
+
+- http://lkml.org/lkml/2007/2/12/6
+- Documentation/filesystems/proc.txt (1.8)
+
+
+Thanks
+------
+
+Con Kolivas, Pavel Machek
diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst
new file mode 100644
index 000000000000..b90dafcc8237
--- /dev/null
+++ b/Documentation/admin-guide/cputopology.rst
@@ -0,0 +1,177 @@
+===========================================
+How CPU topology info is exported via sysfs
+===========================================
+
+CPU topology info is exported via sysfs. Items (attributes) are similar
+to /proc/cpuinfo output of some architectures.  They reside in
+/sys/devices/system/cpu/cpuX/topology/:
+
+physical_package_id:
+
+	physical package id of cpuX. Typically corresponds to a physical
+	socket number, but the actual value is architecture and platform
+	dependent.
+
+die_id:
+
+	the CPU die ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).  The actual value is
+	architecture and platform dependent.
+
+core_id:
+
+	the CPU core ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).  The actual value is
+	architecture and platform dependent.
+
+book_id:
+
+	the book ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).	The actual value is
+	architecture and platform dependent.
+
+drawer_id:
+
+	the drawer ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).	The actual value is
+	architecture and platform dependent.
+
+core_cpus:
+
+	internal kernel map of CPUs within the same core.
+	(deprecated name: "thread_siblings")
+
+core_cpus_list:
+
+	human-readable list of CPUs within the same core.
+	(deprecated name: "thread_siblings_list")
+
+package_cpus:
+
+	internal kernel map of the CPUs sharing the same physical_package_id.
+	(deprecated name: "core_siblings")
+
+package_cpus_list:
+
+	human-readable list of CPUs sharing the same physical_package_id.
+	(deprecated name: "core_siblings_list")
+
+die_cpus:
+
+	internal kernel map of CPUs within the same die.
+
+die_cpus_list:
+
+	human-readable list of CPUs within the same die.
+
+book_siblings:
+
+	internal kernel map of cpuX's hardware threads within the same
+	book_id.
+
+book_siblings_list:
+
+	human-readable list of cpuX's hardware threads within the same
+	book_id.
+
+drawer_siblings:
+
+	internal kernel map of cpuX's hardware threads within the same
+	drawer_id.
+
+drawer_siblings_list:
+
+	human-readable list of cpuX's hardware threads within the same
+	drawer_id.
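+
+As a sketch, the attributes that exist for one CPU can be listed with a
+short shell loop (cpu0 is used here only as an example; which files are
+present depends on the architecture and kernel configuration)::
+
+	for f in /sys/devices/system/cpu/cpu0/topology/*; do
+		printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
+	done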
+
+The architecture-neutral file drivers/base/topology.c exports these attributes.
+However, the book and drawer related sysfs files will only be created if
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
+
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
+where they reflect the cpu and cache hierarchy.
+
+For an architecture to support this feature, it must define some of
+these macros in include/asm-XXX/topology.h::
+
+	#define topology_physical_package_id(cpu)
+	#define topology_die_id(cpu)
+	#define topology_core_id(cpu)
+	#define topology_book_id(cpu)
+	#define topology_drawer_id(cpu)
+	#define topology_sibling_cpumask(cpu)
+	#define topology_core_cpumask(cpu)
+	#define topology_die_cpumask(cpu)
+	#define topology_book_cpumask(cpu)
+	#define topology_drawer_cpumask(cpu)
+
+The type of ``**_id macros`` is int.
+The type of ``**_cpumask macros`` is ``(const) struct cpumask *``. The latter
+correspond with appropriate ``**_siblings`` sysfs attributes (except for
+topology_sibling_cpumask() which corresponds with thread_siblings).
+
+To be consistent on all architectures, include/linux/topology.h
+provides default definitions for any of the above macros that are
+not defined by include/asm-XXX/topology.h:
+
+1) topology_physical_package_id: -1
+2) topology_die_id: -1
+3) topology_core_id: 0
+4) topology_sibling_cpumask: just the given CPU
+5) topology_core_cpumask: just the given CPU
+6) topology_die_cpumask: just the given CPU
+
+For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
+default definitions for topology_book_id() and topology_book_cpumask().
+For architectures that don't support drawers (CONFIG_SCHED_DRAWER) there are
+no default definitions for topology_drawer_id() and topology_drawer_cpumask().
+
+Additionally, CPU topology information is provided under
+/sys/devices/system/cpu and includes these files.  The internal
+source for the output is in brackets ("[]").
+
+    =========== ==========================================================
+    kernel_max: the maximum CPU index allowed by the kernel configuration.
+		[NR_CPUS-1]
+
+    offline:	CPUs that are not online because they have been
+		HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
+		of CPUs allowed by the kernel configuration (kernel_max
+		above). [~cpu_online_mask + cpus >= NR_CPUS]
+
+    online:	CPUs that are online and being scheduled [cpu_online_mask]
+
+    possible:	CPUs that have been allocated resources and can be
+		brought online if they are present. [cpu_possible_mask]
+
+    present:	CPUs that have been identified as being present in the
+		system. [cpu_present_mask]
+    =========== ==========================================================
+
+The format for the above output is compatible with cpulist_parse()
+[see <linux/cpumask.h>].  Some examples follow.
+
+In this example, there are 64 CPUs in the system but cpus 32-63 exceed
+the kernel max which is limited to 0..31 by the NR_CPUS config option
+being 32.  Note also that CPUs 2 and 4-31 are not online but could be
+brought online as they are both present and possible::
+
+     kernel_max: 31
+        offline: 2,4-31,32-63
+         online: 0-1,3
+       possible: 0-31
+        present: 0-31
+
+In this example, the NR_CPUS config option is 128, but the kernel was
+started with possible_cpus=144.  There are 4 CPUs in the system and cpu2
+was manually taken offline (and is the only CPU that can be brought
+online.)::
+
+     kernel_max: 127
+        offline: 2,4-127,128-143
+         online: 0-1,3
+       possible: 0-127
+        present: 0-3
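+
+A list in this format can be expanded into individual CPU numbers in
+user space. The following is only a sketch; ``expand_cpulist`` is a
+hypothetical helper, not an existing tool, and it assumes a non-empty
+list::
+
+	expand_cpulist() {
+		# "0-1,3" -> one element per line, then expand each range
+		echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
+			seq "$lo" "${hi:-$lo}"
+		done
+	}
+
+	expand_cpulist "$(cat /sys/devices/system/cpu/online)"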
+
+See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
+as well as more information on the various cpumasks.
diff --git a/Documentation/admin-guide/device-mapper/statistics.rst b/Documentation/admin-guide/device-mapper/statistics.rst
index 3d80a9f850cc..41ded0bc5933 100644
--- a/Documentation/admin-guide/device-mapper/statistics.rst
+++ b/Documentation/admin-guide/device-mapper/statistics.rst
@@ -13,7 +13,7 @@ the range specified.
 
 The I/O statistics counters for each step-sized area of a region are
 in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see:
-Documentation/iostats.txt).  But two extra counters (12 and 13) are
+Documentation/admin-guide/iostats.rst).  But two extra counters (12 and 13) are
 provided: total time spent reading and writing.  When the histogram
 argument is used, the 14th parameter is reported that represents the
 histogram of latencies.  All these counters may be accessed by sending
@@ -151,7 +151,7 @@ Messages
 	  The first 11 counters have the same meaning as
 	  `/sys/block/*/stat or /proc/diskstats`.
 
-	  Please refer to Documentation/iostats.txt for details.
+	  Please refer to Documentation/admin-guide/iostats.rst for details.
 
 	  1. the number of reads completed
 	  2. the number of reads merged
diff --git a/Documentation/admin-guide/efi-stub.rst b/Documentation/admin-guide/efi-stub.rst
new file mode 100644
index 000000000000..833edb0d0bc4
--- /dev/null
+++ b/Documentation/admin-guide/efi-stub.rst
@@ -0,0 +1,100 @@
+=================
+The EFI Boot Stub
+=================
+
+On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade
+as a PE/COFF image, thereby convincing EFI firmware loaders to load
+it as an EFI executable. The code that modifies the bzImage header,
+along with the EFI-specific entry point that the firmware loader
+jumps to, are collectively known as the "EFI boot stub", and live in
+arch/x86/boot/header.S and arch/x86/boot/compressed/eboot.c,
+respectively. For ARM the EFI stub is implemented in
+arch/arm/boot/compressed/efi-header.S and
+arch/arm/boot/compressed/efi-stub.c. EFI stub code that is shared
+between architectures is in drivers/firmware/efi/libstub.
+
+For arm64, there is no compressed kernel support, so the Image itself
+masquerades as a PE/COFF image and the EFI stub is linked into the
+kernel. The arm64 EFI stub lives in arch/arm64/kernel/efi-entry.S
+and drivers/firmware/efi/libstub/arm64-stub.c.
+
+By using the EFI boot stub it's possible to boot a Linux kernel
+without the use of a conventional EFI boot loader, such as grub or
+elilo. Since the EFI boot stub performs the jobs of a boot loader, in
+a certain sense it *IS* the boot loader.
+
+The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option.
+
+
+How to install bzImage.efi
+--------------------------
+
+The bzImage located in arch/x86/boot/bzImage must be copied to the EFI
+System Partition (ESP) and renamed with the extension ".efi". Without
+the extension the EFI firmware loader will refuse to execute it. It's
+not possible to execute bzImage.efi from the usual Linux file systems
+because EFI firmware doesn't have support for them. For ARM the
+arch/arm/boot/zImage should be copied to the system partition, and it
+may not need to be renamed. Similarly for arm64, arch/arm64/boot/Image
+should be copied but not necessarily renamed.
+
+
+Passing kernel parameters from the EFI shell
+--------------------------------------------
+
+Arguments to the kernel can be passed after bzImage.efi, e.g.::
+
+	fs0:> bzImage.efi console=ttyS0 root=/dev/sda4
+
+
+The "initrd=" option
+--------------------
+
+Like most boot loaders, the EFI stub allows the user to specify
+multiple initrd files using the "initrd=" option. This is the only EFI
+stub-specific command line parameter; everything else is passed to the
+kernel when it boots.
+
+The path to the initrd file must be an absolute path from the
+beginning of the ESP; relative path names do not work. Also, the path
+is an EFI-style path and directory elements must be separated with
+backslashes (\). For example, given the following directory layout::
+
+  fs0:>
+	Kernels\
+			bzImage.efi
+			initrd-large.img
+
+	Ramdisks\
+			initrd-small.img
+			initrd-medium.img
+
+to boot with the initrd-large.img file if the current working
+directory is fs0:\Kernels, the following command must be used::
+
+	fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img
+
+Notice how bzImage.efi can be specified with a relative path. That's
+because the image we're executing is interpreted by the EFI shell,
+which understands relative paths, whereas the rest of the command line
+is passed to bzImage.efi.
+
+
+The "dtb=" option
+-----------------
+
+For the ARM and arm64 architectures, a device tree must be provided to
+the kernel. Normally firmware shall supply the device tree via the
+EFI CONFIGURATION TABLE. However, the "dtb=" command line option can
+be used to override the firmware supplied device tree, or to supply
+one when firmware is unable to.
+
+Please note: Firmware adds runtime configuration information to the
+device tree before booting the kernel. If dtb= is used to override
+the device tree, then any runtime data provided by firmware will be
+lost. The dtb= option should only be used either as a debug tool, or
+as a last resort when a device tree is not provided in the EFI
+CONFIGURATION TABLE.
+
+"dtb=" is processed in the same manner as the "initrd=" option that is
+described above.
diff --git a/Documentation/admin-guide/highuid.rst b/Documentation/admin-guide/highuid.rst
new file mode 100644
index 000000000000..6ee70465c0ea
--- /dev/null
+++ b/Documentation/admin-guide/highuid.rst
@@ -0,0 +1,80 @@
+===================================================
+Notes on the change from 16-bit UIDs to 32-bit UIDs
+===================================================
+
+:Author: Chris Wing <wingc@umich.edu>
+:Last updated: January 11, 2000
+
+- kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t
+  when communicating between user and kernel space in an ioctl or data
+  structure.
+
+- kernel code should use uid_t and gid_t in kernel-private structures and
+  code.
+
+What's left to be done for 32-bit UIDs on all Linux architectures:
+
+- Disk quotas have an interesting limitation that is not related to the
+  maximum UID/GID. They are limited by the maximum file size on the
+  underlying filesystem, because quota records are written at offsets
+  corresponding to the UID in question.
+  Further investigation is needed to see if the quota system can cope
+  properly with huge UIDs. If it can deal with 64-bit file offsets on all
+  architectures, this should not be a problem.
+
+- Decide whether or not to keep backwards compatibility with the system
+  accounting file, or if we should break it as the comments suggest
+  (currently, the old 16-bit UID and GID are still written to disk, and
+  part of the former pad space is used to store separate 32-bit UID and
+  GID)
+
+- Need to validate that OS emulation calls the 16-bit UID
+  compatibility syscalls, if the OS being emulated used 16-bit UIDs, or
+  uses the 32-bit UID system calls properly otherwise.
+
+  This affects at least:
+
+	- iBCS on Intel
+
+	- sparc32 emulation on sparc64
+	  (need to support whatever new 32-bit UID system calls are added to
+	  sparc32)
+
+- Validate that all filesystems behave properly.
+
+  At present, 32-bit UIDs _should_ work for:
+
+	- ext2
+	- ufs
+	- isofs
+	- nfs
+	- coda
+	- udf
+
+  Ioctl() fixups have been made for:
+
+	- ncpfs
+	- smbfs
+
+  Filesystems with simple fixups to prevent 16-bit UID wraparound:
+
+	- minix
+	- sysv
+	- qnx4
+
+  Other filesystems have not been checked yet.
+
+- The ncpfs and smbfs filesystems cannot presently use 32-bit UIDs in
+  all ioctl()s. Some new ioctl()s have been added with 32-bit UIDs, but
+  more are needed. (as well as new user<->kernel data structures)
+
+- The ELF core dump format only supports 16-bit UIDs on arm, i386, m68k,
+  sh, and sparc32. Fixing this is probably not that important, but would
+  require adding a new ELF section.
+
+- The ioctl()s used to control the in-kernel NFS server only support
+  16-bit UIDs on arm, i386, m68k, sh, and sparc32.
+
+- make sure that the UID mapping feature of AX25 networking works properly
+  (it should be safe because it's always used a 32-bit integer to
+  communicate between user and kernel)
diff --git a/Documentation/admin-guide/hw-vuln/l1tf.rst b/Documentation/admin-guide/hw-vuln/l1tf.rst
index 656aee262e23..f83212fae4d5 100644
--- a/Documentation/admin-guide/hw-vuln/l1tf.rst
+++ b/Documentation/admin-guide/hw-vuln/l1tf.rst
@@ -241,7 +241,7 @@ Guest mitigation mechanisms
    For further information about confining guests to a single or to a group
    of cores consult the cpusets documentation:
 
-   https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.rst
+   https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v1/cpusets.rst
 
 .. _interrupt_isolation:
 
diff --git a/Documentation/admin-guide/hw_random.rst b/Documentation/admin-guide/hw_random.rst
new file mode 100644
index 000000000000..121de96e395e
--- /dev/null
+++ b/Documentation/admin-guide/hw_random.rst
@@ -0,0 +1,105 @@
+==========================================================
+Linux support for random number generator in i8xx chipsets
+==========================================================
+
+Introduction
+============
+
+The hw_random framework is software that makes use of a
+special hardware feature on your CPU or motherboard,
+a Random Number Generator (RNG).  The software has two parts:
+a core providing the /dev/hwrng character device and its
+sysfs support, plus a hardware-specific driver that plugs
+into that core.
+
+To make the most effective use of these mechanisms, you
+should download the support software as well.  Download the
+latest version of the "rng-tools" package from the
+hw_random driver's official Web site:
+
+	http://sourceforge.net/projects/gkernel/
+
+Those tools use /dev/hwrng to fill the kernel entropy pool,
+which is used internally and exported by the /dev/urandom and
+/dev/random special files.
+
+Theory of operation
+===================
+
+CHARACTER DEVICE.  Using the standard open()
+and read() system calls, you can read random data from
+the hardware RNG device.  This data is NOT CHECKED by any
+fitness tests, and could potentially be bogus (if the
+hardware is faulty or has been tampered with).  Data is only
+output if the hardware "has-data" flag is set, but nevertheless
+a security-conscious person would run fitness tests on the
+data before assuming it is truly random.
+
+The rng-tools package uses such tests in "rngd", and lets you
+run them by hand with a "rngtest" utility.
+
+/dev/hwrng is char device major 10, minor 183.
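+
+For a quick look at the raw output, a sketch using standard tools (it
+assumes a hardware RNG driver is loaded, that the caller may read the
+device, and that rng-tools is installed; the byte and block counts are
+arbitrary)::
+
+	# Dump 16 bytes from the hardware RNG as hex.
+	head -c 16 /dev/hwrng | od -An -tx1
+
+	# Run the rng-tools fitness tests on 1000 blocks of device output.
+	rngtest -c 1000 < /dev/hwrng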
+
+CLASS DEVICE.  There is a /sys/class/misc/hw_random node with
+two unique attributes, "rng_available" and "rng_current".  The
+"rng_available" attribute lists the hardware-specific drivers
+available, while "rng_current" lists the one which is currently
+connected to /dev/hwrng.  If your system has more than one
+RNG available, you may change the one used by writing a name from
+the list in "rng_available" into "rng_current".
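+
+A sketch of that procedure from the shell. It assumes that the names in
+"rng_available" are separated by whitespace and simply selects the first
+one; the variable name ``first`` is illustrative only::
+
+	cat /sys/class/misc/hw_random/rng_available
+	cat /sys/class/misc/hw_random/rng_current
+
+	# Pick the first listed driver and make it the current one.
+	read -r first rest < /sys/class/misc/hw_random/rng_available
+	echo "$first" > /sys/class/misc/hw_random/rng_current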
+
+==========================================================================
+
+
+Hardware driver for Intel/AMD/VIA Random Number Generators (RNG)
+	- Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com>
+	- Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com>
+
+
+About the Intel RNG hardware, from the firmware hub datasheet
+=============================================================
+
+The Firmware Hub integrates a Random Number Generator (RNG)
+using thermal noise generated from inherently random quantum
+mechanical properties of silicon. When not generating new random
+bits the RNG circuitry will enter a low power state. Intel will
+provide a binary software driver to give third party software
+access to our RNG for use as a security feature. At this time,
+the RNG is only to be used with a system in an OS-present state.
+
+Intel RNG Driver notes
+======================
+
+FIXME: support poll(2)
+
+.. note::
+
+	request_mem_region was removed, for three reasons:
+
+	1) Only one RNG is supported by this driver;
+	2) The location used by the RNG is a fixed location in
+	   MMIO-addressable memory;
+	3) users with properly working BIOS e820 handling will always
+	   have the region in which the RNG is located reserved, so
+	   request_mem_region calls always fail for proper setups.
+	   However, for people who use mem=XX, BIOS e820 information is
+	   **not** in /proc/iomem, and request_mem_region(RNG_ADDR) can
+	   succeed.
+
+Driver details
+==============
+
+Based on:
+	Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet
+	May 1999 Order Number: 290658-002 R
+
+Intel 82802 Firmware Hub:
+	Random Number Generator
+	Programmer's Reference Manual
+	December 1999 Order Number: 298029-001 R
+
+Intel 82802 Firmware HUB Random Number Generator Driver
+	Copyright (c) 2000 Matt Sottek <msottek@quiknet.com>
+
+Special thanks to Matt Sottek.  I did the "guts", he
+did the "brains" and all the testing.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index a5fdb1a846ce..4e98f5596da0 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -85,8 +85,25 @@ configure specific aspects of kernel behavior to your liking.
    perf-security
    acpi/index
    aoe/index
+   btmrvl
+   clearing-warn-once
+   cpu-load
+   cputopology
    device-mapper/index
+   efi-stub
+   highuid
+   hw_random
+   iostats
+   kernel-per-CPU-kthreads
    laptops/index
+   lcd-panel-cgram
+   ldm
+   lockup-watchdogs
+   numastat
+   pnp
+   rtc
+   svga
+   video-output
 
 .. only::  subproject and html
 
diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst
new file mode 100644
index 000000000000..5d63b18bd6d1
--- /dev/null
+++ b/Documentation/admin-guide/iostats.rst
@@ -0,0 +1,197 @@
+=====================
+I/O statistics fields
+=====================
+
+Since 2.4.20 (and some versions before, with patches), and 2.5.45,
+more extensive disk statistics have been introduced to help measure disk
+activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do
+the work for you, but in case you are interested in creating your own
+tools, the fields are explained here.
+
+In 2.4, the information is found as additional fields in
+``/proc/partitions``.  In 2.6 and later, the same information is found in two
+places: one is in the file ``/proc/diskstats``, and the other is within
+the sysfs file system, which must be mounted in order to obtain
+the information. Throughout this document we'll assume that sysfs
+is mounted on ``/sys``, although of course it may be mounted anywhere.
+Both ``/proc/diskstats`` and sysfs use the same source for the information
+and so should not differ.
+
+Here are examples of these different formats::
+
+   2.4:
+      3     0   39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
+      3     1    9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030
+
+   2.6+ sysfs:
+      446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
+      35486    38030    38030    38030
+
+   2.6+ diskstats:
+      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
+      3    1   hda1 35486 38030 38030 38030
+
+   4.18+ diskstats:
+      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0
+
+On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
+a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
+
+The advantage of one over the other is that the sysfs choice works well
+if you are watching a known, small set of disks.  ``/proc/diskstats`` may
+be a better choice if you are watching a large number of disks because
+you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
+each snapshot of your disk statistics.
+
+In 2.4, the statistics fields are those after the device name. In
+the above example, the first field of statistics would be 446216.
+By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
+find just the eleven fields, beginning with 446216.  If you look at
+``/proc/diskstats``, the eleven fields will be preceded by the major and
+minor device numbers, and device name.  Each of these formats provides
+eleven fields of statistics, each meaning exactly the same things.
+All fields except field 9 are cumulative since boot.  Field 9 should
+go to zero as I/Os complete; all others only increase (unless they
+overflow and wrap).  Yes, these are (32-bit or 64-bit) unsigned long
+(native word size) numbers, and on a very busy or long-lived system they
+may wrap. Applications should be prepared to deal with that; unless
+your observations are measured in large numbers of minutes or hours,
+they should not wrap twice before you notice them.
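+
+One way to cope with a single wrap is modular subtraction. A minimal
+sketch for a 32-bit counter, assuming a shell with 64-bit arithmetic
+(the sample values are made up)::
+
+	prev=4294967290		# earlier sample, close to 2^32
+	cur=5			# later sample, after the counter wrapped
+	echo $(( (cur - prev) & 0xFFFFFFFF ))	# prints 11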
+
+Each set of stats only applies to the indicated device; if you want
+system-wide stats you'll have to find all the devices and sum them all up.
+
+Field  1 -- # of reads completed
+    This is the total number of reads completed successfully.
+
+Field  2 -- # of reads merged, field 6 -- # of writes merged
+    Reads and writes which are adjacent to each other may be merged for
+    efficiency.  Thus two 4K reads may become one 8K read before it is
+    ultimately handed to the disk, and so it will be counted (and queued)
+    as only one I/O.  This field lets you know how often this was done.
+
+Field  3 -- # of sectors read
+    This is the total number of sectors read successfully.
+
+Field  4 -- # of milliseconds spent reading
+    This is the total number of milliseconds spent by all reads (as
+    measured from __make_request() to end_that_request_last()).
+
+Field  5 -- # of writes completed
+    This is the total number of writes completed successfully.
+
+Field  6 -- # of writes merged
+    See the description of field 2.
+
+Field  7 -- # of sectors written
+    This is the total number of sectors written successfully.
+
+Field  8 -- # of milliseconds spent writing
+    This is the total number of milliseconds spent by all writes (as
+    measured from __make_request() to end_that_request_last()).
+
+Field  9 -- # of I/Os currently in progress
+    The only field that should go to zero. Incremented as requests are
+    given to the appropriate struct request_queue and decremented as they finish.
+
+Field 10 -- # of milliseconds spent doing I/Os
+    This field increases so long as field 9 is nonzero.
+
+    Since 5.0 this field counts jiffies when at least one request was
+    started or completed. If request runs more than 2 jiffies then some
+    I/O time will not be accounted unless there are other requests.
+
+Field 11 -- weighted # of milliseconds spent doing I/Os
+    This field is incremented at each I/O start, I/O completion, I/O
+    merge, or read of these stats by the number of I/Os in progress
+    (field 9) times the number of milliseconds spent doing I/O since the
+    last update of this field.  This can provide an easy measure of both
+    I/O completion time and the backlog that may be accumulating.
+
+Field 12 -- # of discards completed
+    This is the total number of discards completed successfully.
+
+Field 13 -- # of discards merged
+    See the description of field 2.
+
+Field 14 -- # of sectors discarded
+    This is the total number of sectors discarded successfully.
+
+Field 15 -- # of milliseconds spent discarding
+    This is the total number of milliseconds spent by all discards (as
+    measured from __make_request() to end_that_request_last()).
+
+To avoid introducing performance bottlenecks, no locks are held while
+modifying these counters.  This implies that minor inaccuracies may be
+introduced when changes collide, so (for instance) adding up all the
+read I/Os issued per partition should equal those made to the disks ...
+but due to the lack of locking it may only be very close.
+
+In 2.6+, there are counters for each CPU, which make the lack of locking
+almost a non-issue.  When the statistics are read, the per-CPU counters
+are summed (possibly overflowing the unsigned long variable they are
+summed to) and the result given to the user.  There is no convenient
+user interface for accessing the per-CPU counters themselves.
+
+Disks vs Partitions
+-------------------
+
+There were significant changes between 2.4 and 2.6+ in the I/O subsystem.
+As a result, some statistical information disappeared. The translation from
+a disk address relative to a partition to the disk address relative to
+the host disk happens much earlier.  All merges and timings now happen
+at the disk level rather than at both the disk and partition level as
+in 2.4.  Consequently, you'll see a different statistics output on 2.6+ for
+partitions from that for disks.  There are only *four* fields available
+for partitions on 2.6+ machines.  This is reflected in the examples above.
+
+Field  1 -- # of reads issued
+    This is the total number of reads issued to this partition.
+
+Field  2 -- # of sectors read
+    This is the total number of sectors requested to be read from this
+    partition.
+
+Field  3 -- # of writes issued
+    This is the total number of writes issued to this partition.
+
+Field  4 -- # of sectors written
+    This is the total number of sectors requested to be written to
+    this partition.
+
+Note that since the address is translated to a disk-relative one, and no
+record of the partition-relative address is kept, the subsequent success
+or failure of the read cannot be attributed to the partition.  In other
+words, the number of reads for partitions is counted slightly before time
+of queuing for partitions, and at completion for whole disks.  This is
+a subtle distinction that is probably uninteresting for most cases.
+
+More significant is the error induced by counting the numbers of
+reads/writes before merges for partitions and after for disks. Since a
+typical workload usually contains a lot of successive and adjacent requests,
+the number of reads/writes issued can be several times higher than the
+number of reads/writes completed.
+
+In 2.6.25, the full statistic set is again available for partitions and
+disk and partition statistics are consistent again. Since we still don't
+keep record of the partition-relative address, an operation is attributed to
+the partition which contains the first sector of the request after the
+eventual merges. As requests can be merged across partitions, this could lead
+to some (probably insignificant) inaccuracy.
+
+Additional notes
+----------------
+
+In 2.6+, sysfs is not mounted by default.  If your distribution of
+Linux hasn't added it already, here's the line you'll want to add to
+your ``/etc/fstab``::
+
+	none /sys sysfs defaults 0 0
+
+
+In 2.6+, all disk statistics were removed from ``/proc/stat``.  In 2.4, they
+appear in both ``/proc/partitions`` and ``/proc/stat``, although the ones in
+``/proc/stat`` take a very different format from those in ``/proc/partitions``
+(see proc(5), if your system has it.)
+
+-- ricklind@us.ibm.com
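Note on the counter-wrap caveat in iostats.rst above: a monitoring tool can take the difference of two samples modulo the counter width, which stays correct across a single wrap. The following is a minimal sketch and not part of the patch; the helper name `counter_delta` and its `bits` parameter are assumptions made for illustration, with `bits` standing in for the kernel's native word size.

```python
def counter_delta(prev: int, curr: int, bits: int = 64) -> int:
    """Return curr - prev for a cumulative counter that may have wrapped once.

    `bits` is the assumed width of the kernel's unsigned long (32 or 64).
    """
    return (curr - prev) % (1 << bits)


if __name__ == "__main__":
    # A 32-bit counter that wrapped between two samples.
    print(counter_delta(2**32 - 5, 10, bits=32))  # prints 15
```

If the counter can wrap more than once between samples the delta is ambiguous, which is why the text recommends sampling often enough.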
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a571a67e0c85..19b1e3bef56c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5066,7 +5066,7 @@
 
 	vga=		[BOOT,X86-32] Select a particular video mode
 			See Documentation/x86/boot.rst and
-			Documentation/svga.txt.
+			Documentation/admin-guide/svga.rst.
 			Use vga=ask for menu.
 			This is actually a boot loader parameter; the value is
 			passed to the kernel using a special protocol.
diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
new file mode 100644
index 000000000000..4f18456dd3b1
--- /dev/null
+++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
@@ -0,0 +1,356 @@
+==========================================
+Reducing OS jitter due to per-cpu kthreads
+==========================================
+
+This document lists per-CPU kthreads in the Linux kernel and presents
+options to control their OS jitter.  Note that non-per-CPU kthreads are
+not listed here.  To reduce OS jitter from non-per-CPU kthreads, bind
+them to a "housekeeping" CPU dedicated to such work.
+
+References
+==========
+
+-	Documentation/IRQ-affinity.txt:  Binding interrupts to sets of CPUs.
+
+-	Documentation/admin-guide/cgroup-v1:  Using cgroups to bind tasks to sets of CPUs.
+
+-	man taskset:  Using the taskset command to bind tasks to sets
+	of CPUs.
+
+-	man sched_setaffinity:  Using the sched_setaffinity() system
+	call to bind tasks to sets of CPUs.
+
+-	/sys/devices/system/cpu/cpuN/online:  Control CPU N's hotplug state,
+	writing "0" to offline and "1" to online.
+
+-	In order to locate kernel-generated OS jitter on CPU N::
+
+		cd /sys/kernel/debug/tracing
+		echo 1 > max_graph_depth # Increase the "1" for more detail
+		echo function_graph > current_tracer
+		# run workload
+		cat per_cpu/cpuN/trace
+
+kthreads
+========
+
+Name:
+  ehca_comp/%u
+
+Purpose:
+  Periodically process Infiniband-related work.
+
+To reduce its OS jitter, do any of the following:
+
+1.	Don't use eHCA Infiniband hardware, instead choosing hardware
+	that does not require per-CPU kthreads.  This will prevent these
+	kthreads from being created in the first place.  (This will
+	work for most people, as this hardware, though important, is
+	relatively old and is produced in relatively low unit volumes.)
+2.	Do all eHCA-Infiniband-related work on other CPUs, including
+	interrupts.
+3.	Rework the eHCA driver so that its per-CPU kthreads are
+	provisioned only on selected CPUs.
+
+
+Name:
+  irq/%d-%s
+
+Purpose:
+  Handle threaded interrupts.
+
+To reduce its OS jitter, do the following:
+
+1.	Use irq affinity to force the irq threads to execute on
+	some other CPU.
+
+Name:
+  kcmtpd_ctr_%d
+
+Purpose:
+  Handle Bluetooth work.
+
+To reduce its OS jitter, do one of the following:
+
+1.	Don't use Bluetooth, in which case these kthreads won't be
+	created in the first place.
+2.	Use irq affinity to force Bluetooth-related interrupts to
+	occur on some other CPU and furthermore initiate all
+	Bluetooth activity on some other CPU.
+
+Name:
+  ksoftirqd/%u
+
+Purpose:
+  Execute softirq handlers when threaded or when under heavy load.
+
+To reduce its OS jitter, each softirq vector must be handled
+separately as follows:
+
+TIMER_SOFTIRQ
+-------------
+
+Do all of the following:
+
+1.	To the extent possible, keep the CPU out of the kernel when it
+	is non-idle, for example, by avoiding system calls and by forcing
+	both kernel threads and interrupts to execute elsewhere.
+2.	Build with CONFIG_HOTPLUG_CPU=y.  After boot completes, force
+	the CPU offline, then bring it back online.  This forces
+	recurring timers to migrate elsewhere.	If you are concerned
+	with multiple CPUs, force them all offline before bringing the
+	first one back online.  Once you have onlined the CPUs in question,
+	do not offline any other CPUs, because doing so could force the
+	timer back onto one of the CPUs in question.
+
+NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
+---------------------------------
+
+Do all of the following:
+
+1.	Force networking interrupts onto other CPUs.
+2.	Initiate any network I/O on other CPUs.
+3.	Once your application has started, prevent CPU-hotplug operations
+	from being initiated from tasks that might run on the CPU to
+	be de-jittered.  (It is OK to force this CPU offline and then
+	bring it back online before you start your application.)
+
+BLOCK_SOFTIRQ
+-------------
+
+Do all of the following:
+
+1.	Force block-device interrupts onto some other CPU.
+2.	Initiate any block I/O on other CPUs.
+3.	Once your application has started, prevent CPU-hotplug operations
+	from being initiated from tasks that might run on the CPU to
+	be de-jittered.  (It is OK to force this CPU offline and then
+	bring it back online before you start your application.)
+
+IRQ_POLL_SOFTIRQ
+----------------
+
+Do all of the following:
+
+1.	Force block-device interrupts onto some other CPU.
+2.	Initiate any block I/O and block-I/O polling on other CPUs.
+3.	Once your application has started, prevent CPU-hotplug operations
+	from being initiated from tasks that might run on the CPU to
+	be de-jittered.  (It is OK to force this CPU offline and then
+	bring it back online before you start your application.)
+
+TASKLET_SOFTIRQ
+---------------
+
+Do one or more of the following:
+
+1.	Avoid use of drivers that use tasklets.  (Such drivers will contain
+	calls to things like tasklet_schedule().)
+2.	Convert all drivers that you must use from tasklets to workqueues.
+3.	Force interrupts for drivers using tasklets onto other CPUs,
+	and also do I/O involving these drivers on other CPUs.
+
+SCHED_SOFTIRQ
+-------------
+
+Do all of the following:
+
+1.	Avoid sending scheduler IPIs to the CPU to be de-jittered,
+	for example, ensure that at most one runnable kthread is present
+	on that CPU.  If a thread that expects to run on the de-jittered
+	CPU awakens, the scheduler will send an IPI that can result in
+	a subsequent SCHED_SOFTIRQ.
+2.	CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
+	is marked as an adaptive-ticks CPU using the "nohz_full="
+	boot parameter.  This reduces the number of scheduler-clock
+	interrupts that the de-jittered CPU receives, minimizing its
+	chances of being selected to do the load balancing work that
+	runs in SCHED_SOFTIRQ context.
+3.	To the extent possible, keep the CPU out of the kernel when it
+	is non-idle, for example, by avoiding system calls and by
+	forcing both kernel threads and interrupts to execute elsewhere.
+	This further reduces the number of scheduler-clock interrupts
+	received by the de-jittered CPU.
+
+HRTIMER_SOFTIRQ
+---------------
+
+Do all of the following:
+
+1.	To the extent possible, keep the CPU out of the kernel when it
+	is non-idle.  For example, avoid system calls and force both
+	kernel threads and interrupts to execute elsewhere.
+2.	Build with CONFIG_HOTPLUG_CPU=y.  Once boot completes, force the
+	CPU offline, then bring it back online.  This forces recurring
+	timers to migrate elsewhere.  If you are concerned with multiple
+	CPUs, force them all offline before bringing the first one
+	back online.  Once you have onlined the CPUs in question, do not
+	offline any other CPUs, because doing so could force the timer
+	back onto one of the CPUs in question.
+
+RCU_SOFTIRQ
+-----------
+
+Do at least one of the following:
+
+1.	Offload callbacks and keep the CPU in either dyntick-idle or
+	adaptive-ticks state by doing all of the following:
+
+	a.	CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
+		de-jittered is marked as an adaptive-ticks CPU using the
+		"nohz_full=" boot parameter.  Bind the rcuo kthreads to
+		housekeeping CPUs, which can tolerate OS jitter.
+	b.	To the extent possible, keep the CPU out of the kernel
+		when it is non-idle, for example, by avoiding system
+		calls and by forcing both kernel threads and interrupts
+		to execute elsewhere.
+
+2.	Enable RCU to do its processing remotely via dyntick-idle by
+	doing all of the following:
+
+	a.	Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
+	b.	Ensure that the CPU goes idle frequently, allowing other
+		CPUs to detect that it has passed through an RCU quiescent
+		state.	If the kernel is built with CONFIG_NO_HZ_FULL=y,
+		userspace execution also allows other CPUs to detect that
+		the CPU in question has passed through a quiescent state.
+	c.	To the extent possible, keep the CPU out of the kernel
+		when it is non-idle, for example, by avoiding system
+		calls and by forcing both kernel threads and interrupts
+		to execute elsewhere.
+
+Name:
+  kworker/%u:%d%s (cpu, id, priority)
+
+Purpose:
+  Execute workqueue requests
+
+To reduce its OS jitter, do any of the following:
+
+1.	Run your workload at a real-time priority, which will allow
+	preempting the kworker daemons.
+2.	A given workqueue can be made visible in the sysfs filesystem
+	by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
+	Such a workqueue can be confined to a given subset of the
+	CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
+	files.	The set of WQ_SYSFS workqueues can be displayed using
+	"ls /sys/devices/virtual/workqueue".  That said, the workqueues
+	maintainer would like to caution people against indiscriminately
+	sprinkling WQ_SYSFS across all the workqueues.	The reason for
+	caution is that it is easy to add WQ_SYSFS, but because sysfs is
+	part of the formal user/kernel API, it can be nearly impossible
+	to remove it, even if its addition was a mistake.
+3.	Do any of the following needed to avoid jitter that your
+	application cannot tolerate:
+
+	a.	Build your kernel with CONFIG_SLUB=y rather than
+		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
+		use of each CPU's workqueues to run its cache_reap()
+		function.
+	b.	Avoid using oprofile, thus avoiding OS jitter from
+		wq_sync_buffer().
+	c.	Limit your CPU frequency so that a CPU-frequency
+		governor is not required, possibly enlisting the aid of
+		special heatsinks or other cooling technologies.  If done
+		correctly, and if your CPU architecture permits, you should
+		be able to build your kernel with CONFIG_CPU_FREQ=n to
+		avoid the CPU-frequency governor periodically running
+		on each CPU, including cs_dbs_timer() and od_dbs_timer().
+
+		WARNING:  Please check your CPU specifications to
+		make sure that this is safe on your particular system.
+	d.	As of v3.18, Christoph Lameter's on-demand vmstat workers
+		commit prevents OS jitter due to vmstat_update() on
+		CONFIG_SMP=y systems.  Before v3.18, it is not possible
+		to entirely get rid of the OS jitter, but you can
+		decrease its frequency by writing a large value to
+		/proc/sys/vm/stat_interval.  The default value is HZ,
+		for an interval of one second.	Of course, larger values
+		will make your virtual-memory statistics update more
+		slowly.  Of course, you can also run your workload at
+		a real-time priority, thus preempting vmstat_update(),
+		but if your workload is CPU-bound, this is a bad idea.
+		However, there is an RFC patch from Christoph Lameter
+		(based on an earlier one from Gilad Ben-Yossef) that
+		reduces or even eliminates vmstat overhead for some
+		workloads at https://lkml.org/lkml/2013/9/4/379.
+	e.	Boot with "elevator=noop" to avoid workqueue use by
+		the block layer.
+	f.	If running on high-end powerpc servers, build with
+		CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
+		daemon from running on each CPU every second or so.
+		(This will require editing Kconfig files and will defeat
+		this platform's RAS functionality.)  This avoids jitter
+		due to the rtas_event_scan() function.
+		WARNING:  Please check your CPU specifications to
+		make sure that this is safe on your particular system.
+	g.	If running on Cell Processor, build your kernel with
+		CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
+		spu_gov_work().
+		WARNING:  Please check your CPU specifications to
+		make sure that this is safe on your particular system.
+	h.	If running on PowerMAC, build your kernel with
+		CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
+		avoiding OS jitter from rackmeter_do_timer().
+
+Name:
+  rcuc/%u
+
+Purpose:
+  Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
+
+To reduce its OS jitter, do at least one of the following:
+
+1.	Build the kernel with CONFIG_PREEMPT=n.  This prevents these
+	kthreads from being created in the first place, and also obviates
+	the need for RCU priority boosting.  This approach is feasible
+	for workloads that do not require high degrees of responsiveness.
+2.	Build the kernel with CONFIG_RCU_BOOST=n.  This prevents these
+	kthreads from being created in the first place.  This approach
+	is feasible only if your workload never requires RCU priority
+	boosting, for example, if you ensure frequent idle time on all
+	CPUs that might execute within the kernel.
+3.	Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
+	boot parameter offloading RCU callbacks from all CPUs susceptible
+	to OS jitter.  This approach prevents the rcuc/%u kthreads from
+	having any work to do, so that they are never awakened.
+4.	Ensure that the CPU never enters the kernel, and, in particular,
+	avoid initiating any CPU hotplug operations on this CPU.  This is
+	another way of preventing any callbacks from being queued on the
+	CPU, again preventing the rcuc/%u kthreads from having any work
+	to do.
+
+Name:
+  rcuop/%d and rcuos/%d
+
+Purpose:
+  Offload RCU callbacks from the corresponding CPU.
+
+To reduce its OS jitter, do at least one of the following:
+
+1.	Use affinity, cgroups, or other mechanism to force these kthreads
+	to execute on some other CPU.
+2.	Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
+	kthreads from being created in the first place.  However, please
+	note that this will not eliminate OS jitter, but will instead
+	shift it to RCU_SOFTIRQ.
+
+Name:
+  watchdog/%u
+
+Purpose:
+  Detect software lockups on each CPU.
+
+To reduce its OS jitter, do at least one of the following:
+
+1.	Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
+	kthreads from being created in the first place.
+2.	Boot with "nosoftlockup=0", which will also prevent these kthreads
+	from being created.  Other related watchdog and softlockup boot
+	parameters may be found in Documentation/admin-guide/kernel-parameters.rst
+	and Documentation/watchdog/watchdog-parameters.rst.
+3.	Echo a zero to /proc/sys/kernel/watchdog to disable the
+	watchdog timer.
+4.	Echo a large number to /proc/sys/kernel/watchdog_thresh in
+	order to reduce the frequency of OS jitter due to the watchdog
+	timer down to a level that is acceptable for your workload.
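Note on kernel-per-CPU-kthreads.rst above: several of the listed options steer interrupts or workqueues away from the de-jittered CPU by writing a CPU mask to a file such as the workqueue ``cpumask`` attribute. The following is a minimal sketch of building such a mask and is not part of the patch. The helper names are hypothetical, and the output format (hexadecimal, comma-separated 32-bit groups, most significant group first) is an assumption about what those files accept; check it against your kernel before relying on it.

```python
def cpus_to_mask(cpus):
    """Return an integer bitmask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU number: {cpu}")
        mask |= 1 << cpu
    return mask


def format_affinity(mask):
    """Format a bitmask as comma-separated 32-bit hex groups (assumed format)."""
    groups = []
    while True:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")
        mask >>= 32
        if mask == 0:
            break
    return ",".join(reversed(groups))


if __name__ == "__main__":
    # Housekeeping CPUs 0 and 1; every other CPU stays out of the mask.
    print(format_affinity(cpus_to_mask([0, 1])))
```

The resulting string would then be written, as root, to the relevant mask file for each interrupt or workqueue to be moved.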
diff --git a/Documentation/admin-guide/lcd-panel-cgram.rst b/Documentation/admin-guide/lcd-panel-cgram.rst
new file mode 100644
index 000000000000..a3eb00c62f53
--- /dev/null
+++ b/Documentation/admin-guide/lcd-panel-cgram.rst
@@ -0,0 +1,27 @@
+======================================
+Parallel port LCD/Keypad Panel support
+======================================
+
+Some LCDs allow you to define up to 8 characters, mapped to ASCII
+characters 0 to 7. The escape code to define a new character is
+'\e[LG' followed by one digit from 0 to 7, representing the character
+number, and up to 8 pairs of hex digits terminated by a semi-colon
+(';'). Each pair of digits represents a line, with 1-bits for each
+illuminated pixel with LSB on the right. Lines are numbered from the
+top of the character to the bottom. On a 5x7 matrix, only the 5 lower
+bits of the first 7 bytes are used for each character. If the string
+is incomplete, only complete lines will be redefined. Here are some
+examples::
+
+  printf "\e[LG0010101050D1F0C04;"  => 0 = [enter]
+  printf "\e[LG1040E1F0000000000;"  => 1 = [up]
+  printf "\e[LG2000000001F0E0400;"  => 2 = [down]
+  printf "\e[LG3040E1F001F0E0400;"  => 3 = [up-down]
+  printf "\e[LG40002060E1E0E0602;"  => 4 = [left]
+  printf "\e[LG500080C0E0F0E0C08;"  => 5 = [right]
+  printf "\e[LG60016051516141400;"  => 6 = "IP"
+
+  printf "\e[LG00103071F1F070301;"  => big speaker
+  printf "\e[LG00002061E1E060200;"  => small speaker
+
+Willy
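Note on lcd-panel-cgram.rst above: the escape sequence can be generated from a drawn bitmap instead of hand-computed hex. The following is a minimal sketch and not part of the patch. The function name `cgram_sequence`, the `UP_ARROW` constant, and the use of '#' and '.' for lit and dark pixels are assumptions made for illustration; the encoding itself (one pair of hex digits per line, LSB on the right, ';' terminator) follows the description in the file.

```python
ESC = "\x1b"

# 5-pixel-wide rows, top to bottom; '#' is a lit pixel, '.' is dark.
UP_ARROW = ["..#..", ".###.", "#####", ".....",
            ".....", ".....", ".....", "....."]


def cgram_sequence(index, rows):
    """Build the '\\e[LG' sequence that defines custom character `index`."""
    if not 0 <= index <= 7:
        raise ValueError("character number must be between 0 and 7")
    if len(rows) > 8:
        raise ValueError("at most 8 lines per character")
    to_bits = str.maketrans(".#", "01")
    body = "".join(f"{int(row.translate(to_bits), 2):02X}" for row in rows)
    return f"{ESC}[LG{index}{body};"


if __name__ == "__main__":
    # repr() shows the escape character instead of sending it to the terminal.
    print(repr(cgram_sequence(1, UP_ARROW)))
```

For `UP_ARROW` this yields the same string as the `[up]` example in the file.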
diff --git a/Documentation/admin-guide/ldm.rst b/Documentation/admin-guide/ldm.rst
new file mode 100644
index 000000000000..12c571368e73
--- /dev/null
+++ b/Documentation/admin-guide/ldm.rst
@@ -0,0 +1,121 @@
+==========================================
+LDM - Logical Disk Manager (Dynamic Disks)
+==========================================
+
+:Author: Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>.
+:Last Updated: Anton Altaparmakov on 30 March 2007 for Windows Vista.
+
+Overview
+--------
+
+Windows 2000, XP, and Vista use a new partitioning scheme.  It is a complete
+replacement for the MSDOS style partitions.  It stores its information in a
+1MiB journalled database at the end of the physical disk.  The size of
+partitions is limited only by disk space.  The maximum number of partitions is
+nearly 2000.
+
+Any partitions created under the LDM are called "Dynamic Disks".  There are no
+longer any primary or extended partitions.  Normal MSDOS style partitions are
+now known as Basic Disks.
+
+If you wish to use Spanned, Striped, Mirrored or RAID 5 Volumes, you must use
+Dynamic Disks.  The journalling allows Windows to make changes to these
+partitions and filesystems without the need to reboot.
+
+Once the LDM driver has divided up the disk, you can use the MD driver to
+assemble any multi-partition volumes, e.g.  Stripes, RAID5.
+
+To prevent legacy applications from repartitioning the disk, the LDM creates a
+dummy MSDOS partition containing one disk-sized partition.  This is what is
+supported with the Linux LDM driver.
+
+A newer approach that has been implemented with Vista is to put LDM on top of a
+GPT label disk.  This is not supported by the Linux LDM driver yet.
+
+
+Example
+-------
+
+Below we have a 50MiB disk, divided into seven partitions.
+
+.. note::
+
+   The missing 1MiB at the end of the disk is where the LDM database is
+   stored.
+
++-------++--------------+---------+-----++--------------+---------+----+
+|Device || Offset Bytes | Sectors | MiB || Size   Bytes | Sectors | MiB|
++=======++==============+=========+=====++==============+=========+====+
+|hda    ||            0 |       0 |   0 ||     52428800 |  102400 |  50|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda1   ||     51380224 |  100352 |  49 ||      1048576 |    2048 |   1|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda2   ||        16384 |      32 |   0 ||      6979584 |   13632 |   6|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda3   ||      6995968 |   13664 |   6 ||     10485760 |   20480 |  10|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda4   ||     17481728 |   34144 |  16 ||      4194304 |    8192 |   4|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda5   ||     21676032 |   42336 |  20 ||      5242880 |   10240 |   5|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda6   ||     26918912 |   52576 |  25 ||     10485760 |   20480 |  10|
++-------++--------------+---------+-----++--------------+---------+----+
+|hda7   ||     37404672 |   73056 |  35 ||     13959168 |   27264 |  13|
++-------++--------------+---------+-----++--------------+---------+----+
+
+The LDM Database may not store the partitions in the order that they appear on
+disk, but the driver will sort them.
+
+When Linux boots, you will see something like::
+
+  hda: 102400 sectors w/32KiB Cache, CHS=50/64/32
+  hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7
+
+
+Compiling LDM Support
+---------------------
+
+To enable LDM, choose the following two options:
+
+  - "Advanced partition selection" CONFIG_PARTITION_ADVANCED
+  - "Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION
+
+If you believe the driver isn't working as it should, you can enable the extra
+debugging code.  This will produce a LOT of output.  The option is:
+
+  - "Windows LDM extra logging" CONFIG_LDM_DEBUG
+
+N.B. The partition code cannot be compiled as a module.
+
+As with all the partition code, if the driver doesn't see signs of its type of
+partition, it will pass control to another driver, so there is no harm in
+enabling it.
+
+If you have Dynamic Disks but don't enable the driver, then all you will see
+is a dummy MSDOS partition filling the whole disk.  You won't be able to mount
+any of the volumes on the disk.
+
+
+Booting
+-------
+
+If you enable LDM support, then lilo is capable of booting from any of the
+discovered partitions.  However, grub does not understand the LDM partitioning
+and cannot boot from a Dynamic Disk.
+
+
+More Documentation
+------------------
+
+There is an Overview of the LDM together with complete Technical Documentation.
+It is available for download.
+
+  http://www.linux-ntfs.org/
+
+If you have any LDM questions that aren't answered in the documentation, email
+me.
+
+Cheers,
+    FlatCap - Richard Russon
+    ldm@flatcap.org
+
diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
new file mode 100644
index 000000000000..290840c160af
--- /dev/null
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -0,0 +1,83 @@
+===============================================================
+Softlockup detector and hardlockup detector (aka nmi_watchdog)
+===============================================================
+
+The Linux kernel can act as a watchdog to detect both soft and hard
+lockups.
+
+A 'softlockup' is defined as a bug that causes the kernel to loop in
+kernel mode for more than 20 seconds (see "Implementation" below for
+details), without giving other tasks a chance to run. The current
+stack trace is displayed upon detection and, by default, the system
+will stay locked up. Alternatively, the kernel can be configured to
+panic; a sysctl, "kernel.softlockup_panic", a kernel parameter,
+"softlockup_panic" (see "Documentation/admin-guide/kernel-parameters.rst" for
+details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
+provided for this.
+
+A 'hardlockup' is defined as a bug that causes the CPU to loop in
+kernel mode for more than 10 seconds (see "Implementation" below for
+details), without letting other interrupts have a chance to run.
+Similarly to the softlockup case, the current stack trace is displayed
+upon detection and the system will stay locked up unless the default
+behavior is changed, which can be done through a sysctl,
+'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
+and a kernel parameter, "nmi_watchdog"
+(see "Documentation/admin-guide/kernel-parameters.rst" for details).
+
+The panic option can be used in combination with panic_timeout (this
+timeout is set through the confusingly named "kernel.panic" sysctl),
+to cause the system to reboot automatically after a specified amount
+of time.
+
+Implementation
+==============
+
+The soft and hard lockup detectors are built on top of the hrtimer and
+perf subsystems, respectively. A direct consequence of this is that,
+in principle, they should work in any architecture where these
+subsystems are present.
+
+A periodic hrtimer runs to generate interrupts and kick the watchdog
+task. An NMI perf event is generated every "watchdog_thresh"
+(compile-time initialized to 10 and configurable through sysctl of the
+same name) seconds to check for hardlockups. If any CPU in the system
+does not receive any hrtimer interrupt during that time the
+'hardlockup detector' (the handler for the NMI perf event) will
+generate a kernel warning or call panic, depending on the
+configuration.
+
+The watchdog task is a high priority kernel thread that updates a
+timestamp every time it is scheduled. If that timestamp is not updated
+for 2*watchdog_thresh seconds (the softlockup threshold) the
+'softlockup detector' (coded inside the hrtimer callback function)
+will dump useful debug information to the system log, after which it
+will call panic if it was instructed to do so or resume execution of
+other kernel code.
+
+The period of the hrtimer is 2*watchdog_thresh/5, which means it has
+two or three chances to generate an interrupt before the hardlockup
+detector kicks in.
+
+As explained above, a kernel knob is provided that allows
+administrators to configure the period of the hrtimer and the perf
+event. The right value for a particular environment is a trade-off
+between fast response to lockups and detection overhead.
+
+By default, the watchdog runs on all online cores.  However, on a
+kernel configured with NO_HZ_FULL, by default the watchdog runs only
+on the housekeeping cores, not the cores specified in the "nohz_full"
+boot argument.  If we allowed the watchdog to run by default on
+the "nohz_full" cores, we would have to run timer ticks to activate
+the scheduler, which would prevent the "nohz_full" functionality
+from protecting the user code on those cores from the kernel.
+Of course, disabling it by default on the nohz_full cores means that
+when those cores do enter the kernel, by default we will not be
+able to detect if they lock up.  However, allowing the watchdog
+to continue to run on the housekeeping (non-tickless) cores means
+that we will continue to detect lockups properly on those cores.
+
+In either case, the set of cores excluded from running the watchdog
+may be adjusted via the kernel.watchdog_cpumask sysctl.  For
+nohz_full cores, this may be useful for debugging a case where the
+kernel seems to be hanging on the nohz_full cores.
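Note on lockup-watchdogs.rst above: the thresholds and the hrtimer period described in the "Implementation" section all derive from ``watchdog_thresh``. The following is a minimal sketch of that arithmetic and is not part of the patch; the function name and dictionary keys are hypothetical, and the formulas are taken directly from the text (softlockup threshold 2*watchdog_thresh, hrtimer period 2*watchdog_thresh/5).

```python
def watchdog_timings(watchdog_thresh: int = 10) -> dict:
    """Derive the detector timings, in seconds, from watchdog_thresh."""
    hrtimer_period = 2 * watchdog_thresh / 5
    return {
        "hardlockup_threshold_s": watchdog_thresh,
        "softlockup_threshold_s": 2 * watchdog_thresh,
        "hrtimer_period_s": hrtimer_period,
        # How many hrtimer interrupts fit into one hardlockup check period.
        "hrtimer_ticks_per_check": watchdog_thresh / hrtimer_period,
    }


if __name__ == "__main__":
    for name, value in watchdog_timings().items():
        print(f"{name}: {value}")
```

With the default of 10 the hrtimer fires every 4 seconds, so 2.5 periods fit into one hardlockup check, which matches the "two or three chances" mentioned in the text.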
diff --git a/Documentation/admin-guide/mm/cma_debugfs.rst b/Documentation/admin-guide/mm/cma_debugfs.rst
new file mode 100644
index 000000000000..4e06ffabd78a
--- /dev/null
+++ b/Documentation/admin-guide/mm/cma_debugfs.rst
@@ -0,0 +1,25 @@
+=====================
+CMA Debugfs Interface
+=====================
+
+The CMA debugfs interface is useful to retrieve basic information out of the
+different CMA areas and to test allocation/release in each of the areas.
+
+Each CMA zone is represented by a directory under <debugfs>/cma/, indexed by
+the kernel's CMA index. So the first CMA zone would be::
+
+	<debugfs>/cma/cma-0
+
+The structure of the files created under that directory is as follows:
+
+ - [RO] base_pfn: The base PFN (Page Frame Number) of the zone.
+ - [RO] count: Amount of memory in the CMA area.
+ - [RO] order_per_bit: Order of pages represented by one bit.
+ - [RO] bitmap: The bitmap of page states in the zone.
+ - [WO] alloc: Allocate N pages from that CMA area. For example::
+
+	echo 5 > <debugfs>/cma/cma-2/alloc
+
+would try to allocate 5 pages from the cma-2 area.
+
+ - [WO] free: Free N pages from that CMA area, similar to the above.
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 5f61a6c429e0..11db46448354 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -26,6 +26,7 @@ the Linux memory management.
    :maxdepth: 1
 
    concepts
+   cma_debugfs
    hugetlbpage
    idle_page_tracking
    ksm
diff --git a/Documentation/admin-guide/numastat.rst b/Documentation/admin-guide/numastat.rst
new file mode 100644
index 000000000000..aaf1667489f8
--- /dev/null
+++ b/Documentation/admin-guide/numastat.rst
@@ -0,0 +1,30 @@
+===============================
+Numa policy hit/miss statistics
+===============================
+
+/sys/devices/system/node/node*/numastat
+
+All units are pages. Hugepages have separate counters.
+
+=============== ============================================================
+numa_hit	A process wanted to allocate memory from this node,
+		and succeeded.
+
+numa_miss	A process wanted to allocate memory from another node,
+		but ended up with memory from this node.
+
+numa_foreign	A process wanted to allocate on this node,
+		but ended up with memory from another one.
+
+local_node	A process ran on this node and got memory from it.
+
+other_node	A process ran on this node and got memory from another node.
+
+interleave_hit 	Interleaving wanted to allocate from this node
+		and succeeded.
+=============== ============================================================
+
+For easier reading you can use the numastat utility from the numactl package
+(http://oss.sgi.com/projects/libnuma/). Note that it only works
+well right now on machines with a small number of CPUs.
+
diff --git a/Documentation/admin-guide/pnp.rst b/Documentation/admin-guide/pnp.rst
new file mode 100644
index 000000000000..bab2d10631f0
--- /dev/null
+++ b/Documentation/admin-guide/pnp.rst
@@ -0,0 +1,292 @@
+=================================
+Linux Plug and Play Documentation
+=================================
+
+:Author: Adam Belay <ambx1@neo.rr.com>
+:Last updated: Oct. 16, 2002
+
+
+Overview
+--------
+
+Plug and Play provides a means of detecting and setting resources for legacy or
+otherwise unconfigurable devices.  The Linux Plug and Play Layer provides these
+services to compatible drivers.
+
+
+The User Interface
+------------------
+
+The Linux Plug and Play user interface provides a means to activate PnP devices
+for legacy and user level drivers that do not support Linux Plug and Play.  The
+user interface is integrated into sysfs.
+
+In addition to the standard sysfs files, each device's directory contains:
+
+- id - displays a list of supported EISA IDs
+- options - displays possible resource configurations
+- resources - displays currently allocated resources and allows resource changes
+
+Activating a device
+^^^^^^^^^^^^^^^^^^^
+
+::
+
+	# echo "auto" > resources
+
+This invokes the automatic resource configuration system to activate the device.
+
+Manually activating a device
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+	# echo "manual <depnum> <mode>" > resources
+
+	<depnum> - the configuration number
+	<mode> - static or dynamic
+		 static = for next boot
+		 dynamic = now
+
+Disabling a device
+^^^^^^^^^^^^^^^^^^
+
+::
+
+	# echo "disable" > resources
+
+
+EXAMPLE:
+
+Suppose you need to activate the floppy disk controller.
+
+1. change to the proper directory, in my case it is
+   /driver/bus/pnp/devices/00:0f::
+
+	# cd /driver/bus/pnp/devices/00:0f
+	# cat name
+	PC standard floppy disk controller
+
+2. check if the device is already active::
+
+	# cat resources
+	DISABLED
+
+  - Notice the string "DISABLED".  This means the device is not active.
+
+3. check the device's possible configurations (optional)::
+
+	# cat options
+	Dependent: 01 - Priority acceptable
+	    port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding
+	    port 0x3f7-0x3f7, align 0x0, size 0x1, 16-bit address decoding
+	    irq 6
+	    dma 2 8-bit compatible
+	Dependent: 02 - Priority acceptable
+	    port 0x370-0x370, align 0x7, size 0x6, 16-bit address decoding
+	    port 0x377-0x377, align 0x0, size 0x1, 16-bit address decoding
+	    irq 6
+	    dma 2 8-bit compatible
+
+4. now activate the device::
+
+	# echo "auto" > resources
+
+5. finally check if the device is active::
+
+	# cat resources
+	io 0x3f0-0x3f5
+	io 0x3f7-0x3f7
+	irq 6
+	dma 2
+
+There is also a series of kernel parameters::
+
+	pnp_reserve_irq=irq1[,irq2] ....
+	pnp_reserve_dma=dma1[,dma2] ....
+	pnp_reserve_io=io1,size1[,io2,size2] ....
+	pnp_reserve_mem=mem1,size1[,mem2,size2] ....
+
+
+
+The Unified Plug and Play Layer
+-------------------------------
+
+All Plug and Play drivers, protocols, and services meet at a central location
+called the Plug and Play Layer.  This layer is responsible for the exchange of
+information between PnP drivers and PnP protocols.  Thus it automatically
+forwards commands to the proper protocol.  This makes writing PnP drivers
+significantly easier.
+
+The following functions are available from the Plug and Play Layer:
+
+pnp_get_protocol
+  increments the number of uses by one
+
+pnp_put_protocol
+  decrements the number of uses by one
+
+pnp_register_protocol
+  use this to register a new PnP protocol
+
+pnp_unregister_protocol
+  use this function to remove a PnP protocol from the Plug and Play Layer
+
+pnp_register_driver
+  adds a PnP driver to the Plug and Play Layer
+
+  this includes driver model integration
+  returns zero for success or a negative error number for failure; count
+  calls to the .add() method if you need to know how many devices bind to
+  the driver
+
+pnp_unregister_driver
+  removes a PnP driver from the Plug and Play Layer
+
+
+
+Plug and Play Protocols
+-----------------------
+
+This section contains information for PnP protocol developers.
+
+The following Protocols are currently available in the computing world:
+
+- PNPBIOS:
+    used for system devices such as serial and parallel ports.
+- ISAPNP:
+    provides PnP support for the ISA bus
+- ACPI:
+    among its many uses, ACPI provides information about system level
+    devices.
+
+ACPI is meant to replace the PNPBIOS.  It is not currently supported by Linux
+Plug and Play but it is planned to be in the near future.
+
+Requirements for a Linux PnP protocol:
+
+1. the protocol must use EISA IDs
+2. the protocol must inform the PnP Layer of a device's current configuration
+
+- the ability to set resources is optional but preferred.
+
+The following are PnP protocol related functions:
+
+pnp_add_device
+  use this function to add a PnP device to the PnP layer
+
+  only call this function when all wanted values are set in the pnp_dev
+  structure
+
+pnp_init_device
+  call this to initialize the PnP structure
+
+pnp_remove_device
+  call this to remove a device from the Plug and Play Layer.
+  it will fail if the device is still in use.
+  it automatically frees the memory used by the device and related structures
+
+pnp_add_id
+  adds an EISA ID to the list of supported IDs for the specified device
+
+For more information consult the source of a protocol such as
+/drivers/pnp/pnpbios/core.c.
+
+
+
+Linux Plug and Play Drivers
+---------------------------
+
+This section contains information for Linux PnP driver developers.
+
+The New Way
+^^^^^^^^^^^
+
+1. first make a list of supported EISA IDs
+
+   ex::
+
+	static const struct pnp_id pnp_dev_table[] = {
+		/* Standard LPT Printer Port */
+		{.id = "PNP0400", .driver_data = 0},
+		/* ECP Printer Port */
+		{.id = "PNP0401", .driver_data = 0},
+		{.id = ""}
+	};
+
+   Please note that the character 'X' can be used as a wild card in the function
+   portion (last four characters).
+
+   ex::
+
+	/* Unknown PnP modems */
+	{	"PNPCXXX",		UNKNOWN_DEV	},
+
+   Supported PnP card IDs can optionally be defined.
+   ex::
+
+	static const struct pnp_id pnp_card_table[] = {
+		{	"ANYDEVS",		0	},
+		{	"",			0	}
+	};
+
+2. Optionally define probe and remove functions.  It may make sense not to
+   define these functions if the driver already has a reliable method of detecting
+   the resources, such as the parport_pc driver.
+
+   ex::
+
+	static int
+	serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const
+			struct pnp_id *dev_id)
+	{
+	. . .
+
+   ex::
+
+	static void serial_pnp_remove(struct pnp_dev * dev)
+	{
+	. . .
+
+   consult /drivers/serial/8250_pnp.c for more information.
+
+3. create a driver structure
+
+   ex::
+
+	static struct pnp_driver serial_pnp_driver = {
+		.name		= "serial",
+		.card_id_table	= pnp_card_table,
+		.id_table	= pnp_dev_table,
+		.probe		= serial_pnp_probe,
+		.remove		= serial_pnp_remove,
+	};
+
+   * name and id_table cannot be NULL.
+
+4. register the driver
+
+   ex::
+
+	static int __init serial8250_pnp_init(void)
+	{
+		return pnp_register_driver(&serial_pnp_driver);
+	}
+
+The Old Way
+^^^^^^^^^^^
+
+A series of compatibility functions has been created to make it easy to convert
+ISAPNP drivers.  They should serve as a temporary solution only.
+
+They are as follows::
+
+	struct pnp_card *pnp_find_card(unsigned short vendor,
+				       unsigned short device,
+				       struct pnp_card *from)
+
+	struct pnp_dev *pnp_find_dev(struct pnp_card *card,
+				     unsigned short vendor,
+				     unsigned short function,
+				     struct pnp_dev *from)
+
diff --git a/Documentation/admin-guide/rtc.rst b/Documentation/admin-guide/rtc.rst
new file mode 100644
index 000000000000..688c95b11919
--- /dev/null
+++ b/Documentation/admin-guide/rtc.rst
@@ -0,0 +1,140 @@
+=======================================
+Real Time Clock (RTC) Drivers for Linux
+=======================================
+
+When Linux developers talk about a "Real Time Clock", they usually mean
+something that tracks wall clock time and is battery backed so that it
+works even with system power off.  Such clocks will normally not track
+the local time zone or daylight savings time -- unless they dual boot
+with MS-Windows -- but will instead be set to Coordinated Universal Time
+(UTC, formerly "Greenwich Mean Time").
+
+The newest non-PC hardware tends to just count seconds, like the time(2)
+system call reports, but RTCs also very commonly represent time using
+the Gregorian calendar and 24 hour time, as reported by gmtime(3).
+
+Linux has two largely-compatible userspace RTC API families you may
+need to know about:
+
+    *	/dev/rtc ... is the RTC provided by PC compatible systems,
+	so it's not very portable to non-x86 systems.
+
+    *	/dev/rtc0, /dev/rtc1 ... are part of a framework that's
+	supported by a wide variety of RTC chips on all systems.
+
+Programmers need to understand that the PC/AT functionality is not
+always available, and some systems can do much more.  That is, the
+RTCs use the same API to make requests in both RTC frameworks (using
+different filenames of course), but the hardware may not offer the
+same functionality.  For example, not every RTC is hooked up to an
+IRQ, so they can't all issue alarms; and where standard PC RTCs can
+only issue an alarm up to 24 hours in the future, other hardware may
+be able to schedule one any time in the upcoming century.
+
+
+Old PC/AT-Compatible driver:  /dev/rtc
+--------------------------------------
+
+All PCs (even Alpha machines) have a Real Time Clock built into them.
+Usually they are built into the chipset of the computer, but some may
+actually have a Motorola MC146818 (or clone) on the board. This is the
+clock that keeps the date and time while your computer is turned off.
+
+ACPI has standardized that MC146818 functionality, and extended it in
+a few ways (enabling longer alarm periods, and wake-from-hibernate).
+That functionality is NOT exposed in the old driver.
+
+However, the RTC can also be used to generate signals from a slow 2Hz to a
+relatively fast 8192Hz, in increments of powers of two. These signals
+are reported by interrupt number 8. (Oh! So *that* is what IRQ 8 is
+for...) It can also function as a 24hr alarm, raising IRQ 8 when the
+alarm goes off. The alarm can also be programmed to only check any
+subset of the three programmable values, meaning that it could be set to
+ring on the 30th second of the 30th minute of every hour, for example.
+The clock can also be set to generate an interrupt upon every clock
+update, thus generating a 1Hz signal.
+
+The interrupts are reported via /dev/rtc (major 10, minor 135, read-only
+character device) in the form of an unsigned long. The low byte contains
+the type of interrupt (update-done, alarm-rang, or periodic) that was
+raised, and the remaining bytes contain the number of interrupts since
+the last read.  Status information is reported through the pseudo-file
+/proc/driver/rtc if the /proc filesystem was enabled.  The driver has
+built in locking so that only one process is allowed to have the /dev/rtc
+interface open at a time.
+
+A user process can monitor these interrupts by doing a read(2) or a
+select(2) on /dev/rtc -- either will block/stop the user process until
+the next interrupt is received. This is useful for things like
+reasonably high frequency data acquisition where one doesn't want to
+burn up 100% CPU by polling gettimeofday etc. etc.
+
+At high frequencies, or under high loads, the user process should check
+the number of interrupts received since the last read to determine if
+there has been any interrupt "pileup" so to speak. Just for reference, a
+typical 486-33 running a tight read loop on /dev/rtc will start to suffer
+occasional interrupt pileup (i.e. > 1 IRQ event since last read) for
+frequencies above 1024Hz. So you really should check the high bytes
+of the value you read, especially at frequencies above that of the
+normal timer interrupt, which is 100Hz.
+
+Programming and/or enabling interrupt frequencies greater than 64Hz is
+only allowed by root. This is perhaps a bit conservative, but we don't want
+an evil user generating lots of IRQs on a slow 386sx-16, where it might have
+a negative impact on performance. This 64Hz limit can be changed by writing
+a different value to /proc/sys/dev/rtc/max-user-freq. Note that the
+interrupt handler is only a few lines of code to minimize any possibility
+of this effect.
+
+Also, if the kernel time is synchronized with an external source, the
+kernel will write the time back to the CMOS clock every 11 minutes. In
+the process of doing this, the kernel briefly turns off RTC periodic
+interrupts, so be aware of this if you are doing serious work. If you
+don't synchronize the kernel time with an external source (via ntp or
+whatever) then the kernel will keep its hands off the RTC, allowing you
+exclusive access to the device for your applications.
+
+The alarm and/or interrupt frequency are programmed into the RTC via
+various ioctl(2) calls as listed in ./include/linux/rtc.h.
+Rather than write 50 pages describing the ioctl() and so on, it is
+perhaps more useful to point to a small test program that demonstrates
+how to use them, and demonstrates the features of the driver. This is
+probably a lot more useful to people interested in writing applications
+that will be using this driver.  See tools/testing/selftests/rtc/rtctest.c.
+
+(The original /dev/rtc driver was written by Paul Gortmaker.)
+
+
+New portable "RTC Class" drivers:  /dev/rtcN
+--------------------------------------------
+
+Because Linux supports many non-ACPI and non-PC platforms, some of which
+have more than one RTC style clock, it needed a more portable solution
+than expecting a single battery-backed MC146818 clone on every system.
+Accordingly, a new "RTC Class" framework has been defined.  It offers
+three different userspace interfaces:
+
+    *	/dev/rtcN ... much the same as the older /dev/rtc interface
+
+    *	/sys/class/rtc/rtcN ... sysfs attributes support read-only
+	access to some RTC attributes.
+
+    *	/proc/driver/rtc ... the system clock RTC may expose itself
+	using a procfs interface. If there is no RTC for the system clock,
+	rtc0 is used by default. More information is (currently) shown
+	here than through sysfs.
+
+The RTC Class framework supports a wide variety of RTCs, ranging from those
+integrated into embeddable system-on-chip (SOC) processors to discrete chips
+using I2C, SPI, or some other bus to communicate with the host CPU.  There's
+even support for PC-style RTCs ... including the features exposed on newer PCs
+through ACPI.
+
+The new framework also removes the "one RTC per system" restriction.  For
+example, maybe the low-power battery-backed RTC is a discrete I2C chip, but
+a high functionality RTC is integrated into the SOC.  That system might read
+the system clock from the discrete RTC, but use the integrated one for all
+other tasks, because of its greater functionality.
+
+Check out tools/testing/selftests/rtc/rtctest.c for an example usage of the
+ioctl interface.
diff --git a/Documentation/admin-guide/svga.rst b/Documentation/admin-guide/svga.rst
new file mode 100644
index 000000000000..b6c2f9acca92
--- /dev/null
+++ b/Documentation/admin-guide/svga.rst
@@ -0,0 +1,249 @@
+.. include:: <isonum.txt>
+
+=================================
+Video Mode Selection Support 2.13
+=================================
+
+:Copyright: |copy| 1995--1999 Martin Mares, <mj@ucw.cz>
+
+Intro
+~~~~~
+
+This small document describes the "Video Mode Selection" feature which
+allows the use of various special video modes supported by the video BIOS. Due
+to usage of the BIOS, the selection is limited to boot time (before the
+kernel decompression starts) and works only on 80X86 machines.
+
+.. note::
+
+   Short intro for the impatient: Just use vga=ask for the first time,
+   enter ``scan`` on the video mode prompt, pick the mode you want to use,
+   remember its mode ID (the four-digit hexadecimal number) and then
+   set the vga parameter to this number (converted to decimal first).
+
+The video mode to be used is selected by a kernel parameter which can be
+specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..."
+option of LILO (or some other boot loader you use) or by the "vidmode" utility
+(present in standard Linux utility packages). You can use the following values
+of this parameter::
+
+   NORMAL_VGA - Standard 80x25 mode available on all display adapters.
+
+   EXTENDED_VGA	- Standard 8-pixel font mode: 80x43 on EGA, 80x50 on VGA.
+
+   ASK_VGA - Display a video mode menu upon startup (see below).
+
+   0..35 - Menu item number (when you have used the menu to view the list of
+      modes available on your adapter, you can specify the menu item you want
+      to use). 0..9 correspond to "0".."9", 10..35 to "a".."z". Warning: the
+      mode list displayed may vary as the kernel version changes, because the
+      modes are listed in a "first detected -- first displayed" manner. It's
+      better to use absolute mode numbers instead.
+
+   0x.... - Hexadecimal video mode ID (also displayed on the menu, see below
+      for exact meaning of the ID). Warning: rdev and LILO don't support
+      hexadecimal numbers -- you have to convert it to decimal manually.
+
+Menu
+~~~~
+
+The ASK_VGA mode causes the kernel to offer a video mode menu upon
+bootup. It displays a "Press <RETURN> to see video modes available, <SPACE>
+to continue or wait 30 secs" message. If you press <RETURN>, you enter the
+menu, if you press <SPACE> or wait 30 seconds, the kernel will boot up in
+the standard 80x25 mode.
+
+The menu looks like::
+
+	Video adapter: <name-of-detected-video-adapter>
+	Mode:    COLSxROWS:
+	0  0F00  80x25
+	1  0F01  80x50
+	2  0F02  80x43
+	3  0F03  80x28
+	....
+	Enter mode number or "scan": <flashing-cursor-here>
+
+<name-of-detected-video-adapter> tells which video adapter Linux detected
+-- it's either a generic adapter name (MDA, CGA, HGC, EGA, VGA, VESA VGA [a VGA
+with VESA-compliant BIOS]) or a chipset name (e.g., Trident). Direct detection
+of chipsets is turned off by default as it's inherently unreliable due to
+absolutely insane PC design.
+
+"0  0F00  80x25" means that the first menu item (the menu items are numbered
+from "0" to "9" and from "a" to "z") is an 80x25 mode with ID=0x0f00 (see the
+next section for a description of mode IDs).
+
+<flashing-cursor-here> encourages you to enter the item number or mode ID
+you wish to set and press <RETURN>. If the computer complains about
+"Unknown mode ID", it is trying to tell you that it isn't possible to set such
+a mode. It's also possible to press only <RETURN> which leaves the current mode.
+
+The mode list usually contains a few basic modes and some VESA modes.  In
+case your chipset has been detected, some chipset-specific modes are shown as
+well (some of these might be missing or unusable on your machine as different
+BIOSes are often shipped with the same card and the mode numbers depend purely
+on the VGA BIOS).
+
+The modes displayed on the menu are partially sorted: The list starts with
+the standard modes (80x25 and 80x50) followed by "special" modes (80x28 and
+80x43), local modes (if the local modes feature is enabled), VESA modes and
+finally SVGA modes for the auto-detected adapter.
+
+If you are not happy with the mode list offered (e.g., if you think your card
+is able to do more), you can enter "scan" instead of item number / mode ID.  The
+program will try to ask the BIOS for all possible video mode numbers and test
+what happens then. The screen will probably flash wildly for some time and
+strange noises will be heard from inside the monitor and so on and then, really
+all consistent video modes supported by your BIOS will appear (plus maybe some
+``ghost modes``). If you are afraid this could damage your monitor, don't use
+this function.
+
+After scanning, the mode ordering is a bit different: the auto-detected SVGA
+modes are not listed at all and the modes revealed by ``scan`` are shown before
+all VESA modes.
+
+Mode IDs
+~~~~~~~~
+
+Because of the complexity of all the video stuff, the video mode IDs
+used here are also a bit complex. A video mode ID is a 16-bit number usually
+expressed in a hexadecimal notation (starting with "0x"). You can set a mode
+by entering its ID directly if you know it, even if it isn't shown on the menu.
+
+The ID numbers can be divided into the following regions::
+
+   0x0000 to 0x00ff - menu item references. 0x0000 is the first item. Don't use
+	outside the menu as this can change from boot to boot (especially if you
+	have used the ``scan`` feature).
+
+   0x0100 to 0x017f - standard BIOS modes. The ID is a BIOS video mode number
+	(as presented to INT 10, function 00) increased by 0x0100.
+
+   0x0200 to 0x08ff - VESA BIOS modes. The ID is a VESA mode ID increased by
+	0x0100. All VESA modes should be autodetected and shown on the menu.
+
+   0x0900 to 0x09ff - Video7 special modes. Set by calling INT 0x10, AX=0x6f05.
+	(Usually 940=80x43, 941=132x25, 942=132x44, 943=80x60, 944=100x60,
+	945=132x28 for the standard Video7 BIOS)
+
+   0x0f00 to 0x0fff - special modes (they are set by various tricks -- usually
+	by modifying one of the standard modes). Currently available:
+	0x0f00	standard 80x25, don't reset mode if already set (=FFFF)
+	0x0f01	standard with 8-point font: 80x43 on EGA, 80x50 on VGA
+	0x0f02	VGA 80x43 (VGA switched to 350 scanlines with a 8-point font)
+	0x0f03	VGA 80x28 (standard VGA scans, but 14-point font)
+	0x0f04	leave current video mode
+	0x0f05	VGA 80x30 (480 scans, 16-point font)
+	0x0f06	VGA 80x34 (480 scans, 14-point font)
+	0x0f07	VGA 80x60 (480 scans, 8-point font)
+	0x0f08	Graphics hack (see the VIDEO_GFX_HACK paragraph below)
+
+   0x1000 to 0x7fff - modes specified by resolution. The code has a "0xRRCC"
+	form where RR is a number of rows and CC is a number of columns.
+	E.g., 0x1950 corresponds to an 80x25 mode, 0x2b84 to 132x43 etc.
+	This is the only fully portable way to refer to a non-standard mode,
+	but it relies on the mode being found and displayed on the menu
+	(remember that mode scanning is not done automatically).
+
+   0xff00 to 0xffff - aliases for backward compatibility:
+	0xffff	equivalent to 0x0f00 (standard 80x25)
+	0xfffe	equivalent to 0x0f01 (EGA 80x43 or VGA 80x50)
+
+If you add 0x8000 to the mode ID, the program will try to recalculate
+vertical display timing according to mode parameters, which can be used to
+eliminate some annoying bugs of certain VGA BIOSes (usually those used for
+cards with S3 chipsets and old Cirrus Logic BIOSes) -- mainly extra lines at the
+end of the display.
+
+Options
+~~~~~~~
+
+Build options for arch/x86/boot/* are selected by the kernel kconfig
+utility and the kernel .config file.
+
+VIDEO_GFX_HACK - includes a special hack for setting graphics modes
+to be used later by special drivers.
+Allows setting _any_ BIOS mode, including graphic ones, and forcing a specific
+text screen resolution instead of reading it from BIOS variables. Don't use
+unless you think you know what you're doing. To activate this setup, use
+mode number 0x0f08 (see the Mode IDs section above).
+
+Still doesn't work?
+~~~~~~~~~~~~~~~~~~~
+
+When the mode detection doesn't work (e.g., the mode list is incorrect or
+the machine hangs instead of displaying the menu), try to switch off some of
+the configuration options listed under "Options". If it fails, you can still use
+your kernel with the video mode set directly via the kernel parameter.
+
+In either case, please send me a bug report containing what _exactly_
+happens and how the configuration switches affect the behaviour of the bug.
+
+If you start Linux from M$-DOS, you might also use some DOS tools for
+video mode setting. In this case, you must specify the 0x0f04 mode ("leave
+current settings") to Linux, because if you don't and you use any non-standard
+mode, Linux will switch to 80x25 automatically.
+
+If you set some extended mode and there are one or more extra lines on the
+bottom of the display containing already scrolled-out text, your VGA BIOS
+contains the most common video BIOS bug called "incorrect vertical display
+end setting". Adding 0x8000 to the mode ID might fix the problem. Unfortunately,
+this must be done manually -- no autodetection mechanisms are available.
+
+History
+~~~~~~~
+
+=============== ================================================================
+1.0 (??-Nov-95)	First version supporting all adapters supported by the old
+		setup.S + Cirrus Logic 54XX. Present in some 1.3.4? kernels
+		and then removed due to instability on some machines.
+2.0 (28-Jan-96)	Rewritten from scratch. Cirrus Logic 64XX support added, almost
+		everything is configurable, the VESA support should be much more
+		stable, explicit mode numbering allowed, "scan" implemented etc.
+2.1 (30-Jan-96) VESA modes moved to 0x200-0x3ff. Mode selection by resolution
+		supported. Few bugs fixed. VESA modes are listed prior to
+		modes supplied by SVGA autodetection as they are more reliable.
+		CLGD autodetect works better. Doesn't depend on 80x25 being
+		active when started. Scanning fixed. 80x43 (any VGA) added.
+		Code cleaned up.
+2.2 (01-Feb-96)	EGA 80x43 fixed. VESA extended to 0x200-0x4ff (non-standard 02XX
+		VESA modes work now). Display end bug workaround supported.
+		Special modes renumbered to allow adding of the "recalculate"
+		flag, 0xffff and 0xfffe became aliases instead of real IDs.
+		Screen contents retained during mode changes.
+2.3 (15-Mar-96)	Changed to work with 1.3.74 kernel.
+2.4 (18-Mar-96)	Added patches by Hans Lermen fixing a memory overwrite problem
+		with some boot loaders. Memory management rewritten to reflect
+		these changes. Unfortunately, screen contents retaining works
+		only with some loaders now.
+		Added a Tseng 132x60 mode.
+2.5 (19-Mar-96)	Fixed a VESA mode scanning bug introduced in 2.4.
+2.6 (25-Mar-96)	Some VESA BIOS errors not reported -- it fixes error reports on
+		several cards with broken VESA code (e.g., ATI VGA).
+2.7 (09-Apr-96)	- Accepted all VESA modes in range 0x100 to 0x7ff, because some
+		  cards use very strange mode numbers.
+		- Added Realtek VGA modes (thanks to Gonzalo Tornaria).
+		- Hardware testing order slightly changed, tests based on ROM
+		  contents done as first.
+		- Added support for special Video7 mode switching functions
+		  (thanks to Tom Vander Aa).
+		- Added 480-scanline modes (especially useful for notebooks,
+		  original version written by hhanemaa@cs.ruu.nl, patched by
+		  Jeff Chua, rewritten by me).
+		- Screen store/restore fixed.
+2.8 (14-Apr-96) - Previous release was not compilable without CONFIG_VIDEO_SVGA.
+		- Better recognition of text modes during mode scan.
+2.9 (12-May-96)	- Ignored VESA modes 0x80 - 0xff (more VESA BIOS bugs!)
+2.10(11-Nov-96) - The whole thing made optional.
+		- Added the CONFIG_VIDEO_400_HACK switch.
+		- Added the CONFIG_VIDEO_GFX_HACK switch.
+		- Code cleanup.
+2.11(03-May-97) - Yet another cleanup, now including also the documentation.
+		- Direct testing of SVGA adapters turned off by default, ``scan``
+		  offered explicitly on the prompt line.
+		- Removed the doc section describing adding of new probing
+		  functions as I try to get rid of _all_ hardware probing here.
+2.12(25-May-98) Added support for VESA frame buffer graphics.
+2.13(14-May-99) Minor documentation fixes.
+=============== ================================================================
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index a0c1d4ce403a..032c7cd3cede 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -327,7 +327,7 @@ when a hard lockup is detected.
    0 - don't panic on hard lockup
    1 - panic on hard lockup
 
-See Documentation/lockup-watchdogs.txt for more information.  This can
+See Documentation/admin-guide/lockup-watchdogs.rst for more information.  This can
 also be set using the nmi_watchdog kernel parameter.
 
 
diff --git a/Documentation/admin-guide/video-output.rst b/Documentation/admin-guide/video-output.rst
new file mode 100644
index 000000000000..56d6fa2e2368
--- /dev/null
+++ b/Documentation/admin-guide/video-output.rst
@@ -0,0 +1,34 @@
+Video Output Switcher Control
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+2006 luming.yu@intel.com
+
+The output sysfs class driver provides an abstract video output layer that
+can be used to hook platform specific methods to enable/disable video output
+devices through a common sysfs interface. For example, on my IBM ThinkPad T42
+laptop, the ACPI video driver registers its output devices and the read/write
+method for 'state' with the output sysfs class. The user interface under sysfs is::
+
+  linux:/sys/class/video_output # tree .
+  .
+  |-- CRT0
+  |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+  |   |-- state
+  |   |-- subsystem -> ../../../class/video_output
+  |   `-- uevent
+  |-- DVI0
+  |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+  |   |-- state
+  |   |-- subsystem -> ../../../class/video_output
+  |   `-- uevent
+  |-- LCD0
+  |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+  |   |-- state
+  |   |-- subsystem -> ../../../class/video_output
+  |   `-- uevent
+  `-- TV0
+     |-- device -> ../../../devices/pci0000:00/0000:00:01.0
+     |-- state
+     |-- subsystem -> ../../../class/video_output
+     `-- uevent
+
diff --git a/Documentation/auxdisplay/lcd-panel-cgram.rst b/Documentation/auxdisplay/lcd-panel-cgram.rst
deleted file mode 100644
index dfef50286018..000000000000
--- a/Documentation/auxdisplay/lcd-panel-cgram.rst
+++ /dev/null
@@ -1,29 +0,0 @@
-:orphan:
-
-======================================
-Parallel port LCD/Keypad Panel support
-======================================
-
-Some LCDs allow you to define up to 8 characters, mapped to ASCII
-characters 0 to 7. The escape code to define a new character is
-'\e[LG' followed by one digit from 0 to 7, representing the character
-number, and up to 8 pairs of hex digits terminated by a semicolon
-(';'). Each pair of digits represents a line, with 1-bits for each
-illuminated pixel with LSB on the right. Lines are numbered from the
-top of the character to the bottom. On a 5x7 matrix, only the 5 lower
-bits of the first 7 bytes are used for each character. If the string
-is incomplete, only complete lines will be redefined. Here are some
-examples::
-
-  printf "\e[LG0010101050D1F0C04;"  => 0 = [enter]
-  printf "\e[LG1040E1F0000000000;"  => 1 = [up]
-  printf "\e[LG2000000001F0E0400;"  => 2 = [down]
-  printf "\e[LG3040E1F001F0E0400;"  => 3 = [up-down]
-  printf "\e[LG40002060E1E0E0602;"  => 4 = [left]
-  printf "\e[LG500080C0E0F0E0C08;"  => 5 = [right]
-  printf "\e[LG60016051516141400;"  => 6 = "IP"
-
-  printf "\e[LG00103071F1F070301;"  => big speaker
-  printf "\e[LG00002061E1E060200;"  => small speaker
-
-Willy
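Note on the escape sequences in lcd-panel-cgram.rst above: the following is a
minimal Python sketch that builds such a sequence, assuming the format is
exactly as described there (one digit for the character number, then up to
8 two-digit hex rows, then ';'). The helper name cgram_sequence is
hypothetical and is not part of any driver or tool.

```python
ESC = "\x1b"  # the '\e' used in the printf examples


def cgram_sequence(char_index, rows):
    """Build the '\\e[LG' sequence that redefines one custom character.

    char_index: 0..7, the custom character slot.
    rows: up to 8 integers, top row first; bit 0 is the rightmost pixel.
    """
    if not 0 <= char_index <= 7:
        raise ValueError("character number must be 0..7")
    if len(rows) > 8:
        raise ValueError("at most 8 rows can be defined")
    if any(not 0 <= r <= 0xFF for r in rows):
        raise ValueError("each row must fit in one byte")
    # Two uppercase hex digits per row, as in the documented examples.
    body = "".join(f"{r:02X}" for r in rows)
    return f"{ESC}[LG{char_index}{body};"
```

Called with slot 0 and the rows 01 01 01 05 0D 1F 0C 04, this reproduces the
string used for the [enter] glyph in the examples; writing the result to the
LCD device is left to the caller.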
diff --git a/Documentation/btmrvl.txt b/Documentation/btmrvl.txt
deleted file mode 100644
index ec57740ead0c..000000000000
--- a/Documentation/btmrvl.txt
+++ /dev/null
@@ -1,124 +0,0 @@
-=============
-btmrvl driver
-=============
-
-All commands are issued through the debugfs interface.
-
-Set/get driver configurations
-=============================
-
-Path:	/debug/btmrvl/config/
-
-gpiogap=[n], hscfgcmd
-	These commands are used to configure the host sleep parameters::
-	bit 7:0  -- Gap
-	bit 15:8 -- GPIO
-
-	where GPIO is the pin number of GPIO used to wake up the host.
-	It could be any valid GPIO pin# (e.g. 0-7) or 0xff (SDIO interface
-	wakeup will be used instead).
-
-	where Gap is the gap in milliseconds between wakeup signal and
-	wakeup event, or 0xff for special host sleep setting.
-
-	Usage::
-
-		# Use SDIO interface to wake up the host and set GAP to 0x80:
-		echo 0xff80 > /debug/btmrvl/config/gpiogap
-		echo 1 > /debug/btmrvl/config/hscfgcmd
-
-		# Use GPIO pin #3 to wake up the host and set GAP to 0xff:
-		echo 0x03ff >  /debug/btmrvl/config/gpiogap
-		echo 1 > /debug/btmrvl/config/hscfgcmd
-
-psmode=[n], pscmd
-	These commands are used to enable/disable auto sleep mode
-
-	where the option is::
-
-			1 	-- Enable auto sleep mode
-			0 	-- Disable auto sleep mode
-
-	Usage::
-
-		# Enable auto sleep mode
-		echo 1 > /debug/btmrvl/config/psmode
-		echo 1 > /debug/btmrvl/config/pscmd
-
-		# Disable auto sleep mode
-		echo 0 > /debug/btmrvl/config/psmode
-		echo 1 > /debug/btmrvl/config/pscmd
-
-
-hsmode=[n], hscmd
-	These commands are used to enable host sleep or wake up firmware
-
-	where the option is::
-
-			1	-- Enable host sleep
-			0	-- Wake up firmware
-
-	Usage::
-
-		# Enable host sleep
-		echo 1 > /debug/btmrvl/config/hsmode
-		echo 1 > /debug/btmrvl/config/hscmd
-
-		# Wake up firmware
-		echo 0 > /debug/btmrvl/config/hsmode
-		echo 1 > /debug/btmrvl/config/hscmd
-
-
-Get driver status
-=================
-
-Path:	/debug/btmrvl/status/
-
-Usage::
-
-	cat /debug/btmrvl/status/<args>
-
-where the args are:
-
-curpsmode
-	This command displays current auto sleep status.
-
-psstate
-	This command displays the power save state.
-
-hsstate
-	This command displays the host sleep state.
-
-txdnldrdy
-	This command displays the value of Tx download ready flag.
-
-Issuing a raw hci command
-=========================
-
-Use hcitool to issue a raw HCI command; refer to the hcitool manual.
-
-Usage::
-
-	hcitool cmd <ogf> <ocf> [Parameters]
-
-Interface Control Command::
-
-	hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00    --Enable All interface
-	hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01    --Enable Wlan interface
-	hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02    --Enable BT interface
-	hcitool cmd 0x3f 0x5b 0xf5 0x00 0x00    --Disable All interface
-	hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01    --Disable Wlan interface
-	hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02    --Disable BT interface
-
-SD8688 firmware
-===============
-
-Images:
-
-- /lib/firmware/sd8688_helper.bin
-- /lib/firmware/sd8688.bin
-
-
-The images can be downloaded from:
-
-git.infradead.org/users/dwmw2/linux-firmware.git/libertas/
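Note on the gpiogap value in btmrvl.txt above: a minimal Python sketch of how
the two bytes could be packed. The layout (GPIO in the high byte, gap in the
low byte) is inferred from the two usage examples, 0xff80 and 0x03ff, and the
function names are hypothetical.

```python
def gpiogap_value(gpio, gap):
    """Pack a GPIO pin number and a gap (in ms) into one 16-bit word.

    Assumed layout, inferred from the documented examples:
    high byte = GPIO pin (0xff selects SDIO wakeup), low byte = gap.
    """
    if not 0 <= gpio <= 0xFF or not 0 <= gap <= 0xFF:
        raise ValueError("gpio and gap must each fit in one byte")
    return (gpio << 8) | gap


def gpiogap_string(gpio, gap):
    """Format the packed word the way the usage examples write it."""
    return f"0x{gpiogap_value(gpio, gap):04x}"
```

The resulting string is what the examples echo into
/debug/btmrvl/config/gpiogap before triggering hscfgcmd.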
diff --git a/Documentation/clearing-warn-once.txt b/Documentation/clearing-warn-once.txt
deleted file mode 100644
index 211fd926cf00..000000000000
--- a/Documentation/clearing-warn-once.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-Clearing WARN_ONCE
-------------------
-
-WARN_ONCE / WARN_ON_ONCE / printk_once only emit a message once.
-
-echo 1 > /sys/kernel/debug/clear_warn_once
-
-clears the state and allows the warnings to print once again.
-This can be useful after test suite runs to reproduce problems.
diff --git a/Documentation/cma/debugfs.rst b/Documentation/cma/debugfs.rst
deleted file mode 100644
index 518fe401b5ee..000000000000
--- a/Documentation/cma/debugfs.rst
+++ /dev/null
@@ -1,27 +0,0 @@
-:orphan:
-
-=====================
-CMA Debugfs Interface
-=====================
-
-The CMA debugfs interface is useful to retrieve basic information out of the
-different CMA areas and to test allocation/release in each of the areas.
-
-Each CMA zone is represented by a directory under <debugfs>/cma/, indexed by
-the kernel's CMA index. So the first CMA zone would be:
-
-	<debugfs>/cma/cma-0
-
-The structure of the files created under that directory is as follows:
-
- - [RO] base_pfn: The base PFN (Page Frame Number) of the zone.
- - [RO] count: Amount of memory in the CMA area.
- - [RO] order_per_bit: Order of pages represented by one bit.
- - [RO] bitmap: The bitmap of page states in the zone.
- - [WO] alloc: Allocate N pages from that CMA area. For example::
-
-	echo 5 > <debugfs>/cma/cma-2/alloc
-
-would try to allocate 5 pages from the cma-2 area.
-
- - [WO] free: Free N pages from that CMA area, similar to the above.
diff --git a/Documentation/cpu-load.txt b/Documentation/cpu-load.txt
deleted file mode 100644
index 2d01ce43d2a2..000000000000
--- a/Documentation/cpu-load.txt
+++ /dev/null
@@ -1,114 +0,0 @@
-========
-CPU load
-========
-
-Linux exports various bits of information via ``/proc/stat`` and
-``/proc/uptime`` that userland tools, such as top(1), use to calculate
-the average time the system spent in a particular state, for example::
-
-    $ iostat
-    Linux 2.6.18.3-exp (linmac)     02/20/2007
-
-    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
-              10.01    0.00    2.92    5.44    0.00   81.63
-
-    ...
-
-Here the system thinks that over the default sampling period the
-system spent 10.01% of the time doing work in user space, 2.92% in the
-kernel, and was overall 81.63% of the time idle.
-
-In most cases the ``/proc/stat`` information reflects reality quite
-closely; however, due to the nature of how/when the kernel collects
-this data, sometimes it cannot be trusted at all.
-
-So how is this information collected?  Whenever a timer interrupt is
-signalled, the kernel looks at what kind of task was running at that
-moment and increments the counter that corresponds to this task's
-kind/state.  The problem with this is that the system could have
-switched between various states multiple times between two timer
-interrupts yet the counter is incremented only for the last state.
-
-
-Example
--------
-
-If we imagine the system with one task that periodically burns cycles
-in the following manner::
-
-     time line between two timer interrupts
-    |--------------------------------------|
-     ^                                    ^
-     |_ something begins working          |
-                                          |_ something goes to sleep
-                                         (only to be awakened quite soon)
-
-In the above situation the system will be 0% loaded according to
-``/proc/stat`` (since the timer interrupt will always happen when the
-system is executing the idle handler), but in reality the load is
-closer to 99%.
-
-One can imagine many more situations where this behavior of the kernel
-will lead to quite erratic information inside ``/proc/stat``::
-
-
-	/* gcc -o hog smallhog.c */
-	#include <time.h>
-	#include <limits.h>
-	#include <signal.h>
-	#include <sys/time.h>
-	#define HIST 10
-
-	static volatile sig_atomic_t stop;
-
-	static void sighandler (int signr)
-	{
-		(void) signr;
-		stop = 1;
-	}
-	static unsigned long hog (unsigned long niters)
-	{
-		stop = 0;
-		while (!stop && --niters);
-		return niters;
-	}
-	int main (void)
-	{
-		int i;
-		struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
-					.it_value = { .tv_sec = 0, .tv_usec = 1 } };
-		sigset_t set;
-		unsigned long v[HIST];
-		double tmp = 0.0;
-		unsigned long n;
-		signal (SIGALRM, &sighandler);
-		setitimer (ITIMER_REAL, &it, NULL);
-
-		hog (ULONG_MAX);
-		for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
-		for (i = 0; i < HIST; ++i) tmp += v[i];
-		tmp /= HIST;
-		n = tmp - (tmp / 3.0);
-
-		sigemptyset (&set);
-		sigaddset (&set, SIGALRM);
-
-		for (;;) {
-			hog (n);
-			sigwait (&set, &i);
-		}
-		return 0;
-	}
-
-
-References
-----------
-
-- http://lkml.org/lkml/2007/2/12/6
-- Documentation/filesystems/proc.txt (1.8)
-
-
-Thanks
-------
-
-Con Kolivas, Pavel Machek
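Note on cpu-load.txt above: the percentages that tools such as iostat report
can be approximated from two samples of the aggregate "cpu" line in
/proc/stat. The sketch below is only an illustration under stated assumptions:
it uses the first eight counters (user, nice, system, idle, iowait, irq,
softirq, steal, in the order given by proc(5)) and the function names are
hypothetical. Because it relies on the same tick-sampled counters, it inherits
the inaccuracy described in that document.

```python
def parse_cpu_line(line):
    """Return the first eight tick counters from a 'cpu ...' line."""
    fields = line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError("not a cpu line")
    # Columns after the eighth (guest, guest_nice) are skipped because
    # guest time is already accounted in the user column.
    return [int(v) for v in fields[1:9]]


def state_percentages(before, after):
    """Percentage of ticks spent in each state between two samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total == 0:
        return [0.0] * len(deltas)
    return [100.0 * d / total for d in deltas]
```

In practice the two lines would be read from /proc/stat a sampling interval
apart; the task shown in the example section would still appear as close to
0% busy.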
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
deleted file mode 100644
index b90dafcc8237..000000000000
--- a/Documentation/cputopology.txt
+++ /dev/null
@@ -1,177 +0,0 @@
-===========================================
-How CPU topology info is exported via sysfs
-===========================================
-
-Export CPU topology info via sysfs. Items (attributes) are similar
-to /proc/cpuinfo output of some architectures.  They reside in
-/sys/devices/system/cpu/cpuX/topology/:
-
-physical_package_id:
-
-	physical package id of cpuX. Typically corresponds to a physical
-	socket number, but the actual value is architecture and platform
-	dependent.
-
-die_id:
-
-	the CPU die ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's).  The actual value is
-	architecture and platform dependent.
-
-core_id:
-
-	the CPU core ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's).  The actual value is
-	architecture and platform dependent.
-
-book_id:
-
-	the book ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's).	The actual value is
-	architecture and platform dependent.
-
-drawer_id:
-
-	the drawer ID of cpuX. Typically it is the hardware platform's
-	identifier (rather than the kernel's).	The actual value is
-	architecture and platform dependent.
-
-core_cpus:
-
-	internal kernel map of CPUs within the same core.
-	(deprecated name: "thread_siblings")
-
-core_cpus_list:
-
-	human-readable list of CPUs within the same core.
-	(deprecated name: "thread_siblings_list")
-
-package_cpus:
-
-	internal kernel map of the CPUs sharing the same physical_package_id.
-	(deprecated name: "core_siblings")
-
-package_cpus_list:
-
-	human-readable list of CPUs sharing the same physical_package_id.
-	(deprecated name: "core_siblings_list")
-
-die_cpus:
-
-	internal kernel map of CPUs within the same die.
-
-die_cpus_list:
-
-	human-readable list of CPUs within the same die.
-
-book_siblings:
-
-	internal kernel map of cpuX's hardware threads within the same
-	book_id.
-
-book_siblings_list:
-
-	human-readable list of cpuX's hardware threads within the same
-	book_id.
-
-drawer_siblings:
-
-	internal kernel map of cpuX's hardware threads within the same
-	drawer_id.
-
-drawer_siblings_list:
-
-	human-readable list of cpuX's hardware threads within the same
-	drawer_id.
-
-Architecture-neutral, drivers/base/topology.c, exports these attributes.
-However, the book and drawer related sysfs files will only be created if
-CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
-
-CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
-where they reflect the CPU and cache hierarchy.
-
-For an architecture to support this feature, it must define some of
-these macros in include/asm-XXX/topology.h::
-
-	#define topology_physical_package_id(cpu)
-	#define topology_die_id(cpu)
-	#define topology_core_id(cpu)
-	#define topology_book_id(cpu)
-	#define topology_drawer_id(cpu)
-	#define topology_sibling_cpumask(cpu)
-	#define topology_core_cpumask(cpu)
-	#define topology_die_cpumask(cpu)
-	#define topology_book_cpumask(cpu)
-	#define topology_drawer_cpumask(cpu)
-
-The type of ``**_id macros`` is int.
-The type of ``**_cpumask macros`` is ``(const) struct cpumask *``. The latter
-correspond with appropriate ``**_siblings`` sysfs attributes (except for
-topology_sibling_cpumask() which corresponds with thread_siblings).
-
-To be consistent on all architectures, include/linux/topology.h
-provides default definitions for any of the above macros that are
-not defined by include/asm-XXX/topology.h:
-
-1) topology_physical_package_id: -1
-2) topology_die_id: -1
-3) topology_core_id: 0
-4) topology_sibling_cpumask: just the given CPU
-5) topology_core_cpumask: just the given CPU
-6) topology_die_cpumask: just the given CPU
-
-For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
-default definitions for topology_book_id() and topology_book_cpumask().
-For architectures that don't support drawers (CONFIG_SCHED_DRAWER) there are
-no default definitions for topology_drawer_id() and topology_drawer_cpumask().
-
-Additionally, CPU topology information is provided under
-/sys/devices/system/cpu and includes these files.  The internal
-source for the output is in brackets ("[]").
-
-    =========== ==========================================================
-    kernel_max: the maximum CPU index allowed by the kernel configuration.
-		[NR_CPUS-1]
-
-    offline:	CPUs that are not online because they have been
-		HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
-		of CPUs allowed by the kernel configuration (kernel_max
-		above). [~cpu_online_mask + cpus >= NR_CPUS]
-
-    online:	CPUs that are online and being scheduled [cpu_online_mask]
-
-    possible:	CPUs that have been allocated resources and can be
-		brought online if they are present. [cpu_possible_mask]
-
-    present:	CPUs that have been identified as being present in the
-		system. [cpu_present_mask]
-    =========== ==========================================================
-
-The format for the above output is compatible with cpulist_parse()
-[see <linux/cpumask.h>].  Some examples follow.
-
-In this example, there are 64 CPUs in the system but CPUs 32-63 exceed
-the kernel max which is limited to 0..31 by the NR_CPUS config option
-being 32.  Note also that CPUs 2 and 4-31 are not online but could be
-brought online as they are both present and possible::
-
-     kernel_max: 31
-        offline: 2,4-31,32-63
-         online: 0-1,3
-       possible: 0-31
-        present: 0-31
-
-In this example, the NR_CPUS config option is 128, but the kernel was
-started with possible_cpus=144.  There are 4 CPUs in the system and cpu2
-was manually taken offline (and is the only CPU that can be brought
-online)::
-
-     kernel_max: 127
-        offline: 2,4-127,128-143
-         online: 0-1,3
-       possible: 0-127
-        present: 0-3
-
-See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
-as well as more information on the various cpumasks.
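Note on the list format used by the online/offline/possible/present files in
cputopology.txt above: a minimal Python sketch for turning such a string into
a set of CPU numbers. It handles only plain numbers and "a-b" ranges as they
appear in the examples; the function name parse_cpulist is hypothetical and is
not the kernel's cpulist_parse().

```python
def parse_cpulist(text):
    """Parse a list such as '0-1,3' into a set of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means no CPUs in that state
        lo, sep, hi = chunk.partition("-")
        first = int(lo)
        last = int(hi) if sep else first
        if last < first:
            raise ValueError(f"descending range: {chunk!r}")
        cpus.update(range(first, last + 1))
    return cpus
```

For the first example, parsing the "online" value gives CPUs 0, 1 and 3, and
the "offline" value gives the remaining 61 CPUs.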
diff --git a/Documentation/efi-stub.txt b/Documentation/efi-stub.txt
deleted file mode 100644
index 833edb0d0bc4..000000000000
--- a/Documentation/efi-stub.txt
+++ /dev/null
@@ -1,100 +0,0 @@
-=================
-The EFI Boot Stub
-=================
-
-On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade
-as a PE/COFF image, thereby convincing EFI firmware loaders to load
-it as an EFI executable. The code that modifies the bzImage header,
-along with the EFI-specific entry point that the firmware loader
-jumps to are collectively known as the "EFI boot stub", and live in
-arch/x86/boot/header.S and arch/x86/boot/compressed/eboot.c,
-respectively. For ARM the EFI stub is implemented in
-arch/arm/boot/compressed/efi-header.S and
-arch/arm/boot/compressed/efi-stub.c. EFI stub code that is shared
-between architectures is in drivers/firmware/efi/libstub.
-
-For arm64, there is no compressed kernel support, so the Image itself
-masquerades as a PE/COFF image and the EFI stub is linked into the
-kernel. The arm64 EFI stub lives in arch/arm64/kernel/efi-entry.S
-and drivers/firmware/efi/libstub/arm64-stub.c.
-
-By using the EFI boot stub it's possible to boot a Linux kernel
-without the use of a conventional EFI boot loader, such as grub or
-elilo. Since the EFI boot stub performs the jobs of a boot loader, in
-a certain sense it *IS* the boot loader.
-
-The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option.
-
-
-How to install bzImage.efi
---------------------------
-
-The bzImage located in arch/x86/boot/bzImage must be copied to the EFI
-System Partition (ESP) and renamed with the extension ".efi". Without
-the extension the EFI firmware loader will refuse to execute it. It's
-not possible to execute bzImage.efi from the usual Linux file systems
-because EFI firmware doesn't have support for them. For ARM the
-arch/arm/boot/zImage should be copied to the system partition, and it
-may not need to be renamed. Similarly for arm64, arch/arm64/boot/Image
-should be copied but not necessarily renamed.
-
-
-Passing kernel parameters from the EFI shell
---------------------------------------------
-
-Arguments to the kernel can be passed after bzImage.efi, e.g.::
-
-	fs0:> bzImage.efi console=ttyS0 root=/dev/sda4
-
-
-The "initrd=" option
---------------------
-
-Like most boot loaders, the EFI stub allows the user to specify
-multiple initrd files using the "initrd=" option. This is the only EFI
-stub-specific command line parameter; everything else is passed to the
-kernel when it boots.
-
-The path to the initrd file must be an absolute path from the
-beginning of the ESP; relative path names do not work. Also, the path
-is an EFI-style path and directory elements must be separated with
-backslashes (\). For example, given the following directory layout::
-
-  fs0:>
-	Kernels\
-			bzImage.efi
-			initrd-large.img
-
-	Ramdisks\
-			initrd-small.img
-			initrd-medium.img
-
-to boot with the initrd-large.img file if the current working
-directory is fs0:\Kernels, the following command must be used::
-
-	fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img
-
-Notice how bzImage.efi can be specified with a relative path. That's
-because the image we're executing is interpreted by the EFI shell,
-which understands relative paths, whereas the rest of the command line
-is passed to bzImage.efi.
-
-
-The "dtb=" option
------------------
-
-For the ARM and arm64 architectures, a device tree must be provided to
-the kernel. Normally firmware shall supply the device tree via the
-EFI CONFIGURATION TABLE. However, the "dtb=" command line option can
-be used to override the firmware supplied device tree, or to supply
-one when firmware is unable to.
-
-Please note: Firmware adds runtime configuration information to the
-device tree before booting the kernel. If dtb= is used to override
-the device tree, then any runtime data provided by firmware will be
-lost. The dtb= option should only be used either as a debug tool, or
-as a last resort when a device tree is not provided in the EFI
-CONFIGURATION TABLE.
-
-"dtb=" is processed in the same manner as the "initrd=" option that is
-described above.
diff --git a/Documentation/fb/vesafb.rst b/Documentation/fb/vesafb.rst
index 2ed0dfb661cf..6821c87b7893 100644
--- a/Documentation/fb/vesafb.rst
+++ b/Documentation/fb/vesafb.rst
@@ -30,7 +30,7 @@ How to use it?
 ==============
 
 Switching modes is done using the vga=... boot parameter.  Read
-Documentation/svga.txt for details.
+Documentation/admin-guide/svga.rst for details.
 
 You should compile in both vgacon (for text mode) and vesafb (for
 graphics mode). Which of them takes over the console depends on
diff --git a/Documentation/highuid.txt b/Documentation/highuid.txt
deleted file mode 100644
index 6ee70465c0ea..000000000000
--- a/Documentation/highuid.txt
+++ /dev/null
@@ -1,80 +0,0 @@
-===================================================
-Notes on the change from 16-bit UIDs to 32-bit UIDs
-===================================================
-
-:Author: Chris Wing <wingc@umich.edu>
-:Last updated: January 11, 2000
-
-- kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t
-  when communicating between user and kernel space in an ioctl or data
-  structure.
-
-- kernel code should use uid_t and gid_t in kernel-private structures and
-  code.
-
-What's left to be done for 32-bit UIDs on all Linux architectures:
-
-- Disk quotas have an interesting limitation that is not related to the
-  maximum UID/GID. They are limited by the maximum file size on the
-  underlying filesystem, because quota records are written at offsets
-  corresponding to the UID in question.
-  Further investigation is needed to see if the quota system can cope
-  properly with huge UIDs. If it can deal with 64-bit file offsets on all
-  architectures, this should not be a problem.
-
-- Decide whether or not to keep backwards compatibility with the system
-  accounting file, or if we should break it as the comments suggest
-  (currently, the old 16-bit UID and GID are still written to disk, and
-  part of the former pad space is used to store separate 32-bit UID and
-  GID)
-
-- Need to validate that OS emulation calls the 16-bit UID
-  compatibility syscalls, if the OS being emulated used 16-bit UIDs, or
-  uses the 32-bit UID system calls properly otherwise.
-
-  This affects at least:
-
-	- iBCS on Intel
-
-	- sparc32 emulation on sparc64
-	  (need to support whatever new 32-bit UID system calls are added to
-	  sparc32)
-
-- Validate that all filesystems behave properly.
-
-  At present, 32-bit UIDs _should_ work for:
-
-	- ext2
-	- ufs
-	- isofs
-	- nfs
-	- coda
-	- udf
-
-  Ioctl() fixups have been made for:
-
-	- ncpfs
-	- smbfs
-
-  Filesystems with simple fixups to prevent 16-bit UID wraparound:
-
-	- minix
-	- sysv
-	- qnx4
-
-  Other filesystems have not been checked yet.
-
-- The ncpfs and smbfs filesystems cannot presently use 32-bit UIDs in
-  all ioctl()s. Some new ioctl()s have been added with 32-bit UIDs, but
-  more are needed. (as well as new user<->kernel data structures)
-
-- The ELF core dump format only supports 16-bit UIDs on arm, i386, m68k,
-  sh, and sparc32. Fixing this is probably not that important, but would
-  require adding a new ELF section.
-
-- The ioctl()s used to control the in-kernel NFS server only support
-  16-bit UIDs on arm, i386, m68k, sh, and sparc32.
-
-- make sure that the UID mapping feature of AX25 networking works properly
-  (it should be safe because it's always used a 32-bit integer to
-  communicate between user and kernel)
diff --git a/Documentation/hw_random.txt b/Documentation/hw_random.txt
deleted file mode 100644
index 121de96e395e..000000000000
--- a/Documentation/hw_random.txt
+++ /dev/null
@@ -1,105 +0,0 @@
-==========================================================
-Linux support for random number generator in i8xx chipsets
-==========================================================
-
-Introduction
-============
-
-The hw_random framework is software that makes use of a
-special hardware feature on your CPU or motherboard,
-a Random Number Generator (RNG).  The software has two parts:
-a core providing the /dev/hwrng character device and its
-sysfs support, plus a hardware-specific driver that plugs
-into that core.
-
-To make the most effective use of these mechanisms, you
-should download the support software as well.  Download the
-latest version of the "rng-tools" package from the
-hw_random driver's official Web site:
-
-	http://sourceforge.net/projects/gkernel/
-
-Those tools use /dev/hwrng to fill the kernel entropy pool,
-which is used internally and exported by the /dev/urandom and
-/dev/random special files.
-
-Theory of operation
-===================
-
-CHARACTER DEVICE.  Using the standard open()
-and read() system calls, you can read random data from
-the hardware RNG device.  This data is NOT CHECKED by any
-fitness tests, and could potentially be bogus (if the
-hardware is faulty or has been tampered with).  Data is only
-output if the hardware "has-data" flag is set, but nevertheless
-a security-conscious person would run fitness tests on the
-data before assuming it is truly random.
-
-The rng-tools package uses such tests in "rngd", and lets you
-run them by hand with a "rngtest" utility.
-
-/dev/hwrng is char device major 10, minor 183.
-
-CLASS DEVICE.  There is a /sys/class/misc/hw_random node with
-two unique attributes, "rng_available" and "rng_current".  The
-"rng_available" attribute lists the hardware-specific drivers
-available, while "rng_current" lists the one which is currently
-connected to /dev/hwrng.  If your system has more than one
-RNG available, you may change the one used by writing a name from
-the list in "rng_available" into "rng_current".
-
-==========================================================================
-
-
-Hardware driver for Intel/AMD/VIA Random Number Generators (RNG)
-	- Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com>
-	- Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com>
-
-
-About the Intel RNG hardware, from the firmware hub datasheet
-=============================================================
-
-The Firmware Hub integrates a Random Number Generator (RNG)
-using thermal noise generated from inherently random quantum
-mechanical properties of silicon. When not generating new random
-bits the RNG circuitry will enter a low power state. Intel will
-provide a binary software driver to give third party software
-access to our RNG for use as a security feature. At this time,
-the RNG is only to be used with a system in an OS-present state.
-
-Intel RNG Driver notes
-======================
-
-FIXME: support poll(2)
-
-.. note::
-
-	request_mem_region was removed, for three reasons:
-
-	1) Only one RNG is supported by this driver;
-	2) The location used by the RNG is a fixed location in
-	   MMIO-addressable memory;
-	3) users with properly working BIOS e820 handling will always
-	   have the region in which the RNG is located reserved, so
-	   request_mem_region calls always fail for proper setups.
-	   However, for people who use mem=XX, BIOS e820 information is
-	   **not** in /proc/iomem, and request_mem_region(RNG_ADDR) can
-	   succeed.
-
-Driver details
-==============
-
-Based on:
-	Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet
-	May 1999 Order Number: 290658-002 R
-
-Intel 82802 Firmware Hub:
-	Random Number Generator
-	Programmer's Reference Manual
-	December 1999 Order Number: 298029-001 R
-
-Intel 82802 Firmware HUB Random Number Generator Driver
-	Copyright (c) 2000 Matt Sottek <msottek@quiknet.com>
-
-Special thanks to Matt Sottek.  I did the "guts", he
-did the "brains" and all the testing.
diff --git a/Documentation/iostats.txt b/Documentation/iostats.txt
deleted file mode 100644
index 5d63b18bd6d1..000000000000
--- a/Documentation/iostats.txt
+++ /dev/null
@@ -1,197 +0,0 @@
-=====================
-I/O statistics fields
-=====================
-
-Since 2.4.20 (and some versions before, with patches), and 2.5.45,
-more extensive disk statistics have been introduced to help measure disk
-activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do
-the work for you, but in case you are interested in creating your own
-tools, the fields are explained here.
-
-In 2.4, the information is found as additional fields in
-``/proc/partitions``.  In 2.6 and later, the same information is found in two
-places: one is in the file ``/proc/diskstats``, and the other is within
-the sysfs file system, which must be mounted in order to obtain
-the information. Throughout this document we'll assume that sysfs
-is mounted on ``/sys``, although of course it may be mounted anywhere.
-Both ``/proc/diskstats`` and sysfs use the same source for the information
-and so should not differ.
-
-Here are examples of these different formats::
-
-   2.4:
-      3     0   39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
-      3     1    9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030
-
-   2.6+ sysfs:
-      446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
-      35486    38030    38030    38030
-
-   2.6+ diskstats:
-      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
-      3    1   hda1 35486 38030 38030 38030
-
-   4.18+ diskstats:
-      3    0   hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0
-
-On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
-a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
-
-The advantage of one over the other is that the sysfs choice works well
-if you are watching a known, small set of disks.  ``/proc/diskstats`` may
-be a better choice if you are watching a large number of disks because
-you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
-each snapshot of your disk statistics.
-
-In 2.4, the statistics fields are those after the device name. In
-the above example, the first field of statistics would be 446216.
-By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
-find just the eleven fields, beginning with 446216.  If you look at
-``/proc/diskstats``, the eleven fields will be preceded by the major and
-minor device numbers, and device name.  Each of these formats provides
-eleven fields of statistics, each meaning exactly the same things.
-All fields except field 9 are cumulative since boot.  Field 9 should
-go to zero as I/Os complete; all others only increase (unless they
-overflow and wrap).  Yes, these are (32-bit or 64-bit) unsigned long
-(native word size) numbers, and on a very busy or long-lived system they
-may wrap. Applications should be prepared to deal with that; unless
-your observations are measured in large numbers of minutes or hours,
-they should not wrap twice before you notice them.
-
-Each set of stats only applies to the indicated device; if you want
-system-wide stats you'll have to find all the devices and sum them all up.
-
-Field  1 -- # of reads completed
-    This is the total number of reads completed successfully.
-
-Field  2 -- # of reads merged, field 6 -- # of writes merged
-    Reads and writes which are adjacent to each other may be merged for
-    efficiency.  Thus two 4K reads may become one 8K read before it is
-    ultimately handed to the disk, and so it will be counted (and queued)
-    as only one I/O.  This field lets you know how often this was done.
-
-Field  3 -- # of sectors read
-    This is the total number of sectors read successfully.
-
-Field  4 -- # of milliseconds spent reading
-    This is the total number of milliseconds spent by all reads (as
-    measured from __make_request() to end_that_request_last()).
-
-Field  5 -- # of writes completed
-    This is the total number of writes completed successfully.
-
-Field  6 -- # of writes merged
-    See the description of field 2.
-
-Field  7 -- # of sectors written
-    This is the total number of sectors written successfully.
-
-Field  8 -- # of milliseconds spent writing
-    This is the total number of milliseconds spent by all writes (as
-    measured from __make_request() to end_that_request_last()).
-
-Field  9 -- # of I/Os currently in progress
-    The only field that should go to zero. Incremented as requests are
-    given to appropriate struct request_queue and decremented as they finish.
-
-Field 10 -- # of milliseconds spent doing I/Os
-    This field increases so long as field 9 is nonzero.
-
-    Since 5.0 this field counts jiffies when at least one request was
-    started or completed. If a request runs more than 2 jiffies then some
-    I/O time will not be accounted unless there are other requests.
-
-Field 11 -- weighted # of milliseconds spent doing I/Os
-    This field is incremented at each I/O start, I/O completion, I/O
-    merge, or read of these stats by the number of I/Os in progress
-    (field 9) times the number of milliseconds spent doing I/O since the
-    last update of this field.  This can provide an easy measure of both
-    I/O completion time and the backlog that may be accumulating.
-
-Field 12 -- # of discards completed
-    This is the total number of discards completed successfully.
-
-Field 13 -- # of discards merged
-    See the description of field 2.
-
-Field 14 -- # of sectors discarded
-    This is the total number of sectors discarded successfully.
-
-Field 15 -- # of milliseconds spent discarding
-    This is the total number of milliseconds spent by all discards (as
-    measured from __make_request() to end_that_request_last()).
-
-To avoid introducing performance bottlenecks, no locks are held while
-modifying these counters.  This implies that minor inaccuracies may be
-introduced when changes collide, so (for instance) adding up all the
-read I/Os issued per partition should equal those made to the disks ...
-but due to the lack of locking it may only be very close.
-
-In 2.6+, there are counters for each CPU, which make the lack of locking
-almost a non-issue.  When the statistics are read, the per-CPU counters
-are summed (possibly overflowing the unsigned long variable they are
-summed to) and the result given to the user.  There is no convenient
-user interface for accessing the per-CPU counters themselves.
-
-Disks vs Partitions
--------------------
-
-There were significant changes between 2.4 and 2.6+ in the I/O subsystem.
-As a result, some statistic information disappeared. The translation from
-a disk address relative to a partition to the disk address relative to
-the host disk happens much earlier.  All merges and timings now happen
-at the disk level rather than at both the disk and partition level as
-in 2.4.  Consequently, you'll see a different statistics output on 2.6+ for
-partitions from that for disks.  There are only *four* fields available
-for partitions on 2.6+ machines.  This is reflected in the examples above.
-
-Field  1 -- # of reads issued
-    This is the total number of reads issued to this partition.
-
-Field  2 -- # of sectors read
-    This is the total number of sectors requested to be read from this
-    partition.
-
-Field  3 -- # of writes issued
-    This is the total number of writes issued to this partition.
-
-Field  4 -- # of sectors written
-    This is the total number of sectors requested to be written to
-    this partition.
-
-Note that since the address is translated to a disk-relative one, and no
-record of the partition-relative address is kept, the subsequent success
-or failure of the read cannot be attributed to the partition.  In other
-words, the number of reads for partitions is counted slightly before time
-of queuing for partitions, and at completion for whole disks.  This is
-a subtle distinction that is probably uninteresting for most cases.
-
-More significant is the error induced by counting the numbers of
-reads/writes before merges for partitions and after for disks. Since a
-typical workload usually contains a lot of successive and adjacent requests,
-the number of reads/writes issued can be several times higher than the
-number of reads/writes completed.
-
-In 2.6.25, the full statistic set is again available for partitions and
-disk and partition statistics are consistent again. Since we still don't
-keep record of the partition-relative address, an operation is attributed to
-the partition which contains the first sector of the request after the
-eventual merges. As requests can be merged across partitions, this could lead
-to some (probably insignificant) inaccuracy.
-
-Additional notes
-----------------
-
-In 2.6+, sysfs is not mounted by default.  If your distribution of
-Linux hasn't added it already, here's the line you'll want to add to
-your ``/etc/fstab``::
-
-	none /sys sysfs defaults 0 0
-
-
-In 2.6+, all disk statistics were removed from ``/proc/stat``.  In 2.4, they
-appear in both ``/proc/partitions`` and ``/proc/stat``, although the ones in
-``/proc/stat`` take a very different format from those in ``/proc/partitions``
-(see proc(5), if your system has it.)
-
--- ricklind@us.ibm.com
diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt
deleted file mode 100644
index 4f18456dd3b1..000000000000
--- a/Documentation/kernel-per-CPU-kthreads.txt
+++ /dev/null
@@ -1,356 +0,0 @@
-==========================================
-Reducing OS jitter due to per-cpu kthreads
-==========================================
-
-This document lists per-CPU kthreads in the Linux kernel and presents
-options to control their OS jitter.  Note that non-per-CPU kthreads are
-not listed here.  To reduce OS jitter from non-per-CPU kthreads, bind
-them to a "housekeeping" CPU dedicated to such work.
-
-References
-==========
-
--	Documentation/IRQ-affinity.txt:  Binding interrupts to sets of CPUs.
-
--	Documentation/admin-guide/cgroup-v1:  Using cgroups to bind tasks to sets of CPUs.
-
--	man taskset:  Using the taskset command to bind tasks to sets
-	of CPUs.
-
--	man sched_setaffinity:  Using the sched_setaffinity() system
-	call to bind tasks to sets of CPUs.
-
--	/sys/devices/system/cpu/cpuN/online:  Control CPU N's hotplug state,
-	writing "0" to offline and "1" to online.
-
--	In order to locate kernel-generated OS jitter on CPU N:
-
-		cd /sys/kernel/debug/tracing
-		echo 1 > max_graph_depth # Increase the "1" for more detail
-		echo function_graph > current_tracer
-		# run workload
-		cat per_cpu/cpuN/trace
-
-kthreads
-========
-
-Name:
-  ehca_comp/%u
-
-Purpose:
-  Periodically process Infiniband-related work.
-
-To reduce its OS jitter, do any of the following:
-
-1.	Don't use eHCA Infiniband hardware, instead choosing hardware
-	that does not require per-CPU kthreads.  This will prevent these
-	kthreads from being created in the first place.  (This will
-	work for most people, as this hardware, though important, is
-	relatively old and is produced in relatively low unit volumes.)
-2.	Do all eHCA-Infiniband-related work on other CPUs, including
-	interrupts.
-3.	Rework the eHCA driver so that its per-CPU kthreads are
-	provisioned only on selected CPUs.
-
-
-Name:
-  irq/%d-%s
-
-Purpose:
-  Handle threaded interrupts.
-
-To reduce its OS jitter, do the following:
-
-1.	Use irq affinity to force the irq threads to execute on
-	some other CPU.
-
-Name:
-  kcmtpd_ctr_%d
-
-Purpose:
-  Handle Bluetooth work.
-
-To reduce its OS jitter, do one of the following:
-
-1.	Don't use Bluetooth, in which case these kthreads won't be
-	created in the first place.
-2.	Use irq affinity to force Bluetooth-related interrupts to
-	occur on some other CPU and furthermore initiate all
-	Bluetooth activity on some other CPU.
-
-Name:
-  ksoftirqd/%u
-
-Purpose:
-  Execute softirq handlers when threaded or when under heavy load.
-
-To reduce its OS jitter, each softirq vector must be handled
-separately as follows:
-
-TIMER_SOFTIRQ
--------------
-
-Do all of the following:
-
-1.	To the extent possible, keep the CPU out of the kernel when it
-	is non-idle, for example, by avoiding system calls and by forcing
-	both kernel threads and interrupts to execute elsewhere.
-2.	Build with CONFIG_HOTPLUG_CPU=y.  After boot completes, force
-	the CPU offline, then bring it back online.  This forces
-	recurring timers to migrate elsewhere.	If you are concerned
-	with multiple CPUs, force them all offline before bringing the
-	first one back online.  Once you have onlined the CPUs in question,
-	do not offline any other CPUs, because doing so could force the
-	timer back onto one of the CPUs in question.
-
-NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
----------------------------------
-
-Do all of the following:
-
-1.	Force networking interrupts onto other CPUs.
-2.	Initiate any network I/O on other CPUs.
-3.	Once your application has started, prevent CPU-hotplug operations
-	from being initiated from tasks that might run on the CPU to
-	be de-jittered.  (It is OK to force this CPU offline and then
-	bring it back online before you start your application.)
-
-BLOCK_SOFTIRQ
--------------
-
-Do all of the following:
-
-1.	Force block-device interrupts onto some other CPU.
-2.	Initiate any block I/O on other CPUs.
-3.	Once your application has started, prevent CPU-hotplug operations
-	from being initiated from tasks that might run on the CPU to
-	be de-jittered.  (It is OK to force this CPU offline and then
-	bring it back online before you start your application.)
-
-IRQ_POLL_SOFTIRQ
-----------------
-
-Do all of the following:
-
-1.	Force block-device interrupts onto some other CPU.
-2.	Initiate any block I/O and block-I/O polling on other CPUs.
-3.	Once your application has started, prevent CPU-hotplug operations
-	from being initiated from tasks that might run on the CPU to
-	be de-jittered.  (It is OK to force this CPU offline and then
-	bring it back online before you start your application.)
-
-TASKLET_SOFTIRQ
----------------
-
-Do one or more of the following:
-
-1.	Avoid use of drivers that use tasklets.  (Such drivers will contain
-	calls to things like tasklet_schedule().)
-2.	Convert all drivers that you must use from tasklets to workqueues.
-3.	Force interrupts for drivers using tasklets onto other CPUs,
-	and also do I/O involving these drivers on other CPUs.
-
-SCHED_SOFTIRQ
--------------
-
-Do all of the following:
-
-1.	Avoid sending scheduler IPIs to the CPU to be de-jittered,
-	for example, ensure that at most one runnable kthread is present
-	on that CPU.  If a thread that expects to run on the de-jittered
-	CPU awakens, the scheduler will send an IPI that can result in
-	a subsequent SCHED_SOFTIRQ.
-2.	CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
-	is marked as an adaptive-ticks CPU using the "nohz_full="
-	boot parameter.  This reduces the number of scheduler-clock
-	interrupts that the de-jittered CPU receives, minimizing its
-	chances of being selected to do the load balancing work that
-	runs in SCHED_SOFTIRQ context.
-3.	To the extent possible, keep the CPU out of the kernel when it
-	is non-idle, for example, by avoiding system calls and by
-	forcing both kernel threads and interrupts to execute elsewhere.
-	This further reduces the number of scheduler-clock interrupts
-	received by the de-jittered CPU.
-
-HRTIMER_SOFTIRQ
----------------
-
-Do all of the following:
-
-1.	To the extent possible, keep the CPU out of the kernel when it
-	is non-idle.  For example, avoid system calls and force both
-	kernel threads and interrupts to execute elsewhere.
-2.	Build with CONFIG_HOTPLUG_CPU=y.  Once boot completes, force the
-	CPU offline, then bring it back online.  This forces recurring
-	timers to migrate elsewhere.  If you are concerned with multiple
-	CPUs, force them all offline before bringing the first one
-	back online.  Once you have onlined the CPUs in question, do not
-	offline any other CPUs, because doing so could force the timer
-	back onto one of the CPUs in question.
-
-RCU_SOFTIRQ
------------
-
-Do at least one of the following:
-
-1.	Offload callbacks and keep the CPU in either dyntick-idle or
-	adaptive-ticks state by doing all of the following:
-
-	a.	CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
-		de-jittered is marked as an adaptive-ticks CPU using the
-		"nohz_full=" boot parameter.  Bind the rcuo kthreads to
-		housekeeping CPUs, which can tolerate OS jitter.
-	b.	To the extent possible, keep the CPU out of the kernel
-		when it is non-idle, for example, by avoiding system
-		calls and by forcing both kernel threads and interrupts
-		to execute elsewhere.
-
-2.	Enable RCU to do its processing remotely via dyntick-idle by
-	doing all of the following:
-
-	a.	Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
-	b.	Ensure that the CPU goes idle frequently, allowing other
-		CPUs to detect that it has passed through an RCU quiescent
-		state.	If the kernel is built with CONFIG_NO_HZ_FULL=y,
-		userspace execution also allows other CPUs to detect that
-		the CPU in question has passed through a quiescent state.
-	c.	To the extent possible, keep the CPU out of the kernel
-		when it is non-idle, for example, by avoiding system
-		calls and by forcing both kernel threads and interrupts
-		to execute elsewhere.
-
-Name:
-  kworker/%u:%d%s (cpu, id, priority)
-
-Purpose:
-  Execute workqueue requests
-
-To reduce its OS jitter, do any of the following:
-
-1.	Run your workload at a real-time priority, which will allow
-	preempting the kworker daemons.
-2.	A given workqueue can be made visible in the sysfs filesystem
-	by passing WQ_SYSFS to that workqueue's alloc_workqueue().
-	Such a workqueue can be confined to a given subset of the
-	CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
-	files.	The set of WQ_SYSFS workqueues can be displayed using
-	"ls /sys/devices/virtual/workqueue".  That said, the workqueues
-	maintainer would like to caution people against indiscriminately
-	sprinkling WQ_SYSFS across all the workqueues.	The reason for
-	caution is that it is easy to add WQ_SYSFS, but because sysfs is
-	part of the formal user/kernel API, it can be nearly impossible
-	to remove it, even if its addition was a mistake.
-3.	Do any of the following needed to avoid jitter that your
-	application cannot tolerate:
-
-	a.	Build your kernel with CONFIG_SLUB=y rather than
-		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
-		use of each CPU's workqueues to run its cache_reap()
-		function.
-	b.	Avoid using oprofile, thus avoiding OS jitter from
-		wq_sync_buffer().
-	c.	Limit your CPU frequency so that a CPU-frequency
-		governor is not required, possibly enlisting the aid of
-		special heatsinks or other cooling technologies.  If done
-		correctly, and if your CPU architecture permits, you should
-		be able to build your kernel with CONFIG_CPU_FREQ=n to
-		avoid the CPU-frequency governor periodically running
-		on each CPU, including cs_dbs_timer() and od_dbs_timer().
-
-		WARNING:  Please check your CPU specifications to
-		make sure that this is safe on your particular system.
-	d.	As of v3.18, Christoph Lameter's on-demand vmstat workers
-		commit prevents OS jitter due to vmstat_update() on
-		CONFIG_SMP=y systems.  Before v3.18, it is not possible
-		to entirely get rid of the OS jitter, but you can
-		decrease its frequency by writing a large value to
-		/proc/sys/vm/stat_interval.  The default value is HZ,
-		for an interval of one second.	Of course, larger values
-		will make your virtual-memory statistics update more
-		slowly.  Of course, you can also run your workload at
-		a real-time priority, thus preempting vmstat_update(),
-		but if your workload is CPU-bound, this is a bad idea.
-		However, there is an RFC patch from Christoph Lameter
-		(based on an earlier one from Gilad Ben-Yossef) that
-		reduces or even eliminates vmstat overhead for some
-		workloads at https://lkml.org/lkml/2013/9/4/379.
-	e.	Boot with "elevator=noop" to avoid workqueue use by
-		the block layer.
-	f.	If running on high-end powerpc servers, build with
-		CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
-		daemon from running on each CPU every second or so.
-		(This will require editing Kconfig files and will defeat
-		this platform's RAS functionality.)  This avoids jitter
-		due to the rtas_event_scan() function.
-		WARNING:  Please check your CPU specifications to
-		make sure that this is safe on your particular system.
-	g.	If running on Cell Processor, build your kernel with
-		CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
-		spu_gov_work().
-		WARNING:  Please check your CPU specifications to
-		make sure that this is safe on your particular system.
-	h.	If running on PowerMAC, build your kernel with
-		CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
-		avoiding OS jitter from rackmeter_do_timer().
-
-Name:
-  rcuc/%u
-
-Purpose:
-  Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
-
-To reduce its OS jitter, do at least one of the following:
-
-1.	Build the kernel with CONFIG_PREEMPT=n.  This prevents these
-	kthreads from being created in the first place, and also obviates
-	the need for RCU priority boosting.  This approach is feasible
-	for workloads that do not require high degrees of responsiveness.
-2.	Build the kernel with CONFIG_RCU_BOOST=n.  This prevents these
-	kthreads from being created in the first place.  This approach
-	is feasible only if your workload never requires RCU priority
-	boosting, for example, if you ensure frequent idle time on all
-	CPUs that might execute within the kernel.
-3.	Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
-	boot parameter offloading RCU callbacks from all CPUs susceptible
-	to OS jitter.  This approach prevents the rcuc/%u kthreads from
-	having any work to do, so that they are never awakened.
-4.	Ensure that the CPU never enters the kernel, and, in particular,
-	avoid initiating any CPU hotplug operations on this CPU.  This is
-	another way of preventing any callbacks from being queued on the
-	CPU, again preventing the rcuc/%u kthreads from having any work
-	to do.
-
-Name:
-  rcuop/%d and rcuos/%d
-
-Purpose:
-  Offload RCU callbacks from the corresponding CPU.
-
-To reduce its OS jitter, do at least one of the following:
-
-1.	Use affinity, cgroups, or other mechanism to force these kthreads
-	to execute on some other CPU.
-2.	Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
-	kthreads from being created in the first place.  However, please
-	note that this will not eliminate OS jitter, but will instead
-	shift it to RCU_SOFTIRQ.
-
-Name:
-  watchdog/%u
-
-Purpose:
-  Detect software lockups on each CPU.
-
-To reduce its OS jitter, do at least one of the following:
-
-1.	Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
-	kthreads from being created in the first place.
-2.	Boot with "nosoftlockup=0", which will also prevent these kthreads
-	from being created.  Other related watchdog and softlockup boot
-	parameters may be found in Documentation/admin-guide/kernel-parameters.rst
-	and Documentation/watchdog/watchdog-parameters.rst.
-3.	Echo a zero to /proc/sys/kernel/watchdog to disable the
-	watchdog timer.
-4.	Echo a large number to /proc/sys/kernel/watchdog_thresh in
-	order to reduce the frequency of OS jitter due to the watchdog
-	timer down to a level that is acceptable for your workload.
diff --git a/Documentation/ldm.txt b/Documentation/ldm.txt
deleted file mode 100644
index 12c571368e73..000000000000
--- a/Documentation/ldm.txt
+++ /dev/null
@@ -1,121 +0,0 @@
-==========================================
-LDM - Logical Disk Manager (Dynamic Disks)
-==========================================
-
-:Author: Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>.
-:Last Updated: Anton Altaparmakov on 30 March 2007 for Windows Vista.
-
-Overview
---------
-
-Windows 2000, XP, and Vista use a new partitioning scheme.  It is a complete
-replacement for the MSDOS style partitions.  It stores its information in a
-1MiB journalled database at the end of the physical disk.  The size of
-partitions is limited only by disk space.  The maximum number of partitions is
-nearly 2000.
-
-Any partitions created under the LDM are called "Dynamic Disks".  There are no
-longer any primary or extended partitions.  Normal MSDOS style partitions are
-now known as Basic Disks.
-
-If you wish to use Spanned, Striped, Mirrored or RAID 5 Volumes, you must use
-Dynamic Disks.  The journalling allows Windows to make changes to these
-partitions and filesystems without the need to reboot.
-
-Once the LDM driver has divided up the disk, you can use the MD driver to
-assemble any multi-partition volumes, e.g.  Stripes, RAID5.
-
-To prevent legacy applications from repartitioning the disk, the LDM creates a
-dummy MSDOS partition containing one disk-sized partition.  This is what is
-supported with the Linux LDM driver.
-
-A newer approach that has been implemented with Vista is to put LDM on top of a
-GPT label disk.  This is not supported by the Linux LDM driver yet.
-
-
-Example
--------
-
-Below we have a 50MiB disk, divided into seven partitions.
-
-.. note::
-
-   The missing 1MiB at the end of the disk is where the LDM database is
-   stored.
-
-+-------++--------------+---------+-----++--------------+---------+----+
-|Device || Offset Bytes | Sectors | MiB || Size   Bytes | Sectors | MiB|
-+=======++==============+=========+=====++==============+=========+====+
-|hda    ||            0 |       0 |   0 ||     52428800 |  102400 |  50|
-+-------++--------------+---------+-----++--------------+---------+----+
-|hda1   ||     51380224 |  100352 |  49 ||      1048576 |    2048 |   1|
-+-------++--------------+---------+-----++--------------+---------+----+
-|hda2   ||        16384 |      32 |   0 ||      6979584 |   13632 |   6|
-+-------++--------------+---------+-----++--------------+---------+----+
-|hda3   ||      6995968 |   13664 |   6 ||     10485760 |   20480 |  10|
-+-------++--------------+---------+-----++--------------+---------+----+
-|hda4   ||     17481728 |   34144 |  16 ||      4194304 |    8192 |   4|
-+-------++--------------+---------+-----++--------------+---------+----+
-|hda5   ||     21676032 |   42336 |  20 ||      5242880 |   10240 |   5|
-+-------++--------------+---------+-----++--------------+---------+----+
-|hda6   ||     26918912 |   52576 |  25 ||     10485760 |   20480 |  10|
-+-------++--------------+---------+-----++--------------+---------+----+
-|hda7   ||     37404672 |   73056 |  35 ||     13959168 |   27264 |  13|
-+-------++--------------+---------+-----++--------------+---------+----+
-
-The LDM Database may not store the partitions in the order that they appear on
-disk, but the driver will sort them.
-
-When Linux boots, you will see something like::
-
-  hda: 102400 sectors w/32KiB Cache, CHS=50/64/32
-  hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7
-
-
-Compiling LDM Support
----------------------
-
-To enable LDM, choose the following two options: 
-
-  - "Advanced partition selection" CONFIG_PARTITION_ADVANCED
-  - "Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION
-
-If you believe the driver isn't working as it should, you can enable the extra
-debugging code.  This will produce a LOT of output.  The option is:
-
-  - "Windows LDM extra logging" CONFIG_LDM_DEBUG
-
-N.B. The partition code cannot be compiled as a module.
-
-As with all the partition code, if the driver doesn't see signs of its type of
-partition, it will pass control to another driver, so there is no harm in
-enabling it.
-
-If you have Dynamic Disks but don't enable the driver, then all you will see
-is a dummy MSDOS partition filling the whole disk.  You won't be able to mount
-any of the volumes on the disk.
-
-
-Booting
--------
-
-If you enable LDM support, then lilo is capable of booting from any of the
-discovered partitions.  However, grub does not understand the LDM partitioning
-and cannot boot from a Dynamic Disk.
-
-
-More Documentation
-------------------
-
-There is an Overview of the LDM together with complete Technical Documentation.
-It is available for download.
-
-  http://www.linux-ntfs.org/
-
-If you have any LDM questions that aren't answered in the documentation, email
-me.
-
-Cheers,
-    FlatCap - Richard Russon
-    ldm@flatcap.org
-
diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt
deleted file mode 100644
index 290840c160af..000000000000
--- a/Documentation/lockup-watchdogs.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-===============================================================
-Softlockup detector and hardlockup detector (aka nmi_watchdog)
-===============================================================
-
-The Linux kernel can act as a watchdog to detect both soft and hard
-lockups.
-
-A 'softlockup' is defined as a bug that causes the kernel to loop in
-kernel mode for more than 20 seconds (see "Implementation" below for
-details), without giving other tasks a chance to run. The current
-stack trace is displayed upon detection and, by default, the system
-will stay locked up. Alternatively, the kernel can be configured to
-panic; a sysctl, "kernel.softlockup_panic", a kernel parameter,
-"softlockup_panic" (see "Documentation/admin-guide/kernel-parameters.rst" for
-details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
-provided for this.
-
-A 'hardlockup' is defined as a bug that causes the CPU to loop in
-kernel mode for more than 10 seconds (see "Implementation" below for
-details), without letting other interrupts have a chance to run.
-Similarly to the softlockup case, the current stack trace is displayed
-upon detection and the system will stay locked up unless the default
-behavior is changed, which can be done through a sysctl,
-'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
-and a kernel parameter, "nmi_watchdog"
-(see "Documentation/admin-guide/kernel-parameters.rst" for details).
-
-The panic option can be used in combination with panic_timeout (this
-timeout is set through the confusingly named "kernel.panic" sysctl),
-to cause the system to reboot automatically after a specified amount
-of time.
-
-Implementation
-==============
-
-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
-
-A periodic hrtimer runs to generate interrupts and kick the watchdog
-task. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
-
-The watchdog task is a high priority kernel thread that updates a
-timestamp every time it is scheduled. If that timestamp is not updated
-for 2*watchdog_thresh seconds (the softlockup threshold) the
-'softlockup detector' (coded inside the hrtimer callback function)
-will dump useful debug information to the system log, after which it
-will call panic if it was instructed to do so or resume execution of
-other kernel code.
-
-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
-
-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
-
-By default, the watchdog runs on all online cores.  However, on a
-kernel configured with NO_HZ_FULL, by default the watchdog runs only
-on the housekeeping cores, not the cores specified in the "nohz_full"
-boot argument.  If we allowed the watchdog to run by default on
-the "nohz_full" cores, we would have to run timer ticks to activate
-the scheduler, which would prevent the "nohz_full" functionality
-from protecting the user code on those cores from the kernel.
-Of course, disabling it by default on the nohz_full cores means that
-when those cores do enter the kernel, by default we will not be
-able to detect if they lock up.  However, allowing the watchdog
-to continue to run on the housekeeping (non-tickless) cores means
-that we will continue to detect lockups properly on those cores.
-
-In either case, the set of cores excluded from running the watchdog
-may be adjusted via the kernel.watchdog_cpumask sysctl.  For
-nohz_full cores, this may be useful for debugging a case where the
-kernel seems to be hanging on the nohz_full cores.
diff --git a/Documentation/numastat.txt b/Documentation/numastat.txt
deleted file mode 100644
index aaf1667489f8..000000000000
--- a/Documentation/numastat.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-===============================
-Numa policy hit/miss statistics
-===============================
-
-/sys/devices/system/node/node*/numastat
-
-All units are pages. Hugepages have separate counters.
-
-=============== ============================================================
-numa_hit	A process wanted to allocate memory from this node,
-		and succeeded.
-
-numa_miss	A process wanted to allocate memory from another node,
-		but ended up with memory from this node.
-
-numa_foreign	A process wanted to allocate on this node,
-		but ended up with memory from another one.
-
-local_node	A process ran on this node and got memory from it.
-
-other_node	A process ran on this node and got memory from another node.
-
-interleave_hit 	Interleaving wanted to allocate from this node
-		and succeeded.
-=============== ============================================================
-
-For easier reading you can use the numastat utility from the numactl package
-(http://oss.sgi.com/projects/libnuma/). Note that it only works
-well right now on machines with a small number of CPUs.
-
diff --git a/Documentation/pnp.txt b/Documentation/pnp.txt
deleted file mode 100644
index bab2d10631f0..000000000000
--- a/Documentation/pnp.txt
+++ /dev/null
@@ -1,292 +0,0 @@
-=================================
-Linux Plug and Play Documentation
-=================================
-
-:Author: Adam Belay <ambx1@neo.rr.com>
-:Last updated: Oct. 16, 2002
-
-
-Overview
---------
-
-Plug and Play provides a means of detecting and setting resources for legacy or
-otherwise unconfigurable devices.  The Linux Plug and Play Layer provides these 
-services to compatible drivers.
-
-
-The User Interface
-------------------
-
-The Linux Plug and Play user interface provides a means to activate PnP devices
-for legacy and user level drivers that do not support Linux Plug and Play.  The 
-user interface is integrated into sysfs.
-
-In addition to the standard sysfs file the following are created in each
-device's directory:
-- id - displays a list of supported EISA IDs
-- options - displays possible resource configurations
-- resources - displays currently allocated resources and allows resource changes
-
-activating a device
-^^^^^^^^^^^^^^^^^^^
-
-::
-
-	# echo "auto" > resources
-
-this will invoke the automatic resource config system to activate the device
-
-manually activating a device
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-::
-
-	# echo "manual <depnum> <mode>" > resources
-
-	<depnum> - the configuration number
-	<mode> - static or dynamic
-		 static = for next boot
-		 dynamic = now
-
-disabling a device
-^^^^^^^^^^^^^^^^^^
-
-::
-
-	# echo "disable" > resources
-
-
-EXAMPLE:
-
-Suppose you need to activate the floppy disk controller.
-
-1. change to the proper directory, in my case it is
-   /driver/bus/pnp/devices/00:0f::
-
-	# cd /driver/bus/pnp/devices/00:0f
-	# cat name
-	PC standard floppy disk controller
-
-2. check if the device is already active::
-
-	# cat resources
-	DISABLED
-
-  - Notice the string "DISABLED".  This means the device is not active.
-
-3. check the device's possible configurations (optional)::
-
-	# cat options
-	Dependent: 01 - Priority acceptable
-	    port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding
-	    port 0x3f7-0x3f7, align 0x0, size 0x1, 16-bit address decoding
-	    irq 6
-	    dma 2 8-bit compatible
-	Dependent: 02 - Priority acceptable
-	    port 0x370-0x370, align 0x7, size 0x6, 16-bit address decoding
-	    port 0x377-0x377, align 0x0, size 0x1, 16-bit address decoding
-	    irq 6
-	    dma 2 8-bit compatible
-
-4. now activate the device::
-
-	# echo "auto" > resources
-
-5. finally check if the device is active::
-
-	# cat resources
-	io 0x3f0-0x3f5
-	io 0x3f7-0x3f7
-	irq 6
-	dma 2
-
-also there are a series of kernel parameters::
-
-	pnp_reserve_irq=irq1[,irq2] ....
-	pnp_reserve_dma=dma1[,dma2] ....
-	pnp_reserve_io=io1,size1[,io2,size2] ....
-	pnp_reserve_mem=mem1,size1[,mem2,size2] ....
-
-
-
-The Unified Plug and Play Layer
--------------------------------
-
-All Plug and Play drivers, protocols, and services meet at a central location
-called the Plug and Play Layer.  This layer is responsible for the exchange of
-information between PnP drivers and PnP protocols.  Thus it automatically
-forwards commands to the proper protocol.  This makes writing PnP drivers
-significantly easier.
-
-The following functions are available from the Plug and Play Layer:
-
-pnp_get_protocol
-  increments the number of uses by one
-
-pnp_put_protocol
-  decrements the number of uses by one
-
-pnp_register_protocol
-  use this to register a new PnP protocol
-
-pnp_unregister_protocol
-  use this function to remove a PnP protocol from the Plug and Play Layer
-
-pnp_register_driver
-  adds a PnP driver to the Plug and Play Layer
-
-  this includes driver model integration
-  returns zero for success or a negative error number for failure; count
-  calls to the .add() method if you need to know how many devices bind to
-  the driver
-
-pnp_unregister_driver
-  removes a PnP driver from the Plug and Play Layer
-
-
-
-Plug and Play Protocols
------------------------
-
-This section contains information for PnP protocol developers.
-
-The following Protocols are currently available in the computing world:
-
-- PNPBIOS:
-    used for system devices such as serial and parallel ports.
-- ISAPNP:
-    provides PnP support for the ISA bus
-- ACPI:
-    among its many uses, ACPI provides information about system level
-    devices.
-
-ACPI is meant to replace PNPBIOS.  It is not currently supported by Linux
-Plug and Play, but support is planned for the near future.
-
-
-Requirements for a Linux PnP protocol:
-1. the protocol must use EISA IDs
-2. the protocol must inform the PnP Layer of a device's current configuration
-
-- the ability to set resources is optional but preferred.
-
-The following are PnP protocol related functions:
-
-pnp_add_device
-  use this function to add a PnP device to the PnP layer
-
-  only call this function when all wanted values are set in the pnp_dev
-  structure
-
-pnp_init_device
-  call this to initialize the PnP structure
-
-pnp_remove_device
-  call this to remove a device from the Plug and Play Layer.
-  It will fail if the device is still in use.
-  It automatically frees the memory used by the device and related structures.
-
-pnp_add_id
-  adds an EISA ID to the list of supported IDs for the specified device
-
-For more information consult the source of a protocol such as
-/drivers/pnp/pnpbios/core.c.
-
-
-
-Linux Plug and Play Drivers
----------------------------
-
-This section contains information for Linux PnP driver developers.
-
-The New Way
-^^^^^^^^^^^
-
-1. first make a list of supported EISA IDs
-
-   ex::
-
-	static const struct pnp_id pnp_dev_table[] = {
-		/* Standard LPT Printer Port */
-		{.id = "PNP0400", .driver_data = 0},
-		/* ECP Printer Port */
-		{.id = "PNP0401", .driver_data = 0},
-		{.id = ""}
-	};
-
-   Please note that the character 'X' can be used as a wild card in the function
-   portion (last four characters).
-
-   ex::
-
-	/* Unknown PnP modems */
-	{	"PNPCXXX",		UNKNOWN_DEV	},
-
-   Supported PnP card IDs can optionally be defined.
-   ex::
-
-	static const struct pnp_id pnp_card_table[] = {
-		{	"ANYDEVS",		0	},
-		{	"",			0	}
-	};
-
-2. Optionally define probe and remove functions.  It may make sense not to
-   define these functions if the driver already has a reliable method of detecting
-   the resources, such as the parport_pc driver.
-
-   ex::
-
-	static int
-	serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const
-			struct pnp_id *dev_id)
-	{
-	. . .
-
-   ex::
-
-	static void serial_pnp_remove(struct pnp_dev * dev)
-	{
-	. . .
-
-   consult /drivers/serial/8250_pnp.c for more information.
-
-3. create a driver structure
-
-   ex::
-
-	static struct pnp_driver serial_pnp_driver = {
-		.name		= "serial",
-		.card_id_table	= pnp_card_table,
-		.id_table	= pnp_dev_table,
-		.probe		= serial_pnp_probe,
-		.remove		= serial_pnp_remove,
-	};
-
-   * name and id_table cannot be NULL.
-
-4. register the driver
-
-   ex::
-
-	static int __init serial8250_pnp_init(void)
-	{
-		return pnp_register_driver(&serial_pnp_driver);
-	}
-
-The Old Way
-^^^^^^^^^^^
-
-A series of compatibility functions have been created to make it easy to convert
-ISAPNP drivers.  They should serve as a temporary solution only.
-
-They are as follows::
-
-	struct pnp_card *pnp_find_card(unsigned short vendor,
-				       unsigned short device,
-				       struct pnp_card *from)
-
-	struct pnp_dev *pnp_find_dev(struct pnp_card *card,
-				     unsigned short vendor,
-				     unsigned short function,
-				     struct pnp_dev *from)
-
diff --git a/Documentation/rtc.txt b/Documentation/rtc.txt
deleted file mode 100644
index 688c95b11919..000000000000
--- a/Documentation/rtc.txt
+++ /dev/null
@@ -1,140 +0,0 @@
-=======================================
-Real Time Clock (RTC) Drivers for Linux
-=======================================
-
-When Linux developers talk about a "Real Time Clock", they usually mean
-something that tracks wall clock time and is battery backed so that it
-works even with system power off.  Such clocks will normally not track
-the local time zone or daylight savings time -- unless they dual boot
-with MS-Windows -- but will instead be set to Coordinated Universal Time
-(UTC, formerly "Greenwich Mean Time").
-
-The newest non-PC hardware tends to just count seconds, like the time(2)
-system call reports, but RTCs also very commonly represent time using
-the Gregorian calendar and 24 hour time, as reported by gmtime(3).
-
-Linux has two largely-compatible userspace RTC API families you may
-need to know about:
-
-    *	/dev/rtc ... is the RTC provided by PC compatible systems,
-	so it's not very portable to non-x86 systems.
-
-    *	/dev/rtc0, /dev/rtc1 ... are part of a framework that's
-	supported by a wide variety of RTC chips on all systems.
-
-Programmers need to understand that the PC/AT functionality is not
-always available, and some systems can do much more.  That is, the
-RTCs use the same API to make requests in both RTC frameworks (using
-different filenames of course), but the hardware may not offer the
-same functionality.  For example, not every RTC is hooked up to an
-IRQ, so they can't all issue alarms; and where standard PC RTCs can
-only issue an alarm up to 24 hours in the future, other hardware may
-be able to schedule one any time in the upcoming century.
-
-
-Old PC/AT-Compatible driver:  /dev/rtc
---------------------------------------
-
-All PCs (even Alpha machines) have a Real Time Clock built into them.
-Usually they are built into the chipset of the computer, but some may
-actually have a Motorola MC146818 (or clone) on the board. This is the
-clock that keeps the date and time while your computer is turned off.
-
-ACPI has standardized that MC146818 functionality, and extended it in
-a few ways (enabling longer alarm periods, and wake-from-hibernate).
-That functionality is NOT exposed in the old driver.
-
-However, the RTC can also be used to generate signals from a slow 2Hz to a
-relatively fast 8192Hz, in increments of powers of two. These signals
-are reported by interrupt number 8. (Oh! So *that* is what IRQ 8 is
-for...) It can also function as a 24hr alarm, raising IRQ 8 when the
-alarm goes off. The alarm can also be programmed to only check any
-subset of the three programmable values, meaning that it could be set to
-ring on the 30th second of the 30th minute of every hour, for example.
-The clock can also be set to generate an interrupt upon every clock
-update, thus generating a 1Hz signal.
-
-The interrupts are reported via /dev/rtc (major 10, minor 135, read only
-character device) in the form of an unsigned long. The low byte contains
-the type of interrupt (update-done, alarm-rang, or periodic) that was
-raised, and the remaining bytes contain the number of interrupts since
-the last read.  Status information is reported through the pseudo-file
-/proc/driver/rtc if the /proc filesystem was enabled.  The driver has
-built in locking so that only one process is allowed to have the /dev/rtc
-interface open at a time.
-
-A user process can monitor these interrupts by doing a read(2) or a
-select(2) on /dev/rtc -- either will block/stop the user process until
-the next interrupt is received. This is useful for things like
-reasonably high frequency data acquisition where one doesn't want to
-burn up 100% CPU by polling gettimeofday() and the like.
-
-At high frequencies, or under high loads, the user process should check
-the number of interrupts received since the last read to determine if
-there has been any interrupt "pileup" so to speak. Just for reference, a
-typical 486-33 running a tight read loop on /dev/rtc will start to suffer
-occasional interrupt pileup (i.e. > 1 IRQ event since last read) for
-frequencies above 1024Hz. So you really should check the high bytes
-of the value you read, especially at frequencies above that of the
-normal timer interrupt, which is 100Hz.
-
-Programming and/or enabling interrupt frequencies greater than 64Hz is
-only allowed by root. This is perhaps a bit conservative, but we don't want
-an evil user generating lots of IRQs on a slow 386sx-16, where it might have
-a negative impact on performance. This 64Hz limit can be changed by writing
-a different value to /proc/sys/dev/rtc/max-user-freq. Note that the
-interrupt handler is only a few lines of code to minimize any possibility
-of this effect.
-
-Also, if the kernel time is synchronized with an external source, the
-kernel will write the time back to the CMOS clock every 11 minutes. In
-the process of doing this, the kernel briefly turns off RTC periodic
-interrupts, so be aware of this if you are doing serious work. If you
-don't synchronize the kernel time with an external source (via ntp or
-whatever) then the kernel will keep its hands off the RTC, allowing you
-exclusive access to the device for your applications.
-
-The alarm and/or interrupt frequency are programmed into the RTC via
-various ioctl(2) calls as listed in ./include/linux/rtc.h.
-Rather than write 50 pages describing the ioctl() and so on, it is
-perhaps more useful to point to a small test program that demonstrates
-how to use them, and demonstrates the features of the driver. This is
-probably a lot more useful to people interested in writing applications
-that will be using this driver.  See tools/testing/selftests/rtc/rtctest.c.
-
-(The original /dev/rtc driver was written by Paul Gortmaker.)
-
-
-New portable "RTC Class" drivers:  /dev/rtcN
---------------------------------------------
-
-Because Linux supports many non-ACPI and non-PC platforms, some of which
-have more than one RTC style clock, it needed a more portable solution
-than expecting a single battery-backed MC146818 clone on every system.
-Accordingly, a new "RTC Class" framework has been defined.  It offers
-three different userspace interfaces:
-
-    *	/dev/rtcN ... much the same as the older /dev/rtc interface
-
-    *	/sys/class/rtc/rtcN ... sysfs attributes support readonly
-	access to some RTC attributes.
-
-    *	/proc/driver/rtc ... the system clock RTC may expose itself
-	using a procfs interface. If there is no RTC for the system clock,
-	rtc0 is used by default. More information is (currently) shown
-	here than through sysfs.
-
-The RTC Class framework supports a wide variety of RTCs, ranging from those
-integrated into embeddable system-on-chip (SOC) processors to discrete chips
-using I2C, SPI, or some other bus to communicate with the host CPU.  There's
-even support for PC-style RTCs ... including the features exposed on newer PCs
-through ACPI.
-
-The new framework also removes the "one RTC per system" restriction.  For
-example, maybe the low-power battery-backed RTC is a discrete I2C chip, but
-a high functionality RTC is integrated into the SOC.  That system might read
-the system clock from the discrete RTC, but use the integrated one for all
-other tasks, because of its greater functionality.
-
-Check out tools/testing/selftests/rtc/rtctest.c for an example usage of the
-ioctl interface.
diff --git a/Documentation/svga.txt b/Documentation/svga.txt
deleted file mode 100644
index b6c2f9acca92..000000000000
--- a/Documentation/svga.txt
+++ /dev/null
@@ -1,249 +0,0 @@
-.. include:: <isonum.txt>
-
-=================================
-Video Mode Selection Support 2.13
-=================================
-
-:Copyright: |copy| 1995--1999 Martin Mares, <mj@ucw.cz>
-
-Intro
-~~~~~
-
-This small document describes the "Video Mode Selection" feature which
-allows the use of various special video modes supported by the video BIOS. Due
-to usage of the BIOS, the selection is limited to boot time (before the
-kernel decompression starts) and works only on 80X86 machines.
-
-.. note::
-
-   Short intro for the impatient: Just use vga=ask for the first time,
-   enter ``scan`` on the video mode prompt, pick the mode you want to use,
-   remember its mode ID (the four-digit hexadecimal number) and then
-   set the vga parameter to this number (converted to decimal first).
-
-The video mode to be used is selected by a kernel parameter which can be
-specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..."
-option of LILO (or some other boot loader you use) or by the "vidmode" utility
-(present in standard Linux utility packages). You can use the following values
-of this parameter::
-
-   NORMAL_VGA - Standard 80x25 mode available on all display adapters.
-
-   EXTENDED_VGA	- Standard 8-pixel font mode: 80x43 on EGA, 80x50 on VGA.
-
-   ASK_VGA - Display a video mode menu upon startup (see below).
-
-   0..35 - Menu item number (when you have used the menu to view the list of
-      modes available on your adapter, you can specify the menu item you want
-      to use). 0..9 correspond to "0".."9", 10..35 to "a".."z". Warning: the
-      mode list displayed may vary as the kernel version changes, because the
-      modes are listed in a "first detected -- first displayed" manner. It's
-      better to use absolute mode numbers instead.
-
-   0x.... - Hexadecimal video mode ID (also displayed on the menu, see below
-      for exact meaning of the ID). Warning: rdev and LILO don't support
-      hexadecimal numbers -- you have to convert it to decimal manually.
-
-Menu
-~~~~
-
-The ASK_VGA mode causes the kernel to offer a video mode menu upon
-bootup. It displays a "Press <RETURN> to see video modes available, <SPACE>
-to continue or wait 30 secs" message. If you press <RETURN>, you enter the
-menu, if you press <SPACE> or wait 30 seconds, the kernel will boot up in
-the standard 80x25 mode.
-
-The menu looks like::
-
-	Video adapter: <name-of-detected-video-adapter>
-	Mode:    COLSxROWS:
-	0  0F00  80x25
-	1  0F01  80x50
-	2  0F02  80x43
-	3  0F03  80x28
-	....
-	Enter mode number or ``scan``: <flashing-cursor-here>
-
-<name-of-detected-video-adapter> tells which video adapter Linux detected
--- it's either a generic adapter name (MDA, CGA, HGC, EGA, VGA, VESA VGA [a VGA
-with VESA-compliant BIOS]) or a chipset name (e.g., Trident). Direct detection
-of chipsets is turned off by default as it's inherently unreliable due to
-absolutely insane PC design.
-
-"0  0F00  80x25" means that the first menu item (the menu items are numbered
-from "0" to "9" and from "a" to "z") is an 80x25 mode with ID=0x0f00 (see the
-next section for a description of mode IDs).
-
-<flashing-cursor-here> encourages you to enter the item number or mode ID
-you wish to set and press <RETURN>. If the computer complains about
-"Unknown mode ID", it is trying to tell you that it isn't possible to set such
-a mode. It's also possible to press only <RETURN>, which keeps the current mode.
-
-The mode list usually contains a few basic modes and some VESA modes.  In
-case your chipset has been detected, some chipset-specific modes are shown as
-well (some of these might be missing or unusable on your machine as different
-BIOSes are often shipped with the same card and the mode numbers depend purely
-on the VGA BIOS).
-
-The modes displayed on the menu are partially sorted: The list starts with
-the standard modes (80x25 and 80x50) followed by "special" modes (80x28 and
-80x43), local modes (if the local modes feature is enabled), VESA modes and
-finally SVGA modes for the auto-detected adapter.
-
-If you are not happy with the mode list offered (e.g., if you think your card
-is able to do more), you can enter "scan" instead of item number / mode ID.  The
-program will try to ask the BIOS for all possible video mode numbers and test
-what happens then. The screen will probably flash wildly for some time, and
-strange noises may be heard from inside the monitor, and then all the
-consistent video modes supported by your BIOS will appear (plus maybe some
-``ghost modes``). If you are afraid this could damage your monitor, don't use
-this function.
-
-After scanning, the mode ordering is a bit different: the auto-detected SVGA
-modes are not listed at all and the modes revealed by ``scan`` are shown before
-all VESA modes.
-
-Mode IDs
-~~~~~~~~
-
-Because of the complexity of all the video stuff, the video mode IDs
-used here are also a bit complex. A video mode ID is a 16-bit number usually
-expressed in a hexadecimal notation (starting with "0x"). You can set a mode
-by entering its ID directly if you know it, even if it isn't shown on the menu.
-
-The ID numbers can be divided into the following regions::
-
-   0x0000 to 0x00ff - menu item references. 0x0000 is the first item. Don't use
-	outside the menu as this can change from boot to boot (especially if you
-	have used the ``scan`` feature).
-
-   0x0100 to 0x017f - standard BIOS modes. The ID is a BIOS video mode number
-	(as presented to INT 10, function 00) increased by 0x0100.
-
-   0x0200 to 0x08ff - VESA BIOS modes. The ID is a VESA mode ID increased by
-	0x0100. All VESA modes should be autodetected and shown on the menu.
-
-   0x0900 to 0x09ff - Video7 special modes. Set by calling INT 0x10, AX=0x6f05.
-	(Usually 940=80x43, 941=132x25, 942=132x44, 943=80x60, 944=100x60,
-	945=132x28 for the standard Video7 BIOS)
-
-   0x0f00 to 0x0fff - special modes (they are set by various tricks -- usually
-	by modifying one of the standard modes). Currently available:
-	0x0f00	standard 80x25, don't reset mode if already set (=FFFF)
-	0x0f01	standard with 8-point font: 80x43 on EGA, 80x50 on VGA
-	0x0f02	VGA 80x43 (VGA switched to 350 scanlines with an 8-point font)
-	0x0f03	VGA 80x28 (standard VGA scans, but 14-point font)
-	0x0f04	leave current video mode
-	0x0f05	VGA 80x30 (480 scans, 16-point font)
-	0x0f06	VGA 80x34 (480 scans, 14-point font)
-	0x0f07	VGA 80x60 (480 scans, 8-point font)
-	0x0f08	Graphics hack (see the VIDEO_GFX_HACK paragraph below)
-
-   0x1000 to 0x7fff - modes specified by resolution. The code has a "0xRRCC"
-	form where RR is a number of rows and CC is a number of columns.
-	E.g., 0x1950 corresponds to an 80x25 mode, 0x2b84 to 132x43 etc.
-	This is the only fully portable way to refer to a non-standard mode,
-	but it relies on the mode being found and displayed on the menu
-	(remember that mode scanning is not done automatically).
-
-   0xff00 to 0xffff - aliases for backward compatibility:
-	0xffff	equivalent to 0x0f00 (standard 80x25)
-	0xfffe	equivalent to 0x0f01 (EGA 80x43 or VGA 80x50)
-
-If you add 0x8000 to the mode ID, the program will try to recalculate
-vertical display timing according to mode parameters, which can be used to
-eliminate some annoying bugs of certain VGA BIOSes (usually those used for
-cards with S3 chipsets and old Cirrus Logic BIOSes) -- mainly extra lines at the
-end of the display.
-
-Options
-~~~~~~~
-
-Build options for arch/x86/boot/* are selected by the kernel kconfig
-utility and the kernel .config file.
-
-VIDEO_GFX_HACK - includes a special hack for setting graphics modes
-to be used later by special drivers.
-Allows setting _any_ BIOS mode, including graphics ones, and forcing a specific
-text screen resolution instead of reading it from BIOS variables. Don't use
-unless you think you know what you're doing. To activate this setup, use
-mode number 0x0f08 (see the Mode IDs section above).
-
-Still doesn't work?
-~~~~~~~~~~~~~~~~~~~
-
-When the mode detection doesn't work (e.g., the mode list is incorrect or
-the machine hangs instead of displaying the menu), try to switch off some of
-the configuration options listed under "Options". If it fails, you can still use
-your kernel with the video mode set directly via the kernel parameter.
-
-In either case, please send me a bug report containing what _exactly_
-happens and how the configuration switches affect the behaviour of the bug.
-
-If you start Linux from M$-DOS, you might also use some DOS tools for
-video mode setting. In this case, you must specify the 0x0f04 mode ("leave
-current settings") to Linux, because if you don't and you use any non-standard
-mode, Linux will switch to 80x25 automatically.
-
-If you set some extended mode and there are one or more extra lines on the
-bottom of the display containing already scrolled-out text, your VGA BIOS
-contains the most common video BIOS bug called "incorrect vertical display
-end setting". Adding 0x8000 to the mode ID might fix the problem. Unfortunately,
-this must be done manually -- no autodetection mechanisms are available.
-
-History
-~~~~~~~
-
-=============== ================================================================
-1.0 (??-Nov-95)	First version supporting all adapters supported by the old
-		setup.S + Cirrus Logic 54XX. Present in some 1.3.4? kernels
-		and then removed due to instability on some machines.
-2.0 (28-Jan-96)	Rewritten from scratch. Cirrus Logic 64XX support added, almost
-		everything is configurable, the VESA support should be much more
-		stable, explicit mode numbering allowed, "scan" implemented etc.
-2.1 (30-Jan-96) VESA modes moved to 0x200-0x3ff. Mode selection by resolution
-		supported. Few bugs fixed. VESA modes are listed prior to
-		modes supplied by SVGA autodetection as they are more reliable.
-		CLGD autodetect works better. Doesn't depend on 80x25 being
-		active when started. Scanning fixed. 80x43 (any VGA) added.
-		Code cleaned up.
-2.2 (01-Feb-96)	EGA 80x43 fixed. VESA extended to 0x200-0x4ff (non-standard 02XX
-		VESA modes work now). Display end bug workaround supported.
-		Special modes renumbered to allow adding of the "recalculate"
-		flag, 0xffff and 0xfffe became aliases instead of real IDs.
-		Screen contents retained during mode changes.
-2.3 (15-Mar-96)	Changed to work with 1.3.74 kernel.
-2.4 (18-Mar-96)	Added patches by Hans Lermen fixing a memory overwrite problem
-		with some boot loaders. Memory management rewritten to reflect
-		these changes. Unfortunately, screen contents retaining works
-		only with some loaders now.
-		Added a Tseng 132x60 mode.
-2.5 (19-Mar-96)	Fixed a VESA mode scanning bug introduced in 2.4.
-2.6 (25-Mar-96)	Some VESA BIOS errors not reported -- it fixes error reports on
-		several cards with broken VESA code (e.g., ATI VGA).
-2.7 (09-Apr-96)	- Accepted all VESA modes in range 0x100 to 0x7ff, because some
-		  cards use very strange mode numbers.
-		- Added Realtek VGA modes (thanks to Gonzalo Tornaria).
-		- Hardware testing order slightly changed, tests based on ROM
-		  contents done first.
-		- Added support for special Video7 mode switching functions
-		  (thanks to Tom Vander Aa).
-		- Added 480-scanline modes (especially useful for notebooks,
-		  original version written by hhanemaa@cs.ruu.nl, patched by
-		  Jeff Chua, rewritten by me).
-		- Screen store/restore fixed.
-2.8 (14-Apr-96) - Previous release was not compilable without CONFIG_VIDEO_SVGA.
-		- Better recognition of text modes during mode scan.
-2.9 (12-May-96)	- Ignored VESA modes 0x80 - 0xff (more VESA BIOS bugs!)
-2.10(11-Nov-96) - The whole thing made optional.
-		- Added the CONFIG_VIDEO_400_HACK switch.
-		- Added the CONFIG_VIDEO_GFX_HACK switch.
-		- Code cleanup.
-2.11(03-May-97) - Yet another cleanup, now including also the documentation.
-		- Direct testing of SVGA adapters turned off by default, ``scan``
-		  offered explicitly on the prompt line.
-		- Removed the doc section describing adding of new probing
-		  functions as I try to get rid of _all_ hardware probing here.
-2.12(25-May-98) Added support for VESA frame buffer graphics.
-2.13(14-May-99) Minor documentation fixes.
-=============== ================================================================
diff --git a/Documentation/video-output.txt b/Documentation/video-output.txt
deleted file mode 100644
index 56d6fa2e2368..000000000000
--- a/Documentation/video-output.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-Video Output Switcher Control
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-2006 luming.yu@intel.com
-
-The output sysfs class driver provides an abstract video output layer that
-can be used to hook platform specific methods to enable/disable a video output
-device through a common sysfs interface. For example, on my IBM ThinkPad T42
-laptop, the ACPI video driver registered its output devices and read/write
-method for 'state' with the output sysfs class. The user interface under sysfs is::
-
-  linux:/sys/class/video_output # tree .
-  .
-  |-- CRT0
-  |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
-  |   |-- state
-  |   |-- subsystem -> ../../../class/video_output
-  |   `-- uevent
-  |-- DVI0
-  |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
-  |   |-- state
-  |   |-- subsystem -> ../../../class/video_output
-  |   `-- uevent
-  |-- LCD0
-  |   |-- device -> ../../../devices/pci0000:00/0000:00:01.0
-  |   |-- state
-  |   |-- subsystem -> ../../../class/video_output
-  |   `-- uevent
-  `-- TV0
-     |-- device -> ../../../devices/pci0000:00/0000:00:01.0
-     |-- state
-     |-- subsystem -> ../../../class/video_output
-     `-- uevent
-
diff --git a/Documentation/x86/topology.rst b/Documentation/x86/topology.rst
index 8e9704f61017..e29739904e37 100644
--- a/Documentation/x86/topology.rst
+++ b/Documentation/x86/topology.rst
@@ -9,7 +9,7 @@ representation in the kernel. Update/change when doing changes to the
 respective code.
 
 The architecture-agnostic topology definitions are in
-Documentation/cputopology.txt. This file holds x86-specific
+Documentation/admin-guide/cputopology.rst. This file holds x86-specific
 differences/specialities which must not necessarily apply to the generic
 definitions. Thus, the way to read up on Linux topology on x86 is to start
 with the generic one and look at this one in parallel for the x86 specifics.
diff --git a/MAINTAINERS b/MAINTAINERS
index c1593a668f80..570572627fd1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6080,7 +6080,7 @@ M:	Ard Biesheuvel <ard.biesheuvel@linaro.org>
 L:	linux-efi@vger.kernel.org
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
 S:	Maintained
-F:	Documentation/efi-stub.txt
+F:	Documentation/admin-guide/efi-stub.rst
 F:	arch/*/kernel/efi.c
 F:	arch/x86/boot/compressed/eboot.[ch]
 F:	arch/*/include/asm/efi.h
@@ -7088,7 +7088,7 @@ M:	Herbert Xu <herbert@gondor.apana.org.au>
 L:	linux-crypto@vger.kernel.org
 S:	Odd fixes
 F:	Documentation/devicetree/bindings/rng/
-F:	Documentation/hw_random.txt
+F:	Documentation/admin-guide/hw_random.rst
 F:	drivers/char/hw_random/
 F:	include/linux/hw_random.h
 
@@ -9398,7 +9398,7 @@ M:	"Richard Russon (FlatCap)" <ldm@flatcap.org>
 L:	linux-ntfs-dev@lists.sourceforge.net
 W:	http://www.linux-ntfs.org/content/view/19/37/
 S:	Maintained
-F:	Documentation/ldm.txt
+F:	Documentation/admin-guide/ldm.rst
 F:	block/partitions/ldm.*
 
 LSILOGIC MPT FUSION DRIVERS (FC/SAS/SPI)
@@ -12058,7 +12058,7 @@ PARALLEL LCD/KEYPAD PANEL DRIVER
 M:	Willy Tarreau <willy@haproxy.com>
 M:	Ksenija Stanojevic <ksenija.stanojevic@gmail.com>
 S:	Odd Fixes
-F:	Documentation/auxdisplay/lcd-panel-cgram.rst
+F:	Documentation/admin-guide/lcd-panel-cgram.rst
 F:	drivers/auxdisplay/panel.c
 
 PARALLEL PORT SUBSYSTEM
@@ -13476,7 +13476,7 @@ Q:	http://patchwork.ozlabs.org/project/rtc-linux/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux.git
 S:	Maintained
 F:	Documentation/devicetree/bindings/rtc/
-F:	Documentation/rtc.txt
+F:	Documentation/admin-guide/rtc.rst
 F:	drivers/rtc/
 F:	include/linux/rtc.h
 F:	include/uapi/linux/rtc.h
@@ -15306,7 +15306,7 @@ SVGA HANDLING
 M:	Martin Mares <mj@ucw.cz>
 L:	linux-video@atrey.karlin.mff.cuni.cz
 S:	Maintained
-F:	Documentation/svga.txt
+F:	Documentation/admin-guide/svga.rst
 F:	arch/x86/boot/video*
 
 SWIOTLB SUBSYSTEM
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 20afd6077465..600c5ba1af41 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1297,7 +1297,7 @@ config SMP
 	  will run faster if you say N here.
 
 	  See also <file:Documentation/x86/i386/IO-APIC.rst>,
-	  <file:Documentation/lockup-watchdogs.txt> and the SMP-HOWTO available at
+	  <file:Documentation/admin-guide/lockup-watchdogs.rst> and the SMP-HOWTO available at
 	  <http://tldp.org/HOWTO/SMP-HOWTO.html>.
 
 	  If you don't know what to do here, say N.
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 42875ff15671..6d732e451071 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -277,7 +277,7 @@ config SMP
 	  machines, but will use only one CPU of a multiprocessor machine.
 	  On a uniprocessor machine, the kernel will run faster if you say N.
 
-	  See also <file:Documentation/lockup-watchdogs.txt> and the SMP-HOWTO
+	  See also <file:Documentation/admin-guide/lockup-watchdogs.rst> and the SMP-HOWTO
 	  available at <http://www.tldp.org/docs.html#howto>.
 
 	  If you don't know what to do here, say N.
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index c2858ac6a46a..6b1b5941b618 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -679,7 +679,7 @@ config SMP
 	  People using multiprocessor machines who say Y here should also say
 	  Y to "Enhanced Real Time Clock Support", below.
 
-	  See also <file:Documentation/lockup-watchdogs.txt> and the SMP-HOWTO
+	  See also <file:Documentation/admin-guide/lockup-watchdogs.rst> and the SMP-HOWTO
 	  available at <http://www.tldp.org/docs.html#howto>.
 
 	  If you don't know what to do here, say N.
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index e9f5d62e9817..7926a2e11bdc 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -180,7 +180,7 @@ config SMP
 	  Y to "Enhanced Real Time Clock Support", below. The "Advanced Power
 	  Management" code will be disabled if you say Y here.
 
-	  See also <file:Documentation/lockup-watchdogs.txt> and the SMP-HOWTO
+	  See also <file:Documentation/admin-guide/lockup-watchdogs.rst> and the SMP-HOWTO
 	  available at <http://www.tldp.org/docs.html#howto>.
 
 	  If you don't know what to do here, say N.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9505066b7ba3..9e95af666b33 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -402,7 +402,7 @@ config SMP
 	  Management" code will be disabled if you say Y here.
 
 	  See also <file:Documentation/x86/i386/IO-APIC.rst>,
-	  <file:Documentation/lockup-watchdogs.txt> and the SMP-HOWTO available at
+	  <file:Documentation/admin-guide/lockup-watchdogs.rst> and the SMP-HOWTO available at
 	  <http://www.tldp.org/docs.html#howto>.
 
 	  If you don't know what to do here, say N.
@@ -1959,7 +1959,7 @@ config EFI_STUB
           This kernel feature allows a bzImage to be loaded directly
 	  by EFI firmware without the use of a bootloader.
 
-	  See Documentation/efi-stub.txt for more information.
+	  See Documentation/admin-guide/efi-stub.rst for more information.
 
 config EFI_MIXED
 	bool "EFI mixed-mode support"
diff --git a/block/partitions/Kconfig b/block/partitions/Kconfig
index 37b9710cc80a..702689a628f0 100644
--- a/block/partitions/Kconfig
+++ b/block/partitions/Kconfig
@@ -194,7 +194,7 @@ config LDM_PARTITION
 	  Normal partitions are now called Basic Disks under Windows 2000, XP,
 	  and Vista.
 
-	  For a fuller description read <file:Documentation/ldm.txt>.
+	  For a fuller description read <file:Documentation/admin-guide/ldm.rst>.
 
 	  If unsure, say N.
 
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 442403abd73a..3e866885a405 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -291,7 +291,7 @@ config RTC
 	  and set the RTC in an SMP compatible fashion.
 
 	  If you think you have a use for such a device (such as periodic data
-	  sampling), then say Y here, and read <file:Documentation/rtc.txt>
+	  sampling), then say Y here, and read <file:Documentation/admin-guide/rtc.rst>
 	  for details.
 
 	  To compile this driver as a module, choose M here: the
@@ -313,7 +313,7 @@ config JS_RTC
 	  /dev/rtc.
 
 	  If you think you have a use for such a device (such as periodic data
-	  sampling), then say Y here, and read <file:Documentation/rtc.txt>
+	  sampling), then say Y here, and read <file:Documentation/admin-guide/rtc.rst>
 	  for details.
 
 	  To compile this driver as a module, choose M here: the
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 95be7228f327..9044d31ab1a1 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -4,7 +4,7 @@
  * Copyright 2006 Michael Buesch <m@bues.ch>
  * Copyright 2005 (c) MontaVista Software, Inc.
  *
- * Please read Documentation/hw_random.txt for details on use.
+ * Please read Documentation/admin-guide/hw_random.rst for details on use.
  *
  * This software may be used and distributed according to the terms
  * of the GNU General Public License, incorporated herein by reference.
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index c0b93e0ff0c0..8e6dd908da21 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -1,7 +1,7 @@
 /*
 	Hardware Random Number Generator
 
-	Please read Documentation/hw_random.txt for details on use.
+	Please read Documentation/admin-guide/hw_random.rst for details on use.
 
 	----------------------------------------------------------
 	This software may be used and distributed according to the terms
-- 
cgit v1.2.3-55-g7522


From baa293e9544bea71361950d071579f0e4d5713ed Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 27 Jun 2019 15:39:22 -0300
Subject: docs: driver-api: add a series of orphaned documents

There are lots of documents under Documentation/*.txt and a few other
orphan documents elsewhere that belong to the driver-API book.

Move them to their right place.

Reviewed-by: Cornelia Huck <cohuck@redhat.com> # vfio-related parts
Acked-by: Logan Gunthorpe <logang@deltatee.com> # switchtec
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/ABI/removed/sysfs-class-rfkill       |    2 +-
 Documentation/ABI/stable/sysfs-class-rfkill        |    2 +-
 Documentation/ABI/testing/sysfs-class-switchtec    |    2 +-
 Documentation/EDID/howto.rst                       |   58 -
 Documentation/SM501.txt                            |   74 -
 Documentation/admin-guide/kernel-parameters.txt    |    2 +-
 .../admin-guide/laptops/thinkpad-acpi.rst          |    6 +-
 Documentation/bt8xxgpio.txt                        |   62 -
 Documentation/connector/connector.rst              |  156 --
 Documentation/console/console.rst                  |  152 --
 Documentation/dcdbas.txt                           |   99 --
 Documentation/dell_rbu.txt                         |  128 --
 Documentation/driver-api/bt8xxgpio.rst             |   62 +
 Documentation/driver-api/connector.rst             |  156 ++
 Documentation/driver-api/console.rst               |  152 ++
 Documentation/driver-api/dcdbas.rst                |   99 ++
 Documentation/driver-api/dell_rbu.rst              |  128 ++
 Documentation/driver-api/edid.rst                  |   58 +
 Documentation/driver-api/eisa.rst                  |  230 +++
 Documentation/driver-api/index.rst                 |   26 +
 Documentation/driver-api/isa.rst                   |  122 ++
 Documentation/driver-api/isapnp.rst                |   15 +
 Documentation/driver-api/lightnvm-pblk.rst         |   21 +
 Documentation/driver-api/men-chameleon-bus.rst     |  175 ++
 Documentation/driver-api/ntb.rst                   |  236 +++
 Documentation/driver-api/nvmem.rst                 |  189 ++
 Documentation/driver-api/parport-lowlevel.rst      | 1832 ++++++++++++++++++++
 Documentation/driver-api/pti_intel_mid.rst         |  106 ++
 Documentation/driver-api/pwm.rst                   |  165 ++
 Documentation/driver-api/rfkill.rst                |  132 ++
 Documentation/driver-api/sgi-ioc4.rst              |   49 +
 Documentation/driver-api/sm501.rst                 |   74 +
 Documentation/driver-api/smsc_ece1099.rst          |   60 +
 Documentation/driver-api/switchtec.rst             |  102 ++
 Documentation/driver-api/sync_file.rst             |   86 +
 Documentation/driver-api/vfio-mediated-device.rst  |  414 +++++
 Documentation/driver-api/vfio.rst                  |  520 ++++++
 Documentation/driver-api/xillybus.rst              |  379 ++++
 Documentation/driver-api/zorro.rst                 |  104 ++
 Documentation/eisa.txt                             |  230 ---
 Documentation/fb/fbcon.rst                         |    4 +-
 Documentation/isa.txt                              |  122 --
 Documentation/isapnp.txt                           |   15 -
 Documentation/lightnvm/pblk.txt                    |   21 -
 Documentation/men-chameleon-bus.txt                |  175 --
 Documentation/ntb.txt                              |  236 ---
 Documentation/nvmem/nvmem.rst                      |  189 --
 Documentation/parport-lowlevel.txt                 | 1832 --------------------
 Documentation/pti/pti_intel_mid.rst                |  106 --
 Documentation/pwm.txt                              |  165 --
 Documentation/rfkill.txt                           |  132 --
 Documentation/s390/vfio-ccw.rst                    |    6 +-
 Documentation/sgi-ioc4.txt                         |   49 -
 Documentation/smsc_ece1099.txt                     |   60 -
 Documentation/switchtec.txt                        |  102 --
 Documentation/sync_file.txt                        |   86 -
 Documentation/vfio-mediated-device.txt             |  414 -----
 Documentation/vfio.txt                             |  520 ------
 Documentation/w1/w1.netlink                        |    2 +-
 Documentation/xillybus.txt                         |  379 ----
 Documentation/zorro.txt                            |  104 --
 MAINTAINERS                                        |   22 +-
 drivers/dma-buf/Kconfig                            |    2 +-
 drivers/gpio/Kconfig                               |    2 +-
 drivers/gpu/drm/Kconfig                            |    2 +-
 drivers/pci/switch/Kconfig                         |    2 +-
 drivers/platform/x86/Kconfig                       |    4 +-
 drivers/platform/x86/dcdbas.c                      |    2 +-
 drivers/platform/x86/dell_rbu.c                    |    2 +-
 drivers/pnp/isapnp/Kconfig                         |    2 +-
 drivers/tty/Kconfig                                |    2 +-
 drivers/vfio/Kconfig                               |    2 +-
 drivers/vfio/mdev/Kconfig                          |    2 +-
 drivers/w1/Kconfig                                 |    2 +-
 samples/Kconfig                                    |    2 +-
 75 files changed, 5730 insertions(+), 5704 deletions(-)
 delete mode 100644 Documentation/EDID/howto.rst
 delete mode 100644 Documentation/SM501.txt
 delete mode 100644 Documentation/bt8xxgpio.txt
 delete mode 100644 Documentation/connector/connector.rst
 delete mode 100644 Documentation/console/console.rst
 delete mode 100644 Documentation/dcdbas.txt
 delete mode 100644 Documentation/dell_rbu.txt
 create mode 100644 Documentation/driver-api/bt8xxgpio.rst
 create mode 100644 Documentation/driver-api/connector.rst
 create mode 100644 Documentation/driver-api/console.rst
 create mode 100644 Documentation/driver-api/dcdbas.rst
 create mode 100644 Documentation/driver-api/dell_rbu.rst
 create mode 100644 Documentation/driver-api/edid.rst
 create mode 100644 Documentation/driver-api/eisa.rst
 create mode 100644 Documentation/driver-api/isa.rst
 create mode 100644 Documentation/driver-api/isapnp.rst
 create mode 100644 Documentation/driver-api/lightnvm-pblk.rst
 create mode 100644 Documentation/driver-api/men-chameleon-bus.rst
 create mode 100644 Documentation/driver-api/ntb.rst
 create mode 100644 Documentation/driver-api/nvmem.rst
 create mode 100644 Documentation/driver-api/parport-lowlevel.rst
 create mode 100644 Documentation/driver-api/pti_intel_mid.rst
 create mode 100644 Documentation/driver-api/pwm.rst
 create mode 100644 Documentation/driver-api/rfkill.rst
 create mode 100644 Documentation/driver-api/sgi-ioc4.rst
 create mode 100644 Documentation/driver-api/sm501.rst
 create mode 100644 Documentation/driver-api/smsc_ece1099.rst
 create mode 100644 Documentation/driver-api/switchtec.rst
 create mode 100644 Documentation/driver-api/sync_file.rst
 create mode 100644 Documentation/driver-api/vfio-mediated-device.rst
 create mode 100644 Documentation/driver-api/vfio.rst
 create mode 100644 Documentation/driver-api/xillybus.rst
 create mode 100644 Documentation/driver-api/zorro.rst
 delete mode 100644 Documentation/eisa.txt
 delete mode 100644 Documentation/isa.txt
 delete mode 100644 Documentation/isapnp.txt
 delete mode 100644 Documentation/lightnvm/pblk.txt
 delete mode 100644 Documentation/men-chameleon-bus.txt
 delete mode 100644 Documentation/ntb.txt
 delete mode 100644 Documentation/nvmem/nvmem.rst
 delete mode 100644 Documentation/parport-lowlevel.txt
 delete mode 100644 Documentation/pti/pti_intel_mid.rst
 delete mode 100644 Documentation/pwm.txt
 delete mode 100644 Documentation/rfkill.txt
 delete mode 100644 Documentation/sgi-ioc4.txt
 delete mode 100644 Documentation/smsc_ece1099.txt
 delete mode 100644 Documentation/switchtec.txt
 delete mode 100644 Documentation/sync_file.txt
 delete mode 100644 Documentation/vfio-mediated-device.txt
 delete mode 100644 Documentation/vfio.txt
 delete mode 100644 Documentation/xillybus.txt
 delete mode 100644 Documentation/zorro.txt

diff --git a/Documentation/ABI/removed/sysfs-class-rfkill b/Documentation/ABI/removed/sysfs-class-rfkill
index 3ce6231f20b2..9c08c7f98ffb 100644
--- a/Documentation/ABI/removed/sysfs-class-rfkill
+++ b/Documentation/ABI/removed/sysfs-class-rfkill
@@ -1,6 +1,6 @@
 rfkill - radio frequency (RF) connector kill switch support
 
-For details to this subsystem look at Documentation/rfkill.txt.
+For details to this subsystem look at Documentation/driver-api/rfkill.rst.
 
 What:		/sys/class/rfkill/rfkill[0-9]+/claim
 Date:		09-Jul-2007
diff --git a/Documentation/ABI/stable/sysfs-class-rfkill b/Documentation/ABI/stable/sysfs-class-rfkill
index 80151a409d67..5b154f922643 100644
--- a/Documentation/ABI/stable/sysfs-class-rfkill
+++ b/Documentation/ABI/stable/sysfs-class-rfkill
@@ -1,6 +1,6 @@
 rfkill - radio frequency (RF) connector kill switch support
 
-For details to this subsystem look at Documentation/rfkill.txt.
+For details to this subsystem look at Documentation/driver-api/rfkill.rst.
 
 For the deprecated /sys/class/rfkill/*/claim knobs of this interface look in
 Documentation/ABI/removed/sysfs-class-rfkill.
diff --git a/Documentation/ABI/testing/sysfs-class-switchtec b/Documentation/ABI/testing/sysfs-class-switchtec
index 48cb4c15e430..76c7a661a595 100644
--- a/Documentation/ABI/testing/sysfs-class-switchtec
+++ b/Documentation/ABI/testing/sysfs-class-switchtec
@@ -1,6 +1,6 @@
 switchtec - Microsemi Switchtec PCI Switch Management Endpoint
 
-For details on this subsystem look at Documentation/switchtec.txt.
+For details on this subsystem look at Documentation/driver-api/switchtec.rst.
 
 What: 		/sys/class/switchtec
 Date:		05-Jan-2017
diff --git a/Documentation/EDID/howto.rst b/Documentation/EDID/howto.rst
deleted file mode 100644
index 725fd49a88ca..000000000000
--- a/Documentation/EDID/howto.rst
+++ /dev/null
@@ -1,58 +0,0 @@
-:orphan:
-
-====
-EDID
-====
-
-In the good old days when graphics parameters were configured explicitly
-in a file called xorg.conf, even broken hardware could be managed.
-
-Today, with the advent of Kernel Mode Setting, a graphics board is
-either correctly working because all components follow the standards -
-or the computer is unusable, because the screen remains dark after
-booting or it displays the wrong area. Cases when this happens are:
-- The graphics board does not recognize the monitor.
-- The graphics board is unable to detect any EDID data.
-- The graphics board incorrectly forwards EDID data to the driver.
-- The monitor sends no or bogus EDID data.
-- A KVM sends its own EDID data instead of querying the connected monitor.
-Adding the kernel parameter "nomodeset" helps in most cases, but causes
-restrictions later on.
-
-As a remedy for such situations, the kernel configuration item
-CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows to provide an
-individually prepared or corrected EDID data set in the /lib/firmware
-directory from where it is loaded via the firmware interface. The code
-(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
-commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
-1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
-not contain code to create these data. In order to elucidate the origin
-of the built-in binary EDID blobs and to facilitate the creation of
-individual data for a specific misbehaving monitor, commented sources
-and a Makefile environment are given here.
-
-To create binary EDID and C source code files from the existing data
-material, simply type "make".
-
-If you want to create your own EDID file, copy the file 1024x768.S,
-replace the settings with your own data and add a new target to the
-Makefile. Please note that the EDID data structure expects the timing
-values in a different way as compared to the standard X11 format.
-
-X11:
-  HTimings:
-    hdisp hsyncstart hsyncend htotal
-  VTimings:
-    vdisp vsyncstart vsyncend vtotal
-
-EDID::
-
-  #define XPIX hdisp
-  #define XBLANK htotal-hdisp
-  #define XOFFSET hsyncstart-hdisp
-  #define XPULSE hsyncend-hsyncstart
-
-  #define YPIX vdisp
-  #define YBLANK vtotal-vdisp
-  #define YOFFSET vsyncstart-vdisp
-  #define YPULSE vsyncend-vsyncstart
diff --git a/Documentation/SM501.txt b/Documentation/SM501.txt
deleted file mode 100644
index 882507453ba4..000000000000
--- a/Documentation/SM501.txt
+++ /dev/null
@@ -1,74 +0,0 @@
-.. include:: <isonum.txt>
-
-============
-SM501 Driver
-============
-
-:Copyright: |copy| 2006, 2007 Simtec Electronics
-
-The Silicon Motion SM501 multimedia companion chip is a multifunction device
-which may provide numerous interfaces including USB host controller USB gadget,
-asynchronous serial ports, audio functions, and a dual display video interface.
-The device may be connected by PCI or local bus with varying functions enabled.
-
-Core
-----
-
-The core driver in drivers/mfd provides common services for the
-drivers which manage the specific hardware blocks. These services
-include locking for common registers, clock control and resource
-management.
-
-The core registers drivers for both PCI and generic bus based
-chips via the platform device and driver system.
-
-On detection of a device, the core initialises the chip (which may
-be specified by the platform data) and then exports the selected
-peripheral set as platform devices for the specific drivers.
-
-The core re-uses the platform device system as the platform device
-system provides enough features to support the drivers without the
-need to create a new bus-type and the associated code to go with it.
-
-
-Resources
----------
-
-Each peripheral has a view of the device which is implicitly narrowed to
-the specific set of resources that peripheral requires in order to
-function correctly.
-
-The centralised memory allocation allows the driver to ensure that the
-maximum possible resource allocation can be made to the video subsystem
-as this is by-far the most resource-sensitive of the on-chip functions.
-
-The primary issue with memory allocation is that of moving the video
-buffers once a display mode is chosen. Indeed when a video mode change
-occurs the memory footprint of the video subsystem changes.
-
-Since video memory is difficult to move without changing the display
-(unless sufficient contiguous memory can be provided for the old and new
-modes simultaneously) the video driver fully utilises the memory area
-given to it by aligning fb0 to the start of the area and fb1 to the end
-of it. Any memory left over in the middle is used for the acceleration
-functions, which are transient and thus their location is less critical
-as it can be moved.
-
-
-Configuration
--------------
-
-The platform device driver uses a set of platform data to pass
-configurations through to the core and the subsidiary drivers
-so that there can be support for more than one system carrying
-an SM501 built into a single kernel image.
-
-The PCI driver assumes that the PCI card behaves as per the Silicon
-Motion reference design.
-
-There is an errata (AB-5) affecting the selection of the
-of the M1XCLK and M1CLK frequencies. These two clocks
-must be sourced from the same PLL, although they can then
-be divided down individually. If this is not set, then SM501 may
-lock and hang the whole system. The driver will refuse to
-attach if the PLL selection is different.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 19b1e3bef56c..04f7b537ee51 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -930,7 +930,7 @@
 			edid/1680x1050.bin, or edid/1920x1080.bin is given
 			and no file with the same name exists. Details and
 			instructions how to build your own EDID data are
-			available in Documentation/EDID/howto.rst. An EDID
+			available in Documentation/driver-api/edid.rst. An EDID
 			data set will only be used for a particular connector,
 			if its name and a colon are prepended to the EDID
 			name. Each connector may use a unique EDID data
diff --git a/Documentation/admin-guide/laptops/thinkpad-acpi.rst b/Documentation/admin-guide/laptops/thinkpad-acpi.rst
index 19d52fc3c5e9..adea0bf2acc5 100644
--- a/Documentation/admin-guide/laptops/thinkpad-acpi.rst
+++ b/Documentation/admin-guide/laptops/thinkpad-acpi.rst
@@ -643,7 +643,7 @@ Sysfs notes
 	2010.
 
 	rfkill controller switch "tpacpi_bluetooth_sw": refer to
-	Documentation/rfkill.txt for details.
+	Documentation/driver-api/rfkill.rst for details.
 
 
 Video output control -- /proc/acpi/ibm/video
@@ -1406,7 +1406,7 @@ Sysfs notes
 	2010.
 
 	rfkill controller switch "tpacpi_wwan_sw": refer to
-	Documentation/rfkill.txt for details.
+	Documentation/driver-api/rfkill.rst for details.
 
 
 EXPERIMENTAL: UWB
@@ -1426,7 +1426,7 @@ Sysfs notes
 ^^^^^^^^^^^
 
 	rfkill controller switch "tpacpi_uwb_sw": refer to
-	Documentation/rfkill.txt for details.
+	Documentation/driver-api/rfkill.rst for details.
 
 Adaptive keyboard
 -----------------
diff --git a/Documentation/bt8xxgpio.txt b/Documentation/bt8xxgpio.txt
deleted file mode 100644
index a845feb074de..000000000000
--- a/Documentation/bt8xxgpio.txt
+++ /dev/null
@@ -1,62 +0,0 @@
-===================================================================
-A driver for a selfmade cheap BT8xx based PCI GPIO-card (bt8xxgpio)
-===================================================================
-
-For advanced documentation, see http://www.bu3sch.de/btgpio.php
-
-A generic digital 24-port PCI GPIO card can be built out of an ordinary
-Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
-Brooktree chip is used in old analog Hauppauge WinTV PCI cards. You can easily
-find them used for low prices on the net.
-
-The bt8xx chip does have 24 digital GPIO ports.
-These ports are accessible via 24 pins on the SMD chip package.
-
-
-How to physically access the GPIO pins
-======================================
-
-The are several ways to access these pins. One might unsolder the whole chip
-and put it on a custom PCI board, or one might only unsolder each individual
-GPIO pin and solder that to some tiny wire. As the chip package really is tiny
-there are some advanced soldering skills needed in any case.
-
-The physical pinouts are drawn in the following ASCII art.
-The GPIO pins are marked with G00-G23::
-
-                                           G G G G G G G G G G G G     G G G G G G
-                                           0 0 0 0 0 0 0 0 0 0 1 1     1 1 1 1 1 1
-                                           0 1 2 3 4 5 6 7 8 9 0 1     2 3 4 5 6 7
-           | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-           ---------------------------------------------------------------------------
-         --|                               ^                                     ^   |--
-         --|                               pin 86                           pin 67   |--
-         --|                                                                         |--
-         --|                                                               pin 61 >  |-- G18
-         --|                                                                         |-- G19
-         --|                                                                         |-- G20
-         --|                                                                         |-- G21
-         --|                                                                         |-- G22
-         --|                                                               pin 56 >  |-- G23
-         --|                                                                         |--
-         --|                           Brooktree 878/879                             |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|                                                                         |--
-         --|   O                                                                     |--
-         --|                                                                         |--
-           ---------------------------------------------------------------------------
-           | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
-           ^
-           This is pin 1
-
diff --git a/Documentation/connector/connector.rst b/Documentation/connector/connector.rst
deleted file mode 100644
index 24e26dc22dbf..000000000000
--- a/Documentation/connector/connector.rst
+++ /dev/null
@@ -1,156 +0,0 @@
-:orphan:
-
-================
-Kernel Connector
-================
-
-Kernel connector - new netlink based userspace <-> kernel space easy
-to use communication module.
-
-The Connector driver makes it easy to connect various agents using a
-netlink based network.  One must register a callback and an identifier.
-When the driver receives a special netlink message with the appropriate
-identifier, the appropriate callback will be called.
-
-From the userspace point of view it's quite straightforward:
-
-	- socket();
-	- bind();
-	- send();
-	- recv();
-
-But if kernelspace wants to use the full power of such connections, the
-driver writer must create special sockets, must know about struct sk_buff
-handling, etc...  The Connector driver allows any kernelspace agents to use
-netlink based networking for inter-process communication in a significantly
-easier way::
-
-  int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
-  void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask);
-  void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask);
-
-  struct cb_id
-  {
-	__u32			idx;
-	__u32			val;
-  };
-
-idx and val are unique identifiers which must be registered in the
-connector.h header for in-kernel usage.  `void (*callback) (void *)` is a
-callback function which will be called when a message with above idx.val
-is received by the connector core.  The argument for that function must
-be dereferenced to `struct cn_msg *`::
-
-  struct cn_msg
-  {
-	struct cb_id		id;
-
-	__u32			seq;
-	__u32			ack;
-
-	__u32			len;	/* Length of the following data */
-	__u8			data[0];
-  };
-
-Connector interfaces
-====================
-
- .. kernel-doc:: include/linux/connector.h
-
- Note:
-   When registering new callback user, connector core assigns
-   netlink group to the user which is equal to its id.idx.
-
-Protocol description
-====================
-
-The current framework offers a transport layer with fixed headers.  The
-recommended protocol which uses such a header is as following:
-
-msg->seq and msg->ack are used to determine message genealogy.  When
-someone sends a message, they use a locally unique sequence and random
-acknowledge number.  The sequence number may be copied into
-nlmsghdr->nlmsg_seq too.
-
-The sequence number is incremented with each message sent.
-
-If you expect a reply to the message, then the sequence number in the
-received message MUST be the same as in the original message, and the
-acknowledge number MUST be the same + 1.
-
-If we receive a message and its sequence number is not equal to one we
-are expecting, then it is a new message.  If we receive a message and
-its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the sequence number in the original
-message + 1, then it is a new message.
-
-Obviously, the protocol header contains the above id.
-
-The connector allows event notification in the following form: kernel
-driver or userspace process can ask connector to notify it when
-selected ids will be turned on or off (registered or unregistered its
-callback).  It is done by sending a special command to the connector
-driver (it also registers itself with id={-1, -1}).
-
-As example of this usage can be found in the cn_test.c module which
-uses the connector to request notification and to send messages.
-
-Reliability
-===========
-
-Netlink itself is not a reliable protocol.  That means that messages can
-be lost due to memory pressure or process' receiving queue overflowed,
-so caller is warned that it must be prepared.  That is why the struct
-cn_msg [main connector's message header] contains u32 seq and u32 ack
-fields.
-
-Userspace usage
-===============
-
-2.6.14 has a new netlink socket implementation, which by default does not
-allow people to send data to netlink groups other than 1.
-So, if you wish to use a netlink socket (for example using connector)
-with a different group number, the userspace application must subscribe to
-that group first.  It can be achieved by the following pseudocode::
-
-  s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
-
-  l_local.nl_family = AF_NETLINK;
-  l_local.nl_groups = 12345;
-  l_local.nl_pid = 0;
-
-  if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
-	perror("bind");
-	close(s);
-	return -1;
-  }
-
-  {
-	int on = l_local.nl_groups;
-	setsockopt(s, 270, 1, &on, sizeof(on));
-  }
-
-Where 270 above is SOL_NETLINK, and 1 is a NETLINK_ADD_MEMBERSHIP socket
-option.  To drop a multicast subscription, one should call the above socket
-option with the NETLINK_DROP_MEMBERSHIP parameter which is defined as 0.
-
-2.6.14 netlink code only allows to select a group which is less or equal to
-the maximum group number, which is used at netlink_kernel_create() time.
-In case of connector it is CN_NETLINK_USERS + 0xf, so if you want to use
-group number 12345, you must increment CN_NETLINK_USERS to that number.
-Additional 0xf numbers are allocated to be used by non-in-kernel users.
-
-Due to this limitation, group 0xffffffff does not work now, so one can
-not use add/remove connector's group notifications, but as far as I know,
-only cn_test.c test module used it.
-
-Some work in netlink area is still being done, so things can be changed in
-2.6.15 timeframe, if it will happen, documentation will be updated for that
-kernel.
-
-Code samples
-============
-
-Sample code for a connector test module and user space can be found
-in samples/connector/. To build this code, enable CONFIG_CONNECTOR
-and CONFIG_SAMPLES.
diff --git a/Documentation/console/console.rst b/Documentation/console/console.rst
deleted file mode 100644
index b374141b027e..000000000000
--- a/Documentation/console/console.rst
+++ /dev/null
@@ -1,152 +0,0 @@
-:orphan:
-
-===============
-Console Drivers
-===============
-
-The Linux kernel has 2 general types of console drivers.  The first type is
-assigned by the kernel to all the virtual consoles during the boot process.
-This type will be called 'system driver', and only one system driver is allowed
-to exist. The system driver is persistent and it can never be unloaded, though
-it may become inactive.
-
-The second type has to be explicitly loaded and unloaded. This will be called
-'modular driver' by this document. Multiple modular drivers can coexist at
-any time with each driver sharing the console with other drivers including
-the system driver. However, modular drivers cannot take over the console
-that is currently occupied by another modular driver. (Exception: Drivers that
-call do_take_over_console() will succeed in the takeover regardless of the type
-of driver occupying the consoles.) They can only take over the console that is
-occupied by the system driver. In the same token, if the modular driver is
-released by the console, the system driver will take over.
-
-Modular drivers, from the programmer's point of view, have to call::
-
-	 do_take_over_console() - load and bind driver to console layer
-	 give_up_console() - unload driver; it will only work if driver
-			     is fully unbound
-
-In newer kernels, the following are also available::
-
-	 do_register_con_driver()
-	 do_unregister_con_driver()
-
-If sysfs is enabled, the contents of /sys/class/vtconsole can be
-examined. This shows the console backends currently registered by the
-system which are named vtcon<n> where <n> is an integer from 0 to 15.
-Thus::
-
-       ls /sys/class/vtconsole
-       .  ..  vtcon0  vtcon1
-
-Each directory in /sys/class/vtconsole has 3 files::
-
-     ls /sys/class/vtconsole/vtcon0
-     .  ..  bind  name  uevent
-
-What do these files signify?
-
-     1. bind - this is a read/write file. It shows the status of the driver if
-        read, or acts to bind or unbind the driver to the virtual consoles
-        when written to. The possible values are:
-
-	0
-	  - means the driver is not bound and if echo'ed, commands the driver
-	    to unbind
-
-        1
-	  - means the driver is bound and if echo'ed, commands the driver to
-	    bind
-
-     2. name - read-only file. Shows the name of the driver in this format::
-
-	  cat /sys/class/vtconsole/vtcon0/name
-	  (S) VGA+
-
-	      '(S)' stands for a (S)ystem driver, i.e., it cannot be directly
-	      commanded to bind or unbind
-
-	      'VGA+' is the name of the driver
-
-	  cat /sys/class/vtconsole/vtcon1/name
-	  (M) frame buffer device
-
-	      In this case, '(M)' stands for a (M)odular driver, one that can be
-	      directly commanded to bind or unbind.
-
-     3. uevent - ignore this file
-
-When unbinding, the modular driver is detached first, and then the system
-driver takes over the consoles vacated by the driver. Binding, on the other
-hand, will bind the driver to the consoles that are currently occupied by a
-system driver.
-
-NOTE1:
-  Binding and unbinding must be selected in Kconfig. It's under::
-
-    Device Drivers ->
-	Character devices ->
-		Support for binding and unbinding console drivers
-
-NOTE2:
-  If any of the virtual consoles are in KD_GRAPHICS mode, then binding or
-  unbinding will not succeed. An example of an application that sets the
-  console to KD_GRAPHICS is X.
-
-How useful is this feature? This is very useful for console driver
-developers. By unbinding the driver from the console layer, one can unload the
-driver, make changes, recompile, reload and rebind the driver without any need
-for rebooting the kernel. For regular users who may want to switch from
-framebuffer console to VGA console and vice versa, this feature also makes
-this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb
-for more details.)
-
-Notes for developers
-====================
-
-do_take_over_console() is now broken up into::
-
-     do_register_con_driver()
-     do_bind_con_driver() - private function
-
-give_up_console() is a wrapper to do_unregister_con_driver(), and a driver must
-be fully unbound for this call to succeed. con_is_bound() will check if the
-driver is bound or not.
-
-Guidelines for console driver writers
-=====================================
-
-In order for binding to and unbinding from the console to properly work,
-console drivers must follow these guidelines:
-
-1. All drivers, except system drivers, must call either do_register_con_driver()
-   or do_take_over_console(). do_register_con_driver() will just add the driver
-   to the console's internal list. It won't take over the
-   console. do_take_over_console(), as it name implies, will also take over (or
-   bind to) the console.
-
-2. All resources allocated during con->con_init() must be released in
-   con->con_deinit().
-
-3. All resources allocated in con->con_startup() must be released when the
-   driver, which was previously bound, becomes unbound.  The console layer
-   does not have a complementary call to con->con_startup() so it's up to the
-   driver to check when it's legal to release these resources. Calling
-   con_is_bound() in con->con_deinit() will help.  If the call returned
-   false(), then it's safe to release the resources.  This balance has to be
-   ensured because con->con_startup() can be called again when a request to
-   rebind the driver to the console arrives.
-
-4. Upon exit of the driver, ensure that the driver is totally unbound. If the
-   condition is satisfied, then the driver must call do_unregister_con_driver()
-   or give_up_console().
-
-5. do_unregister_con_driver() can also be called on conditions which make it
-   impossible for the driver to service console requests.  This can happen
-   with the framebuffer console that suddenly lost all of its drivers.
-
-The current crop of console drivers should still work correctly, but binding
-and unbinding them may cause problems. With minimal fixes, these drivers can
-be made to work correctly.
-
-Antonino Daplas <adaplas@pol.net>
diff --git a/Documentation/dcdbas.txt b/Documentation/dcdbas.txt
deleted file mode 100644
index 309cc57a7c1c..000000000000
--- a/Documentation/dcdbas.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-===================================
-Dell Systems Management Base Driver
-===================================
-
-Overview
-========
-
-The Dell Systems Management Base Driver provides a sysfs interface for
-systems management software such as Dell OpenManage to perform system
-management interrupts and host control actions (system power cycle or
-power off after OS shutdown) on certain Dell systems.
-
-Dell OpenManage requires this driver on the following Dell PowerEdge systems:
-300, 1300, 1400, 400SC, 500SC, 1500SC, 1550, 600SC, 1600SC, 650, 1655MC,
-700, and 750.  Other Dell software such as the open source libsmbios project
-is expected to make use of this driver, and it may include the use of this
-driver on other Dell systems.
-
-The Dell libsmbios project aims towards providing access to as much BIOS
-information as possible.  See http://linux.dell.com/libsmbios/main/ for
-more information about the libsmbios project.
-
-
-System Management Interrupt
-===========================
-
-On some Dell systems, systems management software must access certain
-management information via a system management interrupt (SMI).  The SMI data
-buffer must reside in 32-bit address space, and the physical address of the
-buffer is required for the SMI.  The driver maintains the memory required for
-the SMI and provides a way for the application to generate the SMI.
-The driver creates the following sysfs entries for systems management
-software to perform these system management interrupts::
-
-	/sys/devices/platform/dcdbas/smi_data
-	/sys/devices/platform/dcdbas/smi_data_buf_phys_addr
-	/sys/devices/platform/dcdbas/smi_data_buf_size
-	/sys/devices/platform/dcdbas/smi_request
-
-Systems management software must perform the following steps to execute
-a SMI using this driver:
-
-1) Lock smi_data.
-2) Write system management command to smi_data.
-3) Write "1" to smi_request to generate a calling interface SMI or
-   "2" to generate a raw SMI.
-4) Read system management command response from smi_data.
-5) Unlock smi_data.
-
-
-Host Control Action
-===================
-
-Dell OpenManage supports a host control feature that allows the administrator
-to perform a power cycle or power off of the system after the OS has finished
-shutting down.  On some Dell systems, this host control feature requires that
-a driver perform a SMI after the OS has finished shutting down.
-
-The driver creates the following sysfs entries for systems management software
-to schedule the driver to perform a power cycle or power off host control
-action after the system has finished shutting down:
-
-/sys/devices/platform/dcdbas/host_control_action
-/sys/devices/platform/dcdbas/host_control_smi_type
-/sys/devices/platform/dcdbas/host_control_on_shutdown
-
-Dell OpenManage performs the following steps to execute a power cycle or
-power off host control action using this driver:
-
-1) Write host control action to be performed to host_control_action.
-2) Write type of SMI that driver needs to perform to host_control_smi_type.
-3) Write "1" to host_control_on_shutdown to enable host control action.
-4) Initiate OS shutdown.
-   (Driver will perform host control SMI when it is notified that the OS
-   has finished shutting down.)
-
-
-Host Control SMI Type
-=====================
-
-The following table shows the value to write to host_control_smi_type to
-perform a power cycle or power off host control action:
-
-=================== =====================
-PowerEdge System    Host Control SMI Type
-=================== =====================
-      300             HC_SMITYPE_TYPE1
-     1300             HC_SMITYPE_TYPE1
-     1400             HC_SMITYPE_TYPE2
-      500SC           HC_SMITYPE_TYPE2
-     1500SC           HC_SMITYPE_TYPE2
-     1550             HC_SMITYPE_TYPE2
-      600SC           HC_SMITYPE_TYPE2
-     1600SC           HC_SMITYPE_TYPE2
-      650             HC_SMITYPE_TYPE2
-     1655MC           HC_SMITYPE_TYPE2
-      700             HC_SMITYPE_TYPE3
-      750             HC_SMITYPE_TYPE3
-=================== =====================
diff --git a/Documentation/dell_rbu.txt b/Documentation/dell_rbu.txt
deleted file mode 100644
index 5d1ce7bcd04d..000000000000
--- a/Documentation/dell_rbu.txt
+++ /dev/null
@@ -1,128 +0,0 @@
-=============================================================
-Usage of the new open sourced rbu (Remote BIOS Update) driver
-=============================================================
-
-Purpose
-=======
-
-Document demonstrating the use of the Dell Remote BIOS Update driver.
-for updating BIOS images on Dell servers and desktops.
-
-Scope
-=====
-
-This document discusses the functionality of the rbu driver only.
-It does not cover the support needed from applications to enable the BIOS to
-update itself with the image downloaded in to the memory.
-
-Overview
-========
-
-This driver works with Dell OpenManage or Dell Update Packages for updating
-the BIOS on Dell servers (starting from servers sold since 1999), desktops
-and notebooks (starting from those sold in 2005).
-
-Please go to  http://support.dell.com register and you can find info on
-OpenManage and Dell Update packages (DUP).
-
-Libsmbios can also be used to update BIOS on Dell systems go to
-http://linux.dell.com/libsmbios/ for details.
-
-Dell_RBU driver supports BIOS update using the monolithic image and packetized
-image methods. In case of monolithic the driver allocates a contiguous chunk
-of physical pages having the BIOS image. In case of packetized the app
-using the driver breaks the image in to packets of fixed sizes and the driver
-would place each packet in contiguous physical memory. The driver also
-maintains a link list of packets for reading them back.
-
-If the dell_rbu driver is unloaded all the allocated memory is freed.
-
-The rbu driver needs to have an application (as mentioned above)which will
-inform the BIOS to enable the update in the next system reboot.
-
-The user should not unload the rbu driver after downloading the BIOS image
-or updating.
-
-The driver load creates the following directories under the /sys file system::
-
-	/sys/class/firmware/dell_rbu/loading
-	/sys/class/firmware/dell_rbu/data
-	/sys/devices/platform/dell_rbu/image_type
-	/sys/devices/platform/dell_rbu/data
-	/sys/devices/platform/dell_rbu/packet_size
-
-The driver supports two types of update mechanism; monolithic and packetized.
-These update mechanism depends upon the BIOS currently running on the system.
-Most of the Dell systems support a monolithic update where the BIOS image is
-copied to a single contiguous block of physical memory.
-
-In case of packet mechanism the single memory can be broken in smaller chunks
-of contiguous memory and the BIOS image is scattered in these packets.
-
-By default the driver uses monolithic memory for the update type. This can be
-changed to packets during the driver load time by specifying the load
-parameter image_type=packet.  This can also be changed later as below::
-
-	echo packet > /sys/devices/platform/dell_rbu/image_type
-
-In packet update mode the packet size has to be given before any packets can
-be downloaded. It is done as below::
-
-	echo XXXX > /sys/devices/platform/dell_rbu/packet_size
-
-In the packet update mechanism, the user needs to create a new file having
-packets of data arranged back to back. It can be done as follows
-The user creates packets header, gets the chunk of the BIOS image and
-places it next to the packetheader; now, the packetheader + BIOS image chunk
-added together should match the specified packet_size. This makes one
-packet, the user needs to create more such packets out of the entire BIOS
-image file and then arrange all these packets back to back in to one single
-file.
-
-This file is then copied to /sys/class/firmware/dell_rbu/data.
-Once this file gets to the driver, the driver extracts packet_size data from
-the file and spreads it across the physical memory in contiguous packet_sized
-space.
-
-This method makes sure that all the packets get to the driver in a single operation.
-
-In monolithic update the user simply get the BIOS image (.hdr file) and copies
-to the data file as is without any change to the BIOS image itself.
-
-Do the steps below to download the BIOS image.
-
-1) echo 1 > /sys/class/firmware/dell_rbu/loading
-2) cp bios_image.hdr /sys/class/firmware/dell_rbu/data
-3) echo 0 > /sys/class/firmware/dell_rbu/loading
-
-The /sys/class/firmware/dell_rbu/ entries will remain till the following is
-done.
-
-::
-
-	echo -1 > /sys/class/firmware/dell_rbu/loading
-
-Until this step is completed the driver cannot be unloaded.
-
-Also echoing either mono, packet or init in to image_type will free up the
-memory allocated by the driver.
-
-If a user by accident executes steps 1 and 3 above without executing step 2;
-it will make the /sys/class/firmware/dell_rbu/ entries disappear.
-
-The entries can be recreated by doing the following::
-
-	echo init > /sys/devices/platform/dell_rbu/image_type
-
-.. note:: echoing init in image_type does not change it original value.
-
-Also the driver provides /sys/devices/platform/dell_rbu/data readonly file to
-read back the image downloaded.
-
-.. note::
-
-   After updating the BIOS image a user mode application needs to execute
-   code which sends the BIOS update request to the BIOS. So on the next reboot
-   the BIOS knows about the new image downloaded and it updates itself.
-   Also don't unload the rbu driver if the image has to be updated.
-
diff --git a/Documentation/driver-api/bt8xxgpio.rst b/Documentation/driver-api/bt8xxgpio.rst
new file mode 100644
index 000000000000..a845feb074de
--- /dev/null
+++ b/Documentation/driver-api/bt8xxgpio.rst
@@ -0,0 +1,62 @@
+===================================================================
+A driver for a selfmade cheap BT8xx based PCI GPIO-card (bt8xxgpio)
+===================================================================
+
+For advanced documentation, see http://www.bu3sch.de/btgpio.php
+
+A generic digital 24-port PCI GPIO card can be built out of an ordinary
+Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
+Brooktree chip is used in old analog Hauppauge WinTV PCI cards. You can easily
+find them used for low prices on the net.
+
+The bt8xx chip does have 24 digital GPIO ports.
+These ports are accessible via 24 pins on the SMD chip package.
+
+
+How to physically access the GPIO pins
+======================================
+
+There are several ways to access these pins. One might unsolder the whole chip
+and put it on a custom PCI board, or one might only unsolder each individual
+GPIO pin and solder that to some tiny wire. As the chip package is really tiny,
+advanced soldering skills are needed in any case.
+
+The physical pinouts are drawn in the following ASCII art.
+The GPIO pins are marked with G00-G23::
+
+                                           G G G G G G G G G G G G     G G G G G G
+                                           0 0 0 0 0 0 0 0 0 0 1 1     1 1 1 1 1 1
+                                           0 1 2 3 4 5 6 7 8 9 0 1     2 3 4 5 6 7
+           | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+           ---------------------------------------------------------------------------
+         --|                               ^                                     ^   |--
+         --|                               pin 86                           pin 67   |--
+         --|                                                                         |--
+         --|                                                               pin 61 >  |-- G18
+         --|                                                                         |-- G19
+         --|                                                                         |-- G20
+         --|                                                                         |-- G21
+         --|                                                                         |-- G22
+         --|                                                               pin 56 >  |-- G23
+         --|                                                                         |--
+         --|                           Brooktree 878/879                             |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|                                                                         |--
+         --|   O                                                                     |--
+         --|                                                                         |--
+           ---------------------------------------------------------------------------
+           | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+           ^
+           This is pin 1
+
diff --git a/Documentation/driver-api/connector.rst b/Documentation/driver-api/connector.rst
new file mode 100644
index 000000000000..c100c7482289
--- /dev/null
+++ b/Documentation/driver-api/connector.rst
@@ -0,0 +1,156 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Kernel Connector
+================
+
+Kernel connector is a netlink based, easy to use communication module
+between userspace and kernel space.
+
+The Connector driver makes it easy to connect various agents using a
+netlink based network.  One must register a callback and an identifier.
+When the driver receives a special netlink message with the appropriate
+identifier, the appropriate callback will be called.
+
+From the userspace point of view it's quite straightforward:
+
+	- socket();
+	- bind();
+	- send();
+	- recv();
+
+But if kernelspace wants to use the full power of such connections, the
+driver writer must create special sockets, must know about struct sk_buff
+handling, etc...  The Connector driver allows any kernelspace agents to use
+netlink based networking for inter-process communication in a significantly
+easier way::
+
+  int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *));
+  void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask);
+  void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask);
+
+  struct cb_id
+  {
+	__u32			idx;
+	__u32			val;
+  };
+
+idx and val are unique identifiers which must be registered in the
+connector.h header for in-kernel usage.  `callback` is the function
+which will be called when a message with the above idx.val is received
+by the connector core.  Its first argument is the received message, a
+`struct cn_msg *`::
+
+  struct cn_msg
+  {
+	struct cb_id		id;
+
+	__u32			seq;
+	__u32			ack;
+
+	__u32			len;	/* Length of the following data */
+	__u8			data[0];
+  };
+
+Connector interfaces
+====================
+
+ .. kernel-doc:: include/linux/connector.h
+
+ Note:
+   When registering new callback user, connector core assigns
+   netlink group to the user which is equal to its id.idx.
+
+Protocol description
+====================
+
+The current framework offers a transport layer with fixed headers.  The
+recommended protocol which uses such a header is as follows:
+
+msg->seq and msg->ack are used to determine message genealogy.  When
+someone sends a message, they use a locally unique sequence and random
+acknowledge number.  The sequence number may be copied into
+nlmsghdr->nlmsg_seq too.
+
+The sequence number is incremented with each message sent.
+
+If you expect a reply to the message, then the sequence number in the
+received message MUST be the same as in the original message, and the
+acknowledge number MUST be the original sequence number + 1.
+
+If we receive a message and its sequence number is not equal to the one we
+are expecting, then it is a new message.  If we receive a message and
+its sequence number is the same as the one we are expecting, but its
+acknowledge is not equal to the sequence number in the original
+message + 1, then it is a new message.
+
+Obviously, the protocol header contains the above id.
+
+The connector allows event notification in the following form: a kernel
+driver or userspace process can ask the connector to notify it when
+selected ids are turned on or off (that is, when their callbacks are
+registered or unregistered).  This is done by sending a special command
+to the connector driver (it also registers itself with id={-1, -1}).
+
+An example of this usage can be found in the cn_test.c module which
+uses the connector to request notification and to send messages.
+
+Reliability
+===========
+
+Netlink itself is not a reliable protocol.  That means that messages can
+be lost due to memory pressure or because a process's receive queue
+overflowed, so the caller must be prepared for that.  That is why struct
+cn_msg [the main connector message header] contains the u32 seq and u32 ack
+fields.
+
+Userspace usage
+===============
+
+2.6.14 has a new netlink socket implementation, which by default does not
+allow people to send data to netlink groups other than 1.
+So, if you wish to use a netlink socket (for example using connector)
+with a different group number, the userspace application must subscribe to
+that group first.  It can be achieved by the following pseudocode::
+
+  s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
+
+  l_local.nl_family = AF_NETLINK;
+  l_local.nl_groups = 12345;
+  l_local.nl_pid = 0;
+
+  if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
+	perror("bind");
+	close(s);
+	return -1;
+  }
+
+  {
+	int on = l_local.nl_groups;
+	setsockopt(s, 270, 1, &on, sizeof(on));
+  }
+
+Where 270 above is SOL_NETLINK, and 1 is the NETLINK_ADD_MEMBERSHIP socket
+option.  To drop a multicast subscription, one should call setsockopt()
+with the NETLINK_DROP_MEMBERSHIP option, which is defined as 2.
+
+The 2.6.14 netlink code only allows selecting a group which is less than or
+equal to the maximum group number, which is set at netlink_kernel_create() time.
+In the case of connector it is CN_NETLINK_USERS + 0xf, so if you want to use
+group number 12345, you must increment CN_NETLINK_USERS to that number.
+The additional 0xf numbers are allocated to be used by non-in-kernel users.
+
+Due to this limitation, group 0xffffffff does not work now, so one cannot
+use add/remove connector's group notifications, but as far as I know,
+only the cn_test.c test module used it.
+
+Some work in the netlink area is still being done, so things may change in
+the 2.6.15 timeframe.  If that happens, the documentation will be updated
+for that kernel.
+
+Code samples
+============
+
+Sample code for a connector test module and user space can be found
+in samples/connector/. To build this code, enable CONFIG_CONNECTOR
+and CONFIG_SAMPLES.
diff --git a/Documentation/driver-api/console.rst b/Documentation/driver-api/console.rst
new file mode 100644
index 000000000000..8394ad7747ac
--- /dev/null
+++ b/Documentation/driver-api/console.rst
@@ -0,0 +1,152 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Console Drivers
+===============
+
+The Linux kernel has 2 general types of console drivers.  The first type is
+assigned by the kernel to all the virtual consoles during the boot process.
+This type will be called 'system driver', and only one system driver is allowed
+to exist. The system driver is persistent and it can never be unloaded, though
+it may become inactive.
+
+The second type has to be explicitly loaded and unloaded. This will be called
+'modular driver' by this document. Multiple modular drivers can coexist at
+any time with each driver sharing the console with other drivers including
+the system driver. However, modular drivers cannot take over the console
+that is currently occupied by another modular driver; they can only take
+over the console that is occupied by the system driver. (Exception: Drivers
+that call do_take_over_console() will succeed in the takeover regardless of
+the type of driver occupying the consoles.) By the same token, if the modular
+driver is released by the console, the system driver will take over.
+
+Modular drivers, from the programmer's point of view, have to call::
+
+	 do_take_over_console() - load and bind driver to console layer
+	 give_up_console() - unload driver; it will only work if driver
+			     is fully unbound
+
+In newer kernels, the following are also available::
+
+	 do_register_con_driver()
+	 do_unregister_con_driver()
+
+If sysfs is enabled, the contents of /sys/class/vtconsole can be
+examined. This shows the console backends currently registered by the
+system which are named vtcon<n> where <n> is an integer from 0 to 15.
+Thus::
+
+       ls /sys/class/vtconsole
+       .  ..  vtcon0  vtcon1
+
+Each directory in /sys/class/vtconsole has 3 files::
+
+     ls /sys/class/vtconsole/vtcon0
+     .  ..  bind  name  uevent
+
+What do these files signify?
+
+     1. bind - this is a read/write file. It shows the status of the driver if
+        read, or acts to bind or unbind the driver to the virtual consoles
+        when written to. The possible values are:
+
+	0
+	  - means the driver is not bound and if echo'ed, commands the driver
+	    to unbind
+
+        1
+	  - means the driver is bound and if echo'ed, commands the driver to
+	    bind
+
+     2. name - read-only file. Shows the name of the driver in this format::
+
+	  cat /sys/class/vtconsole/vtcon0/name
+	  (S) VGA+
+
+	      '(S)' stands for a (S)ystem driver, i.e., it cannot be directly
+	      commanded to bind or unbind
+
+	      'VGA+' is the name of the driver
+
+	  cat /sys/class/vtconsole/vtcon1/name
+	  (M) frame buffer device
+
+	      In this case, '(M)' stands for a (M)odular driver, one that can be
+	      directly commanded to bind or unbind.
+
+     3. uevent - ignore this file
+
+When unbinding, the modular driver is detached first, and then the system
+driver takes over the consoles vacated by the driver. Binding, on the other
+hand, will bind the driver to the consoles that are currently occupied by a
+system driver.
+
+NOTE1:
+  Binding and unbinding must be selected in Kconfig. It's under::
+
+    Device Drivers ->
+	Character devices ->
+		Support for binding and unbinding console drivers
+
+NOTE2:
+  If any of the virtual consoles are in KD_GRAPHICS mode, then binding or
+  unbinding will not succeed. An example of an application that sets the
+  console to KD_GRAPHICS is X.
+
+How useful is this feature? This is very useful for console driver
+developers. By unbinding the driver from the console layer, one can unload the
+driver, make changes, recompile, reload and rebind the driver without any need
+for rebooting the kernel. For regular users who may want to switch from
+framebuffer console to VGA console and vice versa, this feature also makes
+this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb
+for more details.)
+
+Notes for developers
+====================
+
+do_take_over_console() is now broken up into::
+
+     do_register_con_driver()
+     do_bind_con_driver() - private function
+
+give_up_console() is a wrapper to do_unregister_con_driver(), and a driver must
+be fully unbound for this call to succeed. con_is_bound() will check if the
+driver is bound or not.
+
+Guidelines for console driver writers
+=====================================
+
+In order for binding to and unbinding from the console to properly work,
+console drivers must follow these guidelines:
+
+1. All drivers, except system drivers, must call either do_register_con_driver()
+   or do_take_over_console(). do_register_con_driver() will just add the driver
+   to the console's internal list. It won't take over the
+   console. do_take_over_console(), as its name implies, will also take over (or
+   bind to) the console.
+
+2. All resources allocated during con->con_init() must be released in
+   con->con_deinit().
+
+3. All resources allocated in con->con_startup() must be released when the
+   driver, which was previously bound, becomes unbound.  The console layer
+   does not have a complementary call to con->con_startup() so it's up to the
+   driver to check when it's legal to release these resources. Calling
+   con_is_bound() in con->con_deinit() will help.  If the call returns
+   false, then it's safe to release the resources.  This balance has to be
+   ensured because con->con_startup() can be called again when a request to
+   rebind the driver to the console arrives.
+
+4. Upon exit of the driver, ensure that the driver is totally unbound. If the
+   condition is satisfied, then the driver must call do_unregister_con_driver()
+   or give_up_console().
+
+5. do_unregister_con_driver() can also be called on conditions which make it
+   impossible for the driver to service console requests.  This can happen
+   with the framebuffer console that suddenly lost all of its drivers.
+
+The current crop of console drivers should still work correctly, but binding
+and unbinding them may cause problems. With minimal fixes, these drivers can
+be made to work correctly.
+
+Antonino Daplas <adaplas@pol.net>
diff --git a/Documentation/driver-api/dcdbas.rst b/Documentation/driver-api/dcdbas.rst
new file mode 100644
index 000000000000..309cc57a7c1c
--- /dev/null
+++ b/Documentation/driver-api/dcdbas.rst
@@ -0,0 +1,99 @@
+===================================
+Dell Systems Management Base Driver
+===================================
+
+Overview
+========
+
+The Dell Systems Management Base Driver provides a sysfs interface for
+systems management software such as Dell OpenManage to perform system
+management interrupts and host control actions (system power cycle or
+power off after OS shutdown) on certain Dell systems.
+
+Dell OpenManage requires this driver on the following Dell PowerEdge systems:
+300, 1300, 1400, 400SC, 500SC, 1500SC, 1550, 600SC, 1600SC, 650, 1655MC,
+700, and 750.  Other Dell software such as the open source libsmbios project
+is expected to make use of this driver, and it may include the use of this
+driver on other Dell systems.
+
+The Dell libsmbios project aims towards providing access to as much BIOS
+information as possible.  See http://linux.dell.com/libsmbios/main/ for
+more information about the libsmbios project.
+
+
+System Management Interrupt
+===========================
+
+On some Dell systems, systems management software must access certain
+management information via a system management interrupt (SMI).  The SMI data
+buffer must reside in 32-bit address space, and the physical address of the
+buffer is required for the SMI.  The driver maintains the memory required for
+the SMI and provides a way for the application to generate the SMI.
+The driver creates the following sysfs entries for systems management
+software to perform these system management interrupts::
+
+	/sys/devices/platform/dcdbas/smi_data
+	/sys/devices/platform/dcdbas/smi_data_buf_phys_addr
+	/sys/devices/platform/dcdbas/smi_data_buf_size
+	/sys/devices/platform/dcdbas/smi_request
+
+Systems management software must perform the following steps to execute
+an SMI using this driver:
+
+1) Lock smi_data.
+2) Write system management command to smi_data.
+3) Write "1" to smi_request to generate a calling interface SMI or
+   "2" to generate a raw SMI.
+4) Read system management command response from smi_data.
+5) Unlock smi_data.
+
+
+Host Control Action
+===================
+
+Dell OpenManage supports a host control feature that allows the administrator
+to perform a power cycle or power off of the system after the OS has finished
+shutting down.  On some Dell systems, this host control feature requires that
+a driver perform an SMI after the OS has finished shutting down.
+
+The driver creates the following sysfs entries for systems management software
+to schedule the driver to perform a power cycle or power off host control
+action after the system has finished shutting down::
+
+	/sys/devices/platform/dcdbas/host_control_action
+	/sys/devices/platform/dcdbas/host_control_smi_type
+	/sys/devices/platform/dcdbas/host_control_on_shutdown
+
+Dell OpenManage performs the following steps to execute a power cycle or
+power off host control action using this driver:
+
+1) Write host control action to be performed to host_control_action.
+2) Write type of SMI that driver needs to perform to host_control_smi_type.
+3) Write "1" to host_control_on_shutdown to enable host control action.
+4) Initiate OS shutdown.
+   (Driver will perform host control SMI when it is notified that the OS
+   has finished shutting down.)
+
+
+Host Control SMI Type
+=====================
+
+The following table shows the value to write to host_control_smi_type to
+perform a power cycle or power off host control action:
+
+=================== =====================
+PowerEdge System    Host Control SMI Type
+=================== =====================
+      300             HC_SMITYPE_TYPE1
+     1300             HC_SMITYPE_TYPE1
+     1400             HC_SMITYPE_TYPE2
+      500SC           HC_SMITYPE_TYPE2
+     1500SC           HC_SMITYPE_TYPE2
+     1550             HC_SMITYPE_TYPE2
+      600SC           HC_SMITYPE_TYPE2
+     1600SC           HC_SMITYPE_TYPE2
+      650             HC_SMITYPE_TYPE2
+     1655MC           HC_SMITYPE_TYPE2
+      700             HC_SMITYPE_TYPE3
+      750             HC_SMITYPE_TYPE3
+=================== =====================
diff --git a/Documentation/driver-api/dell_rbu.rst b/Documentation/driver-api/dell_rbu.rst
new file mode 100644
index 000000000000..5d1ce7bcd04d
--- /dev/null
+++ b/Documentation/driver-api/dell_rbu.rst
@@ -0,0 +1,128 @@
+=============================================================
+Usage of the new open sourced rbu (Remote BIOS Update) driver
+=============================================================
+
+Purpose
+=======
+
+This document demonstrates the use of the Dell Remote BIOS Update driver
+for updating BIOS images on Dell servers and desktops.
+
+Scope
+=====
+
+This document discusses the functionality of the rbu driver only.
+It does not cover the support needed from applications to enable the BIOS to
+update itself with the image downloaded into memory.
+
+Overview
+========
+
+This driver works with Dell OpenManage or Dell Update Packages for updating
+the BIOS on Dell servers (starting from servers sold since 1999), desktops
+and notebooks (starting from those sold in 2005).
+
+Information on OpenManage and Dell Update Packages (DUP) is available at
+http://support.dell.com after registering.
+
+Libsmbios can also be used to update the BIOS on Dell systems; see
+http://linux.dell.com/libsmbios/ for details.
+
+The dell_rbu driver supports BIOS updates using the monolithic image and
+packetized image methods. In the monolithic case, the driver allocates a
+contiguous chunk of physical pages to hold the BIOS image. In the packetized
+case, the application using the driver breaks the image into packets of a
+fixed size, and the driver places each packet in contiguous physical memory.
+The driver also maintains a linked list of packets for reading them back.
+
+If the dell_rbu driver is unloaded all the allocated memory is freed.
+
+The rbu driver needs an application (as mentioned above) which will
+inform the BIOS to enable the update on the next system reboot.
+
+The user should not unload the rbu driver after downloading the BIOS image
+or updating.
+
+Loading the driver creates the following entries under the /sys file system::
+
+	/sys/class/firmware/dell_rbu/loading
+	/sys/class/firmware/dell_rbu/data
+	/sys/devices/platform/dell_rbu/image_type
+	/sys/devices/platform/dell_rbu/data
+	/sys/devices/platform/dell_rbu/packet_size
+
+The driver supports two types of update mechanism: monolithic and packetized.
+Which mechanism applies depends on the BIOS currently running on the system.
+Most Dell systems support a monolithic update, where the BIOS image is
+copied to a single contiguous block of physical memory.
+
+With the packet mechanism, the single block can be broken into smaller chunks
+of contiguous memory, and the BIOS image is scattered across these packets.
+
+By default the driver uses monolithic memory for the update type. This can be
+changed to packets during the driver load time by specifying the load
+parameter image_type=packet.  This can also be changed later as below::
+
+	echo packet > /sys/devices/platform/dell_rbu/image_type
+
+In packet update mode the packet size has to be given before any packets can
+be downloaded. It is done as below::
+
+	echo XXXX > /sys/devices/platform/dell_rbu/packet_size
+
+In the packet update mechanism, the user needs to create a new file having
+packets of data arranged back to back. It can be done as follows: the user
+creates a packet header, takes a chunk of the BIOS image and places it next
+to the packet header. The packet header and the BIOS image chunk together
+must match the specified packet_size. This makes one packet. The user
+creates more such packets out of the entire BIOS image file and then
+arranges all these packets back to back into one single file.
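+
+The packet arithmetic described above can be sketched in C. This is only an
+illustration, not code from the driver: the helper names are hypothetical,
+and header_len is an assumption supplied by the caller, since the packet
+header format is not described in this document.
+
```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of BIOS image carried by one packet. The packet header plus the
 * image chunk must add up to exactly packet_size.
 * header_len is a caller-supplied assumption (format not documented here).
 */
size_t rbu_chunk_len(size_t packet_size, size_t header_len)
{
	assert(packet_size > header_len);
	return packet_size - header_len;
}

/* Number of packets needed to carry image_len bytes of BIOS image. */
size_t rbu_packet_count(size_t image_len, size_t packet_size,
			size_t header_len)
{
	size_t chunk = rbu_chunk_len(packet_size, header_len);

	return (image_len + chunk - 1) / chunk;	/* round up */
}
```
+
+With a packet_size of 4096 and a hypothetical 16-byte header, each packet
+carries 4080 bytes of the image, so an image of 8161 bytes needs three
+packets.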
+
+This file is then copied to /sys/class/firmware/dell_rbu/data.
+Once this file gets to the driver, the driver extracts packet_size data from
+the file and spreads it across the physical memory in contiguous packet_sized
+space.
+
+This method makes sure that all the packets get to the driver in a single operation.
+
+In a monolithic update the user simply gets the BIOS image (.hdr file) and
+copies it to the data file as is, without any change to the BIOS image itself.
+
+Do the steps below to download the BIOS image.
+
+1) echo 1 > /sys/class/firmware/dell_rbu/loading
+2) cp bios_image.hdr /sys/class/firmware/dell_rbu/data
+3) echo 0 > /sys/class/firmware/dell_rbu/loading
+
+The /sys/class/firmware/dell_rbu/ entries will remain till the following is
+done.
+
+::
+
+	echo -1 > /sys/class/firmware/dell_rbu/loading
+
+Until this step is completed the driver cannot be unloaded.
+
+Also echoing either mono, packet or init into image_type will free up the
+memory allocated by the driver.
+
+If a user by accident executes steps 1 and 3 above without executing step 2,
+the /sys/class/firmware/dell_rbu/ entries will disappear.
+
+The entries can be recreated by doing the following::
+
+	echo init > /sys/devices/platform/dell_rbu/image_type
+
+.. note:: echoing init in image_type does not change its original value.
+
+The driver also provides the read-only file /sys/devices/platform/dell_rbu/data
+to read back the downloaded image.
+
+.. note::
+
+   After updating the BIOS image a user mode application needs to execute
+   code which sends the BIOS update request to the BIOS. So on the next reboot
+   the BIOS knows about the new image downloaded and it updates itself.
+   Also don't unload the rbu driver if the image has to be updated.
+
diff --git a/Documentation/driver-api/edid.rst b/Documentation/driver-api/edid.rst
new file mode 100644
index 000000000000..b1b5acd501ed
--- /dev/null
+++ b/Documentation/driver-api/edid.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+EDID
+====
+
+In the good old days when graphics parameters were configured explicitly
+in a file called xorg.conf, even broken hardware could be managed.
+
+Today, with the advent of Kernel Mode Setting, a graphics board is
+either correctly working because all components follow the standards -
+or the computer is unusable, because the screen remains dark after
+booting or it displays the wrong area. Cases when this happens are:
+
+- The graphics board does not recognize the monitor.
+- The graphics board is unable to detect any EDID data.
+- The graphics board incorrectly forwards EDID data to the driver.
+- The monitor sends no or bogus EDID data.
+- A KVM sends its own EDID data instead of querying the connected monitor.
+
+Adding the kernel parameter "nomodeset" helps in most cases, but causes
+restrictions later on.
+
+As a remedy for such situations, the kernel configuration item
+CONFIG_DRM_LOAD_EDID_FIRMWARE was introduced. It allows providing an
+individually prepared or corrected EDID data set in the /lib/firmware
+directory from where it is loaded via the firmware interface. The code
+(see drivers/gpu/drm/drm_edid_load.c) contains built-in data sets for
+commonly used screen resolutions (800x600, 1024x768, 1280x1024, 1600x1200,
+1680x1050, 1920x1080) as binary blobs, but the kernel source tree does
+not contain code to create these data. In order to elucidate the origin
+of the built-in binary EDID blobs and to facilitate the creation of
+individual data for a specific misbehaving monitor, commented sources
+and a Makefile environment are given here.
+
+To create binary EDID and C source code files from the existing data
+material, simply type "make".
+
+If you want to create your own EDID file, copy the file 1024x768.S,
+replace the settings with your own data and add a new target to the
+Makefile. Please note that the EDID data structure expects the timing
+values in a different way as compared to the standard X11 format.
+
+X11:
+  HTimings:
+    hdisp hsyncstart hsyncend htotal
+  VTimings:
+    vdisp vsyncstart vsyncend vtotal
+
+EDID::
+
+  #define XPIX hdisp
+  #define XBLANK htotal-hdisp
+  #define XOFFSET hsyncstart-hdisp
+  #define XPULSE hsyncend-hsyncstart
+
+  #define YPIX vdisp
+  #define YBLANK vtotal-vdisp
+  #define YOFFSET vsyncstart-vdisp
+  #define YPULSE vsyncend-vsyncstart
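+
+The conversion expressed by the defines above can be sketched as a small C
+helper. The struct and function names are hypothetical and exist only for
+this illustration; the same function applies to the horizontal and the
+vertical timings.
+
```c
/* X11-style timings for one axis: display, sync start, sync end, total. */
struct x11_timing {
	int disp;
	int syncstart;
	int syncend;
	int total;
};

/* EDID-style timings for one axis, matching XPIX/XBLANK/XOFFSET/XPULSE. */
struct edid_timing {
	int pix;
	int blank;
	int offset;
	int pulse;
};

/* Apply the same arithmetic as the XPIX..XPULSE (or YPIX..YPULSE) defines. */
struct edid_timing x11_to_edid(struct x11_timing t)
{
	struct edid_timing e;

	e.pix    = t.disp;
	e.blank  = t.total - t.disp;
	e.offset = t.syncstart - t.disp;
	e.pulse  = t.syncend - t.syncstart;
	return e;
}
```
+
+For example, X11 horizontal timings of 1024 1048 1184 1344 give XPIX 1024,
+XBLANK 320, XOFFSET 24 and XPULSE 136.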
diff --git a/Documentation/driver-api/eisa.rst b/Documentation/driver-api/eisa.rst
new file mode 100644
index 000000000000..c07565ba57da
--- /dev/null
+++ b/Documentation/driver-api/eisa.rst
@@ -0,0 +1,230 @@
+================
+EISA bus support
+================
+
+:Author: Marc Zyngier <maz@wild-wind.fr.eu.org>
+
+This document groups random notes about porting EISA drivers to the
+new EISA/sysfs API.
+
+Starting from version 2.5.59, the EISA bus is almost given the same
+status as other much more mainstream busses such as PCI or USB. This
+has been possible through sysfs, which defines a nice enough set of
+abstractions to manage busses, devices and drivers.
+
+Although the new API is quite simple to use, converting existing
+drivers to the new infrastructure is not an easy task (mostly because
+detection code is generally also used to probe ISA cards). Moreover,
+most EISA drivers are among the oldest Linux drivers so, as you can
+imagine, some dust has settled here over the years.
+
+The EISA infrastructure is made up of three parts:
+
+    - The bus code implements most of the generic code. It is shared
+      among all the architectures that the EISA code runs on. It
+      implements bus probing (detecting EISA cards available on the bus),
+      allocates I/O resources, allows fancy naming through sysfs, and
+      offers interfaces for drivers to register.
+
+    - The bus root driver implements the glue between the bus hardware
+      and the generic bus code. It is responsible for discovering the
+      device implementing the bus, and setting it up to be later probed
+      by the bus code. This can go from something as simple as reserving
+      an I/O region on x86, to the rather more complex, like the hppa
+      EISA code. This is the part to implement in order to have EISA
+      running on a "new" platform.
+
+    - The driver offers the bus a list of devices that it manages, and
+      implements the necessary callbacks to probe and release devices
+      whenever told to.
+
+Every function/structure below lives in <linux/eisa.h>, which depends
+heavily on <linux/device.h>.
+
+Bus root driver
+===============
+
+::
+
+	int eisa_root_register (struct eisa_root_device *root);
+
+The eisa_root_register function is used to declare a device as the
+root of an EISA bus. The eisa_root_device structure holds a reference
+to this device, as well as some parameters for probing purposes::
+
+	struct eisa_root_device {
+		struct device   *dev;	 /* Pointer to bridge device */
+		struct resource *res;
+		unsigned long    bus_base_addr;
+		int		 slots;  /* Max slot number */
+		int		 force_probe; /* Probe even when no slot 0 */
+		u64		 dma_mask; /* from bridge device */
+		int              bus_nr; /* Set by eisa_root_register */
+		struct resource  eisa_root_res;	/* ditto */
+	};
+
+============= ======================================================
+node          used for eisa_root_register internal purpose
+dev           pointer to the root device
+res           root device I/O resource
+bus_base_addr slot 0 address on this bus
+slots	      max slot number to probe
+force_probe   Probe even when slot 0 is empty (no EISA mainboard)
+dma_mask      Default DMA mask. Usually the bridge device dma_mask.
+bus_nr	      unique bus id, set by eisa_root_register
+============= ======================================================
+
+Driver
+======
+
+::
+
+	int eisa_driver_register (struct eisa_driver *edrv);
+	void eisa_driver_unregister (struct eisa_driver *edrv);
+
+Clear enough?
+
+::
+
+	struct eisa_device_id {
+		char sig[EISA_SIG_LEN];
+		unsigned long driver_data;
+	};
+
+	struct eisa_driver {
+		const struct eisa_device_id *id_table;
+		struct device_driver         driver;
+	};
+
+=============== ====================================================
+id_table	an array of NULL terminated EISA id strings,
+		followed by an empty string. Each string can
+		optionally be paired with a driver-dependent value
+		(driver_data).
+
+driver		a generic driver, such as described in
+		Documentation/driver-api/driver-model/driver.rst. Only .name,
+		.probe and .remove members are mandatory.
+=============== ====================================================
+
+An example is the 3c59x driver::
+
+	static struct eisa_device_id vortex_eisa_ids[] = {
+		{ "TCM5920", EISA_3C592_OFFSET },
+		{ "TCM5970", EISA_3C597_OFFSET },
+		{ "" }
+	};
+
+	static struct eisa_driver vortex_eisa_driver = {
+		.id_table = vortex_eisa_ids,
+		.driver   = {
+			.name    = "3c59x",
+			.probe   = vortex_eisa_probe,
+			.remove  = vortex_eisa_remove
+		}
+	};
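+
+How a table terminated by an empty string can be scanned is sketched below
+in userspace C. This is not the bus code itself: the struct is re-declared
+locally, EISA_SIG_LEN is assumed to be 8, the driver_data values are
+placeholders, and find_eisa_id() is a hypothetical helper.
+
```c
#include <stddef.h>
#include <string.h>

#define EISA_SIG_LEN 8	/* assumption: 7 signature characters plus NUL */

struct eisa_device_id {
	char sig[EISA_SIG_LEN];
	unsigned long driver_data;
};

/* Example table; the driver_data values are placeholders. */
static const struct eisa_device_id example_ids[] = {
	{ "TCM5920", 1 },
	{ "TCM5970", 2 },
	{ "" }			/* empty string ends the table */
};

/* Hypothetical helper: return the entry matching sig, or NULL. */
const struct eisa_device_id *
find_eisa_id(const struct eisa_device_id *table, const char *sig)
{
	for (; table->sig[0] != '\0'; table++)
		if (strcmp(table->sig, sig) == 0)
			return table;
	return NULL;
}
```
+
+The value paired with the matching string is what ends up in
+id.driver_data of the detected device.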
+
+Device
+======
+
+The sysfs framework calls .probe and .remove functions upon device
+discovery and removal (note that the .remove function is only called
+when the driver is built as a module).
+
+Both functions are passed a pointer to a 'struct device', which is
+encapsulated in a 'struct eisa_device' described as follows::
+
+	struct eisa_device {
+		struct eisa_device_id id;
+		int                   slot;
+		int                   state;
+		unsigned long         base_addr;
+		struct resource       res[EISA_MAX_RESOURCES];
+		u64                   dma_mask;
+		struct device         dev; /* generic device */
+	};
+
+======== ============================================================
+id	 EISA id, as read from device. id.driver_data is set from the
+	 matching driver EISA id.
+slot	 slot number which the device was detected on
+state    set of flags indicating the state of the device. Current
+	 flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED.
+res	 set of four 256-byte I/O regions allocated to this device
+dma_mask DMA mask set from the parent device.
+dev	 generic device (see Documentation/driver-api/driver-model/device.rst)
+======== ============================================================
+
+You can get the 'struct eisa_device' from 'struct device' using the
+'to_eisa_device' macro.
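+
+The relationship between the two structures can be sketched in userspace C.
+The types below are reduced stand-ins, not the kernel definitions, and
+example_probe_slot() is a hypothetical callback; the point is only to show
+how a pointer to the embedded 'struct device' leads back to the enclosing
+'struct eisa_device'.
+
```c
#include <stddef.h>

/* Stand-ins for the kernel types, reduced to what the example needs. */
struct device {
	void *driver_data;
};

struct eisa_device {
	int slot;
	unsigned long base_addr;
	struct device dev;	/* generic device, embedded */
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define to_eisa_device(d) container_of(d, struct eisa_device, dev)

/* A probe-style callback only receives the generic struct device. */
int example_probe_slot(struct device *dev)
{
	struct eisa_device *edev = to_eisa_device(dev);

	return edev->slot;
}
```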
+
+Misc stuff
+==========
+
+::
+
+	void eisa_set_drvdata (struct eisa_device *edev, void *data);
+
+Stores data into the device's driver_data area.
+
+::
+
+	void *eisa_get_drvdata (struct eisa_device *edev);
+
+Gets the pointer previously stored into the device's driver_data area.
+
+::
+
+	int eisa_get_region_index (void *addr);
+
+Returns the region number (0 <= x < EISA_MAX_RESOURCES) of a given
+address.
+
+Kernel parameters
+=================
+
+eisa_bus.enable_dev
+	A comma-separated list of slots to be enabled, even if the firmware
+	set the card as disabled. The driver must be able to properly
+	initialize the device in such conditions.
+
+eisa_bus.disable_dev
+	A comma-separated list of slots to be disabled, even if the firmware
+	set the card as enabled. The driver won't be called to handle this
+	device.
+
+virtual_root.force_probe
+	Force the probing code to probe EISA slots even when it cannot find an
+	EISA compliant mainboard (nothing appears on slot 0). Defaults to 0
+	(don't force), and set to 1 (force probing) when either
+	CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set.
+
+Random notes
+============
+
+Converting an EISA driver to the new API mostly involves *deleting*
+code (since probing is now in the core EISA code). Unfortunately, most
+drivers share their probing routine between ISA and EISA. Special
+care must be taken when ripping out the EISA code, so other busses
+won't suffer from these surgical strikes...
+
+You *must not* expect any EISA device to be detected when returning
+from eisa_driver_register, since the chances are that the bus has not
+yet been probed. In fact, that's what happens most of the time (the
+bus root driver usually kicks in rather late in the boot process).
+Unfortunately, most drivers are doing the probing by themselves, and
+expect to have explored the whole machine when they exit their probe
+routine.
+
+For example, switching your favorite EISA SCSI card to the "hotplug"
+model is "the right thing"(tm).
+
+Thanks
+======
+
+I'd like to thank the following people for their help:
+
+- Xavier Benigni for lending me a wonderful Alpha Jensen,
+- James Bottomley, Jeff Garzik for getting this stuff into the kernel,
+- Andries Brouwer for contributing numerous EISA ids,
+- Catrin Jones for coping with far too many machines at home.
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 9fb03b7bdeb1..d1c6513dd20d 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -68,7 +68,33 @@ available subsections can be seen below.
    fpga/index
    acpi/index
    backlight/lp855x-driver.rst
+   bt8xxgpio
+   connector
+   console
+   dcdbas
+   dell_rbu
+   edid
+   eisa
+   isa
+   isapnp
    generic-counter
+   lightnvm-pblk
+   men-chameleon-bus
+   ntb
+   nvmem
+   parport-lowlevel
+   pti_intel_mid
+   pwm
+   rfkill
+   sgi-ioc4
+   sm501
+   smsc_ece1099
+   switchtec
+   sync_file
+   vfio-mediated-device
+   vfio
+   xillybus
+   zorro
 
 .. only::  subproject and html
 
diff --git a/Documentation/driver-api/isa.rst b/Documentation/driver-api/isa.rst
new file mode 100644
index 000000000000..def4a7b690b5
--- /dev/null
+++ b/Documentation/driver-api/isa.rst
@@ -0,0 +1,122 @@
+===========
+ISA Drivers
+===========
+
+The following text is adapted from the commit message of the initial
+commit of the ISA bus driver authored by Rene Herman.
+
+During the recent "isa drivers using platform devices" discussion it was
+pointed out that (ALSA) ISA drivers ran into the problem of not having
+the option to fail driver load (device registration rather) upon not
+finding their hardware due to a probe() error not being passed up
+through the driver model. In the course of that, I suggested a separate
+ISA bus might be best; Russell King agreed and suggested this bus could
+use the .match() method for the actual device discovery.
+
+The attached does this. For this old non (generically) discoverable ISA
+hardware only the driver itself can do discovery so as a difference with
+the platform_bus, this isa_bus also distributes match() up to the
+driver.
+
+As another difference: these devices only exist in the driver model due
+to the driver creating them because it might want to drive them, meaning
+that all device creation has been made internal as well.
+
+The usage model this provides is nice, and has been acked from the ALSA
+side by Takashi Iwai and Jaroslav Kysela. The ALSA driver module_init's
+now (for oldisa-only drivers) become::
+
+	static int __init alsa_card_foo_init(void)
+	{
+		return isa_register_driver(&snd_foo_isa_driver, SNDRV_CARDS);
+	}
+
+	static void __exit alsa_card_foo_exit(void)
+	{
+		isa_unregister_driver(&snd_foo_isa_driver);
+	}
+
+Quite like the other bus models therefore. This removes a lot of
+duplicated init code from the ALSA ISA drivers.
+
+The passed in isa_driver struct is the regular driver struct embedding a
+struct device_driver, the normal probe/remove/shutdown/suspend/resume
+callbacks, and as indicated that .match callback.
+
+The "SNDRV_CARDS" you see being passed in is an "unsigned int ndev"
+parameter, indicating how many devices to create and call our methods
+with.
+
+The platform_driver callbacks are called with a platform_device param;
+the isa_driver callbacks are being called with a ``struct device *dev,
+unsigned int id`` pair directly -- with the device creation completely
+internal to the bus it's much cleaner to not leak isa_dev's by passing
+them in at all. The id is the only thing we ever want other than the
+struct device anyways, and it makes for nicer code in the callbacks as
+well.
+
+With this additional .match() callback ISA drivers have all options. If
+ALSA would want to keep the old non-load behaviour, it could stick all
+of the old .probe in .match, which would only keep them registered after
+everything was found to be present and accounted for. If it wanted the
+behaviour of always loading as it inadvertently did for a bit after the
+changeover to platform devices, it could just not provide a .match() and
+do everything in .probe() as before.
+
+If it, as Takashi Iwai already suggested earlier as a way of following
+the model from saner buses more closely, wants to load when a later bind
+could conceivably succeed, it could use .match() for the prerequisites
+(such as checking the user wants the card enabled and that port/irq/dma
+values have been passed in) and .probe() for everything else. This is
+the nicest model.
+
+To the code...
+
+This exports only two functions; isa_{,un}register_driver().
+
+isa_register_driver() registers the struct device_driver, and then
+loops over the passed in ndev creating devices and registering them.
+This causes the bus match method to be called for them, which is::
+
+	int isa_bus_match(struct device *dev, struct device_driver *driver)
+	{
+		struct isa_driver *isa_driver = to_isa_driver(driver);
+
+		if (dev->platform_data == isa_driver) {
+			if (!isa_driver->match ||
+				isa_driver->match(dev, to_isa_dev(dev)->id))
+				return 1;
+			dev->platform_data = NULL;
+		}
+		return 0;
+	}
+
+The first thing this does is check if this device is in fact one of this
+driver's devices by seeing if the device's platform_data pointer is set
+to this driver. Platform devices compare strings, but we don't need to
+do that with everything being internal, so isa_register_driver() abuses
+dev->platform_data as an isa_driver pointer which we can then check here.
+I believe platform_data is available for this, but if rather not, moving
+the isa_driver pointer to the private struct isa_dev is of course fine as
+well.
+
+Then, if the driver did not provide a .match, it matches. If it did,
+the driver match() method is called to determine a match.
+
+If it did **not** match, dev->platform_data is reset to indicate this to
+isa_register_driver which can then unregister the device again.
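+
+The three outcomes described above can be reproduced with a userspace mock.
+This is only a sketch: the types are reduced stand-ins, the id is stored
+directly in the mock struct device rather than in struct isa_dev, and
+only_id_zero() is a hypothetical driver callback.
+
```c
#include <stddef.h>

/* Mock types; the kernel keeps the id in the private struct isa_dev. */
struct device {
	void *platform_data;
	unsigned int id;
};

struct isa_driver {
	int (*match)(struct device *dev, unsigned int id);
};

/* Same decision logic as isa_bus_match(), applied to the mock types. */
int mock_isa_bus_match(struct device *dev, struct isa_driver *drv)
{
	if (dev->platform_data == drv) {
		if (!drv->match || drv->match(dev, dev->id))
			return 1;
		dev->platform_data = NULL;	/* signal "no match" */
	}
	return 0;
}

/* Hypothetical driver callback: only device 0 is considered present. */
int only_id_zero(struct device *dev, unsigned int id)
{
	(void)dev;
	return id == 0;
}
```
+
+A driver without a .match callback matches each of its own devices, a
+driver with one decides per device, and a device belonging to another
+driver is left untouched.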
+
+If during all this, there's any error, or no devices matched at all
+everything is backed out again and the error, or -ENODEV, is returned.
+
+isa_unregister_driver() just unregisters the matched devices and the
+driver itself.
+
+module_isa_driver is a helper macro for ISA drivers which do not do
+anything special in module init/exit. This eliminates a lot of
+boilerplate code. Each module may only use this macro once, and calling
+it replaces module_init and module_exit.
+
+max_num_isa_dev is a macro to determine the maximum possible number of
+ISA devices which may be registered in the I/O port address space given
+the address extent of the ISA devices.
diff --git a/Documentation/driver-api/isapnp.rst b/Documentation/driver-api/isapnp.rst
new file mode 100644
index 000000000000..8d0840ac847b
--- /dev/null
+++ b/Documentation/driver-api/isapnp.rst
@@ -0,0 +1,15 @@
+==========================================================
+ISA Plug & Play support by Jaroslav Kysela <perex@suse.cz>
+==========================================================
+
+Interface /proc/isapnp
+======================
+
+The interface has been removed. See pnp.txt for more details.
+
+Interface /proc/bus/isapnp
+==========================
+
+This directory allows access to ISA PnP cards and logical devices.
+The regular files contain the contents of ISA PnP registers for
+a logical device.
diff --git a/Documentation/driver-api/lightnvm-pblk.rst b/Documentation/driver-api/lightnvm-pblk.rst
new file mode 100644
index 000000000000..1040ed1cec81
--- /dev/null
+++ b/Documentation/driver-api/lightnvm-pblk.rst
@@ -0,0 +1,21 @@
+pblk: Physical Block Device Target
+==================================
+
+pblk implements a fully associative, host-based FTL that exposes a traditional
+block I/O interface. Its primary responsibilities are:
+
+  - Map logical addresses onto physical addresses (4KB granularity) in a
+    logical-to-physical (L2P) table.
+  - Maintain the integrity and consistency of the L2P table as well as its
+    recovery from normal tear down and power outage.
+  - Deal with controller- and media-specific constraints.
+  - Handle I/O errors.
+  - Implement garbage collection.
+  - Maintain consistency across the I/O stack during synchronization points.
+
+For more information please refer to:
+
+  http://lightnvm.io
+
+which maintains updated FAQs, manual pages, technical documentation, tools,
+contacts, etc.
diff --git a/Documentation/driver-api/men-chameleon-bus.rst b/Documentation/driver-api/men-chameleon-bus.rst
new file mode 100644
index 000000000000..1b1f048aa748
--- /dev/null
+++ b/Documentation/driver-api/men-chameleon-bus.rst
@@ -0,0 +1,175 @@
+=================
+MEN Chameleon Bus
+=================
+
+.. Table of Contents
+   =================
+   1 Introduction
+       1.1 Scope of this Document
+       1.2 Limitations of the current implementation
+   2 Architecture
+       2.1 MEN Chameleon Bus
+       2.2 Carrier Devices
+       2.3 Parser
+   3 Resource handling
+       3.1 Memory Resources
+       3.2 IRQs
+   4 Writing an MCB driver
+       4.1 The driver structure
+       4.2 Probing and attaching
+       4.3 Initializing the driver
+
+
+Introduction
+============
+
+This document describes the architecture and implementation of the MEN
+Chameleon Bus (called MCB throughout this document).
+
+Scope of this Document
+----------------------
+
+This document is intended to be a short overview of the current
+implementation and does by no means describe the complete possibilities of MCB
+based devices.
+
+Limitations of the current implementation
+-----------------------------------------
+
+The current implementation is limited to PCI and PCIe based carrier devices
+that only use a single memory resource and share the PCI legacy IRQ.  Not
+implemented are:
+
+- Multi-resource MCB devices like the VME Controller or M-Module carrier.
+- MCB devices that need another MCB device, like SRAM for a DMA Controller's
+  buffer descriptors or a video controller's video memory.
+- A per-carrier IRQ domain for carrier devices that have one (or more) IRQs
+  per MCB device like PCIe based carriers with MSI or MSI-X support.
+
+Architecture
+============
+
+MCB is divided into 3 functional blocks:
+
+- The MEN Chameleon Bus itself,
+- drivers for MCB Carrier Devices and
+- the parser for the Chameleon table.
+
+MEN Chameleon Bus
+-----------------
+
+The MEN Chameleon Bus is an artificial bus system that attaches to a so
+called Chameleon FPGA device found on some hardware produced by MEN Mikro
+Elektronik GmbH. These devices are multi-function devices implemented in a
+single FPGA and usually attached via some sort of PCI or PCIe link. Each
+FPGA contains a header section describing the content of the FPGA. The
+header lists the device id, PCI BAR, offset from the beginning of the PCI
+BAR, size in the FPGA, interrupt number and some other properties currently
+not handled by the MCB implementation.
+
+Carrier Devices
+---------------
+
+A carrier device is just an abstraction for the real world physical bus the
+Chameleon FPGA is attached to. Some IP Core drivers may need to interact with
+properties of the carrier device (like querying the IRQ number of a PCI
+device). To provide abstraction from the real hardware bus, an MCB carrier
+device provides callback methods to translate the driver's MCB function calls
+to hardware related function calls. For example a carrier device may
+implement the get_irq() method which can be translated into a hardware bus
+query for the IRQ number the device should use.
+
+Parser
+------
+
+The parser reads the first 512 bytes of a Chameleon device and parses the
+Chameleon table. Currently the parser only supports the Chameleon v2 variant
+of the Chameleon table but can easily be adapted to support an older or
+possible future variant. While parsing the table's entries new MCB devices
+are allocated and their resources are assigned according to the resource
+assignment in the Chameleon table. After resource assignment is finished, the
+MCB devices are registered at the MCB and thus at the driver core of the
+Linux kernel.
+
+Resource handling
+=================
+
+The current implementation assigns exactly one memory and one IRQ resource
+per MCB device. But this is likely going to change in the future.
+
+Memory Resources
+----------------
+
+Each MCB device has exactly one memory resource, which can be requested from
+the MCB bus. This memory resource is the physical address of the MCB device
+inside the carrier and is intended to be passed to ioremap() and friends. It
+is already requested from the kernel by calling request_mem_region().
+
+IRQs
+----
+
+Each MCB device has exactly one IRQ resource, which can be requested from the
+MCB bus. If a carrier device driver implements the ->get_irq() callback
+method, the IRQ number assigned by the carrier device will be returned,
+otherwise the IRQ number inside the Chameleon table will be returned. This
+number is suitable to be passed to request_irq().
+
+Writing an MCB driver
+=====================
+
+The driver structure
+--------------------
+
+Each MCB driver has a structure to identify the device driver as well as
+device ids which identify the IP Core inside the FPGA. The driver structure
+also contains callback methods which get executed on driver probe and
+removal from the system::
+
+	static const struct mcb_device_id foo_ids[] = {
+		{ .device = 0x123 },
+		{ }
+	};
+	MODULE_DEVICE_TABLE(mcb, foo_ids);
+
+	static struct mcb_driver foo_driver = {
+		.driver = {
+			.name = "foo-bar",
+			.owner = THIS_MODULE,
+		},
+		.probe = foo_probe,
+		.remove = foo_remove,
+		.id_table = foo_ids,
+	};
+
+Probing and attaching
+---------------------
+
+When a driver is loaded and the MCB devices it services are found, the MCB
+core will call the driver's probe callback method. When the driver is removed
+from the system, the MCB core will call the driver's remove callback method::
+
+	static int foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id);
+	static void foo_remove(struct mcb_device *mdev);
+
+Initializing the driver
+-----------------------
+
+When the kernel is booted or your foo driver module is inserted, you have to
+perform driver initialization. Usually it is enough to register your driver
+module at the MCB core::
+
+	static int __init foo_init(void)
+	{
+		return mcb_register_driver(&foo_driver);
+	}
+	module_init(foo_init);
+
+	static void __exit foo_exit(void)
+	{
+		mcb_unregister_driver(&foo_driver);
+	}
+	module_exit(foo_exit);
+
+The module_mcb_driver() macro can be used to reduce the above code::
+
+	module_mcb_driver(foo_driver);
diff --git a/Documentation/driver-api/ntb.rst b/Documentation/driver-api/ntb.rst
new file mode 100644
index 000000000000..074a423c853c
--- /dev/null
+++ b/Documentation/driver-api/ntb.rst
@@ -0,0 +1,236 @@
+===========
+NTB Drivers
+===========
+
+NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
+the separate memory systems of two or more computers to the same PCI-Express
+fabric. Existing NTB hardware supports a common feature set: doorbell
+registers and memory translation windows, as well as non-common features like
+scratchpad and message registers. Scratchpad registers are readable and writable
+registers that are accessible from either side of the device, so that peers can
+exchange a small amount of information at a fixed address. Message registers can
+be used for the same purpose. Additionally, they are provided with
+special status bits to make sure the information isn't overwritten by another
+peer. Doorbell registers provide a way for peers to send interrupt events.
+Memory windows allow translated read and write access to the peer memory.
+
+NTB Core Driver (ntb)
+=====================
+
+The NTB core driver defines an API wrapping the common feature set, and allows
+clients interested in NTB features to discover the NTB devices supported by
+hardware drivers.  The term "client" is used here to mean an upper layer
+component making use of the NTB API.  The term "driver," or "hardware driver,"
+is used here to mean a driver for a specific vendor and model of NTB hardware.
+
+NTB Client Drivers
+==================
+
+NTB client drivers should register with the NTB core driver.  After
+registering, the client probe and remove functions will be called appropriately
+as ntb hardware, or hardware drivers, are inserted and removed.  The
+registration uses the Linux Device framework, so it should feel familiar to
+anyone who has written a pci driver.
+
+NTB Typical client driver implementation
+----------------------------------------
+
+The primary purpose of NTB is to share a piece of memory between at least two
+systems. So the NTB device features like Scratchpad/Message registers are
+mainly used to perform the proper memory window initialization. Typically
+there are two types of memory window interfaces supported by the NTB API:
+inbound translation configured on the local ntb port and outbound translation
+configured by the peer, on the peer ntb port. The first type is
+depicted in the following figure::
+
+ Inbound translation:
+
+ Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
+  ____________
+ | dma-mapped |-ntb_mw_set_trans(addr)  |
+ | memory     |        _v____________   |   ______________
+ | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
+ |------------|       |--------------|  |  |--------------|
+
+So a typical scenario for initializing a memory window of the first type is:
+1) allocate a memory region, 2) put the translated address into the NTB config,
+3) somehow notify the peer device of the performed initialization, 4) the peer
+device maps the corresponding outbound memory window to gain access to the
+shared memory region.
+
+The second type of interface, in which the shared windows are initialized by
+a peer device, is depicted in the following figure::
+
+ Outbound translation:
+
+ Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
+  ____________                      ______________
+ | dma-mapped |                |   | MW base addr |<== memory-mapped IO
+ | memory     |                |   |--------------|
+ | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
+ |------------|                |   |--------------|
+
+A typical scenario for initializing the second type of interface would be:
+1) allocate a memory region, 2) somehow deliver a translated address to a peer
+device, 3) peer puts the translated address into the NTB config, 4) peer device
+maps the outbound memory window to gain access to the shared memory region.
+
+As one can see, the described scenarios can be combined into one portable
+algorithm.
+
+ Local device:
+  1) Allocate memory for a shared window
+  2) Initialize memory window with translated address of the allocated region
+     (it may fail if local memory window initialization is unsupported)
+  3) Send the translated address and memory window index to a peer device
+
+ Peer device:
+  1) Initialize memory window with the retrieved address of the memory region
+     allocated by the other device (it may fail if peer memory window
+     initialization is unsupported)
+  2) Map outbound memory window
+
+In accordance with this scenario, the NTB Memory Window API can be used as
+follows:
+
+ Local device:
+  1) ntb_mw_count(pidx) - retrieve number of memory ranges, which can
+     be allocated for memory windows between local device and peer device
+     of port with specified index.
+  2) ntb_mw_get_align(pidx, midx) - retrieve parameters restricting the
+     shared memory region alignment and size. Then memory can be properly
+     allocated.
+  3) Allocate physically contiguous memory region in compliance with
+     restrictions retrieved in 2).
+  4) ntb_mw_set_trans(pidx, midx) - try to set translation address of
+     the memory window with specified index for the defined peer device
+     (it may fail if local translated address setting is not supported)
+  5) Send translated base address (usually together with memory window
+     number) to the peer device using, for instance, scratchpad or message
+     registers.
+
+ Peer device:
+  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
+     received from the other device (related to pidx) for the specified
+     memory window. It may fail if the retrieved address, for instance,
+     exceeds the maximum possible address or isn't properly aligned.
+  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
+     window in order to access the shared memory.
+
+It is also worth noting that ntb_mw_count(pidx) should return the same value
+as ntb_peer_mw_count() on the peer with port index pidx.
+
+NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
+------------------------------------------------------------------
+
+The primary client for NTB is the Transport client, used in tandem with NTB
+Netdev.  These drivers function together to create a logical link to the peer,
+across the ntb, to exchange packets of network data.  The Transport client
+establishes a logical link to the peer, and creates queue pairs to exchange
+messages and data.  The NTB Netdev then creates an ethernet device using a
+Transport queue pair.  Network data is copied between socket buffers and the
+Transport queue pair buffer.  The Transport client may be used for other things
+besides Netdev, however no other applications have yet been written.
+
+NTB Ping Pong Test Client (ntb\_pingpong)
+-----------------------------------------
+
+The Ping Pong test client serves as a demonstration to exercise the doorbell
+and scratchpad registers of NTB hardware, and as a simple example NTB client.
+Ping Pong enables the link when started, waits for the NTB link to come up, and
+then proceeds to read and write the doorbell and scratchpad registers of the NTB.
+The peers interrupt each other using a bit mask of doorbell bits, which is
+shifted by one in each round, to test the behavior of multiple doorbell bits
+and interrupt vectors.  The Ping Pong driver also reads the first local
+scratchpad, and writes the value plus one to the first peer scratchpad, each
+round before writing the peer doorbell register.
+
+Module Parameters:
+
+* unsafe - Some hardware has known issues with scratchpad and doorbell
+	registers.  By default, Ping Pong will not attempt to exercise such
+	hardware.  You may override this behavior at your own risk by setting
+	unsafe=1.
+* delay\_ms - Specify the delay between receiving a doorbell
+	interrupt event and setting the peer doorbell register for the next
+	round.
+* init\_db - Specify the doorbell bits to start new series of rounds.  A new
+	series begins once all the doorbell bits have been shifted out of
+	range.
+* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
+	then to observe debugging output on the console.
+
+NTB Tool Test Client (ntb\_tool)
+--------------------------------
+
+The Tool test client serves primarily for debugging NTB hardware and drivers.
+The Tool provides access through debugfs for reading, setting, and clearing the
+NTB doorbell, and reading and writing scratchpads.
+
+The Tool does not currently have any module parameters.
+
+Debugfs Files:
+
+* *debugfs*/ntb\_tool/*hw*/
+	A directory in debugfs will be created for each
+	NTB device probed by the tool.  This directory is shortened to *hw*
+	below.
+* *hw*/db
+	This file is used to read, set, and clear the local doorbell.  Not
+	all operations may be supported by all hardware.  To read the doorbell,
+	read the file.  To set the doorbell, write `s` followed by the bits to
+	set (eg: `echo 's 0x0101' > db`).  To clear the doorbell, write `c`
+	followed by the bits to clear.
+* *hw*/mask
+	This file is used to read, set, and clear the local doorbell mask.
+	See *db* for details.
+* *hw*/peer\_db
+	This file is used to read, set, and clear the peer doorbell.
+	See *db* for details.
+* *hw*/peer\_mask
+	This file is used to read, set, and clear the peer doorbell
+	mask.  See *db* for details.
+* *hw*/spad
+	This file is used to read and write local scratchpads.  To read
+	the values of all scratchpads, read the file.  To write values, write a
+	series of pairs of scratchpad number and value
+	(eg: `echo '4 0x123 7 0xabc' > spad`
+	# to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
+* *hw*/peer\_spad
+	This file is used to read and write peer scratchpads.  See
+	*spad* for details.
+
+NTB Hardware Drivers
+====================
+
+NTB hardware drivers should register devices with the NTB core driver.  After
+registering, the clients' probe and remove functions will be called.
+
+NTB Intel Hardware Driver (ntb\_hw\_intel)
+------------------------------------------
+
+The Intel hardware driver supports NTB on Xeon and Atom CPUs.
+
+Module Parameters:
+
+* b2b\_mw\_idx
+	If the peer ntb is to be accessed via a memory window, then use
+	this memory window to access the peer ntb.  A value of zero or positive
+	starts from the first mw idx, and a negative value starts from the last
+	mw idx.  Both sides MUST set the same value here!  The default value is
+	`-1`.
+* b2b\_mw\_share
+	If the peer ntb is to be accessed via a memory window, and if
+	the memory window is large enough, still allow the client to use the
+	second half of the memory window for address translation to the peer.
+* xeon\_b2b\_usd\_bar2\_addr64
+	If using B2B topology on Xeon hardware, use
+	this 64 bit address on the bus between the NTB devices for the window
+	at BAR2, on the upstream side of the link.
+* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
+* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
+* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
+* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
+* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
+* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
+* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
diff --git a/Documentation/driver-api/nvmem.rst b/Documentation/driver-api/nvmem.rst
new file mode 100644
index 000000000000..d9d958d5c824
--- /dev/null
+++ b/Documentation/driver-api/nvmem.rst
@@ -0,0 +1,189 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+NVMEM Subsystem
+===============
+
+ Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
+
+This document explains the NVMEM Framework along with the APIs provided,
+and how to use it.
+
+1. Introduction
+===============
+*NVMEM* is the abbreviation for Non Volatile Memory layer. It is used to
+retrieve configuration of SOC or Device specific data from non volatile
+memories like eeprom, efuses and so on.
+
+Before this framework existed, NVMEM drivers like eeprom were stored in
+drivers/misc, where they all had to duplicate pretty much the same code to
+register a sysfs file, allow in-kernel users to access the content of the
+devices they were driving, etc.
+
+This was also a problem for other in-kernel users: because the solutions
+differed considerably from one driver to another, there was a rather big
+abstraction leak.
+
+This framework aims to solve these problems. It also introduces DT
+representation for consumer devices to go get the data they require (MAC
+Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs. This
+framework is based on regmap, so that most of the abstraction available in
+regmap can be reused, across multiple types of buses.
+
+NVMEM Providers
++++++++++++++++
+
+NVMEM provider refers to an entity that implements methods to initialize, read
+and write the non-volatile memory.
+
+2. Registering/Unregistering the NVMEM provider
+===============================================
+
+An NVMEM provider can register with the NVMEM core by supplying the relevant
+nvmem configuration to nvmem_register(). On success, the core returns a valid
+nvmem_device pointer.
+
+nvmem_unregister(nvmem) is used to unregister a previously registered provider.
+
+For example, a simple qfprom case::
+
+  static struct nvmem_config econfig = {
+	.name = "qfprom",
+	.owner = THIS_MODULE,
+  };
+
+  static int qfprom_probe(struct platform_device *pdev)
+  {
+	...
+	econfig.dev = &pdev->dev;
+	nvmem = nvmem_register(&econfig);
+	...
+  }
+
+It is mandatory that the NVMEM provider has a regmap associated with its struct
+device. Failure to do so would return an error code from nvmem_register().
+
+Users of board files can define and register nvmem cells using the
+nvmem_cell_table struct::
+
+  static struct nvmem_cell_info foo_nvmem_cells[] = {
+	{
+		.name		= "macaddr",
+		.offset		= 0x7f00,
+		.bytes		= ETH_ALEN,
+	}
+  };
+
+  static struct nvmem_cell_table foo_nvmem_cell_table = {
+	.nvmem_name		= "i2c-eeprom",
+	.cells			= foo_nvmem_cells,
+	.ncells			= ARRAY_SIZE(foo_nvmem_cells),
+  };
+
+  nvmem_add_cell_table(&foo_nvmem_cell_table);
+
+Additionally it is possible to create nvmem cell lookup entries and register
+them with the nvmem framework from machine code as shown in the example below::
+
+  static struct nvmem_cell_lookup foo_nvmem_lookup = {
+	.nvmem_name		= "i2c-eeprom",
+	.cell_name		= "macaddr",
+	.dev_id			= "foo_mac.0",
+	.con_id			= "mac-address",
+  };
+
+  nvmem_add_cell_lookups(&foo_nvmem_lookup, 1);
+
+NVMEM Consumers
++++++++++++++++
+
+NVMEM consumers are the entities which make use of the NVMEM provider to
+read from and write to NVMEM.
+
+3. NVMEM cell based consumer APIs
+=================================
+
+NVMEM cells are the data entries/fields in the NVMEM.
+The NVMEM framework provides the following APIs to read/write NVMEM cells::
+
+  struct nvmem_cell *nvmem_cell_get(struct device *dev, const char *name);
+  struct nvmem_cell *devm_nvmem_cell_get(struct device *dev, const char *name);
+
+  void nvmem_cell_put(struct nvmem_cell *cell);
+  void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
+
+  void *nvmem_cell_read(struct nvmem_cell *cell, ssize_t *len);
+  int nvmem_cell_write(struct nvmem_cell *cell, void *buf, ssize_t len);
+
+The `*nvmem_cell_get()` APIs get a reference to the nvmem cell for a given id,
+and nvmem_cell_read/write() can then read from or write to the cell.
+Once the cell is no longer needed, the consumer should call
+`*nvmem_cell_put()` to free all the memory allocated for the cell.
+
+4. Direct NVMEM device based consumer APIs
+==========================================
+
+In some instances it is necessary to directly read/write the NVMEM.
+To facilitate such consumers, the NVMEM framework provides the APIs below::
+
+  struct nvmem_device *nvmem_device_get(struct device *dev, const char *name);
+  struct nvmem_device *devm_nvmem_device_get(struct device *dev,
+					   const char *name);
+  void nvmem_device_put(struct nvmem_device *nvmem);
+  int nvmem_device_read(struct nvmem_device *nvmem, unsigned int offset,
+		      size_t bytes, void *buf);
+  int nvmem_device_write(struct nvmem_device *nvmem, unsigned int offset,
+		       size_t bytes, void *buf);
+  int nvmem_device_cell_read(struct nvmem_device *nvmem,
+			   struct nvmem_cell_info *info, void *buf);
+  int nvmem_device_cell_write(struct nvmem_device *nvmem,
+			    struct nvmem_cell_info *info, void *buf);
+
+Before a consumer can read/write the NVMEM directly, it should get hold
+of an nvmem_device from one of the `*nvmem_device_get()` APIs.
+
+The difference between these APIs and the cell based APIs is that these APIs
+always take an nvmem_device as a parameter.
+
+5. Releasing a reference to the NVMEM
+=====================================
+
+When a consumer no longer needs the NVMEM, it has to release the reference
+to the NVMEM it has obtained using the APIs mentioned in the above section.
+The NVMEM framework provides the following APIs to release a reference::
+
+  void nvmem_cell_put(struct nvmem_cell *cell);
+  void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
+  void nvmem_device_put(struct nvmem_device *nvmem);
+  void devm_nvmem_device_put(struct device *dev, struct nvmem_device *nvmem);
+
+All of these APIs release a reference to the NVMEM; devm_nvmem_cell_put()
+and devm_nvmem_device_put() additionally destroy the devres associated
+with this NVMEM.
+
+Userspace
++++++++++
+
+6. Userspace binary interface
+==============================
+
+Userspace can read/write the raw NVMEM file located at::
+
+	/sys/bus/nvmem/devices/*/nvmem
+
+For example::
+
+  hexdump /sys/bus/nvmem/devices/qfprom0/nvmem
+
+  0000000 0000 0000 0000 0000 0000 0000 0000 0000
+  *
+  00000a0 db10 2240 0000 e000 0c00 0c00 0000 0c00
+  0000000 0000 0000 0000 0000 0000 0000 0000 0000
+  ...
+  *
+  0001000
+
+7. DeviceTree Binding
+=====================
+
+See Documentation/devicetree/bindings/nvmem/nvmem.txt
diff --git a/Documentation/driver-api/parport-lowlevel.rst b/Documentation/driver-api/parport-lowlevel.rst
new file mode 100644
index 000000000000..0633d70ffda7
--- /dev/null
+++ b/Documentation/driver-api/parport-lowlevel.rst
@@ -0,0 +1,1832 @@
+===============================
+PARPORT interface documentation
+===============================
+
+:Time-stamp: <2000-02-24 13:30:20 twaugh>
+
+Described here are the following functions:
+
+Global functions::
+  parport_register_driver
+  parport_unregister_driver
+  parport_enumerate
+  parport_register_device
+  parport_unregister_device
+  parport_claim
+  parport_claim_or_block
+  parport_release
+  parport_yield
+  parport_yield_blocking
+  parport_wait_peripheral
+  parport_poll_peripheral
+  parport_wait_event
+  parport_negotiate
+  parport_read
+  parport_write
+  parport_open
+  parport_close
+  parport_device_id
+  parport_device_coords
+  parport_find_class
+  parport_find_device
+  parport_set_timeout
+
+Port functions (can be overridden by low-level drivers):
+
+  SPP::
+    port->ops->read_data
+    port->ops->write_data
+    port->ops->read_status
+    port->ops->read_control
+    port->ops->write_control
+    port->ops->frob_control
+    port->ops->enable_irq
+    port->ops->disable_irq
+    port->ops->data_forward
+    port->ops->data_reverse
+
+  EPP::
+    port->ops->epp_write_data
+    port->ops->epp_read_data
+    port->ops->epp_write_addr
+    port->ops->epp_read_addr
+
+  ECP::
+    port->ops->ecp_write_data
+    port->ops->ecp_read_data
+    port->ops->ecp_write_addr
+
+  Other::
+    port->ops->nibble_read_data
+    port->ops->byte_read_data
+    port->ops->compat_write_data
+
+The parport subsystem comprises ``parport`` (the core port-sharing
+code), and a variety of low-level drivers that actually do the port
+accesses.  Each low-level driver handles a particular style of port
+(PC, Amiga, and so on).
+
+The parport interface to the device driver author can be broken down
+into global functions and port functions.
+
+The global functions are mostly for communicating between the device
+driver and the parport subsystem: acquiring a list of available ports,
+claiming a port for exclusive use, and so on.  They also include
+``generic`` functions for doing standard things that will work on any
+IEEE 1284-capable architecture.
+
+The port functions are provided by the low-level drivers, although the
+core parport module provides generic ``defaults`` for some routines.
+The port functions can be split into three groups: SPP, EPP, and ECP.
+
+SPP (Standard Parallel Port) functions modify so-called ``SPP``
+registers: data, status, and control.  The hardware may not actually
+have registers exactly like that, but the PC does and this interface is
+modelled after common PC implementations.  Other low-level drivers may
+be able to emulate most of the functionality.
+
+EPP (Enhanced Parallel Port) functions are provided for reading and
+writing in IEEE 1284 EPP mode, and ECP (Extended Capabilities Port)
+functions are used for IEEE 1284 ECP mode. (What about BECP? Does
+anyone care?)
+
+Hardware assistance for EPP and/or ECP transfers may or may not be
+available, and if it is available it may or may not be used.  If
+hardware is not used, the transfer will be software-driven.  In order
+to cope with peripherals that only tenuously support IEEE 1284, a
+low-level driver specific function is provided, for altering 'fudge
+factors'.
+
+Global functions
+================
+
+parport_register_driver - register a device driver with parport
+---------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_driver {
+		const char *name;
+		void (*attach) (struct parport *);
+		void (*detach) (struct parport *);
+		struct parport_driver *next;
+	};
+	int parport_register_driver (struct parport_driver *driver);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+In order to be notified about parallel ports when they are detected,
+parport_register_driver should be called.  Your driver will
+immediately be notified of all ports that have already been detected,
+and of each new port as low-level drivers are loaded.
+
+A ``struct parport_driver`` contains the textual name of your driver,
+a pointer to a function to handle new ports, and a pointer to a
+function to handle ports going away due to a low-level driver
+unloading.  Ports will only be detached if they are not being used
+(i.e. there are no devices registered on them).
+
+The visible parts of the ``struct parport *`` argument given to
+attach/detach are::
+
+	struct parport
+	{
+		struct parport *next; /* next parport in list */
+		const char *name;     /* port's name */
+		unsigned int modes;   /* bitfield of hardware modes */
+		struct parport_device_info probe_info;
+				/* IEEE1284 info */
+		int number;           /* parport index */
+		struct parport_operations *ops;
+		...
+	};
+
+There are other members of the structure, but they should not be
+touched.
+
+The ``modes`` member summarises the capabilities of the underlying
+hardware.  It consists of flags which may be bitwise-ored together:
+
+  ============================= ===============================================
+  PARPORT_MODE_PCSPP		IBM PC registers are available,
+				i.e. functions that act on data,
+				control and status registers are
+				probably writing directly to the
+				hardware.
+  PARPORT_MODE_TRISTATE		The data drivers may be turned off.
+				This allows the data lines to be used
+				for reverse (peripheral to host)
+				transfers.
+  PARPORT_MODE_COMPAT		The hardware can assist with
+				compatibility-mode (printer)
+				transfers, i.e. compat_write_data.
+  PARPORT_MODE_EPP		The hardware can assist with EPP
+				transfers.
+  PARPORT_MODE_ECP		The hardware can assist with ECP
+				transfers.
+  PARPORT_MODE_DMA		The hardware can use DMA, so you might
+				want to pass ISA DMA-able memory
+				(i.e. memory allocated using the
+				GFP_DMA flag with kmalloc) to the
+				low-level driver in order to take
+				advantage of it.
+  ============================= ===============================================
+
+There may be other flags in ``modes`` as well.
+
+The contents of ``modes`` is advisory only.  For example, if the
+hardware is capable of DMA, and PARPORT_MODE_DMA is in ``modes``, it
+doesn't necessarily mean that DMA will always be used when possible.
+Similarly, hardware that is capable of assisting ECP transfers won't
+necessarily be used.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+Zero on success, otherwise an error code.
+
+ERRORS
+^^^^^^
+
+None. (Can it fail? Why return int?)
+
+EXAMPLE
+^^^^^^^
+
+::
+
+	static void lp_attach (struct parport *port)
+	{
+		...
+		private = kmalloc (...);
+		dev[count++] = parport_register_device (...);
+		...
+	}
+
+	static void lp_detach (struct parport *port)
+	{
+		...
+	}
+
+	static struct parport_driver lp_driver = {
+		"lp",
+		lp_attach,
+		lp_detach,
+		NULL /* always put NULL here */
+	};
+
+	int lp_init (void)
+	{
+		...
+		if (parport_register_driver (&lp_driver)) {
+			/* Failed; nothing we can do. */
+			return -EIO;
+		}
+		...
+	}
+
+
+SEE ALSO
+^^^^^^^^
+
+parport_unregister_driver, parport_register_device, parport_enumerate
+
+
+
+parport_unregister_driver - tell parport to forget about this driver
+--------------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_driver {
+		const char *name;
+		void (*attach) (struct parport *);
+		void (*detach) (struct parport *);
+		struct parport_driver *next;
+	};
+	void parport_unregister_driver (struct parport_driver *driver);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+This tells parport not to notify the device driver of new ports or of
+ports going away.  Registered devices belonging to that driver are NOT
+unregistered: parport_unregister_device must be used for each one.
+
+EXAMPLE
+^^^^^^^
+
+::
+
+	void cleanup_module (void)
+	{
+		...
+		/* Stop notifications. */
+		parport_unregister_driver (&lp_driver);
+
+		/* Unregister devices. */
+		for (i = 0; i < NUM_DEVS; i++)
+			parport_unregister_device (dev[i]);
+		...
+	}
+
+SEE ALSO
+^^^^^^^^
+
+parport_register_driver, parport_enumerate
+
+
+
+parport_enumerate - retrieve a list of parallel ports (DEPRECATED)
+------------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport *parport_enumerate (void);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Retrieve the first of a list of valid parallel ports for this machine.
+Successive parallel ports can be found using the ``struct parport
+*next`` element of the ``struct parport *`` that is returned.  If ``next``
+is NULL, there are no more parallel ports in the list.  The number of
+ports in the list will not exceed PARPORT_MAX.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+A ``struct parport *`` describing a valid parallel port for the machine,
+or NULL if there are none.
+
+ERRORS
+^^^^^^
+
+This function can return NULL to indicate that there are no parallel
+ports to use.
+
+EXAMPLE
+^^^^^^^
+
+::
+
+	int detect_device (void)
+	{
+		struct parport *port;
+
+		for (port = parport_enumerate ();
+		     port != NULL;
+		     port = port->next) {
+			/* Try to detect a device on the port... */
+			...
+		}
+
+		/* All ports in the list have been examined. */
+		...
+	}
+
+NOTES
+^^^^^
+
+parport_enumerate is deprecated; parport_register_driver should be
+used instead.
+
+SEE ALSO
+^^^^^^^^
+
+parport_register_driver, parport_unregister_driver
+
+
+
+parport_register_device - register to use a port
+------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	typedef int (*preempt_func) (void *handle);
+	typedef void (*wakeup_func) (void *handle);
+	typedef int (*irq_func) (int irq, void *handle, struct pt_regs *);
+
+	struct pardevice *parport_register_device(struct parport *port,
+						  const char *name,
+						  preempt_func preempt,
+						  wakeup_func wakeup,
+						  irq_func irq,
+						  int flags,
+						  void *handle);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Use this function to register your device driver on a parallel port
+(``port``).  Once you have done that, you will be able to use
+parport_claim and parport_release in order to use the port.
+
+The ``name`` argument is the name of the device that appears in the /proc
+filesystem. The string must be valid for the whole lifetime of the
+device (until parport_unregister_device is called).
+
+This function will register three callbacks into your driver:
+``preempt``, ``wakeup`` and ``irq``.  Each of these may be NULL in order to
+indicate that you do not want a callback.
+
+When the ``preempt`` function is called, it is because another driver
+wishes to use the parallel port.  The ``preempt`` function should return
+non-zero if the parallel port cannot be released yet -- if zero is
+returned, the port is lost to another driver and the port must be
+re-claimed before use.
+
+The ``wakeup`` function is called once another driver has released the
+port and no other driver has yet claimed it.  You can claim the
+parallel port from within the ``wakeup`` function (in which case the
+claim is guaranteed to succeed), or choose not to if you don't need it
+now.
+
+If an interrupt occurs on the parallel port your driver has claimed,
+the ``irq`` function will be called. (Write something about shared
+interrupts here.)
+
+The ``handle`` is a pointer to driver-specific data, and is passed to
+the callback functions.
+
+``flags`` may be a bitwise combination of the following flags:
+
+  ===================== =================================================
+        Flag            Meaning
+  ===================== =================================================
+  PARPORT_DEV_EXCL	The device cannot share the parallel port at all.
+			Use this only when absolutely necessary.
+  ===================== =================================================
+
+The typedefs are not actually defined -- they are only shown in order
+to make the function prototype more readable.
+
+The visible parts of the returned ``struct pardevice`` are::
+
+	struct pardevice {
+		struct parport *port;	/* Associated port */
+		void *private;		/* Device driver's 'handle' */
+		...
+	};
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+A ``struct pardevice *``: a handle to the registered parallel port
+device that can be used for parport_claim, parport_release, etc.
+
+ERRORS
+^^^^^^
+
+A return value of NULL indicates that there was a problem registering
+a device on that port.
+
+EXAMPLE
+^^^^^^^
+
+::
+
+	static int preempt (void *handle)
+	{
+		if (busy_right_now)
+			return 1;
+
+		must_reclaim_port = 1;
+		return 0;
+	}
+
+	static void wakeup (void *handle)
+	{
+		struct toaster *private = handle;
+		struct pardevice *dev = private->dev;
+		if (!dev) return; /* avoid races */
+
+		if (want_port)
+			parport_claim (dev);
+	}
+
+	static int toaster_detect (struct toaster *private, struct parport *port)
+	{
+		private->dev = parport_register_device (port, "toaster", preempt,
+							wakeup, NULL, 0,
+							private);
+		if (!private->dev)
+			/* Couldn't register with parport. */
+			return -EIO;
+
+		must_reclaim_port = 0;
+		busy_right_now = 1;
+		parport_claim_or_block (private->dev);
+		...
+		/* Don't need the port while the toaster warms up. */
+		busy_right_now = 0;
+		...
+		busy_right_now = 1;
+		if (must_reclaim_port) {
+			parport_claim_or_block (private->dev);
+			must_reclaim_port = 0;
+		}
+		...
+	}
+
+SEE ALSO
+^^^^^^^^
+
+parport_unregister_device, parport_claim
+
+
+
+parport_unregister_device - finish using a port
+-----------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	void parport_unregister_device (struct pardevice *dev);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+This function is the opposite of parport_register_device.  After using
+parport_unregister_device, ``dev`` is no longer a valid device handle.
+
+You should not unregister a device that is currently claimed, although
+if you do it will be released automatically.
+
+EXAMPLE
+^^^^^^^
+
+::
+
+	...
+	kfree (dev->private); /* before we lose the pointer */
+	parport_unregister_device (dev);
+	...
+
+SEE ALSO
+^^^^^^^^
+
+
+parport_unregister_driver
+
+parport_claim, parport_claim_or_block - claim the parallel port for a device
+----------------------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_claim (struct pardevice *dev);
+	int parport_claim_or_block (struct pardevice *dev);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+These functions attempt to gain control of the parallel port on which
+``dev`` is registered.  ``parport_claim`` does not block, but
+``parport_claim_or_block`` may do. (Put something here about blocking
+interruptibly or non-interruptibly.)
+
+You should not try to claim a port that you have already claimed.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+A return value of zero indicates that the port was successfully
+claimed, and the caller now has possession of the parallel port.
+
+If ``parport_claim_or_block`` blocks before returning successfully, the
+return value is positive.
+
+ERRORS
+^^^^^^
+
+========== ==========================================================
+  -EAGAIN  The port is unavailable at the moment, but another attempt
+           to claim it may succeed.
+========== ==========================================================
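+
+EXAMPLE
+^^^^^^^
+
+A minimal sketch of how the two return conventions might be handled in
+one place.  ``toaster_get_port`` and its ``can_sleep`` argument are
+hypothetical and not part of parport::
+
+	static int toaster_get_port (struct pardevice *dev, int can_sleep)
+	{
+		int ret;
+
+		if (!can_sleep)
+			/* Never blocks: 0 on success, -EAGAIN if busy. */
+			return parport_claim (dev);
+
+		ret = parport_claim_or_block (dev);
+		if (ret < 0)
+			return ret;
+
+		/* A positive value only means that we slept first. */
+		return 0;
+	}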
+
+SEE ALSO
+^^^^^^^^
+
+
+parport_release
+
+parport_release - release the parallel port
+-------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	void parport_release (struct pardevice *dev);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Once a parallel port device has been claimed, it can be released using
+``parport_release``.  It cannot fail, but you should not release a
+device that you do not have possession of.
+
+EXAMPLE
+^^^^^^^
+
+::
+
+	static size_t write (struct pardevice *dev, const void *buf,
+			size_t len)
+	{
+		...
+		written = dev->port->ops->write_ecp_data (dev->port, buf,
+							len);
+		parport_release (dev);
+		...
+	}
+
+
+SEE ALSO
+^^^^^^^^
+
+change_mode, parport_claim, parport_claim_or_block, parport_yield
+
+
+
+parport_yield, parport_yield_blocking - temporarily release a parallel port
+---------------------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_yield (struct pardevice *dev);
+	int parport_yield_blocking (struct pardevice *dev);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+When a driver has control of a parallel port, it may allow another
+driver to temporarily "borrow" it.  ``parport_yield`` does not block;
+``parport_yield_blocking`` may do.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+A return value of zero indicates that the caller still owns the port
+and the call did not block.
+
+A positive return value from ``parport_yield_blocking`` indicates that
+the caller still owns the port and the call blocked.
+
+A return value of -EAGAIN indicates that the caller no longer owns the
+port, and it must be re-claimed before use.
+
+ERRORS
+^^^^^^
+
+========= ==========================================================
+  -EAGAIN  Ownership of the parallel port was given away.
+========= ==========================================================
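+
+EXAMPLE
+^^^^^^^
+
+A hedged sketch of offering the port to other drivers between two
+phases of work.  ``toaster_pause`` is a hypothetical helper; it assumes
+that the caller currently owns the port and is allowed to sleep::
+
+	static int toaster_pause (struct pardevice *dev)
+	{
+		int ret = parport_yield_blocking (dev);
+
+		if (ret == -EAGAIN)
+			/* Ownership was given away: claim the port again. */
+			ret = parport_claim_or_block (dev);
+
+		/* Zero and positive values both mean we own the port. */
+		return ret < 0 ? ret : 0;
+	}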
+
+SEE ALSO
+^^^^^^^^
+
+parport_release
+
+
+
+parport_wait_peripheral - wait for status lines, up to 35ms
+-----------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_wait_peripheral (struct parport *port,
+				     unsigned char mask,
+				     unsigned char val);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Wait for the status lines in mask to match the values in val.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+======== ==========================================================
+ -EINTR  a signal is pending
+      0  the status lines in mask have values in val
+      1  timed out while waiting (35ms elapsed)
+======== ==========================================================
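+
+EXAMPLE
+^^^^^^^
+
+A minimal sketch that maps the three results onto conventional error
+codes.  ``toaster_wait_ack`` is hypothetical, and the choice of
+``-ETIMEDOUT`` for the timeout case is an assumption of this example::
+
+	static int toaster_wait_ack (struct parport *port)
+	{
+		/* Wait until the PARPORT_STATUS_ACK bit reads as set. */
+		int ret = parport_wait_peripheral (port, PARPORT_STATUS_ACK,
+						   PARPORT_STATUS_ACK);
+
+		if (ret < 0)
+			return ret;		/* -EINTR: a signal is pending */
+		if (ret > 0)
+			return -ETIMEDOUT;	/* 35ms elapsed */
+		return 0;
+	}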
+
+SEE ALSO
+^^^^^^^^
+
+parport_poll_peripheral
+
+
+
+parport_poll_peripheral - wait for status lines, in usec
+--------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_poll_peripheral (struct parport *port,
+				     unsigned char mask,
+				     unsigned char val,
+				     int usec);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Wait for the status lines in mask to match the values in val.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+======== ==========================================================
+ -EINTR  a signal is pending
+      0  the status lines in mask have values in val
+      1  timed out while waiting (usec microseconds have elapsed)
+======== ==========================================================
+
+SEE ALSO
+^^^^^^^^
+
+parport_wait_peripheral
+
+
+
+parport_wait_event - wait for an event on a port
+------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_wait_event (struct parport *port, signed long timeout);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Wait for an event (e.g. interrupt) on a port.  The timeout is in
+jiffies.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+======= ==========================================================
+      0  success
+     <0  error (exit as soon as possible)
+     >0  timed out
+======= ==========================================================
+
+parport_negotiate - perform IEEE 1284 negotiation
+-------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_negotiate (struct parport *, int mode);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Perform IEEE 1284 negotiation.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+======= ==========================================================
+     0  handshake OK; IEEE 1284 peripheral and mode available
+    -1  handshake failed; peripheral not compliant (or none present)
+     1  handshake OK; IEEE 1284 peripheral present but mode not
+        available
+======= ==========================================================
+
+SEE ALSO
+^^^^^^^^
+
+parport_read, parport_write
+
+
+
+parport_read - read data from device
+------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	ssize_t parport_read (struct parport *, void *buf, size_t len);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Read data from device in current IEEE 1284 transfer mode.  This only
+works for modes that support reverse data transfer.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+If negative, an error code; otherwise the number of bytes transferred.
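+
+EXAMPLE
+^^^^^^^
+
+A sketch of a reverse-channel read, assuming the port has already been
+claimed.  ``toaster_read_nibble`` is hypothetical, the ``-EIO`` result
+is a choice made for this example, and the ``IEEE1284_MODE_*``
+constants come from ``<linux/parport.h>``::
+
+	static ssize_t toaster_read_nibble (struct parport *port,
+					    void *buf, size_t len)
+	{
+		ssize_t got;
+
+		/* Non-zero: the peripheral or the mode is unavailable. */
+		if (parport_negotiate (port, IEEE1284_MODE_NIBBLE))
+			return -EIO;
+
+		got = parport_read (port, buf, len);
+
+		/* Return to compatibility mode for forward transfers. */
+		parport_negotiate (port, IEEE1284_MODE_COMPAT);
+		return got;
+	}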
+
+SEE ALSO
+^^^^^^^^
+
+parport_write, parport_negotiate
+
+
+
+parport_write - write data to device
+------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	ssize_t parport_write (struct parport *, const void *buf, size_t len);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Write data to device in current IEEE 1284 transfer mode.  This only
+works for modes that support forward data transfer.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+If negative, an error code; otherwise the number of bytes transferred.
+
+SEE ALSO
+^^^^^^^^
+
+parport_read, parport_negotiate
+
+
+
+parport_open - register device for particular device number
+-----------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct pardevice *parport_open (int devnum, const char *name,
+				        int (*pf) (void *),
+					void (*kf) (void *),
+					void (*irqf) (int, void *,
+						      struct pt_regs *),
+					int flags, void *handle);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+This is like parport_register_device but takes a device number instead
+of a pointer to a struct parport.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+See parport_register_device.  If no device is associated with devnum,
+NULL is returned.
+
+SEE ALSO
+^^^^^^^^
+
+parport_register_device
+
+
+
+parport_close - unregister device for particular device number
+--------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	void parport_close (struct pardevice *dev);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+This is the equivalent of parport_unregister_device for parport_open.
+
+SEE ALSO
+^^^^^^^^
+
+parport_unregister_device, parport_open
+
+
+
+parport_device_id - obtain IEEE 1284 Device ID
+----------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	ssize_t parport_device_id (int devnum, char *buffer, size_t len);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Obtains the IEEE 1284 Device ID associated with a given device.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+If negative, an error code; otherwise, the number of bytes of buffer
+that contain the device ID.  The format of the device ID is as
+follows::
+
+	[length][ID]
+
+The first two bytes indicate the inclusive length of the entire Device
+ID, and are in big-endian order.  The ID is a sequence of pairs of the
+form::
+
+	key:value;
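+
+As an illustration of the length prefix, the helper below (a
+hypothetical name, not part of parport) decodes the first two bytes of
+a buffer laid out in the format described above::
+
+	static size_t device_id_total_len (const unsigned char *id)
+	{
+		/* Big-endian; the value counts these two bytes as well. */
+		return ((size_t) id[0] << 8) | id[1];
+	}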
+
+NOTES
+^^^^^
+
+Many devices have ill-formed IEEE 1284 Device IDs.
+
+SEE ALSO
+^^^^^^^^
+
+parport_find_class, parport_find_device
+
+
+
+parport_device_coords - convert device number to device coordinates
+-------------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_device_coords (int devnum, int *parport, int *mux,
+				   int *daisy);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Convert between device number (zero-based) and device coordinates
+(port, multiplexor, daisy chain address).
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+Zero on success, in which case the coordinates are (``*parport``, ``*mux``,
+``*daisy``).
+
+SEE ALSO
+^^^^^^^^
+
+parport_open, parport_device_id
+
+
+
+parport_find_class - find a device by its class
+-----------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	typedef enum {
+		PARPORT_CLASS_LEGACY = 0,       /* Non-IEEE1284 device */
+		PARPORT_CLASS_PRINTER,
+		PARPORT_CLASS_MODEM,
+		PARPORT_CLASS_NET,
+		PARPORT_CLASS_HDC,              /* Hard disk controller */
+		PARPORT_CLASS_PCMCIA,
+		PARPORT_CLASS_MEDIA,            /* Multimedia device */
+		PARPORT_CLASS_FDC,              /* Floppy disk controller */
+		PARPORT_CLASS_PORTS,
+		PARPORT_CLASS_SCANNER,
+		PARPORT_CLASS_DIGCAM,
+		PARPORT_CLASS_OTHER,            /* Anything else */
+		PARPORT_CLASS_UNSPEC,           /* No CLS field in ID */
+		PARPORT_CLASS_SCSIADAPTER
+	} parport_device_class;
+
+	int parport_find_class (parport_device_class cls, int from);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Find a device by class.  The search starts from device number from+1.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The device number of the next device in that class, or -1 if no such
+device exists.
+
+NOTES
+^^^^^
+
+Example usage::
+
+	int devnum = -1;
+	while ((devnum = parport_find_class (PARPORT_CLASS_DIGCAM, devnum)) != -1) {
+		struct pardevice *dev = parport_open (devnum, ...);
+		...
+	}
+
+SEE ALSO
+^^^^^^^^
+
+parport_find_device, parport_open, parport_device_id
+
+
+
+parport_find_device - find a device by vendor and model
+-------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	int parport_find_device (const char *mfg, const char *mdl, int from);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Find a device by vendor and model.  The search starts from device
+number from+1.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The device number of the next device matching the specifications, or
+-1 if no such device exists.
+
+NOTES
+^^^^^
+
+Example usage::
+
+	int devnum = -1;
+	while ((devnum = parport_find_device ("IOMEGA", "ZIP+", devnum)) != -1) {
+		struct pardevice *dev = parport_open (devnum, ...);
+		...
+	}
+
+SEE ALSO
+^^^^^^^^
+
+parport_find_class, parport_open, parport_device_id
+
+
+
+parport_set_timeout - set the inactivity timeout
+------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	long parport_set_timeout (struct pardevice *dev, long inactivity);
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Set the inactivity timeout, in jiffies, for a registered device.  The
+previous timeout is returned.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The previous timeout, in jiffies.
+
+NOTES
+^^^^^
+
+Some of the port->ops functions for a parport may take time, owing to
+delays at the peripheral.  After the peripheral has not responded for
+``inactivity`` jiffies, a timeout will occur and the blocking function
+will return.
+
+A timeout of 0 jiffies is a special case: the function must do as much
+as it can without blocking or leaving the hardware in an unknown
+state.  If port operations are performed from within an interrupt
+handler, for instance, a timeout of 0 jiffies should be used.
+
+Once set for a registered device, the timeout will remain at the set
+value until set again.
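+
+Example usage, as a sketch: because the previous value is returned, a
+non-blocking section can restore it afterwards.  ``toaster_poke`` is a
+hypothetical function::
+
+	static void toaster_poke (struct pardevice *dev)
+	{
+		/* Do not block in the port->ops calls that follow. */
+		long old = parport_set_timeout (dev, 0);
+
+		...
+		parport_set_timeout (dev, old);
+	}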
+
+SEE ALSO
+^^^^^^^^
+
+port->ops->xxx_read/write_yyy
+
+
+
+
+PORT FUNCTIONS
+==============
+
+The functions in the port->ops structure (struct parport_operations)
+are provided by the low-level driver responsible for that port.
+
+port->ops->read_data - read the data register
+---------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		unsigned char (*read_data) (struct parport *port);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+If port->modes contains the PARPORT_MODE_TRISTATE flag and the
+PARPORT_CONTROL_DIRECTION bit in the control register is set, this
+returns the value on the data pins.  If port->modes contains the
+PARPORT_MODE_TRISTATE flag and the PARPORT_CONTROL_DIRECTION bit is
+not set, the return value *may* be the last value written to the data
+register.  Otherwise the return value is undefined.
+
+SEE ALSO
+^^^^^^^^
+
+write_data, read_status, write_control
+
+
+
+port->ops->write_data - write the data register
+-----------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		void (*write_data) (struct parport *port, unsigned char d);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Writes to the data register.  May have side-effects (a STROBE pulse,
+for instance).
+
+SEE ALSO
+^^^^^^^^
+
+read_data, read_status, write_control
+
+
+
+port->ops->read_status - read the status register
+-------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		unsigned char (*read_status) (struct parport *port);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Reads from the status register.  This is a bitmask:
+
+- PARPORT_STATUS_ERROR (printer fault, "nFault")
+- PARPORT_STATUS_SELECT (on-line, "Select")
+- PARPORT_STATUS_PAPEROUT (no paper, "PError")
+- PARPORT_STATUS_ACK (handshake, "nAck")
+- PARPORT_STATUS_BUSY (busy, "Busy")
+
+There may be other bits set.
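+
+EXAMPLE
+^^^^^^^
+
+A minimal sketch of testing a single bit of the returned value.
+``toaster_out_of_paper`` is a hypothetical helper::
+
+	static int toaster_out_of_paper (struct parport *port)
+	{
+		unsigned char status = port->ops->read_status (port);
+
+		/* Other bits may be set, so test with a mask. */
+		return (status & PARPORT_STATUS_PAPEROUT) != 0;
+	}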
+
+SEE ALSO
+^^^^^^^^
+
+read_data, write_data, write_control
+
+
+
+port->ops->read_control - read the control register
+---------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		unsigned char (*read_control) (struct parport *port);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Returns the last value written to the control register (either from
+write_control or frob_control).  No port access is performed.
+
+SEE ALSO
+^^^^^^^^
+
+read_data, write_data, read_status, write_control
+
+
+
+port->ops->write_control - write the control register
+-----------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		void (*write_control) (struct parport *port, unsigned char s);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Writes to the control register. This is a bitmask::
+
+				  _______
+	- PARPORT_CONTROL_STROBE (nStrobe)
+				  _______
+	- PARPORT_CONTROL_AUTOFD (nAutoFd)
+				_____
+	- PARPORT_CONTROL_INIT (nInit)
+				  _________
+	- PARPORT_CONTROL_SELECT (nSelectIn)
+
+SEE ALSO
+^^^^^^^^
+
+read_data, write_data, read_status, frob_control
+
+
+
+port->ops->frob_control - write control register bits
+-----------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		unsigned char (*frob_control) (struct parport *port,
+					unsigned char mask,
+					unsigned char val);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+This is equivalent to reading from the control register, masking out
+the bits in mask, exclusive-or'ing with the bits in val, and writing
+the result to the control register.
+
+As some ports don't allow reads from the control port, a software copy
+of its contents is maintained, so frob_control is in fact only one
+port access.
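+
+EXAMPLE
+^^^^^^^
+
+The update rule can be written out in plain C.  This is only a model
+of the arithmetic described above, with ``ctr`` standing for the
+software copy; it is not the code of any particular low-level driver::
+
+	static unsigned char frob (unsigned char ctr, unsigned char mask,
+				   unsigned char val)
+	{
+		/* Clear the bits in mask, then toggle the bits in val. */
+		return (ctr & ~mask) ^ val;
+	}
+
+With ``mask`` and ``val`` both equal to PARPORT_CONTROL_STROBE, the
+strobe bit ends up set whatever its previous state; with ``val`` equal
+to zero it ends up clear.  All other bits are left alone.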
+
+SEE ALSO
+^^^^^^^^
+
+read_data, write_data, read_status, write_control
+
+
+
+port->ops->enable_irq - enable interrupt generation
+---------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		void (*enable_irq) (struct parport *port);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+The parallel port hardware is instructed to generate interrupts at
+appropriate moments, although those moments are
+architecture-specific.  For the PC architecture, interrupts are
+commonly generated on the rising edge of nAck.
+
+SEE ALSO
+^^^^^^^^
+
+disable_irq
+
+
+
+port->ops->disable_irq - disable interrupt generation
+-----------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		void (*disable_irq) (struct parport *port);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+The parallel port hardware is instructed not to generate interrupts.
+The interrupt itself is not masked.
+
+SEE ALSO
+^^^^^^^^
+
+enable_irq
+
+
+
+port->ops->data_forward - enable data drivers
+---------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		void (*data_forward) (struct parport *port);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Enables the data line drivers, for 8-bit host-to-peripheral
+communications.
+
+SEE ALSO
+^^^^^^^^
+
+data_reverse
+
+
+
+port->ops->data_reverse - tristate the buffer
+---------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		void (*data_reverse) (struct parport *port);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Places the data bus in a high impedance state, if port->modes has the
+PARPORT_MODE_TRISTATE bit set.
+
+SEE ALSO
+^^^^^^^^
+
+data_forward
+
+
+
+port->ops->epp_write_data - write EPP data
+------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*epp_write_data) (struct parport *port, const void *buf,
+					size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Writes data in EPP mode, and returns the number of bytes written.
+
+The ``flags`` parameter may be one or more of the following,
+bitwise-or'ed together:
+
+======================= =================================================
+PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
+			32-bit registers.  However, if a transfer
+			times out, the return value may be unreliable.
+======================= =================================================
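+
+EXAMPLE
+^^^^^^^
+
+A sketch of detecting a short transfer, assuming the port is claimed
+and already in EPP mode.  ``toaster_epp_send`` is hypothetical and the
+``-EIO`` result is a choice made for this example::
+
+	static int toaster_epp_send (struct parport *port,
+				     const void *buf, size_t len)
+	{
+		/* No flags: PARPORT_EPP_FAST is not requested. */
+		size_t done = port->ops->epp_write_data (port, buf, len, 0);
+
+		return done == len ? 0 : -EIO;
+	}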
+
+SEE ALSO
+^^^^^^^^
+
+epp_read_data, epp_write_addr, epp_read_addr
+
+
+
+port->ops->epp_read_data - read EPP data
+----------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*epp_read_data) (struct parport *port, void *buf,
+					size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Reads data in EPP mode, and returns the number of bytes read.
+
+The ``flags`` parameter may be one or more of the following,
+bitwise-or'ed together:
+
+======================= =================================================
+PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
+			32-bit registers.  However, if a transfer
+			times out, the return value may be unreliable.
+======================= =================================================
+
+SEE ALSO
+^^^^^^^^
+
+epp_write_data, epp_write_addr, epp_read_addr
+
+
+
+port->ops->epp_write_addr - write EPP address
+---------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*epp_write_addr) (struct parport *port,
+					const void *buf, size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Writes EPP addresses (8 bits each), and returns the number written.
+
+The ``flags`` parameter may be one or more of the following,
+bitwise-or'ed together:
+
+======================= =================================================
+PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
+			32-bit registers.  However, if a transfer
+			times out, the return value may be unreliable.
+======================= =================================================
+
+(Does PARPORT_EPP_FAST make sense for this function?)
+
+SEE ALSO
+^^^^^^^^
+
+epp_write_data, epp_read_data, epp_read_addr
+
+
+
+port->ops->epp_read_addr - read EPP address
+-------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*epp_read_addr) (struct parport *port, void *buf,
+					size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Reads EPP addresses (8 bits each), and returns the number read.
+
+The ``flags`` parameter may be one or more of the following,
+bitwise-or'ed together:
+
+======================= =================================================
+PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
+			32-bit registers.  However, if a transfer
+			times out, the return value may be unreliable.
+======================= =================================================
+
+(Does PARPORT_EPP_FAST make sense for this function?)
+
+SEE ALSO
+^^^^^^^^
+
+epp_write_data, epp_read_data, epp_write_addr
+
+
+
+port->ops->ecp_write_data - write a block of ECP data
+-----------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*ecp_write_data) (struct parport *port,
+					const void *buf, size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Writes a block of ECP data.  The ``flags`` parameter is ignored.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The number of bytes written.
+
+SEE ALSO
+^^^^^^^^
+
+ecp_read_data, ecp_write_addr
+
+
+
+port->ops->ecp_read_data - read a block of ECP data
+---------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*ecp_read_data) (struct parport *port,
+					void *buf, size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Reads a block of ECP data.  The ``flags`` parameter is ignored.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The number of bytes read.  NB. There may be more unread data in a
+FIFO.  Is there a way of stunning the FIFO to prevent this?
+
+SEE ALSO
+^^^^^^^^
+
+ecp_write_data, ecp_write_addr
+
+
+
+port->ops->ecp_write_addr - write a block of ECP addresses
+----------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*ecp_write_addr) (struct parport *port,
+					const void *buf, size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Writes a block of ECP addresses.  The ``flags`` parameter is ignored.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The number of bytes written.
+
+NOTES
+^^^^^
+
+This may use a FIFO, and if so shall not return until the FIFO is empty.
+
+SEE ALSO
+^^^^^^^^
+
+ecp_read_data, ecp_write_data
+
+
+
+port->ops->nibble_read_data - read a block of data in nibble mode
+-----------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*nibble_read_data) (struct parport *port,
+					void *buf, size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Reads a block of data in nibble mode.  The ``flags`` parameter is ignored.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The number of whole bytes read.
+
+SEE ALSO
+^^^^^^^^
+
+byte_read_data, compat_write_data
+
+
+
+port->ops->byte_read_data - read a block of data in byte mode
+-------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*byte_read_data) (struct parport *port,
+					void *buf, size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Reads a block of data in byte mode.  The ``flags`` parameter is ignored.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The number of bytes read.
+
+SEE ALSO
+^^^^^^^^
+
+nibble_read_data, compat_write_data
+
+
+
+port->ops->compat_write_data - write a block of data in compatibility mode
+--------------------------------------------------------------------------
+
+SYNOPSIS
+^^^^^^^^
+
+::
+
+	#include <linux/parport.h>
+
+	struct parport_operations {
+		...
+		size_t (*compat_write_data) (struct parport *port,
+					const void *buf, size_t len, int flags);
+		...
+	};
+
+DESCRIPTION
+^^^^^^^^^^^
+
+Writes a block of data in compatibility mode.  The ``flags`` parameter
+is ignored.
+
+RETURN VALUE
+^^^^^^^^^^^^
+
+The number of bytes written.
+
+SEE ALSO
+^^^^^^^^
+
+nibble_read_data, byte_read_data
diff --git a/Documentation/driver-api/pti_intel_mid.rst b/Documentation/driver-api/pti_intel_mid.rst
new file mode 100644
index 000000000000..20f1cff42d5f
--- /dev/null
+++ b/Documentation/driver-api/pti_intel_mid.rst
@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Intel MID PTI
+=============
+
+The Intel MID PTI project is a hardware implementation, found in Intel
+Atom system-on-a-chip designs, of the Parallel Trace Interface for the
+MIPI P1149.7 cJTAG standard.  The kernel solution
+for this platform involves the following files::
+
+	./include/linux/pti.h
+	./drivers/.../n_tracesink.h
+	./drivers/.../n_tracerouter.c
+	./drivers/.../n_tracesink.c
+	./drivers/.../pti.c
+
+pti.c is the driver that enables various debugging features
+popular on platforms from certain mobile manufacturers.
+n_tracerouter.c and n_tracesink.c allow extra system information to
+be collected and routed to the pti driver, such as trace
+debugging data from a modem.  Although n_tracerouter
+and n_tracesink are a part of the complete PTI solution,
+these two line disciplines can work separately from
+pti.c and route any data stream from one /dev/tty node
+to another /dev/tty node via kernel-space.  This provides
+a stable, reliable connection that will not break unless
+the user-space application shuts down (plus avoids
+kernel->user->kernel context switch overheads of routing
+data).
+
+An example debugging usage for this driver system:
+
+  * Hook /dev/ttyPTI0 to syslogd.  Opening this port will also start
+    a console device to further capture debugging messages to PTI.
+  * Hook /dev/ttyPTI1 to modem debugging data to write to PTI HW.
+    This is where n_tracerouter and n_tracesink are used.
+  * Hook /dev/pti to a user-level debugging application for writing
+    to PTI HW.
+  * Use the ``mipi_*`` kernel driver API in other device drivers for
+    debugging to PTI by first requesting a PTI write address via
+    mipi_request_masterchannel(1).
+
+Below is example pseudo-code on how a 'privileged' application
+can hook up n_tracerouter and n_tracesink to any tty on
+a system.  'Privileged' means the application has enough
+privileges to successfully manipulate the ldisc drivers
+but is not just blindly executing as 'root'. Keep in mind that
+the use of ioctl(,TIOCSETD,) is not specific to the n_tracerouter
+and n_tracesink line discipline drivers but is a generic
+operation for a program to use a line discipline driver
+on a tty port other than the default n_tty::
+
+  /////////// To hook up n_tracerouter and n_tracesink /////////
+
+  // Note that n_tracerouter depends on n_tracesink.
+  #include <errno.h>
+  #include <fcntl.h>
+  #include <stdio.h>
+  #include <sys/ioctl.h>
+  #include <unistd.h>
+  #define ONE_TTY "/dev/ttyOne"
+  #define TWO_TTY "/dev/ttyTwo"
+
+  // globals needed to hang onto the ldisc connection
+  static int g_fd_source = -1;
+  static int g_fd_sink  = -1;
+
+  // these two vars used to grab LDISC values from loaded ldisc drivers
+  // in OS.  Look at /proc/tty/ldiscs to get the right numbers from
+  // the ldiscs loaded in the system.
+  int source_ldisc_num = -1, sink_ldisc_num = -1;
+  int retval;
+
+  g_fd_source = open(ONE_TTY, O_RDWR); // must be R/W
+  g_fd_sink   = open(TWO_TTY, O_RDWR); // must be R/W
+
+  // open() returns -1 on failure; 0 is a valid file descriptor.
+  if ((g_fd_source < 0) || (g_fd_sink < 0)) {
+     // doubt you'll want to use these exact error lines of code
+     printf("Error on open(). errno: %d\n", errno);
+     return errno;
+  }
+
+  retval = ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
+  if (retval < 0) {
+     printf("Error on ioctl().  errno: %d\n", errno);
+     return errno;
+  }
+
+  retval = ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
+  if (retval < 0) {
+     printf("Error on ioctl().  errno: %d\n", errno);
+     return errno;
+  }
+
+  /////////// To disconnect n_tracerouter and n_tracesink ////////
+
+  // First make sure data through the ldiscs has stopped.
+
+  // Second, disconnect ldiscs.  This provides a
+  // little cleaner shutdown on tty stack.
+  sink_ldisc_num = 0;
+  source_ldisc_num = 0;
+  ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
+  ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
+
+  // Third, close the connections and clean up:
+  close(g_fd_sink);
+  close(g_fd_source);
+  g_fd_sink = g_fd_source = -1;
diff --git a/Documentation/driver-api/pwm.rst b/Documentation/driver-api/pwm.rst
new file mode 100644
index 000000000000..ab62f1bb0366
--- /dev/null
+++ b/Documentation/driver-api/pwm.rst
@@ -0,0 +1,165 @@
+======================================
+Pulse Width Modulation (PWM) interface
+======================================
+
+This provides an overview of the Linux PWM interface.
+
+PWMs are commonly used for controlling LEDs, fans or vibrators in
+cell phones. PWMs with a fixed purpose have no need to implement
+the Linux PWM API (although they could). However, PWMs are often
+found as discrete devices on SoCs which have no fixed purpose. It's
+up to the board designer to connect them to LEDs or fans. To provide
+this kind of flexibility the generic PWM API exists.
+
+Identifying PWMs
+----------------
+
+Users of the legacy PWM API use unique IDs to refer to PWM devices.
+
+Instead of referring to a PWM device via its unique ID, board setup code
+should register a static mapping that can be used to match PWM
+consumers to providers, as given in the following example::
+
+	static struct pwm_lookup board_pwm_lookup[] = {
+		PWM_LOOKUP("tegra-pwm", 0, "pwm-backlight", NULL,
+			   50000, PWM_POLARITY_NORMAL),
+	};
+
+	static void __init board_init(void)
+	{
+		...
+		pwm_add_table(board_pwm_lookup, ARRAY_SIZE(board_pwm_lookup));
+		...
+	}
+
+Using PWMs
+----------
+
+Legacy users can request a PWM device using pwm_request() and free it
+after usage with pwm_free().
+
+New users should use the pwm_get() function and pass to it the consumer
+device or a consumer name. pwm_put() is used to free the PWM device. Managed
+variants of these functions, devm_pwm_get() and devm_pwm_put(), also exist.
+
+After being requested, a PWM has to be configured using::
+
+	int pwm_apply_state(struct pwm_device *pwm, struct pwm_state *state);
+
+This API controls both the PWM period/duty_cycle config and the
+enable/disable state.
+
+The pwm_config(), pwm_enable() and pwm_disable() functions are just wrappers
+around pwm_apply_state() and should not be used if the user wants to change
+several parameters at once. For example, if you see pwm_config() and
+pwm_{enable,disable}() calls in the same function, this probably means you
+should switch to pwm_apply_state().
+
+The PWM user API also allows one to query the PWM state with pwm_get_state().
+
+In addition to the PWM state, the PWM API also exposes PWM arguments, which
+are the reference PWM config one should use on this PWM.
+PWM arguments are usually platform-specific and allow the PWM user to only
+care about the duty cycle relative to the full period (like, duty = 50% of the
+period). struct pwm_args contains 2 fields (period and polarity) and should
+be used to set the initial PWM config (usually done in the probe function
+of the PWM user). PWM arguments are retrieved with pwm_get_args().
+
+All consumers should really be reconfiguring the PWM upon resume as
+appropriate. This is the only way to ensure that everything is resumed in
+the proper order.
+
+Using PWMs with the sysfs interface
+-----------------------------------
+
+If CONFIG_SYSFS is enabled in your kernel configuration, a simple sysfs
+interface is provided to use the PWMs from userspace. It is exposed at
+/sys/class/pwm/. Each probed PWM controller/chip will be exported as
+pwmchipN, where N is the base of the PWM chip. Inside the directory you
+will find:
+
+  npwm
+    The number of PWM channels this chip supports (read-only).
+
+  export
+    Exports a PWM channel for use with sysfs (write-only).
+
+  unexport
+    Unexports a PWM channel from sysfs (write-only).
+
+The PWM channels are numbered using a per-chip index from 0 to npwm-1.
+
+When a PWM channel is exported, a pwmX directory will be created in the
+pwmchipN directory it is associated with, where X is the number of the
+channel that was exported. The following properties will then be available:
+
+  period
+    The total period of the PWM signal (read/write).
+    Value is in nanoseconds and is the sum of the active and inactive
+    time of the PWM.
+
+  duty_cycle
+    The active time of the PWM signal (read/write).
+    Value is in nanoseconds and must be less than the period.
+
+  polarity
+    Changes the polarity of the PWM signal (read/write).
+    Writes to this property only work if the PWM chip supports changing
+    the polarity. The polarity can only be changed if the PWM is not
+    enabled. Value is the string "normal" or "inversed".
+
+  enable
+    Enable/disable the PWM signal (read/write).
+
+	- 0 - disabled
+	- 1 - enabled
+
+Implementing a PWM driver
+-------------------------
+
+Currently there are two ways to implement PWM drivers. Traditionally
+there has only been the barebone API, meaning that each driver has
+to implement the pwm_*() functions itself. This means that it's impossible
+to have multiple PWM drivers in the system. For this reason it's mandatory
+for new drivers to use the generic PWM framework.
+
+A new PWM controller/chip can be added using pwmchip_add() and removed
+again with pwmchip_remove(). pwmchip_add() takes a filled in struct
+pwm_chip as argument which provides a description of the PWM chip, the
+number of PWM devices provided by the chip and the chip-specific
+implementation of the supported PWM operations to the framework.
+
+When implementing polarity support in a PWM driver, make sure to respect the
+signal conventions in the PWM framework. By definition, normal polarity
+characterizes a signal that starts high for the duration of the duty cycle and
+goes low for the remainder of the period. Conversely, a signal with inversed
+polarity starts low for the duration of the duty cycle and goes high for the
+remainder of the period.
+
+Drivers are encouraged to implement ->apply() instead of the legacy
+->enable(), ->disable() and ->config() methods. Doing that should provide
+atomicity in the PWM config workflow, which is required when the PWM controls
+a critical device (like a regulator).
+
+The implementation of ->get_state() (a method used to retrieve initial PWM
+state) is also encouraged for the same reason: letting the PWM user know
+about the current PWM state would allow them to avoid glitches.
+
+Drivers should not implement any power management. In other words,
+consumers should implement it as described in the "Using PWMs" section.
+
+Locking
+-------
+
+The PWM core list manipulations are protected by a mutex, so pwm_request()
+and pwm_free() may not be called from an atomic context. Currently the
+PWM core does not enforce any locking on pwm_enable(), pwm_disable() and
+pwm_config(), so the calling context is currently driver specific. This
+is an issue derived from the former barebone API and should be fixed soon.
+
+Helpers
+-------
+
+Currently a PWM can only be configured with period_ns and duty_ns. For several
+use cases freq_hz and duty_percent might be better. Instead of calculating
+this in your driver please consider adding appropriate helpers to the framework.
diff --git a/Documentation/driver-api/rfkill.rst b/Documentation/driver-api/rfkill.rst
new file mode 100644
index 000000000000..7d3684e81df6
--- /dev/null
+++ b/Documentation/driver-api/rfkill.rst
@@ -0,0 +1,132 @@
+===============================
+rfkill - RF kill switch support
+===============================
+
+
+.. contents::
+   :depth: 2
+
+Introduction
+============
+
+The rfkill subsystem provides a generic interface for disabling any radio
+transmitter in the system. When a transmitter is blocked, it shall not
+radiate any power.
+
+The subsystem also provides the ability to react to button presses and
+disable all transmitters of a certain type (or all). This is intended for
+situations where transmitters need to be turned off, for example on
+aircraft.
+
+The rfkill subsystem has a concept of "hard" and "soft" block, which
+differ little in their meaning (block == transmitters off) but rather in
+whether they can be changed or not:
+
+ - hard block
+	read-only radio block that cannot be overridden by software
+
+ - soft block
+	writable radio block (need not be readable) that is set by
+	the system software.
+
+The rfkill subsystem has two parameters, rfkill.default_state and
+rfkill.master_switch_mode, which are documented in
+admin-guide/kernel-parameters.rst.
+
+
+Implementation details
+======================
+
+The rfkill subsystem is composed of three main components:
+
+ * the rfkill core,
+ * the deprecated rfkill-input module (an input layer handler, being
+   replaced by userspace policy code) and
+ * the rfkill drivers.
+
+The rfkill core provides an API for kernel drivers to register their radio
+transmitter with the kernel, methods for turning it on and off, and letting
+the system know about hardware-disabled states that may be implemented on
+the device.
+
+The rfkill core code also notifies userspace of state changes, and provides
+ways for userspace to query the current states. See the "Userspace support"
+section below.
+
+When the device is hard-blocked (either by a call to rfkill_set_hw_state()
+or from query_hw_block), set_block() will be invoked for additional software
+block, but drivers can ignore the method call since they can use the return
+value of the function rfkill_set_hw_state() to sync the software state
+instead of keeping track of calls to set_block(). In fact, drivers should
+use the return value of rfkill_set_hw_state() unless the hardware actually
+keeps track of soft and hard block separately.
+
+
+Kernel API
+==========
+
+Drivers for radio transmitters normally implement an rfkill driver.
+
+Platform drivers might implement input devices if the rfkill button is just
+that, a button. If that button influences the hardware then you need to
+implement an rfkill driver instead. This also applies if the platform provides
+a way to turn on/off the transmitter(s).
+
+For some platforms, it is possible that the hardware state changes during
+suspend/hibernation, in which case it will be necessary to update the rfkill
+core with the current state at resume time.
+
+To create an rfkill driver, the driver's Kconfig needs to have::
+
+	depends on RFKILL || !RFKILL
+
+to ensure the driver cannot be built-in when rfkill is modular. The !RFKILL
+case allows the driver to be built when rfkill is not configured, in which
+case all rfkill API can still be used but will be provided by static inlines
+which compile to almost nothing.
+
+Calling rfkill_set_hw_state() when a state change happens is required from
+rfkill drivers that control devices that can be hard-blocked unless they also
+assign the poll_hw_block() callback (then the rfkill core will poll the
+device). Don't do this unless you cannot get the event in any other way.
+
+rfkill provides per-switch LED triggers, which can be used to drive LEDs
+according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
+
+
+Userspace support
+=================
+
+The recommended userspace interface to use is /dev/rfkill, which is a misc
+character device that allows userspace to obtain and set the state of rfkill
+devices and sets of devices. It also notifies userspace about device addition
+and removal. The API is a simple read/write API that is defined in
+linux/rfkill.h, with one ioctl that allows turning off the deprecated input
+handler in the kernel for the transition period.
+
+Except for the one ioctl, communication with the kernel is done via read()
+and write() of instances of 'struct rfkill_event'. In this structure, the
+soft and hard block are properly separated (unlike sysfs, see below) and
+userspace is able to get a consistent snapshot of all rfkill devices in the
+system. Also, it is possible to switch all rfkill drivers (or all drivers of
+a specified type) into a state which also updates the default state for
+hotplugged devices.
+
+After an application opens /dev/rfkill, it can read the current state of all
+devices. Changes can be obtained by either polling the descriptor for
+hotplug or state change events or by listening for uevents emitted by the
+rfkill core framework.
+
+Additionally, each rfkill device is registered in sysfs and emits uevents.
+
+rfkill devices issue uevents (with an action of "change"), with the following
+environment variables set::
+
+	RFKILL_NAME
+	RFKILL_STATE
+	RFKILL_TYPE
+
+The content of these variables corresponds to the "name", "state" and
+"type" sysfs files.
+
+For further details consult Documentation/ABI/stable/sysfs-class-rfkill.
diff --git a/Documentation/driver-api/sgi-ioc4.rst b/Documentation/driver-api/sgi-ioc4.rst
new file mode 100644
index 000000000000..72709222d3c0
--- /dev/null
+++ b/Documentation/driver-api/sgi-ioc4.rst
@@ -0,0 +1,49 @@
+====================================
+SGI IOC4 PCI (multi function) device
+====================================
+
+The SGI IOC4 PCI device is a bit of a strange beast, so some notes on
+it are in order.
+
+First, even though the IOC4 performs multiple functions, such as an
+IDE controller, a serial controller, a PS/2 keyboard/mouse controller,
+and an external interrupt mechanism, it's not implemented as a
+multifunction device.  The consequence of this from a software
+standpoint is that all these functions share a single IRQ, and
+they can't all register to own the same PCI device ID.  To make
+matters a bit worse, some of the register blocks (and even registers
+themselves) present in IOC4 are mixed-purpose between these several
+functions, meaning that there's no clear "owning" device driver.
+
+The solution is to organize the IOC4 driver into several independent
+drivers, "ioc4", "sgiioc4", and "ioc4_serial".  Note that there is no
+PS/2 controller driver as this functionality has never been wired up
+on a shipping IO card.
+
+ioc4
+====
+This is the core (or shim) driver for IOC4.  It is responsible for
+initializing the basic functionality of the chip, and allocating
+the PCI resources that are shared between the IOC4 functions.
+
+This driver also provides registration functions that the other
+IOC4 drivers can call to make their presence known.  Each driver
+needs to provide a probe and remove function, which are invoked
+by the core driver at appropriate times.  The interface of these
+IOC4 function probe and remove operations isn't precisely the same
+as PCI device probe and remove operations, but is logically the
+same operation.
+
+sgiioc4
+=======
+This is the IDE driver for IOC4.  Its name isn't very descriptive
+simply for historical reasons (it used to be the only IOC4 driver
+component).  There's not much to say about it other than it hooks
+up to the ioc4 driver via the appropriate registration, probe, and
+remove functions.
+
+ioc4_serial
+===========
+This is the serial driver for IOC4.  There's not much to say about it
+other than it hooks up to the ioc4 driver via the appropriate registration,
+probe, and remove functions.
diff --git a/Documentation/driver-api/sm501.rst b/Documentation/driver-api/sm501.rst
new file mode 100644
index 000000000000..882507453ba4
--- /dev/null
+++ b/Documentation/driver-api/sm501.rst
@@ -0,0 +1,74 @@
+.. include:: <isonum.txt>
+
+============
+SM501 Driver
+============
+
+:Copyright: |copy| 2006, 2007 Simtec Electronics
+
+The Silicon Motion SM501 multimedia companion chip is a multifunction device
+which may provide numerous interfaces including USB host controller, USB gadget,
+asynchronous serial ports, audio functions, and a dual display video interface.
+The device may be connected by PCI or local bus with varying functions enabled.
+
+Core
+----
+
+The core driver in drivers/mfd provides common services for the
+drivers which manage the specific hardware blocks. These services
+include locking for common registers, clock control and resource
+management.
+
+The core registers drivers for both PCI and generic bus based
+chips via the platform device and driver system.
+
+On detection of a device, the core initialises the chip (which may
+be specified by the platform data) and then exports the selected
+peripheral set as platform devices for the specific drivers.
+
+The core re-uses the platform device system as the platform device
+system provides enough features to support the drivers without the
+need to create a new bus-type and the associated code to go with it.
+
+
+Resources
+---------
+
+Each peripheral has a view of the device which is implicitly narrowed to
+the specific set of resources that peripheral requires in order to
+function correctly.
+
+The centralised memory allocation allows the driver to ensure that the
+maximum possible resource allocation can be made to the video subsystem
+as this is by far the most resource-sensitive of the on-chip functions.
+
+The primary issue with memory allocation is that of moving the video
+buffers once a display mode is chosen. Indeed when a video mode change
+occurs the memory footprint of the video subsystem changes.
+
+Since video memory is difficult to move without changing the display
+(unless sufficient contiguous memory can be provided for the old and new
+modes simultaneously) the video driver fully utilises the memory area
+given to it by aligning fb0 to the start of the area and fb1 to the end
+of it. Any memory left over in the middle is used for the acceleration
+functions, which are transient and thus their location is less critical
+as it can be moved.
+
+
+Configuration
+-------------
+
+The platform device driver uses a set of platform data to pass
+configurations through to the core and the subsidiary drivers
+so that there can be support for more than one system carrying
+an SM501 built into a single kernel image.
+
+The PCI driver assumes that the PCI card behaves as per the Silicon
+Motion reference design.
+
+There is an erratum (AB-5) affecting the selection of the
+M1XCLK and M1CLK frequencies. These two clocks
+must be sourced from the same PLL, although they can then
+be divided down individually. If this is not set, then SM501 may
+lock and hang the whole system. The driver will refuse to
+attach if the PLL selection is different.
diff --git a/Documentation/driver-api/smsc_ece1099.rst b/Documentation/driver-api/smsc_ece1099.rst
new file mode 100644
index 000000000000..079277421eaf
--- /dev/null
+++ b/Documentation/driver-api/smsc_ece1099.rst
@@ -0,0 +1,60 @@
+==================================================
+SMSC Keyboard Scan Expansion/GPIO Expansion device
+==================================================
+
+What is smsc-ece1099?
+----------------------
+
+The ECE1099 is a 40-Pin 3.3V Keyboard Scan Expansion
+or GPIO Expansion device. The device supports a keyboard
+scan matrix of 23x8. The device is connected to a Master
+via the SMSC BC-Link interface or via the SMBus.
+Keypad Scan Input (KSI) and Keypad Scan Output (KSO) signals
+are multiplexed with GPIOs.
+
+Interrupt generation
+--------------------
+
+Interrupts can be generated by an edge detection on a GPIO
+pin or an edge detection on one of the bus interface pins.
+Interrupts can also be detected on the keyboard scan interface.
+The bus interrupt pin (BC_INT# or SMBUS_INT#) is asserted if
+any bit in one of the Interrupt Status registers is 1 and
+the corresponding Interrupt Mask bit is also 1.
+
+In order for software to determine which device is the source
+of an interrupt, it should first read the Group Interrupt Status Register
+to determine which Status register group is a source for the interrupt.
+Software should read both the Status register and the associated Mask register,
+then AND the two values together. Bits that are 1 in the result of the AND
+are active interrupts. Software clears an interrupt by writing a 1 to the
+corresponding bit in the Status register.
+
+Communication Protocol
+----------------------
+
+- SMBus Slave Interface
+	The host processor communicates with the ECE1099 device
+	through a series of read/write registers via the SMBus
+	interface. SMBus is a serial communication protocol between
+	a computer host and its peripheral devices. The SMBus data
+	rate is 10 kHz minimum to 400 kHz maximum.
+
+- Slave Bus Interface
+	The ECE1099 device SMBus implementation is a subset of the
+	SMBus interface to the host. The device is a slave-only SMBus device.
+	The implementation in the device is a subset of SMBus since it
+	only supports four protocols.
+
+	The Write Byte, Read Byte, Send Byte, and Receive Byte protocols are the
+	only valid SMBus protocols for the device.
+
+- BC-Link™ Interface
+	The BC-Link is a proprietary bus that allows communication
+	between a Master device and a Companion device. The Master
+	device uses this serial bus to read and write registers
+	located on the Companion device. The bus comprises three signals,
+	BC_CLK, BC_DAT and BC_INT#. The Master device always provides the
+	clock, BC_CLK, and the Companion device is the source for an
+	independent asynchronous interrupt signal, BC_INT#. The ECE1099
+	supports BC-Link speeds up to 24 MHz.
diff --git a/Documentation/driver-api/switchtec.rst b/Documentation/driver-api/switchtec.rst
new file mode 100644
index 000000000000..7611fdc53e19
--- /dev/null
+++ b/Documentation/driver-api/switchtec.rst
@@ -0,0 +1,102 @@
+========================
+Linux Switchtec Support
+========================
+
+Microsemi's "Switchtec" line of PCI switch devices is already
+supported by the kernel with standard PCI switch drivers. However, the
+Switchtec device advertises a special management endpoint which
+enables some additional functionality. This includes:
+
+* Packet and Byte Counters
+* Firmware Upgrades
+* Event and Error logs
+* Querying port link status
+* Custom user firmware commands
+
+The switchtec kernel module implements this functionality.
+
+
+Interface
+=========
+
+The primary means of communicating with the Switchtec management firmware is
+through the Memory-mapped Remote Procedure Call (MRPC) interface.
+Commands are submitted to the interface with a 4-byte command
+identifier and up to 1KB of command specific data. The firmware will
+respond with a 4-byte return code and up to 1KB of command-specific
+data. The interface only processes a single command at a time.
+
+
+Userspace Interface
+===================
+
+The MRPC interface will be exposed to userspace through a simple char
+device: /dev/switchtec#, one for each management endpoint in the system.
+
+The char device has the following semantics:
+
+* A write must consist of at least 4 bytes and no more than 1028 bytes.
+  The first 4 bytes will be interpreted as the Command ID and the
+  remainder will be used as the input data. A write will send the
+  command to the firmware to begin processing.
+
+* Each write must be followed by exactly one read. Any double write will
+  produce an error and any read that doesn't follow a write will
+  produce an error.
+
+* A read will block until the firmware completes the command and return
+  the 4-byte Command Return Value plus up to 1024 bytes of output
+  data. (The length will be specified by the size parameter of the read
+  call -- reading less than 4 bytes will produce an error.)
+
+* The poll call will also be supported for userspace applications that
+  need to do other things while waiting for the command to complete.
+
+The following IOCTLs are also supported by the device:
+
+* SWITCHTEC_IOCTL_FLASH_INFO - Retrieve firmware length and number
+  of partitions in the device.
+
+* SWITCHTEC_IOCTL_FLASH_PART_INFO - Retrieve address and length for
+  any specified partition in flash.
+
+* SWITCHTEC_IOCTL_EVENT_SUMMARY - Read a structure of bitmaps
+  indicating all uncleared events.
+
+* SWITCHTEC_IOCTL_EVENT_CTL - Get the current count, clear and set flags
+  for any event. This ioctl takes in a switchtec_ioctl_event_ctl struct
+  with the event_id, index and flags set (index being the partition or PFF
+  number for non-global events). It returns whether the event has
+  occurred, the number of times and any event specific data. The flags
+  can be used to clear the count or enable and disable actions to
+  happen when the event occurs.
+  By using the SWITCHTEC_IOCTL_EVENT_FLAG_EN_POLL flag,
+  you can set an event to trigger a poll command to return with
+  POLLPRI. In this way, userspace can wait for events to occur.
+
+* SWITCHTEC_IOCTL_PFF_TO_PORT and SWITCHTEC_IOCTL_PORT_TO_PFF convert
+  between PCI Function Framework number (used by the event system)
+  and Switchtec Logic Port ID and Partition number (which is more
+  user friendly).
+
+
+Non-Transparent Bridge (NTB) Driver
+===================================
+
+An NTB hardware driver is provided for the Switchtec hardware in
+ntb_hw_switchtec. Currently, it only supports switches configured with
+exactly 2 NT partitions and zero or more non-NT partitions. It also requires
+the following configuration settings:
+
+* Both NT partitions must be able to access each other's GAS spaces.
+  Thus, the bits in the GAS Access Vector under Management Settings
+  must be set to support this.
+* Kernel configuration MUST include support for NTB (CONFIG_NTB needs
+  to be set)
+
+NT EP BAR 2 will be dynamically configured as a Direct Window, and
+the configuration file does not need to configure it explicitly.
+
+Please refer to Documentation/driver-api/ntb.rst in the Linux source tree
+for an overall understanding of the Linux NTB stack. ntb_hw_switchtec works
+as an NTB Hardware Driver in this stack.
diff --git a/Documentation/driver-api/sync_file.rst b/Documentation/driver-api/sync_file.rst
new file mode 100644
index 000000000000..496fb2c3b3e6
--- /dev/null
+++ b/Documentation/driver-api/sync_file.rst
@@ -0,0 +1,86 @@
+===================
+Sync File API Guide
+===================
+
+:Author: Gustavo Padovan <gustavo at padovan dot org>
+
+This document serves as a guide for device driver writers on what the
+sync_file API is, and how drivers can support it. Sync file is the carrier of
+the fences (struct dma_fence) that are needed to synchronize between drivers or
+across process boundaries.
+
+The sync_file API is meant to be used to send and receive fence information
+to/from userspace. It enables userspace to do explicit fencing, where instead
+of attaching a fence to the buffer a producer driver (such as a GPU or V4L
+driver) sends the fence related to the buffer to userspace via a sync_file.
+
+The sync_file then can be sent to the consumer (DRM driver for example), that
+will not use the buffer for anything before the fence(s) signals, i.e., the
+driver that issued the fence is not using/processing the buffer anymore, so it
+signals that the buffer is ready to use. And vice-versa for the consumer ->
+producer part of the cycle.
+
+Sync files allow userspace to be aware of buffer-sharing synchronization between
+drivers.
+
+Sync file was originally added in the Android kernel but current Linux Desktop
+can benefit a lot from it.
+
+in-fences and out-fences
+------------------------
+
+Sync files can go either to or from userspace. When a sync_file is sent from
+the driver to userspace we call the fences it contains 'out-fences'. They are
+related to a buffer that the driver is processing or is going to process, so
+the driver creates an out-fence to be able to notify, through
+dma_fence_signal(), when it has finished using (or processing) that buffer.
+Out-fences are fences that the driver creates.
+
+On the other hand if the driver receives fence(s) through a sync_file from
+userspace we call these fence(s) 'in-fences'. Receiving in-fences means that
+we need to wait for the fence(s) to signal before using any buffer related to
+the in-fences.
+
+Creating Sync Files
+-------------------
+
+When a driver needs to send an out-fence to userspace it creates a sync_file.
+
+Interface::
+
+	struct sync_file *sync_file_create(struct dma_fence *fence);
+
+The caller passes the out-fence and gets back the sync_file. That is just the
+first step; next it needs to install an fd on sync_file->file. So it gets an
+fd::
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+
+and installs it on sync_file->file::
+
+	fd_install(fd, sync_file->file);
+
+The sync_file fd now can be sent to userspace.
+
+If the creation process fails, or the sync_file needs to be released for any
+other reason, fput(sync_file->file) should be used.
+
+Receiving Sync Files from Userspace
+-----------------------------------
+
+When userspace needs to send an in-fence to the driver it passes the file
+descriptor of the Sync File to the kernel. The kernel can then retrieve the
+fences from it.
+
+Interface::
+
+	struct dma_fence *sync_file_get_fence(int fd);
+
+
+The returned reference is owned by the caller and must be disposed of
+afterwards using dma_fence_put(). In case of error, a NULL is returned instead.
+
+References:
+
+1. struct sync_file in include/linux/sync_file.h
+2. All interfaces mentioned above defined in include/linux/sync_file.h
diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
new file mode 100644
index 000000000000..25eb7d5b834b
--- /dev/null
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -0,0 +1,414 @@
+.. include:: <isonum.txt>
+
+=====================
+VFIO Mediated devices
+=====================
+
+:Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved.
+:Author: Neo Jia <cjia@nvidia.com>
+:Author: Kirti Wankhede <kwankhede@nvidia.com>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License version 2 as
+published by the Free Software Foundation.
+
+
+Virtual Function I/O (VFIO) Mediated devices[1]
+===============================================
+
+The number of use cases for virtualizing DMA devices that do not have built-in
+SR-IOV capability is increasing. Previously, to virtualize such devices,
+developers had to create their own management interfaces and APIs, and then
+integrate them with user space software. To simplify integration with user space
+software, we have identified common requirements and a unified management
+interface for such devices.
+
+The VFIO driver framework provides unified APIs for direct device access. It is
+an IOMMU/device-agnostic framework for exposing direct device access to user
+space in a secure, IOMMU-protected environment. This framework is used for
+multiple devices, such as GPUs, network adapters, and compute accelerators. With
+direct device access, virtual machines or user space applications have direct
+access to the physical device. This framework is reused for mediated devices.
+
+The mediated core driver provides a common interface for mediated device
+management that can be used by drivers of different devices. This module
+provides a generic interface to perform these operations:
+
+* Create and destroy a mediated device
+* Add a mediated device to and remove it from a mediated bus driver
+* Add a mediated device to and remove it from an IOMMU group
+
+The mediated core driver also provides an interface to register a bus driver.
+For example, the mediated VFIO mdev driver is designed for mediated devices and
+supports VFIO APIs. The mediated bus driver adds a mediated device to and
+removes it from a VFIO group.
+
+The following high-level block diagram shows the main components and interfaces
+in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
+devices as examples, as these devices are the first devices to use this module::
+
+     +---------------+
+     |               |
+     | +-----------+ |  mdev_register_driver() +--------------+
+     | |           | +<------------------------+              |
+     | |  mdev     | |                         |              |
+     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
+     | |  driver   | |     probe()/remove()    |              |    APIs
+     | |           | |                         +--------------+
+     | +-----------+ |
+     |               |
+     |  MDEV CORE    |
+     |   MODULE      |
+     |   mdev.ko     |
+     | +-----------+ |  mdev_register_device() +--------------+
+     | |           | +<------------------------+              |
+     | |           | |                         |  nvidia.ko   |<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | | Physical  | |
+     | |  device   | |  mdev_register_device() +--------------+
+     | | interface | +<------------------------+              |
+     | |           | |                         |  i915.ko     |<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | |           | |
+     | |           | |  mdev_register_device() +--------------+
+     | |           | +<------------------------+              |
+     | |           | |                         | ccw_device.ko|<-> physical
+     | |           | +------------------------>+              |    device
+     | |           | |        callbacks        +--------------+
+     | +-----------+ |
+     +---------------+
+
+
+Registration Interfaces
+=======================
+
+The mediated core driver provides the following types of registration
+interfaces:
+
+* Registration interface for a mediated bus driver
+* Physical device driver interface
+
+Registration Interface for a Mediated Bus Driver
+------------------------------------------------
+
+The registration interface for a mediated bus driver provides the following
+structure to represent a mediated device's driver::
+
+     /*
+      * struct mdev_driver [2] - Mediated device's driver
+      * @name: driver name
+      * @probe: called when new device created
+      * @remove: called when device removed
+      * @driver: device driver structure
+      */
+     struct mdev_driver {
+	     const char *name;
+	     int  (*probe)  (struct device *dev);
+	     void (*remove) (struct device *dev);
+	     struct device_driver    driver;
+     };
+
+A mediated bus driver for mdev should use this structure in the function calls
+to register and unregister itself with the core driver:
+
+* Register::
+
+    extern int  mdev_register_driver(struct mdev_driver *drv,
+				   struct module *owner);
+
+* Unregister::
+
+    extern void mdev_unregister_driver(struct mdev_driver *drv);
+
+The mediated bus driver is responsible for adding mediated devices to the VFIO
+group when devices are bound to the driver and removing mediated devices from
+the VFIO group when devices are unbound from the driver.
+
+
+Physical Device Driver Interface
+--------------------------------
+
+The physical device driver interface provides the mdev_parent_ops[3] structure
+to define the APIs to manage work in the mediated core driver that is related
+to the physical device.
+
+The structures in the mdev_parent_ops structure are as follows:
+
+* dev_attr_groups: attributes of the parent device
+* mdev_attr_groups: attributes of the mediated device
+* supported_config: attributes to define supported configurations
+
+The functions in the mdev_parent_ops structure are as follows:
+
+* create: allocate basic resources in a driver for a mediated device
+* remove: free resources in a driver when a mediated device is destroyed
+
+(Note that mdev-core provides no implicit serialization of create/remove
+callbacks per mdev parent device, per mdev type, or any other categorization.
+Vendor drivers are expected to be fully asynchronous in this respect or
+provide their own internal resource protection.)
+
+The callbacks in the mdev_parent_ops structure are as follows:
+
+* open: open callback of mediated device
+* close: close callback of mediated device
+* ioctl: ioctl callback of mediated device
+* read: read emulation callback
+* write: write emulation callback
+* mmap: mmap emulation callback
+
+A driver should use the mdev_parent_ops structure in the function call to
+register itself with the mdev core driver::
+
+	extern int  mdev_register_device(struct device *dev,
+	                                 const struct mdev_parent_ops *ops);
+
+However, the mdev_parent_ops structure is not required in the function call
+that a driver should use to unregister itself with the mdev core driver::
+
+	extern void mdev_unregister_device(struct device *dev);
+
+
+Mediated Device Management Interface Through sysfs
+==================================================
+
+The management interface through sysfs enables user space software, such as
+libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
+This management interface provides flexibility to the underlying physical
+device's driver to support features such as:
+
+* Mediated device hot plug
+* Multiple mediated devices in a single virtual machine
+* Multiple mediated devices from different physical devices
+
+Links in the mdev_bus Class Directory
+-------------------------------------
+The /sys/class/mdev_bus/ directory contains links to devices that are registered
+with the mdev core driver.
+
+Directories and Files Under the sysfs for Each Physical Device
+--------------------------------------------------------------
+
+::
+
+  |- [parent physical device]
+  |--- Vendor-specific-attributes [optional]
+  |--- [mdev_supported_types]
+  |     |--- [<type-id>]
+  |     |   |--- create
+  |     |   |--- name
+  |     |   |--- available_instances
+  |     |   |--- device_api
+  |     |   |--- description
+  |     |   |--- [devices]
+  |     |--- [<type-id>]
+  |     |   |--- create
+  |     |   |--- name
+  |     |   |--- available_instances
+  |     |   |--- device_api
+  |     |   |--- description
+  |     |   |--- [devices]
+  |     |--- [<type-id>]
+  |          |--- create
+  |          |--- name
+  |          |--- available_instances
+  |          |--- device_api
+  |          |--- description
+  |          |--- [devices]
+
+* [mdev_supported_types]
+
+  The list of currently supported mediated device types and their details.
+
+  [<type-id>], device_api, and available_instances are mandatory attributes
+  that should be provided by the vendor driver.
+
+* [<type-id>]
+
+  The [<type-id>] name is created by adding the device driver string as a prefix
+  to the string provided by the vendor driver. The format of this name is as
+  follows::
+
+	sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);
+
+  (or using mdev_parent_dev(mdev) to arrive at the parent device outside
+  of the core mdev code)
+
+* device_api
+
+  This attribute should show which device API is being created, for example,
+  "vfio-pci" for a PCI device.
+
+* available_instances
+
+  This attribute should show the number of devices of type <type-id> that can be
+  created.
+
+* [devices]
+
+  This directory contains links to the devices of type <type-id> that have been
+  created.
+
+* name
+
+  This attribute should show a human-readable name. This attribute is optional.
+
+* description
+
+  This attribute should show a brief description of the type and its features.
+  This attribute is optional.
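+
+As a rough user-space illustration of the [<type-id>] naming rule above, the
+following sketch joins a parent driver string and a vendor-provided name the
+same way the sprintf() call does. The two input strings are assumptions chosen
+to match the mtty sample described later; they are not read from a real
+device::
+
+	#include <stdio.h>
+
+	int main(void)
+	{
+		/* Assumed inputs: stand-ins for dev_driver_string(parent->dev)
+		 * and group->name inside the mdev core. */
+		const char *parent_driver = "mtty";
+		const char *vendor_name = "2";
+		char type_id[64];
+
+		snprintf(type_id, sizeof(type_id), "%s-%s",
+			 parent_driver, vendor_name);
+		printf("%s\n", type_id);	/* prints "mtty-2" */
+		return 0;
+	}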
+
+Directories and Files Under the sysfs for Each mdev Device
+----------------------------------------------------------
+
+::
+
+  |- [parent physical device]
+  |--- [$MDEV_UUID]
+         |--- remove
+         |--- mdev_type {link to its type}
+         |--- vendor-specific-attributes [optional]
+
+* remove (write only)
+
+Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can
+fail the remove() callback if that device is active and the vendor driver
+doesn't support hot unplug.
+
+Example::
+
+	# echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove
+
+Mediated Device Hot Plug
+------------------------
+
+Mediated devices can be created and assigned at runtime. The procedure to hot
+plug a mediated device is the same as the procedure to hot plug a PCI device.
+
+Translation APIs for Mediated Devices
+=====================================
+
+The following APIs are provided for translating user pfn to host pfn in a VFIO
+driver::
+
+	extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
+				  int npage, int prot, unsigned long *phys_pfn);
+
+	extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
+				    int npage);
+
+These functions call back into the back-end IOMMU module by using the pin_pages
+and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently
+these callbacks are supported in the TYPE1 IOMMU module. To enable them for
+other IOMMU backend modules, such as the PPC64 sPAPR module, those modules need
+to provide these two callback functions.
+
+Using the Sample Code
+=====================
+
+mtty.c in the samples/vfio-mdev/ directory is a sample driver program that
+demonstrates how to use the mediated device framework.
+
+The sample driver creates an mdev device that simulates a serial port over a PCI
+card.
+
+1. Build and load the mtty.ko module.
+
+   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/
+
+   Files in this device directory in sysfs are similar to the following::
+
+     # tree /sys/devices/virtual/mtty/mtty/
+        /sys/devices/virtual/mtty/mtty/
+        |-- mdev_supported_types
+        |   |-- mtty-1
+        |   |   |-- available_instances
+        |   |   |-- create
+        |   |   |-- device_api
+        |   |   |-- devices
+        |   |   `-- name
+        |   `-- mtty-2
+        |       |-- available_instances
+        |       |-- create
+        |       |-- device_api
+        |       |-- devices
+        |       `-- name
+        |-- mtty_dev
+        |   `-- sample_mtty_dev
+        |-- power
+        |   |-- autosuspend_delay_ms
+        |   |-- control
+        |   |-- runtime_active_time
+        |   |-- runtime_status
+        |   `-- runtime_suspended_time
+        |-- subsystem -> ../../../../class/mtty
+        `-- uevent
+
+2. Create a mediated device by using the dummy device that you created in the
+   previous step::
+
+     # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" >	\
+              /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
+
+3. Add parameters to qemu-kvm::
+
+     -device vfio-pci,\
+      sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
+
+4. Boot the VM.
+
+   In the Linux guest VM, with no hardware on the host, the device appears
+   as follows::
+
+     # lspci -s 00:05.0 -xxvv
+     00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
+             Subsystem: Device 4348:3253
+             Physical Slot: 5
+             Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
+     Stepping- SERR- FastB2B- DisINTx-
+             Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
+     <TAbort- <MAbort- >SERR- <PERR- INTx-
+             Interrupt: pin A routed to IRQ 10
+             Region 0: I/O ports at c150 [size=8]
+             Region 1: I/O ports at c158 [size=8]
+             Kernel driver in use: serial
+     00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
+     10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
+     20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
+     30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00
+
+   In the Linux guest VM, dmesg output for the device is as follows::
+
+     serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
+     0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
+     0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A
+
+
+5. In the Linux guest VM, check the serial ports::
+
+     # setserial -g /dev/ttyS*
+     /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
+     /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
+     /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10
+
+6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
+   /dev/ttyS2 with hardware flow control disabled.
+
+7. Type data on the minicom terminal or send data to the terminal emulation
+   program and read the data.
+
+   Data is looped back from the host's mtty driver.
+
+8. Destroy the mediated device that you created::
+
+     # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove
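+
+As a variation on steps 2 and 8, the sequence below checks how many instances
+are still available before creating a device with a freshly generated UUID.
+This is only a sketch: it assumes that the uuidgen utility is installed and
+that the mtty.ko module from step 1 is loaded; the TYPE and UUID shell
+variables are introduced here for illustration::
+
+     # TYPE=/sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2
+     # cat $TYPE/available_instances
+     # UUID=$(uuidgen)
+     # echo "$UUID" > $TYPE/create
+     # ls /sys/bus/mdev/devices/
+     # echo 1 > /sys/bus/mdev/devices/$UUID/remove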
+
+References
+==========
+
+1. See Documentation/driver-api/vfio.rst for more information on VFIO.
+2. struct mdev_driver in include/linux/mdev.h
+3. struct mdev_parent_ops in include/linux/mdev.h
+4. struct vfio_iommu_driver_ops in include/linux/vfio.h
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
new file mode 100644
index 000000000000..f1a4d3c3ba0b
--- /dev/null
+++ b/Documentation/driver-api/vfio.rst
@@ -0,0 +1,520 @@
+==================================
+VFIO - "Virtual Function I/O" [1]_
+==================================
+
+Many modern systems now provide DMA and interrupt remapping facilities
+to help ensure I/O devices behave within the boundaries they've been
+allotted.  This includes x86 hardware with AMD-Vi and Intel VT-d,
+POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
+systems such as Freescale PAMU.  The VFIO driver is an IOMMU/device
+agnostic framework for exposing direct device access to userspace, in
+a secure, IOMMU protected environment.  In other words, this allows
+safe [2]_, non-privileged, userspace drivers.
+
+Why do we want that?  Virtual machines often make use of direct device
+access ("device assignment") when configured for the highest possible
+I/O performance.  From a device and host perspective, this simply
+turns the VM into a userspace driver, with the benefits of
+significantly reduced latency, higher bandwidth, and direct use of
+bare-metal device drivers [3]_.
+
+Some applications, particularly in the high performance computing
+field, also benefit from low-overhead, direct device access from
+userspace.  Examples include network adapters (often non-TCP/IP based)
+and compute accelerators.  Prior to VFIO, these drivers had to either
+go through the full development cycle to become a proper upstream
+driver, be maintained out of tree, or make use of the UIO framework,
+which has no notion of IOMMU protection, limited interrupt support,
+and requires root privileges to access things like PCI configuration
+space.
+
+The VFIO driver framework intends to unify these, replacing the KVM
+PCI-specific device assignment code and providing a more secure,
+more featureful userspace driver environment than UIO.
+
+Groups, Devices, and IOMMUs
+---------------------------
+
+Devices are the main target of any I/O driver.  Devices typically
+create a programming interface made up of I/O access, interrupts,
+and DMA.  Without going into the details of each of these, DMA is
+by far the most critical aspect for maintaining a secure environment
+as allowing a device read-write access to system memory imposes the
+greatest risk to the overall system integrity.
+
+To help mitigate this risk, many modern IOMMUs now incorporate
+isolation properties into what was, in many cases, an interface only
+meant for translation (i.e., solving the addressing problems of devices
+with limited address spaces).  With this, devices can now be isolated
+from each other and from arbitrary memory access, thus allowing
+things like secure direct assignment of devices into virtual machines.
+
+This isolation is not always at the granularity of a single device
+though.  Even when an IOMMU is capable of this, properties of devices,
+interconnects, and IOMMU topologies can each reduce this isolation.
+For instance, an individual device may be part of a larger multi-
+function enclosure.  While the IOMMU may be able to distinguish
+between devices within the enclosure, the enclosure may not require
+transactions between devices to reach the IOMMU.  Examples of this
+could be anything from a multi-function PCI device with backdoors
+between functions to a non-PCI-ACS (Access Control Services) capable
+bridge allowing redirection without reaching the IOMMU.  Topology
+can also be a factor in terms of hiding devices.  A PCIe-to-PCI
+bridge masks the devices behind it, making transactions appear as if
+from the bridge itself.  Obviously IOMMU design plays a major factor
+as well.
+
+Therefore, while for the most part an IOMMU may have device level
+granularity, any system is susceptible to reduced granularity.  The
+IOMMU API therefore supports a notion of IOMMU groups.  A group is
+a set of devices which is isolatable from all other devices in the
+system.  Groups are therefore the unit of ownership used by VFIO.
+
+While the group is the minimum granularity that must be used to
+ensure secure user access, it's not necessarily the preferred
+granularity.  In IOMMUs which make use of page tables, it may be
+possible to share a set of page tables between different groups,
+reducing the overhead both to the platform (reduced TLB thrashing,
+reduced duplicate page tables), and to the user (programming only
+a single set of translations).  For this reason, VFIO makes use of
+a container class, which may hold one or more groups.  A container
+is created by simply opening the /dev/vfio/vfio character device.
+
+On its own, the container provides little functionality, with all
+but a couple of version and extension query interfaces locked away.
+The user needs to add a group into the container for the next level
+of functionality.  To do this, the user first needs to identify the
+group associated with the desired device.  This can be done using
+the sysfs links described in the example below.  By unbinding the
+device from the host driver and binding it to a VFIO driver, a new
+VFIO group will appear for the group as /dev/vfio/$GROUP, where
+$GROUP is the IOMMU group number of which the device is a member.
+If the IOMMU group contains multiple devices, each will need to
+be bound to a VFIO driver before operations on the VFIO group
+are allowed (it's also sufficient to only unbind the device from
+host drivers if a VFIO driver is unavailable; this will make the
+group available, but not that particular device).  TBD - interface
+for disabling driver probing/locking a device.
+
+Once the group is ready, it may be added to the container by opening
+the VFIO group character device (/dev/vfio/$GROUP) and using the
+VFIO_GROUP_SET_CONTAINER ioctl, passing the file descriptor of the
+previously opened container file.  If desired and if the IOMMU driver
+supports sharing the IOMMU context between groups, multiple groups may
+be set to the same container.  If a group fails to set to a container
+with existing groups, a new empty container will need to be used
+instead.
+
+With a group (or groups) attached to a container, the remaining
+ioctls become available, enabling access to the VFIO IOMMU interfaces.
+Additionally, it now becomes possible to get file descriptors for each
+device within a group using an ioctl on the VFIO group file descriptor.
+
+The VFIO device API includes ioctls for describing the device, the I/O
+regions and their read/write/mmap offsets on the device descriptor, as
+well as mechanisms for describing and registering interrupt
+notifications.
+
+VFIO Usage Example
+------------------
+
+Assume the user wants to access PCI device 0000:06:0d.0::
+
+	$ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
+	../../../../kernel/iommu_groups/26
+
+This device is therefore in IOMMU group 26.  This device is on the
+PCI bus, therefore the user will make use of vfio-pci to manage the
+group::
+
+	# modprobe vfio-pci
+
+Binding this device to the vfio-pci driver creates the VFIO group
+character devices for this group::
+
+	$ lspci -n -s 0000:06:0d.0
+	06:0d.0 0401: 1102:0002 (rev 08)
+	# echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
+	# echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
+
+Now we need to look at what other devices are in the group to free
+it for use by VFIO::
+
+	$ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices
+	total 0
+	lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 ->
+		../../../../devices/pci0000:00/0000:00:1e.0
+	lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 ->
+		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
+	lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 ->
+		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
+
+This device is behind a PCIe-to-PCI bridge [4]_, therefore we also
+need to add device 0000:06:0d.1 to the group following the same
+procedure as above.  Device 0000:00:1e.0 is a bridge that does
+not currently have a host driver, therefore it's not required to
+bind this device to the vfio-pci driver (vfio-pci does not currently
+support PCI bridges).
+
+The final step is to provide the user with access to the group if
+unprivileged operation is desired (note that /dev/vfio/vfio provides
+no capabilities on its own and is therefore expected to be set to
+mode 0666 by the system)::
+
+	# chown user:user /dev/vfio/26
+
+The user now has full access to all the devices and the IOMMU for this
+group and can access them as follows::
+
+	int container, group, device, i;
+	struct vfio_group_status group_status =
+					{ .argsz = sizeof(group_status) };
+	struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };
+	struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };
+	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+
+	/* Create a new container */
+	container = open("/dev/vfio/vfio", O_RDWR);
+
+	if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
+		/* Unknown API version */
+
+	if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
+		/* Doesn't support the IOMMU driver we want. */
+
+	/* Open the group */
+	group = open("/dev/vfio/26", O_RDWR);
+
+	/* Test the group is viable and available */
+	ioctl(group, VFIO_GROUP_GET_STATUS, &group_status);
+
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE))
+		/* Group is not viable (i.e., not all devices bound for vfio) */
+
+	/* Add the group to the container */
+	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
+
+	/* Enable the IOMMU model we want */
+	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
+
+	/* Get additional IOMMU info */
+	ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info);
+
+	/* Allocate some space and setup a DMA mapping */
+	dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	dma_map.size = 1024 * 1024;
+	dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
+	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+	/* Get a file descriptor for the device */
+	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
+
+	/* Test and setup the device */
+	ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
+
+	for (i = 0; i < device_info.num_regions; i++) {
+		struct vfio_region_info reg = { .argsz = sizeof(reg) };
+
+		reg.index = i;
+
+		ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+		/* Setup mappings... read/write offsets, mmaps
+		 * For PCI devices, config space is a region */
+	}
+
+	for (i = 0; i < device_info.num_irqs; i++) {
+		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+
+		irq.index = i;
+
+		ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+
+		/* Setup IRQs... eventfds, VFIO_DEVICE_SET_IRQS */
+	}
+
+	/* Gratuitous device reset and go... */
+	ioctl(device, VFIO_DEVICE_RESET);
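+
+To illustrate how the region information is meant to be used, the fragment
+below continues the example and reads the PCI vendor ID through the device
+file descriptor. It is a hedged sketch rather than part of the original
+example: it assumes a vfio-pci device, for which config space is exposed as
+region VFIO_PCI_CONFIG_REGION_INDEX, and it assumes a little-endian host so
+that the 16-bit value needs no byte swapping::
+
+	struct vfio_region_info cfg = { .argsz = sizeof(cfg) };
+	uint16_t vendor_id;
+
+	cfg.index = VFIO_PCI_CONFIG_REGION_INDEX;
+
+	/* Look up where config space lives on the device descriptor, then
+	 * read the two bytes at config offset 0 (the vendor ID). */
+	if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &cfg) == 0 &&
+	    pread(device, &vendor_id, sizeof(vendor_id), cfg.offset) ==
+	    sizeof(vendor_id))
+		printf("vendor ID: 0x%04x\n", vendor_id);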
+
+VFIO User API
+-------------------------------------------------------------------------------
+
+Please see include/uapi/linux/vfio.h for complete API documentation.
+
+VFIO bus driver API
+-------------------------------------------------------------------------------
+
+VFIO bus drivers, such as vfio-pci, make use of only a few interfaces
+into VFIO core.  When devices are bound to and unbound from the driver,
+the driver should call vfio_add_group_dev() and vfio_del_group_dev()
+respectively::
+
+	extern int vfio_add_group_dev(struct device *dev,
+				      const struct vfio_device_ops *ops,
+				      void *device_data);
+
+	extern void *vfio_del_group_dev(struct device *dev);
+
+vfio_add_group_dev() indicates to the core to begin tracking the
+iommu_group of the specified dev and register the dev as owned by
+a VFIO bus driver.  The driver provides an ops structure for callbacks
+similar to a file operations structure::
+
+	struct vfio_device_ops {
+		int	(*open)(void *device_data);
+		void	(*release)(void *device_data);
+		ssize_t	(*read)(void *device_data, char __user *buf,
+				size_t count, loff_t *ppos);
+		ssize_t	(*write)(void *device_data, const char __user *buf,
+				 size_t size, loff_t *ppos);
+		long	(*ioctl)(void *device_data, unsigned int cmd,
+				 unsigned long arg);
+		int	(*mmap)(void *device_data, struct vm_area_struct *vma);
+	};
+
+Each function is passed the device_data that was originally registered
+in the vfio_add_group_dev() call above.  This allows the bus driver
+an easy place to store its opaque, private data.  The open/release
+callbacks are issued when a new file descriptor is created for a
+device (via VFIO_GROUP_GET_DEVICE_FD).  The ioctl interface provides
+a direct pass through for VFIO_DEVICE_* ioctls.  The read/write/mmap
+interfaces implement the device region access defined by the device's
+own VFIO_DEVICE_GET_REGION_INFO ioctl.
+
+
+PPC64 sPAPR implementation note
+-------------------------------
+
+This implementation has some specifics:
+
+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
+   container is supported, as an IOMMU table is allocated at boot time,
+   one table per IOMMU group, which is a Partitionable Endpoint (PE)
+   (a PE is often a PCI domain, but not always).
+
+   Newer systems (POWER8 with IODA2) have an improved hardware design that
+   removes this limitation and allows multiple IOMMU groups per VFIO
+   container.
+
+2) The hardware supports so-called DMA windows: the PCI address range
+   within which DMA transfers are allowed. Any attempt to access address space
+   outside the window leads to isolation of the whole PE.
+
+3) PPC64 guests are paravirtualized but not fully emulated. There is an API
+   to map/unmap pages for DMA. It normally maps 1..32 pages per call, and
+   currently there is no way to reduce the number of calls. To make things
+   faster, the map/unmap handling has been implemented in real mode, which
+   provides excellent performance but has limitations, such as the
+   inability to do locked pages accounting in real time.
+
+4) According to the sPAPR specification, a Partitionable Endpoint (PE) is an I/O
+   subtree that can be treated as a unit for the purposes of partitioning and
+   error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
+   function of a multi-function IOA, or multiple IOAs (possibly including
+   switch and bridge structures above the multiple IOAs). PPC64 guests detect
+   PCI errors and recover from them via EEH RTAS services, which work on the
+   basis of additional ioctl commands.
+
+   So 4 additional ioctls have been added:
+
+	VFIO_IOMMU_SPAPR_TCE_GET_INFO
+		returns the size and the start of the DMA window on the PCI bus.
+
+	VFIO_IOMMU_ENABLE
+		enables the container. The locked pages accounting
+		is done at this point. This lets the user first learn what
+		the DMA window is and adjust the rlimit before doing any real work.
+
+	VFIO_IOMMU_DISABLE
+		disables the container.
+
+	VFIO_EEH_PE_OP
+		provides an API for EEH setup, error detection and recovery.
+
+   The code flow from the example above should be slightly changed::
+
+	struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 };
+
+	.....
+	/* Add the group to the container */
+	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
+
+	/* Enable the IOMMU model we want */
+	ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+
+	/* Get additional sPAPR IOMMU info */
+	struct vfio_iommu_spapr_tce_info spapr_iommu_info = { .argsz = sizeof(spapr_iommu_info) };
+	ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &spapr_iommu_info);
+
+	if (ioctl(container, VFIO_IOMMU_ENABLE))
+		/* Cannot enable container, maybe the rlimit is too low */
+
+	/* Allocate some space and setup a DMA mapping */
+	dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+
+	dma_map.size = 1024 * 1024;
+	dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
+	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+	/* Check here that .iova/.size are within DMA window from spapr_iommu_info */
+	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+	/* Get a file descriptor for the device */
+	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
+
+	....
+
+	/* Gratuitous device reset and go... */
+	ioctl(device, VFIO_DEVICE_RESET);
+
+	/* Make sure EEH is supported */
+	ioctl(container, VFIO_CHECK_EXTENSION, VFIO_EEH);
+
+	/* Enable the EEH functionality on the device */
+	pe_op.op = VFIO_EEH_PE_ENABLE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* It is suggested to create an additional data structure to represent
+	 * the PE, and to put child devices belonging to the same IOMMU group
+	 * into the PE instance for later reference.
+	 */
+
+	/* Check the PE's state and make sure it's in functional state */
+	pe_op.op = VFIO_EEH_PE_GET_STATE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Save device state using pci_save_state().
+	 * EEH should be enabled on the specified device.
+	 */
+
+	....
+
+	/* Inject EEH error, which is expected to be caused by 32-bit
+	 * config load.
+	 */
+	pe_op.op = VFIO_EEH_PE_INJECT_ERR;
+	pe_op.err.type = EEH_ERR_TYPE_32;
+	pe_op.err.func = EEH_ERR_FUNC_LD_CFG_ADDR;
+	pe_op.err.addr = 0ul;
+	pe_op.err.mask = 0ul;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	....
+
+	/* When 0xFF's are returned from reading PCI config space or IO BARs
+	 * of the PCI device, check whether the PE has been frozen.
+	 */
+	pe_op.op = VFIO_EEH_PE_GET_STATE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Wait for pending PCI transactions to be completed and don't
+	 * produce any more PCI traffic from/to the affected PE until
+	 * recovery is finished.
+	 */
+
+	/* Enable IO for the affected PE and collect logs. Usually, the
+	 * standard part of PCI config space, AER registers are dumped
+	 * as logs for further analysis.
+	 */
+	pe_op.op = VFIO_EEH_PE_UNFREEZE_IO;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/*
+	 * Issue PE reset: hot or fundamental reset. Usually, hot reset
+	 * is enough. However, the firmware of some PCI adapters would
+	 * require fundamental reset.
+	 */
+	pe_op.op = VFIO_EEH_PE_RESET_HOT;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+	pe_op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Configure the PCI bridges for the affected PE */
+	pe_op.op = VFIO_EEH_PE_CONFIGURE;
+	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+	/* Restore the state we saved at initialization time. pci_restore_state()
+	 * is good enough as an example.
+	 */
+
+	/* Hopefully, the error is recovered successfully. Now you can resume
+	 * PCI traffic to/from the affected PE.
+	 */
+
+	....
+
+5) There is a v2 of the sPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
+   VFIO_IOMMU_DISABLE and implements 2 new ioctls:
+   VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
+   (which are unsupported in the v1 IOMMU).
+
+   PPC64 paravirtualized guests generate a lot of map/unmap requests,
+   and the handling of those includes pinning/unpinning pages and updating
+   the mm::locked_vm counter to make sure we do not exceed the rlimit.
+   The v2 IOMMU splits accounting and pinning into separate operations:
+
+   - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
+     receive a user space address and size of the block to be pinned.
+     Bisecting is not supported; VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY is expected
+     to be called with the exact address and size used for registering
+     the memory block. Userspace is not expected to call these often.
+     The ranges are stored in a linked list in a VFIO container.
+
+   - VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
+     IOMMU table and do not do pinning; instead these check that the userspace
+     address is from a pre-registered range.
+
+   This separation helps in optimizing DMA for guests.
+
+6) sPAPR specification allows guests to have additional DMA window(s) on
+   a PCI bus with a variable page size. Two ioctls have been added to support
+   this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE.
+   The platform has to support the functionality or an error will be returned
+   to the userspace. The existing hardware supports up to 2 DMA windows: one is
+   2GB long, uses 4K pages and is called the "default 32bit window"; the other
+   can be as big as the entire RAM, may use a different page size, and is
+   optional. Guests create it at run time if the guest driver supports 64bit DMA.
+
+   VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and
+   a number of TCE table levels (useful when the TCE table is going to be big
+   and the kernel may not be able to allocate enough physically contiguous
+   memory). It creates a new window in the available slot and returns the bus
+   address where the new window starts. Due to a hardware limitation, the user
+   space cannot choose the location of DMA windows.
+
+   VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window
+   and removes it.
+
+-------------------------------------------------------------------------------
+
+.. [1] VFIO was originally an acronym for "Virtual Function I/O" in its
+   initial implementation by Tom Lyon while at Cisco.  We've since
+   outgrown the acronym, but it's catchy.
+
+.. [2] "safe" also depends upon a device being "well behaved".  It's
+   possible for multi-function devices to have backdoors between
+   functions and even for single function devices to have alternative
+   access to things like PCI config space through MMIO registers.  To
+   guard against the former we can include additional precautions in the
+   IOMMU driver to group multi-function PCI devices together
+   (iommu=group_mf).  The latter we can't prevent, but the IOMMU should
+   still provide isolation.  For PCI, SR-IOV Virtual Functions are the
+   best indicator of "well behaved", as these are designed for
+   virtualization usage models.
+
+.. [3] As always there are trade-offs to virtual machine device
+   assignment that are beyond the scope of VFIO.  It's expected that
+   future IOMMU technologies will reduce some, but maybe not all, of
+   these trade-offs.
+
+.. [4] In this case the device is below a PCI bridge, so transactions
+   from either function of the device are indistinguishable to the iommu::
+
+	-[0000:00]-+-1e.0-[06]--+-0d.0
+				\-0d.1
+
+	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
diff --git a/Documentation/driver-api/xillybus.rst b/Documentation/driver-api/xillybus.rst
new file mode 100644
index 000000000000..2446ee303c09
--- /dev/null
+++ b/Documentation/driver-api/xillybus.rst
@@ -0,0 +1,379 @@
+==========================================
+Xillybus driver for generic FPGA interface
+==========================================
+
+:Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com)
+:Email:  eli.billauer@gmail.com or as advertised on Xillybus' site.
+
+.. Contents:
+
+ - Introduction
+  -- Background
+  -- Xillybus Overview
+
+ - Usage
+  -- User interface
+  -- Synchronization
+  -- Seekable pipes
+
+ - Internals
+  -- Source code organization
+  -- Pipe attributes
+  -- Host never reads from the FPGA
+  -- Channels, pipes, and the message channel
+  -- Data streaming
+  -- Data granularity
+  -- Probing
+  -- Buffer allocation
+  -- The "nonempty" message (supporting poll)
+
+
+Introduction
+============
+
+Background
+----------
+
+An FPGA (Field Programmable Gate Array) is a piece of logic hardware, which
+can be programmed to become virtually anything that is usually found as a
+dedicated chipset: For instance, a display adapter, network interface card,
+or even a processor with its peripherals. FPGAs are the LEGO of hardware:
+Based upon certain building blocks, you make your own toys the way you like
+them. It's usually pointless to reimplement something that is already
+available on the market as a chipset, so FPGAs are mostly used when some
+special functionality is needed, and the production volume is relatively low
+(hence not justifying the development of an ASIC).
+
+The challenge with FPGAs is that everything is implemented at a very low
+level, even lower than assembly language. In order to allow FPGA designers to
+focus on their specific project, and not reinvent the wheel over and over
+again, pre-designed building blocks, IP cores, are often used. These are the
+FPGA parallels of library functions. IP cores may implement certain
+mathematical functions, a functional unit (e.g. a USB interface), an entire
+processor (e.g. ARM) or anything that might come in handy. Think of them as a
+building block, with electrical wires dangling on the sides for connection to
+other blocks.
+
+One of the daunting tasks in FPGA design is communicating with a full-blown
+operating system (actually, with the processor running it): Implementing the
+low-level bus protocol and the somewhat higher-level interface with the host
+(registers, interrupts, DMA etc.) is a project in itself. When the FPGA's
+function is a well-known one (e.g. a video adapter card, or a NIC), it can
+make sense to design the FPGA's interface logic specifically for the project.
+A special driver is then written to present the FPGA as a well-known interface
+to the kernel and/or user space. In that case, there is no reason to treat the
+FPGA differently than any device on the bus.
+
+It's however common that the desired data communication doesn't fit any well-
+known peripheral function. Also, the effort of designing an elegant
+abstraction for the data exchange is often considered too big. In those cases,
+a quicker and possibly less elegant solution is sought: The driver is
+effectively written as a user space program, leaving the kernel space part
+with just elementary data transport. This still requires designing some
+interface logic for the FPGA, and writing a simple ad-hoc driver for the kernel.
+
+Xillybus Overview
+-----------------
+
+Xillybus is an IP core and a Linux driver. Together, they form a kit for
+elementary data transport between an FPGA and the host, providing pipe-like
+data streams with a straightforward user interface. It's intended as a low-
+effort solution for mixed FPGA-host projects, for which it makes sense to
+have the project-specific part of the driver running in a user-space program.
+
+Since the communication requirements may vary significantly from one FPGA
+project to another (the number of data pipes needed in each direction and
+their attributes), there isn't one specific chunk of logic being the Xillybus
+IP core. Rather, the IP core is configured and built based upon a
+specification given by its end user.
+
+Xillybus presents independent data streams, which resemble pipes or TCP/IP
+communication to the user. At the host side, a character device file is used
+just like any pipe file. On the FPGA side, hardware FIFOs are used to stream
+the data. This is contrary to a common method of communicating through fixed-
+sized buffers (even though such buffers are used by Xillybus under the hood).
+There may be more than a hundred of these streams on a single IP core, but
+also no more than one, depending on the configuration.
+
+In order to ease the deployment of the Xillybus IP core, it contains a simple
+data structure which completely defines the core's configuration. The Linux
+driver fetches this data structure during its initialization process, and sets
+up the DMA buffers and character devices accordingly. As a result, a single
+driver is used to work out of the box with any Xillybus IP core.
+
+The data structure just mentioned should not be confused with PCI's
+configuration space or the Flattened Device Tree.
+
+Usage
+=====
+
+User interface
+--------------
+
+On the host, all interface with Xillybus is done through /dev/xillybus_*
+device files, which are generated automatically as the driver loads. The
+names of these files depend on the IP core that is loaded in the FPGA (see
+Probing below). To communicate with the FPGA, open the device file that
+corresponds to the hardware FIFO you want to send data to or receive data from,
+and use plain write() or read() calls, just like with a regular pipe. In
+particular, it makes perfect sense to go::
+
+	$ cat mydata > /dev/xillybus_thisfifo
+
+	$ cat /dev/xillybus_thatfifo > hisdata
+
+possibly pressing CTRL-C at some stage, even though the xillybus_* pipes have
+the capability to send an EOF (but may not use it).
+
+The driver and hardware are designed to behave sensibly as pipes, including:
+
+* Supporting non-blocking I/O (by setting O_NONBLOCK on open() ).
+
+* Supporting poll() and select().
+
+* Being bandwidth efficient under load (using DMA) but also handling small
+  pieces of data sent across (like TCP/IP) by autoflushing.
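+
+As a rough illustration of this pipe-like behavior, a user space program
+might read a fixed amount of data as sketched below. read_fully() is a
+hypothetical helper, not part of the driver; it loops because read() on a
+pipe may return fewer bytes than requested::
+
+	#include <errno.h>
+	#include <stddef.h>
+	#include <sys/types.h>
+	#include <unistd.h>
+
+	/*
+	 * Read exactly len bytes from fd. Returns the number of bytes read,
+	 * which is less than len only on EOF, or -1 on error.
+	 */
+	static ssize_t read_fully(int fd, void *buf, size_t len)
+	{
+		size_t done = 0;
+
+		while (done < len) {
+			ssize_t rc = read(fd, (char *)buf + done, len - done);
+
+			if (rc < 0) {
+				if (errno == EINTR)
+					continue;	/* interrupted, try again */
+				return -1;
+			}
+			if (rc == 0)
+				break;		/* EOF */
+			done += rc;
+		}
+		return done;
+	}
+
+A caller would typically obtain fd with open() on a device file such as
+/dev/xillybus_thatfifo (the name used in the shell example above) and then
+call read_fully(fd, buf, sizeof(buf)).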
+
+A device file can be read only, write only or bidirectional. Bidirectional
+device files are treated like two independent pipes (except for sharing a
+"channel" structure in the implementation code).
+
+Synchronization
+---------------
+
+Xillybus pipes are configured (on the IP core) to be either synchronous or
+asynchronous. For a synchronous pipe, write() returns successfully only after
+some data has been submitted and acknowledged by the FPGA. This slows down
+bulk data transfers, and is nearly impossible for use with streams that
+require data at a constant rate: There is no data transmitted to the FPGA
+between write() calls, in particular when the process loses the CPU.
+
+When a pipe is configured as asynchronous, write() returns as soon as there is
+enough room in the buffers to store any of the data.
+
+For FPGA to host pipes, asynchronous pipes allow data transfer from the FPGA
+as soon as the respective device file is opened, regardless of whether the data
+has been requested by a read() call. On synchronous pipes, only the amount
+of data requested by a read() call is transmitted.
+
+In summary, for synchronous pipes, data between the host and FPGA is
+transmitted only to satisfy the read() or write() call currently handled
+by the driver, and those calls wait for the transmission to complete before
+returning.
+
+Note that the synchronization attribute has nothing to do with the possibility
+that read() or write() completes fewer bytes than requested. There is a
+separate configuration flag ("allowpartial") that determines whether such a
+partial completion is allowed.
+
+Seekable pipes
+--------------
+
+A synchronous pipe can be configured to have the stream's position exposed
+to the user logic at the FPGA. Such a pipe is also seekable on the host API.
+With this feature, a memory or register interface can be attached on the
+FPGA side to the seekable stream. Reading or writing to a certain address in
+the attached memory is done by seeking to the desired address, and calling
+read() or write() as required.
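+
+A minimal sketch of such a register access from user space, assuming a
+seekable device file on which the stream position is the register address.
+The helper names are hypothetical, and a partial transfer is simply treated
+as an error::
+
+	#include <stdint.h>
+	#include <sys/types.h>
+	#include <unistd.h>
+
+	/* Write one 32-bit word at addr; returns 0 on success, -1 on error */
+	static int reg_write32(int fd, off_t addr, uint32_t val)
+	{
+		if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
+			return -1;
+		if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
+			return -1;
+		return 0;
+	}
+
+	/* Read one 32-bit word from addr; returns 0 on success, -1 on error */
+	static int reg_read32(int fd, off_t addr, uint32_t *val)
+	{
+		if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
+			return -1;
+		if (read(fd, val, sizeof(*val)) != (ssize_t)sizeof(*val))
+			return -1;
+		return 0;
+	}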
+
+
+Internals
+=========
+
+Source code organization
+------------------------
+
+The Xillybus driver consists of a core module, xillybus_core.c, and modules
+that depend on the specific bus interface (xillybus_of.c and xillybus_pcie.c).
+
+The bus specific modules are those probed when a suitable device is found by
+the kernel. Since the DMA mapping and synchronization functions, which are bus
+dependent by their nature, are used by the core module, a
+xilly_endpoint_hardware structure is passed to the core module on
+initialization. This structure is populated with pointers to wrapper functions
+which execute the DMA-related operations on the bus.
+
+Pipe attributes
+---------------
+
+Each pipe has a number of attributes which are set when the FPGA component
+(IP core) is built. They are fetched from the IDT (the data structure which
+defines the core's configuration, see Probing below) by xilly_setupchannels()
+in xillybus_core.c as follows:
+
+* is_writebuf: The pipe's direction. A non-zero value means it's an FPGA to
+  host pipe (the FPGA "writes").
+
+* channelnum: The pipe's identification number in communication between the
+  host and FPGA.
+
+* format: The underlying data width. See Data Granularity below.
+
+* allowpartial: A non-zero value means that a read() or write() (whichever
+  applies) may return with fewer than the requested number of bytes. The common
+  choice is a non-zero value, to match standard UNIX behavior.
+
+* synchronous: A non-zero value means that the pipe is synchronous. See
+  Synchronization above.
+
+* bufsize: Each DMA buffer's size. Always a power of two.
+
+* bufnum: The number of buffers allocated for this pipe. Always a power of two.
+
+* exclusive_open: A non-zero value forces exclusive opening of the associated
+  device file. If the device file is bidirectional, and already opened only in
+  one direction, the opposite direction may be opened once.
+
+* seekable: A non-zero value indicates that the pipe is seekable. See
+  Seekable pipes above.
+
+* supports_nonempty: A non-zero value (which is typical) indicates that the
+  hardware will send the messages that are necessary to support select() and
+  poll() for this pipe.
+
+Host never reads from the FPGA
+------------------------------
+
+Even though PCI Express is hotpluggable in general, a typical motherboard
+doesn't expect a card to go away all of a sudden. But since the PCIe card
+is based upon reprogrammable logic, a sudden disappearance from the bus is
+quite likely as a result of an accidental reprogramming of the FPGA while the
+host is up. In practice, nothing happens immediately in such a situation. But
+if the host attempts to read from an address that is mapped to the PCI Express
+device, that leads to an immediate freeze of the system on some motherboards,
+even though the PCIe standard requires a graceful recovery.
+
+In order to avoid these freezes, the Xillybus driver refrains completely from
+reading from the device's register space. All communication from the FPGA to
+the host is done through DMA. In particular, the Interrupt Service Routine
+doesn't follow the common practice of checking a status register when it's
+invoked. Rather, the FPGA prepares a small buffer which contains short
+messages, which inform the host what the interrupt was about.
+
+This mechanism is used on non-PCIe buses as well for the sake of uniformity.
+
+
+Channels, pipes, and the message channel
+----------------------------------------
+
+Each of the (possibly bidirectional) pipes presented to the user is allocated
+a data channel between the FPGA and the host. The distinction between channels
+and pipes is necessary only because of channel 0, which is used for interrupt-
+related messages from the FPGA, and has no pipe attached to it.
+
+Data streaming
+--------------
+
+Even though a non-segmented data stream is presented to the user at both
+sides, the implementation relies on a set of DMA buffers which is allocated
+for each channel. For the sake of illustration, let's take the FPGA to host
+direction: As data streams into the respective channel's interface in the
+FPGA, the Xillybus IP core writes it to one of the DMA buffers. When the
+buffer is full, the FPGA informs the host about that (appending a
+XILLYMSG_OPCODE_RELEASEBUF message to channel 0 and sending an interrupt if
+necessary). The host responds by making the data available for reading through
+the character device. When all data has been read, the host writes to the
+FPGA's buffer control register, allowing the buffer's overwriting. Flow
+control mechanisms exist on both sides to prevent underflows and overflows.
+
+This is not good enough for creating a TCP/IP-like stream: If the data flow
+stops momentarily before a DMA buffer is filled, the intuitive expectation is
+that the partial data in the buffer will arrive anyhow, despite the buffer not
+being completed. This is implemented by adding a field in the
+XILLYMSG_OPCODE_RELEASEBUF message, through which the FPGA informs not just
+which buffer is submitted, but how much data it contains.
+
+But the FPGA will submit a partially filled buffer only if directed to do so
+by the host. This situation occurs when the read() method has been blocking
+for XILLY_RX_TIMEOUT jiffies (currently 10 ms), after which the host commands
+the FPGA to submit a DMA buffer as soon as it can. This timeout mechanism
+balances between bus bandwidth efficiency (preventing a lot of partially
+filled buffers being sent) and a latency held fairly low for tails of data.
+
+A similar setting is used in the host to FPGA direction. The handling of
+partial DMA buffers is somewhat different, though. The user can tell the
+driver to submit all data it has in the buffers to the FPGA, by issuing a
+write() with the byte count set to zero. This is similar to a flush request,
+but it doesn't block. There is also an autoflushing mechanism, which triggers
+an equivalent flush roughly XILLY_RX_TIMEOUT jiffies after the last write().
+This allows the user to be oblivious about the underlying buffering mechanism
+and yet enjoy a stream-like interface.
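+
+From user space, the explicit flush described above amounts to a zero-length
+write(). A minimal sketch, in which xilly_flush() is a hypothetical wrapper
+name and fd is assumed to be an open host to FPGA device file::
+
+	#include <unistd.h>
+
+	/* Ask the driver to submit buffered data; returns 0 on success */
+	static int xilly_flush(int fd)
+	{
+		return write(fd, NULL, 0) < 0 ? -1 : 0;
+	}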
+
+Note that the issue of partial buffer flushing is irrelevant for pipes having
+the "synchronous" attribute nonzero, since synchronous pipes don't allow data
+to lie around in the DMA buffers between read() and write() anyhow.
+
+Data granularity
+----------------
+
+The data arrives or is sent at the FPGA as 8, 16 or 32 bit wide words, as
+configured by the "format" attribute. Whenever possible, the driver attempts
+to hide this when the pipe is accessed differently from its natural alignment.
+For example, reading single bytes from a pipe with 32 bit granularity works
+with no issues. Writing single bytes to pipes with 16 or 32 bit granularity
+will also work, but the driver can't send partially completed words to the
+FPGA, so the transmission of up to one word may be held until it's fully
+occupied with user data.
+
+This somewhat complicates the handling of host to FPGA streams, because
+when a buffer is flushed, it may contain up to 3 bytes that don't form a word in
+the FPGA, and hence can't be sent. To prevent loss of data, these leftover
+bytes need to be moved to the next buffer. The parts in xillybus_core.c
+that mention "leftovers" in some way are related to this complication.
+
+Probing
+-------
+
+As mentioned earlier, the number of pipes that are created when the driver
+loads and their attributes depend on the Xillybus IP core in the FPGA. During
+the driver's initialization, a blob containing configuration info, the
+Interface Description Table (IDT), is sent from the FPGA to the host. The
+bootstrap process is done in three phases:
+
+1. Acquire the length of the IDT, so a buffer can be allocated for it. This
+   is done by sending a quiesce command to the device, since the acknowledge
+   for this command contains the IDT's buffer length.
+
+2. Acquire the IDT itself.
+
+3. Create the interfaces according to the IDT.
+
+Buffer allocation
+-----------------
+
+In order to simplify the logic that prevents illegal boundary crossings of
+PCIe packets, the following rule applies: If a buffer is smaller than 4kB,
+it must not cross a 4kB boundary. Otherwise, it must be 4kB aligned. The
+xilly_setupchannels() function allocates these buffers by requesting whole
+pages from the kernel, and dividing them into DMA buffers as necessary. Since
+all buffers' sizes are powers of two, it's possible to pack any set of such
+buffers, with a maximal waste of one page of memory.
+
+All buffers are allocated when the driver is loaded. This is necessary,
+since large contiguous physical memory segments are sometimes requested,
+which are more likely to be available when the system is freshly booted.
+
+The allocation of buffer memory takes place in the same order they appear in
+the IDT. The driver relies on a rule that the pipes are sorted with decreasing
+buffer size in the IDT. If a requested buffer is larger than or equal to a page,
+the necessary number of pages is requested from the kernel, and these are
+used for this buffer. If the requested buffer is smaller than a page, one
+single page is requested from the kernel, and that page is partially used.
+Or, if there already is a partially used page at hand, the buffer is packed
+into that page. It can be shown that all pages requested from the kernel
+(except possibly for the last) are 100% utilized this way.
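+
+The packing rule can be modelled in a few lines of user space C. This is only
+a sketch of the rule as described above, not the driver's code: the 4 kB page
+size and the pages_needed() name are assumptions, and the sizes are taken to
+be powers of two sorted in decreasing order::
+
+	#include <stddef.h>
+
+	#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */
+
+	/* Count the pages requested for n buffers of the given sizes */
+	static unsigned int pages_needed(const unsigned int *sizes, size_t n)
+	{
+		unsigned int pages = 0;
+		unsigned int left = 0;	/* bytes left in the partial page */
+		size_t i;
+
+		for (i = 0; i < n; i++) {
+			if (sizes[i] >= SKETCH_PAGE_SIZE) {
+				/* whole pages, used by this buffer only */
+				pages += sizes[i] / SKETCH_PAGE_SIZE;
+				continue;
+			}
+			if (left < sizes[i]) {
+				/* no partially used page at hand */
+				pages++;
+				left = SKETCH_PAGE_SIZE;
+			}
+			left -= sizes[i];
+		}
+		return pages;
+	}
+
+Because the sizes are decreasing powers of two, the space left in a partially
+used page is always a multiple of the next buffer's size, so a new page is
+requested only when the previous one is completely full.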
+
+The "nonempty" message (supporting poll)
+----------------------------------------
+
+In order to support the "poll" method (and hence select() ), there is a small
+catch regarding the FPGA to host direction: The FPGA may have filled a DMA
+buffer with some data, but not submitted that buffer. If the host waited for
+the buffer's submission by the FPGA, there would be a possibility that the
+FPGA side has sent data, but a select() call would still block, because the
+host has not received any notification about this. This is solved with
+XILLYMSG_OPCODE_NONEMPTY messages sent by the FPGA when a channel goes from
+completely empty to containing some data.
+
+These messages are used only to support poll() and select(). The IP core can
+be configured not to send them, which slightly reduces bandwidth usage.
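+
+On the host, waiting for data on an FPGA to host pipe then looks like waiting
+on any other file descriptor. A minimal sketch, in which wait_readable() is a
+hypothetical helper name::
+
+	#include <poll.h>
+
+	/*
+	 * Wait until fd has data to read. Returns 1 if readable, 0 on
+	 * timeout, and -1 on error or if only an error/hangup was reported.
+	 */
+	static int wait_readable(int fd, int timeout_ms)
+	{
+		struct pollfd pfd = { .fd = fd, .events = POLLIN };
+		int rc = poll(&pfd, 1, timeout_ms);
+
+		if (rc <= 0)
+			return rc;	/* 0: timeout, -1: poll() failed */
+		return (pfd.revents & POLLIN) ? 1 : -1;
+	}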
diff --git a/Documentation/driver-api/zorro.rst b/Documentation/driver-api/zorro.rst
new file mode 100644
index 000000000000..664072b017e3
--- /dev/null
+++ b/Documentation/driver-api/zorro.rst
@@ -0,0 +1,104 @@
+========================================
+Writing Device Drivers for Zorro Devices
+========================================
+
+:Author: Written by Geert Uytterhoeven <geert@linux-m68k.org>
+:Last revised: September 5, 2003
+
+
+Introduction
+------------
+
+The Zorro bus is the bus used in the Amiga family of computers. Thanks to
+AutoConfig(tm), it's 100% Plug-and-Play.
+
+There are two types of Zorro buses, Zorro II and Zorro III:
+
+  - The Zorro II address space is 24-bit and lies within the first 16 MB of the
+    Amiga's address map.
+
+  - Zorro III is a 32-bit extension of Zorro II, which is backwards compatible
+    with Zorro II. The Zorro III address space lies outside the first 16 MB.
+
+
+Probing for Zorro Devices
+-------------------------
+
+Zorro devices are found by calling ``zorro_find_device()``, which returns a
+pointer to the ``next`` Zorro device with the specified Zorro ID. A probe loop
+for the board with Zorro ID ``ZORRO_PROD_xxx`` looks like::
+
+    struct zorro_dev *z = NULL;
+
+    while ((z = zorro_find_device(ZORRO_PROD_xxx, z))) {
+	if (!zorro_request_region(z->resource.start+MY_START, MY_SIZE,
+				  "My explanation"))
+	...
+    }
+
+``ZORRO_WILDCARD`` acts as a wildcard and finds any Zorro device. If your driver
+supports different types of boards, you can use a construct like::
+
+    struct zorro_dev *z = NULL;
+
+    while ((z = zorro_find_device(ZORRO_WILDCARD, z))) {
+	if (z->id != ZORRO_PROD_xxx1 && z->id != ZORRO_PROD_xxx2 && ...)
+	    continue;
+	if (!zorro_request_region(z->resource.start+MY_START, MY_SIZE,
+				  "My explanation"))
+	...
+    }
+
+
+Zorro Resources
+---------------
+
+Before you can access a Zorro device's registers, you have to make sure it's
+not yet in use. This is done using the I/O memory space resource management
+functions::
+
+    request_mem_region()
+    release_mem_region()
+
+Shortcuts to claim the whole device's address space are provided as well::
+
+    zorro_request_device()
+    zorro_release_device()
+
+
+Accessing the Zorro Address Space
+---------------------------------
+
+The address regions in the Zorro device resources are Zorro bus address
+regions. Due to the identity bus-physical address mapping on the Zorro bus,
+they are CPU physical addresses as well.
+
+The treatment of these regions depends on the type of Zorro space:
+
+  - Zorro II address space is always mapped and does not have to be mapped
+    explicitly using z_ioremap().
+
+    Conversion from bus/physical Zorro II addresses to kernel virtual addresses
+    and vice versa is done using::
+
+	virt_addr = ZTWO_VADDR(bus_addr);
+	bus_addr = ZTWO_PADDR(virt_addr);
+
+  - Zorro III address space must be mapped explicitly using z_ioremap() first
+    before it can be accessed::
+
+	virt_addr = z_ioremap(bus_addr, size);
+	...
+	z_iounmap(virt_addr);
+
+
+References
+----------
+
+#. linux/include/linux/zorro.h
+#. linux/include/uapi/linux/zorro.h
+#. linux/include/uapi/linux/zorro_ids.h
+#. linux/arch/m68k/include/asm/zorro.h
+#. linux/drivers/zorro
+#. /proc/bus/zorro
+
diff --git a/Documentation/eisa.txt b/Documentation/eisa.txt
deleted file mode 100644
index c07565ba57da..000000000000
--- a/Documentation/eisa.txt
+++ /dev/null
@@ -1,230 +0,0 @@
-================
-EISA bus support
-================
-
-:Author: Marc Zyngier <maz@wild-wind.fr.eu.org>
-
-This document groups random notes about porting EISA drivers to the
-new EISA/sysfs API.
-
-Starting from version 2.5.59, the EISA bus is almost given the same
-status as other much more mainstream busses such as PCI or USB. This
-has been possible through sysfs, which defines a nice enough set of
-abstractions to manage busses, devices and drivers.
-
-Although the new API is quite simple to use, converting existing
-drivers to the new infrastructure is not an easy task (mostly because
-detection code is generally also used to probe ISA cards). Moreover,
-most EISA drivers are among the oldest Linux drivers so, as you can
-imagine, some dust has settled here over the years.
-
-The EISA infrastructure is made up of three parts:
-
-    - The bus code implements most of the generic code. It is shared
-      among all the architectures that the EISA code runs on. It
-      implements bus probing (detecting EISA cards available on the bus),
-      allocates I/O resources, allows fancy naming through sysfs, and
-      offers interfaces for driver to register.
-
-    - The bus root driver implements the glue between the bus hardware
-      and the generic bus code. It is responsible for discovering the
-      device implementing the bus, and setting it up to be latter probed
-      by the bus code. This can go from something as simple as reserving
-      an I/O region on x86, to the rather more complex, like the hppa
-      EISA code. This is the part to implement in order to have EISA
-      running on an "new" platform.
-
-    - The driver offers the bus a list of devices that it manages, and
-      implements the necessary callbacks to probe and release devices
-      whenever told to.
-
-Every function/structure below lives in <linux/eisa.h>, which depends
-heavily on <linux/device.h>.
-
-Bus root driver
-===============
-
-::
-
-	int eisa_root_register (struct eisa_root_device *root);
-
-The eisa_root_register function is used to declare a device as the
-root of an EISA bus. The eisa_root_device structure holds a reference
-to this device, as well as some parameters for probing purposes::
-
-	struct eisa_root_device {
-		struct device   *dev;	 /* Pointer to bridge device */
-		struct resource *res;
-		unsigned long    bus_base_addr;
-		int		 slots;  /* Max slot number */
-		int		 force_probe; /* Probe even when no slot 0 */
-		u64		 dma_mask; /* from bridge device */
-		int              bus_nr; /* Set by eisa_root_register */
-		struct resource  eisa_root_res;	/* ditto */
-	};
-
-============= ======================================================
-node          used for eisa_root_register internal purpose
-dev           pointer to the root device
-res           root device I/O resource
-bus_base_addr slot 0 address on this bus
-slots	      max slot number to probe
-force_probe   Probe even when slot 0 is empty (no EISA mainboard)
-dma_mask      Default DMA mask. Usually the bridge device dma_mask.
-bus_nr	      unique bus id, set by eisa_root_register
-============= ======================================================
-
-Driver
-======
-
-::
-
-	int eisa_driver_register (struct eisa_driver *edrv);
-	void eisa_driver_unregister (struct eisa_driver *edrv);
-
-Clear enough ?
-
-::
-
-	struct eisa_device_id {
-		char sig[EISA_SIG_LEN];
-		unsigned long driver_data;
-	};
-
-	struct eisa_driver {
-		const struct eisa_device_id *id_table;
-		struct device_driver         driver;
-	};
-
-=============== ====================================================
-id_table	an array of NULL terminated EISA id strings,
-		followed by an empty string. Each string can
-		optionally be paired with a driver-dependent value
-		(driver_data).
-
-driver		a generic driver, such as described in
-		Documentation/driver-api/driver-model/driver.rst. Only .name,
-		.probe and .remove members are mandatory.
-=============== ====================================================
-
-An example is the 3c59x driver::
-
-	static struct eisa_device_id vortex_eisa_ids[] = {
-		{ "TCM5920", EISA_3C592_OFFSET },
-		{ "TCM5970", EISA_3C597_OFFSET },
-		{ "" }
-	};
-
-	static struct eisa_driver vortex_eisa_driver = {
-		.id_table = vortex_eisa_ids,
-		.driver   = {
-			.name    = "3c59x",
-			.probe   = vortex_eisa_probe,
-			.remove  = vortex_eisa_remove
-		}
-	};
-
-Device
-======
-
-The sysfs framework calls .probe and .remove functions upon device
-discovery and removal (note that the .remove function is only called
-when driver is built as a module).
-
-Both functions are passed a pointer to a 'struct device', which is
-encapsulated in a 'struct eisa_device' described as follows::
-
-	struct eisa_device {
-		struct eisa_device_id id;
-		int                   slot;
-		int                   state;
-		unsigned long         base_addr;
-		struct resource       res[EISA_MAX_RESOURCES];
-		u64                   dma_mask;
-		struct device         dev; /* generic device */
-	};
-
-======== ============================================================
-id	 EISA id, as read from device. id.driver_data is set from the
-	 matching driver EISA id.
-slot	 slot number which the device was detected on
-state    set of flags indicating the state of the device. Current
-	 flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED.
-res	 set of four 256 bytes I/O regions allocated to this device
-dma_mask DMA mask set from the parent device.
-dev	 generic device (see Documentation/driver-api/driver-model/device.rst)
-======== ============================================================
-
-You can get the 'struct eisa_device' from 'struct device' using the
-'to_eisa_device' macro.
-
-Misc stuff
-==========
-
-::
-
-	void eisa_set_drvdata (struct eisa_device *edev, void *data);
-
-Stores data into the device's driver_data area.
-
-::
-
-	void *eisa_get_drvdata (struct eisa_device *edev):
-
-Gets the pointer previously stored into the device's driver_data area.
-
-::
-
-	int eisa_get_region_index (void *addr);
-
-Returns the region number (0 <= x < EISA_MAX_RESOURCES) of a given
-address.
-
-Kernel parameters
-=================
-
-eisa_bus.enable_dev
-	A comma-separated list of slots to be enabled, even if the firmware
-	set the card as disabled. The driver must be able to properly
-	initialize the device in such conditions.
-
-eisa_bus.disable_dev
-	A comma-separated list of slots to be enabled, even if the firmware
-	set the card as enabled. The driver won't be called to handle this
-	device.
-
-virtual_root.force_probe
-	Force the probing code to probe EISA slots even when it cannot find an
-	EISA compliant mainboard (nothing appears on slot 0). Defaults to 0
-	(don't force), and set to 1 (force probing) when either
-	CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set.
-
-Random notes
-============
-
-Converting an EISA driver to the new API mostly involves *deleting*
-code (since probing is now in the core EISA code). Unfortunately, most
-drivers share their probing routine between ISA, and EISA. Special
-care must be taken when ripping out the EISA code, so other busses
-won't suffer from these surgical strikes...
-
-You *must not* expect any EISA device to be detected when returning
-from eisa_driver_register, since the chances are that the bus has not
-yet been probed. In fact, that's what happens most of the time (the
-bus root driver usually kicks in rather late in the boot process).
-Unfortunately, most drivers are doing the probing by themselves, and
-expect to have explored the whole machine when they exit their probe
-routine.
-
-For example, switching your favorite EISA SCSI card to the "hotplug"
-model is "the right thing"(tm).
-
-Thanks
-======
-
-I'd like to thank the following people for their help:
-
-- Xavier Benigni for lending me a wonderful Alpha Jensen,
-- James Bottomley, Jeff Garzik for getting this stuff into the kernel,
-- Andries Brouwer for contributing numerous EISA ids,
-- Catrin Jones for coping with far too many machines at home.
diff --git a/Documentation/fb/fbcon.rst b/Documentation/fb/fbcon.rst
index 26bc5cdaabab..ebca41785abe 100644
--- a/Documentation/fb/fbcon.rst
+++ b/Documentation/fb/fbcon.rst
@@ -187,7 +187,7 @@ the hardware. Thus, in a VGA console::
 Assuming the VGA driver can be unloaded, one must first unbind the VGA driver
 from the console layer before unloading the driver.  The VGA driver cannot be
 unloaded if it is still bound to the console layer. (See
-Documentation/console/console.rst for more information).
+Documentation/driver-api/console.rst for more information).
 
 This is more complicated in the case of the framebuffer console (fbcon),
 because fbcon is an intermediate layer between the console and the drivers::
@@ -204,7 +204,7 @@ fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from
 fbcon.
 
 So, how do we unbind fbcon from the console? Part of the answer is in
-Documentation/console/console.rst. To summarize:
+Documentation/driver-api/console.rst. To summarize:
 
 Echo a value to the bind file that represents the framebuffer console
 driver. So assuming vtcon1 represents fbcon, then::
diff --git a/Documentation/isa.txt b/Documentation/isa.txt
deleted file mode 100644
index def4a7b690b5..000000000000
--- a/Documentation/isa.txt
+++ /dev/null
@@ -1,122 +0,0 @@
-===========
-ISA Drivers
-===========
-
-The following text is adapted from the commit message of the initial
-commit of the ISA bus driver authored by Rene Herman.
-
-During the recent "isa drivers using platform devices" discussion it was
-pointed out that (ALSA) ISA drivers ran into the problem of not having
-the option to fail driver load (device registration rather) upon not
-finding their hardware due to a probe() error not being passed up
-through the driver model. In the course of that, I suggested a separate
-ISA bus might be best; Russell King agreed and suggested this bus could
-use the .match() method for the actual device discovery.
-
-The attached does this. For this old non (generically) discoverable ISA
-hardware only the driver itself can do discovery so as a difference with
-the platform_bus, this isa_bus also distributes match() up to the
-driver.
-
-As another difference: these devices only exist in the driver model due
-to the driver creating them because it might want to drive them, meaning
-that all device creation has been made internal as well.
-
-The usage model this provides is nice, and has been acked from the ALSA
-side by Takashi Iwai and Jaroslav Kysela. The ALSA driver module_init's
-now (for oldisa-only drivers) become::
-
-	static int __init alsa_card_foo_init(void)
-	{
-		return isa_register_driver(&snd_foo_isa_driver, SNDRV_CARDS);
-	}
-
-	static void __exit alsa_card_foo_exit(void)
-	{
-		isa_unregister_driver(&snd_foo_isa_driver);
-	}
-
-Quite like the other bus models therefore. This removes a lot of
-duplicated init code from the ALSA ISA drivers.
-
-The passed in isa_driver struct is the regular driver struct embedding a
-struct device_driver, the normal probe/remove/shutdown/suspend/resume
-callbacks, and as indicated that .match callback.
-
-The "SNDRV_CARDS" you see being passed in is an "unsigned int ndev"
-parameter, indicating how many devices to create and call our methods
-with.
-
-The platform_driver callbacks are called with a platform_device param;
-the isa_driver callbacks are being called with a ``struct device *dev,
-unsigned int id`` pair directly -- with the device creation completely
-internal to the bus it's much cleaner to not leak isa_dev's by passing
-them in at all. The id is the only thing we ever want other than the
-struct device anyways, and it makes for nicer code in the callbacks as
-well.
-
-With this additional .match() callback ISA drivers have all options. If
-ALSA would want to keep the old non-load behaviour, it could stick all
-of the old .probe in .match, which would only keep them registered after
-everything was found to be present and accounted for. If it wanted the
-behaviour of always loading as it inadvertently did for a bit after the
-changeover to platform devices, it could just not provide a .match() and
-do everything in .probe() as before.
-
-If it, as Takashi Iwai already suggested earlier as a way of following
-the model from saner buses more closely, wants to load when a later bind
-could conceivably succeed, it could use .match() for the prerequisites
-(such as checking the user wants the card enabled and that port/irq/dma
-values have been passed in) and .probe() for everything else. This is
-the nicest model.
-
-To the code...
-
-This exports only two functions; isa_{,un}register_driver().
-
-isa_register_driver() registers the struct device_driver, and then
-loops over the passed in ndev creating devices and registering them.
-This causes the bus match method to be called for them, which is::
-
-	int isa_bus_match(struct device *dev, struct device_driver *driver)
-	{
-		struct isa_driver *isa_driver = to_isa_driver(driver);
-
-		if (dev->platform_data == isa_driver) {
-			if (!isa_driver->match ||
-				isa_driver->match(dev, to_isa_dev(dev)->id))
-				return 1;
-			dev->platform_data = NULL;
-		}
-		return 0;
-	}
-
-The first thing this does is check if this device is in fact one of this
-driver's devices by seeing if the device's platform_data pointer is set
-to this driver. Platform devices compare strings, but we don't need to
-do that with everything being internal, so isa_register_driver() abuses
-dev->platform_data as an isa_driver pointer which we can then check here.
-I believe platform_data is available for this, but if rather not, moving
-the isa_driver pointer to the private struct isa_dev is of course fine as
-well.
-
-Then, if the driver did not provide a .match, it matches. If it did,
-the driver match() method is called to determine a match.
-
-If it did **not** match, dev->platform_data is reset to indicate this to
-isa_register_driver which can then unregister the device again.
-
-If during all this, there's any error, or no devices matched at all
-everything is backed out again and the error, or -ENODEV, is returned.
-
-isa_unregister_driver() just unregisters the matched devices and the
-driver itself.
-
-module_isa_driver is a helper macro for ISA drivers which do not do
-anything special in module init/exit. This eliminates a lot of
-boilerplate code. Each module may only use this macro once, and calling
-it replaces module_init and module_exit.
-
-max_num_isa_dev is a macro to determine the maximum possible number of
-ISA devices which may be registered in the I/O port address space given
-the address extent of the ISA devices.
diff --git a/Documentation/isapnp.txt b/Documentation/isapnp.txt
deleted file mode 100644
index 8d0840ac847b..000000000000
--- a/Documentation/isapnp.txt
+++ /dev/null
@@ -1,15 +0,0 @@
-==========================================================
-ISA Plug & Play support by Jaroslav Kysela <perex@suse.cz>
-==========================================================
-
-Interface /proc/isapnp
-======================
-
-The interface has been removed. See pnp.txt for more details.
-
-Interface /proc/bus/isapnp
-==========================
-
-This directory allows access to ISA PnP cards and logical devices.
-The regular files contain the contents of ISA PnP registers for
-a logical device.
diff --git a/Documentation/lightnvm/pblk.txt b/Documentation/lightnvm/pblk.txt
deleted file mode 100644
index 1040ed1cec81..000000000000
--- a/Documentation/lightnvm/pblk.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-pblk: Physical Block Device Target
-==================================
-
-pblk implements a fully associative, host-based FTL that exposes a traditional
-block I/O interface. Its primary responsibilities are:
-
-  - Map logical addresses onto physical addresses (4KB granularity) in a
-    logical-to-physical (L2P) table.
-  - Maintain the integrity and consistency of the L2P table as well as its
-    recovery from normal tear down and power outage.
-  - Deal with controller- and media-specific constraints.
-  - Handle I/O errors.
-  - Implement garbage collection.
-  - Maintain consistency across the I/O stack during synchronization points.
-
-For more information please refer to:
-
-  http://lightnvm.io
-
-which maintains updated FAQs, manual pages, technical documentation, tools,
-contacts, etc.
diff --git a/Documentation/men-chameleon-bus.txt b/Documentation/men-chameleon-bus.txt
deleted file mode 100644
index 1b1f048aa748..000000000000
--- a/Documentation/men-chameleon-bus.txt
+++ /dev/null
@@ -1,175 +0,0 @@
-=================
-MEN Chameleon Bus
-=================
-
-.. Table of Contents
-   =================
-   1 Introduction
-       1.1 Scope of this Document
-       1.2 Limitations of the current implementation
-   2 Architecture
-       2.1 MEN Chameleon Bus
-       2.2 Carrier Devices
-       2.3 Parser
-   3 Resource handling
-       3.1 Memory Resources
-       3.2 IRQs
-   4 Writing an MCB driver
-       4.1 The driver structure
-       4.2 Probing and attaching
-       4.3 Initializing the driver
-
-
-Introduction
-============
-
-This document describes the architecture and implementation of the MEN
-Chameleon Bus (called MCB throughout this document).
-
-Scope of this Document
-----------------------
-
-This document is intended to be a short overview of the current
-implementation and does by no means describe the complete possibilities of MCB
-based devices.
-
-Limitations of the current implementation
------------------------------------------
-
-The current implementation is limited to PCI and PCIe based carrier devices
-that only use a single memory resource and share the PCI legacy IRQ.  Not
-implemented are:
-
-- Multi-resource MCB devices like the VME Controller or M-Module carrier.
-- MCB devices that need another MCB device, like SRAM for a DMA Controller's
-  buffer descriptors or a video controller's video memory.
-- A per-carrier IRQ domain for carrier devices that have one (or more) IRQs
-  per MCB device like PCIe based carriers with MSI or MSI-X support.
-
-Architecture
-============
-
-MCB is divided into 3 functional blocks:
-
-- The MEN Chameleon Bus itself,
-- drivers for MCB Carrier Devices and
-- the parser for the Chameleon table.
-
-MEN Chameleon Bus
------------------
-
-The MEN Chameleon Bus is an artificial bus system that attaches to a so
-called Chameleon FPGA device found on some hardware produced by MEN Mikro
-Elektronik GmbH. These devices are multi-function devices implemented in a
-single FPGA and usually attached via some sort of PCI or PCIe link. Each
-FPGA contains a header section describing the content of the FPGA. The
-header lists the device id, PCI BAR, offset from the beginning of the PCI
-BAR, size in the FPGA, interrupt number and some other properties currently
-not handled by the MCB implementation.
-
-Carrier Devices
----------------
-
-A carrier device is just an abstraction for the real world physical bus the
-Chameleon FPGA is attached to. Some IP Core drivers may need to interact with
-properties of the carrier device (like querying the IRQ number of a PCI
-device). To provide abstraction from the real hardware bus, an MCB carrier
-device provides callback methods to translate the driver's MCB function calls
-to hardware related function calls. For example a carrier device may
-implement the get_irq() method which can be translated into a hardware bus
-query for the IRQ number the device should use.
-
-Parser
-------
-
-The parser reads the first 512 bytes of a Chameleon device and parses the
-Chameleon table. Currently the parser only supports the Chameleon v2 variant
-of the Chameleon table but can easily be adapted to support an older or
-possible future variant. While parsing the table's entries new MCB devices
-are allocated and their resources are assigned according to the resource
-assignment in the Chameleon table. After resource assignment is finished, the
-MCB devices are registered at the MCB and thus at the driver core of the
-Linux kernel.
-
-Resource handling
-=================
-
-The current implementation assigns exactly one memory and one IRQ resource
-per MCB device. But this is likely going to change in the future.
-
-Memory Resources
-----------------
-
-Each MCB device has exactly one memory resource, which can be requested from
-the MCB bus. This memory resource is the physical address of the MCB device
-inside the carrier and is intended to be passed to ioremap() and friends. It
-is already requested from the kernel by calling request_mem_region().
-
-IRQs
-----
-
-Each MCB device has exactly one IRQ resource, which can be requested from the
-MCB bus. If a carrier device driver implements the ->get_irq() callback
-method, the IRQ number assigned by the carrier device will be returned,
-otherwise the IRQ number inside the Chameleon table will be returned. This
-number is suitable to be passed to request_irq().
-
-Writing an MCB driver
-=====================
-
-The driver structure
---------------------
-
-Each MCB driver has a structure to identify the device driver as well as
-device ids which identify the IP Core inside the FPGA. The driver structure
-also contains callback methods which get executed on driver probe and
-removal from the system::
-
-	static const struct mcb_device_id foo_ids[] = {
-		{ .device = 0x123 },
-		{ }
-	};
-	MODULE_DEVICE_TABLE(mcb, foo_ids);
-
-	static struct mcb_driver foo_driver = {
-		.driver = {
-			.name = "foo-bar",
-			.owner = THIS_MODULE,
-		},
-		.probe = foo_probe,
-		.remove = foo_remove,
-		.id_table = foo_ids,
-	};
-
-Probing and attaching
----------------------
-
-When a driver is loaded and the MCB devices it services are found, the MCB
-core will call the driver's probe callback method. When the driver is removed
-from the system, the MCB core will call the driver's remove callback method::
-
-	static int foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id);
-	static void foo_remove(struct mcb_device *mdev);
-
-Initializing the driver
------------------------
-
-When the kernel is booted or your foo driver module is inserted, you have to
-perform driver initialization. Usually it is enough to register your driver
-module at the MCB core::
-
-	static int __init foo_init(void)
-	{
-		return mcb_register_driver(&foo_driver);
-	}
-	module_init(foo_init);
-
-	static void __exit foo_exit(void)
-	{
-		mcb_unregister_driver(&foo_driver);
-	}
-	module_exit(foo_exit);
-
-The module_mcb_driver() macro can be used to reduce the above code::
-
-	module_mcb_driver(foo_driver);
diff --git a/Documentation/ntb.txt b/Documentation/ntb.txt
deleted file mode 100644
index 074a423c853c..000000000000
--- a/Documentation/ntb.txt
+++ /dev/null
@@ -1,236 +0,0 @@
-===========
-NTB Drivers
-===========
-
-NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
-the separate memory systems of two or more computers to the same PCI-Express
-fabric. Existing NTB hardware supports a common feature set: doorbell
-registers and memory translation windows, as well as non common features like
-scratchpad and message registers. Scratchpad registers are read-and-writable
-registers that are accessible from either side of the device, so that peers can
-exchange a small amount of information at a fixed address. Message registers can
-be utilized for the same purpose. Additionally they are provided with
-special status bits to make sure the information isn't rewritten by another
-peer. Doorbell registers provide a way for peers to send interrupt events.
-Memory windows allow translated read and write access to the peer memory.
-
-NTB Core Driver (ntb)
-=====================
-
-The NTB core driver defines an api wrapping the common feature set, and allows
-clients interested in NTB features to discover the NTB devices supported by
-hardware drivers.  The term "client" is used here to mean an upper layer
-component making use of the NTB api.  The term "driver," or "hardware driver,"
-is used here to mean a driver for a specific vendor and model of NTB hardware.
-
-NTB Client Drivers
-==================
-
-NTB client drivers should register with the NTB core driver.  After
-registering, the client probe and remove functions will be called appropriately
-as ntb hardware, or hardware drivers, are inserted and removed.  The
-registration uses the Linux Device framework, so it should feel familiar to
-anyone who has written a pci driver.
-
-NTB Typical client driver implementation
-----------------------------------------
-
-The primary purpose of NTB is to share some piece of memory between at least two
-systems. So the NTB device features like Scratchpad/Message registers are
-mainly used to perform the proper memory window initialization. Typically
-there are two types of memory window interfaces supported by the NTB API:
-inbound translation configured on the local ntb port and outbound translation
-configured by the peer, on the peer ntb port. The first type is
-depicted on the next figure::
-
- Inbound translation:
-
- Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
-  ____________
- | dma-mapped |-ntb_mw_set_trans(addr)  |
- | memory     |        _v____________   |   ______________
- | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
- |------------|       |--------------|  |  |--------------|
-
-So a typical scenario of the first type of memory window initialization looks like:
-1) allocate a memory region, 2) put translated address to NTB config,
-3) somehow notify a peer device of performed initialization, 4) peer device
-maps corresponding outbound memory window so to have access to the shared
-memory region.
-
-The second type of interface, that implies the shared windows being
-initialized by a peer device, is depicted on the figure::
-
- Outbound translation:
-
- Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
-  ____________                      ______________
- | dma-mapped |                |   | MW base addr |<== memory-mapped IO
- | memory     |                |   |--------------|
- | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
- |------------|                |   |--------------|
-
-Typical scenario of the second type interface initialization would be:
-1) allocate a memory region, 2) somehow deliver a translated address to a peer
-device, 3) peer puts the translated address to NTB config, 4) peer device maps
-outbound memory window so to have access to the shared memory region.
-
-As one can see the described scenarios can be combined in one portable
-algorithm.
-
- Local device:
-  1) Allocate memory for a shared window
-  2) Initialize memory window by translated address of the allocated region
-     (it may fail if local memory window initialization is unsupported)
-  3) Send the translated address and memory window index to a peer device
-
- Peer device:
-  1) Initialize memory window with retrieved address of the allocated
-     by another device memory region (it may fail if peer memory window
-     initialization is unsupported)
-  2) Map outbound memory window
-
-In accordance with this scenario, the NTB Memory Window API can be used as
-follows:
-
- Local device:
-  1) ntb_mw_count(pidx) - retrieve number of memory ranges, which can
-     be allocated for memory windows between local device and peer device
-     of port with specified index.
-  2) ntb_get_align(pidx, midx) - retrieve parameters restricting the
-     shared memory region alignment and size. Then memory can be properly
-     allocated.
-  3) Allocate physically contiguous memory region in compliance with
-     restrictions retrieved in 2).
-  4) ntb_mw_set_trans(pidx, midx) - try to set translation address of
-     the memory window with specified index for the defined peer device
-     (it may fail if local translated address setting is not supported)
-  5) Send translated base address (usually together with memory window
-     number) to the peer device using, for instance, scratchpad or message
-     registers.
-
- Peer device:
-  1) ntb_peer_mw_set_trans(pidx, midx) - try to set received from other
-     device (related to pidx) translated address for specified memory
-     window. It may fail if retrieved address, for instance, exceeds
-     maximum possible address or isn't properly aligned.
-  2) ntb_peer_mw_get_addr(widx) - retrieve MMIO address to map the memory
-     window so to have an access to the shared memory.
-
-It is also worth noting that the method ntb_mw_count(pidx) should return the
-same value as ntb_peer_mw_count() on the peer with port index pidx.
-
-NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
-------------------------------------------------------------------
-
-The primary client for NTB is the Transport client, used in tandem with NTB
-Netdev.  These drivers function together to create a logical link to the peer,
-across the ntb, to exchange packets of network data.  The Transport client
-establishes a logical link to the peer, and creates queue pairs to exchange
-messages and data.  The NTB Netdev then creates an ethernet device using a
-Transport queue pair.  Network data is copied between socket buffers and the
-Transport queue pair buffer.  The Transport client may be used for other things
-besides Netdev, however no other applications have yet been written.
-
-NTB Ping Pong Test Client (ntb\_pingpong)
------------------------------------------
-
-The Ping Pong test client serves as a demonstration to exercise the doorbell
-and scratchpad registers of NTB hardware, and as an example simple NTB client.
-Ping Pong enables the link when started, waits for the NTB link to come up, and
-then proceeds to read and write the doorbell and scratchpad registers of the NTB.
-The peers interrupt each other using a bit mask of doorbell bits, which is
-shifted by one in each round, to test the behavior of multiple doorbell bits
-and interrupt vectors.  The Ping Pong driver also reads the first local
-scratchpad, and writes the value plus one to the first peer scratchpad, each
-round before writing the peer doorbell register.
-
-Module Parameters:
-
-* unsafe - Some hardware has known issues with scratchpad and doorbell
-	registers.  By default, Ping Pong will not attempt to exercise such
-	hardware.  You may override this behavior at your own risk by setting
-	unsafe=1.
-* delay\_ms - Specify the delay between receiving a doorbell
-	interrupt event and setting the peer doorbell register for the next
-	round.
-* init\_db - Specify the doorbell bits to start new series of rounds.  A new
-	series begins once all the doorbell bits have been shifted out of
-	range.
-* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
-	then to observe debugging output on the console.
-
-NTB Tool Test Client (ntb\_tool)
---------------------------------
-
-The Tool test client serves for debugging, primarily, ntb hardware and drivers.
-The Tool provides access through debugfs for reading, setting, and clearing the
-NTB doorbell, and reading and writing scratchpads.
-
-The Tool does not currently have any module parameters.
-
-Debugfs Files:
-
-* *debugfs*/ntb\_tool/*hw*/
-	A directory in debugfs will be created for each
-	NTB device probed by the tool.  This directory is shortened to *hw*
-	below.
-* *hw*/db
-	This file is used to read, set, and clear the local doorbell.  Not
-	all operations may be supported by all hardware.  To read the doorbell,
-	read the file.  To set the doorbell, write `s` followed by the bits to
-	set (eg: `echo 's 0x0101' > db`).  To clear the doorbell, write `c`
-	followed by the bits to clear.
-* *hw*/mask
-	This file is used to read, set, and clear the local doorbell mask.
-	See *db* for details.
-* *hw*/peer\_db
-	This file is used to read, set, and clear the peer doorbell.
-	See *db* for details.
-* *hw*/peer\_mask
-	This file is used to read, set, and clear the peer doorbell
-	mask.  See *db* for details.
-* *hw*/spad
-	This file is used to read and write local scratchpads.  To read
-	the values of all scratchpads, read the file.  To write values, write a
-	series of pairs of scratchpad number and value
-	(eg: `echo '4 0x123 7 0xabc' > spad`
-	# to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
-* *hw*/peer\_spad
-	This file is used to read and write peer scratchpads.  See
-	*spad* for details.
-
-NTB Hardware Drivers
-====================
-
-NTB hardware drivers should register devices with the NTB core driver.  After
-registering, clients' probe and remove functions will be called.
-
-NTB Intel Hardware Driver (ntb\_hw\_intel)
-------------------------------------------
-
-The Intel hardware driver supports NTB on Xeon and Atom CPUs.
-
-Module Parameters:
-
-* b2b\_mw\_idx
-	If the peer ntb is to be accessed via a memory window, then use
-	this memory window to access the peer ntb.  A value of zero or positive
-	starts from the first mw idx, and a negative value starts from the last
-	mw idx.  Both sides MUST set the same value here!  The default value is
-	`-1`.
-* b2b\_mw\_share
-	If the peer ntb is to be accessed via a memory window, and if
-	the memory window is large enough, still allow the client to use the
-	second half of the memory window for address translation to the peer.
-* xeon\_b2b\_usd\_bar2\_addr64
-	If using B2B topology on Xeon hardware, use
-	this 64 bit address on the bus between the NTB devices for the window
-	at BAR2, on the upstream side of the link.
-* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_bar2\_addr64*.
-* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_bar2\_addr64*.
-* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_bar2\_addr64*.
-* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_bar2\_addr64*.
-* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_bar2\_addr64*.
-* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_bar2\_addr64*.
-* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_bar2\_addr64*.
diff --git a/Documentation/nvmem/nvmem.rst b/Documentation/nvmem/nvmem.rst
deleted file mode 100644
index 3866b6e066d5..000000000000
--- a/Documentation/nvmem/nvmem.rst
+++ /dev/null
@@ -1,189 +0,0 @@
-:orphan:
-
-===============
-NVMEM Subsystem
-===============
-
- Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
-
-This document explains the NVMEM Framework along with the APIs provided,
-and how to use it.
-
-1. Introduction
-===============
-*NVMEM* is the abbreviation for Non Volatile Memory layer. It is used to
-retrieve configuration of SOC or Device specific data from non volatile
-memories like eeprom, efuses and so on.
-
-Before this framework existed, NVMEM drivers like eeprom were stored in
-drivers/misc, where they all had to duplicate pretty much the same code to
-register a sysfs file, allow in-kernel users to access the content of the
-devices they were driving, etc.
-
-This was also a problem as far as other in-kernel users were involved, since
-the solutions used were pretty much different from one driver to another, there
-was a rather big abstraction leak.
-
-This framework aims at solving these problems. It also introduces DT
-representation for consumer devices to go get the data they require (MAC
-Addresses, SoC/Revision ID, part numbers, and so on) from the NVMEMs. This
-framework is based on regmap, so that most of the abstraction available in
-regmap can be reused, across multiple types of buses.
-
-NVMEM Providers
-+++++++++++++++
-
-NVMEM provider refers to an entity that implements methods to initialize, read
-and write the non-volatile memory.
-
-2. Registering/Unregistering the NVMEM provider
-===============================================
-
-A NVMEM provider can register with NVMEM core by supplying relevant
-nvmem configuration to nvmem_register(), on success core would return a valid
-nvmem_device pointer.
-
-nvmem_unregister(nvmem) is used to unregister a previously registered provider.
-
-For example, a simple qfprom case::
-
-  static struct nvmem_config econfig = {
-	.name = "qfprom",
-	.owner = THIS_MODULE,
-  };
-
-  static int qfprom_probe(struct platform_device *pdev)
-  {
-	...
-	econfig.dev = &pdev->dev;
-	nvmem = nvmem_register(&econfig);
-	...
-  }
-
-It is mandatory that the NVMEM provider has a regmap associated with its
-struct device. Failure to do so would return error code from nvmem_register().
-
-Users of board files can define and register nvmem cells using the
-nvmem_cell_table struct::
-
-  static struct nvmem_cell_info foo_nvmem_cells[] = {
-	{
-		.name		= "macaddr",
-		.offset		= 0x7f00,
-		.bytes		= ETH_ALEN,
-	}
-  };
-
-  static struct nvmem_cell_table foo_nvmem_cell_table = {
-	.nvmem_name		= "i2c-eeprom",
-	.cells			= foo_nvmem_cells,
-	.ncells			= ARRAY_SIZE(foo_nvmem_cells),
-  };
-
-  nvmem_add_cell_table(&foo_nvmem_cell_table);
-
-Additionally it is possible to create nvmem cell lookup entries and register
-them with the nvmem framework from machine code as shown in the example below::
-
-  static struct nvmem_cell_lookup foo_nvmem_lookup = {
-	.nvmem_name		= "i2c-eeprom",
-	.cell_name		= "macaddr",
-	.dev_id			= "foo_mac.0",
-	.con_id			= "mac-address",
-  };
-
-  nvmem_add_cell_lookups(&foo_nvmem_lookup, 1);
-
-NVMEM Consumers
-+++++++++++++++
-
-NVMEM consumers are the entities which make use of the NVMEM provider to
-read from and write to NVMEM.
-
-3. NVMEM cell based consumer APIs
-=================================
-
-NVMEM cells are the data entries/fields in the NVMEM.
-The NVMEM framework provides 3 APIs to read/write NVMEM cells::
-
-  struct nvmem_cell *nvmem_cell_get(struct device *dev, const char *name);
-  struct nvmem_cell *devm_nvmem_cell_get(struct device *dev, const char *name);
-
-  void nvmem_cell_put(struct nvmem_cell *cell);
-  void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
-
-  void *nvmem_cell_read(struct nvmem_cell *cell, ssize_t *len);
-  int nvmem_cell_write(struct nvmem_cell *cell, void *buf, ssize_t len);
-
-`*nvmem_cell_get()` apis will get a reference to nvmem cell for a given id,
-and nvmem_cell_read/write() can then read or write to the cell.
-Once the usage of the cell is finished the consumer should call
-`*nvmem_cell_put()` to free all the memory allocated for the cell.
-
-4. Direct NVMEM device based consumer APIs
-==========================================
-
-In some instances it is necessary to directly read/write the NVMEM.
-To facilitate such consumers NVMEM framework provides below apis::
-
-  struct nvmem_device *nvmem_device_get(struct device *dev, const char *name);
-  struct nvmem_device *devm_nvmem_device_get(struct device *dev,
-					   const char *name);
-  void nvmem_device_put(struct nvmem_device *nvmem);
-  int nvmem_device_read(struct nvmem_device *nvmem, unsigned int offset,
-		      size_t bytes, void *buf);
-  int nvmem_device_write(struct nvmem_device *nvmem, unsigned int offset,
-		       size_t bytes, void *buf);
-  int nvmem_device_cell_read(struct nvmem_device *nvmem,
-			   struct nvmem_cell_info *info, void *buf);
-  int nvmem_device_cell_write(struct nvmem_device *nvmem,
-			    struct nvmem_cell_info *info, void *buf);
-
-Before a consumer can read/write the NVMEM directly, it should get hold
-of an nvmem_device from one of the `*nvmem_device_get()` APIs.
-
-The difference between these apis and cell based apis is that these apis always
-take nvmem_device as parameter.
-
-5. Releasing a reference to the NVMEM
-=====================================
-
-When a consumer no longer needs the NVMEM, it has to release the reference
-to the NVMEM it has obtained using the APIs mentioned in the above section.
-The NVMEM framework provides the following APIs to release a reference to the NVMEM::
-
-  void nvmem_cell_put(struct nvmem_cell *cell);
-  void devm_nvmem_cell_put(struct device *dev, struct nvmem_cell *cell);
-  void nvmem_device_put(struct nvmem_device *nvmem);
-  void devm_nvmem_device_put(struct device *dev, struct nvmem_device *nvmem);
-
-These APIs release a reference to the NVMEM; devm_nvmem_cell_put and
-devm_nvmem_device_put additionally destroy the devres associated
-with this NVMEM.
-
-Userspace
-+++++++++
-
-6. Userspace binary interface
-==============================
-
-Userspace can read/write the raw NVMEM file located at::
-
-	/sys/bus/nvmem/devices/*/nvmem
-
-For example::
-
-  hexdump /sys/bus/nvmem/devices/qfprom0/nvmem
-
-  0000000 0000 0000 0000 0000 0000 0000 0000 0000
-  *
-  00000a0 db10 2240 0000 e000 0c00 0c00 0000 0c00
-  0000000 0000 0000 0000 0000 0000 0000 0000 0000
-  ...
-  *
-  0001000
-
-7. DeviceTree Binding
-=====================
-
-See Documentation/devicetree/bindings/nvmem/nvmem.txt
diff --git a/Documentation/parport-lowlevel.txt b/Documentation/parport-lowlevel.txt
deleted file mode 100644
index 0633d70ffda7..000000000000
--- a/Documentation/parport-lowlevel.txt
+++ /dev/null
@@ -1,1832 +0,0 @@
-===============================
-PARPORT interface documentation
-===============================
-
-:Time-stamp: <2000-02-24 13:30:20 twaugh>
-
-Described here are the following functions:
-
-Global functions::
-  parport_register_driver
-  parport_unregister_driver
-  parport_enumerate
-  parport_register_device
-  parport_unregister_device
-  parport_claim
-  parport_claim_or_block
-  parport_release
-  parport_yield
-  parport_yield_blocking
-  parport_wait_peripheral
-  parport_poll_peripheral
-  parport_wait_event
-  parport_negotiate
-  parport_read
-  parport_write
-  parport_open
-  parport_close
-  parport_device_id
-  parport_device_coords
-  parport_find_class
-  parport_find_device
-  parport_set_timeout
-
-Port functions (can be overridden by low-level drivers):
-
-  SPP::
-    port->ops->read_data
-    port->ops->write_data
-    port->ops->read_status
-    port->ops->read_control
-    port->ops->write_control
-    port->ops->frob_control
-    port->ops->enable_irq
-    port->ops->disable_irq
-    port->ops->data_forward
-    port->ops->data_reverse
-
-  EPP::
-    port->ops->epp_write_data
-    port->ops->epp_read_data
-    port->ops->epp_write_addr
-    port->ops->epp_read_addr
-
-  ECP::
-    port->ops->ecp_write_data
-    port->ops->ecp_read_data
-    port->ops->ecp_write_addr
-
-  Other::
-    port->ops->nibble_read_data
-    port->ops->byte_read_data
-    port->ops->compat_write_data
-
-The parport subsystem comprises ``parport`` (the core port-sharing
-code), and a variety of low-level drivers that actually do the port
-accesses.  Each low-level driver handles a particular style of port
-(PC, Amiga, and so on).
-
-The parport interface to the device driver author can be broken down
-into global functions and port functions.
-
-The global functions are mostly for communicating between the device
-driver and the parport subsystem: acquiring a list of available ports,
-claiming a port for exclusive use, and so on.  They also include
-``generic`` functions for doing standard things that will work on any
-IEEE 1284-capable architecture.
-
-The port functions are provided by the low-level drivers, although the
-core parport module provides generic ``defaults`` for some routines.
-The port functions can be split into three groups: SPP, EPP, and ECP.
-
-SPP (Standard Parallel Port) functions modify so-called ``SPP``
-registers: data, status, and control.  The hardware may not actually
-have registers exactly like that, but the PC does and this interface is
-modelled after common PC implementations.  Other low-level drivers may
-be able to emulate most of the functionality.
-
-EPP (Enhanced Parallel Port) functions are provided for reading and
-writing in IEEE 1284 EPP mode, and ECP (Extended Capabilities Port)
-functions are used for IEEE 1284 ECP mode. (What about BECP? Does
-anyone care?)
-
-Hardware assistance for EPP and/or ECP transfers may or may not be
-available, and if it is available it may or may not be used.  If
-hardware is not used, the transfer will be software-driven.  In order
-to cope with peripherals that only tenuously support IEEE 1284, a
-low-level driver specific function is provided, for altering 'fudge
-factors'.
-
-Global functions
-================
-
-parport_register_driver - register a device driver with parport
----------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_driver {
-		const char *name;
-		void (*attach) (struct parport *);
-		void (*detach) (struct parport *);
-		struct parport_driver *next;
-	};
-	int parport_register_driver (struct parport_driver *driver);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-In order to be notified about parallel ports when they are detected,
-parport_register_driver should be called.  Your driver will
-immediately be notified of all ports that have already been detected,
-and of each new port as low-level drivers are loaded.
-
-A ``struct parport_driver`` contains the textual name of your driver,
-a pointer to a function to handle new ports, and a pointer to a
-function to handle ports going away due to a low-level driver
-unloading.  Ports will only be detached if they are not being used
-(i.e. there are no devices registered on them).
-
-The visible parts of the ``struct parport *`` argument given to
-attach/detach are::
-
-	struct parport
-	{
-		struct parport *next; /* next parport in list */
-		const char *name;     /* port's name */
-		unsigned int modes;   /* bitfield of hardware modes */
-		struct parport_device_info probe_info;
-				/* IEEE1284 info */
-		int number;           /* parport index */
-		struct parport_operations *ops;
-		...
-	};
-
-There are other members of the structure, but they should not be
-touched.
-
-The ``modes`` member summarises the capabilities of the underlying
-hardware.  It consists of flags which may be bitwise-ored together:
-
-  ============================= ===============================================
-  PARPORT_MODE_PCSPP		IBM PC registers are available,
-				i.e. functions that act on data,
-				control and status registers are
-				probably writing directly to the
-				hardware.
-  PARPORT_MODE_TRISTATE		The data drivers may be turned off.
-				This allows the data lines to be used
-				for reverse (peripheral to host)
-				transfers.
-  PARPORT_MODE_COMPAT		The hardware can assist with
-				compatibility-mode (printer)
-				transfers, i.e. compat_write_block.
-  PARPORT_MODE_EPP		The hardware can assist with EPP
-				transfers.
-  PARPORT_MODE_ECP		The hardware can assist with ECP
-				transfers.
-  PARPORT_MODE_DMA		The hardware can use DMA, so you might
-				want to pass ISA DMA-able memory
-				(i.e. memory allocated using the
-				GFP_DMA flag with kmalloc) to the
-				low-level driver in order to take
-				advantage of it.
-  ============================= ===============================================
-
-There may be other flags in ``modes`` as well.
-
-The contents of ``modes`` are advisory only.  For example, if the
-hardware is capable of DMA, and PARPORT_MODE_DMA is in ``modes``, it
-doesn't necessarily mean that DMA will always be used when possible.
-Similarly, hardware that is capable of assisting ECP transfers won't
-necessarily be used.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-Zero on success, otherwise an error code.
-
-ERRORS
-^^^^^^
-
-None. (Can it fail? Why return int?)
-
-EXAMPLE
-^^^^^^^
-
-::
-
-	static void lp_attach (struct parport *port)
-	{
-		...
-		private = kmalloc (...);
-		dev[count++] = parport_register_device (...);
-		...
-	}
-
-	static void lp_detach (struct parport *port)
-	{
-		...
-	}
-
-	static struct parport_driver lp_driver = {
-		"lp",
-		lp_attach,
-		lp_detach,
-		NULL /* always put NULL here */
-	};
-
-	int lp_init (void)
-	{
-		...
-		if (parport_register_driver (&lp_driver)) {
-			/* Failed; nothing we can do. */
-			return -EIO;
-		}
-		...
-	}
-
-
-SEE ALSO
-^^^^^^^^
-
-parport_unregister_driver, parport_register_device, parport_enumerate
-
-
-
-parport_unregister_driver - tell parport to forget about this driver
---------------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_driver {
-		const char *name;
-		void (*attach) (struct parport *);
-		void (*detach) (struct parport *);
-		struct parport_driver *next;
-	};
-	void parport_unregister_driver (struct parport_driver *driver);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-This tells parport not to notify the device driver of new ports or of
-ports going away.  Registered devices belonging to that driver are NOT
-unregistered: parport_unregister_device must be used for each one.
-
-EXAMPLE
-^^^^^^^
-
-::
-
-	void cleanup_module (void)
-	{
-		...
-		/* Stop notifications. */
-		parport_unregister_driver (&lp_driver);
-
-		/* Unregister devices. */
-		for (i = 0; i < NUM_DEVS; i++)
-			parport_unregister_device (dev[i]);
-		...
-	}
-
-SEE ALSO
-^^^^^^^^
-
-parport_register_driver, parport_enumerate
-
-
-
-parport_enumerate - retrieve a list of parallel ports (DEPRECATED)
-------------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport *parport_enumerate (void);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Retrieve the first of a list of valid parallel ports for this machine.
-Successive parallel ports can be found using the ``struct parport
-*next`` element of the ``struct parport *`` that is returned.  If ``next``
-is NULL, there are no more parallel ports in the list.  The number of
-ports in the list will not exceed PARPORT_MAX.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-A ``struct parport *`` describing a valid parallel port for the machine,
-or NULL if there are none.
-
-ERRORS
-^^^^^^
-
-This function can return NULL to indicate that there are no parallel
-ports to use.
-
-EXAMPLE
-^^^^^^^
-
-::
-
-	int detect_device (void)
-	{
-		struct parport *port;
-
-		for (port = parport_enumerate ();
-		     port != NULL;
-		     port = port->next) {
-			/* Try to detect a device on the port... */
-			...
-		}
-
-		...
-	}
-
-NOTES
-^^^^^
-
-parport_enumerate is deprecated; parport_register_driver should be
-used instead.
-
-SEE ALSO
-^^^^^^^^
-
-parport_register_driver, parport_unregister_driver
-
-
-
-parport_register_device - register to use a port
-------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	typedef int (*preempt_func) (void *handle);
-	typedef void (*wakeup_func) (void *handle);
-	typedef int (*irq_func) (int irq, void *handle, struct pt_regs *);
-
-	struct pardevice *parport_register_device(struct parport *port,
-						  const char *name,
-						  preempt_func preempt,
-						  wakeup_func wakeup,
-						  irq_func irq,
-						  int flags,
-						  void *handle);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Use this function to register your device driver on a parallel port
-(``port``).  Once you have done that, you will be able to use
-parport_claim and parport_release in order to use the port.
-
-The ``name`` argument is the name of the device as it appears in the
-/proc filesystem.  The string must remain valid for the whole lifetime
-of the device (until parport_unregister_device is called).
-
-This function will register three callbacks into your driver:
-``preempt``, ``wakeup`` and ``irq``.  Each of these may be NULL in order to
-indicate that you do not want a callback.
-
-When the ``preempt`` function is called, it is because another driver
-wishes to use the parallel port.  The ``preempt`` function should return
-non-zero if the parallel port cannot be released yet -- if zero is
-returned, the port is lost to another driver and the port must be
-re-claimed before use.
-
-The ``wakeup`` function is called once another driver has released the
-port and no other driver has yet claimed it.  You can claim the
-parallel port from within the ``wakeup`` function (in which case the
-claim is guaranteed to succeed), or choose not to if you don't need it
-now.
-
-If an interrupt occurs on the parallel port your driver has claimed,
-the ``irq`` function will be called. (Write something about shared
-interrupts here.)
-
-The ``handle`` is a pointer to driver-specific data, and is passed to
-the callback functions.
-
-``flags`` may be a bitwise combination of the following flags:
-
-  ===================== =================================================
-        Flag            Meaning
-  ===================== =================================================
-  PARPORT_DEV_EXCL	The device cannot share the parallel port at all.
-			Use this only when absolutely necessary.
-  ===================== =================================================
-
-The typedefs are not actually defined -- they are only shown in order
-to make the function prototype more readable.
-
-The visible parts of the returned ``struct pardevice`` are::
-
-	struct pardevice {
-		struct parport *port;	/* Associated port */
-		void *private;		/* Device driver's 'handle' */
-		...
-	};
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-A ``struct pardevice *``: a handle to the registered parallel port
-device that can be used for parport_claim, parport_release, etc.
-
-ERRORS
-^^^^^^
-
-A return value of NULL indicates that there was a problem registering
-a device on that port.
-
-EXAMPLE
-^^^^^^^
-
-::
-
-	static int preempt (void *handle)
-	{
-		if (busy_right_now)
-			return 1;
-
-		must_reclaim_port = 1;
-		return 0;
-	}
-
-	static void wakeup (void *handle)
-	{
-		struct toaster *private = handle;
-		struct pardevice *dev = private->dev;
-		if (!dev) return; /* avoid races */
-
-		if (want_port)
-			parport_claim (dev);
-	}
-
-	static int toaster_detect (struct toaster *private, struct parport *port)
-	{
-		private->dev = parport_register_device (port, "toaster", preempt,
-							wakeup, NULL, 0,
-							private);
-		if (!private->dev)
-			/* Couldn't register with parport. */
-			return -EIO;
-
-		must_reclaim_port = 0;
-		busy_right_now = 1;
-		parport_claim_or_block (private->dev);
-		...
-		/* Don't need the port while the toaster warms up. */
-		busy_right_now = 0;
-		...
-		busy_right_now = 1;
-		if (must_reclaim_port) {
-			parport_claim_or_block (private->dev);
-			must_reclaim_port = 0;
-		}
-		...
-	}
-
-SEE ALSO
-^^^^^^^^
-
-parport_unregister_device, parport_claim
-
-
-
-parport_unregister_device - finish using a port
------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	void parport_unregister_device (struct pardevice *dev);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-This function is the opposite of parport_register_device.  After using
-parport_unregister_device, ``dev`` is no longer a valid device handle.
-
-You should not unregister a device that is currently claimed, although
-if you do it will be released automatically.
-
-EXAMPLE
-^^^^^^^
-
-::
-
-	...
-	kfree (dev->private); /* before we lose the pointer */
-	parport_unregister_device (dev);
-	...
-
-SEE ALSO
-^^^^^^^^
-
-parport_unregister_driver
-
-
-
-parport_claim, parport_claim_or_block - claim the parallel port for a device
-----------------------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_claim (struct pardevice *dev);
-	int parport_claim_or_block (struct pardevice *dev);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-These functions attempt to gain control of the parallel port on which
-``dev`` is registered.  ``parport_claim`` does not block, but
-``parport_claim_or_block`` may do. (Put something here about blocking
-interruptibly or non-interruptibly.)
-
-You should not try to claim a port that you have already claimed.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-A return value of zero indicates that the port was successfully
-claimed, and the caller now has possession of the parallel port.
-
-If ``parport_claim_or_block`` blocks before returning successfully, the
-return value is positive.
-
-ERRORS
-^^^^^^
-
-========== ==========================================================
-  -EAGAIN  The port is unavailable at the moment, but another attempt
-           to claim it may succeed.
-========== ==========================================================
-
-SEE ALSO
-^^^^^^^^
-
-parport_release
-
-
-
-parport_release - release the parallel port
--------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	void parport_release (struct pardevice *dev);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Once a parallel port device has been claimed, it can be released using
-``parport_release``.  It cannot fail, but you should not release a
-device that you do not have possession of.
-
-EXAMPLE
-^^^^^^^
-
-::
-
-	static size_t write (struct pardevice *dev, const void *buf,
-			size_t len)
-	{
-		...
-		written = dev->port->ops->ecp_write_data (dev->port, buf,
-							len);
-		parport_release (dev);
-		...
-	}
-
-
-SEE ALSO
-^^^^^^^^
-
-change_mode, parport_claim, parport_claim_or_block, parport_yield
-
-
-
-parport_yield, parport_yield_blocking - temporarily release a parallel port
----------------------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_yield (struct pardevice *dev);
-	int parport_yield_blocking (struct pardevice *dev);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-When a driver has control of a parallel port, it may allow another
-driver to temporarily ``borrow`` it.  ``parport_yield`` does not block;
-``parport_yield_blocking`` may do.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-A return value of zero indicates that the caller still owns the port
-and the call did not block.
-
-A positive return value from ``parport_yield_blocking`` indicates that
-the caller still owns the port and the call blocked.
-
-A return value of -EAGAIN indicates that the caller no longer owns the
-port, and it must be re-claimed before use.
-
-ERRORS
-^^^^^^
-
-========= ==========================================================
-  -EAGAIN  Ownership of the parallel port was given away.
-========= ==========================================================
-
-SEE ALSO
-^^^^^^^^
-
-parport_release
-
-
-
-parport_wait_peripheral - wait for status lines, up to 35ms
------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_wait_peripheral (struct parport *port,
-				     unsigned char mask,
-				     unsigned char val);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Wait for the status lines in mask to match the values in val.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-======== ==========================================================
- -EINTR  a signal is pending
-      0  the status lines in mask have values in val
-      1  timed out while waiting (35ms elapsed)
-======== ==========================================================
-
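-The matching rule used by parport_wait_peripheral and
-parport_poll_peripheral, ``(status & mask) == val``, can be tried out
-in ordinary userspace C.  The sketch below is an illustration only,
-not parport code: ``poll_status``, ``struct fake_port`` and
-``fake_read_status`` are hypothetical names, the timeout is modelled
-as a maximum number of reads rather than real time, and the -EINTR
-case is not modelled at all.
-
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for reading the status register. */
typedef unsigned char (*status_fn)(void *ctx);

/*
 * Returns 0 once (status & mask) == val, or 1 if max_polls reads
 * pass without a match.  Pending signals (-EINTR) are not modelled.
 */
static int poll_status(status_fn read_status, void *ctx,
		       unsigned char mask, unsigned char val,
		       unsigned int max_polls)
{
	unsigned int i;

	assert(read_status != NULL);
	for (i = 0; i < max_polls; i++) {
		if ((read_status(ctx) & mask) == val)
			return 0;
	}
	return 1;
}

/*
 * Fake peripheral for illustration: replays a fixed sequence of
 * status bytes and then keeps repeating the last one.
 */
struct fake_port {
	const unsigned char *seq;
	size_t len;
	size_t pos;
};

static unsigned char fake_read_status(void *ctx)
{
	struct fake_port *fp = ctx;
	unsigned char s = fp->seq[fp->pos];

	if (fp->pos + 1 < fp->len)
		fp->pos++;
	return s;
}
```
-
-Only the bits selected by ``mask`` take part in the comparison, so
-status lines that the caller does not care about cannot cause a
-spurious mismatch.
-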
-SEE ALSO
-^^^^^^^^
-
-parport_poll_peripheral
-
-
-
-parport_poll_peripheral - wait for status lines, in usec
---------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_poll_peripheral (struct parport *port,
-				     unsigned char mask,
-				     unsigned char val,
-				     int usec);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Wait for the status lines in mask to match the values in val.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-======== ==========================================================
- -EINTR  a signal is pending
-      0  the status lines in mask have values in val
-      1  timed out while waiting (usec microseconds have elapsed)
-======== ==========================================================
-
-SEE ALSO
-^^^^^^^^
-
-parport_wait_peripheral
-
-
-
-parport_wait_event - wait for an event on a port
-------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_wait_event (struct parport *port, signed long timeout);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Wait for an event (e.g. interrupt) on a port.  The timeout is in
-jiffies.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-======= ==========================================================
-      0  success
-     <0  error (exit as soon as possible)
-     >0  timed out
-======= ==========================================================
-
-parport_negotiate - perform IEEE 1284 negotiation
--------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_negotiate (struct parport *, int mode);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Perform IEEE 1284 negotiation.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-======= ==========================================================
-     0  handshake OK; IEEE 1284 peripheral and mode available
-    -1  handshake failed; peripheral not compliant (or none present)
-     1  handshake OK; IEEE 1284 peripheral present but mode not
-        available
-======= ==========================================================
-
-SEE ALSO
-^^^^^^^^
-
-parport_read, parport_write
-
-
-
-parport_read - read data from device
-------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	ssize_t parport_read (struct parport *, void *buf, size_t len);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Read data from device in current IEEE 1284 transfer mode.  This only
-works for modes that support reverse data transfer.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-If negative, an error code; otherwise the number of bytes transferred.
-
-SEE ALSO
-^^^^^^^^
-
-parport_write, parport_negotiate
-
-
-
-parport_write - write data to device
-------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	ssize_t parport_write (struct parport *, const void *buf, size_t len);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Write data to device in current IEEE 1284 transfer mode.  This only
-works for modes that support forward data transfer.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-If negative, an error code; otherwise the number of bytes transferred.
-
-SEE ALSO
-^^^^^^^^
-
-parport_read, parport_negotiate
-
-
-
-parport_open - register device for particular device number
------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct pardevice *parport_open (int devnum, const char *name,
-				        int (*pf) (void *),
-					void (*kf) (void *),
-					void (*irqf) (int, void *,
-						      struct pt_regs *),
-					int flags, void *handle);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-This is like parport_register_device but takes a device number instead
-of a pointer to a struct parport.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-See parport_register_device.  If no device is associated with devnum,
-NULL is returned.
-
-SEE ALSO
-^^^^^^^^
-
-parport_register_device
-
-
-
-parport_close - unregister device for particular device number
---------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	void parport_close (struct pardevice *dev);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-This is the equivalent of parport_unregister_device for parport_open.
-
-SEE ALSO
-^^^^^^^^
-
-parport_unregister_device, parport_open
-
-
-
-parport_device_id - obtain IEEE 1284 Device ID
-----------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	ssize_t parport_device_id (int devnum, char *buffer, size_t len);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Obtains the IEEE 1284 Device ID associated with a given device.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-If negative, an error code; otherwise, the number of bytes of buffer
-that contain the device ID.  The format of the device ID is as
-follows::
-
-	[length][ID]
-
-The first two bytes indicate the inclusive length of the entire Device
-ID, and are in big-endian order.  The ID is a sequence of pairs of the
-form::
-
-	key:value;
-
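-The layout described above can be walked with a few lines of ordinary
-C.  The sketch below is a userspace illustration only, not parport
-code: ``devid_total_len`` and ``devid_get`` are hypothetical helpers,
-and it assumes the buffer begins with the two length bytes and that
-keys and values contain no ``:`` or ``;`` characters of their own.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers for illustration; not part of the parport API. */

/*
 * Decode the two-byte, big-endian, inclusive length prefix.
 * Returns 0 if the buffer is too short to hold the prefix.
 */
static size_t devid_total_len(const unsigned char *buf, size_t buflen)
{
	assert(buf != NULL);
	if (buflen < 2)
		return 0;
	return ((size_t)buf[0] << 8) | buf[1];
}

/*
 * Copy the value stored under `key` into `out`, NUL-terminated.
 * Returns the value length, or -1 if the key is absent, the ID is
 * ill-formed or truncated, or `out` is too small.
 */
static int devid_get(const unsigned char *buf, size_t buflen,
		     const char *key, char *out, size_t outlen)
{
	size_t total = devid_total_len(buf, buflen);
	size_t keylen = strlen(key);
	const char *p, *end;

	if (total < 2 || total > buflen)
		return -1;	/* ill-formed or truncated ID */

	p = (const char *)buf + 2;	/* skip the length prefix */
	end = (const char *)buf + total;

	while (p < end) {
		const char *semi = memchr(p, ';', (size_t)(end - p));
		const char *stop = semi ? semi : end;
		const char *colon = memchr(p, ':', (size_t)(stop - p));

		if (colon && (size_t)(colon - p) == keylen &&
		    memcmp(p, key, keylen) == 0) {
			size_t vlen = (size_t)(stop - colon - 1);

			if (vlen + 1 > outlen)
				return -1;
			memcpy(out, colon + 1, vlen);
			out[vlen] = '\0';
			return (int)vlen;
		}
		if (!semi)
			break;	/* last pair had no terminating ';' */
		p = semi + 1;
	}
	return -1;
}
```
-
-Because the length is inclusive, a 22-byte ID made of the two length
-bytes 0x00 0x16 followed by ``MFG:IOMEGA;MDL:ZIP+;`` (sample keys and
-values chosen for illustration) yields ``ZIP+`` for the key ``MDL``.
-The bounds check against the caller's buffer length matters in
-practice because device IDs are often ill-formed.
-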
-NOTES
-^^^^^
-
-Many devices have ill-formed IEEE 1284 Device IDs.
-
-SEE ALSO
-^^^^^^^^
-
-parport_find_class, parport_find_device
-
-
-
-parport_device_coords - convert device number to device coordinates
--------------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_device_coords (int devnum, int *parport, int *mux,
-				   int *daisy);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Convert between device number (zero-based) and device coordinates
-(port, multiplexor, daisy chain address).
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-Zero on success, in which case the coordinates are (``*parport``, ``*mux``,
-``*daisy``).
-
-SEE ALSO
-^^^^^^^^
-
-parport_open, parport_device_id
-
-
-
-parport_find_class - find a device by its class
------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	typedef enum {
-		PARPORT_CLASS_LEGACY = 0,       /* Non-IEEE1284 device */
-		PARPORT_CLASS_PRINTER,
-		PARPORT_CLASS_MODEM,
-		PARPORT_CLASS_NET,
-		PARPORT_CLASS_HDC,              /* Hard disk controller */
-		PARPORT_CLASS_PCMCIA,
-		PARPORT_CLASS_MEDIA,            /* Multimedia device */
-		PARPORT_CLASS_FDC,              /* Floppy disk controller */
-		PARPORT_CLASS_PORTS,
-		PARPORT_CLASS_SCANNER,
-		PARPORT_CLASS_DIGCAM,
-		PARPORT_CLASS_OTHER,            /* Anything else */
-		PARPORT_CLASS_UNSPEC,           /* No CLS field in ID */
-		PARPORT_CLASS_SCSIADAPTER
-	} parport_device_class;
-
-	int parport_find_class (parport_device_class cls, int from);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Find a device by class.  The search starts from device number from+1.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The device number of the next device in that class, or -1 if no such
-device exists.
-
-NOTES
-^^^^^
-
-Example usage::
-
-	int devnum = -1;
-	while ((devnum = parport_find_class (PARPORT_CLASS_DIGCAM, devnum)) != -1) {
-		struct pardevice *dev = parport_open (devnum, ...);
-		...
-	}
-
-SEE ALSO
-^^^^^^^^
-
-parport_find_device, parport_open, parport_device_id
-
-
-
-parport_find_device - find a device by vendor and model
--------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	int parport_find_device (const char *mfg, const char *mdl, int from);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Find a device by vendor and model.  The search starts from device
-number from+1.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The device number of the next device matching the specifications, or
--1 if no such device exists.
-
-NOTES
-^^^^^
-
-Example usage::
-
-	int devnum = -1;
-	while ((devnum = parport_find_device ("IOMEGA", "ZIP+", devnum)) != -1) {
-		struct pardevice *dev = parport_open (devnum, ...);
-		...
-	}
-
-SEE ALSO
-^^^^^^^^
-
-parport_find_class, parport_open, parport_device_id
-
-
-
-parport_set_timeout - set the inactivity timeout
-------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	long parport_set_timeout (struct pardevice *dev, long inactivity);
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Set the inactivity timeout, in jiffies, for a registered device.  The
-previous timeout is returned.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The previous timeout, in jiffies.
-
-NOTES
-^^^^^
-
-Some of the port->ops functions for a parport may take time, owing to
-delays at the peripheral.  After the peripheral has not responded for
-``inactivity`` jiffies, a timeout will occur and the blocking function
-will return.
-
-A timeout of 0 jiffies is a special case: the function must do as much
-as it can without blocking or leaving the hardware in an unknown
-state.  If port operations are performed from within an interrupt
-handler, for instance, a timeout of 0 jiffies should be used.
-
-Once set for a registered device, the timeout will remain at the set
-value until set again.
-
-SEE ALSO
-^^^^^^^^
-
-port->ops->xxx_read/write_yyy
-
-
-
-
-Port functions
-==============
-
-The functions in the port->ops structure (struct parport_operations)
-are provided by the low-level driver responsible for that port.
-
-port->ops->read_data - read the data register
----------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		unsigned char (*read_data) (struct parport *port);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-If port->modes contains the PARPORT_MODE_TRISTATE flag and the
-PARPORT_CONTROL_DIRECTION bit in the control register is set, this
-returns the value on the data pins.  If port->modes contains the
-PARPORT_MODE_TRISTATE flag and the PARPORT_CONTROL_DIRECTION bit is
-not set, the return value _may_ be the last value written to the data
-register.  Otherwise the return value is undefined.
-
-SEE ALSO
-^^^^^^^^
-
-write_data, read_status, write_control
-
-
-
-port->ops->write_data - write the data register
------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		void (*write_data) (struct parport *port, unsigned char d);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Writes to the data register.  May have side-effects (a STROBE pulse,
-for instance).
-
-SEE ALSO
-^^^^^^^^
-
-read_data, read_status, write_control
-
-
-
-port->ops->read_status - read the status register
--------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		unsigned char (*read_status) (struct parport *port);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Reads from the status register.  This is a bitmask:
-
-- PARPORT_STATUS_ERROR (printer fault, "nFault")
-- PARPORT_STATUS_SELECT (on-line, "Select")
-- PARPORT_STATUS_PAPEROUT (no paper, "PError")
-- PARPORT_STATUS_ACK (handshake, "nAck")
-- PARPORT_STATUS_BUSY (busy, "Busy")
-
-There may be other bits set.
-
-SEE ALSO
-^^^^^^^^
-
-read_data, write_data, write_control
-
-
-
-port->ops->read_control - read the control register
----------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		unsigned char (*read_control) (struct parport *port);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Returns the last value written to the control register (either from
-write_control or frob_control).  No port access is performed.
-
-SEE ALSO
-^^^^^^^^
-
-read_data, write_data, read_status, write_control
-
-
-
-port->ops->write_control - write the control register
------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		void (*write_control) (struct parport *port, unsigned char s);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Writes to the control register. This is a bitmask::
-
-				  _______
-	- PARPORT_CONTROL_STROBE (nStrobe)
-				  _______
-	- PARPORT_CONTROL_AUTOFD (nAutoFd)
-				_____
-	- PARPORT_CONTROL_INIT (nInit)
-				  _________
-	- PARPORT_CONTROL_SELECT (nSelectIn)
-
-SEE ALSO
-^^^^^^^^
-
-read_data, write_data, read_status, frob_control
-
-
-
-port->ops->frob_control - write control register bits
------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		unsigned char (*frob_control) (struct parport *port,
-					unsigned char mask,
-					unsigned char val);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-This is equivalent to reading from the control register, masking out
-the bits in mask, exclusive-or'ing with the bits in val, and writing
-the result to the control register.
-
-As some ports don't allow reads from the control port, a software copy
-of its contents is maintained, so frob_control is in fact only one
-port access.
-
-SEE ALSO
-^^^^^^^^
-
-read_data, write_data, read_status, write_control
-
-
-
-port->ops->enable_irq - enable interrupt generation
----------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		void (*enable_irq) (struct parport *port);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-The parallel port hardware is instructed to generate interrupts at
-appropriate moments, although those moments are
-architecture-specific.  For the PC architecture, interrupts are
-commonly generated on the rising edge of nAck.
-
-SEE ALSO
-^^^^^^^^
-
-disable_irq
-
-
-
-port->ops->disable_irq - disable interrupt generation
------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		void (*disable_irq) (struct parport *port);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-The parallel port hardware is instructed not to generate interrupts.
-The interrupt itself is not masked.
-
-SEE ALSO
-^^^^^^^^
-
-enable_irq
-
-
-
-port->ops->data_forward - enable data drivers
----------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		void (*data_forward) (struct parport *port);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Enables the data line drivers, for 8-bit host-to-peripheral
-communications.
-
-SEE ALSO
-^^^^^^^^
-
-data_reverse
-
-
-
-port->ops->data_reverse - tristate the buffer
----------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		void (*data_reverse) (struct parport *port);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Places the data bus in a high impedance state, if port->modes has the
-PARPORT_MODE_TRISTATE bit set.
-
-SEE ALSO
-^^^^^^^^
-
-data_forward
-
-
-
-port->ops->epp_write_data - write EPP data
-------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*epp_write_data) (struct parport *port, const void *buf,
-					size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Writes data in EPP mode, and returns the number of bytes written.
-
-The ``flags`` parameter may be one or more of the following,
-bitwise-or'ed together:
-
-======================= =================================================
-PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
-			32-bit registers.  However, if a transfer
-			times out, the return value may be unreliable.
-======================= =================================================
-
-SEE ALSO
-^^^^^^^^
-
-epp_read_data, epp_write_addr, epp_read_addr
-
-
-
-port->ops->epp_read_data - read EPP data
-----------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*epp_read_data) (struct parport *port, void *buf,
-					size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Reads data in EPP mode, and returns the number of bytes read.
-
-The ``flags`` parameter may be one or more of the following,
-bitwise-or'ed together:
-
-======================= =================================================
-PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
-			32-bit registers.  However, if a transfer
-			times out, the return value may be unreliable.
-======================= =================================================
-
-SEE ALSO
-^^^^^^^^
-
-epp_write_data, epp_write_addr, epp_read_addr
-
-
-
-port->ops->epp_write_addr - write EPP address
----------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*epp_write_addr) (struct parport *port,
-					const void *buf, size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Writes EPP addresses (8 bits each), and returns the number written.
-
-The ``flags`` parameter may be one or more of the following,
-bitwise-or'ed together:
-
-======================= =================================================
-PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
-			32-bit registers.  However, if a transfer
-			times out, the return value may be unreliable.
-======================= =================================================
-
-(Does PARPORT_EPP_FAST make sense for this function?)
-
-SEE ALSO
-^^^^^^^^
-
-epp_write_data, epp_read_data, epp_read_addr
-
-
-
-port->ops->epp_read_addr - read EPP address
--------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*epp_read_addr) (struct parport *port, void *buf,
-					size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Reads EPP addresses (8 bits each), and returns the number read.
-
-The ``flags`` parameter may be one or more of the following,
-bitwise-or'ed together:
-
-======================= =================================================
-PARPORT_EPP_FAST	Use fast transfers. Some chips provide 16-bit and
-			32-bit registers.  However, if a transfer
-			times out, the return value may be unreliable.
-======================= =================================================
-
-(Does PARPORT_EPP_FAST make sense for this function?)
-
-SEE ALSO
-^^^^^^^^
-
-epp_write_data, epp_read_data, epp_write_addr
-
-
-
-port->ops->ecp_write_data - write a block of ECP data
------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*ecp_write_data) (struct parport *port,
-					const void *buf, size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Writes a block of ECP data.  The ``flags`` parameter is ignored.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The number of bytes written.
-
-SEE ALSO
-^^^^^^^^
-
-ecp_read_data, ecp_write_addr
-
-
-
-port->ops->ecp_read_data - read a block of ECP data
----------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*ecp_read_data) (struct parport *port,
-					void *buf, size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Reads a block of ECP data.  The ``flags`` parameter is ignored.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The number of bytes read.  NB. There may be more unread data in a
-FIFO.  Is there a way of stunning the FIFO to prevent this?
-
-SEE ALSO
-^^^^^^^^
-
-ecp_write_data, ecp_write_addr
-
-
-
-port->ops->ecp_write_addr - write a block of ECP addresses
-----------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*ecp_write_addr) (struct parport *port,
-					const void *buf, size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Writes a block of ECP addresses.  The ``flags`` parameter is ignored.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The number of bytes written.
-
-NOTES
-^^^^^
-
-This may use a FIFO, and if so shall not return until the FIFO is empty.
-
-SEE ALSO
-^^^^^^^^
-
-ecp_read_data, ecp_write_data
-
-
-
-port->ops->nibble_read_data - read a block of data in nibble mode
------------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*nibble_read_data) (struct parport *port,
-					void *buf, size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Reads a block of data in nibble mode.  The ``flags`` parameter is ignored.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The number of whole bytes read.
-
-SEE ALSO
-^^^^^^^^
-
-byte_read_data, compat_write_data
-
-
-
-port->ops->byte_read_data - read a block of data in byte mode
--------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*byte_read_data) (struct parport *port,
-					void *buf, size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Reads a block of data in byte mode.  The ``flags`` parameter is ignored.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The number of bytes read.
-
-SEE ALSO
-^^^^^^^^
-
-nibble_read_data, compat_write_data
-
-
-
-port->ops->compat_write_data - write a block of data in compatibility mode
---------------------------------------------------------------------------
-
-SYNOPSIS
-^^^^^^^^
-
-::
-
-	#include <linux/parport.h>
-
-	struct parport_operations {
-		...
-		size_t (*compat_write_data) (struct parport *port,
-					const void *buf, size_t len, int flags);
-		...
-	};
-
-DESCRIPTION
-^^^^^^^^^^^
-
-Writes a block of data in compatibility mode.  The ``flags`` parameter
-is ignored.
-
-RETURN VALUE
-^^^^^^^^^^^^
-
-The number of bytes written.
-
-SEE ALSO
-^^^^^^^^
-
-nibble_read_data, byte_read_data
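Note on frob_control: the description earlier in this file (clear the bits in
mask, exclusive-or with val, write the result back, and rely on a software
copy so that only one port access is needed) can be sketched in plain C. This
is an illustration only, not the parport code: struct fake_port,
fake_write_control() and fake_frob_control() are hypothetical names, the
hardware write is replaced by a counter, and returning the new register value
is an assumption of this sketch.

```c
/* Hypothetical stand-in for a port: only the fields this sketch needs. */
struct fake_port {
	unsigned char ctr_shadow;	/* software copy of the control register */
	unsigned int hw_writes;		/* counts simulated port accesses */
};

/* Simulated register write: update the shadow copy, count one access. */
static void fake_write_control(struct fake_port *p, unsigned char v)
{
	p->ctr_shadow = v;
	p->hw_writes++;
}

/* Clear the bits in mask, XOR in val, write back, return the new value. */
static unsigned char fake_frob_control(struct fake_port *p,
				       unsigned char mask, unsigned char val)
{
	unsigned char v = (p->ctr_shadow & ~mask) ^ val;

	fake_write_control(p, v);
	return v;
}
```

Because the old value comes from the shadow copy rather than from a register
read, each call performs exactly one simulated port access.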
diff --git a/Documentation/pti/pti_intel_mid.rst b/Documentation/pti/pti_intel_mid.rst
deleted file mode 100644
index ea05725174cb..000000000000
--- a/Documentation/pti/pti_intel_mid.rst
+++ /dev/null
@@ -1,106 +0,0 @@
-:orphan:
-
-=============
-Intel MID PTI
-=============
-
-The Intel MID PTI project is hardware implemented in Intel Atom
-system-on-a-chip designs, based on the Parallel Trace
-Interface for the MIPI P1149.7 cJTAG standard.  The kernel solution
-for this platform involves the following files::
-
-	./include/linux/pti.h
-	./drivers/.../n_tracesink.h
-	./drivers/.../n_tracerouter.c
-	./drivers/.../n_tracesink.c
-	./drivers/.../pti.c
-
-pti.c is the driver that enables various debugging features
-popular on platforms from certain mobile manufacturers.
-n_tracerouter.c and n_tracesink.c allow extra system information to
-be collected and routed to the pti driver, such as trace
-debugging data from a modem.  Although n_tracerouter
-and n_tracesink are a part of the complete PTI solution,
-these two line disciplines can work separately from
-pti.c and route any data stream from one /dev/tty node
-to another /dev/tty node via kernel-space.  This provides
-a stable, reliable connection that will not break unless
-the user-space application shuts down (plus avoids
-kernel->user->kernel context switch overheads of routing
-data).
-
-An example debugging usage for this driver system:
-
-  * Hook /dev/ttyPTI0 to syslogd.  Opening this port will also start
-    a console device to further capture debugging messages to PTI.
-  * Hook /dev/ttyPTI1 to modem debugging data to write to PTI HW.
-    This is where n_tracerouter and n_tracesink are used.
-  * Hook /dev/pti to a user-level debugging application for writing
-    to PTI HW.
-  * Use the ``mipi_*`` kernel driver API in other device drivers for
-    debugging to PTI by first requesting a PTI write address via
-    mipi_request_masterchannel(1).
-
-Below is example pseudo-code on how a 'privileged' application
-can hook up n_tracerouter and n_tracesink to any tty on
-a system.  'Privileged' means the application has enough
-privileges to successfully manipulate the ldisc drivers
-but is not just blindly executing as 'root'. Keep in mind
-the use of ioctl(,TIOCSETD,) is not specific to the n_tracerouter
-and n_tracesink line discipline drivers but is a generic
-operation for a program to use a line discipline driver
-on a tty port other than the default n_tty::
-
-  /////////// To hook up n_tracerouter and n_tracesink /////////
-
-  // Note that n_tracerouter depends on n_tracesink.
-  #include <errno.h>
-  #include <fcntl.h>
-  #include <stdio.h>
-  #include <sys/ioctl.h>
-  #include <unistd.h>
-  #define ONE_TTY "/dev/ttyOne"
-  #define TWO_TTY "/dev/ttyTwo"
-
-  // globals needed to hang onto the ldisc connection
-  static int g_fd_source = -1;
-  static int g_fd_sink  = -1;
-
-  // These two vars hold the LDISC values of the ldisc drivers loaded
-  // in the OS.  Look at /proc/tty/ldiscs to get the right numbers from
-  // the ldiscs loaded in the system.
-  int source_ldisc_num = -1, sink_ldisc_num = -1;
-  int retval;
-
-  g_fd_source = open(ONE_TTY, O_RDWR); // must be R/W
-  g_fd_sink   = open(TWO_TTY, O_RDWR); // must be R/W
-
-  // open() returns -1 on failure; 0 is a valid descriptor
-  if ((g_fd_source < 0) || (g_fd_sink < 0)) {
-     // doubt you'll want to use these exact error lines of code
-     printf("Error on open(). errno: %d\n", errno);
-     return errno;
-  }
-
-  retval = ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
-  if (retval < 0) {
-     printf("Error on ioctl().  errno: %d\n", errno);
-     return errno;
-  }
-
-  retval = ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
-  if (retval < 0) {
-     printf("Error on ioctl().  errno: %d\n", errno);
-     return errno;
-  }
-
-  /////////// To disconnect n_tracerouter and n_tracesink ////////
-
-  // First make sure data through the ldiscs has stopped.
-
-  // Second, disconnect ldiscs.  This provides a
-  // little cleaner shutdown on tty stack.
-  sink_ldisc_num = 0;    // 0 is N_TTY, the default ldisc
-  source_ldisc_num = 0;
-  ioctl(g_fd_sink, TIOCSETD, &sink_ldisc_num);
-  ioctl(g_fd_source, TIOCSETD, &source_ldisc_num);
-
-  // Third, the program closes the connections and cleans up:
-  close(g_fd_sink);
-  close(g_fd_source);
-  g_fd_sink = g_fd_source = -1;
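Note on the ldisc numbers: the pseudo-code above says to look at
/proc/tty/ldiscs for the values passed to TIOCSETD. A minimal sketch of that
lookup follows. It assumes each line of the file is a "name number" pair
separated by whitespace; find_ldisc_num() is a hypothetical helper, not an
existing API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan "name number" pairs from an already opened stream and return the
 * number listed for `name`, or -1 if the name is not present.
 */
static int find_ldisc_num(FILE *f, const char *name)
{
	char entry[64];
	int num;

	while (fscanf(f, "%63s %d", entry, &num) == 2) {
		if (strcmp(entry, name) == 0)
			return num;
	}
	return -1;
}
```

A caller would open /proc/tty/ldiscs with fopen(), look up "n_tracesink" and
"n_tracerouter", and refuse to issue the ioctl if either lookup returns -1.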
diff --git a/Documentation/pwm.txt b/Documentation/pwm.txt
deleted file mode 100644
index ab62f1bb0366..000000000000
--- a/Documentation/pwm.txt
+++ /dev/null
@@ -1,165 +0,0 @@
-======================================
-Pulse Width Modulation (PWM) interface
-======================================
-
-This provides an overview of the Linux PWM interface.
-
-PWMs are commonly used for controlling LEDs, fans or vibrators in
-cell phones. PWMs with a fixed purpose have no need to implement
-the Linux PWM API (although they could). However, PWMs are often
-found as discrete devices on SoCs which have no fixed purpose. It's
-up to the board designer to connect them to LEDs or fans. To provide
-this kind of flexibility the generic PWM API exists.
-
-Identifying PWMs
-----------------
-
-Users of the legacy PWM API use unique IDs to refer to PWM devices.
-
-Instead of referring to a PWM device via its unique ID, board setup code
-should instead register a static mapping that can be used to match PWM
-consumers to providers, as given in the following example::
-
-	static struct pwm_lookup board_pwm_lookup[] = {
-		PWM_LOOKUP("tegra-pwm", 0, "pwm-backlight", NULL,
-			   50000, PWM_POLARITY_NORMAL),
-	};
-
-	static void __init board_init(void)
-	{
-		...
-		pwm_add_table(board_pwm_lookup, ARRAY_SIZE(board_pwm_lookup));
-		...
-	}
-
-Using PWMs
-----------
-
-Legacy users can request a PWM device using pwm_request() and free it
-after usage with pwm_free().
-
-New users should use the pwm_get() function and pass to it the consumer
-device or a consumer name. pwm_put() is used to free the PWM device. Managed
-variants of these functions, devm_pwm_get() and devm_pwm_put(), also exist.
-
-After being requested, a PWM has to be configured using::
-
-	int pwm_apply_state(struct pwm_device *pwm, struct pwm_state *state);
-
-This API controls both the PWM period/duty_cycle config and the
-enable/disable state.
-
-The pwm_config(), pwm_enable() and pwm_disable() functions are just wrappers
-around pwm_apply_state() and should not be used if the user wants to change
-several parameters at once. For example, if you see pwm_config() and
-pwm_{enable,disable}() calls in the same function, this probably means you
-should switch to pwm_apply_state().
-
-The PWM user API also allows one to query the PWM state with pwm_get_state().
-
-In addition to the PWM state, the PWM API also exposes PWM arguments, which
-are the reference PWM config one should use on this PWM.
-PWM arguments are usually platform-specific and allow the PWM user to only
-care about the duty cycle relative to the full period (like, duty = 50% of the
-period). struct pwm_args contains 2 fields (period and polarity) and should
-be used to set the initial PWM config (usually done in the probe function
-of the PWM user). PWM arguments are retrieved with pwm_get_args().
-
-All consumers should really be reconfiguring the PWM upon resume as
-appropriate. This is the only way to ensure that everything is resumed in
-the proper order.
-
-Using PWMs with the sysfs interface
------------------------------------
-
-If CONFIG_SYSFS is enabled in your kernel configuration a simple sysfs
-interface is provided to use the PWMs from userspace. It is exposed at
-/sys/class/pwm/. Each probed PWM controller/chip will be exported as
-pwmchipN, where N is the base of the PWM chip. Inside the directory you
-will find:
-
-  npwm
-    The number of PWM channels this chip supports (read-only).
-
-  export
-    Exports a PWM channel for use with sysfs (write-only).
-
-  unexport
-    Unexports a PWM channel from sysfs (write-only).
-
-The PWM channels are numbered using a per-chip index from 0 to npwm-1.
-
-When a PWM channel is exported a pwmX directory will be created in the
-pwmchipN directory it is associated with, where X is the number of the
-channel that was exported. The following properties will then be available:
-
-  period
-    The total period of the PWM signal (read/write).
-    Value is in nanoseconds and is the sum of the active and inactive
-    time of the PWM.
-
-  duty_cycle
-    The active time of the PWM signal (read/write).
-    Value is in nanoseconds and must be less than the period.
-
-  polarity
-    Changes the polarity of the PWM signal (read/write).
-    Writes to this property only work if the PWM chip supports changing
-    the polarity. The polarity can only be changed if the PWM is not
-    enabled. Value is the string "normal" or "inversed".
-
-  enable
-    Enable/disable the PWM signal (read/write).
-
-	- 0 - disabled
-	- 1 - enabled
-
-Implementing a PWM driver
--------------------------
-
-Currently there are two ways to implement PWM drivers. Traditionally
-there has only been the barebone API, meaning that each driver has
-to implement the pwm_*() functions itself. This means that it's impossible
-to have multiple PWM drivers in the system. For this reason it's mandatory
-for new drivers to use the generic PWM framework.
-
-A new PWM controller/chip can be added using pwmchip_add() and removed
-again with pwmchip_remove(). pwmchip_add() takes a filled in struct
-pwm_chip as argument which provides a description of the PWM chip, the
-number of PWM devices provided by the chip and the chip-specific
-implementation of the supported PWM operations to the framework.
-
-When implementing polarity support in a PWM driver, make sure to respect the
-signal conventions in the PWM framework. By definition, normal polarity
-characterizes a signal that starts high for the duration of the duty cycle and
-goes low for the remainder of the period. Conversely, a signal with inversed
-polarity starts low for the duration of the duty cycle and goes high for the
-remainder of the period.
-
-Drivers are encouraged to implement ->apply() instead of the legacy
-->enable(), ->disable() and ->config() methods. Doing that should provide
-atomicity in the PWM config workflow, which is required when the PWM controls
-a critical device (like a regulator).
-
-The implementation of ->get_state() (a method used to retrieve initial PWM
-state) is also encouraged for the same reason: letting the PWM user know
-about the current PWM state would allow them to avoid glitches.
-
-Drivers should not implement any power management. In other words,
-consumers should implement it as described in the "Using PWMs" section.
-
-Locking
--------
-
-The PWM core list manipulations are protected by a mutex, so pwm_request()
-and pwm_free() may not be called from an atomic context. Currently the
-PWM core does not enforce any locking to pwm_enable(), pwm_disable() and
-pwm_config(), so the calling context is currently driver specific. This
-is an issue derived from the former barebone API and should be fixed soon.
-
-Helpers
--------
-
-Currently a PWM can only be configured with period_ns and duty_ns. For several
-use cases freq_hz and duty_percent might be better. Instead of calculating
-this in your driver please consider adding appropriate helpers to the framework.
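Note on the Helpers section of pwm.txt: the conversion it mentions, from
freq_hz and duty_percent to period and duty cycle in nanoseconds, can be
sketched as below. This is ordinary C for illustration, not a framework
helper; struct pwm_timing, pwm_timing_from_freq() and NSEC_PER_SEC_SKETCH are
hypothetical names, and integer division truncates.

```c
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/* Hypothetical pair of values, mirroring the period/duty_cycle properties. */
struct pwm_timing {
	uint64_t period_ns;
	uint64_t duty_ns;
};

/*
 * Convert a frequency in Hz and a duty cycle in percent into nanoseconds.
 * Returns 0 on success, -1 if freq_hz is 0 or duty_percent exceeds 100.
 */
static int pwm_timing_from_freq(uint32_t freq_hz, unsigned int duty_percent,
				struct pwm_timing *out)
{
	if (freq_hz == 0 || duty_percent > 100)
		return -1;

	out->period_ns = NSEC_PER_SEC_SKETCH / freq_hz;	/* truncates */
	out->duty_ns = out->period_ns * duty_percent / 100;
	return 0;
}
```

For example, 20 kHz at 50% gives a period of 50000 ns, the same period used
in the PWM_LOOKUP() example, and a duty cycle of 25000 ns.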
diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
deleted file mode 100644
index 7d3684e81df6..000000000000
--- a/Documentation/rfkill.txt
+++ /dev/null
@@ -1,132 +0,0 @@
-===============================
-rfkill - RF kill switch support
-===============================
-
-
-.. contents::
-   :depth: 2
-
-Introduction
-============
-
-The rfkill subsystem provides a generic interface for disabling any radio
-transmitter in the system. When a transmitter is blocked, it shall not
-radiate any power.
-
-The subsystem also provides the ability to react to button presses and
-disable all transmitters of a certain type (or all). This is intended for
-situations where transmitters need to be turned off, for example on
-aircraft.
-
-The rfkill subsystem has a concept of "hard" and "soft" block, which
-differ little in their meaning (block == transmitters off) but rather in
-whether they can be changed or not:
-
- - hard block
-	read-only radio block that cannot be overridden by software
-
- - soft block
-	writable radio block (need not be readable) that is set by
-	the system software.
-
-The rfkill subsystem has two parameters, rfkill.default_state and
-rfkill.master_switch_mode, which are documented in
-admin-guide/kernel-parameters.rst.
-
-
-Implementation details
-======================
-
-The rfkill subsystem is composed of three main components:
-
- * the rfkill core,
- * the deprecated rfkill-input module (an input layer handler, being
-   replaced by userspace policy code) and
- * the rfkill drivers.
-
-The rfkill core provides an API for kernel drivers to register their radio
-transmitter with the kernel, methods for turning it on and off, and letting
-the system know about hardware-disabled states that may be implemented on
-the device.
-
-The rfkill core code also notifies userspace of state changes, and provides
-ways for userspace to query the current states. See the "Userspace support"
-section below.
-
-When the device is hard-blocked (either by a call to rfkill_set_hw_state()
-or from query_hw_block), set_block() will be invoked for additional software
-block, but drivers can ignore the method call since they can use the return
-value of the function rfkill_set_hw_state() to sync the software state
-instead of keeping track of calls to set_block(). In fact, drivers should
-use the return value of rfkill_set_hw_state() unless the hardware actually
-keeps track of soft and hard block separately.
-
-
-Kernel API
-==========
-
-Drivers for radio transmitters normally implement an rfkill driver.
-
-Platform drivers might implement input devices if the rfkill button is just
-that, a button. If that button influences the hardware then you need to
-implement an rfkill driver instead. This also applies if the platform provides
-a way to turn on/off the transmitter(s).
-
-For some platforms, it is possible that the hardware state changes during
-suspend/hibernation, in which case it will be necessary to update the rfkill
-core with the current state at resume time.
-
-To create an rfkill driver, the driver's Kconfig needs to have::
-
-	depends on RFKILL || !RFKILL
-
-to ensure the driver cannot be built-in when rfkill is modular. The !RFKILL
-case allows the driver to be built when rfkill is not configured, in which
-case all rfkill API can still be used but will be provided by static inlines
-which compile to almost nothing.
-
-Calling rfkill_set_hw_state() when a state change happens is required from
-rfkill drivers that control devices that can be hard-blocked unless they also
-assign the poll_hw_block() callback (then the rfkill core will poll the
-device). Don't do this unless you cannot get the event in any other way.
-
-rfkill provides per-switch LED triggers, which can be used to drive LEDs
-according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
-
-
-Userspace support
-=================
-
-The recommended userspace interface to use is /dev/rfkill, which is a misc
-character device that allows userspace to obtain and set the state of rfkill
-devices and sets of devices. It also notifies userspace about device addition
-and removal. The API is a simple read/write API that is defined in
-linux/rfkill.h, with one ioctl that allows turning off the deprecated input
-handler in the kernel for the transition period.
-
-Except for the one ioctl, communication with the kernel is done via read()
-and write() of instances of 'struct rfkill_event'. In this structure, the
-soft and hard block are properly separated (unlike sysfs, see below) and
-userspace is able to get a consistent snapshot of all rfkill devices in the
-system. Also, it is possible to switch all rfkill drivers (or all drivers of
-a specified type) into a state which also updates the default state for
-hotplugged devices.
-
-After an application opens /dev/rfkill, it can read the current state of all
-devices. Changes can be obtained by either polling the descriptor for
-hotplug or state change events or by listening for uevents emitted by the
-rfkill core framework.
-
-Additionally, each rfkill device is registered in sysfs and emits uevents.
-
-rfkill devices issue uevents (with an action of "change"), with the following
-environment variables set::
-
-	RFKILL_NAME
-	RFKILL_STATE
-	RFKILL_TYPE
-
-The content of these variables corresponds to the "name", "state" and
-"type" sysfs files of the device.
-
-For further details consult Documentation/ABI/stable/sysfs-class-rfkill.
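Note on /dev/rfkill: the Userspace support section says communication is done
via read() and write() of struct rfkill_event. A small sketch of decoding one
such event follows. The structure and the RFKILL_OP_* constants come from
linux/rfkill.h; rfkill_op_name() and rfkill_format_event() are hypothetical
helpers written for this note.

```c
#include <linux/rfkill.h>
#include <stdio.h>

/* Map an rfkill operation code to a short label. */
static const char *rfkill_op_name(unsigned int op)
{
	switch (op) {
	case RFKILL_OP_ADD:        return "add";
	case RFKILL_OP_DEL:        return "del";
	case RFKILL_OP_CHANGE:     return "change";
	case RFKILL_OP_CHANGE_ALL: return "change-all";
	default:                   return "unknown";
	}
}

/* Format one event into buf; returns what snprintf() returns. */
static int rfkill_format_event(const struct rfkill_event *ev,
			       char *buf, size_t len)
{
	return snprintf(buf, len, "idx=%u type=%u op=%s soft=%u hard=%u",
			ev->idx, ev->type, rfkill_op_name(ev->op),
			ev->soft, ev->hard);
}
```

A reader loop would open /dev/rfkill, read() one struct rfkill_event at a
time, and pass each to rfkill_format_event(); the soft and hard fields are
reported separately, as the text describes.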
diff --git a/Documentation/s390/vfio-ccw.rst b/Documentation/s390/vfio-ccw.rst
index 1f6d0b56d53e..1e210c6afa88 100644
--- a/Documentation/s390/vfio-ccw.rst
+++ b/Documentation/s390/vfio-ccw.rst
@@ -38,7 +38,7 @@ every detail. More information/reference could be found here:
   qemu/hw/s390x/css.c
 
 For vfio mediated device framework:
-- Documentation/vfio-mediated-device.txt
+- Documentation/driver-api/vfio-mediated-device.rst
 
 Motivation of vfio-ccw
 ----------------------
@@ -322,5 +322,5 @@ Reference
 2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
 3. https://en.wikipedia.org/wiki/Channel_I/O
 4. Documentation/s390/cds.rst
-5. Documentation/vfio.txt
-6. Documentation/vfio-mediated-device.txt
+5. Documentation/driver-api/vfio.rst
+6. Documentation/driver-api/vfio-mediated-device.rst
diff --git a/Documentation/sgi-ioc4.txt b/Documentation/sgi-ioc4.txt
deleted file mode 100644
index 72709222d3c0..000000000000
--- a/Documentation/sgi-ioc4.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-====================================
-SGI IOC4 PCI (multi function) device
-====================================
-
-The SGI IOC4 PCI device is a bit of a strange beast, so some notes on
-it are in order.
-
-First, even though the IOC4 performs multiple functions, such as an
-IDE controller, a serial controller, a PS/2 keyboard/mouse controller,
-and an external interrupt mechanism, it's not implemented as a
-multifunction device.  The consequence of this from a software
-standpoint is that all these functions share a single IRQ, and
-they can't all register to own the same PCI device ID.  To make
-matters a bit worse, some of the register blocks (and even registers
-themselves) present in IOC4 are mixed-purpose between these several
-functions, meaning that there's no clear "owning" device driver.
-
-The solution is to organize the IOC4 driver into several independent
-drivers, "ioc4", "sgiioc4", and "ioc4_serial".  Note that there is no
-PS/2 controller driver as this functionality has never been wired up
-on a shipping IO card.
-
-ioc4
-====
-This is the core (or shim) driver for IOC4.  It is responsible for
-initializing the basic functionality of the chip, and allocating
-the PCI resources that are shared between the IOC4 functions.
-
-This driver also provides registration functions that the other
-IOC4 drivers can call to make their presence known.  Each driver
-needs to provide a probe and remove function, which are invoked
-by the core driver at appropriate times.  The interface of these
-IOC4 function probe and remove operations isn't precisely the same
-as PCI device probe and remove operations, but is logically the
-same operation.
-
-sgiioc4
-=======
-This is the IDE driver for IOC4.  Its name isn't very descriptive
-simply for historical reasons (it used to be the only IOC4 driver
-component).  There's not much to say about it other than it hooks
-up to the ioc4 driver via the appropriate registration, probe, and
-remove functions.
-
-ioc4_serial
-===========
-This is the serial driver for IOC4.  There's not much to say about it
-other than it hooks up to the ioc4 driver via the appropriate registration,
-probe, and remove functions.
diff --git a/Documentation/smsc_ece1099.txt b/Documentation/smsc_ece1099.txt
deleted file mode 100644
index 079277421eaf..000000000000
--- a/Documentation/smsc_ece1099.txt
+++ /dev/null
@@ -1,60 +0,0 @@
-==================================================
-SMSC Keyboard Scan Expansion/GPIO Expansion device
-==================================================
-
-What is smsc-ece1099?
-----------------------
-
-The ECE1099 is a 40-Pin 3.3V Keyboard Scan Expansion
-or GPIO Expansion device. The device supports a keyboard
-scan matrix of 23x8. The device is connected to a Master
-via the SMSC BC-Link interface or via the SMBus.
-Keypad Scan Input (KSI) and Keypad Scan Output (KSO) signals
-are multiplexed with GPIOs.
-
-Interrupt generation
---------------------
-
-Interrupts can be generated by an edge detection on a GPIO
-pin or an edge detection on one of the bus interface pins.
-Interrupts can also be detected on the keyboard scan interface.
-The bus interrupt pin (BC_INT# or SMBUS_INT#) is asserted if
-any bit in one of the Interrupt Status registers is 1 and
-the corresponding Interrupt Mask bit is also 1.
-
-In order for software to determine which device is the source
-of an interrupt, it should first read the Group Interrupt Status Register
-to determine which Status register group is a source for the interrupt.
-Software should read both the Status register and the associated Mask register,
-then AND the two values together. Bits that are 1 in the result of the AND
-are active interrupts. Software clears an interrupt by writing a 1 to the
-corresponding bit in the Status register.
-
-Communication Protocol
-----------------------
-
- SMBus slave Interface
-	The host processor communicates with the ECE1099 device
-	through a series of read/write registers via the SMBus
-	interface. SMBus is a serial communication protocol between
-	a computer host and its peripheral devices. The SMBus data
-	rate is 10 kHz minimum to 400 kHz maximum.
-
-- Slave Bus Interface
-	The ECE1099 device SMBus implementation is a subset of the
-	SMBus interface to the host. The device is a slave-only SMBus device.
-	The implementation in the device is a subset of SMBus since it
-	only supports four protocols.
-
-	The Write Byte, Read Byte, Send Byte, and Receive Byte protocols are the
-	only valid SMBus protocols for the device.
-
- BC-Link Interface
-	The BC-Link is a proprietary bus that allows communication
-	between a Master device and a Companion device. The Master
-	device uses this serial bus to read and write registers
-	located on the Companion device. The bus comprises three signals,
-	BC_CLK, BC_DAT and BC_INT#. The Master device always provides the
-	clock, BC_CLK, and the Companion device is the source for an
-	independent asynchronous interrupt signal, BC_INT#. The ECE1099
-	supports BC-Link speeds up to 24MHz.
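A minimal sketch of the interrupt-decoding rule described in the ECE1099 text above (AND the Status and Mask values, then write 1 to a Status bit to clear it). The function names are hypothetical and the 8-bit register width is an assumption based on the byte-oriented SMBus protocols; the second helper only models the write-1-to-clear effect on a plain variable rather than accessing hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits set in both the Status and the Mask value are active interrupts. */
uint8_t ece_active_irqs(uint8_t status, uint8_t mask)
{
	return status & mask;
}

/*
 * Model of write-1-to-clear: every bit set in 'ack' is cleared in the
 * modelled Status register; all other bits are left untouched.
 */
void ece_model_ack(uint8_t *status_reg, uint8_t ack)
{
	assert(status_reg != NULL);
	*status_reg &= (uint8_t)~ack;
}
```

A real driver would replace the pointer access with its SMBus or BC-Link register read and write helpers.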
diff --git a/Documentation/switchtec.txt b/Documentation/switchtec.txt
deleted file mode 100644
index 30d6a64e53f7..000000000000
--- a/Documentation/switchtec.txt
+++ /dev/null
@@ -1,102 +0,0 @@
-========================
-Linux Switchtec Support
-========================
-
-Microsemi's "Switchtec" line of PCI switch devices is already
-supported by the kernel with standard PCI switch drivers. However, the
-Switchtec device advertises a special management endpoint which
-enables some additional functionality. This includes:
-
-* Packet and Byte Counters
-* Firmware Upgrades
-* Event and Error logs
-* Querying port link status
-* Custom user firmware commands
-
-The switchtec kernel module implements this functionality.
-
-
-Interface
-=========
-
-The primary means of communicating with the Switchtec management firmware is
-through the Memory-mapped Remote Procedure Call (MRPC) interface.
-Commands are submitted to the interface with a 4-byte command
-identifier and up to 1KB of command specific data. The firmware will
-respond with a 4-byte return code and up to 1KB of command-specific
-data. The interface only processes a single command at a time.
-
-
-Userspace Interface
-===================
-
-The MRPC interface will be exposed to userspace through a simple char
-device: /dev/switchtec#, one for each management endpoint in the system.
-
-The char device has the following semantics:
-
-* A write must consist of at least 4 bytes and no more than 1028 bytes.
-  The first 4 bytes will be interpreted as the Command ID and the
-  remainder will be used as the input data. A write will send the
-  command to the firmware to begin processing.
-
-* Each write must be followed by exactly one read. Any double write will
-  produce an error and any read that doesn't follow a write will
-  produce an error.
-
-* A read will block until the firmware completes the command and return
-  the 4-byte Command Return Value plus up to 1024 bytes of output
-  data. (The length will be specified by the size parameter of the read
-  call -- reading less than 4 bytes will produce an error.)
-
-* The poll call will also be supported for userspace applications that
-  need to do other things while waiting for the command to complete.
-
-The following IOCTLs are also supported by the device:
-
-* SWITCHTEC_IOCTL_FLASH_INFO - Retrieve firmware length and number
-  of partitions in the device.
-
-* SWITCHTEC_IOCTL_FLASH_PART_INFO - Retrieve address and length for
-  any specified partition in flash.
-
-* SWITCHTEC_IOCTL_EVENT_SUMMARY - Read a structure of bitmaps
-  indicating all uncleared events.
-
-* SWITCHTEC_IOCTL_EVENT_CTL - Get the current count, clear and set flags
-  for any event. This ioctl takes in a switchtec_ioctl_event_ctl struct
-  with the event_id, index and flags set (index being the partition or PFF
-  number for non-global events). It returns whether the event has
-  occurred, the number of times and any event specific data. The flags
-  can be used to clear the count or enable and disable actions to
-  happen when the event occurs.
-  By using the SWITCHTEC_IOCTL_EVENT_FLAG_EN_POLL flag,
-  you can set an event to trigger a poll command to return with
-  POLLPRI. In this way, userspace can wait for events to occur.
-
-* SWITCHTEC_IOCTL_PFF_TO_PORT and SWITCHTEC_IOCTL_PORT_TO_PFF convert
-  between PCI Function Framework number (used by the event system)
-  and Switchtec Logic Port ID and Partition number (which is more
-  user friendly).
-
-
-Non-Transparent Bridge (NTB) Driver
-===================================
-
-An NTB hardware driver is provided for the Switchtec hardware in
-ntb_hw_switchtec. Currently, it only supports switches configured with
-exactly 2 NT partitions and zero or more non-NT partitions. It also requires
-the following configuration settings:
-
-* Both NT partitions must be able to access each other's GAS spaces.
-  Thus, the bits in the GAS Access Vector under Management Settings
-  must be set to support this.
-* Kernel configuration MUST include support for NTB (CONFIG_NTB needs
-  to be set)
-
-NT EP BAR 2 will be dynamically configured as a Direct Window, and
-the configuration file does not need to configure it explicitly.
-
-Please refer to Documentation/ntb.txt in the Linux source tree for an overall
-understanding of the Linux NTB stack. ntb_hw_switchtec works as an NTB
-Hardware Driver in this stack.
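The write semantics of the Switchtec char device described above (a 4-byte Command ID followed by at most 1024 bytes of input data, 1028 bytes in total) can be sketched as a small packing helper. This is an illustration only: the helper and macro names are hypothetical, and storing the Command ID in host byte order is an assumption that the text above does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MRPC_CMD_ID_LEN   4	/* 4-byte Command ID */
#define MRPC_MAX_DATA_LEN 1024	/* at most 1 KB of command-specific data */

/*
 * Pack a command into 'buf' in the layout the char device expects.
 * Returns the number of bytes to pass to write(), or -1 if the input
 * data is too long or 'buf' is too small.
 */
int mrpc_pack(uint32_t cmd, const void *data, size_t data_len,
	      uint8_t *buf, size_t buf_len)
{
	assert(buf != NULL);

	if (data_len > MRPC_MAX_DATA_LEN ||
	    buf_len < MRPC_CMD_ID_LEN + data_len)
		return -1;

	/* Assumption: the Command ID is written in host byte order. */
	memcpy(buf, &cmd, MRPC_CMD_ID_LEN);
	if (data_len)
		memcpy(buf + MRPC_CMD_ID_LEN, data, data_len);

	return (int)(MRPC_CMD_ID_LEN + data_len);
}
```

The packed buffer would then be handed to a single write() on the device, followed by exactly one read() of at least 4 bytes to collect the Command Return Value and any output data.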
diff --git a/Documentation/sync_file.txt b/Documentation/sync_file.txt
deleted file mode 100644
index 496fb2c3b3e6..000000000000
--- a/Documentation/sync_file.txt
+++ /dev/null
@@ -1,86 +0,0 @@
-===================
-Sync File API Guide
-===================
-
-:Author: Gustavo Padovan <gustavo at padovan dot org>
-
-This document serves as a guide for device driver writers on what the
-sync_file API is, and how drivers can support it. Sync file is the carrier of
-the fences (struct dma_fence) that are needed to synchronize between drivers or
-across process boundaries.
-
-The sync_file API is meant to be used to send and receive fence information
-to/from userspace. It enables userspace to do explicit fencing, where instead
-of attaching a fence to the buffer a producer driver (such as a GPU or V4L
-driver) sends the fence related to the buffer to userspace via a sync_file.
-
-The sync_file then can be sent to the consumer (DRM driver for example), that
-will not use the buffer for anything before the fence(s) signals, i.e., the
-driver that issued the fence is not using/processing the buffer anymore, so it
-signals that the buffer is ready to use. And vice-versa for the consumer ->
-producer part of the cycle.
-
-Sync files allow userspace awareness of buffer sharing synchronization between
-drivers.
-
-Sync file was originally added in the Android kernel but current Linux Desktop
-can benefit a lot from it.
-
-in-fences and out-fences
-------------------------
-
-Sync files can go either to or from userspace. When a sync_file is sent from
-the driver to userspace we call the fences it contains 'out-fences'. They are
-related to a buffer that the driver is processing or is going to process, so
-the driver creates an out-fence to be able to notify, through
-dma_fence_signal(), when it has finished using (or processing) that buffer.
-Out-fences are fences that the driver creates.
-
-On the other hand if the driver receives fence(s) through a sync_file from
-userspace we call these fence(s) 'in-fences'. Receiving in-fences means that
-we need to wait for the fence(s) to signal before using any buffer related to
-the in-fences.
-
-Creating Sync Files
--------------------
-
-When a driver needs to send an out-fence to userspace, it creates a sync_file.
-
-Interface::
-
-	struct sync_file *sync_file_create(struct dma_fence *fence);
-
-The caller passes the out-fence and gets back the sync_file. That is just the
-first step; next it needs to install an fd on sync_file->file. So it gets an
-fd::
-
-	fd = get_unused_fd_flags(O_CLOEXEC);
-
-and installs it on sync_file->file::
-
-	fd_install(fd, sync_file->file);
-
-The sync_file fd now can be sent to userspace.
-
-If the creation process fails, or the sync_file needs to be released for any
-other reason, fput(sync_file->file) should be used.
-
-Receiving Sync Files from Userspace
------------------------------------
-
-When userspace needs to send an in-fence to the driver, it passes the file
-descriptor of the Sync File to the kernel. The kernel can then retrieve the
-fences from it.
-
-Interface::
-
-	struct dma_fence *sync_file_get_fence(int fd);
-
-
-The returned reference is owned by the caller and must be disposed of
-afterwards using dma_fence_put(). In case of error, a NULL is returned instead.
-
-References:
-
-1. struct sync_file in include/linux/sync_file.h
-2. All interfaces mentioned above defined in include/linux/sync_file.h
diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
deleted file mode 100644
index c3f69bcaf96e..000000000000
--- a/Documentation/vfio-mediated-device.txt
+++ /dev/null
@@ -1,414 +0,0 @@
-.. include:: <isonum.txt>
-
-=====================
-VFIO Mediated devices
-=====================
-
-:Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved.
-:Author: Neo Jia <cjia@nvidia.com>
-:Author: Kirti Wankhede <kwankhede@nvidia.com>
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License version 2 as
-published by the Free Software Foundation.
-
-
-Virtual Function I/O (VFIO) Mediated devices[1]
-===============================================
-
-The number of use cases for virtualizing DMA devices that do not have built-in
-SR-IOV capability is increasing. Previously, to virtualize such devices,
-developers had to create their own management interfaces and APIs, and then
-integrate them with user space software. To simplify integration with user space
-software, we have identified common requirements and a unified management
-interface for such devices.
-
-The VFIO driver framework provides unified APIs for direct device access. It is
-an IOMMU/device-agnostic framework for exposing direct device access to user
-space in a secure, IOMMU-protected environment. This framework is used for
-multiple devices, such as GPUs, network adapters, and compute accelerators. With
-direct device access, virtual machines or user space applications have direct
-access to the physical device. This framework is reused for mediated devices.
-
-The mediated core driver provides a common interface for mediated device
-management that can be used by drivers of different devices. This module
-provides a generic interface to perform these operations:
-
-* Create and destroy a mediated device
-* Add a mediated device to and remove it from a mediated bus driver
-* Add a mediated device to and remove it from an IOMMU group
-
-The mediated core driver also provides an interface to register a bus driver.
-For example, the mediated VFIO mdev driver is designed for mediated devices and
-supports VFIO APIs. The mediated bus driver adds a mediated device to and
-removes it from a VFIO group.
-
-The following high-level block diagram shows the main components and interfaces
-in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
-devices as examples, as these devices are the first devices to use this module::
-
-     +---------------+
-     |               |
-     | +-----------+ |  mdev_register_driver() +--------------+
-     | |           | +<------------------------+              |
-     | |  mdev     | |                         |              |
-     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
-     | |  driver   | |     probe()/remove()    |              |    APIs
-     | |           | |                         +--------------+
-     | +-----------+ |
-     |               |
-     |  MDEV CORE    |
-     |   MODULE      |
-     |   mdev.ko     |
-     | +-----------+ |  mdev_register_device() +--------------+
-     | |           | +<------------------------+              |
-     | |           | |                         |  nvidia.ko   |<-> physical
-     | |           | +------------------------>+              |    device
-     | |           | |        callbacks        +--------------+
-     | | Physical  | |
-     | |  device   | |  mdev_register_device() +--------------+
-     | | interface | |<------------------------+              |
-     | |           | |                         |  i915.ko     |<-> physical
-     | |           | +------------------------>+              |    device
-     | |           | |        callbacks        +--------------+
-     | |           | |
-     | |           | |  mdev_register_device() +--------------+
-     | |           | +<------------------------+              |
-     | |           | |                         | ccw_device.ko|<-> physical
-     | |           | +------------------------>+              |    device
-     | |           | |        callbacks        +--------------+
-     | +-----------+ |
-     +---------------+
-
-
-Registration Interfaces
-=======================
-
-The mediated core driver provides the following types of registration
-interfaces:
-
-* Registration interface for a mediated bus driver
-* Physical device driver interface
-
-Registration Interface for a Mediated Bus Driver
-------------------------------------------------
-
-The registration interface for a mediated bus driver provides the following
-structure to represent a mediated device's driver::
-
-     /*
-      * struct mdev_driver [2] - Mediated device's driver
-      * @name: driver name
-      * @probe: called when new device created
-      * @remove: called when device removed
-      * @driver: device driver structure
-      */
-     struct mdev_driver {
-	     const char *name;
-	     int  (*probe)  (struct device *dev);
-	     void (*remove) (struct device *dev);
-	     struct device_driver    driver;
-     };
-
-A mediated bus driver for mdev should use this structure in the function calls
-to register and unregister itself with the core driver:
-
-* Register::
-
-    extern int  mdev_register_driver(struct mdev_driver *drv,
-				   struct module *owner);
-
-* Unregister::
-
-    extern void mdev_unregister_driver(struct mdev_driver *drv);
-
-The mediated bus driver is responsible for adding mediated devices to the VFIO
-group when devices are bound to the driver and removing mediated devices from
-the VFIO group when devices are unbound from the driver.
-
-
-Physical Device Driver Interface
---------------------------------
-
-The physical device driver interface provides the mdev_parent_ops[3] structure
-to define the APIs to manage work in the mediated core driver that is related
-to the physical device.
-
-The structures in the mdev_parent_ops structure are as follows:
-
-* dev_attr_groups: attributes of the parent device
-* mdev_attr_groups: attributes of the mediated device
-* supported_config: attributes to define supported configurations
-
-The functions in the mdev_parent_ops structure are as follows:
-
-* create: allocate basic resources in a driver for a mediated device
-* remove: free resources in a driver when a mediated device is destroyed
-
-(Note that mdev-core provides no implicit serialization of create/remove
-callbacks per mdev parent device, per mdev type, or any other categorization.
-Vendor drivers are expected to be fully asynchronous in this respect or
-provide their own internal resource protection.)
-
-The callbacks in the mdev_parent_ops structure are as follows:
-
-* open: open callback of mediated device
-* close: close callback of mediated device
-* ioctl: ioctl callback of mediated device
-* read : read emulation callback
-* write: write emulation callback
-* mmap: mmap emulation callback
-
-A driver should use the mdev_parent_ops structure in the function call to
-register itself with the mdev core driver::
-
-	extern int  mdev_register_device(struct device *dev,
-	                                 const struct mdev_parent_ops *ops);
-
-However, the mdev_parent_ops structure is not required in the function call
-that a driver should use to unregister itself with the mdev core driver::
-
-	extern void mdev_unregister_device(struct device *dev);
-
-
-Mediated Device Management Interface Through sysfs
-==================================================
-
-The management interface through sysfs enables user space software, such as
-libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
-This management interface provides flexibility to the underlying physical
-device's driver to support features such as:
-
-* Mediated device hot plug
-* Multiple mediated devices in a single virtual machine
-* Multiple mediated devices from different physical devices
-
-Links in the mdev_bus Class Directory
--------------------------------------
-The /sys/class/mdev_bus/ directory contains links to devices that are registered
-with the mdev core driver.
-
-Directories and Files Under the sysfs for Each Physical Device
---------------------------------------------------------------
-
-::
-
-  |- [parent physical device]
-  |--- Vendor-specific-attributes [optional]
-  |--- [mdev_supported_types]
-  |     |--- [<type-id>]
-  |     |   |--- create
-  |     |   |--- name
-  |     |   |--- available_instances
-  |     |   |--- device_api
-  |     |   |--- description
-  |     |   |--- [devices]
-  |     |--- [<type-id>]
-  |     |   |--- create
-  |     |   |--- name
-  |     |   |--- available_instances
-  |     |   |--- device_api
-  |     |   |--- description
-  |     |   |--- [devices]
-  |     |--- [<type-id>]
-  |          |--- create
-  |          |--- name
-  |          |--- available_instances
-  |          |--- device_api
-  |          |--- description
-  |          |--- [devices]
-
-* [mdev_supported_types]
-
-  The list of currently supported mediated device types and their details.
-
-  [<type-id>], device_api, and available_instances are mandatory attributes
-  that should be provided by the vendor driver.
-
-* [<type-id>]
-
-  The [<type-id>] name is created by adding the device driver string as a prefix
-  to the string provided by the vendor driver. The format of this name is as
-  follows::
-
-	sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);
-
-  (or using mdev_parent_dev(mdev) to arrive at the parent device outside
-  of the core mdev code)
-
-* device_api
-
-  This attribute should show which device API is being created, for example,
-  "vfio-pci" for a PCI device.
-
-* available_instances
-
-  This attribute should show the number of devices of type <type-id> that can be
-  created.
-
-* [devices]
-
-  This directory contains links to the devices of type <type-id> that have been
-  created.
-
-* name
-
-  This attribute should show a human-readable name. It is optional.
-
-* description
-
-  This attribute should show a brief description of the type and its features.
-  It is optional.
-
-Directories and Files Under the sysfs for Each mdev Device
-----------------------------------------------------------
-
-::
-
-  |- [parent phy device]
-  |--- [$MDEV_UUID]
-         |--- remove
-         |--- mdev_type {link to its type}
-         |--- vendor-specific-attributes [optional]
-
-* remove (write only)
-
-Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can
-fail the remove() callback if that device is active and the vendor driver
-doesn't support hot unplug.
-
-Example::
-
-	# echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove
-
-Mediated Device Hot Plug
-------------------------
-
-Mediated devices can be created and assigned at runtime. The procedure to hot
-plug a mediated device is the same as the procedure to hot plug a PCI device.
-
-Translation APIs for Mediated Devices
-=====================================
-
-The following APIs are provided for translating user pfn to host pfn in a VFIO
-driver::
-
-	extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
-				  int npage, int prot, unsigned long *phys_pfn);
-
-	extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
-				    int npage);
-
-These functions call back into the back-end IOMMU module by using the pin_pages
-and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently
-these callbacks are supported in the TYPE1 IOMMU module. To enable them for
-other IOMMU backend modules, such as PPC64 sPAPR module, they need to provide
-these two callback functions.
-
-Using the Sample Code
-=====================
-
-mtty.c in the samples/vfio-mdev/ directory is a sample driver program to
-demonstrate how to use the mediated device framework.
-
-The sample driver creates an mdev device that simulates a serial port over a PCI
-card.
-
-1. Build and load the mtty.ko module.
-
-   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/
-
-   Files in this device directory in sysfs are similar to the following::
-
-     # tree /sys/devices/virtual/mtty/mtty/
-        /sys/devices/virtual/mtty/mtty/
-        |-- mdev_supported_types
-        |   |-- mtty-1
-        |   |   |-- available_instances
-        |   |   |-- create
-        |   |   |-- device_api
-        |   |   |-- devices
-        |   |   `-- name
-        |   `-- mtty-2
-        |       |-- available_instances
-        |       |-- create
-        |       |-- device_api
-        |       |-- devices
-        |       `-- name
-        |-- mtty_dev
-        |   `-- sample_mtty_dev
-        |-- power
-        |   |-- autosuspend_delay_ms
-        |   |-- control
-        |   |-- runtime_active_time
-        |   |-- runtime_status
-        |   `-- runtime_suspended_time
-        |-- subsystem -> ../../../../class/mtty
-        `-- uevent
-
-2. Create a mediated device by using the dummy device that you created in the
-   previous step::
-
-     # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" >	\
-              /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create
-
-3. Add parameters to qemu-kvm::
-
-     -device vfio-pci,\
-      sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
-
-4. Boot the VM.
-
-   In the Linux guest VM, with no hardware on the host, the device appears
-   as follows::
-
-     # lspci -s 00:05.0 -xxvv
-     00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
-             Subsystem: Device 4348:3253
-             Physical Slot: 5
-             Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
-     Stepping- SERR- FastB2B- DisINTx-
-             Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
-     <TAbort- <MAbort- >SERR- <PERR- INTx-
-             Interrupt: pin A routed to IRQ 10
-             Region 0: I/O ports at c150 [size=8]
-             Region 1: I/O ports at c158 [size=8]
-             Kernel driver in use: serial
-     00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
-     10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
-     20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
-     30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00
-
-     In the Linux guest VM, dmesg output for the device is as follows:
-
-     serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
-     0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
-     0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A
-
-
-5. In the Linux guest VM, check the serial ports::
-
-     # setserial -g /dev/ttyS*
-     /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
-     /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
-     /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10
-
-6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
-   /dev/ttyS2 with hardware flow control disabled.
-
-7. Type data on the minicom terminal or send data to the terminal emulation
-   program and read the data.
-
-   Data is looped back from the host's mtty driver.
-
-8. Destroy the mediated device that you created::
-
-     # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove
-
-References
-==========
-
-1. See Documentation/vfio.txt for more information on VFIO.
-2. struct mdev_driver in include/linux/mdev.h
-3. struct mdev_parent_ops in include/linux/mdev.h
-4. struct vfio_iommu_driver_ops in include/linux/vfio.h
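As an illustration of the [<type-id>] naming rule given in the sysfs section above (driver string, a hyphen, then the vendor-provided string), here is a small userspace sketch. The helper name is hypothetical and the example inputs "mtty" and "2" are assumed values chosen to match the mtty-2 entry in the sample sysfs tree; the in-kernel code uses dev_driver_string() and group->name as quoted in the document.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "<driver>-<group>" into 'buf'. Returns the length of the name,
 * or -1 if it does not fit in 'len' bytes (including the terminator).
 */
int mdev_type_id(char *buf, size_t len, const char *driver, const char *group)
{
	int n;

	assert(buf != NULL && driver != NULL && group != NULL);

	n = snprintf(buf, len, "%s-%s", driver, group);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return n;
}
```

Unlike the sprintf() form quoted in the document, this sketch bounds the write with snprintf() and reports truncation to the caller.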
diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
deleted file mode 100644
index f1a4d3c3ba0b..000000000000
--- a/Documentation/vfio.txt
+++ /dev/null
@@ -1,520 +0,0 @@
-==================================
-VFIO - "Virtual Function I/O" [1]_
-==================================
-
-Many modern systems now provide DMA and interrupt remapping facilities
-to help ensure I/O devices behave within the boundaries they've been
-allotted.  This includes x86 hardware with AMD-Vi and Intel VT-d,
-POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
-systems such as Freescale PAMU.  The VFIO driver is an IOMMU/device
-agnostic framework for exposing direct device access to userspace, in
-a secure, IOMMU protected environment.  In other words, this allows
-safe [2]_, non-privileged, userspace drivers.
-
-Why do we want that?  Virtual machines often make use of direct device
-access ("device assignment") when configured for the highest possible
-I/O performance.  From a device and host perspective, this simply
-turns the VM into a userspace driver, with the benefits of
-significantly reduced latency, higher bandwidth, and direct use of
-bare-metal device drivers [3]_.
-
-Some applications, particularly in the high performance computing
-field, also benefit from low-overhead, direct device access from
-userspace.  Examples include network adapters (often non-TCP/IP based)
-and compute accelerators.  Prior to VFIO, these drivers had to either
-go through the full development cycle to become a proper upstream
-driver, be maintained out of tree, or make use of the UIO framework,
-which has no notion of IOMMU protection, limited interrupt support,
-and requires root privileges to access things like PCI configuration
-space.
-
-The VFIO driver framework intends to unify these, replacing the KVM
-PCI-specific device assignment code and providing a more secure,
-more featureful userspace driver environment than UIO.
-
-Groups, Devices, and IOMMUs
----------------------------
-
-Devices are the main target of any I/O driver.  Devices typically
-create a programming interface made up of I/O access, interrupts,
-and DMA.  Without going into the details of each of these, DMA is
-by far the most critical aspect for maintaining a secure environment
-as allowing a device read-write access to system memory imposes the
-greatest risk to the overall system integrity.
-
-To help mitigate this risk, many modern IOMMUs now incorporate
-isolation properties into what was, in many cases, an interface only
-meant for translation (i.e., solving the addressing problems of devices
-with limited address spaces).  With this, devices can now be isolated
-from each other and from arbitrary memory access, thus allowing
-things like secure direct assignment of devices into virtual machines.
-
-This isolation is not always at the granularity of a single device
-though.  Even when an IOMMU is capable of this, properties of devices,
-interconnects, and IOMMU topologies can each reduce this isolation.
-For instance, an individual device may be part of a larger multi-
-function enclosure.  While the IOMMU may be able to distinguish
-between devices within the enclosure, the enclosure may not require
-transactions between devices to reach the IOMMU.  Examples of this
-could be anything from a multi-function PCI device with backdoors
-between functions to a non-PCI-ACS (Access Control Services) capable
-bridge allowing redirection without reaching the IOMMU.  Topology
-can also play a factor in terms of hiding devices.  A PCIe-to-PCI
-bridge masks the devices behind it, making transactions appear as if
-from the bridge itself.  Obviously IOMMU design plays a major factor
-as well.
-
-Therefore, while for the most part an IOMMU may have device level
-granularity, any system is susceptible to reduced granularity.  The
-IOMMU API therefore supports a notion of IOMMU groups.  A group is
-a set of devices which is isolatable from all other devices in the
-system.  Groups are therefore the unit of ownership used by VFIO.
-
-While the group is the minimum granularity that must be used to
-ensure secure user access, it's not necessarily the preferred
-granularity.  In IOMMUs which make use of page tables, it may be
-possible to share a set of page tables between different groups,
-reducing the overhead both to the platform (reduced TLB thrashing,
-reduced duplicate page tables), and to the user (programming only
-a single set of translations).  For this reason, VFIO makes use of
-a container class, which may hold one or more groups.  A container
-is created by simply opening the /dev/vfio/vfio character device.
-
-On its own, the container provides little functionality, with all
-but a couple of version and extension query interfaces locked away.
-The user needs to add a group into the container for the next level
-of functionality.  To do this, the user first needs to identify the
-group associated with the desired device.  This can be done using
-the sysfs links described in the example below.  By unbinding the
-device from the host driver and binding it to a VFIO driver, a new
-VFIO group will appear for the group as /dev/vfio/$GROUP, where
-$GROUP is the IOMMU group number of which the device is a member.
-If the IOMMU group contains multiple devices, each will need to
-be bound to a VFIO driver before operations on the VFIO group
-are allowed (it's also sufficient to only unbind the device from
-host drivers if a VFIO driver is unavailable; this will make the
-group available, but not that particular device).  TBD - interface
-for disabling driver probing/locking a device.
-
-Once the group is ready, it may be added to the container by opening
-the VFIO group character device (/dev/vfio/$GROUP) and using the
-VFIO_GROUP_SET_CONTAINER ioctl, passing the file descriptor of the
-previously opened container file.  If desired and if the IOMMU driver
-supports sharing the IOMMU context between groups, multiple groups may
-be set to the same container.  If a group fails to set to a container
-with existing groups, a new empty container will need to be used
-instead.
-
-With a group (or groups) attached to a container, the remaining
-ioctls become available, enabling access to the VFIO IOMMU interfaces.
-Additionally, it now becomes possible to get file descriptors for each
-device within a group using an ioctl on the VFIO group file descriptor.
-
-The VFIO device API includes ioctls for describing the device, the I/O
-regions and their read/write/mmap offsets on the device descriptor, as
-well as mechanisms for describing and registering interrupt
-notifications.
-
-VFIO Usage Example
-------------------
-
-Assume the user wants to access PCI device 0000:06:0d.0::
-
-	$ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
-	../../../../kernel/iommu_groups/26
-
-This device is therefore in IOMMU group 26.  This device is on the
-pci bus, therefore the user will make use of vfio-pci to manage the
-group::
-
-	# modprobe vfio-pci
-
-Binding this device to the vfio-pci driver creates the VFIO group
-character devices for this group::
-
-	$ lspci -n -s 0000:06:0d.0
-	06:0d.0 0401: 1102:0002 (rev 08)
-	# echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
-	# echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
-
-Now we need to look at what other devices are in the group to free
-it for use by VFIO::
-
-	$ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices
-	total 0
-	lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 ->
-		../../../../devices/pci0000:00/0000:00:1e.0
-	lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 ->
-		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
-	lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 ->
-		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
-
-This device is behind a PCIe-to-PCI bridge [4]_, therefore we also
-need to add device 0000:06:0d.1 to the group following the same
-procedure as above.  Device 0000:00:1e.0 is a bridge that does
-not currently have a host driver, therefore it's not required to
-bind this device to the vfio-pci driver (vfio-pci does not currently
-support PCI bridges).
-
-The final step is to provide the user with access to the group if
-unprivileged operation is desired (note that /dev/vfio/vfio provides
-no capabilities on its own and is therefore expected to be set to
-mode 0666 by the system)::
-
-	# chown user:user /dev/vfio/26
-
-The user now has full access to all the devices and the iommu for this
-group and can access them as follows::
-
-	int container, group, device, i;
-	struct vfio_group_status group_status =
-					{ .argsz = sizeof(group_status) };
-	struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };
-	struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };
-	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
-
-	/* Create a new container */
-	container = open("/dev/vfio/vfio", O_RDWR);
-
-	if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
-		/* Unknown API version */
-
-	if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
-		/* Doesn't support the IOMMU driver we want. */
-
-	/* Open the group */
-	group = open("/dev/vfio/26", O_RDWR);
-
-	/* Test the group is viable and available */
-	ioctl(group, VFIO_GROUP_GET_STATUS, &group_status);
-
-	if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE))
-		/* Group is not viable (ie, not all devices bound for vfio) */
-
-	/* Add the group to the container */
-	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
-
-	/* Enable the IOMMU model we want */
-	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
-
-	/* Get additional IOMMU info */
-	ioctl(container, VFIO_IOMMU_GET_INFO, &iommu_info);
-
-	/* Allocate some space and setup a DMA mapping */
-	dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
-			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
-	dma_map.size = 1024 * 1024;
-	dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
-	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
-
-	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
-
-	/* Get a file descriptor for the device */
-	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
-
-	/* Test and setup the device */
-	ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
-
-	for (i = 0; i < device_info.num_regions; i++) {
-		struct vfio_region_info reg = { .argsz = sizeof(reg) };
-
-		reg.index = i;
-
-		ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg);
-
-		/* Setup mappings... read/write offsets, mmaps
-		 * For PCI devices, config space is a region */
-	}
-
-	for (i = 0; i < device_info.num_irqs; i++) {
-		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
-
-		irq.index = i;
-
-		ioctl(device, VFIO_DEVICE_GET_IRQ_INFO, &irq);
-
-		/* Setup IRQs... eventfds, VFIO_DEVICE_SET_IRQS */
-	}
-
-	/* Gratuitous device reset and go... */
-	ioctl(device, VFIO_DEVICE_RESET);
-
-VFIO User API
--------------------------------------------------------------------------------
-
-Please see include/linux/vfio.h for complete API documentation.
-
-VFIO bus driver API
--------------------------------------------------------------------------------
-
-VFIO bus drivers, such as vfio-pci, make use of only a few interfaces
-into VFIO core.  When devices are bound and unbound to the driver,
-the driver should call vfio_add_group_dev() and vfio_del_group_dev()
-respectively::
-
-	extern int vfio_add_group_dev(struct device *dev,
-				      const struct vfio_device_ops *ops,
-				      void *device_data);
-
-	extern void *vfio_del_group_dev(struct device *dev);
-
-vfio_add_group_dev() indicates to the core to begin tracking the
-iommu_group of the specified dev and register the dev as owned by
-a VFIO bus driver.  The driver provides an ops structure for callbacks
-similar to a file operations structure::
-
-	struct vfio_device_ops {
-		int	(*open)(void *device_data);
-		void	(*release)(void *device_data);
-		ssize_t	(*read)(void *device_data, char __user *buf,
-				size_t count, loff_t *ppos);
-		ssize_t	(*write)(void *device_data, const char __user *buf,
-				 size_t size, loff_t *ppos);
-		long	(*ioctl)(void *device_data, unsigned int cmd,
-				 unsigned long arg);
-		int	(*mmap)(void *device_data, struct vm_area_struct *vma);
-	};
-
-Each function is passed the device_data that was originally registered
-in the vfio_add_group_dev() call above.  This allows the bus driver
-an easy place to store its opaque, private data.  The open/release
-callbacks are issued when a new file descriptor is created for a
-device (via VFIO_GROUP_GET_DEVICE_FD).  The ioctl interface provides
-a direct pass through for VFIO_DEVICE_* ioctls.  The read/write/mmap
-interfaces implement the device region access defined by the device's
-own VFIO_DEVICE_GET_REGION_INFO ioctl.
-
-
-PPC64 sPAPR implementation note
--------------------------------
-
-This implementation has some specifics:
-
-1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
-   container is supported as an IOMMU table is allocated at the boot time,
-   one table per IOMMU group which is a Partitionable Endpoint (PE)
-   (PE is often a PCI domain but not always).
-
-   Newer systems (POWER8 with IODA2) have an improved hardware design which
-   removes this limitation and allows multiple IOMMU groups per VFIO
-   container.
-
-2) The hardware supports so called DMA windows - the PCI address range
-   within which DMA transfer is allowed, any attempt to access address space
-   out of the window leads to the whole PE isolation.
-
-3) PPC64 guests are paravirtualized but not fully emulated. There is an API
-   to map/unmap pages for DMA, and it normally maps 1..32 pages per call and
-   currently there is no way to reduce the number of calls. In order to make
-   things faster, the map/unmap handling has been implemented in real mode
-   which provides excellent performance but has limitations such as the
-   inability to do locked pages accounting in real time.
-
-4) According to the sPAPR specification, a Partitionable Endpoint (PE) is an I/O
-   subtree that can be treated as a unit for the purposes of partitioning and
-   error recovery. A PE may be a single or multi-function IOA (IO Adapter), a
-   function of a multi-function IOA, or multiple IOAs (possibly including
-   switch and bridge structures above the multiple IOAs). PPC64 guests detect
-   PCI errors and recover from them via EEH RTAS services, which work on the
-   basis of additional ioctl commands.
-
-   So 4 additional ioctls have been added:
-
-	VFIO_IOMMU_SPAPR_TCE_GET_INFO
-		returns the size and the start of the DMA window on the PCI bus.
-
-	VFIO_IOMMU_ENABLE
-		enables the container. The locked pages accounting
-		is done at this point. This lets the user first learn what
-		the DMA window is and adjust rlimit before doing any real work.
-
-	VFIO_IOMMU_DISABLE
-		disables the container.
-
-	VFIO_EEH_PE_OP
-		provides an API for EEH setup, error detection and recovery.
-
-   The code flow from the example above should be slightly changed::
-
-	struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 };
-
-	.....
-	/* Add the group to the container */
-	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
-
-	/* Enable the IOMMU model we want */
-	ioctl(container, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
-
-	/* Get additional sPAPR IOMMU info */
-	struct vfio_iommu_spapr_tce_info spapr_iommu_info = { .argsz = sizeof(spapr_iommu_info) };
-	ioctl(container, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &spapr_iommu_info);
-
-	if (ioctl(container, VFIO_IOMMU_ENABLE))
-		/* Cannot enable container, may be low rlimit */
-
-	/* Allocate some space and setup a DMA mapping */
-	dma_map.vaddr = mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
-			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
-
-	dma_map.size = 1024 * 1024;
-	dma_map.iova = 0; /* 1MB starting at 0x0 from device view */
-	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
-
-	/* Check here that .iova/.size are within the DMA window from spapr_iommu_info */
-	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
-
-	/* Get a file descriptor for the device */
-	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
-
-	....
-
-	/* Gratuitous device reset and go... */
-	ioctl(device, VFIO_DEVICE_RESET);
-
-	/* Make sure EEH is supported */
-	ioctl(container, VFIO_CHECK_EXTENSION, VFIO_EEH);
-
-	/* Enable the EEH functionality on the device */
-	pe_op.op = VFIO_EEH_PE_ENABLE;
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-
-	/* It is suggested to create an additional data structure to represent
-	 * the PE, and put child devices belonging to the same IOMMU group into
-	 * the PE instance for later reference.
-	 */
-
-	/* Check the PE's state and make sure it's in functional state */
-	pe_op.op = VFIO_EEH_PE_GET_STATE;
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-
-	/* Save device state using pci_save_state().
-	 * EEH should be enabled on the specified device.
-	 */
-
-	....
-
-	/* Inject EEH error, which is expected to be caused by 32-bit
-	 * config load.
-	 */
-	pe_op.op = VFIO_EEH_PE_INJECT_ERR;
-	pe_op.err.type = EEH_ERR_TYPE_32;
-	pe_op.err.func = EEH_ERR_FUNC_LD_CFG_ADDR;
-	pe_op.err.addr = 0ul;
-	pe_op.err.mask = 0ul;
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-
-	....
-
-	/* When 0xFF's are returned from reading PCI config space or IO BARs
-	 * of the PCI device, check the PE's state to see whether it has been
-	 * frozen.
-	 */
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-
-	/* Wait for pending PCI transactions to complete and don't
-	 * produce any more PCI traffic from/to the affected PE until
-	 * recovery is finished.
-	 */
-
-	/* Enable IO for the affected PE and collect logs. Usually, the
-	 * standard part of PCI config space, AER registers are dumped
-	 * as logs for further analysis.
-	 */
-	pe_op.op = VFIO_EEH_PE_UNFREEZE_IO;
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-
-	/*
-	 * Issue PE reset: hot or fundamental reset. Usually, hot reset
-	 * is enough. However, the firmware of some PCI adapters would
-	 * require fundamental reset.
-	 */
-	pe_op.op = VFIO_EEH_PE_RESET_HOT;
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-	pe_op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-
-	/* Configure the PCI bridges for the affected PE */
-	pe_op.op = VFIO_EEH_PE_CONFIGURE;
-	ioctl(container, VFIO_EEH_PE_OP, &pe_op);
-
-	/* Restore the state we saved at initialization time. pci_restore_state()
-	 * is good enough as an example.
-	 */
-
-	/* Hopefully, the error is recovered successfully. Now, you can resume
-	 * PCI traffic to/from the affected PE.
-	 */
-
-	....
-
-5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
-   VFIO_IOMMU_DISABLE and implements 2 new ioctls:
-   VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
-   (which are unsupported in v1 IOMMU).
-
-   PPC64 paravirtualized guests generate a lot of map/unmap requests,
-   and the handling of those includes pinning/unpinning pages and updating
-   mm::locked_vm counter to make sure we do not exceed the rlimit.
-   The v2 IOMMU splits accounting and pinning into separate operations:
-
-   - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
-     receive a user space address and size of the block to be pinned.
-     Bisecting is not supported and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY is
-     expected to be called with the exact address and size used for
-     registering the memory block. Userspace is not expected to call these often.
-     The ranges are stored in a linked list in a VFIO container.
-
-   - VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
-     IOMMU table and do not do pinning; instead these check that the userspace
-     address is from pre-registered range.
-
-   This separation helps in optimizing DMA for guests.
-
-6) The sPAPR specification allows guests to have additional DMA window(s) on
-   a PCI bus with a variable page size. Two ioctls have been added to support
-   this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE.
-   The platform has to support the functionality or an error will be returned
-   to userspace. The existing hardware supports up to 2 DMA windows: one is
-   2GB long, uses 4K pages and is called the "default 32bit window"; the other
-   can be as big as the entire RAM, may use a different page size, and is
-   optional - guests create it at run-time if the guest driver supports 64bit DMA.
-
-   VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and
-   a number of TCE table levels (needed when a TCE table is going to be so big
-   that the kernel may not be able to allocate enough physically contiguous
-   memory). It creates a new window in the available slot and returns the bus
-   address where the new window starts. Due to hardware limitation, the user
-   space cannot choose the location of DMA windows.
-
-   VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window
-   and removes it.
-
--------------------------------------------------------------------------------
-
-.. [1] VFIO was originally an acronym for "Virtual Function I/O" in its
-   initial implementation by Tom Lyon while at Cisco.  We've since
-   outgrown the acronym, but it's catchy.
-
-.. [2] "safe" also depends upon a device being "well behaved".  It's
-   possible for multi-function devices to have backdoors between
-   functions and even for single function devices to have alternative
-   access to things like PCI config space through MMIO registers.  To
-   guard against the former we can include additional precautions in the
-   IOMMU driver to group multi-function PCI devices together
-   (iommu=group_mf).  The latter we can't prevent, but the IOMMU should
-   still provide isolation.  For PCI, SR-IOV Virtual Functions are the
-   best indicator of "well behaved", as these are designed for
-   virtualization usage models.
-
-.. [3] As always there are trade-offs to virtual machine device
-   assignment that are beyond the scope of VFIO.  It's expected that
-   future IOMMU technologies will reduce some, but maybe not all, of
-   these trade-offs.
-
-.. [4] In this case the device is below a PCI bridge, so transactions
-   from either function of the device are indistinguishable to the iommu::
-
-	-[0000:00]-+-1e.0-[06]--+-0d.0
-				\-0d.1
-
-	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink
index ef2727192d69..94ad4c420828 100644
--- a/Documentation/w1/w1.netlink
+++ b/Documentation/w1/w1.netlink
@@ -183,7 +183,7 @@ acknowledge number is set to seq+1.
 Additional documantion, source code examples.
 ============================================
 
-1. Documentation/connector
+1. Documentation/driver-api/connector.rst
 2. http://www.ioremap.net/archive/w1
 This archive includes userspace application w1d.c which uses
 read/write/search commands for all master/slave devices found on the bus.
diff --git a/Documentation/xillybus.txt b/Documentation/xillybus.txt
deleted file mode 100644
index 2446ee303c09..000000000000
--- a/Documentation/xillybus.txt
+++ /dev/null
@@ -1,379 +0,0 @@
-==========================================
-Xillybus driver for generic FPGA interface
-==========================================
-
-:Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com)
-:Email:  eli.billauer@gmail.com or as advertised on Xillybus' site.
-
-.. Contents:
-
- - Introduction
-  -- Background
-  -- Xillybus Overview
-
- - Usage
-  -- User interface
-  -- Synchronization
-  -- Seekable pipes
-
- - Internals
-  -- Source code organization
-  -- Pipe attributes
-  -- Host never reads from the FPGA
-  -- Channels, pipes, and the message channel
-  -- Data streaming
-  -- Data granularity
-  -- Probing
-  -- Buffer allocation
-  -- The "nonempty" message (supporting poll)
-
-
-Introduction
-============
-
-Background
-----------
-
-An FPGA (Field Programmable Gate Array) is a piece of logic hardware, which
-can be programmed to become virtually anything that is usually found as a
-dedicated chipset: For instance, a display adapter, network interface card,
-or even a processor with its peripherals. FPGAs are the LEGO of hardware:
-Based upon certain building blocks, you make your own toys the way you like
-them. It's usually pointless to reimplement something that is already
-available on the market as a chipset, so FPGAs are mostly used when some
-special functionality is needed, and the production volume is relatively low
-(hence not justifying the development of an ASIC).
-
-The challenge with FPGAs is that everything is implemented at a very low
-level, even lower than assembly language. In order to allow FPGA designers to
-focus on their specific project, and not reinvent the wheel over and over
-again, pre-designed building blocks, IP cores, are often used. These are the
-FPGA parallels of library functions. IP cores may implement certain
-mathematical functions, a functional unit (e.g. a USB interface), an entire
-processor (e.g. ARM) or anything that might come handy. Think of them as a
-building block, with electrical wires dangling on the sides for connection to
-other blocks.
-
-One of the daunting tasks in FPGA design is communicating with a full-blown
-operating system (actually, with the processor running it): Implementing the
-low-level bus protocol and the somewhat higher-level interface with the host
-(registers, interrupts, DMA etc.) is a project in itself. When the FPGA's
-function is a well-known one (e.g. a video adapter card, or a NIC), it can
-make sense to design the FPGA's interface logic specifically for the project.
-A special driver is then written to present the FPGA as a well-known interface
-to the kernel and/or user space. In that case, there is no reason to treat the
-FPGA differently than any device on the bus.
-
-It's however common that the desired data communication doesn't fit any well-
-known peripheral function. Also, the effort of designing an elegant
-abstraction for the data exchange is often considered too big. In those cases,
-a quicker and possibly less elegant solution is sought: The driver is
-effectively written as a user space program, leaving the kernel space part
-with just elementary data transport. This still requires designing some
-interface logic for the FPGA, and writing a simple ad-hoc driver for the kernel.
-
-Xillybus Overview
------------------
-
-Xillybus is an IP core and a Linux driver. Together, they form a kit for
-elementary data transport between an FPGA and the host, providing pipe-like
-data streams with a straightforward user interface. It's intended as a low-
-effort solution for mixed FPGA-host projects, for which it makes sense to
-have the project-specific part of the driver running in a user-space program.
-
-Since the communication requirements may vary significantly from one FPGA
-project to another (the number of data pipes needed in each direction and
-their attributes), there isn't one specific chunk of logic being the Xillybus
-IP core. Rather, the IP core is configured and built based upon a
-specification given by its end user.
-
-Xillybus presents independent data streams, which resemble pipes or TCP/IP
-communication to the user. At the host side, a character device file is used
-just like any pipe file. On the FPGA side, hardware FIFOs are used to stream
-the data. This is contrary to a common method of communicating through fixed-
-sized buffers (even though such buffers are used by Xillybus under the hood).
-There may be more than a hundred of these streams on a single IP core, but
-also no more than one, depending on the configuration.
-
-In order to ease the deployment of the Xillybus IP core, it contains a simple
-data structure which completely defines the core's configuration. The Linux
-driver fetches this data structure during its initialization process, and sets
-up the DMA buffers and character devices accordingly. As a result, a single
-driver is used to work out of the box with any Xillybus IP core.
-
-The data structure just mentioned should not be confused with PCI's
-configuration space or the Flattened Device Tree.
-
-Usage
-=====
-
-User interface
---------------
-
-On the host, all interface with Xillybus is done through /dev/xillybus_*
-device files, which are generated automatically as the driver loads. The
-names of these files depend on the IP core that is loaded in the FPGA (see
-Probing below). To communicate with the FPGA, open the device file that
-corresponds to the hardware FIFO you want to send data to or receive data from,
-and use plain write() or read() calls, just like with a regular pipe. In
-particular, it makes perfect sense to go::
-
-	$ cat mydata > /dev/xillybus_thisfifo
-
-	$ cat /dev/xillybus_thatfifo > hisdata
-
-possibly pressing CTRL-C at some stage, even though the xillybus_* pipes have
-the capability to send an EOF (but may not use it).
-
-The driver and hardware are designed to behave sensibly as pipes, including:
-
-* Supporting non-blocking I/O (by setting O_NONBLOCK on open() ).
-
-* Supporting poll() and select().
-
-* Being bandwidth efficient under load (using DMA) but also handling small
-  pieces of data sent across (like TCP/IP) by autoflushing.
-
-A device file can be read only, write only or bidirectional. Bidirectional
-device files are treated like two independent pipes (except for sharing a
-"channel" structure in the implementation code).
-
-Synchronization
----------------
-
-Xillybus pipes are configured (on the IP core) to be either synchronous or
-asynchronous. For a synchronous pipe, write() returns successfully only after
-some data has been submitted and acknowledged by the FPGA. This slows down
-bulk data transfers, and is nearly impossible for use with streams that
-require data at a constant rate: There is no data transmitted to the FPGA
-between write() calls, in particular when the process loses the CPU.
-
-When a pipe is configured as asynchronous, write() returns as soon as there
-is enough room in the buffers to store any of the data.
-
-For FPGA to host pipes, asynchronous pipes allow data transfer from the FPGA
-as soon as the respective device file is opened, regardless of if the data
-has been requested by a read() call. On synchronous pipes, only the amount
-of data requested by a read() call is transmitted.
-
-In summary, for synchronous pipes, data between the host and FPGA is
-transmitted only to satisfy the read() or write() call currently handled
-by the driver, and those calls wait for the transmission to complete before
-returning.
-
-Note that the synchronization attribute has nothing to do with the possibility
-that read() or write() completes fewer bytes than requested. There is a
-separate configuration flag ("allowpartial") that determines whether such a
-partial completion is allowed.
-
-Seekable pipes
---------------
-
-A synchronous pipe can be configured to have the stream's position exposed
-to the user logic at the FPGA. Such a pipe is also seekable on the host API.
-With this feature, a memory or register interface can be attached on the
-FPGA side to the seekable stream. Reading or writing to a certain address in
-the attached memory is done by seeking to the desired address, and calling
-read() or write() as required.
-
-
-Internals
-=========
-
-Source code organization
-------------------------
-
-The Xillybus driver consists of a core module, xillybus_core.c, and modules
-that depend on the specific bus interface (xillybus_of.c and xillybus_pcie.c).
-
-The bus specific modules are those probed when a suitable device is found by
-the kernel. Since the DMA mapping and synchronization functions, which are bus
-dependent by their nature, are used by the core module, a
-xilly_endpoint_hardware structure is passed to the core module on
-initialization. This structure is populated with pointers to wrapper functions
-which execute the DMA-related operations on the bus.
-
-Pipe attributes
----------------
-
-Each pipe has a number of attributes which are set when the FPGA component
-(IP core) is built. They are fetched from the IDT (the data structure which
-defines the core's configuration, see Probing below) by xilly_setupchannels()
-in xillybus_core.c as follows:
-
-* is_writebuf: The pipe's direction. A non-zero value means it's an FPGA to
-  host pipe (the FPGA "writes").
-
-* channelnum: The pipe's identification number in communication between the
-  host and FPGA.
-
-* format: The underlying data width. See Data Granularity below.
-
-* allowpartial: A non-zero value means that a read() or write() (whichever
-  applies) may return with less than the requested number of bytes. The common
-  choice is a non-zero value, to match standard UNIX behavior.
-
-* synchronous: A non-zero value means that the pipe is synchronous. See
-  Synchronization above.
-
-* bufsize: Each DMA buffer's size. Always a power of two.
-
-* bufnum: The number of buffers allocated for this pipe. Always a power of two.
-
-* exclusive_open: A non-zero value forces exclusive opening of the associated
-  device file. If the device file is bidirectional, and already opened only in
-  one direction, the opposite direction may be opened once.
-
-* seekable: A non-zero value indicates that the pipe is seekable. See
-  Seekable pipes above.
-
-* supports_nonempty: A non-zero value (which is typical) indicates that the
-  hardware will send the messages that are necessary to support select() and
-  poll() for this pipe.
-
-Host never reads from the FPGA
-------------------------------
-
-Even though PCI Express is hotpluggable in general, a typical motherboard
-doesn't expect a card to go away all of a sudden. But since the PCIe card
-is based upon reprogrammable logic, a sudden disappearance from the bus is
-quite likely as a result of an accidental reprogramming of the FPGA while the
-host is up. In practice, nothing happens immediately in such a situation. But
-if the host attempts to read from an address that is mapped to the PCI Express
-device, that leads to an immediate freeze of the system on some motherboards,
-even though the PCIe standard requires a graceful recovery.
-
-In order to avoid these freezes, the Xillybus driver refrains completely from
-reading from the device's register space. All communication from the FPGA to
-the host is done through DMA. In particular, the Interrupt Service Routine
-doesn't follow the common practice of checking a status register when it's
-invoked. Rather, the FPGA prepares a small buffer which contains short
-messages, which inform the host what the interrupt was about.
-
-This mechanism is used on non-PCIe buses as well for the sake of uniformity.
-
-
-Channels, pipes, and the message channel
-----------------------------------------
-
-Each of the (possibly bidirectional) pipes presented to the user is allocated
-a data channel between the FPGA and the host. The distinction between channels
-and pipes is necessary only because of channel 0, which is used for interrupt-
-related messages from the FPGA, and has no pipe attached to it.
-
-Data streaming
---------------
-
-Even though a non-segmented data stream is presented to the user at both
-sides, the implementation relies on a set of DMA buffers which is allocated
-for each channel. For the sake of illustration, let's take the FPGA to host
-direction: As data streams into the respective channel's interface in the
-FPGA, the Xillybus IP core writes it to one of the DMA buffers. When the
-buffer is full, the FPGA informs the host about that (appending a
-XILLYMSG_OPCODE_RELEASEBUF message to channel 0 and sending an interrupt if
-necessary). The host responds by making the data available for reading through
-the character device. When all data has been read, the host writes to
-the FPGA's buffer control register, allowing the buffer's overwriting. Flow
-control mechanisms exist on both sides to prevent underflows and overflows.
-
-This is not good enough for creating a TCP/IP-like stream: If the data flow
-stops momentarily before a DMA buffer is filled, the intuitive expectation is
-that the partial data in the buffer will arrive anyhow, despite the buffer not
-being completed. This is implemented by adding a field in the
-XILLYMSG_OPCODE_RELEASEBUF message, through which the FPGA informs not just
-which buffer is submitted, but how much data it contains.
-
-But the FPGA will submit a partially filled buffer only if directed to do so
-by the host. This situation occurs when the read() method has been blocking
-for XILLY_RX_TIMEOUT jiffies (currently 10 ms), after which the host commands
-the FPGA to submit a DMA buffer as soon as it can. This timeout mechanism
-balances between bus bandwidth efficiency (preventing a lot of partially
-filled buffers being sent) and a latency held fairly low for tails of data.
-
-A similar setting is used in the host to FPGA direction. The handling of
-partial DMA buffers is somewhat different, though. The user can tell the
-driver to submit all data it has in the buffers to the FPGA, by issuing a
-write() with the byte count set to zero. This is similar to a flush request,
-but it doesn't block. There is also an autoflushing mechanism, which triggers
-an equivalent flush roughly XILLY_RX_TIMEOUT jiffies after the last write().
-This allows the user to be oblivious about the underlying buffering mechanism
-and yet enjoy a stream-like interface.
-
-Note that the issue of partial buffer flushing is irrelevant for pipes having
-the "synchronous" attribute nonzero, since synchronous pipes don't allow data
-to lie around in the DMA buffers between read() and write() anyhow.
-
-Data granularity
-----------------
-
-The data arrives or is sent at the FPGA as 8, 16 or 32 bit wide words, as
-configured by the "format" attribute. Whenever possible, the driver attempts
-to hide this when the pipe is accessed differently from its natural alignment.
-For example, reading single bytes from a pipe with 32 bit granularity works
-with no issues. Writing single bytes to pipes with 16 or 32 bit granularity
-will also work, but the driver can't send partially completed words to the
-FPGA, so the transmission of up to one word may be held until it's fully
-occupied with user data.
-
-This somewhat complicates the handling of host to FPGA streams, because
-when a buffer is flushed, it may contain up to 3 bytes that don't form a word in
-the FPGA, and hence can't be sent. To prevent loss of data, these leftover
-bytes need to be moved to the next buffer. The parts in xillybus_core.c
-that mention "leftovers" in some way are related to this complication.
-
-Probing
--------
-
-As mentioned earlier, the number of pipes that are created when the driver
-loads and their attributes depend on the Xillybus IP core in the FPGA. During
-the driver's initialization, a blob containing configuration info, the
-Interface Description Table (IDT), is sent from the FPGA to the host. The
-bootstrap process is done in three phases:
-
-1. Acquire the length of the IDT, so a buffer can be allocated for it. This
-   is done by sending a quiesce command to the device, since the acknowledge
-   for this command contains the IDT's buffer length.
-
-2. Acquire the IDT itself.
-
-3. Create the interfaces according to the IDT.
-
-Buffer allocation
------------------
-
-In order to simplify the logic that prevents illegal boundary crossings of
-PCIe packets, the following rule applies: If a buffer is smaller than 4kB,
-it must not cross a 4kB boundary. Otherwise, it must be 4kB aligned. The
-xilly_setupchannels() function allocates these buffers by requesting whole
-pages from the kernel, and dividing them into DMA buffers as necessary. Since
-all buffers' sizes are powers of two, it's possible to pack any set of such
-buffers, with a maximal waste of one page of memory.
-
-All buffers are allocated when the driver is loaded. This is necessary,
-since large contiguous physical memory segments are sometimes requested,
-which are more likely to be available when the system is freshly booted.
-
-The allocation of buffer memory takes place in the same order they appear in
-the IDT. The driver relies on a rule that the pipes are sorted with decreasing
-buffer size in the IDT. If a requested buffer is larger than or equal to a page,
-the necessary number of pages is requested from the kernel, and these are
-used for this buffer. If the requested buffer is smaller than a page, one
-single page is requested from the kernel, and that page is partially used.
-Or, if there already is a partially used page at hand, the buffer is packed
-into that page. It can be shown that all pages requested from the kernel
-(except possibly for the last) are 100% utilized this way.
-
-The "nonempty" message (supporting poll)
-----------------------------------------
-
-In order to support the "poll" method (and hence select() ), there is a small
-catch regarding the FPGA to host direction: The FPGA may have filled a DMA
-buffer with some data, but not submitted that buffer. If the host waited for
-the buffer's submission by the FPGA, there would be a possibility that the
-FPGA side has sent data, but a select() call would still block, because the
-host has not received any notification about this. This is solved with
-XILLYMSG_OPCODE_NONEMPTY messages sent by the FPGA when a channel goes from
-completely empty to containing some data.
-
-These messages are used only to support poll() and select(). The IP core can
-be configured not to send them for a slight reduction of bandwidth.
diff --git a/Documentation/zorro.txt b/Documentation/zorro.txt
deleted file mode 100644
index 664072b017e3..000000000000
--- a/Documentation/zorro.txt
+++ /dev/null
@@ -1,104 +0,0 @@
-========================================
-Writing Device Drivers for Zorro Devices
-========================================
-
-:Author: Written by Geert Uytterhoeven <geert@linux-m68k.org>
-:Last revised: September 5, 2003
-
-
-Introduction
-------------
-
-The Zorro bus is the bus used in the Amiga family of computers. Thanks to
-AutoConfig(tm), it's 100% Plug-and-Play.
-
-There are two types of Zorro buses, Zorro II and Zorro III:
-
-  - The Zorro II address space is 24-bit and lies within the first 16 MB of the
-    Amiga's address map.
-
-  - Zorro III is a 32-bit extension of Zorro II, which is backwards compatible
-    with Zorro II. The Zorro III address space lies outside the first 16 MB.
-
-
-Probing for Zorro Devices
--------------------------
-
-Zorro devices are found by calling ``zorro_find_device()``, which returns a
-pointer to the ``next`` Zorro device with the specified Zorro ID. A probe loop
-for the board with Zorro ID ``ZORRO_PROD_xxx`` looks like::
-
-    struct zorro_dev *z = NULL;
-
-    while ((z = zorro_find_device(ZORRO_PROD_xxx, z))) {
-	if (!zorro_request_region(z->resource.start+MY_START, MY_SIZE,
-				  "My explanation"))
-	...
-    }
-
-``ZORRO_WILDCARD`` acts as a wildcard and finds any Zorro device. If your driver
-supports different types of boards, you can use a construct like::
-
-    struct zorro_dev *z = NULL;
-
-    while ((z = zorro_find_device(ZORRO_WILDCARD, z))) {
-	if (z->id != ZORRO_PROD_xxx1 && z->id != ZORRO_PROD_xxx2 && ...)
-	    continue;
-	if (!zorro_request_region(z->resource.start+MY_START, MY_SIZE,
-				  "My explanation"))
-	...
-    }
-
-
-Zorro Resources
----------------
-
-Before you can access a Zorro device's registers, you have to make sure it's
-not yet in use. This is done using the I/O memory space resource management
-functions::
-
-    request_mem_region()
-    release_mem_region()
-
-Shortcuts to claim the whole device's address space are provided as well::
-
-    zorro_request_device
-    zorro_release_device
-
-
-Accessing the Zorro Address Space
----------------------------------
-
-The address regions in the Zorro device resources are Zorro bus address
-regions. Due to the identity bus-physical address mapping on the Zorro bus,
-they are CPU physical addresses as well.
-
-The treatment of these regions depends on the type of Zorro space:
-
-  - Zorro II address space is always mapped and does not have to be mapped
-    explicitly using z_ioremap().
-    
-    Conversion from bus/physical Zorro II addresses to kernel virtual addresses
-    and vice versa is done using::
-
-	virt_addr = ZTWO_VADDR(bus_addr);
-	bus_addr = ZTWO_PADDR(virt_addr);
-
-  - Zorro III address space must be mapped explicitly using z_ioremap() first
-    before it can be accessed::
- 
-	virt_addr = z_ioremap(bus_addr, size);
-	...
-	z_iounmap(virt_addr);
-
-
-References
-----------
-
-#. linux/include/linux/zorro.h
-#. linux/include/uapi/linux/zorro.h
-#. linux/include/uapi/linux/zorro_ids.h
-#. linux/arch/m68k/include/asm/zorro.h
-#. linux/drivers/zorro
-#. /proc/bus/zorro
-
diff --git a/MAINTAINERS b/MAINTAINERS
index 570572627fd1..d1a0a817dd92 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4642,7 +4642,7 @@ DELL SYSTEMS MANAGEMENT BASE DRIVER (dcdbas)
 M:	Stuart Hayes <stuart.w.hayes@gmail.com>
 L:	platform-driver-x86@vger.kernel.org
 S:	Maintained
-F:	Documentation/dcdbas.txt
+F:	Documentation/driver-api/dcdbas.rst
 F:	drivers/platform/x86/dcdbas.*
 
 DELL WMI NOTIFICATIONS DRIVER
@@ -8462,7 +8462,7 @@ F:	drivers/irqchip/
 ISA
 M:	William Breathitt Gray <vilhelm.gray@gmail.com>
 S:	Maintained
-F:	Documentation/isa.txt
+F:	Documentation/driver-api/isa.rst
 F:	drivers/base/isa.c
 F:	include/linux/isa.h
 
@@ -8477,7 +8477,7 @@ F:	drivers/media/radio/radio-isa*
 ISAPNP
 M:	Jaroslav Kysela <perex@perex.cz>
 S:	Maintained
-F:	Documentation/isapnp.txt
+F:	Documentation/driver-api/isapnp.rst
 F:	drivers/pnp/isapnp/
 F:	include/linux/isapnp.h
 
@@ -10353,7 +10353,7 @@ M:	Johannes Thumshirn <morbidrsa@gmail.com>
 S:	Maintained
 F:	drivers/mcb/
 F:	include/linux/mcb.h
-F:	Documentation/men-chameleon-bus.txt
+F:	Documentation/driver-api/men-chameleon-bus.rst
 
 MEN F21BMC (Board Management Controller)
 M:	Andreas Werner <andreas.werner@men.de>
@@ -12070,7 +12070,7 @@ F:	drivers/parport/
 F:	include/linux/parport*.h
 F:	drivers/char/ppdev.c
 F:	include/uapi/linux/ppdev.h
-F:	Documentation/parport*.txt
+F:	Documentation/driver-api/parport*.rst
 
 PARAVIRT_OPS INTERFACE
 M:	Juergen Gross <jgross@suse.com>
@@ -12245,7 +12245,7 @@ M:	Kurt Schwemmer <kurt.schwemmer@microsemi.com>
 M:	Logan Gunthorpe <logang@deltatee.com>
 L:	linux-pci@vger.kernel.org
 S:	Maintained
-F:	Documentation/switchtec.txt
+F:	Documentation/driver-api/switchtec.rst
 F:	Documentation/ABI/testing/sysfs-class-switchtec
 F:	drivers/pci/switch/switchtec*
 F:	include/uapi/linux/switchtec_ioctl.h
@@ -13006,7 +13006,7 @@ M:	Thierry Reding <thierry.reding@gmail.com>
 L:	linux-pwm@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm.git
-F:	Documentation/pwm.txt
+F:	Documentation/driver-api/pwm.rst
 F:	Documentation/devicetree/bindings/pwm/
 F:	include/linux/pwm.h
 F:	drivers/pwm/
@@ -13620,7 +13620,7 @@ W:	http://wireless.kernel.org/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git
 S:	Maintained
-F:	Documentation/rfkill.txt
+F:	Documentation/driver-api/rfkill.rst
 F:	Documentation/ABI/stable/sysfs-class-rfkill
 F:	net/rfkill/
 F:	include/linux/rfkill.h
@@ -15343,7 +15343,7 @@ F:	drivers/dma-buf/dma-fence*
 F:	drivers/dma-buf/sw_sync.c
 F:	include/linux/sync_file.h
 F:	include/uapi/linux/sync_file.h
-F:	Documentation/sync_file.txt
+F:	Documentation/driver-api/sync_file.rst
 T:	git git://anongit.freedesktop.org/drm/drm-misc
 
 SYNOPSYS ARC ARCHITECTURE
@@ -16839,7 +16839,7 @@ R:	Cornelia Huck <cohuck@redhat.com>
 L:	kvm@vger.kernel.org
 T:	git git://github.com/awilliam/linux-vfio.git
 S:	Maintained
-F:	Documentation/vfio.txt
+F:	Documentation/driver-api/vfio.rst
 F:	drivers/vfio/
 F:	include/linux/vfio.h
 F:	include/uapi/linux/vfio.h
@@ -16848,7 +16848,7 @@ VFIO MEDIATED DEVICE DRIVERS
 M:	Kirti Wankhede <kwankhede@nvidia.com>
 L:	kvm@vger.kernel.org
 S:	Maintained
-F:	Documentation/vfio-mediated-device.txt
+F:	Documentation/driver-api/vfio-mediated-device.rst
 F:	drivers/vfio/mdev/
 F:	include/linux/mdev.h
 F:	samples/vfio-mdev/
diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index d5f915830b68..b6a9c2f1bc41 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -15,7 +15,7 @@ config SYNC_FILE
 	  associated with a buffer. When a job is submitted to the GPU a fence
 	  is attached to the buffer and is transferred via userspace, using Sync
 	  Files fds, to the DRM driver for example. More details at
-	  Documentation/sync_file.txt.
+	  Documentation/driver-api/sync_file.rst.
 
 config SW_SYNC
 	bool "Sync File Validation Framework"
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index e4fee216d5a4..079cca438466 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -1301,7 +1301,7 @@ config GPIO_BT8XX
 	  The card needs to be physically altered for using it as a
 	  GPIO card. For more information on how to build a GPIO card
 	  from a BT8xx TV card, see the documentation file at
-	  Documentation/bt8xxgpio.txt
+	  Documentation/driver-api/bt8xxgpio.rst
 
 	  If unsure, say N.
 
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e20e2956f620..9f49de00777e 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -141,7 +141,7 @@ config DRM_LOAD_EDID_FIRMWARE
 	  monitor are unable to provide appropriate EDID data. Since this
 	  feature is provided as a workaround for broken hardware, the
 	  default case is N. Details and instructions how to build your own
-	  EDID data are given in Documentation/EDID/howto.rst.
+	  EDID data are given in Documentation/driver-api/edid.rst.
 
 config DRM_DP_CEC
 	bool "Enable DisplayPort CEC-Tunneling-over-AUX HDMI support"
diff --git a/drivers/pci/switch/Kconfig b/drivers/pci/switch/Kconfig
index aee28a5bb98f..d370f4ce0492 100644
--- a/drivers/pci/switch/Kconfig
+++ b/drivers/pci/switch/Kconfig
@@ -9,7 +9,7 @@ config PCI_SW_SWITCHTEC
 	 Enables support for the management interface for the MicroSemi
 	 Switchtec series of PCIe switches. Supports userspace access
 	 to submit MRPC commands to the switch via /dev/switchtecX
-	 devices. See <file:Documentation/switchtec.txt> for more
+	 devices. See <file:Documentation/driver-api/switchtec.rst> for more
 	 information.
 
 endmenu
diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
index 5f580580a8e0..1b67bb578f9f 100644
--- a/drivers/platform/x86/Kconfig
+++ b/drivers/platform/x86/Kconfig
@@ -118,7 +118,7 @@ config DCDBAS
 	  Interrupts (SMIs) and Host Control Actions (system power cycle or
 	  power off after OS shutdown) on certain Dell systems.
 
-	  See <file:Documentation/dcdbas.txt> for more details on the driver
+	  See <file:Documentation/driver-api/dcdbas.rst> for more details on the driver
 	  and the Dell systems on which Dell systems management software makes
 	  use of this driver.
 
@@ -259,7 +259,7 @@ config DELL_RBU
 	 DELL system. Note you need a Dell OpenManage or Dell Update package (DUP)
 	 supporting application to communicate with the BIOS regarding the new
 	 image for the image update to take effect.
-	 See <file:Documentation/dell_rbu.txt> for more details on the driver.
+	 See <file:Documentation/driver-api/dell_rbu.rst> for more details on the driver.
 
 
 config FUJITSU_LAPTOP
diff --git a/drivers/platform/x86/dcdbas.c b/drivers/platform/x86/dcdbas.c
index 12cf9475ac85..84f4cc839cc3 100644
--- a/drivers/platform/x86/dcdbas.c
+++ b/drivers/platform/x86/dcdbas.c
@@ -7,7 +7,7 @@
  *  and Host Control Actions (power cycle or power off after OS shutdown) on
  *  Dell systems.
  *
- *  See Documentation/dcdbas.txt for more information.
+ *  See Documentation/driver-api/dcdbas.rst for more information.
  *
  *  Copyright (C) 1995-2006 Dell Inc.
  */
diff --git a/drivers/platform/x86/dell_rbu.c b/drivers/platform/x86/dell_rbu.c
index a58fc10293ee..3691391fea6b 100644
--- a/drivers/platform/x86/dell_rbu.c
+++ b/drivers/platform/x86/dell_rbu.c
@@ -24,7 +24,7 @@
  * on every time the packet data is written. This driver requires an
  * application to break the BIOS image in to fixed sized packet chunks.
  *
- * See Documentation/dell_rbu.txt for more info.
+ * See Documentation/driver-api/dell_rbu.rst for more info.
  */
 #include <linux/init.h>
 #include <linux/module.h>
diff --git a/drivers/pnp/isapnp/Kconfig b/drivers/pnp/isapnp/Kconfig
index 4b58a3dcb52b..d0479a563123 100644
--- a/drivers/pnp/isapnp/Kconfig
+++ b/drivers/pnp/isapnp/Kconfig
@@ -7,6 +7,6 @@ config ISAPNP
 	depends on ISA || COMPILE_TEST
 	help
 	  Say Y here if you would like support for ISA Plug and Play devices.
-	  Some information is in <file:Documentation/isapnp.txt>.
+	  Some information is in <file:Documentation/driver-api/isapnp.rst>.
 
 	  If unsure, say Y.
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index 1cb50f19d58c..ee51b9514225 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -93,7 +93,7 @@ config VT_HW_CONSOLE_BINDING
          select the console driver that will serve as the backend for the
          virtual terminals.
 
-	 See <file:Documentation/console/console.rst> for more
+	 See <file:Documentation/driver-api/console.rst> for more
 	 information. For framebuffer console users, please refer to
 	 <file:Documentation/fb/fbcon.rst>.
 
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index e5a7a454fe17..fd17db9b432f 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -25,7 +25,7 @@ menuconfig VFIO
 	select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
 	help
 	  VFIO provides a framework for secure userspace device drivers.
-	  See Documentation/vfio.txt for more details.
+	  See Documentation/driver-api/vfio.rst for more details.
 
 	  If you don't know what to do here, say N.
 
diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index ba94a076887f..5da27f2100f9 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -6,7 +6,7 @@ config VFIO_MDEV
 	default n
 	help
 	  Provides a framework to virtualize devices.
-	  See Documentation/vfio-mediated-device.txt for more details.
+	  See Documentation/driver-api/vfio-mediated-device.rst for more details.
 
 	  If you don't know what do here, say N.
 
diff --git a/drivers/w1/Kconfig b/drivers/w1/Kconfig
index 160053c0baea..3e7ad7b232fe 100644
--- a/drivers/w1/Kconfig
+++ b/drivers/w1/Kconfig
@@ -19,7 +19,7 @@ config W1_CON
 	default y
 	---help---
 	  This allows to communicate with userspace using connector. For more
-	  information see <file:Documentation/connector/connector.rst>.
+	  information see <file:Documentation/driver-api/connector.rst>.
 	  There are three types of messages between w1 core and userspace:
 	  1. Events. They are generated each time new master or slave device found
 		either due to automatic or requested search.
diff --git a/samples/Kconfig b/samples/Kconfig
index 155da47dc6a4..c8dacb4dda80 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -99,7 +99,7 @@ config SAMPLE_CONNECTOR
 	  When enabled, this builds both a sample kernel module for
 	  the connector interface and a user space tool to communicate
 	  with it.
-	  See also Documentation/connector/connector.rst
+	  See also Documentation/driver-api/connector.rst
 
 config SAMPLE_HIDRAW
 	bool "hidraw sample"
-- 
cgit v1.2.3-55-g7522


From fb8c5327b3c6c78b74a27a3c42e4f32b2cc30a04 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 13 Jun 2019 14:40:42 -0300
Subject: docs: driver-api: add xilinx driver API documentation

The current file there (eemi) provides a description of
the driver uAPI and kAPI.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/driver-api/index.rst        |  1 +
 Documentation/driver-api/xilinx/eemi.rst  | 67 +++++++++++++++++++++++++++++++
 Documentation/driver-api/xilinx/index.rst | 16 ++++++++
 Documentation/xilinx/eemi.rst             | 67 -------------------------------
 Documentation/xilinx/index.rst            | 17 --------
 5 files changed, 84 insertions(+), 84 deletions(-)
 create mode 100644 Documentation/driver-api/xilinx/eemi.rst
 create mode 100644 Documentation/driver-api/xilinx/index.rst
 delete mode 100644 Documentation/xilinx/eemi.rst
 delete mode 100644 Documentation/xilinx/index.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index d1c6513dd20d..77322753c1bc 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -93,6 +93,7 @@ available subsections can be seen below.
    sync_file
    vfio-mediated-device
    vfio
+   xilinx/index
    xillybus
    zorro
 
diff --git a/Documentation/driver-api/xilinx/eemi.rst b/Documentation/driver-api/xilinx/eemi.rst
new file mode 100644
index 000000000000..9dcbc6f18d75
--- /dev/null
+++ b/Documentation/driver-api/xilinx/eemi.rst
@@ -0,0 +1,67 @@
+====================================
+Xilinx Zynq MPSoC EEMI Documentation
+====================================
+
+Xilinx Zynq MPSoC Firmware Interface
+-------------------------------------
+The zynqmp-firmware node describes the interface to platform firmware.
+ZynqMP has an interface to communicate with secure firmware. The firmware
+driver provides an interface to firmware APIs. The interface APIs can be
+used by any driver to communicate with the PMC (Platform Management Controller).
+
+Embedded Energy Management Interface (EEMI)
+----------------------------------------------
+The embedded energy management interface is used to allow software
+components running across different processing clusters on a chip or
+device to communicate with a power management controller (PMC) on a
+device to issue or respond to power management requests.
+
+EEMI ops is a structure containing all EEMI APIs supported by Zynq MPSoC.
+The zynqmp-firmware driver maintains all EEMI APIs in the zynqmp_eemi_ops
+structure. Any driver that wants to communicate with the PMC using EEMI APIs
+can call zynqmp_pm_get_eemi_ops().
+
+Example of EEMI ops::
+
+	/* The zynqmp-firmware driver maintains all EEMI APIs */
+	struct zynqmp_eemi_ops {
+		int (*get_api_version)(u32 *version);
+		int (*query_data)(struct zynqmp_pm_query_data qdata, u32 *out);
+	};
+
+	static const struct zynqmp_eemi_ops eemi_ops = {
+		.get_api_version = zynqmp_pm_get_api_version,
+		.query_data = zynqmp_pm_query_data,
+	};
+
+Example of EEMI ops usage::
+
+	static const struct zynqmp_eemi_ops *eemi_ops;
+	u32 ret_payload[PAYLOAD_ARG_CNT];
+	int ret;
+
+	eemi_ops = zynqmp_pm_get_eemi_ops();
+	if (IS_ERR(eemi_ops))
+		return PTR_ERR(eemi_ops);
+
+	ret = eemi_ops->query_data(qdata, ret_payload);
+
+IOCTL
+------
+The IOCTL API is for device control and configuration. It is not a system
+IOCTL but an EEMI API. This API can be used by a master to control
+any device-specific configuration. IOCTL definitions can be platform
+specific. This API also manages shared device configuration.
+
+The following IOCTL IDs are valid for device control:
+- IOCTL_SET_PLL_FRAC_MODE	8
+- IOCTL_GET_PLL_FRAC_MODE	9
+- IOCTL_SET_PLL_FRAC_DATA	10
+- IOCTL_GET_PLL_FRAC_DATA	11
+
+Refer to the EEMI API guide [0] for IOCTL-specific parameters and other EEMI APIs.
+
+References
+----------
+[0] Embedded Energy Management Interface (EEMI) API guide:
+    https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf
diff --git a/Documentation/driver-api/xilinx/index.rst b/Documentation/driver-api/xilinx/index.rst
new file mode 100644
index 000000000000..13f7589ed442
--- /dev/null
+++ b/Documentation/driver-api/xilinx/index.rst
@@ -0,0 +1,16 @@
+
+===========
+Xilinx FPGA
+===========
+
+.. toctree::
+    :maxdepth: 1
+
+    eemi
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/xilinx/eemi.rst b/Documentation/xilinx/eemi.rst
deleted file mode 100644
index 9dcbc6f18d75..000000000000
--- a/Documentation/xilinx/eemi.rst
+++ /dev/null
@@ -1,67 +0,0 @@
-====================================
-Xilinx Zynq MPSoC EEMI Documentation
-====================================
-
-Xilinx Zynq MPSoC Firmware Interface
--------------------------------------
-The zynqmp-firmware node describes the interface to platform firmware.
-ZynqMP has an interface to communicate with secure firmware. Firmware
-driver provides an interface to firmware APIs. Interface APIs can be
-used by any driver to communicate with PMC(Platform Management Controller).
-
-Embedded Energy Management Interface (EEMI)
-----------------------------------------------
-The embedded energy management interface is used to allow software
-components running across different processing clusters on a chip or
-device to communicate with a power management controller (PMC) on a
-device to issue or respond to power management requests.
-
-EEMI ops is a structure containing all eemi APIs supported by Zynq MPSoC.
-The zynqmp-firmware driver maintain all EEMI APIs in zynqmp_eemi_ops
-structure. Any driver who want to communicate with PMC using EEMI APIs
-can call zynqmp_pm_get_eemi_ops().
-
-Example of EEMI ops::
-
-	/* zynqmp-firmware driver maintain all EEMI APIs */
-	struct zynqmp_eemi_ops {
-		int (*get_api_version)(u32 *version);
-		int (*query_data)(struct zynqmp_pm_query_data qdata, u32 *out);
-	};
-
-	static const struct zynqmp_eemi_ops eemi_ops = {
-		.get_api_version = zynqmp_pm_get_api_version,
-		.query_data = zynqmp_pm_query_data,
-	};
-
-Example of EEMI ops usage::
-
-	static const struct zynqmp_eemi_ops *eemi_ops;
-	u32 ret_payload[PAYLOAD_ARG_CNT];
-	int ret;
-
-	eemi_ops = zynqmp_pm_get_eemi_ops();
-	if (IS_ERR(eemi_ops))
-		return PTR_ERR(eemi_ops);
-
-	ret = eemi_ops->query_data(qdata, ret_payload);
-
-IOCTL
-------
-IOCTL API is for device control and configuration. It is not a system
-IOCTL but it is an EEMI API. This API can be used by master to control
-any device specific configuration. IOCTL definitions can be platform
-specific. This API also manage shared device configuration.
-
-The following IOCTL IDs are valid for device control:
-- IOCTL_SET_PLL_FRAC_MODE	8
-- IOCTL_GET_PLL_FRAC_MODE	9
-- IOCTL_SET_PLL_FRAC_DATA	10
-- IOCTL_GET_PLL_FRAC_DATA	11
-
-Refer EEMI API guide [0] for IOCTL specific parameters and other EEMI APIs.
-
-References
-----------
-[0] Embedded Energy Management Interface (EEMI) API guide:
-    https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf
diff --git a/Documentation/xilinx/index.rst b/Documentation/xilinx/index.rst
deleted file mode 100644
index 01cc1a0714df..000000000000
--- a/Documentation/xilinx/index.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-:orphan:
-
-===========
-Xilinx FPGA
-===========
-
-.. toctree::
-    :maxdepth: 1
-
-    eemi
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
-- 
cgit v1.2.3-55-g7522


From c92992fc609fe99d926855eb1945f38ef4ad8e6c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Mon, 22 Apr 2019 16:49:11 -0300
Subject: docs: driver-api: add remaining converted dirs to it

There are a number of driver-specific descriptions that contain a
mix of userspace and kernelspace documentation. Just like we did
with other similar subsystems, add them at the driver-api
groupset, but don't move the directories.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/driver-api/index.rst | 2 ++
 Documentation/driver-api/pps.rst   | 2 +-
 Documentation/driver-api/ptp.rst   | 2 +-
 Documentation/index.rst            | 3 +++
 Documentation/mic/index.rst        | 2 --
 Documentation/phy/samsung-usb2.rst | 2 --
 Documentation/scheduler/index.rst  | 2 --
 7 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 77322753c1bc..1dde9692075c 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -83,6 +83,8 @@ available subsections can be seen below.
    ntb
    nvmem
    parport-lowlevel
+   pps
+   ptp
    pti_intel_mid
    pwm
    rfkill
diff --git a/Documentation/driver-api/pps.rst b/Documentation/driver-api/pps.rst
index 1456d2c32ebd..2d6b99766ee8 100644
--- a/Documentation/driver-api/pps.rst
+++ b/Documentation/driver-api/pps.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ======================
 PPS - Pulse Per Second
diff --git a/Documentation/driver-api/ptp.rst b/Documentation/driver-api/ptp.rst
index b6e65d66d37a..a15192e32347 100644
--- a/Documentation/driver-api/ptp.rst
+++ b/Documentation/driver-api/ptp.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ===========================================
 PTP hardware clock infrastructure for Linux
diff --git a/Documentation/index.rst b/Documentation/index.rst
index dcdaaff71633..041ffe442960 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -110,6 +110,9 @@ needed).
    bpf/index
    usb/index
    misc-devices/index
+   mic/index
+   phy/samsung-usb2
+   scheduler/index
 
 Architecture-specific documentation
 -----------------------------------
diff --git a/Documentation/mic/index.rst b/Documentation/mic/index.rst
index 082fa8f6a260..3a8d06367ef1 100644
--- a/Documentation/mic/index.rst
+++ b/Documentation/mic/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 =============================================
 Intel Many Integrated Core (MIC) architecture
 =============================================
diff --git a/Documentation/phy/samsung-usb2.rst b/Documentation/phy/samsung-usb2.rst
index 98b5952fcb97..c48c8b9797b9 100644
--- a/Documentation/phy/samsung-usb2.rst
+++ b/Documentation/phy/samsung-usb2.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ====================================
 Samsung USB 2.0 PHY adaptation layer
 ====================================
diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 058be77a4c34..69074e5de9c4 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
 ===============
 Linux Scheduler
 ===============
-- 
cgit v1.2.3-55-g7522


From 65388dad1bbb51a4eb6cc91b9fa865b57646fb67 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 27 Jun 2019 16:31:35 -0300
Subject: docs: serial: move it to the driver-api

The contents of this directory are mostly driver-api material.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/driver-api/index.rst                 |   1 +
 Documentation/driver-api/serial/cyclades_z.rst     |  11 +
 Documentation/driver-api/serial/driver.rst         | 549 ++++++++++++++++++
 Documentation/driver-api/serial/index.rst          |  32 ++
 Documentation/driver-api/serial/moxa-smartio.rst   | 615 +++++++++++++++++++++
 Documentation/driver-api/serial/n_gsm.rst          | 103 ++++
 Documentation/driver-api/serial/rocket.rst         | 185 +++++++
 Documentation/driver-api/serial/serial-iso7816.rst |  90 +++
 Documentation/driver-api/serial/serial-rs485.rst   | 103 ++++
 Documentation/driver-api/serial/tty.rst            | 328 +++++++++++
 Documentation/serial/cyclades_z.rst                |  11 -
 Documentation/serial/driver.rst                    | 549 ------------------
 Documentation/serial/index.rst                     |  32 --
 Documentation/serial/moxa-smartio.rst              | 615 ---------------------
 Documentation/serial/n_gsm.rst                     | 103 ----
 Documentation/serial/rocket.rst                    | 185 -------
 Documentation/serial/serial-iso7816.rst            |  90 ---
 Documentation/serial/serial-rs485.rst              | 103 ----
 Documentation/serial/tty.rst                       | 328 -----------
 MAINTAINERS                                        |   6 +-
 drivers/tty/Kconfig                                |   4 +-
 drivers/tty/serial/ucc_uart.c                      |   2 +-
 include/linux/serial_core.h                        |   2 +-
 23 files changed, 2024 insertions(+), 2023 deletions(-)
 create mode 100644 Documentation/driver-api/serial/cyclades_z.rst
 create mode 100644 Documentation/driver-api/serial/driver.rst
 create mode 100644 Documentation/driver-api/serial/index.rst
 create mode 100644 Documentation/driver-api/serial/moxa-smartio.rst
 create mode 100644 Documentation/driver-api/serial/n_gsm.rst
 create mode 100644 Documentation/driver-api/serial/rocket.rst
 create mode 100644 Documentation/driver-api/serial/serial-iso7816.rst
 create mode 100644 Documentation/driver-api/serial/serial-rs485.rst
 create mode 100644 Documentation/driver-api/serial/tty.rst
 delete mode 100644 Documentation/serial/cyclades_z.rst
 delete mode 100644 Documentation/serial/driver.rst
 delete mode 100644 Documentation/serial/index.rst
 delete mode 100644 Documentation/serial/moxa-smartio.rst
 delete mode 100644 Documentation/serial/n_gsm.rst
 delete mode 100644 Documentation/serial/rocket.rst
 delete mode 100644 Documentation/serial/serial-iso7816.rst
 delete mode 100644 Documentation/serial/serial-rs485.rst
 delete mode 100644 Documentation/serial/tty.rst

diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 1dde9692075c..cf39b8f9d0f9 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -88,6 +88,7 @@ available subsections can be seen below.
    pti_intel_mid
    pwm
    rfkill
+   serial/index
    sgi-ioc4
    sm501
    smsc_ece1099
diff --git a/Documentation/driver-api/serial/cyclades_z.rst b/Documentation/driver-api/serial/cyclades_z.rst
new file mode 100644
index 000000000000..532ff67e2f1c
--- /dev/null
+++ b/Documentation/driver-api/serial/cyclades_z.rst
@@ -0,0 +1,11 @@
+================
+Cyclades-Z notes
+================
+
+The Cyclades-Z must have firmware loaded onto the card before it will
+operate.  This operation should be performed during system startup.
+
+The firmware, loader program and the latest device driver code are
+available from Cyclades at
+
+    ftp://ftp.cyclades.com/pub/cyclades/cyclades-z/linux/
diff --git a/Documentation/driver-api/serial/driver.rst b/Documentation/driver-api/serial/driver.rst
new file mode 100644
index 000000000000..31bd4e16fb1f
--- /dev/null
+++ b/Documentation/driver-api/serial/driver.rst
@@ -0,0 +1,549 @@
+====================
+Low Level Serial API
+====================
+
+
+This document is meant as a brief overview of some aspects of the new serial
+driver.  It is not complete; any questions you have should be directed to
+<rmk@arm.linux.org.uk>.
+
+The reference implementation is contained within amba-pl011.c.
+
+
+
+Low Level Serial Hardware Driver
+--------------------------------
+
+The low level serial hardware driver is responsible for supplying port
+information (defined by uart_port) and a set of control methods (defined
+by uart_ops) to the core serial driver.  The low level driver is also
+responsible for handling interrupts for the port, and providing any
+console support.
+
+
+Console Support
+---------------
+
+The serial core provides a few helper functions.  This includes identifying
+the correct port structure (via uart_get_console) and decoding command line
+arguments (uart_parse_options).
+
+There is also a helper function (uart_console_write) which performs a
+character by character write, translating newlines to CRLF sequences.
+Driver writers are recommended to use this function rather than implementing
+their own version.
+
+
+Locking
+-------
+
+It is the responsibility of the low level hardware driver to perform the
+necessary locking using port->lock.  There are some exceptions (which
+are described in the uart_ops listing below).
+
+There are two locks: a per-port spinlock and an overall semaphore.
+
+From the core driver perspective, the port->lock locks the following
+data::
+
+	port->mctrl
+	port->icount
+	port->state->xmit.head (circ_buf->head)
+	port->state->xmit.tail (circ_buf->tail)
+
+The low level driver is free to use this lock to provide any additional
+locking.
+
+The port_sem semaphore is used to protect against ports being added/
+removed or reconfigured at inappropriate times. Since v2.6.27, this
+semaphore has been the 'mutex' member of the tty_port struct, and is
+commonly referred to as the port mutex.
+
+
+uart_ops
+--------
+
+The uart_ops structure is the main interface between serial_core and the
+hardware specific driver.  It contains all the methods to control the
+hardware.
+
+  tx_empty(port)
+	This function tests whether the transmitter fifo and shifter
+	for the port described by 'port' are empty.  If they are empty,
+	this function should return TIOCSER_TEMT, otherwise return 0.
+	If the port does not support this operation, then it should
+	return TIOCSER_TEMT.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+	This call must not sleep
+
+  set_mctrl(port, mctrl)
+	This function sets the modem control lines for port described
+	by 'port' to the state described by mctrl.  The relevant bits
+	of mctrl are:
+
+		- TIOCM_RTS	RTS signal.
+		- TIOCM_DTR	DTR signal.
+		- TIOCM_OUT1	OUT1 signal.
+		- TIOCM_OUT2	OUT2 signal.
+		- TIOCM_LOOP	Set the port into loopback mode.
+
+	If the appropriate bit is set, the signal should be driven
+	active.  If the bit is clear, the signal should be driven
+	inactive.
+
+	Locking: port->lock taken.
+
+	Interrupts: locally disabled.
+
+	This call must not sleep
+
+  get_mctrl(port)
+	Returns the current state of modem control inputs.  The state
+	of the outputs should not be returned, since the core keeps
+	track of their state.  The state information should include:
+
+		- TIOCM_CAR	state of DCD signal
+		- TIOCM_CTS	state of CTS signal
+		- TIOCM_DSR	state of DSR signal
+		- TIOCM_RI	state of RI signal
+
+	The bit is set if the signal is currently driven active.  If
+	the port does not support CTS, DCD or DSR, the driver should
+	indicate that the signal is permanently active.  If RI is
+	not available, the signal should not be indicated as active.
+
+	Locking: port->lock taken.
+
+	Interrupts: locally disabled.
+
+	This call must not sleep
+
+  stop_tx(port)
+	Stop transmitting characters.  This might be due to the CTS
+	line becoming inactive or the tty layer indicating we want
+	to stop transmission due to an XOFF character.
+
+	The driver should stop transmitting characters as soon as
+	possible.
+
+	Locking: port->lock taken.
+
+	Interrupts: locally disabled.
+
+	This call must not sleep
+
+  start_tx(port)
+	Start transmitting characters.
+
+	Locking: port->lock taken.
+
+	Interrupts: locally disabled.
+
+	This call must not sleep
+
+  throttle(port)
+	Notify the serial driver that input buffers for the line discipline are
+	close to full, and it should somehow signal that no more characters
+	should be sent to the serial port.
+	This will be called only if hardware assisted flow control is enabled.
+
+	Locking: serialized with .unthrottle() and termios modification by the
+	tty layer.
+
+  unthrottle(port)
+	Notify the serial driver that characters can now be sent to the serial
+	port without fear of overrunning the input buffers of the line
+	disciplines.
+
+	This will be called only if hardware assisted flow control is enabled.
+
+	Locking: serialized with .throttle() and termios modification by the
+	tty layer.
+
+  send_xchar(port,ch)
+	Transmit a high priority character, even if the port is stopped.
+	This is used to implement XON/XOFF flow control and tcflow().  If
+	the serial driver does not implement this function, the tty core
+	will append the character to the circular buffer and then call
+	start_tx() / stop_tx() to flush the data out.
+
+	Do not transmit if ch == '\0' (__DISABLED_CHAR).
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  stop_rx(port)
+	Stop receiving characters; the port is in the process of
+	being closed.
+
+	Locking: port->lock taken.
+
+	Interrupts: locally disabled.
+
+	This call must not sleep
+
+  enable_ms(port)
+	Enable the modem status interrupts.
+
+	This method may be called multiple times.  Modem status
+	interrupts should be disabled when the shutdown method is
+	called.
+
+	Locking: port->lock taken.
+
+	Interrupts: locally disabled.
+
+	This call must not sleep
+
+  break_ctl(port,ctl)
+	Control the transmission of a break signal.  If ctl is
+	nonzero, the break signal should be transmitted.  The signal
+	should be terminated when another call is made with a zero
+	ctl.
+
+	Locking: caller holds tty_port->mutex
+
+  startup(port)
+	Grab any interrupt resources and initialise any low level driver
+	state.  Enable the port for reception.  It should not activate
+	RTS nor DTR; this will be done via a separate call to set_mctrl.
+
+	This method will only be called when the port is initially opened.
+
+	Locking: port_sem taken.
+
+	Interrupts: globally disabled.
+
+  shutdown(port)
+	Disable the port, disable any break condition that may be in
+	effect, and free any interrupt resources.  It should not disable
+	RTS nor DTR; this will have already been done via a separate
+	call to set_mctrl.
+
+	Drivers must not access port->state once this call has completed.
+
+	This method will only be called when there are no more users of
+	this port.
+
+	Locking: port_sem taken.
+
+	Interrupts: caller dependent.
+
+  flush_buffer(port)
+	Flush any write buffers, reset any DMA state and stop any
+	ongoing DMA transfers.
+
+	This will be called whenever the port->state->xmit circular
+	buffer is cleared.
+
+	Locking: port->lock taken.
+
+	Interrupts: locally disabled.
+
+	This call must not sleep
+
+  set_termios(port,termios,oldtermios)
+	Change the port parameters, including word length, parity, stop
+	bits.  Update read_status_mask and ignore_status_mask to indicate
+	the types of events we are interested in receiving.  Relevant
+	termios->c_cflag bits are:
+
+		CSIZE
+			- word size
+		CSTOPB
+			- 2 stop bits
+		PARENB
+			- parity enable
+		PARODD
+			- odd parity (when PARENB is in force)
+		CREAD
+			- enable reception of characters (if not set,
+			  still receive characters from the port, but
+			  throw them away).
+		CRTSCTS
+			- if set, enable CTS status change reporting
+		CLOCAL
+			- if not set, enable modem status change
+			  reporting.
+
+	Relevant termios->c_iflag bits are:
+
+		INPCK
+			- enable frame and parity error events to be
+			  passed to the TTY layer.
+		BRKINT / PARMRK
+			- both of these enable break events to be
+			  passed to the TTY layer.
+
+		IGNPAR
+			- ignore parity and framing errors
+		IGNBRK
+			- ignore break errors.  If IGNPAR is also
+			  set, ignore overrun errors as well.
+
+	The interaction of the iflag bits is as follows (parity error
+	given as an example):
+
+	=============== ======= ======  =============================
+	Parity error	INPCK	IGNPAR
+	=============== ======= ======  =============================
+	n/a		0	n/a	character received, marked as
+					TTY_NORMAL
+	None		1	n/a	character received, marked as
+					TTY_NORMAL
+	Yes		1	0	character received, marked as
+					TTY_PARITY
+	Yes		1	1	character discarded
+	=============== ======= ======  =============================
+
+	Other flags may be used (eg, xon/xoff characters) if your
+	hardware supports hardware "soft" flow control.
+
+	Locking: caller holds tty_port->mutex
+
+	Interrupts: caller dependent.
+
+	This call must not sleep
+
+  set_ldisc(port,termios)
+	Notifier for discipline change. See Documentation/driver-api/serial/tty.rst.
+
+	Locking: caller holds tty_port->mutex
+
+  pm(port,state,oldstate)
+	Perform any power management related activities on the specified
+	port.  State indicates the new state (defined by
+	enum uart_pm_state), oldstate indicates the previous state.
+
+	This function should not be used to grab any resources.
+
+	This will be called when the port is initially opened and finally
+	closed, except when the port is also the system console.  This
+	will occur even if CONFIG_PM is not set.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  type(port)
+	Return a pointer to a string constant describing the specified
+	port, or return NULL, in which case the string 'unknown' is
+	substituted.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  release_port(port)
+	Release any memory and IO region resources currently in use by
+	the port.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  request_port(port)
+	Request any memory and IO region resources required by the port.
+	If any fail, no resources should be registered when this function
+	returns, and it should return -EBUSY on failure.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  config_port(port,type)
+	Perform any autoconfiguration steps required for the port.  `type`
+	contains a bit mask of the required configuration.  UART_CONFIG_TYPE
+	indicates that the port requires detection and identification.
+	port->type should be set to the type found, or PORT_UNKNOWN if
+	no port was detected.
+
+	UART_CONFIG_IRQ indicates autoconfiguration of the interrupt signal,
+	which should be probed using standard kernel autoprobing techniques.
+	This is not necessary on platforms where ports have interrupts
+	internally hard wired (eg, system on a chip implementations).
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  verify_port(port,serinfo)
+	Verify the new serial port information contained within serinfo is
+	suitable for this port type.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  ioctl(port,cmd,arg)
+	Perform any port specific IOCTLs.  IOCTL commands must be defined
+	using the standard numbering system found in <asm/ioctl.h>.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+  poll_init(port)
+	Called by kgdb to perform the minimal hardware initialization needed
+	to support poll_put_char() and poll_get_char().  Unlike ->startup()
+	this should not request interrupts.
+
+	Locking: tty_mutex and tty_port->mutex taken.
+
+	Interrupts: n/a.
+
+  poll_put_char(port,ch)
+	Called by kgdb to write a single character directly to the serial
+	port.  It can and should block until there is space in the TX FIFO.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+	This call must not sleep
+
+  poll_get_char(port)
+	Called by kgdb to read a single character directly from the serial
+	port.  If data is available, it should be returned; otherwise
+	the function should return NO_POLL_CHAR immediately.
+
+	Locking: none.
+
+	Interrupts: caller dependent.
+
+	This call must not sleep
+
+Other functions
+---------------
+
+uart_update_timeout(port,cflag,baud)
+	Update the FIFO drain timeout, port->timeout, according to the
+	number of bits, parity, stop bits and baud rate.
+
+	Locking: caller is expected to take port->lock
+
+	Interrupts: n/a
+
+uart_get_baud_rate(port,termios,old,min,max)
+	Return the numeric baud rate for the specified termios, taking
+	account of the special 38400 baud "kludge".  The B0 baud rate
+	is mapped to 9600 baud.
+
+	If the baud rate is not within min..max, then if old is non-NULL,
+	the original baud rate will be tried.  If that exceeds the
+	min..max constraint, 9600 baud will be returned.  termios will
+	be updated to the baud rate in use.
+
+	Note: min..max must always allow 9600 baud to be selected.
+
+	Locking: caller dependent.
+
+	Interrupts: n/a
+
+uart_get_divisor(port,baud)
+	Return the divisor (baud_base / baud) for the specified baud
+	rate, appropriately rounded.
+
+	If 38400 baud and custom divisor is selected, return the
+	custom divisor instead.
+
+	Locking: caller dependent.
+
+	Interrupts: n/a
+
+uart_match_port(port1,port2)
+	This utility function can be used to determine whether two
+	uart_port structures describe the same port.
+
+	Locking: n/a
+
+	Interrupts: n/a
+
+uart_write_wakeup(port)
+	A driver is expected to call this function when the number of
+	characters in the transmit buffer has dropped below a threshold.
+
+	Locking: port->lock should be held.
+
+	Interrupts: n/a
+
+uart_register_driver(drv)
+	Register a uart driver with the core driver.  We in turn register
+	with the tty layer, and initialise the core driver per-port state.
+
+	drv->port should be NULL, and the per-port structures should be
+	registered using uart_add_one_port after this call has succeeded.
+
+	Locking: none
+
+	Interrupts: enabled
+
+uart_unregister_driver()
+	Remove all references to a driver from the core driver.  The low
+	level driver must have removed all its ports via the
+	uart_remove_one_port() if it registered them with uart_add_one_port().
+
+	Locking: none
+
+	Interrupts: enabled
+
+**uart_suspend_port()**
+
+**uart_resume_port()**
+
+**uart_add_one_port()**
+
+**uart_remove_one_port()**
+
+Other notes
+-----------
+
+It is intended some day to drop the 'unused' entries from uart_port, and
+allow low level drivers to register their own individual uart_port's with
+the core.  This will allow drivers to use uart_port as a pointer to a
+structure containing both the uart_port entry with their own extensions,
+thus::
+
+	struct my_port {
+		struct uart_port	port;
+		int			my_stuff;
+	};
+
+Modem control lines via GPIO
+----------------------------
+
+Some helpers are provided in order to set/get modem control lines via GPIO.
+
+mctrl_gpio_init(port, idx):
+	This will get the {cts,rts,...}-gpios from device tree if they are
+	present and request them, set direction etc, and return an
+	allocated structure. `devm_*` functions are used, so there's no need
+	to call mctrl_gpio_free().
+	As this sets up the irq handling, make sure not to also handle
+	changes to the gpio input lines in your driver.
+
+mctrl_gpio_free(dev, gpios):
+	This will free the gpios requested in mctrl_gpio_init().
+	As `devm_*` functions are used, there's generally no need to call
+	this function.
+
+mctrl_gpio_to_gpiod(gpios, gidx):
+	This returns the gpio_desc structure associated with the modem line
+	index.
+
+mctrl_gpio_set(gpios, mctrl):
+	This will set the gpios according to the mctrl state.
+
+mctrl_gpio_get(gpios, mctrl):
+	This will update mctrl with the gpios values.
+
+mctrl_gpio_enable_ms(gpios):
+	Enables irqs and handling of changes to the ms lines.
+
+mctrl_gpio_disable_ms(gpios):
+	Disables irqs and handling of changes to the ms lines.
diff --git a/Documentation/driver-api/serial/index.rst b/Documentation/driver-api/serial/index.rst
new file mode 100644
index 000000000000..33ad10d05b26
--- /dev/null
+++ b/Documentation/driver-api/serial/index.rst
@@ -0,0 +1,32 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+Support for Serial devices
+==========================
+
+.. toctree::
+    :maxdepth: 1
+
+
+    driver
+    tty
+
+Serial drivers
+==============
+
+.. toctree::
+    :maxdepth: 1
+
+    cyclades_z
+    moxa-smartio
+    n_gsm
+    rocket
+    serial-iso7816
+    serial-rs485
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/serial/moxa-smartio.rst b/Documentation/driver-api/serial/moxa-smartio.rst
new file mode 100644
index 000000000000..156100f17c3f
--- /dev/null
+++ b/Documentation/driver-api/serial/moxa-smartio.rst
@@ -0,0 +1,615 @@
+=============================================================
+MOXA Smartio/Industio Family Device Driver Installation Guide
+=============================================================
+
+.. note::
+
+   This file is outdated. It needs to be updated to cover
+   kernel 5.0 and later.
+
+Copyright (C) 2008, Moxa Inc.
+
+Date: 01/21/2008
+
+.. Content
+
+   1. Introduction
+   2. System Requirement
+   3. Installation
+      3.1 Hardware installation
+      3.2 Driver files
+      3.3 Device naming convention
+      3.4 Module driver configuration
+      3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x.
+      3.6 Custom configuration
+      3.7 Verify driver installation
+   4. Utilities
+   5. Setserial
+   6. Troubleshooting
+
+1. Introduction
+^^^^^^^^^^^^^^^
+
+   The Smartio/Industio/UPCI family Linux driver supports the following
+   multiport boards.
+
+    - 2 ports multiport board
+	CP-102U, CP-102UL, CP-102UF
+	CP-132U-I, CP-132UL,
+	CP-132, CP-132I, CP132S, CP-132IS,
+	CI-132, CI-132I, CI-132IS,
+	(C102H, C102HI, C102HIS, C102P, CP-102, CP-102S)
+
+    - 4 ports multiport board
+	CP-104EL,
+	CP-104UL, CP-104JU,
+	CP-134U, CP-134U-I,
+	C104H/PCI, C104HS/PCI,
+	CP-114, CP-114I, CP-114S, CP-114IS, CP-114UL,
+	C104H, C104HS,
+	CI-104J, CI-104JS,
+	CI-134, CI-134I, CI-134IS,
+	(C114HI, CT-114I, C104P),
+	POS-104UL,
+	CB-114,
+	CB-134I
+
+    - 8 ports multiport board
+	CP-118EL, CP-168EL,
+	CP-118U, CP-168U,
+	C168H/PCI,
+	C168H, C168HS,
+	(C168P),
+	CB-108
+
+   This driver and installation procedure were developed on Linux kernels
+   2.4.x and 2.6.x. The driver supports the Intel x86 hardware platform. To
+   maintain compatibility, this version has also been tested with RedHat,
+   Mandrake, Fedora and S.u.S.E Linux. If a compatibility problem occurs,
+   please contact Moxa at support@moxa.com.tw.
+
+   In addition to the device driver, useful utilities are also provided in
+   this version. They are:
+
+    - msdiag
+                 Diagnostic program for displaying installed Moxa
+                 Smartio/Industio boards.
+    - msmon
+                 Monitor program to observe data count and line status signals.
+    - msterm
+                 Simple terminal program, useful for testing serial ports.
+    - io-irq.exe
+                 Configuration program to set up ISA boards. Please note that
+                 this program can only be executed under DOS.
+
+   All the drivers and utilities in this version are published as source
+   code under the GNU General Public License. Please refer to the GNU General
+   Public License notice in each source code file for more detail.
+
+   You can always find the latest driver on Moxa's web site, http://www.moxa.com/.
+
+   This version of the driver can be installed as a loadable module (module
+   driver) or built into the kernel (static driver). Refer to the
+   installation procedure below that suits your case. Before you install the
+   driver, please refer to the hardware installation procedure in the User's Manual.
+
+   We assume the user is familiar with the following documents.
+
+   - Serial-HOWTO
+   - Kernel-HOWTO
+
+2. System Requirement
+^^^^^^^^^^^^^^^^^^^^^
+
+   - Hardware platform: Intel x86 machine
+   - Kernel version: 2.4.x or 2.6.x
+   - gcc version 2.72 or later
+   - A maximum of 4 boards can be installed in combination
+
+3. Installation
+^^^^^^^^^^^^^^^
+
+3.1 Hardware installation
+=========================
+
+   Smartio/Industio family multiport boards come in two bus types,
+   ISA and PCI.
+
+ISA board
+---------
+
+   You'll have to configure the CAP address, I/O address, Interrupt Vector
+   and IRQ before installing this driver. Please refer to the hardware
+   installation procedure in the User's Manual before proceeding any further.
+   Please make sure JP1 is open after the ISA board is set up properly.
+
+PCI/UPCI board
+--------------
+
+   You may need to adjust IRQ usage in the BIOS to avoid IRQ conflicts
+   with other ISA devices. Please refer to the hardware installation
+   procedure in the User's Manual in advance.
+
+PCI IRQ Sharing
+---------------
+
+   Each port within the same multiport board shares the same IRQ. Up to
+   4 Moxa Smartio/Industio PCI Family multiport boards can be installed
+   together on one system and they can share the same IRQ.
+
+
+3.2 Driver files
+================
+
+   The driver file may be obtained from ftp, CD-ROM or floppy disk. In
+   any case, the first step is to copy the driver file "mxser.tgz" into a
+   directory of your choice, e.g. /moxa. Then execute the commands below::
+
+       # cd /
+       # mkdir moxa
+       # cd /moxa
+       # tar xvf /dev/fd0
+
+or::
+
+       # cd /
+       # mkdir moxa
+       # cd /moxa
+       # cp /mnt/cdrom/<driver directory>/mxser.tgz .
+       # tar xvfz mxser.tgz
+
+
+3.3 Device naming convention
+============================
+
+   You will find all the driver and utility files in /moxa/mxser.
+   The installation procedure that follows depends on how you'd like to
+   run the driver. If you prefer the module driver, please refer to 3.4.
+   If the static driver is required, please refer to 3.5.
+
+Dialin and callout port
+-----------------------
+
+   This driver retains traditional serial device properties. There are
+   two special file names for each serial port. One is the dial-in port,
+   which is named "ttyMxx". For the callout port, the naming convention
+   is "cumxx".
+
+Device naming when more than 2 boards installed
+-----------------------------------------------
+
+   Naming convention for each Smartio/Industio multiport board is
+   pre-defined as below.
+
+   ============ ===============       ==============
+   Board Num.	 Dial-in Port	      Callout port
+   1st board	ttyM0  - ttyM7	      cum0  - cum7
+   2nd board	ttyM8  - ttyM15       cum8  - cum15
+   3rd board	ttyM16 - ttyM23       cum16 - cum23
+   4th board	ttyM24 - ttyM31       cum24 - cum31
+   ============ ===============       ==============
+
+.. note::
+
+   Under kernel 2.6 and later, the cum device is obsolete, so use the
+   ttyM* device instead.
+
+Board sequence
+--------------
+
+   This driver activates ISA boards according to the parameters set
+   in the driver. After all specified ISA boards are activated, PCI
+   boards in the system are detected and driven automatically.
+   Therefore the board number is sorted by the CAP address of ISA boards.
+   PCI boards come after ISA boards in the sequence, and C168H/PCI
+   boards have higher priority than C104H/PCI boards.
+
+3.4 Module driver configuration
+===============================
+
+   The module driver is the easiest way to install. If you prefer static
+   driver installation, please skip this section.
+
+
+   ------------- Prepare to use the MOXA driver --------------------
+
+3.4.1 Create tty device with correct major number
+-------------------------------------------------
+
+   Before using the MOXA driver, your system must have tty devices
+   created with the driver's major number. We offer a shell
+   script, "msmknod", to simplify the procedure.
+   This step only needs to be executed once, but you
+   need to repeat it when:
+
+   a. You change the driver's major number. Please refer to section
+      "3.6".
+   b. The total number of installed MOXA boards changes, for example
+      when you add or remove a MOXA board.
+   c. You want to change the tty name. This requires modifying the
+      shell script "msmknod".
+
+   The procedure is::
+
+	 # cd /moxa/mxser/driver
+	 # ./msmknod
+
+   This shell script asks for the major numbers of the dial-in
+   device and callout device in order to create the tty devices. You
+   also need to specify the total number of installed MOXA boards. The
+   default major numbers for the dial-in and callout devices are 30 and
+   35. If you need to change them, please refer to section "3.6"
+   for a more detailed procedure.
+   Msmknod will delete any special files that use the same device
+   names.
+
+3.4.2 Build the MOXA driver and utilities
+-----------------------------------------
+
+   Before using the MOXA driver and utilities, you need to compile
+   all the source code. This step only needs to be executed once,
+   but you must re-compile the source code if you modify it.
+   For example, if you change the driver's major number (see
+   section "3.6"), you need to do this step again.
+
+   Find "Makefile" in /moxa/mxser, then run::
+
+	 # make clean; make install
+
+   .. note::
+
+      For Red Hat 9, Red Hat Enterprise Linux AS3/ES3/WS3 & Fedora Core1,
+      run ``make clean; make installsp1``.
+
+      For Red Hat Enterprise Linux AS4/ES4/WS4,
+      run ``make clean; make installsp2``.
+
+   The driver files "mxser.o" and utilities will be properly compiled
+   and copied to system directories respectively.
+
+------------- Load MOXA driver--------------------
+
+3.4.3 Load the MOXA driver
+--------------------------
+
+   ::
+
+	 # modprobe mxser <argument>
+
+   will activate the module driver. You may run "lsmod" to check
+   if "mxser" is activated. If the MOXA board is an ISA board, the
+   <argument> is required. Please refer to section "3.4.5" for more
+   information.
+
+------------- Load MOXA driver on boot --------------------
+
+3.4.4 Load the mxser driver
+---------------------------
+
+
+   As described above, you may manually execute
+   "modprobe mxser" to activate this driver and run
+   "rmmod mxser" to remove it.
+
+   However, it's better to have a boot time configuration to
+   eliminate manual operation. Boot time configuration can be
+   achieved with an rc file. We offer an "rc.mxser" file under
+   "/moxa/mxser/driver" to simplify the procedure.
+
+   If you use an ISA board, please modify the "modprobe ..." command
+   to add the argument (see section "3.4.5"). After modifying
+   rc.mxser, please execute "/moxa/mxser/driver/rc.mxser"
+   manually to make sure the modification is correct. If any error
+   is encountered, please modify the file again. Once the modification
+   is complete, follow the steps below.
+
+   Run the following commands to set up the rc files::
+
+	 # cd /moxa/mxser/driver
+	 # cp ./rc.mxser /etc/rc.d
+	 # cd /etc/rc.d
+
+   Check whether "rc.serial" exists. If "rc.serial" doesn't exist,
+   create it with vi and run "chmod 755 rc.serial" to set the permissions.
+
+   Add "/etc/rc.d/rc.mxser" as the last line of "rc.serial".
+
+   Reboot and use the "lsmod" command to check that "mxser" is activated.
+
+3.4.5 Specify CAP address
+-------------------------
+
+   If you'd like to drive Smartio/Industio ISA boards in the system,
+   you'll have to add a parameter to specify the CAP address of each
+   board when loading "mxser.o". The format of the parameter is
+   as follows::
+
+           modprobe mxser ioaddr=0x???,0x???,0x???,0x???
+                                   |     |     |     |
+                                   |     |     |     +- 4th ISA board
+                                   |     |     +------- 3rd ISA board
+                                   |     +------------- 2nd ISA board
+                                   +------------------- 1st ISA board
+
+3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x
+================================================================
+
+    Note:
+          To use the static driver, you must install the Linux kernel
+          source package.
+
+3.5.1 Backup the built-in driver in the kernel
+----------------------------------------------
+
+    ::
+
+       # cd /usr/src/linux/drivers/char
+       # mv mxser.c mxser.c.old
+
+       For Red Hat 7.x user, you need to create link:
+       # cd /usr/src
+       # ln -s linux-2.4 linux
+
+3.5.2 Create link
+-----------------
+    ::
+
+	  # cd /usr/src/linux/drivers/char
+	  # ln -s /moxa/mxser/driver/mxser.c mxser.c
+
+3.5.3 Add CAP address list for ISA boards.
+------------------------------------------
+
+    PCI board users may skip this step.
+
+    In module mode, the CAP address for an ISA board is given by a
+    parameter. In static driver configuration, you have to
+    assign it within the driver's source code. If you will not
+    install any ISA boards, you may skip to the next section.
+    The instructions to modify the driver source code are as
+    follows.
+
+    a. Run::
+
+	# cd /moxa/mxser/driver
+	# vi mxser.c
+
+    b. Find the array mxserBoardCAP[] as below::
+
+	  static int mxserBoardCAP[] = {0x00, 0x00, 0x00, 0x00};
+
+    c. Change the address within this array using vi. For
+       example, to drive 2 ISA boards with CAP addresses
+       0x280 and 0x180 as the 1st and 2nd boards, change
+       the source code as follows::
+
+	  static int mxserBoardCAP[] = {0x280, 0x180, 0x00, 0x00};
+
+3.5.4 Setup kernel configuration
+--------------------------------
+
+    Configure the kernel::
+
+      # cd /usr/src/linux
+      # make menuconfig
+
+    You will go into a menu-driven system. Please select [Character
+    devices][Non-standard serial port support], enable the [Moxa
+    SmartIO support] driver with "[*]" for built-in (not "[M]"), then
+    select [Exit] to exit this program.
+
+3.5.5 Rebuild kernel
+--------------------
+
+    The following steps for rebuilding the Linux kernel are for your
+    reference only.
+
+    For further details, please refer to the Linux documentation.
+
+        a. Run the following commands::
+
+             cd /usr/src/linux
+             make clean              # takes a few minutes
+             make dep                # takes a few minutes
+             make bzImage            # probably takes 10-20 minutes
+             make install            # copy boot image to correct position
+
+        b. Please make sure the boot kernel (vmlinuz) is in the
+           correct position.
+        c. If you use the 'lilo' utility, check that the 'image' item
+           in /etc/lilo.conf specifies the path to 'vmlinuz';
+           otherwise you will load the wrong (or old) boot kernel image.
+           After checking /etc/lilo.conf, please run "lilo".
+
+        Note that if "make bzImage" fails with an error, you have to
+        go back to the Linux configuration setup. Type "make menuconfig"
+        in the directory /usr/src/linux.
+
+
+3.5.6 Make tty device and special file
+--------------------------------------
+    ::
+
+       # cd /moxa/mxser/driver
+       # ./msmknod
+
+3.5.7 Make utility
+------------------
+
+    ::
+
+	  # cd /moxa/mxser/utility
+	  # make clean; make install
+
+3.5.8 Reboot
+------------
+
+
+
+3.6 Custom configuration
+========================
+
+    Although this driver provides a default configuration, you
+    can still change the device name and major number. The instructions
+    for changing these parameters are given below.
+
+a. Change Device name
+
+    If you'd like to use other device names instead of the default naming
+    convention, all you have to do is modify the code
+    within the shell script "msmknod". First, open "msmknod"
+    with vi. Locate each line that contains "ttyM" or "cum" and change it
+    to the device name you want. "msmknod" creates the device names
+    you need the next time it is executed.
+
+b. Change Major number
+
+    If major numbers 30 and 35 are already occupied, you have to select
+    2 free major numbers for this driver. The steps to change the
+    major numbers are as follows.
+
+3.6.1 Find free major numbers
+-----------------------------
+
+    In /proc/devices, you may find all the major numbers occupied
+    in the system. Please select 2 major numbers that are available.
+    e.g. 40, 45.
+
+3.6.2 Create special files
+--------------------------
+
+   Run /moxa/mxser/driver/msmknod to create special files with
+   specified major numbers.
+
+3.6.3 Modify driver with new major number
+-----------------------------------------
+
+   Run vi to open /moxa/mxser/driver/mxser.c. Locate the line that
+   contains "MXSERMAJOR". Change the content as below::
+
+	  #define	  MXSERMAJOR		  40
+	  #define	  MXSERCUMAJOR		  45
+
+   3.6.4 Run "make clean; make install" in /moxa/mxser/driver.
+
+3.7 Verify driver installation
+==============================
+
+    You may refer to /var/log/messages to check the latest status
+    log reported by this driver whenever it's activated.
+
+4. Utilities
+^^^^^^^^^^^^
+
+   There are 3 utilities included with this driver: msdiag, msmon and
+   msterm. These 3 utilities are released as source code. They should
+   be compiled into executable files and copied into /usr/bin.
+
+   Before using these utilities, please load the driver (refer to 3.4 and
+   3.5) and make sure you have run the "msmknod" utility.
+
+msdiag - Diagnostic
+===================
+
+   This utility displays the Moxa Smartio/Industio boards found by the
+   driver in the system.
+
+msmon - Port Monitoring
+=======================
+
+   This utility gives the user a quick view of all the MOXA ports'
+   activities. One can easily learn each port's total received/transmitted
+   (Rx/Tx) character count since the time when the monitoring was started.
+
+   Rx/Tx throughput per second is also reported on an interval basis (e.g.
+   the last 5 seconds) and on an average basis (since the time the monitoring
+   was started). You can reset all ports' counts with the <HOME> key. Use the
+   <+> and <-> (plus/minus) keys to change the display time interval. Press
+   <ENTER> on the port where the cursor is positioned to view the port's
+   communication parameters, signal status, and input/output queue.
+
+msterm - Terminal Emulation
+===========================
+
+   This utility provides data sending and receiving ability for all tty ports,
+   especially MOXA ports. It is quite useful for testing simple
+   applications, for example, sending AT commands to a modem connected to the
+   port, or for use as a terminal for login purposes. Note that this is only a
+   dumb terminal emulation without handling full screen operation.
+
+5. Setserial
+^^^^^^^^^^^^
+
+   Supported Setserial parameters are listed as below.
+
+   ============== =========================================================
+   uart		  set UART type(16450-->disable FIFO, 16550A-->enable FIFO)
+   close_delay	  set the amount of time(in 1/100 of a second) that DTR
+		  should be kept low while being closed.
+   closing_wait   set the amount of time(in 1/100 of a second) that the
+		  serial port should wait for data to be drained while
+		  being closed, before the receiver is disabled.
+   spd_hi	  Use  57.6kb  when  the application requests 38.4kb.
+   spd_vhi	  Use  115.2kb	when  the application requests 38.4kb.
+   spd_shi	  Use  230.4kb	when  the application requests 38.4kb.
+   spd_warp	  Use  460.8kb	when  the application requests 38.4kb.
+   spd_normal	  Use  38.4kb  when  the application requests 38.4kb.
+   spd_cust	  Use  the custom divisor to set the speed when  the
+		  application requests 38.4kb.
+   divisor	  This option sets the custom divisor.
+   baud_base	  This option sets the base baud rate.
+   ============== =========================================================
+
+6. Troubleshooting
+^^^^^^^^^^^^^^^^^^
+
+   The boot time error messages and solutions are stated as clearly as
+   possible. If all the possible solutions fail, please contact our technical
+   support team to get more help.
+
+
+   Error msg:
+	      More than 4 Moxa Smartio/Industio family boards found. Fifth board
+              and after are ignored.
+
+   Solution:
+   To avoid this problem, please unplug the fifth and any later boards,
+   because the Moxa driver supports up to 4 boards.
+
+   Error msg:
+	      Request_irq fail, IRQ(?) may be conflict with another device.
+
+   Solution:
+   Other PCI or ISA devices occupy the assigned IRQ. If you are not sure
+   which device causes the situation, please check /proc/interrupts to find
+   a free IRQ and assign that IRQ to the Moxa board.
+
+   Error msg:
+	      Board #: C1xx Series(CAP=xxx) interrupt number invalid.
+
+   Solution:
+   Each port within the same multiport board shares the same IRQ. Please set
+   one IRQ (not equal to zero) for each Moxa board.
+
+   Error msg:
+	      No interrupt vector be set for Moxa ISA board(CAP=xxx).
+
+   Solution:
+   A Moxa ISA board needs an interrupt vector. Please refer to the user's
+   manual, "Hardware Installation" chapter, to set the interrupt vector.
+
+   Error msg:
+              Couldn't install MOXA Smartio/Industio family driver!
+
+   Solution:
+   Loading the Moxa driver failed; the major number may conflict with other
+   devices. Please refer to section 3.6 above to change to a free major
+   number for the Moxa driver.
+
+   Error msg:
+              Couldn't install MOXA Smartio/Industio family callout driver!
+
+   Solution:
+   Loading the Moxa callout driver failed; the callout device major number
+   may conflict with other devices. Please refer to section 3.6 above to
+   change to a free callout device major number for the Moxa driver.
diff --git a/Documentation/driver-api/serial/n_gsm.rst b/Documentation/driver-api/serial/n_gsm.rst
new file mode 100644
index 000000000000..f3ad9fd26408
--- /dev/null
+++ b/Documentation/driver-api/serial/n_gsm.rst
@@ -0,0 +1,103 @@
+==============================
+GSM 0710 tty multiplexor HOWTO
+==============================
+
+This line discipline implements the GSM 07.10 multiplexing protocol
+detailed in the following 3GPP document:
+
+	http://www.3gpp.org/ftp/Specs/archive/07_series/07.10/0710-720.zip
+
+This document gives some hints on how to use this driver with GPRS and 3G
+modems connected to a physical serial port.
+
+How to use it
+-------------
+1. initialize the modem in 0710 mux mode (usually AT+CMUX= command) through
+   its serial port. Depending on the modem used, you can pass more or fewer
+   parameters to this command,
+2. switch the serial line to using the n_gsm line discipline by using
+   TIOCSETD ioctl,
+3. configure the mux using GSMIOC_GETCONF / GSMIOC_SETCONF ioctl.
+
+Major parts of the initialization program
+(a good starting point is util-linux-ng/sys-utils/ldattach.c)::
+
+  #include <linux/gsmmux.h>
+  #define N_GSM0710	21	/* GSM 0710 Mux */
+  #define DEFAULT_SPEED	B115200
+  #define SERIAL_PORT	"/dev/ttyS0"
+
+	int ldisc = N_GSM0710;
+	struct gsm_config c;
+	struct termios configuration;
+
+	/* open the serial port connected to the modem */
+	int fd = open(SERIAL_PORT, O_RDWR | O_NOCTTY | O_NDELAY);
+
+	/* configure the serial port : speed, flow control ... */
+
+	/* send the AT commands to switch the modem to CMUX mode
+	   and check that it's successful (should return OK) */
+	write(fd, "AT+CMUX=0\r", 10);
+
+	/* experience showed that some modems need some time before
+	   being able to answer to the first MUX packet so a delay
+	   may be needed here in some case */
+	sleep(3);
+
+	/* use n_gsm line discipline */
+	ioctl(fd, TIOCSETD, &ldisc);
+
+	/* get n_gsm configuration */
+	ioctl(fd, GSMIOC_GETCONF, &c);
+	/* we are initiator and need encoding 0 (basic) */
+	c.initiator = 1;
+	c.encapsulation = 0;
+	/* our modem defaults to a maximum size of 127 bytes */
+	c.mru = 127;
+	c.mtu = 127;
+	/* set the new configuration */
+	ioctl(fd, GSMIOC_SETCONF, &c);
+
+	/* and wait for ever to keep the line discipline enabled */
+	daemon(0,0);
+	pause();
+
+4. create the devices corresponding to the "virtual" serial ports (take care,
+   each modem has its configuration and some DLC have dedicated functions,
+   for example GPS), starting with minor 1 (DLC0 is reserved for the management
+   of the mux)::
+
+     MAJOR=`cat /proc/devices | grep gsmtty | awk '{print $1}'`
+     for i in `seq 1 4`; do
+	mknod /dev/ttygsm$i c $MAJOR $i
+     done
+
+5. use these devices as plain serial ports.
+
+   for example, it's possible:
+
+   - to use gnokii to send / receive SMS on ttygsm1
+   - and to use ppp to establish a datalink on ttygsm2
+
+6. first close all virtual ports before closing the physical port.
+
+   Note that after closing the physical port the modem is still in multiplexing
+   mode. This may prevent a successful re-opening of the port later. To avoid
+   this situation either reset the modem if your hardware allows that or send
+   a disconnect command frame manually before initializing the multiplexing mode
+   for the second time. The byte sequence for the disconnect command frame is::
+
+      0xf9, 0x03, 0xef, 0x03, 0xc3, 0x16, 0xf9
+
+Additional Documentation
+------------------------
+More practical details on the protocol and how it's supported by industrial
+modems can be found in the following documents :
+
+- http://www.telit.com/module/infopool/download.php?id=616
+- http://www.u-blox.com/images/downloads/Product_Docs/LEON-G100-G200-MuxImplementation_ApplicationNote_%28GSM%20G1-CS-10002%29.pdf
+- http://www.sierrawireless.com/Support/Downloads/AirPrime/WMP_Series/~/media/Support_Downloads/AirPrime/Application_notes/CMUX_Feature_Application_Note-Rev004.ashx
+- http://wm.sim.com/sim/News/photo/2010721161442.pdf
+
+11-03-08 - Eric Bénard - <eric@eukrea.com>
diff --git a/Documentation/driver-api/serial/rocket.rst b/Documentation/driver-api/serial/rocket.rst
new file mode 100644
index 000000000000..23761eae4282
--- /dev/null
+++ b/Documentation/driver-api/serial/rocket.rst
@@ -0,0 +1,185 @@
+================================================
+Comtrol(tm) RocketPort(R)/RocketModem(TM) Series
+================================================
+
+Device Driver for the Linux Operating System
+============================================
+
+Product overview
+----------------
+
+This driver provides a loadable kernel driver for the Comtrol RocketPort
+and RocketModem PCI boards. These boards provide 2, 4, 8, 16, or 32
+high-speed serial ports or modems.  This driver supports any combination
+of up to four RocketPort or RocketModem boards in one machine simultaneously.
+This file assumes that you are using the RocketPort driver which is
+integrated into the kernel sources.
+
+The driver can also be installed as an external module using the usual
+"make;make install" routine.  This external module driver, obtainable
+from the Comtrol website listed below, is useful for updating the driver
+or installing it into kernels which do not have the driver configured
+into them.  Installation instructions for the external module
+are in the included README and HW_INSTALL files.
+
+RocketPort ISA and RocketModem II PCI boards currently are only supported by
+this driver in module form.
+
+The RocketPort ISA board requires I/O ports to be configured by the DIP
+switches on the board.  See the section "ISA Rocketport Boards" below for
+information on how to set the DIP switches.
+
+You pass the I/O port to the driver using the following module parameters:
+
+board1:
+	I/O port for the first ISA board
+board2:
+	I/O port for the second ISA board
+board3:
+	I/O port for the third ISA board
+board4:
+	I/O port for the fourth ISA board
+
+There is a set of utilities and scripts provided with the external driver
+(downloadable from http://www.comtrol.com) that ease the configuration and
+setup of the ISA cards.
+
+The RocketModem II PCI boards require firmware to be loaded into the card
+before it will function.  The driver has only been tested as a module for this
+board.
+
+Installation Procedures
+-----------------------
+
+RocketPort/RocketModem PCI cards require no driver configuration; they are
+automatically detected and configured.
+
+The RocketPort driver can be installed as a module (recommended) or built
+into the kernel. This is selected, as for other drivers, through the `make config`
+command from the root of the Linux source tree during the kernel build process.
+
+The RocketPort/RocketModem serial ports installed by this driver are assigned
+device major number 46, and will be named /dev/ttyRx, where x is the port number
+starting at zero (e.g. /dev/ttyR0, /dev/ttyR1, ...).  If you have multiple cards
+installed in the system, the mapping of port names to serial ports is displayed
+in the system log at /var/log/messages.
+
+If installed as a module, the module must be loaded.  This can be done
+manually by entering "modprobe rocket".  To have the module loaded automatically
+upon system boot, edit a `/etc/modprobe.d/*.conf` file and add the line
+"alias char-major-46 rocket".
+
+In order to use the ports, their device names (nodes) must be created with mknod.
+This is only required once; the system will retain the names once created.  To
+create the RocketPort/RocketModem device names, use the command
+"mknod /dev/ttyRx c 46 x" where x is the port number starting at zero.
+
+For example::
+
+	> mknod /dev/ttyR0 c 46 0
+	> mknod /dev/ttyR1 c 46 1
+	> mknod /dev/ttyR2 c 46 2
+
+The Linux script MAKEDEV will create the first 16 ttyRx device names (nodes)
+for you::
+
+	>/dev/MAKEDEV ttyR
+
+ISA Rocketport Boards
+---------------------
+
+You must assign and configure the I/O addresses used by the ISA Rocketport
+card before installing and using it.  This is done by setting a set of DIP
+switches on the Rocketport board.
+
+
+Setting the I/O address
+-----------------------
+
+Before installing RocketPort(R) or RocketPort RA boards, you must find
+a range of I/O addresses for them to use. The first RocketPort card
+requires a 68-byte contiguous block of I/O addresses, starting at one
+of the following: 0x100, 0x140, 0x180, 0x200, 0x240, 0x280,
+0x300, 0x340, 0x380.  This I/O address must be reflected in the DIP
+switches of *all* of the Rocketport cards.
+
+The second, third, and fourth RocketPort cards require a 64-byte
+contiguous block of I/O addresses, starting at one of the following
+I/O addresses: 0x100, 0x140, 0x180, 0x1C0, 0x200, 0x240, 0x280,
+0x2C0, 0x300, 0x340, 0x380, 0x3C0.  The I/O addresses used by the
+second, third, and fourth Rocketport cards (if present) are set via
+software control.  The DIP switch settings for the I/O address must be
+set to the value of the first Rocketport card.
+
+In order to distinguish each card from the others, each card
+must have a unique board ID set on the DIP switches.  The first
+Rocketport board must be set with the DIP switches corresponding to
+the first board, the second board must be set with the DIP switches
+corresponding to the second board, etc.  IMPORTANT: The board ID is
+the only place where the DIP switch settings should differ between the
+various Rocketport boards in a system.
+
+The I/O address range used by any of the RocketPort cards must not
+conflict with any other cards in the system, including other
+RocketPort cards.  Below, you will find a list of commonly used I/O
+address ranges which may be in use by other devices in your system.
+On a Linux system, "cat /proc/ioports" will also be helpful in
+identifying what I/O addresses are being used by devices on your
+system.
+
+Remember, the FIRST RocketPort uses 68 I/O addresses.  So, if you set it
+for 0x100, it will occupy 0x100 to 0x143.  This would mean that you
+CAN NOT set the second, third or fourth board for address 0x140 since
+the first 4 bytes of that range are used by the first board.  You would
+need to set the second, third, or fourth board to one of the next available
+blocks such as 0x180.
+
+RocketPort and RocketPort RA SW1 Settings::
+
+            +-------------------------------+
+            | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
+            +-------+-------+---------------+
+            | Unused| Card  | I/O Port Block|
+            +-------------------------------+
+
+  DIP Switches                             DIP Switches
+  7    8                                   6    5
+  ===================                      ===================
+  On   On   UNUSED, MUST BE ON.            On   On   First Card    <==== Default
+                                           On   Off  Second Card
+                                           Off  On   Third Card
+                                           Off  Off  Fourth Card
+
+  DIP Switches         I/O Address Range
+  4    3    2    1     Used by the First Card
+  =====================================
+  On   Off  On   Off   100-143
+  On   Off  Off  On    140-183
+  On   Off  Off  Off   180-1C3       <==== Default
+  Off  On   On   Off   200-243
+  Off  On   Off  On    240-283
+  Off  On   Off  Off   280-2C3
+  Off  Off  On   Off   300-343
+  Off  Off  Off  On    340-383
+  Off  Off  Off  Off   380-3C3
+
+Reporting Bugs
+--------------
+
+For technical support, please provide the following
+information: driver version, kernel release, distribution of
+kernel, and type of board you are using. Error messages, log
+printouts, and port configuration details are especially helpful.
+
+USA:
+    :Phone: (612) 494-4100
+    :FAX: (612) 494-4199
+    :email: support@comtrol.com
+
+Comtrol Europe:
+    :Phone: +44 (0) 1 869 323-220
+    :FAX: +44 (0) 1 869 323-211
+    :email: support@comtrol.co.uk
+
+Web:	http://www.comtrol.com
+FTP:	ftp.comtrol.com
diff --git a/Documentation/driver-api/serial/serial-iso7816.rst b/Documentation/driver-api/serial/serial-iso7816.rst
new file mode 100644
index 000000000000..d990143de0c6
--- /dev/null
+++ b/Documentation/driver-api/serial/serial-iso7816.rst
@@ -0,0 +1,90 @@
+=============================
+ISO7816 Serial Communications
+=============================
+
+1. Introduction
+===============
+
+  ISO/IEC7816 is a series of standards specifying integrated circuit cards (ICC)
+  also known as smart cards.
+
+2. Hardware-related considerations
+==================================
+
+  Some CPUs/UARTs (e.g., Microchip AT91) contain a built-in mode capable of
+  handling communication with a smart card.
+
+  For these microcontrollers, the Linux driver should be made capable of
+  working in both modes, and proper ioctls (see later) should be made
+  available at user-level to allow switching from one mode to the other, and
+  vice versa.
+
+3. Data Structures Already Available in the Kernel
+==================================================
+
+  The Linux kernel provides the serial_iso7816 structure (see [1]) to handle
+  ISO7816 communications. This data structure is used to set and configure
+  ISO7816 parameters in ioctls.
+
+  Any driver for devices capable of working both as RS232 and ISO7816 should
+  implement the iso7816_config callback in the uart_port structure. The
+  serial_core calls iso7816_config to do the device specific part in response
+  to TIOCGISO7816 and TIOCSISO7816 ioctls (see below). The iso7816_config
+  callback receives a pointer to struct serial_iso7816.
+
+4. Usage from user-level
+========================
+
+  From user-level, ISO7816 configuration can be read or set using the
+  ioctls above. For instance, to set ISO7816 you can use the following code::
+
+	#include <linux/serial.h>
+
+	/* Include definition for ISO7816 ioctls: TIOCSISO7816 and TIOCGISO7816 */
+	#include <sys/ioctl.h>
+
+	/* Open your specific device (e.g., /dev/mydevice): */
+	int fd = open ("/dev/mydevice", O_RDWR);
+	if (fd < 0) {
+		/* Error handling. See errno. */
+	}
+
+	struct serial_iso7816 iso7816conf;
+
+	/* Reserved fields have to be zeroed */
+	memset(&iso7816conf, 0, sizeof(iso7816conf));
+
+	/* Enable ISO7816 mode: */
+	iso7816conf.flags |= SER_ISO7816_ENABLED;
+
+	/* Select the protocol: */
+	/* T=0 */
+	iso7816conf.flags |= SER_ISO7816_T(0);
+	/* or T=1 */
+	iso7816conf.flags |= SER_ISO7816_T(1);
+
+	/* Set the guard time: */
+	iso7816conf.tg = 2;
+
+	/* Set the clock frequency*/
+	iso7816conf.clk = 3571200;
+
+	/* Set transmission factors: */
+	iso7816conf.sc_fi = 372;
+	iso7816conf.sc_di = 1;
+
+	if (ioctl(fd, TIOCSISO7816, &iso7816conf) < 0) {
+		/* Error handling. See errno. */
+	}
+
+	/* Use read() and write() syscalls here... */
+
+	/* Close the device when finished: */
+	if (close (fd) < 0) {
+		/* Error handling. See errno. */
+	}
+
+5. References
+=============
+
+ [1]    include/uapi/linux/serial.h
diff --git a/Documentation/driver-api/serial/serial-rs485.rst b/Documentation/driver-api/serial/serial-rs485.rst
new file mode 100644
index 000000000000..6bc824f948f9
--- /dev/null
+++ b/Documentation/driver-api/serial/serial-rs485.rst
@@ -0,0 +1,103 @@
+===========================
+RS485 Serial Communications
+===========================
+
+1. Introduction
+===============
+
+   EIA-485, also known as TIA/EIA-485 or RS-485, is a standard defining the
+   electrical characteristics of drivers and receivers for use in balanced
+   digital multipoint systems.
+   This standard is widely used for communications in industrial automation
+   because it can be used effectively over long distances and in electrically
+   noisy environments.
+
+2. Hardware-related Considerations
+==================================
+
+   Some CPUs/UARTs (e.g., Atmel AT91 or 16C950 UART) contain a built-in
+   half-duplex mode capable of automatically controlling line direction by
+   toggling RTS or DTR signals. That can be used to control external
+   half-duplex hardware like an RS485 transceiver or any RS232-connected
+   half-duplex devices like some modems.
+
+   For these microcontrollers, the Linux driver should be made capable of
+   working in both modes, and proper ioctls (see later) should be made
+   available at user-level to allow switching from one mode to the other, and
+   vice versa.
+
+3. Data Structures Already Available in the Kernel
+==================================================
+
+   The Linux kernel provides the serial_rs485 structure (see [1]) to handle
+   RS485 communications. This data structure is used to set and configure RS485
+   parameters in the platform data and in ioctls.
+
+   The device tree can also provide RS485 boot time parameters (see [2]
+   for bindings). The driver is in charge of filling this data structure from
+   the values given by the device tree.
+
+   Any driver for devices capable of working both as RS232 and RS485 should
+   implement the rs485_config callback in the uart_port structure. The
+   serial_core calls rs485_config to do the device specific part in response
+   to TIOCSRS485 and TIOCGRS485 ioctls (see below). The rs485_config callback
+   receives a pointer to struct serial_rs485.
+
+4. Usage from user-level
+========================
+
+   From user-level, RS485 configuration can be read or set using the
+   ioctls above. For instance, to set RS485 you can use the following code::
+
+	#include <linux/serial.h>
+
+	/* Include definition for RS485 ioctls: TIOCGRS485 and TIOCSRS485 */
+	#include <sys/ioctl.h>
+
+	/* Open your specific device (e.g., /dev/mydevice): */
+	int fd = open ("/dev/mydevice", O_RDWR);
+	if (fd < 0) {
+		/* Error handling. See errno. */
+	}
+
+	struct serial_rs485 rs485conf = { 0 };
+
+	/* Enable RS485 mode: */
+	rs485conf.flags |= SER_RS485_ENABLED;
+
+	/* Set logical level for RTS pin equal to 1 when sending: */
+	rs485conf.flags |= SER_RS485_RTS_ON_SEND;
+	/* or, set logical level for RTS pin equal to 0 when sending: */
+	rs485conf.flags &= ~(SER_RS485_RTS_ON_SEND);
+
+	/* Set logical level for RTS pin equal to 1 after sending: */
+	rs485conf.flags |= SER_RS485_RTS_AFTER_SEND;
+	/* or, set logical level for RTS pin equal to 0 after sending: */
+	rs485conf.flags &= ~(SER_RS485_RTS_AFTER_SEND);
+
+	/* Set rts delay before send, if needed: */
+	rs485conf.delay_rts_before_send = ...;
+
+	/* Set rts delay after send, if needed: */
+	rs485conf.delay_rts_after_send = ...;
+
+	/* Set this flag if you want to receive data even while sending data */
+	rs485conf.flags |= SER_RS485_RX_DURING_TX;
+
+	if (ioctl (fd, TIOCSRS485, &rs485conf) < 0) {
+		/* Error handling. See errno. */
+	}
+
+	/* Use read() and write() syscalls here... */
+
+	/* Close the device when finished: */
+	if (close (fd) < 0) {
+		/* Error handling. See errno. */
+	}
+
+5. References
+=============
+
+ [1]	include/uapi/linux/serial.h
+
+ [2]	Documentation/devicetree/bindings/serial/rs485.txt
diff --git a/Documentation/driver-api/serial/tty.rst b/Documentation/driver-api/serial/tty.rst
new file mode 100644
index 000000000000..dd972caacf3e
--- /dev/null
+++ b/Documentation/driver-api/serial/tty.rst
@@ -0,0 +1,328 @@
+=================
+The Lockronomicon
+=================
+
+Your guide to the ancient and twisted locking policies of the tty layer and
+the warped logic behind them. Beware all ye who read on.
+
+
+Line Discipline
+---------------
+
+Line disciplines are registered with tty_register_ldisc() passing the
+discipline number and the ldisc structure. At the point of registration the
+discipline must be ready to use and it is possible it will get used before
+the call returns success. If the call returns an error then it won't get
+called. Do not re-use ldisc numbers as they are part of the userspace ABI
+and writing over an existing ldisc will cause demons to eat your computer.
+After the return the ldisc data has been copied so you may free your own
+copy of the structure. You must not re-register over the top of the line
+discipline even with the same data or your computer again will be eaten by
+demons.
+
+In order to remove a line discipline call tty_unregister_ldisc().
+In ancient times this always worked. In modern times the function will
+return -EBUSY if the ldisc is currently in use. Since the ldisc referencing
+code manages the module counts this should not usually be a concern.
+
+Heed this warning: the reference count field of the registered copies of the
+tty_ldisc structure in the ldisc table counts the number of lines using this
+discipline. The reference count of the tty_ldisc structure within a tty
+counts the number of active users of the ldisc at this instant. In effect it
+counts the number of threads of execution within an ldisc method (plus those
+about to enter and exit although this detail matters not).
+
+Line Discipline Methods
+-----------------------
+
+TTY side interfaces
+^^^^^^^^^^^^^^^^^^^
+
+======================= =======================================================
+open()			Called when the line discipline is attached to
+			the terminal. No other call into the line
+			discipline for this tty will occur until it
+			completes successfully. Should initialize any
+			state needed by the ldisc, and set receive_room
+			in the tty_struct to the maximum amount of data
+			the line discipline is willing to accept from the
+			driver with a single call to receive_buf().
+			Returning an error will prevent the ldisc from
+			being attached. Can sleep.
+
+close()			This is called on a terminal when the line
+			discipline is being unplugged. At the point of
+			execution no further users will enter the
+			ldisc code for this tty. Can sleep.
+
+hangup()		Called when the tty line is hung up.
+			The line discipline should cease I/O to the tty.
+			No further calls into the ldisc code will occur.
+			The return value is ignored. Can sleep.
+
+read()			(optional) A process requests reading data from
+			the line. Multiple read calls may occur in parallel
+			and the ldisc must deal with serialization issues.
+			If not defined, the process will receive an EIO
+			error. May sleep.
+
+write()			(optional) A process requests writing data to the
+			line. Multiple write calls are serialized by the
+			tty layer for the ldisc. If not defined, the
+			process will receive an EIO error. May sleep.
+
+flush_buffer()		(optional) May be called at any point between
+			open and close, and instructs the line discipline
+			to empty its input buffer.
+
+set_termios()		(optional) Called on termios structure changes.
+			The caller passes the old termios data and the
+			current data is in the tty. Called under the
+			termios semaphore so allowed to sleep. Serialized
+			against itself only.
+
+poll()			(optional) Check the status for the poll/select
+			calls. Multiple poll calls may occur in parallel.
+			May sleep.
+
+ioctl()			(optional) Called when an ioctl is handed to the
+			tty layer that might be for the ldisc. Multiple
+			ioctl calls may occur in parallel. May sleep.
+
+compat_ioctl()		(optional) Called when a 32 bit ioctl is handed
+			to the tty layer that might be for the ldisc.
+			Multiple ioctl calls may occur in parallel.
+			May sleep.
+======================= =======================================================
+
+Driver Side Interfaces
+^^^^^^^^^^^^^^^^^^^^^^
+
+======================= =======================================================
+receive_buf()		(optional) Called by the low-level driver to hand
+			a buffer of received bytes to the ldisc for
+			processing. The number of bytes is guaranteed not
+			to exceed the current value of tty->receive_room.
+			All bytes must be processed.
+
+receive_buf2()		(optional) Called by the low-level driver to hand
+			a buffer of received bytes to the ldisc for
+			processing. Returns the number of bytes processed.
+
+			If both receive_buf() and receive_buf2() are
+			defined, receive_buf2() should be preferred.
+
+write_wakeup()		May be called at any point between open and close.
+			The TTY_DO_WRITE_WAKEUP flag indicates if a call
+			is needed but always races versus calls. Thus the
+			ldisc must be careful about setting order and to
+			handle unexpected calls. Must not sleep.
+
+			The driver is forbidden from calling this directly
+			from the ->write call from the ldisc as the ldisc
+			is permitted to call the driver write method from
+			this function. In such a situation defer it.
+
+dcd_change()		Report to the tty line the current DCD pin status
+			changes and the relative timestamp. The timestamp
+			cannot be NULL.
+======================= =======================================================
+
+
+Driver Access
+^^^^^^^^^^^^^
+
+Line discipline methods can call the following methods of the underlying
+hardware driver through the function pointers within the tty->driver
+structure:
+
+======================= =======================================================
+write()			Write a block of characters to the tty device.
+			Returns the number of characters accepted. The
+			character buffer passed to this method is already
+			in kernel space.
+
+put_char()		Queues a character for writing to the tty device.
+			If there is no room in the queue, the character is
+			ignored.
+
+flush_chars()		(Optional) If defined, must be called after
+			queueing characters with put_char() in order to
+			start transmission.
+
+write_room()		Returns the number of characters the tty driver
+			will accept for queueing to be written.
+
+ioctl()			Invoke device specific ioctl.
+			Expects data pointers to refer to userspace.
+			Returns ENOIOCTLCMD for unrecognized ioctl numbers.
+
+set_termios()		Notify the tty driver that the device's termios
+			settings have changed. New settings are in
+			tty->termios. Previous settings should be passed in
+			the "old" argument.
+
+			The API is defined such that the driver should return
+			the actual modes selected. This means that the
+			driver function is responsible for modifying any
+			bits in the request it cannot fulfill to indicate
+			the actual modes being used. A device with no
+			hardware capability for change (e.g. a USB dongle or
+			virtual port) can provide NULL for this method.
+
+throttle()		Notify the tty driver that input buffers for the
+			line discipline are close to full, and it should
+			somehow signal that no more characters should be
+			sent to the tty.
+
+unthrottle()		Notify the tty driver that characters can now be
+			sent to the tty without fear of overrunning the
+			input buffers of the line disciplines.
+
+stop()			Ask the tty driver to stop outputting characters
+			to the tty device.
+
+start()			Ask the tty driver to resume sending characters
+			to the tty device.
+
+hangup()		Ask the tty driver to hang up the tty device.
+
+break_ctl()		(Optional) Ask the tty driver to turn on or off
+			BREAK status on the RS-232 port.  If state is -1,
+			then the BREAK status should be turned on; if
+			state is 0, then BREAK should be turned off.
+			If this routine is not implemented, use ioctls
+			TIOCSBRK / TIOCCBRK instead.
+
+wait_until_sent()	Waits until the device has written out all of the
+			characters in its transmitter FIFO.
+
+send_xchar()		Send a high-priority XON/XOFF character to the device.
+======================= =======================================================
+
+
+Flags
+^^^^^
+
+Line discipline methods have access to tty->flags field containing the
+following interesting flags:
+
+======================= =======================================================
+TTY_THROTTLED		Driver input is throttled. The ldisc should call
+			tty->driver->unthrottle() in order to resume
+			reception when it is ready to process more data.
+
+TTY_DO_WRITE_WAKEUP	If set, causes the driver to call the ldisc's
+			write_wakeup() method in order to resume
+			transmission when it can accept more data
+			to transmit.
+
+TTY_IO_ERROR		If set, causes all subsequent userspace read/write
+			calls on the tty to fail, returning -EIO.
+
+TTY_OTHER_CLOSED	Device is a pty and the other side has closed.
+
+TTY_NO_WRITE_SPLIT	Prevent driver from splitting up writes into
+			smaller chunks.
+======================= =======================================================
+
+
+Locking
+^^^^^^^
+
+Callers to the line discipline functions from the tty layer are required to
+take line discipline locks. The same is true of calls from the driver side
+but not yet enforced.
+
+Three calls are now provided::
+
+	ldisc = tty_ldisc_ref(tty);
+
+takes a handle to the line discipline in the tty and returns it. If no ldisc
+is currently attached or the ldisc is being closed and re-opened at this
+point then NULL is returned. While this handle is held the ldisc will not
+change or go away::
+
+	tty_ldisc_deref(ldisc)
+
+Returns the ldisc reference and allows the ldisc to be closed. Returning the
+reference takes away your right to call the ldisc functions until you take
+a new reference::
+
+	ldisc = tty_ldisc_ref_wait(tty);
+
+Performs the same function as tty_ldisc_ref except that it will wait for an
+ldisc change to complete and then return a reference to the new ldisc.
+
+While these functions are slightly slower than the old code they should have
+minimal impact as most receive logic uses the flip buffers and they only
+need to take a reference when they push bits up through the driver.
+
+A caution: The ldisc->open(), ldisc->close() and driver->set_ldisc
+functions are called with the ldisc unavailable. Thus tty_ldisc_ref will
+fail in this situation if used within these functions. Ldisc and driver
+code calling its own functions must be careful in this case.
+
+
+Driver Interface
+----------------
+
+======================= =======================================================
+open()			Called when a device is opened. May sleep.
+
+close()			Called when a device is closed. At the point of
+			return from this call the driver must make no
+			further ldisc calls of any kind. May sleep.
+
+write()			Called to write bytes to the device. May not
+			sleep. May occur in parallel in special cases.
+			Because this includes panic paths drivers generally
+			shouldn't try and do clever locking here.
+
+put_char()		Stuff a single character onto the queue. The
+			driver is guaranteed a follow-up call to
+			flush_chars().
+
+flush_chars()		Ask the kernel to write the put_char queue.
+
+write_room()		Return the number of characters that can be stuffed
+			into the port buffers without overflow (or less).
+			The ldisc is responsible for being intelligent
+			about multi-threading of write_room/write calls
+
+ioctl()			Called when an ioctl may be for the driver
+
+set_termios()		Called on termios change, serialized against
+			itself by a semaphore. May sleep.
+
+set_ldisc()		Notifier for discipline change. At the point this
+			is done the discipline is not yet usable. Can now
+			sleep (I think)
+
+throttle()		Called by the ldisc to ask the driver to do flow
+			control.  Serialization including with unthrottle
+			is the job of the ldisc layer.
+
+unthrottle()		Called by the ldisc to ask the driver to stop flow
+			control.
+
+stop()			Ldisc notifier to the driver to stop output. As with
+			throttle the serializations with start() are down
+			to the ldisc layer.
+
+start()			Ldisc notifier to the driver to start output.
+
+hangup()		Ask the tty driver to cause a hangup initiated
+			from the host side. [Can sleep ??]
+
+break_ctl()		Send RS232 break. Can sleep. Can get called in
+			parallel with itself and with write calls; the
+			driver must serialize (for now).
+
+wait_until_sent()	Wait for characters to exit the hardware queue
+			of the driver. Can sleep
+
+send_xchar()		Send XON/XOFF and if possible jump the queue with
+			it in order to get fast flow control responses.
+			Cannot sleep ??
+======================= =======================================================
diff --git a/Documentation/serial/cyclades_z.rst b/Documentation/serial/cyclades_z.rst
deleted file mode 100644
index 532ff67e2f1c..000000000000
--- a/Documentation/serial/cyclades_z.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-================
-Cyclades-Z notes
-================
-
-The Cyclades-Z must have firmware loaded onto the card before it will
-operate.  This operation should be performed during system startup,
-
-The firmware, loader program and the latest device driver code are
-available from Cyclades at
-
-    ftp://ftp.cyclades.com/pub/cyclades/cyclades-z/linux/
diff --git a/Documentation/serial/driver.rst b/Documentation/serial/driver.rst
deleted file mode 100644
index 4537119bf624..000000000000
--- a/Documentation/serial/driver.rst
+++ /dev/null
@@ -1,549 +0,0 @@
-====================
-Low Level Serial API
-====================
-
-
-This document is meant as a brief overview of some aspects of the new serial
-driver.  It is not complete, any questions you have should be directed to
-<rmk@arm.linux.org.uk>
-
-The reference implementation is contained within amba-pl011.c.
-
-
-
-Low Level Serial Hardware Driver
---------------------------------
-
-The low level serial hardware driver is responsible for supplying port
-information (defined by uart_port) and a set of control methods (defined
-by uart_ops) to the core serial driver.  The low level driver is also
-responsible for handling interrupts for the port, and providing any
-console support.
-
-
-Console Support
----------------
-
-The serial core provides a few helper functions.  This includes identifing
-the correct port structure (via uart_get_console) and decoding command line
-arguments (uart_parse_options).
-
-There is also a helper function (uart_console_write) which performs a
-character by character write, translating newlines to CRLF sequences.
-Driver writers are recommended to use this function rather than implementing
-their own version.
-
-
-Locking
--------
-
-It is the responsibility of the low level hardware driver to perform the
-necessary locking using port->lock.  There are some exceptions (which
-are described in the uart_ops listing below.)
-
-There are two locks.  A per-port spinlock, and an overall semaphore.
-
-From the core driver perspective, the port->lock locks the following
-data::
-
-	port->mctrl
-	port->icount
-	port->state->xmit.head (circ_buf->head)
-	port->state->xmit.tail (circ_buf->tail)
-
-The low level driver is free to use this lock to provide any additional
-locking.
-
-The port_sem semaphore is used to protect against ports being added/
-removed or reconfigured at inappropriate times. Since v2.6.27, this
-semaphore has been the 'mutex' member of the tty_port struct, and
-commonly referred to as the port mutex.
-
-
-uart_ops
---------
-
-The uart_ops structure is the main interface between serial_core and the
-hardware specific driver.  It contains all the methods to control the
-hardware.
-
-  tx_empty(port)
-	This function tests whether the transmitter fifo and shifter
-	for the port described by 'port' is empty.  If it is empty,
-	this function should return TIOCSER_TEMT, otherwise return 0.
-	If the port does not support this operation, then it should
-	return TIOCSER_TEMT.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-	This call must not sleep
-
-  set_mctrl(port, mctrl)
-	This function sets the modem control lines for port described
-	by 'port' to the state described by mctrl.  The relevant bits
-	of mctrl are:
-
-		- TIOCM_RTS	RTS signal.
-		- TIOCM_DTR	DTR signal.
-		- TIOCM_OUT1	OUT1 signal.
-		- TIOCM_OUT2	OUT2 signal.
-		- TIOCM_LOOP	Set the port into loopback mode.
-
-	If the appropriate bit is set, the signal should be driven
-	active.  If the bit is clear, the signal should be driven
-	inactive.
-
-	Locking: port->lock taken.
-
-	Interrupts: locally disabled.
-
-	This call must not sleep
-
-  get_mctrl(port)
-	Returns the current state of modem control inputs.  The state
-	of the outputs should not be returned, since the core keeps
-	track of their state.  The state information should include:
-
-		- TIOCM_CAR	state of DCD signal
-		- TIOCM_CTS	state of CTS signal
-		- TIOCM_DSR	state of DSR signal
-		- TIOCM_RI	state of RI signal
-
-	The bit is set if the signal is currently driven active.  If
-	the port does not support CTS, DCD or DSR, the driver should
-	indicate that the signal is permanently active.  If RI is
-	not available, the signal should not be indicated as active.
-
-	Locking: port->lock taken.
-
-	Interrupts: locally disabled.
-
-	This call must not sleep
-
-  stop_tx(port)
-	Stop transmitting characters.  This might be due to the CTS
-	line becoming inactive or the tty layer indicating we want
-	to stop transmission due to an XOFF character.
-
-	The driver should stop transmitting characters as soon as
-	possible.
-
-	Locking: port->lock taken.
-
-	Interrupts: locally disabled.
-
-	This call must not sleep
-
-  start_tx(port)
-	Start transmitting characters.
-
-	Locking: port->lock taken.
-
-	Interrupts: locally disabled.
-
-	This call must not sleep
-
-  throttle(port)
-	Notify the serial driver that input buffers for the line discipline are
-	close to full, and it should somehow signal that no more characters
-	should be sent to the serial port.
-	This will be called only if hardware assisted flow control is enabled.
-
-	Locking: serialized with .unthrottle() and termios modification by the
-	tty layer.
-
-  unthrottle(port)
-	Notify the serial driver that characters can now be sent to the serial
-	port without fear of overrunning the input buffers of the line
-	disciplines.
-
-	This will be called only if hardware assisted flow control is enabled.
-
-	Locking: serialized with .throttle() and termios modification by the
-	tty layer.
-
-  send_xchar(port,ch)
-	Transmit a high priority character, even if the port is stopped.
-	This is used to implement XON/XOFF flow control and tcflow().  If
-	the serial driver does not implement this function, the tty core
-	will append the character to the circular buffer and then call
-	start_tx() / stop_tx() to flush the data out.
-
-	Do not transmit if ch == '\0' (__DISABLED_CHAR).
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  stop_rx(port)
-	Stop receiving characters; the port is in the process of
-	being closed.
-
-	Locking: port->lock taken.
-
-	Interrupts: locally disabled.
-
-	This call must not sleep
-
-  enable_ms(port)
-	Enable the modem status interrupts.
-
-	This method may be called multiple times.  Modem status
-	interrupts should be disabled when the shutdown method is
-	called.
-
-	Locking: port->lock taken.
-
-	Interrupts: locally disabled.
-
-	This call must not sleep
-
-  break_ctl(port,ctl)
-	Control the transmission of a break signal.  If ctl is
-	nonzero, the break signal should be transmitted.  The signal
-	should be terminated when another call is made with a zero
-	ctl.
-
-	Locking: caller holds tty_port->mutex
-
-  startup(port)
-	Grab any interrupt resources and initialise any low level driver
-	state.  Enable the port for reception.  It should not activate
-	RTS nor DTR; this will be done via a separate call to set_mctrl.
-
-	This method will only be called when the port is initially opened.
-
-	Locking: port_sem taken.
-
-	Interrupts: globally disabled.
-
-  shutdown(port)
-	Disable the port, disable any break condition that may be in
-	effect, and free any interrupt resources.  It should not disable
-	RTS nor DTR; this will have already been done via a separate
-	call to set_mctrl.
-
-	Drivers must not access port->state once this call has completed.
-
-	This method will only be called when there are no more users of
-	this port.
-
-	Locking: port_sem taken.
-
-	Interrupts: caller dependent.
-
-  flush_buffer(port)
-	Flush any write buffers, reset any DMA state and stop any
-	ongoing DMA transfers.
-
-	This will be called whenever the port->state->xmit circular
-	buffer is cleared.
-
-	Locking: port->lock taken.
-
-	Interrupts: locally disabled.
-
-	This call must not sleep
-
-  set_termios(port,termios,oldtermios)
-	Change the port parameters, including word length, parity, stop
-	bits.  Update read_status_mask and ignore_status_mask to indicate
-	the types of events we are interested in receiving.  Relevant
-	termios->c_cflag bits are:
-
-		CSIZE
-			- word size
-		CSTOPB
-			- 2 stop bits
-		PARENB
-			- parity enable
-		PARODD
-			- odd parity (when PARENB is in force)
-		CREAD
-			- enable reception of characters (if not set,
-			  still receive characters from the port, but
-			  throw them away.
-		CRTSCTS
-			- if set, enable CTS status change reporting
-		CLOCAL
-			- if not set, enable modem status change
-			  reporting.
-
-	Relevant termios->c_iflag bits are:
-
-		INPCK
-			- enable frame and parity error events to be
-			  passed to the TTY layer.
-		BRKINT / PARMRK
-			- both of these enable break events to be
-			  passed to the TTY layer.
-
-		IGNPAR
-			- ignore parity and framing errors
-		IGNBRK
-			- ignore break errors,  If IGNPAR is also
-			  set, ignore overrun errors as well.
-
-	The interaction of the iflag bits is as follows (parity error
-	given as an example):
-
-	=============== ======= ======  =============================
-	Parity error	INPCK	IGNPAR
-	=============== ======= ======  =============================
-	n/a		0	n/a	character received, marked as
-					TTY_NORMAL
-	None		1	n/a	character received, marked as
-					TTY_NORMAL
-	Yes		1	0	character received, marked as
-					TTY_PARITY
-	Yes		1	1	character discarded
-	=============== ======= ======  =============================
-
-	Other flags may be used (eg, xon/xoff characters) if your
-	hardware supports hardware "soft" flow control.
-
-	Locking: caller holds tty_port->mutex
-
-	Interrupts: caller dependent.
-
-	This call must not sleep
-
-  set_ldisc(port,termios)
-	Notifier for discipline change. See Documentation/serial/tty.rst.
-
-	Locking: caller holds tty_port->mutex
-
-  pm(port,state,oldstate)
-	Perform any power management related activities on the specified
-	port.  State indicates the new state (defined by
-	enum uart_pm_state), oldstate indicates the previous state.
-
-	This function should not be used to grab any resources.
-
-	This will be called when the port is initially opened and finally
-	closed, except when the port is also the system console.  This
-	will occur even if CONFIG_PM is not set.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  type(port)
-	Return a pointer to a string constant describing the specified
-	port, or return NULL, in which case the string 'unknown' is
-	substituted.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  release_port(port)
-	Release any memory and IO region resources currently in use by
-	the port.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  request_port(port)
-	Request any memory and IO region resources required by the port.
-	If any fail, no resources should be registered when this function
-	returns, and it should return -EBUSY on failure.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  config_port(port,type)
-	Perform any autoconfiguration steps required for the port.  `type`
-	contains a bit mask of the required configuration.  UART_CONFIG_TYPE
-	indicates that the port requires detection and identification.
-	port->type should be set to the type found, or PORT_UNKNOWN if
-	no port was detected.
-
-	UART_CONFIG_IRQ indicates autoconfiguration of the interrupt signal,
-	which should be probed using standard kernel autoprobing techniques.
-	This is not necessary on platforms where ports have interrupts
-	internally hard wired (eg, system on a chip implementations).
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  verify_port(port,serinfo)
-	Verify the new serial port information contained within serinfo is
-	suitable for this port type.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  ioctl(port,cmd,arg)
-	Perform any port specific IOCTLs.  IOCTL commands must be defined
-	using the standard numbering system found in <asm/ioctl.h>
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-  poll_init(port)
-	Called by kgdb to perform the minimal hardware initialization needed
-	to support poll_put_char() and poll_get_char().  Unlike ->startup()
-	this should not request interrupts.
-
-	Locking: tty_mutex and tty_port->mutex taken.
-
-	Interrupts: n/a.
-
-  poll_put_char(port,ch)
-	Called by kgdb to write a single character directly to the serial
-	port.  It can and should block until there is space in the TX FIFO.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-	This call must not sleep
-
-  poll_get_char(port)
-	Called by kgdb to read a single character directly from the serial
-	port.  If data is available, it should be returned; otherwise
-	the function should return NO_POLL_CHAR immediately.
-
-	Locking: none.
-
-	Interrupts: caller dependent.
-
-	This call must not sleep
-
-Other functions
----------------
-
-uart_update_timeout(port,cflag,baud)
-	Update the FIFO drain timeout, port->timeout, according to the
-	number of bits, parity, stop bits and baud rate.
-
-	Locking: caller is expected to take port->lock
-
-	Interrupts: n/a
-
-uart_get_baud_rate(port,termios,old,min,max)
-	Return the numeric baud rate for the specified termios, taking
-	account of the special 38400 baud "kludge".  The B0 baud rate
-	is mapped to 9600 baud.
-
-	If the baud rate is not within min..max, then if old is non-NULL,
-	the original baud rate will be tried.  If that exceeds the
-	min..max constraint, 9600 baud will be returned.  termios will
-	be updated to the baud rate in use.
-
-	Note: min..max must always allow 9600 baud to be selected.
-
-	Locking: caller dependent.
-
-	Interrupts: n/a
-
-uart_get_divisor(port,baud)
-	Return the divisor (baud_base / baud) for the specified baud
-	rate, appropriately rounded.
-
-	If 38400 baud and custom divisor is selected, return the
-	custom divisor instead.
-
-	Locking: caller dependent.
-
-	Interrupts: n/a
-
-uart_match_port(port1,port2)
-	This utility function can be used to determine whether two
-	uart_port structures describe the same port.
-
-	Locking: n/a
-
-	Interrupts: n/a
-
-uart_write_wakeup(port)
-	A driver is expected to call this function when the number of
-	characters in the transmit buffer have dropped below a threshold.
-
-	Locking: port->lock should be held.
-
-	Interrupts: n/a
-
-uart_register_driver(drv)
-	Register a uart driver with the core driver.  We in turn register
-	with the tty layer, and initialise the core driver per-port state.
-
-	drv->port should be NULL, and the per-port structures should be
-	registered using uart_add_one_port after this call has succeeded.
-
-	Locking: none
-
-	Interrupts: enabled
-
-uart_unregister_driver()
-	Remove all references to a driver from the core driver.  The low
-	level driver must have removed all its ports via the
-	uart_remove_one_port() if it registered them with uart_add_one_port().
-
-	Locking: none
-
-	Interrupts: enabled
-
-**uart_suspend_port()**
-
-**uart_resume_port()**
-
-**uart_add_one_port()**
-
-**uart_remove_one_port()**
-
-Other notes
------------
-
-It is intended some day to drop the 'unused' entries from uart_port, and
-allow low level drivers to register their own individual uart_port's with
-the core.  This will allow drivers to use uart_port as a pointer to a
-structure containing both the uart_port entry with their own extensions,
-thus::
-
-	struct my_port {
-		struct uart_port	port;
-		int			my_stuff;
-	};
-
-Modem control lines via GPIO
-----------------------------
-
-Some helpers are provided in order to set/get modem control lines via GPIO.
-
-mctrl_gpio_init(port, idx):
-	This will get the {cts,rts,...}-gpios from device tree if they are
-	present and request them, set direction etc, and return an
-	allocated structure. `devm_*` functions are used, so there's no need
-	to call mctrl_gpio_free().
-	As this sets up the irq handling make sure to not handle changes to the
-	gpio input lines in your driver, too.
-
-mctrl_gpio_free(dev, gpios):
-	This will free the requested gpios in mctrl_gpio_init().
-	As `devm_*` functions are used, there's generally no need to call
-	this function.
-
-mctrl_gpio_to_gpiod(gpios, gidx)
-	This returns the gpio_desc structure associated to the modem line
-	index.
-
-mctrl_gpio_set(gpios, mctrl):
-	This will sets the gpios according to the mctrl state.
-
-mctrl_gpio_get(gpios, mctrl):
-	This will update mctrl with the gpios values.
-
-mctrl_gpio_enable_ms(gpios):
-	Enables irqs and handling of changes to the ms lines.
-
-mctrl_gpio_disable_ms(gpios):
-	Disables irqs and handling of changes to the ms lines.
diff --git a/Documentation/serial/index.rst b/Documentation/serial/index.rst
deleted file mode 100644
index d0ba22ea23bf..000000000000
--- a/Documentation/serial/index.rst
+++ /dev/null
@@ -1,32 +0,0 @@
-:orphan:
-
-==========================
-Support for Serial devices
-==========================
-
-.. toctree::
-    :maxdepth: 1
-
-
-    driver
-    tty
-
-Serial drivers
-==============
-
-.. toctree::
-    :maxdepth: 1
-
-    cyclades_z
-    moxa-smartio
-    n_gsm
-    rocket
-    serial-iso7816
-    serial-rs485
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/serial/moxa-smartio.rst b/Documentation/serial/moxa-smartio.rst
deleted file mode 100644
index 156100f17c3f..000000000000
--- a/Documentation/serial/moxa-smartio.rst
+++ /dev/null
@@ -1,615 +0,0 @@
-=============================================================
-MOXA Smartio/Industio Family Device Driver Installation Guide
-=============================================================
-
-.. note::
-
-   This file is outdated. It needs some care in order to make it
-   updated to Kernel 5.0 and upper
-
-Copyright (C) 2008, Moxa Inc.
-
-Date: 01/21/2008
-
-.. Content
-
-   1. Introduction
-   2. System Requirement
-   3. Installation
-      3.1 Hardware installation
-      3.2 Driver files
-      3.3 Device naming convention
-      3.4 Module driver configuration
-      3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x.
-      3.6 Custom configuration
-      3.7 Verify driver installation
-   4. Utilities
-   5. Setserial
-   6. Troubleshooting
-
-1. Introduction
-^^^^^^^^^^^^^^^
-
-   The Smartio/Industio/UPCI family Linux driver supports following multiport
-   boards.
-
-    - 2 ports multiport board
-	CP-102U, CP-102UL, CP-102UF
-	CP-132U-I, CP-132UL,
-	CP-132, CP-132I, CP132S, CP-132IS,
-	CI-132, CI-132I, CI-132IS,
-	(C102H, C102HI, C102HIS, C102P, CP-102, CP-102S)
-
-    - 4 ports multiport board
-	CP-104EL,
-	CP-104UL, CP-104JU,
-	CP-134U, CP-134U-I,
-	C104H/PCI, C104HS/PCI,
-	CP-114, CP-114I, CP-114S, CP-114IS, CP-114UL,
-	C104H, C104HS,
-	CI-104J, CI-104JS,
-	CI-134, CI-134I, CI-134IS,
-	(C114HI, CT-114I, C104P),
-	POS-104UL,
-	CB-114,
-	CB-134I
-
-    - 8 ports multiport board
-	CP-118EL, CP-168EL,
-	CP-118U, CP-168U,
-	C168H/PCI,
-	C168H, C168HS,
-	(C168P),
-	CB-108
-
-   This driver and installation procedure have been developed upon Linux Kernel
-   2.4.x and 2.6.x. This driver supports Intel x86 hardware platform. In order
-   to maintain compatibility, this version has also been properly tested with
-   RedHat, Mandrake, Fedora and S.u.S.E Linux. However, if compatibility problem
-   occurs, please contact Moxa at support@moxa.com.tw.
-
-   In addition to device driver, useful utilities are also provided in this
-   version. They are:
-
-    - msdiag
-		 Diagnostic program for displaying installed Moxa
-                 Smartio/Industio boards.
-    - msmon
-		 Monitor program to observe data count and line status signals.
-    - msterm     A simple terminal program which is useful in testing serial
-	         ports.
-    - io-irq.exe
-		 Configuration program to setup ISA boards. Please note that
-                 this program can only be executed under DOS.
-
-   All the drivers and utilities are published in form of source code under
-   GNU General Public License in this version. Please refer to GNU General
-   Public License announcement in each source code file for more detail.
-
-   In Moxa's Web sites, you may always find latest driver at http://www.moxa.com/.
-
-   This version of driver can be installed as Loadable Module (Module driver)
-   or built-in into kernel (Static driver). You may refer to following
-   installation procedure for suitable one. Before you install the driver,
-   please refer to hardware installation procedure in the User's Manual.
-
-   We assume the user should be familiar with following documents.
-
-   - Serial-HOWTO
-   - Kernel-HOWTO
-
-2. System Requirement
-^^^^^^^^^^^^^^^^^^^^^
-
-   - Hardware platform: Intel x86 machine
-   - Kernel version: 2.4.x or 2.6.x
-   - gcc version 2.72 or later
-   - Maximum 4 boards can be installed in combination
-
-3. Installation
-^^^^^^^^^^^^^^^
-
-3.1 Hardware installation
-=========================
-
-   There are two types of buses, ISA and PCI, for Smartio/Industio
-   family multiport board.
-
-ISA board
----------
-
-   You'll have to configure CAP address, I/O address, Interrupt Vector
-   as well as IRQ before installing this driver. Please refer to hardware
-   installation procedure in User's Manual before proceed any further.
-   Please make sure the JP1 is open after the ISA board is set properly.
-
-PCI/UPCI board
---------------
-
-   You may need to adjust IRQ usage in BIOS to avoid from IRQ conflict
-   with other ISA devices. Please refer to hardware installation
-   procedure in User's Manual in advance.
-
-PCI IRQ Sharing
----------------
-
-   Each port within the same multiport board shares the same IRQ. Up to
-   4 Moxa Smartio/Industio PCI Family multiport boards can be installed
-   together on one system and they can share the same IRQ.
-
-
-3.2 Driver files
-================
-
-   The driver file may be obtained from ftp, CD-ROM or floppy disk. The
-   first step, anyway, is to copy driver file "mxser.tgz" into specified
-   directory. e.g. /moxa. The execute commands as below::
-
-       # cd /
-       # mkdir moxa
-       # cd /moxa
-       # tar xvf /dev/fd0
-
-or::
-
-       # cd /
-       # mkdir moxa
-       # cd /moxa
-       # cp /mnt/cdrom/<driver directory>/mxser.tgz .
-       # tar xvfz mxser.tgz
-
-
-3.3 Device naming convention
-============================
-
-   You may find all the driver and utilities files in /moxa/mxser.
-   Following installation procedure depends on the model you'd like to
-   run the driver. If you prefer module driver, please refer to 3.4.
-   If static driver is required, please refer to 3.5.
-
-Dialin and callout port
------------------------
-
-   This driver remains traditional serial device properties. There are
-   two special file name for each serial port. One is dial-in port
-   which is named "ttyMxx". For callout port, the naming convention
-   is "cumxx".
-
-Device naming when more than 2 boards installed
------------------------------------------------
-
-   Naming convention for each Smartio/Industio multiport board is
-   pre-defined as below.
-
-   ============ ===============       ==============
-   Board Num.	 Dial-in Port	      Callout port
-   1st board	ttyM0  - ttyM7	      cum0  - cum7
-   2nd board	ttyM8  - ttyM15       cum8  - cum15
-   3rd board	ttyM16 - ttyM23       cum16 - cum23
-   4th board	ttyM24 - ttym31       cum24 - cum31
-   ============ ===============       ==============
-
-.. note::
-
-   Under Kernel 2.6 and upper, the cum Device is Obsolete. So use ttyM*
-   device instead.
-
-Board sequence
---------------
-
-   This driver will activate ISA boards according to the parameter set
-   in the driver. After all specified ISA board activated, PCI board
-   will be installed in the system automatically driven.
-   Therefore the board number is sorted by the CAP address of ISA boards.
-   For PCI boards, their sequence will be after ISA boards and C168H/PCI
-   has higher priority than C104H/PCI boards.
-
-3.4 Module driver configuration
-===============================
-
-   Module driver is easiest way to install. If you prefer static driver
-   installation, please skip this paragraph.
-
-
-   ------------- Prepare to use the MOXA driver --------------------
-
-3.4.1 Create tty device with correct major number
--------------------------------------------------
-
-   Before using MOXA driver, your system must have the tty devices
-   which are created with driver's major number. We offer one shell
-   script "msmknod" to simplify the procedure.
-   This step is only needed to be executed once. But you still
-   need to do this procedure when:
-
-   a. You change the driver's major number. Please refer the "3.7"
-      section.
-   b. Your total installed MOXA boards number is changed. Maybe you
-      add/delete one MOXA board.
-   c. You want to change the tty name. This needs to modify the
-      shell script "msmknod"
-
-   The procedure is::
-
-	 # cd /moxa/mxser/driver
-	 # ./msmknod
-
-   This shell script will require the major number for dial-in
-   device and callout device to create tty device. You also need
-   to specify the total installed MOXA board number. Default major
-   numbers for dial-in device and callout device are 30, 35. If
-   you need to change to other number, please refer section "3.7"
-   for more detailed procedure.
-   Msmknod will delete any special files occupying the same device
-   naming.
-
-3.4.2 Build the MOXA driver and utilities
------------------------------------------
-
-   Before using the MOXA driver and utilities, you need compile the
-   all the source code. This step is only need to be executed once.
-   But you still re-compile the source code if you modify the source
-   code. For example, if you change the driver's major number (see
-   "3.7" section), then you need to do this step again.
-
-   Find "Makefile" in /moxa/mxser, then run
-
-	 # make clean; make install
-
-   ..note::
-
-	 For Red Hat 9, Red Hat Enterprise Linux AS3/ES3/WS3 & Fedora Core1:
-	 # make clean; make installsp1
-
-	 For Red Hat Enterprise Linux AS4/ES4/WS4:
-	 # make clean; make installsp2
-
-   The driver files "mxser.o" and utilities will be properly compiled
-   and copied to system directories respectively.
-
-------------- Load MOXA driver--------------------
-
-3.4.3 Load the MOXA driver
---------------------------
-
-   ::
-
-	 # modprobe mxser <argument>
-
-   will activate the module driver. You may run "lsmod" to check
-   if "mxser" is activated. If the MOXA board is ISA board, the
-   <argument> is needed. Please refer to section "3.4.5" for more
-   information.
-
-------------- Load MOXA driver on boot --------------------
-
-3.4.4 Load the mxser driver
----------------------------
-
-
-   For the above description, you may manually execute
-   "modprobe mxser" to activate this driver and run
-   "rmmod mxser" to remove it.
-
-   However, it's better to have a boot time configuration to
-   eliminate manual operation. Boot time configuration can be
-   achieved by rc file. We offer one "rc.mxser" file to simplify
-   the procedure under "moxa/mxser/driver".
-
-   But if you use ISA board, please modify the "modprobe ..." command
-   to add the argument (see "3.4.5" section). After modifying the
-   rc.mxser, please try to execute "/moxa/mxser/driver/rc.mxser"
-   manually to make sure the modification is ok. If any error
-   encountered, please try to modify again. If the modification is
-   completed, follow the below step.
-
-   Run following command for setting rc files::
-
-	 # cd /moxa/mxser/driver
-	 # cp ./rc.mxser /etc/rc.d
-	 # cd /etc/rc.d
-
-   Check "rc.serial" is existed or not. If "rc.serial" doesn't exist,
-   create it by vi, run "chmod 755 rc.serial" to change the permission.
-
-   Add "/etc/rc.d/rc.mxser" in last line.
-
-   Reboot and check if moxa.o activated by "lsmod" command.
-
-3.4.5. specify CAP address
---------------------------
-
-   If you'd like to drive Smartio/Industio ISA boards in the system,
-   you'll have to add parameter to specify CAP address of given
-   board while activating "mxser.o". The format for parameters are
-   as follows.::
-
-	   modprobe mxser ioaddr=0x???,0x???,0x???,0x???
-				  |  |  |    |
-				  |  |  |    +- 4th ISA board
-				  |  |  +------ 3rd ISA board
-				  |  +------------ 2nd ISA board
-				  +-------------------1st ISA board
-
-3.5 Static driver configuration for Linux kernel 2.4.x and 2.6.x
-================================================================
-
-    Note:
-          To use static driver, you must install the linux kernel
-          source package.
-
-3.5.1 Backup the built-in driver in the kernel
-----------------------------------------------
-
-    ::
-
-       # cd /usr/src/linux/drivers/char
-       # mv mxser.c mxser.c.old
-
-       For Red Hat 7.x user, you need to create link:
-       # cd /usr/src
-       # ln -s linux-2.4 linux
-
-3.5.2 Create link
------------------
-    ::
-
-	  # cd /usr/src/linux/drivers/char
-	  # ln -s /moxa/mxser/driver/mxser.c mxser.c
-
-3.5.3 Add CAP address list for ISA boards.
-------------------------------------------
-
-    PCI board users may skip this step.
-
-    In module mode, the CAP address for an ISA board is given by a
-    parameter. In static driver configuration, you'll have to
-    assign it within the driver's source code. If you will not
-    install any ISA boards, you may skip to the next section.
-    The instructions for modifying the driver source code are as
-    follows.
-
-    a. run::
-
-	# cd /moxa/mxser/driver
-	# vi mxser.c
-
-    b. Find the array mxserBoardCAP[] as below::
-
-	  static int mxserBoardCAP[] = {0x00, 0x00, 0x00, 0x00};
-
-    c. Change the addresses within this array using vi. For
-       example, to drive 2 ISA boards with CAP addresses
-       0x280 and 0x180 as the 1st and 2nd boards, change
-       the source code as follows::
-
-	  static int mxserBoardCAP[] = {0x280, 0x180, 0x00, 0x00};
-
-3.5.4 Setup kernel configuration
---------------------------------
-
-    Configure the kernel::
-
-      # cd /usr/src/linux
-      # make menuconfig
-
-    You will go into a menu-driven system. Please select [Character
-    devices][Non-standard serial port support], enable the [Moxa
-    SmartIO support] driver with "[*]" for built-in (not "[M]"), then
-    select [Exit] to exit this program.
-
-3.5.5 Rebuild kernel
---------------------
-
-    The following steps for rebuilding the Linux kernel are for your
-    reference only.
-
-    For details, please refer to the Linux documentation.
-
-        a. Run the following commands::
-
-	     cd /usr/src/linux
-	     make clean		     # take a few minutes
-	     make dep		     # take a few minutes
-	     make bzImage	     # take probably 10-20 minutes
-	     make install	     # copy boot image to correct position
-
-	b. Please make sure the boot kernel (vmlinuz) is in the
-	   correct position.
-	c. If you use the 'lilo' utility, check that the 'image' item in
-	   /etc/lilo.conf specifies the path of the new 'vmlinuz',
-	   or you will load the wrong (or an old) boot kernel image (vmlinuz).
-	   After checking /etc/lilo.conf, please run "lilo".
-
-	  Note that if "make bzImage" fails with an error, you have to
-	  go back to the Linux configuration setup. Type "make menuconfig" in
-	  directory /usr/src/linux.
-
-
-3.5.6 Make tty device and special file
---------------------------------------
-
-    ::
-
-       # cd /moxa/mxser/driver
-       # ./msmknod
-
-3.5.7 Make utility
-------------------
-
-    ::
-
-	  # cd /moxa/mxser/utility
-	  # make clean; make install
-
-3.5.8 Reboot
-------------
-
-
-
-3.6 Custom configuration
-========================
-
-    Although this driver already provides a default configuration, you
-    can still change the device name and major number. The instructions
-    for changing these parameters are shown below.
-
-a. Change Device name
-
-    If you'd like to use other device names instead of the default naming
-    convention, all you have to do is modify the code
-    within the shell script "msmknod". First, open "msmknod"
-    with vi. Locate each line that contains "ttyM" or "cum" and change
-    them to the device name you desire. "msmknod" creates the device
-    names you need the next time it is executed.
-
-b. Change Major number
-
-    If major numbers 30 and 35 are already occupied, you have to select
-    2 free major numbers for this driver. There are 3 steps to change
-    the major numbers.
-
-3.6.1 Find free major numbers
------------------------------
-
-    In /proc/devices, you may find all the major numbers occupied
-    in the system. Please select 2 major numbers that are available.
-    e.g. 40, 45.
-
-3.6.2 Create special files
---------------------------
-
-   Run /moxa/mxser/driver/msmknod to create special files with
-   specified major numbers.
-
-3.6.3 Modify driver with new major number
------------------------------------------
-
-   Run vi to open /moxa/mxser/driver/mxser.c. Locate the line that
-   contains "MXSERMAJOR". Change the content as below::
-
-	  #define	  MXSERMAJOR		  40
-	  #define	  MXSERCUMAJOR		  45
-
-    3.6.4 Run "make clean; make install" in /moxa/mxser/driver.
-
-3.7 Verify driver installation
-==============================
-
-    You may refer to /var/log/messages to check the latest status
-    log reported by this driver whenever it's activated.
-
-4. Utilities
-^^^^^^^^^^^^
-
-   There are 3 utilities contained in this driver: msdiag, msmon and
-   msterm. These 3 utilities are released in the form of source code. They
-   should be compiled into executable files and copied into /usr/bin.
-
-   Before using these utilities, please load the driver (refer to 3.4 & 3.5)
-   and make sure you have run the "msmknod" utility.
-
-msdiag - Diagnostic
-===================
-
-   This utility displays which Moxa Smartio/Industio boards were found
-   by the driver in the system.
-
-msmon - Port Monitoring
-=======================
-
-   This utility gives the user a quick view of all the MOXA ports'
-   activities. One can easily learn each port's total received/transmitted
-   (Rx/Tx) character count since the time when the monitoring was started.
-
-   Rx/Tx throughputs per second are also reported on an interval basis (e.g.
-   the last 5 seconds) and on an average basis (since the time the monitoring
-   was started). You can reset all ports' counts with the <HOME> key. Use the
-   <+> <-> (plus/minus) keys to change the display time interval. Press
-   <ENTER> on the port where the cursor is positioned to view the port's
-   communication parameters, signal status, and input/output queue.
-
-msterm - Terminal Emulation
-===========================
-
-   This utility provides data sending and receiving ability for all tty ports,
-   especially for MOXA ports. It is quite useful for testing simple
-   applications, for example, sending AT commands to a modem connected to the
-   port, or for use as a terminal for login purposes. Note that this is only a
-   dumb terminal emulation without handling of full screen operation.
-
-5. Setserial
-^^^^^^^^^^^^
-
-   Supported Setserial parameters are listed below.
-
-   ============== =========================================================
-   uart		  set UART type(16450-->disable FIFO, 16550A-->enable FIFO)
-   close_delay	  set the amount of time(in 1/100 of a second) that DTR
-		  should be kept low while being closed.
-   closing_wait   set the amount of time(in 1/100 of a second) that the
-		  serial port should wait for data to be drained while
-		  being closed, before the receiver is disabled.
-   spd_hi	  Use  57.6kb  when  the application requests 38.4kb.
-   spd_vhi	  Use  115.2kb	when  the application requests 38.4kb.
-   spd_shi	  Use  230.4kb	when  the application requests 38.4kb.
-   spd_warp	  Use  460.8kb	when  the application requests 38.4kb.
-   spd_normal	  Use  38.4kb  when  the application requests 38.4kb.
-   spd_cust	  Use  the custom divisor to set the speed when  the
-		  application requests 38.4kb.
-   divisor	  This option sets the custom divisor.
-   baud_base	  This option sets the base baud rate.
-   ============== =========================================================
-
-6. Troubleshooting
-^^^^^^^^^^^^^^^^^^
-
-   The boot time error messages and solutions are stated as clearly as
-   possible. If all the possible solutions fail, please contact our technical
-   support team to get more help.
-
-
-   Error msg:
-	      More than 4 Moxa Smartio/Industio family boards found. Fifth board
-              and after are ignored.
-
-   Solution:
-   To avoid this problem, please unplug the fifth and any subsequent boards,
-   because the Moxa driver supports up to 4 boards.
-
-   Error msg:
-	      Request_irq fail, IRQ(?) may be conflict with another device.
-
-   Solution:
-   Other PCI or ISA devices occupy the assigned IRQ. If you are not sure
-   which device causes the situation, please check /proc/interrupts to find
-   a free IRQ and assign it to the Moxa board.
-
-   Error msg:
-	      Board #: C1xx Series(CAP=xxx) interrupt number invalid.
-
-   Solution:
-   Each port within the same multiport board shares the same IRQ. Please set
-   one IRQ (not equal to zero) for each Moxa board.
-
-   Error msg:
-	      No interrupt vector be set for Moxa ISA board(CAP=xxx).
-
-   Solution:
-   A Moxa ISA board needs an interrupt vector. Please refer to the "Hardware
-   Installation" chapter of the user's manual to set the interrupt vector.
-
-   Error msg:
-              Couldn't install MOXA Smartio/Industio family driver!
-
-   Solution:
-   Loading the Moxa driver failed; the major number may conflict with other
-   devices. Please refer to section 3.6 above to select a free major number
-   for the Moxa driver.
-
-   Error msg:
-              Couldn't install MOXA Smartio/Industio family callout driver!
-
-   Solution:
-   Loading the Moxa callout driver failed; the callout device major number
-   may conflict with other devices. Please refer to section 3.6 above to
-   select a free callout device major number for the Moxa driver.
diff --git a/Documentation/serial/n_gsm.rst b/Documentation/serial/n_gsm.rst
deleted file mode 100644
index f3ad9fd26408..000000000000
--- a/Documentation/serial/n_gsm.rst
+++ /dev/null
@@ -1,103 +0,0 @@
-==============================
-GSM 0710 tty multiplexor HOWTO
-==============================
-
-This line discipline implements the GSM 07.10 multiplexing protocol
-detailed in the following 3GPP document:
-
-	http://www.3gpp.org/ftp/Specs/archive/07_series/07.10/0710-720.zip
-
-This document gives some hints on how to use this driver with GPRS and 3G
-modems connected to a physical serial port.
-
-How to use it
--------------
-1. initialize the modem in 0710 mux mode (usually AT+CMUX= command) through
-   its serial port. Depending on the modem used, you can pass more or fewer
-   parameters to this command,
-2. switch the serial line to using the n_gsm line discipline by using
-   TIOCSETD ioctl,
-3. configure the mux using GSMIOC_GETCONF / GSMIOC_SETCONF ioctl.
-
-Major parts of the initialization program :
-(a good starting point is util-linux-ng/sys-utils/ldattach.c)::
-
-  #include <linux/gsmmux.h>
-  #define N_GSM0710	21	/* GSM 0710 Mux */
-  #define DEFAULT_SPEED	B115200
-  #define SERIAL_PORT	"/dev/ttyS0"
-
-	int fd;
-	int ldisc = N_GSM0710;
-	struct gsm_config c;
-	struct termios configuration;
-
-	/* open the serial port connected to the modem */
-	fd = open(SERIAL_PORT, O_RDWR | O_NOCTTY | O_NDELAY);
-
-	/* configure the serial port : speed, flow control ... */
-
-	/* send the AT commands to switch the modem to CMUX mode
-	   and check that it's successful (should return OK) */
-	write(fd, "AT+CMUX=0\r", 10);
-
-	/* experience showed that some modems need some time before
-	   being able to answer to the first MUX packet so a delay
-	   may be needed here in some case */
-	sleep(3);
-
-	/* use n_gsm line discipline */
-	ioctl(fd, TIOCSETD, &ldisc);
-
-	/* get n_gsm configuration */
-	ioctl(fd, GSMIOC_GETCONF, &c);
-	/* we are initiator and need encoding 0 (basic) */
-	c.initiator = 1;
-	c.encapsulation = 0;
-	/* our modem defaults to a maximum size of 127 bytes */
-	c.mru = 127;
-	c.mtu = 127;
-	/* set the new configuration */
-	ioctl(fd, GSMIOC_SETCONF, &c);
-
-	/* and wait for ever to keep the line discipline enabled */
-	daemon(0,0);
-	pause();
-
-4. create the devices corresponding to the "virtual" serial ports (take care,
-   each modem has its configuration and some DLC have dedicated functions,
-   for example GPS), starting with minor 1 (DLC0 is reserved for the management
-   of the mux)::
-
-     MAJOR=`cat /proc/devices | grep gsmtty | awk '{print $1}'`
-     for i in `seq 1 4`; do
-	mknod /dev/ttygsm$i c $MAJOR $i
-     done
-
-5. use these devices as plain serial ports.
-
-   for example, it's possible:
-
-   - to use gnokii to send / receive SMS on ttygsm1
-   - to use ppp to establish a datalink on ttygsm2
-
-6. first close all virtual ports before closing the physical port.
-
-   Note that after closing the physical port the modem is still in multiplexing
-   mode. This may prevent a successful re-opening of the port later. To avoid
-   this situation either reset the modem if your hardware allows that or send
-   a disconnect command frame manually before initializing the multiplexing mode
-   for the second time. The byte sequence for the disconnect command frame is::
-
-      0xf9, 0x03, 0xef, 0x03, 0xc3, 0x16, 0xf9
-
-Additional Documentation
-------------------------
-More practical details on the protocol and how it's supported by industrial
-modems can be found in the following documents :
-
-- http://www.telit.com/module/infopool/download.php?id=616
-- http://www.u-blox.com/images/downloads/Product_Docs/LEON-G100-G200-MuxImplementation_ApplicationNote_%28GSM%20G1-CS-10002%29.pdf
-- http://www.sierrawireless.com/Support/Downloads/AirPrime/WMP_Series/~/media/Support_Downloads/AirPrime/Application_notes/CMUX_Feature_Application_Note-Rev004.ashx
-- http://wm.sim.com/sim/News/photo/2010721161442.pdf
-
-11-03-08 - Eric Bénard - <eric@eukrea.com>
diff --git a/Documentation/serial/rocket.rst b/Documentation/serial/rocket.rst
deleted file mode 100644
index 23761eae4282..000000000000
--- a/Documentation/serial/rocket.rst
+++ /dev/null
@@ -1,185 +0,0 @@
-================================================
-Comtrol(tm) RocketPort(R)/RocketModem(TM) Series
-================================================
-
-Device Driver for the Linux Operating System
-============================================
-
-Product overview
-----------------
-
-This driver provides a loadable kernel driver for the Comtrol RocketPort
-and RocketModem PCI boards. These boards provide 2, 4, 8, 16, or 32
-high-speed serial ports or modems.  This driver supports any combination
-of up to four RocketPort or RocketModem boards in one machine simultaneously.
-This file assumes that you are using the RocketPort driver which is
-integrated into the kernel sources.
-
-The driver can also be installed as an external module using the usual
-"make;make install" routine.  This external module driver, obtainable
-from the Comtrol website listed below, is useful for updating the driver
-or installing it into kernels which do not have the driver configured
-into them.  Installation instructions for the external module
-are in the included README and HW_INSTALL files.
-
-RocketPort ISA and RocketModem II PCI boards currently are only supported by
-this driver in module form.
-
-The RocketPort ISA board requires I/O ports to be configured by the DIP
-switches on the board.  See the section "ISA Rocketport Boards" below for
-information on how to set the DIP switches.
-
-You pass the I/O port to the driver using the following module parameters:
-
-board1:
-	I/O port for the first ISA board
-board2:
-	I/O port for the second ISA board
-board3:
-	I/O port for the third ISA board
-board4:
-	I/O port for the fourth ISA board
-
-There is a set of utilities and scripts provided with the external driver
-(downloadable from http://www.comtrol.com) that ease the configuration and
-setup of the ISA cards.
-
-The RocketModem II PCI boards require firmware to be loaded into the card
-before it will function.  The driver has only been tested as a module for this
-board.
-
-Installation Procedures
------------------------
-
-RocketPort/RocketModem PCI cards require no driver configuration; they are
-automatically detected and configured.
-
-The RocketPort driver can be installed as a module (recommended) or built
-into the kernel. This is selected, as for other drivers, through the `make config`
-command from the root of the Linux source tree during the kernel build process.
-
-The RocketPort/RocketModem serial ports installed by this driver are assigned
-device major number 46, and will be named /dev/ttyRx, where x is the port number
-starting at zero (ex. /dev/ttyR0, /dev/ttyR1, ...).  If you have multiple cards
-installed in the system, the mapping of port names to serial ports is displayed
-in the system log at /var/log/messages.
-
-If installed as a module, the module must be loaded.  This can be done
-manually by entering "modprobe rocket".  To have the module loaded automatically
-upon system boot, edit a `/etc/modprobe.d/*.conf` file and add the line
-"alias char-major-46 rocket".
-
-In order to use the ports, their device names (nodes) must be created with mknod.
-This is only required once; the system will retain the names once created.  To
-create the RocketPort/RocketModem device names, use the command
-"mknod /dev/ttyRx c 46 x" where x is the port number starting at zero.
-
-For example::
-
-	> mknod /dev/ttyR0 c 46 0
-	> mknod /dev/ttyR1 c 46 1
-	> mknod /dev/ttyR2 c 46 2
-
-The Linux script MAKEDEV will create the first 16 ttyRx device names (nodes)
-for you::
-
-	>/dev/MAKEDEV ttyR
-
-ISA Rocketport Boards
----------------------
-
-You must assign and configure the I/O addresses used by the ISA Rocketport
-card before installing and using it.  This is done by setting a set of DIP
-switches on the Rocketport board.
-
-
-Setting the I/O address
------------------------
-
-Before installing RocketPort(R) or RocketPort RA boards, you must find
-a range of I/O addresses for it to use. The first RocketPort card
-requires a 68-byte contiguous block of I/O addresses, starting at one
-of the following: 0x100h, 0x140h, 0x180h, 0x200h, 0x240h, 0x280h,
-0x300h, 0x340h, 0x380h.  This I/O address must be reflected in the DIP
-switches of *all* of the Rocketport cards.
-
-The second, third, and fourth RocketPort cards require a 64-byte
-contiguous block of I/O addresses, starting at one of the following
-I/O addresses: 0x100h, 0x140h, 0x180h, 0x1C0h, 0x200h, 0x240h, 0x280h,
-0x2C0h, 0x300h, 0x340h, 0x380h, 0x3C0h.  The I/O addresses used by the
-second, third, and fourth Rocketport cards (if present) are set via
-software control.  The DIP switch settings for the I/O address must be
-set to the value of the first Rocketport card.
-
-In order to distinguish each card from the others, each card
-must have a unique board ID set on the dip switches.  The first
-Rocketport board must be set with the DIP switches corresponding to
-the first board, the second board must be set with the DIP switches
-corresponding to the second board, etc.  IMPORTANT: The board ID is
-the only place where the DIP switch settings should differ between the
-various Rocketport boards in a system.
-
-The I/O address range used by any of the RocketPort cards must not
-conflict with any other cards in the system, including other
-RocketPort cards.  Below, you will find a list of commonly used I/O
-address ranges which may be in use by other devices in your system.
-On a Linux system, "cat /proc/ioports" will also be helpful in
-identifying what I/O addresses are being used by devices on your
-system.
-
-Remember, the FIRST RocketPort uses 68 I/O addresses.  So, if you set it
-for 0x100, it will occupy 0x100 to 0x143.  This would mean that you
-CAN NOT set the second, third or fourth board for address 0x140 since
-the first 4 bytes of that range are used by the first board.  You would
-need to set the second, third, or fourth board to one of the next available
-blocks such as 0x180.
-
-RocketPort and RocketPort RA SW1 Settings::
-
-            +-------------------------------+
-            | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
-            +-------+-------+---------------+
-            | Unused| Card  | I/O Port Block|
-            +-------------------------------+
-
-  DIP Switches                             DIP Switches
-  7    8                                   6    5
-  ===================                      ===================
-  On   On   UNUSED, MUST BE ON.            On   On   First Card    <==== Default
-                                           On   Off  Second Card
-                                           Off  On   Third Card
-                                           Off  Off  Fourth Card
-
-  DIP Switches         I/O Address Range
-  4    3    2    1     Used by the First Card
-  =====================================
-  On   Off  On   Off   100-143
-  On   Off  Off  On    140-183
-  On   Off  Off  Off   180-1C3       <==== Default
-  Off  On   On   Off   200-243
-  Off  On   Off  On    240-283
-  Off  On   Off  Off   280-2C3
-  Off  Off  On   Off   300-343
-  Off  Off  Off  On    340-383
-  Off  Off  Off  Off   380-3C3
-
-Reporting Bugs
---------------
-
-For technical support, please provide the following
-information: Driver version, kernel release, distribution of
-kernel, and type of board you are using. Error messages, log
-printouts, and port configuration details are especially helpful.
-
-USA:
-    :Phone: (612) 494-4100
-    :FAX: (612) 494-4199
-    :email: support@comtrol.com
-
-Comtrol Europe:
-    :Phone: +44 (0) 1 869 323-220
-    :FAX: +44 (0) 1 869 323-211
-    :email: support@comtrol.co.uk
-
-Web:	http://www.comtrol.com
-FTP:	ftp.comtrol.com
diff --git a/Documentation/serial/serial-iso7816.rst b/Documentation/serial/serial-iso7816.rst
deleted file mode 100644
index d990143de0c6..000000000000
--- a/Documentation/serial/serial-iso7816.rst
+++ /dev/null
@@ -1,90 +0,0 @@
-=============================
-ISO7816 Serial Communications
-=============================
-
-1. Introduction
-===============
-
-  ISO/IEC7816 is a series of standards specifying integrated circuit cards (ICC)
-  also known as smart cards.
-
-2. Hardware-related considerations
-==================================
-
-  Some CPUs/UARTs (e.g., Microchip AT91) contain a built-in mode capable of
-  handling communication with a smart card.
-
-  For these microcontrollers, the Linux driver should be made capable of
-  working in both modes, and proper ioctls (see later) should be made
-  available at user-level to allow switching from one mode to the other, and
-  vice versa.
-
-3. Data Structures Already Available in the Kernel
-==================================================
-
-  The Linux kernel provides the serial_iso7816 structure (see [1]) to handle
-  ISO7816 communications. This data structure is used to set and configure
-  ISO7816 parameters in ioctls.
-
-  Any driver for devices capable of working both as RS232 and ISO7816 should
-  implement the iso7816_config callback in the uart_port structure. The
-  serial_core calls iso7816_config to do the device specific part in response
-  to TIOCGISO7816 and TIOCSISO7816 ioctls (see below). The iso7816_config
-  callback receives a pointer to struct serial_iso7816.
-
-4. Usage from user-level
-========================
-
-  From user-level, ISO7816 configuration can be read and set using the
-  ioctls above. For instance, to set ISO7816 you can use the following code::
-
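-	#include <fcntl.h>	/* open() */
-	#include <string.h>	/* memset() */
-	#include <unistd.h>	/* close() */
-	#include <linux/serial.h>
-
-	/* Include definition for ISO7816 ioctls: TIOCSISO7816 and TIOCGISO7816 */
-	#include <sys/ioctl.h>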
-
-	/* Open your specific device (e.g., /dev/mydevice): */
-	int fd = open ("/dev/mydevice", O_RDWR);
-	if (fd < 0) {
-		/* Error handling. See errno. */
-	}
-
-	struct serial_iso7816 iso7816conf;
-
-	/* Reserved fields have to be zeroed */
-	memset(&iso7816conf, 0, sizeof(iso7816conf));
-
-	/* Enable ISO7816 mode: */
-	iso7816conf.flags |= SER_ISO7816_ENABLED;
-
-	/* Select the protocol: */
-	/* T=0 */
-	iso7816conf.flags |= SER_ISO7816_T(0);
-	/* or T=1 */
-	iso7816conf.flags |= SER_ISO7816_T(1);
-
-	/* Set the guard time: */
-	iso7816conf.tg = 2;
-
-	/* Set the clock frequency*/
-	iso7816conf.clk = 3571200;
-
-	/* Set transmission factors: */
-	iso7816conf.sc_fi = 372;
-	iso7816conf.sc_di = 1;
-
-	if (ioctl(fd, TIOCSISO7816, &iso7816conf) < 0) {
-		/* Error handling. See errno. */
-	}
-
-	/* Use read() and write() syscalls here... */
-
-	/* Close the device when finished: */
-	if (close (fd) < 0) {
-		/* Error handling. See errno. */
-	}
-
-5. References
-=============
-
- [1]    include/uapi/linux/serial.h
diff --git a/Documentation/serial/serial-rs485.rst b/Documentation/serial/serial-rs485.rst
deleted file mode 100644
index 6bc824f948f9..000000000000
--- a/Documentation/serial/serial-rs485.rst
+++ /dev/null
@@ -1,103 +0,0 @@
-===========================
-RS485 Serial Communications
-===========================
-
-1. Introduction
-===============
-
-   EIA-485, also known as TIA/EIA-485 or RS-485, is a standard defining the
-   electrical characteristics of drivers and receivers for use in balanced
-   digital multipoint systems.
-   This standard is widely used for communications in industrial automation
-   because it can be used effectively over long distances and in electrically
-   noisy environments.
-
-2. Hardware-related Considerations
-==================================
-
-   Some CPUs/UARTs (e.g., Atmel AT91 or 16C950 UART) contain a built-in
-   half-duplex mode capable of automatically controlling line direction by
-   toggling RTS or DTR signals. That can be used to control external
-   half-duplex hardware like an RS485 transceiver or any RS232-connected
-   half-duplex devices like some modems.
-
-   For these microcontrollers, the Linux driver should be made capable of
-   working in both modes, and proper ioctls (see later) should be made
-   available at user-level to allow switching from one mode to the other, and
-   vice versa.
-
-3. Data Structures Already Available in the Kernel
-==================================================
-
-   The Linux kernel provides the serial_rs485 structure (see [1]) to handle
-   RS485 communications. This data structure is used to set and configure RS485
-   parameters in the platform data and in ioctls.
-
-   The device tree can also provide RS485 boot time parameters (see [2]
-   for bindings). The driver is in charge of filling this data structure from
-   the values given by the device tree.
-
-   Any driver for devices capable of working both as RS232 and RS485 should
-   implement the rs485_config callback in the uart_port structure. The
-   serial_core calls rs485_config to do the device specific part in response
-   to TIOCSRS485 and TIOCGRS485 ioctls (see below). The rs485_config callback
-   receives a pointer to struct serial_rs485.
-
-4. Usage from user-level
-========================
-
-   From user-level, RS485 configuration can be read and set using the
-   ioctls above. For instance, to set RS485 you can use the following code::
-
-	#include <fcntl.h>	/* open() */
-	#include <unistd.h>	/* close() */
-	#include <linux/serial.h>
-
-	/* Include definition for RS485 ioctls: TIOCGRS485 and TIOCSRS485 */
-	#include <sys/ioctl.h>
-
-	/* Open your specific device (e.g., /dev/mydevice): */
-	int fd = open ("/dev/mydevice", O_RDWR);
-	if (fd < 0) {
-		/* Error handling. See errno. */
-	}
-
-	/* Zero-initialize so that unset flags and reserved fields are 0 */
-	struct serial_rs485 rs485conf = {0};
-
-	/* Enable RS485 mode: */
-	rs485conf.flags |= SER_RS485_ENABLED;
-
-	/* Set logical level for RTS pin equal to 1 when sending: */
-	rs485conf.flags |= SER_RS485_RTS_ON_SEND;
-	/* or, set logical level for RTS pin equal to 0 when sending: */
-	rs485conf.flags &= ~(SER_RS485_RTS_ON_SEND);
-
-	/* Set logical level for RTS pin equal to 1 after sending: */
-	rs485conf.flags |= SER_RS485_RTS_AFTER_SEND;
-	/* or, set logical level for RTS pin equal to 0 after sending: */
-	rs485conf.flags &= ~(SER_RS485_RTS_AFTER_SEND);
-
-	/* Set rts delay before send, if needed: */
-	rs485conf.delay_rts_before_send = ...;
-
-	/* Set rts delay after send, if needed: */
-	rs485conf.delay_rts_after_send = ...;
-
-	/* Set this flag if you want to receive data even while sending data */
-	rs485conf.flags |= SER_RS485_RX_DURING_TX;
-
-	if (ioctl (fd, TIOCSRS485, &rs485conf) < 0) {
-		/* Error handling. See errno. */
-	}
-
-	/* Use read() and write() syscalls here... */
-
-	/* Close the device when finished: */
-	if (close (fd) < 0) {
-		/* Error handling. See errno. */
-	}
-
-5. References
-=============
-
- [1]	include/uapi/linux/serial.h
-
- [2]	Documentation/devicetree/bindings/serial/rs485.txt
diff --git a/Documentation/serial/tty.rst b/Documentation/serial/tty.rst
deleted file mode 100644
index dd972caacf3e..000000000000
--- a/Documentation/serial/tty.rst
+++ /dev/null
@@ -1,328 +0,0 @@
-=================
-The Lockronomicon
-=================
-
-Your guide to the ancient and twisted locking policies of the tty layer and
-the warped logic behind them. Beware all ye who read on.
-
-
-Line Discipline
----------------
-
-Line disciplines are registered with tty_register_ldisc() passing the
-discipline number and the ldisc structure. At the point of registration the
-discipline must be ready to use and it is possible it will get used before
-the call returns success. If the call returns an error then it won't get
-called. Do not re-use ldisc numbers as they are part of the userspace ABI
-and writing over an existing ldisc will cause demons to eat your computer.
-After the return the ldisc data has been copied so you may free your own
-copy of the structure. You must not re-register over the top of the line
-discipline even with the same data or your computer again will be eaten by
-demons.
-
-In order to remove a line discipline call tty_unregister_ldisc().
-In ancient times this always worked. In modern times the function will
-return -EBUSY if the ldisc is currently in use. Since the ldisc referencing
-code manages the module counts this should not usually be a concern.
-
-Heed this warning: the reference count field of the registered copies of the
-tty_ldisc structure in the ldisc table counts the number of lines using this
-discipline. The reference count of the tty_ldisc structure within a tty
-counts the number of active users of the ldisc at this instant. In effect it
-counts the number of threads of execution within an ldisc method (plus those
-about to enter and exit although this detail matters not).
-
-Line Discipline Methods
------------------------
-
-TTY side interfaces
-^^^^^^^^^^^^^^^^^^^
-
-======================= =======================================================
-open()			Called when the line discipline is attached to
-			the terminal. No other call into the line
-			discipline for this tty will occur until it
-			completes successfully. Should initialize any
-			state needed by the ldisc, and set receive_room
-			in the tty_struct to the maximum amount of data
-			the line discipline is willing to accept from the
-			driver with a single call to receive_buf().
-			Returning an error will prevent the ldisc from
-			being attached. Can sleep.
-
-close()			This is called on a terminal when the line
-			discipline is being unplugged. At the point of
-			execution no further users will enter the
-			ldisc code for this tty. Can sleep.
-
-hangup()		Called when the tty line is hung up.
-			The line discipline should cease I/O to the tty.
-			No further calls into the ldisc code will occur.
-			The return value is ignored. Can sleep.
-
-read()			(optional) A process requests reading data from
-			the line. Multiple read calls may occur in parallel
-			and the ldisc must deal with serialization issues.
-			If not defined, the process will receive an EIO
-			error. May sleep.
-
-write()			(optional) A process requests writing data to the
-			line. Multiple write calls are serialized by the
-			tty layer for the ldisc. If not defined, the
-			process will receive an EIO error. May sleep.
-
-flush_buffer()		(optional) May be called at any point between
-			open and close, and instructs the line discipline
-			to empty its input buffer.
-
-set_termios()		(optional) Called on termios structure changes.
-			The caller passes the old termios data and the
-			current data is in the tty. Called under the
-			termios semaphore so allowed to sleep. Serialized
-			against itself only.
-
-poll()			(optional) Check the status for the poll/select
-			calls. Multiple poll calls may occur in parallel.
-			May sleep.
-
-ioctl()			(optional) Called when an ioctl is handed to the
-			tty layer that might be for the ldisc. Multiple
-			ioctl calls may occur in parallel. May sleep.
-
-compat_ioctl()		(optional) Called when a 32 bit ioctl is handed
-			to the tty layer that might be for the ldisc.
-			Multiple ioctl calls may occur in parallel.
-			May sleep.
-======================= =======================================================
-
-Driver Side Interfaces
-^^^^^^^^^^^^^^^^^^^^^^
-
-======================= =======================================================
-receive_buf()		(optional) Called by the low-level driver to hand
-			a buffer of received bytes to the ldisc for
-			processing. The number of bytes is guaranteed not
-			to exceed the current value of tty->receive_room.
-			All bytes must be processed.
-
-receive_buf2()		(optional) Called by the low-level driver to hand
-			a buffer of received bytes to the ldisc for
-			processing. Returns the number of bytes processed.
-
-			If both receive_buf() and receive_buf2() are
-			defined, receive_buf2() should be preferred.
-
-write_wakeup()		May be called at any point between open and close.
-			The TTY_DO_WRITE_WAKEUP flag indicates if a call
-			is needed but always races versus calls. Thus the
-			ldisc must be careful about setting order and to
-			handle unexpected calls. Must not sleep.
-
-			The driver is forbidden from calling this directly
-			from the ->write call from the ldisc as the ldisc
-			is permitted to call the driver write method from
-			this function. In such a situation defer it.
-
-dcd_change()		Report to the tty line the current DCD pin status
-			changes and the relative timestamp. The timestamp
-			cannot be NULL.
-======================= =======================================================
-
-
-Driver Access
-^^^^^^^^^^^^^
-
-Line discipline methods can call the following methods of the underlying
-hardware driver through the function pointers within the tty->driver
-structure:
-
-======================= =======================================================
-write()			Write a block of characters to the tty device.
-			Returns the number of characters accepted. The
-			character buffer passed to this method is already
-			in kernel space.
-
-put_char()		Queues a character for writing to the tty device.
-			If there is no room in the queue, the character is
-			ignored.
-
-flush_chars()		(Optional) If defined, must be called after
-			queueing characters with put_char() in order to
-			start transmission.
-
-write_room()		Returns the numbers of characters the tty driver
-			will accept for queueing to be written.
-
-ioctl()			Invoke device specific ioctl.
-			Expects data pointers to refer to userspace.
-			Returns ENOIOCTLCMD for unrecognized ioctl numbers.
-
-set_termios()		Notify the tty driver that the device's termios
-			settings have changed. New settings are in
-			tty->termios. Previous settings should be passed in
-			the "old" argument.
-
-			The API is defined such that the driver should return
-			the actual modes selected. This means that the
-			driver function is responsible for modifying any
-			bits in the request it cannot fulfill to indicate
-			the actual modes being used. A device with no
-			hardware capability for change (e.g. a USB dongle or
-			virtual port) can provide NULL for this method.
-
-throttle()		Notify the tty driver that input buffers for the
-			line discipline are close to full, and it should
-			somehow signal that no more characters should be
-			sent to the tty.
-
-unthrottle()		Notify the tty driver that characters can now be
-			sent to the tty without fear of overrunning the
-			input buffers of the line disciplines.
-
-stop()			Ask the tty driver to stop outputting characters
-			to the tty device.
-
-start()			Ask the tty driver to resume sending characters
-			to the tty device.
-
-hangup()		Ask the tty driver to hang up the tty device.
-
-break_ctl()		(Optional) Ask the tty driver to turn on or off
-			BREAK status on the RS-232 port.  If state is -1,
-			then the BREAK status should be turned on; if
-			state is 0, then BREAK should be turned off.
-			If this routine is not implemented, use ioctls
-			TIOCSBRK / TIOCCBRK instead.
-
-wait_until_sent()	Waits until the device has written out all of the
-			characters in its transmitter FIFO.
-
-send_xchar()		Send a high-priority XON/XOFF character to the device.
-======================= =======================================================
-
-
-Flags
-^^^^^
-
-Line discipline methods have access to tty->flags field containing the
-following interesting flags:
-
-======================= =======================================================
-TTY_THROTTLED		Driver input is throttled. The ldisc should call
-			tty->driver->unthrottle() in order to resume
-			reception when it is ready to process more data.
-
-TTY_DO_WRITE_WAKEUP	If set, causes the driver to call the ldisc's
-			write_wakeup() method in order to resume
-			transmission when it can accept more data
-			to transmit.
-
-TTY_IO_ERROR		If set, causes all subsequent userspace read/write
-			calls on the tty to fail, returning -EIO.
-
-TTY_OTHER_CLOSED	Device is a pty and the other side has closed.
-
-TTY_NO_WRITE_SPLIT	Prevent driver from splitting up writes into
-			smaller chunks.
-======================= =======================================================
-
-
-Locking
-^^^^^^^
-
-Callers to the line discipline functions from the tty layer are required to
-take line discipline locks. The same is true of calls from the driver side
-but not yet enforced.
-
-Three calls are now provided::
-
-	ldisc = tty_ldisc_ref(tty);
-
-takes a handle to the line discipline in the tty and returns it. If no ldisc
-is currently attached or the ldisc is being closed and re-opened at this
-point then NULL is returned. While this handle is held the ldisc will not
-change or go away::
-
-	tty_ldisc_deref(ldisc)
-
-Returns the ldisc reference and allows the ldisc to be closed. Returning the
-reference takes away your right to call the ldisc functions until you take
-a new reference::
-
-	ldisc = tty_ldisc_ref_wait(tty);
-
-Performs the same function as tty_ldisc_ref except that it will wait for an
-ldisc change to complete and then return a reference to the new ldisc.
-
-While these functions are slightly slower than the old code they should have
-minimal impact as most receive logic uses the flip buffers and they only
-need to take a reference when they push bits up through the driver.
-
-A caution: The ldisc->open(), ldisc->close() and driver->set_ldisc
-functions are called with the ldisc unavailable. Thus tty_ldisc_ref will
-fail in this situation if used within these functions. Ldisc and driver
-code calling its own functions must be careful in this case.
-
-
-Driver Interface
-----------------
-
-======================= =======================================================
-open()			Called when a device is opened. May sleep
-
-close()			Called when a device is closed. At the point of
-			return from this call the driver must make no
-			further ldisc calls of any kind. May sleep
-
-write()			Called to write bytes to the device. May not
-			sleep. May occur in parallel in special cases.
-			Because this includes panic paths drivers generally
-			shouldn't try and do clever locking here.
-
-put_char()		Stuff a single character onto the queue. The
-			driver is guaranteed following up calls to
-			flush_chars.
-
-flush_chars()		Ask the kernel to write put_char queue
-
-write_room()		Return the number of characters that can be stuffed
-			into the port buffers without overflow (or less).
-			The ldisc is responsible for being intelligent
-			about multi-threading of write_room/write calls
-
-ioctl()			Called when an ioctl may be for the driver
-
-set_termios()		Called on termios change, serialized against
-			itself by a semaphore. May sleep.
-
-set_ldisc()		Notifier for discipline change. At the point this
-			is done the discipline is not yet usable. Can now
-			sleep (I think)
-
-throttle()		Called by the ldisc to ask the driver to do flow
-			control.  Serialization including with unthrottle
-			is the job of the ldisc layer.
-
-unthrottle()		Called by the ldisc to ask the driver to stop flow
-			control.
-
-stop()			Ldisc notifier to the driver to stop output. As with
-			throttle the serializations with start() are down
-			to the ldisc layer.
-
-start()			Ldisc notifier to the driver to start output.
-
-hangup()		Ask the tty driver to cause a hangup initiated
-			from the host side. [Can sleep ??]
-
-break_ctl()		Send RS232 break. Can sleep. Can get called in
-			parallel, driver must serialize (for now), and
-			with write calls.
-
-wait_until_sent()	Wait for characters to exit the hardware queue
-			of the driver. Can sleep
-
-send_xchar()	  	Send XON/XOFF and if possible jump the queue with
-			it in order to get fast flow control responses.
-			Cannot sleep ??
-======================= =======================================================
diff --git a/MAINTAINERS b/MAINTAINERS
index d1a0a817dd92..4f88bca37c55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10767,7 +10767,7 @@ F:	include/uapi/linux/meye.h
 MOXA SMARTIO/INDUSTIO/INTELLIO SERIAL CARD
 M:	Jiri Slaby <jirislaby@gmail.com>
 S:	Maintained
-F:	Documentation/serial/moxa-smartio.rst
+F:	Documentation/driver-api/serial/moxa-smartio.rst
 F:	drivers/tty/mxser.*
 
 MR800 AVERMEDIA USB FM RADIO DRIVER
@@ -13689,7 +13689,7 @@ ROCKETPORT DRIVER
 P:	Comtrol Corp.
 W:	http://www.comtrol.com
 S:	Maintained
-F:	Documentation/serial/rocket.rst
+F:	Documentation/driver-api/serial/rocket.rst
 F:	drivers/tty/rocket*
 
 ROCKETPORT EXPRESS/INFINITY DRIVER
@@ -16228,7 +16228,7 @@ M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 M:	Jiri Slaby <jslaby@suse.com>
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
-F:	Documentation/serial/
+F:	Documentation/driver-api/serial/
 F:	drivers/tty/
 F:	drivers/tty/serial/serial_core.c
 F:	include/linux/serial_core.h
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index ee51b9514225..c7623f99ac0f 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -175,7 +175,7 @@ config ROCKETPORT
 	  This driver supports Comtrol RocketPort and RocketModem PCI boards.   
           These boards provide 2, 4, 8, 16, or 32 high-speed serial ports or
           modems.  For information about the RocketPort/RocketModem  boards
-          and this driver read <file:Documentation/serial/rocket.rst>.
+          and this driver read <file:Documentation/driver-api/serial/rocket.rst>.
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called rocket.
@@ -193,7 +193,7 @@ config CYCLADES
 	  your Linux box, for instance in order to become a dial-in server.
 
 	  For information about the Cyclades-Z card, read
-	  <file:Documentation/serial/cyclades_z.rst>.
+	  <file:Documentation/driver-api/serial/cyclades_z.rst>.
 
 	  To compile this driver as a module, choose M here: the
 	  module will be called cyclades.
diff --git a/drivers/tty/serial/ucc_uart.c b/drivers/tty/serial/ucc_uart.c
index 6e3c66ab0e62..a0555ae2b1ef 100644
--- a/drivers/tty/serial/ucc_uart.c
+++ b/drivers/tty/serial/ucc_uart.c
@@ -1081,7 +1081,7 @@ static int qe_uart_verify_port(struct uart_port *port,
 }
 /* UART operations
  *
- * Details on these functions can be found in Documentation/serial/driver.rst
+ * Details on these functions can be found in Documentation/driver-api/serial/driver.rst
  */
 static const struct uart_ops qe_uart_pops = {
 	.tx_empty       = qe_uart_tx_empty,
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 05b179015d6c..2b78cc734719 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -32,7 +32,7 @@ struct device;
 
 /*
  * This structure describes all the operations that can be done on the
- * physical hardware.  See Documentation/serial/driver.rst for details.
+ * physical hardware.  See Documentation/driver-api/serial/driver.rst for details.
  */
 struct uart_ops {
 	unsigned int	(*tx_empty)(struct uart_port *);
-- 
cgit v1.2.3-55-g7522


From 4745dc8abb0a0a9851c07265eea01d844886d5c8 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Thu, 27 Jun 2019 16:36:04 -0300
Subject: docs: phy: place documentation under driver-api

This subsystem-specific documentation belongs to the
driver-api.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 .../devicetree/bindings/phy/phy-bindings.txt       |   2 +-
 .../devicetree/bindings/phy/phy-pxa-usb.txt        |   2 +-
 Documentation/driver-api/index.rst                 |   1 +
 Documentation/driver-api/phy/index.rst             |  16 ++
 Documentation/driver-api/phy/phy.rst               | 197 +++++++++++++++++++++
 Documentation/driver-api/phy/samsung-usb2.rst      | 137 ++++++++++++++
 Documentation/index.rst                            |   1 -
 Documentation/phy.txt                              | 197 ---------------------
 Documentation/phy/samsung-usb2.rst                 | 137 --------------
 MAINTAINERS                                        |   2 +-
 10 files changed, 354 insertions(+), 338 deletions(-)
 create mode 100644 Documentation/driver-api/phy/index.rst
 create mode 100644 Documentation/driver-api/phy/phy.rst
 create mode 100644 Documentation/driver-api/phy/samsung-usb2.rst
 delete mode 100644 Documentation/phy.txt
 delete mode 100644 Documentation/phy/samsung-usb2.rst

diff --git a/Documentation/devicetree/bindings/phy/phy-bindings.txt b/Documentation/devicetree/bindings/phy/phy-bindings.txt
index a403b81d0679..c4eb38902533 100644
--- a/Documentation/devicetree/bindings/phy/phy-bindings.txt
+++ b/Documentation/devicetree/bindings/phy/phy-bindings.txt
@@ -1,5 +1,5 @@
 This document explains only the device tree data binding. For general
-information about PHY subsystem refer to Documentation/phy.txt
+information about PHY subsystem refer to Documentation/driver-api/phy/phy.rst
 
 PHY device node
 ===============
diff --git a/Documentation/devicetree/bindings/phy/phy-pxa-usb.txt b/Documentation/devicetree/bindings/phy/phy-pxa-usb.txt
index 93fc09c12954..d80e36a77ec5 100644
--- a/Documentation/devicetree/bindings/phy/phy-pxa-usb.txt
+++ b/Documentation/devicetree/bindings/phy/phy-pxa-usb.txt
@@ -15,4 +15,4 @@ Example:
 	};
 
 This document explains the device tree binding. For general
-information about PHY subsystem refer to Documentation/phy.txt
+information about PHY subsystem refer to Documentation/driver-api/phy/phy.rst
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index cf39b8f9d0f9..eff22db0ed14 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -85,6 +85,7 @@ available subsections can be seen below.
    parport-lowlevel
    pps
    ptp
+   phy/index
    pti_intel_mid
    pwm
    rfkill
diff --git a/Documentation/driver-api/phy/index.rst b/Documentation/driver-api/phy/index.rst
new file mode 100644
index 000000000000..fce9ffae2812
--- /dev/null
+++ b/Documentation/driver-api/phy/index.rst
@@ -0,0 +1,16 @@
+=====================
+Generic PHY Framework
+=====================
+
+.. toctree::
+
+   phy
+   samsung-usb2
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
+
diff --git a/Documentation/driver-api/phy/phy.rst b/Documentation/driver-api/phy/phy.rst
new file mode 100644
index 000000000000..457c3e0f86d6
--- /dev/null
+++ b/Documentation/driver-api/phy/phy.rst
@@ -0,0 +1,197 @@
+=============
+PHY subsystem
+=============
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
+
+This document explains the Generic PHY Framework, the APIs it provides,
+and how to use them.
+
+Introduction
+============
+
+*PHY* is the abbreviation for physical layer. It is used to connect a device
+to the physical medium e.g., the USB controller has a PHY to provide functions
+such as serialization, de-serialization, encoding, decoding and is responsible
+for obtaining the required data transmission rate. Note that some USB
+controllers have PHY functionality embedded in them and others use an external
+PHY. Other peripherals that use PHY include Wireless LAN, Ethernet,
+SATA etc.
+
+The intention of creating this framework is to bring the PHY drivers spread
+all over the Linux kernel to drivers/phy to increase code re-use and for
+better code maintainability.
+
+This framework will be of use only to devices that use external PHY (PHY
+functionality is not embedded within the controller).
+
+Registering/Unregistering the PHY provider
+==========================================
+
+PHY provider refers to an entity that implements one or more PHY instances.
+For the simple case where the PHY provider implements only a single instance of
+the PHY, the framework provides its own implementation of of_xlate in
+of_phy_simple_xlate. If the PHY provider implements multiple instances, it
+should provide its own implementation of of_xlate. of_xlate is used only for
+the dt boot case.
+
+::
+
+	#define of_phy_provider_register(dev, xlate)    \
+		__of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate))
+
+	#define devm_of_phy_provider_register(dev, xlate)       \
+		__devm_of_phy_provider_register((dev), NULL, THIS_MODULE, \
+						(xlate))
+
+The of_phy_provider_register and devm_of_phy_provider_register macros can be
+used to register the phy_provider; each takes the device and of_xlate as
+arguments. For the dt boot case, all PHY providers should use one of the above
+2 macros to register the PHY provider.
+
+Often the device tree nodes associated with a PHY provider will contain a set
+of children that each represent a single PHY. Some bindings may nest the child
+nodes within extra levels for context and extensibility, in which case the low
+level of_phy_provider_register_full() and devm_of_phy_provider_register_full()
+macros can be used to override the node containing the children.
+
+::
+
+	#define of_phy_provider_register_full(dev, children, xlate) \
+		__of_phy_provider_register(dev, children, THIS_MODULE, xlate)
+
+	#define devm_of_phy_provider_register_full(dev, children, xlate) \
+		__devm_of_phy_provider_register(dev, children, \
+						THIS_MODULE, xlate)
+
+	void devm_of_phy_provider_unregister(struct device *dev,
+		struct phy_provider *phy_provider);
+	void of_phy_provider_unregister(struct phy_provider *phy_provider);
+
+devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to
+unregister the PHY provider.
+
+Creating the PHY
+================
+
+The PHY driver should create the PHY in order for other peripheral controllers
+to make use of it. The PHY framework provides 2 APIs to create the PHY.
+
+::
+
+	struct phy *phy_create(struct device *dev, struct device_node *node,
+			       const struct phy_ops *ops);
+	struct phy *devm_phy_create(struct device *dev,
+				    struct device_node *node,
+				    const struct phy_ops *ops);
+
+The PHY drivers can use one of the above 2 APIs to create the PHY by passing
+the device pointer and phy ops.
+phy_ops is a set of function pointers for performing PHY operations such as
+init, exit, power_on and power_off.
+
+In order to dereference the private data (in phy_ops), the phy provider driver
+can use phy_set_drvdata() after creating the PHY and use phy_get_drvdata() in
+phy_ops to get back the private data.
+
+Getting a reference to the PHY
+==============================
+Before the controller can make use of the PHY, it has to get a reference to
+it. This framework provides the following APIs to get a reference to the PHY.
+
+::
+
+	struct phy *phy_get(struct device *dev, const char *string);
+	struct phy *phy_optional_get(struct device *dev, const char *string);
+	struct phy *devm_phy_get(struct device *dev, const char *string);
+	struct phy *devm_phy_optional_get(struct device *dev,
+					  const char *string);
+	struct phy *devm_of_phy_get_by_index(struct device *dev,
+					     struct device_node *np,
+					     int index);
+
+phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can
+be used to get the PHY. In the case of dt boot, the string argument
+should contain the phy name as given in the dt data and in the case of
+non-dt boot, it should contain the label of the PHY. The two devm_
+variants associate the device with the PHY using devres on a
+successful PHY get. On driver detach, the release function is invoked on
+the devres data and the devres data is freed. phy_optional_get and
+devm_phy_optional_get should be used when the phy is optional. These
+two functions never return -ENODEV, but instead return NULL when
+the phy cannot be found. Some generic drivers, such as ehci, may use multiple
+phys and for such drivers referencing phy(s) by name(s) does not make sense. In
+this case, devm_of_phy_get_by_index can be used to get a phy reference based on
+the index.
+
+It should be noted that NULL is a valid phy reference. All phy
+consumer calls on the NULL phy become NOPs. That is the release calls,
+the phy_init() and phy_exit() calls, and phy_power_on() and
+phy_power_off() calls are all NOP when applied to a NULL phy. The NULL
+phy is useful in devices for handling optional phy devices.
+
+Releasing a reference to the PHY
+================================
+
+When the controller no longer needs the PHY, it has to release the reference
+to the PHY it has obtained using the APIs mentioned in the above section. The
+PHY framework provides 2 APIs to release a reference to the PHY.
+
+::
+
+	void phy_put(struct phy *phy);
+	void devm_phy_put(struct device *dev, struct phy *phy);
+
+Both APIs release a reference to the PHY; devm_phy_put also destroys the
+devres associated with this PHY.
+
+Destroying the PHY
+==================
+
+When the driver that created the PHY is unloaded, it should destroy the PHY it
+created using one of the following 2 APIs::
+
+	void phy_destroy(struct phy *phy);
+	void devm_phy_destroy(struct device *dev, struct phy *phy);
+
+Both APIs destroy the PHY; devm_phy_destroy also destroys the devres
+associated with this PHY.
+
+PM Runtime
+==========
+
+This subsystem is pm runtime enabled. So while creating the PHY,
+pm_runtime_enable of the phy device created by this subsystem is called and
+while destroying the PHY, pm_runtime_disable is called. Note that the phy
+device created by this subsystem will be a child of the device that calls
+phy_create (PHY provider device).
+
+So pm_runtime_get_sync of the phy device created by this subsystem will invoke
+pm_runtime_get_sync of the PHY provider device because of the parent-child
+relationship. Note also that phy_power_on and phy_power_off perform
+phy_pm_runtime_get_sync and phy_pm_runtime_put respectively.
+There are exported APIs like phy_pm_runtime_get, phy_pm_runtime_get_sync,
+phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and
+phy_pm_runtime_forbid for performing PM operations.
+
+PHY Mappings
+============
+
+In order to get a reference to a PHY without help from DeviceTree, the framework
+offers lookups which can be compared to clkdev that allow clk structures to be
+bound to devices. A lookup can be made during runtime when a handle to
+the struct phy already exists.
+
+The framework offers the following API for registering and unregistering the
+lookups::
+
+	int phy_create_lookup(struct phy *phy, const char *con_id,
+			      const char *dev_id);
+	void phy_remove_lookup(struct phy *phy, const char *con_id,
+			       const char *dev_id);
+
+DeviceTree Binding
+==================
+
+The documentation for the PHY dt binding can be found in
+Documentation/devicetree/bindings/phy/phy-bindings.txt
diff --git a/Documentation/driver-api/phy/samsung-usb2.rst b/Documentation/driver-api/phy/samsung-usb2.rst
new file mode 100644
index 000000000000..c48c8b9797b9
--- /dev/null
+++ b/Documentation/driver-api/phy/samsung-usb2.rst
@@ -0,0 +1,137 @@
+====================================
+Samsung USB 2.0 PHY adaptation layer
+====================================
+
+1. Description
+--------------
+
+The architecture of the USB 2.0 PHY module in Samsung SoCs is similar
+among many SoCs. In spite of the similarities it proved difficult to
+create one driver that would fit all these PHY controllers. Often
+the differences were minor and were found in particular bits of the
+registers of the PHY. In some rare cases the order of register writes or
+the PHY powering up process had to be altered. This adaptation layer is
+a compromise between having separate drivers and having a single driver
+with added support for many special cases.
+
+2. Files description
+--------------------
+
+- phy-samsung-usb2.c
+   This is the main file of the adaptation layer. This file contains
+   the probe function and provides two callbacks to the Generic PHY
+   Framework. These two callbacks are used to power on and power off the
+   phy. They carry out the common work that has to be done on all versions
+   of the PHY module. Depending on which SoC was chosen they execute SoC
+   specific callbacks. The specific SoC version is selected by choosing
+   the appropriate compatible string. In addition, this file contains
+   struct of_device_id definitions for particular SoCs.
+
+- phy-samsung-usb2.h
+   This is the include file. It declares the structures used by this
+   driver. In addition it should contain extern declarations for
+   structures that describe particular SoCs.
+
+3. Supporting SoCs
+------------------
+
+To support a new SoC a new file should be added to the drivers/phy
+directory. Each SoC's configuration is stored in an instance of the
+struct samsung_usb2_phy_config::
+
+  struct samsung_usb2_phy_config {
+	const struct samsung_usb2_common_phy *phys;
+	int (*rate_to_clk)(unsigned long, u32 *);
+	unsigned int num_phys;
+	bool has_mode_switch;
+  };
+
+The num_phys is the number of phys handled by the driver. `*phys` is an
+array that contains the configuration for each phy. The has_mode_switch
+property is a boolean flag that determines whether the SoC has USB host
+and device on a single pair of pins. If so, a special register has to
+be modified to change the internal routing of these pins between the USB
+device and host modules.
+
+For example, the configuration for Exynos 4210 is as follows::
+
+  const struct samsung_usb2_phy_config exynos4210_usb2_phy_config = {
+	.has_mode_switch        = 0,
+	.num_phys		= EXYNOS4210_NUM_PHYS,
+	.phys			= exynos4210_phys,
+	.rate_to_clk		= exynos4210_rate_to_clk,
+  };
+
+- `int (*rate_to_clk)(unsigned long, u32 *)`
+
+	The rate_to_clk callback is to convert the rate of the clock
+	used as the reference clock for the PHY module to the value
+	that should be written in the hardware register.
+
+The exynos4210_phys configuration array is as follows::
+
+  static const struct samsung_usb2_common_phy exynos4210_phys[] = {
+	{
+		.label		= "device",
+		.id		= EXYNOS4210_DEVICE,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{
+		.label		= "host",
+		.id		= EXYNOS4210_HOST,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{
+		.label		= "hsic0",
+		.id		= EXYNOS4210_HSIC0,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{
+		.label		= "hsic1",
+		.id		= EXYNOS4210_HSIC1,
+		.power_on	= exynos4210_power_on,
+		.power_off	= exynos4210_power_off,
+	},
+	{},
+  };
+
+- `int (*power_on)(struct samsung_usb2_phy_instance *);`
+  `int (*power_off)(struct samsung_usb2_phy_instance *);`
+
+	These two callbacks are used to power on and power off the phy
+	by modifying appropriate registers.
+
+The final change to the driver is adding the appropriate compatible value
+to the phy-samsung-usb2.c file. In the case of Exynos 4210 the following lines
+were added to the struct of_device_id samsung_usb2_phy_of_match[] array::
+
+  #ifdef CONFIG_PHY_EXYNOS4210_USB2
+	{
+		.compatible = "samsung,exynos4210-usb2-phy",
+		.data = &exynos4210_usb2_phy_config,
+	},
+  #endif
+
+To add further flexibility to the driver, the Kconfig file allows support
+for selected SoCs to be included in the compiled driver. The Kconfig
+entry for Exynos 4210 is as follows::
+
+  config PHY_EXYNOS4210_USB2
+	bool "Support for Exynos 4210"
+	depends on PHY_SAMSUNG_USB2
+	depends on CPU_EXYNOS4210
+	help
+	  Enable USB PHY support for Exynos 4210. This option requires that
+	  Samsung USB 2.0 PHY driver is enabled and means that support for this
+	  particular SoC is compiled in the driver. In case of Exynos 4210 four
+	  phys are available - device, host, HSIC0 and HSIC1.
+
+The newly created file that supports the new SoC also has to be added to the
+Makefile. In the case of Exynos 4210 the added line is as follows::
+
+  obj-$(CONFIG_PHY_EXYNOS4210_USB2)       += phy-exynos4210-usb2.o
+
+After completing these steps the support for the new SoC should be ready.
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 041ffe442960..dbfec00ba535 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -111,7 +111,6 @@ needed).
    usb/index
    misc-devices/index
    mic/index
-   phy/samsung-usb2
    scheduler/index
 
 Architecture-specific documentation
diff --git a/Documentation/phy.txt b/Documentation/phy.txt
deleted file mode 100644
index 457c3e0f86d6..000000000000
--- a/Documentation/phy.txt
+++ /dev/null
@@ -1,197 +0,0 @@
-=============
-PHY subsystem
-=============
-
-:Author: Kishon Vijay Abraham I <kishon@ti.com>
-
-This document explains the Generic PHY Framework along with the APIs provided,
-and how-to-use.
-
-Introduction
-============
-
-*PHY* is the abbreviation for physical layer. It is used to connect a device
-to the physical medium e.g., the USB controller has a PHY to provide functions
-such as serialization, de-serialization, encoding, decoding and is responsible
-for obtaining the required data transmission rate. Note that some USB
-controllers have PHY functionality embedded into it and others use an external
-PHY. Other peripherals that use PHY include Wireless LAN, Ethernet,
-SATA etc.
-
-The intention of creating this framework is to bring the PHY drivers spread
-all over the Linux kernel to drivers/phy to increase code re-use and for
-better code maintainability.
-
-This framework will be of use only to devices that use external PHY (PHY
-functionality is not embedded within the controller).
-
-Registering/Unregistering the PHY provider
-==========================================
-
-PHY provider refers to an entity that implements one or more PHY instances.
-For the simple case where the PHY provider implements only a single instance of
-the PHY, the framework provides its own implementation of of_xlate in
-of_phy_simple_xlate. If the PHY provider implements multiple instances, it
-should provide its own implementation of of_xlate. of_xlate is used only for
-dt boot case.
-
-::
-
-	#define of_phy_provider_register(dev, xlate)    \
-		__of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate))
-
-	#define devm_of_phy_provider_register(dev, xlate)       \
-		__devm_of_phy_provider_register((dev), NULL, THIS_MODULE,
-						(xlate))
-
-of_phy_provider_register and devm_of_phy_provider_register macros can be used to
-register the phy_provider and it takes device and of_xlate as
-arguments. For the dt boot case, all PHY providers should use one of the above
-2 macros to register the PHY provider.
-
-Often the device tree nodes associated with a PHY provider will contain a set
-of children that each represent a single PHY. Some bindings may nest the child
-nodes within extra levels for context and extensibility, in which case the low
-level of_phy_provider_register_full() and devm_of_phy_provider_register_full()
-macros can be used to override the node containing the children.
-
-::
-
-	#define of_phy_provider_register_full(dev, children, xlate) \
-		__of_phy_provider_register(dev, children, THIS_MODULE, xlate)
-
-	#define devm_of_phy_provider_register_full(dev, children, xlate) \
-		__devm_of_phy_provider_register_full(dev, children,
-						     THIS_MODULE, xlate)
-
-	void devm_of_phy_provider_unregister(struct device *dev,
-		struct phy_provider *phy_provider);
-	void of_phy_provider_unregister(struct phy_provider *phy_provider);
-
-devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to
-unregister the PHY.
-
-Creating the PHY
-================
-
-The PHY driver should create the PHY in order for other peripheral controllers
-to make use of it. The PHY framework provides 2 APIs to create the PHY.
-
-::
-
-	struct phy *phy_create(struct device *dev, struct device_node *node,
-			       const struct phy_ops *ops);
-	struct phy *devm_phy_create(struct device *dev,
-				    struct device_node *node,
-				    const struct phy_ops *ops);
-
-The PHY drivers can use one of the above 2 APIs to create the PHY by passing
-the device pointer and phy ops.
-phy_ops is a set of function pointers for performing PHY operations such as
-init, exit, power_on and power_off.
-
-Inorder to dereference the private data (in phy_ops), the phy provider driver
-can use phy_set_drvdata() after creating the PHY and use phy_get_drvdata() in
-phy_ops to get back the private data.
-
-4. Getting a reference to the PHY
-
-Before the controller can make use of the PHY, it has to get a reference to
-it. This framework provides the following APIs to get a reference to the PHY.
-
-::
-
-	struct phy *phy_get(struct device *dev, const char *string);
-	struct phy *phy_optional_get(struct device *dev, const char *string);
-	struct phy *devm_phy_get(struct device *dev, const char *string);
-	struct phy *devm_phy_optional_get(struct device *dev,
-					  const char *string);
-	struct phy *devm_of_phy_get_by_index(struct device *dev,
-					     struct device_node *np,
-					     int index);
-
-phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can
-be used to get the PHY. In the case of dt boot, the string arguments
-should contain the phy name as given in the dt data and in the case of
-non-dt boot, it should contain the label of the PHY.  The two
-devm_phy_get associates the device with the PHY using devres on
-successful PHY get. On driver detach, release function is invoked on
-the devres data and devres data is freed. phy_optional_get and
-devm_phy_optional_get should be used when the phy is optional. These
-two functions will never return -ENODEV, but instead returns NULL when
-the phy cannot be found.Some generic drivers, such as ehci, may use multiple
-phys and for such drivers referencing phy(s) by name(s) does not make sense. In
-this case, devm_of_phy_get_by_index can be used to get a phy reference based on
-the index.
-
-It should be noted that NULL is a valid phy reference. All phy
-consumer calls on the NULL phy become NOPs. That is the release calls,
-the phy_init() and phy_exit() calls, and phy_power_on() and
-phy_power_off() calls are all NOP when applied to a NULL phy. The NULL
-phy is useful in devices for handling optional phy devices.
-
-Releasing a reference to the PHY
-================================
-
-When the controller no longer needs the PHY, it has to release the reference
-to the PHY it has obtained using the APIs mentioned in the above section. The
-PHY framework provides 2 APIs to release a reference to the PHY.
-
-::
-
-	void phy_put(struct phy *phy);
-	void devm_phy_put(struct device *dev, struct phy *phy);
-
-Both these APIs are used to release a reference to the PHY and devm_phy_put
-destroys the devres associated with this PHY.
-
-Destroying the PHY
-==================
-
-When the driver that created the PHY is unloaded, it should destroy the PHY it
-created using one of the following 2 APIs::
-
-	void phy_destroy(struct phy *phy);
-	void devm_phy_destroy(struct device *dev, struct phy *phy);
-
-Both these APIs destroy the PHY and devm_phy_destroy destroys the devres
-associated with this PHY.
-
-PM Runtime
-==========
-
-This subsystem is pm runtime enabled. So while creating the PHY,
-pm_runtime_enable of the phy device created by this subsystem is called and
-while destroying the PHY, pm_runtime_disable is called. Note that the phy
-device created by this subsystem will be a child of the device that calls
-phy_create (PHY provider device).
-
-So pm_runtime_get_sync of the phy_device created by this subsystem will invoke
-pm_runtime_get_sync of PHY provider device because of parent-child relationship.
-It should also be noted that phy_power_on and phy_power_off performs
-phy_pm_runtime_get_sync and phy_pm_runtime_put respectively.
-There are exported APIs like phy_pm_runtime_get, phy_pm_runtime_get_sync,
-phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and
-phy_pm_runtime_forbid for performing PM operations.
-
-PHY Mappings
-============
-
-In order to get reference to a PHY without help from DeviceTree, the framework
-offers lookups which can be compared to clkdev that allow clk structures to be
-bound to devices. A lookup can be made be made during runtime when a handle to
-the struct phy already exists.
-
-The framework offers the following API for registering and unregistering the
-lookups::
-
-	int phy_create_lookup(struct phy *phy, const char *con_id,
-			      const char *dev_id);
-	void phy_remove_lookup(struct phy *phy, const char *con_id,
-			       const char *dev_id);
-
-DeviceTree Binding
-==================
-
-The documentation for PHY dt binding can be found @
-Documentation/devicetree/bindings/phy/phy-bindings.txt
diff --git a/Documentation/phy/samsung-usb2.rst b/Documentation/phy/samsung-usb2.rst
deleted file mode 100644
index c48c8b9797b9..000000000000
--- a/Documentation/phy/samsung-usb2.rst
+++ /dev/null
@@ -1,137 +0,0 @@
-====================================
-Samsung USB 2.0 PHY adaptation layer
-====================================
-
-1. Description
---------------
-
-The architecture of the USB 2.0 PHY module in Samsung SoCs is similar
-among many SoCs. In spite of the similarities it proved difficult to
-create a one driver that would fit all these PHY controllers. Often
-the differences were minor and were found in particular bits of the
-registers of the PHY. In some rare cases the order of register writes or
-the PHY powering up process had to be altered. This adaptation layer is
-a compromise between having separate drivers and having a single driver
-with added support for many special cases.
-
-2. Files description
---------------------
-
-- phy-samsung-usb2.c
-   This is the main file of the adaptation layer. This file contains
-   the probe function and provides two callbacks to the Generic PHY
-   Framework. This two callbacks are used to power on and power off the
-   phy. They carry out the common work that has to be done on all version
-   of the PHY module. Depending on which SoC was chosen they execute SoC
-   specific callbacks. The specific SoC version is selected by choosing
-   the appropriate compatible string. In addition, this file contains
-   struct of_device_id definitions for particular SoCs.
-
-- phy-samsung-usb2.h
-   This is the include file. It declares the structures used by this
-   driver. In addition it should contain extern declarations for
-   structures that describe particular SoCs.
-
-3. Supporting SoCs
-------------------
-
-To support a new SoC a new file should be added to the drivers/phy
-directory. Each SoC's configuration is stored in an instance of the
-struct samsung_usb2_phy_config::
-
-  struct samsung_usb2_phy_config {
-	const struct samsung_usb2_common_phy *phys;
-	int (*rate_to_clk)(unsigned long, u32 *);
-	unsigned int num_phys;
-	bool has_mode_switch;
-  };
-
-The num_phys is the number of phys handled by the driver. `*phys` is an
-array that contains the configuration for each phy. The has_mode_switch
-property is a boolean flag that determines whether the SoC has USB host
-and device on a single pair of pins. If so, a special register has to
-be modified to change the internal routing of these pins between a USB
-device or host module.
-
-For example the configuration for Exynos 4210 is following::
-
-  const struct samsung_usb2_phy_config exynos4210_usb2_phy_config = {
-	.has_mode_switch        = 0,
-	.num_phys		= EXYNOS4210_NUM_PHYS,
-	.phys			= exynos4210_phys,
-	.rate_to_clk		= exynos4210_rate_to_clk,
-  }
-
-- `int (*rate_to_clk)(unsigned long, u32 *)`
-
-	The rate_to_clk callback is to convert the rate of the clock
-	used as the reference clock for the PHY module to the value
-	that should be written in the hardware register.
-
-The exynos4210_phys configuration array is as follows::
-
-  static const struct samsung_usb2_common_phy exynos4210_phys[] = {
-	{
-		.label		= "device",
-		.id		= EXYNOS4210_DEVICE,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{
-		.label		= "host",
-		.id		= EXYNOS4210_HOST,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{
-		.label		= "hsic0",
-		.id		= EXYNOS4210_HSIC0,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{
-		.label		= "hsic1",
-		.id		= EXYNOS4210_HSIC1,
-		.power_on	= exynos4210_power_on,
-		.power_off	= exynos4210_power_off,
-	},
-	{},
-  };
-
-- `int (*power_on)(struct samsung_usb2_phy_instance *);`
-  `int (*power_off)(struct samsung_usb2_phy_instance *);`
-
-	These two callbacks are used to power on and power off the phy
-	by modifying appropriate registers.
-
-Final change to the driver is adding appropriate compatible value to the
-phy-samsung-usb2.c file. In case of Exynos 4210 the following lines were
-added to the struct of_device_id samsung_usb2_phy_of_match[] array::
-
-  #ifdef CONFIG_PHY_EXYNOS4210_USB2
-	{
-		.compatible = "samsung,exynos4210-usb2-phy",
-		.data = &exynos4210_usb2_phy_config,
-	},
-  #endif
-
-To add further flexibility to the driver the Kconfig file enables to
-include support for selected SoCs in the compiled driver. The Kconfig
-entry for Exynos 4210 is following::
-
-  config PHY_EXYNOS4210_USB2
-	bool "Support for Exynos 4210"
-	depends on PHY_SAMSUNG_USB2
-	depends on CPU_EXYNOS4210
-	help
-	  Enable USB PHY support for Exynos 4210. This option requires that
-	  Samsung USB 2.0 PHY driver is enabled and means that support for this
-	  particular SoC is compiled in the driver. In case of Exynos 4210 four
-	  phys are available - device, host, HSCI0 and HSCI1.
-
-The newly created file that supports the new SoC has to be also added to the
-Makefile. In case of Exynos 4210 the added line is following::
-
-  obj-$(CONFIG_PHY_EXYNOS4210_USB2)       += phy-exynos4210-usb2.o
-
-After completing these steps the support for the new SoC should be ready.
diff --git a/MAINTAINERS b/MAINTAINERS
index 4f88bca37c55..6571653ecb40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14083,7 +14083,7 @@ M:	Sylwester Nawrocki <s.nawrocki@samsung.com>
 L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	Documentation/devicetree/bindings/phy/samsung-phy.txt
-F:	Documentation/phy/samsung-usb2.rst
+F:	Documentation/driver-api/phy/samsung-usb2.rst
 F:	drivers/phy/samsung/phy-exynos4210-usb2.c
 F:	drivers/phy/samsung/phy-exynos4x12-usb2.c
 F:	drivers/phy/samsung/phy-exynos5250-usb2.c
-- 
cgit v1.2.3-55-g7522


From 652a49bc68ce3cf0355bde357b3998bd63e73915 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 18 Jun 2019 15:03:13 -0300
Subject: docs: add a memory-devices subdir to driver-api

There are two docs describing memory device drivers.

Add both to this new chapter of the driver-api.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/bus-devices/ti-gpmc.rst              | 179 ---------------------
 Documentation/driver-api/index.rst                 |   1 +
 Documentation/driver-api/memory-devices/index.rst  |  16 ++
 .../driver-api/memory-devices/ti-emif.rst          |  64 ++++++++
 .../driver-api/memory-devices/ti-gpmc.rst          | 179 +++++++++++++++++++++
 Documentation/memory-devices/ti-emif.rst           |  64 --------
 6 files changed, 260 insertions(+), 243 deletions(-)
 delete mode 100644 Documentation/bus-devices/ti-gpmc.rst
 create mode 100644 Documentation/driver-api/memory-devices/index.rst
 create mode 100644 Documentation/driver-api/memory-devices/ti-emif.rst
 create mode 100644 Documentation/driver-api/memory-devices/ti-gpmc.rst
 delete mode 100644 Documentation/memory-devices/ti-emif.rst

diff --git a/Documentation/bus-devices/ti-gpmc.rst b/Documentation/bus-devices/ti-gpmc.rst
deleted file mode 100644
index 87c366e418be..000000000000
--- a/Documentation/bus-devices/ti-gpmc.rst
+++ /dev/null
@@ -1,179 +0,0 @@
-:orphan:
-
-========================================
-GPMC (General Purpose Memory Controller)
-========================================
-
-GPMC is an unified memory controller dedicated to interfacing external
-memory devices like
-
- * Asynchronous SRAM like memories and application specific integrated
-   circuit devices.
- * Asynchronous, synchronous, and page mode burst NOR flash devices
-   NAND flash
- * Pseudo-SRAM devices
-
-GPMC is found on Texas Instruments SoC's (OMAP based)
-IP details: http://www.ti.com/lit/pdf/spruh73 section 7.1
-
-
-GPMC generic timing calculation:
-================================
-
-GPMC has certain timings that has to be programmed for proper
-functioning of the peripheral, while peripheral has another set of
-timings. To have peripheral work with gpmc, peripheral timings has to
-be translated to the form gpmc can understand. The way it has to be
-translated depends on the connected peripheral. Also there is a
-dependency for certain gpmc timings on gpmc clock frequency. Hence a
-generic timing routine was developed to achieve above requirements.
-
-Generic routine provides a generic method to calculate gpmc timings
-from gpmc peripheral timings. struct gpmc_device_timings fields has to
-be updated with timings from the datasheet of the peripheral that is
-connected to gpmc. A few of the peripheral timings can be fed either
-in time or in cycles, provision to handle this scenario has been
-provided (refer struct gpmc_device_timings definition). It may so
-happen that timing as specified by peripheral datasheet is not present
-in timing structure, in this scenario, try to correlate peripheral
-timing to the one available. If that doesn't work, try to add a new
-field as required by peripheral, educate generic timing routine to
-handle it, make sure that it does not break any of the existing.
-Then there may be cases where peripheral datasheet doesn't mention
-certain fields of struct gpmc_device_timings, zero those entries.
-
-Generic timing routine has been verified to work properly on
-multiple onenand's and tusb6010 peripherals.
-
-A word of caution: generic timing routine has been developed based
-on understanding of gpmc timings, peripheral timings, available
-custom timing routines, a kind of reverse engineering without
-most of the datasheets & hardware (to be exact none of those supported
-in mainline having custom timing routine) and by simulation.
-
-gpmc timing dependency on peripheral timings:
-
-[<gpmc_timing>: <peripheral timing1>, <peripheral timing2> ...]
-
-1. common
-
-cs_on:
-	t_ceasu
-adv_on:
-	t_avdasu, t_ceavd
-
-2. sync common
-
-sync_clk:
-	clk
-page_burst_access:
-	t_bacc
-clk_activation:
-	t_ces, t_avds
-
-3. read async muxed
-
-adv_rd_off:
-	t_avdp_r
-oe_on:
-	t_oeasu, t_aavdh
-access:
-	t_iaa, t_oe, t_ce, t_aa
-rd_cycle:
-	t_rd_cycle, t_cez_r, t_oez
-
-4. read async non-muxed
-
-adv_rd_off:
-	t_avdp_r
-oe_on:
-	t_oeasu
-access:
-	t_iaa, t_oe, t_ce, t_aa
-rd_cycle:
-	t_rd_cycle, t_cez_r, t_oez
-
-5. read sync muxed
-
-adv_rd_off:
-	t_avdp_r, t_avdh
-oe_on:
-	t_oeasu, t_ach, cyc_aavdh_oe
-access:
-	t_iaa, cyc_iaa, cyc_oe
-rd_cycle:
-	t_cez_r, t_oez, t_ce_rdyz
-
-6. read sync non-muxed
-
-adv_rd_off:
-	t_avdp_r
-oe_on:
-	t_oeasu
-access:
-	t_iaa, cyc_iaa, cyc_oe
-rd_cycle:
-	t_cez_r, t_oez, t_ce_rdyz
-
-7. write async muxed
-
-adv_wr_off:
-	t_avdp_w
-we_on, wr_data_mux_bus:
-	t_weasu, t_aavdh, cyc_aavhd_we
-we_off:
-	t_wpl
-cs_wr_off:
-	t_wph
-wr_cycle:
-	t_cez_w, t_wr_cycle
-
-8. write async non-muxed
-
-adv_wr_off:
-	t_avdp_w
-we_on, wr_data_mux_bus:
-	t_weasu
-we_off:
-	t_wpl
-cs_wr_off:
-	t_wph
-wr_cycle:
-	t_cez_w, t_wr_cycle
-
-9. write sync muxed
-
-adv_wr_off:
-	t_avdp_w, t_avdh
-we_on, wr_data_mux_bus:
-	t_weasu, t_rdyo, t_aavdh, cyc_aavhd_we
-we_off:
-	t_wpl, cyc_wpl
-cs_wr_off:
-	t_wph
-wr_cycle:
-	t_cez_w, t_ce_rdyz
-
-10. write sync non-muxed
-
-adv_wr_off:
-	t_avdp_w
-we_on, wr_data_mux_bus:
-	t_weasu, t_rdyo
-we_off:
-	t_wpl, cyc_wpl
-cs_wr_off:
-	t_wph
-wr_cycle:
-	t_cez_w, t_ce_rdyz
-
-
-Note:
-  Many of gpmc timings are dependent on other gpmc timings (a few
-  gpmc timings purely dependent on other gpmc timings, a reason that
-  some of the gpmc timings are missing above), and it will result in
-  indirect dependency of peripheral timings to gpmc timings other than
-  mentioned above, refer timing routine for more details. To know what
-  these peripheral timings correspond to, please see explanations in
-  struct gpmc_device_timings definition. And for gpmc timings refer
-  IP details (link above).
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index eff22db0ed14..d12a80f386a6 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -79,6 +79,7 @@ available subsections can be seen below.
    isapnp
    generic-counter
    lightnvm-pblk
+   memory-devices/index
    men-chameleon-bus
    ntb
    nvmem
diff --git a/Documentation/driver-api/memory-devices/index.rst b/Documentation/driver-api/memory-devices/index.rst
new file mode 100644
index 000000000000..87549828f6ab
--- /dev/null
+++ b/Documentation/driver-api/memory-devices/index.rst
@@ -0,0 +1,16 @@
+=========================
+Memory Controller drivers
+=========================
+
+.. toctree::
+    :maxdepth: 1
+
+    ti-emif
+    ti-gpmc
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/driver-api/memory-devices/ti-emif.rst b/Documentation/driver-api/memory-devices/ti-emif.rst
new file mode 100644
index 000000000000..dea2ad9bcd7e
--- /dev/null
+++ b/Documentation/driver-api/memory-devices/ti-emif.rst
@@ -0,0 +1,64 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+TI EMIF SDRAM Controller Driver
+===============================
+
+Author
+======
+Aneesh V <aneesh@ti.com>
+
+Location
+========
+drivers/memory/emif.c
+
+Supported SoCs:
+===============
+TI OMAP44xx
+TI OMAP54xx
+
+Menuconfig option:
+==================
+Device Drivers
+	Memory devices
+		Texas Instruments EMIF driver
+
+Description
+===========
+This driver is for the EMIF module available in Texas Instruments
+SoCs. EMIF is an SDRAM controller that, based on its revision,
+supports one or more of DDR2, DDR3, and LPDDR2 SDRAM protocols.
+This driver currently handles only LPDDR2 memories. The
+functions of the driver include re-configuring AC timing
+parameters and other settings during frequency, voltage and
+temperature changes.
+
+Platform Data (see include/linux/platform_data/emif_plat.h)
+===========================================================
+DDR device details and other board dependent and SoC dependent
+information can be passed through platform data (struct emif_platform_data)
+
+- DDR device details: 'struct ddr_device_info'
+- Device AC timings: 'struct lpddr2_timings' and 'struct lpddr2_min_tck'
+- Custom configurations: customizable policy options through
+  'struct emif_custom_configs'
+- IP revision
+- PHY type
+
+Interface to the external world
+===============================
+EMIF driver registers notifiers for voltage and frequency changes
+affecting EMIF and takes appropriate actions when these are invoked.
+
+- freq_pre_notify_handling()
+- freq_post_notify_handling()
+- volt_notify_handling()
+
+Debugfs
+=======
+The driver creates two debugfs entries per device.
+
+- regcache_dump : dump of register values calculated and saved for all
+  frequencies used so far.
+- mr4 : last polled value of MR4 register in the LPDDR2 device. MR4
+  indicates the current temperature level of the device.
diff --git a/Documentation/driver-api/memory-devices/ti-gpmc.rst b/Documentation/driver-api/memory-devices/ti-gpmc.rst
new file mode 100644
index 000000000000..33efcb81f080
--- /dev/null
+++ b/Documentation/driver-api/memory-devices/ti-gpmc.rst
@@ -0,0 +1,179 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+GPMC (General Purpose Memory Controller)
+========================================
+
+GPMC is a unified memory controller dedicated to interfacing external
+memory devices like
+
+ * Asynchronous SRAM like memories and application specific integrated
+   circuit devices.
+ * Asynchronous, synchronous, and page mode burst NOR flash devices
+ * NAND flash
+ * Pseudo-SRAM devices
+
+GPMC is found on Texas Instruments SoCs (OMAP based)
+IP details: http://www.ti.com/lit/pdf/spruh73 section 7.1
+
+
+GPMC generic timing calculation:
+================================
+
+GPMC has certain timings that have to be programmed for proper
+functioning of the peripheral, while the peripheral has another set of
+timings. To have the peripheral work with gpmc, peripheral timings have
+to be translated to a form gpmc can understand. The way they have to be
+translated depends on the connected peripheral. Also, certain gpmc
+timings depend on the gpmc clock frequency. Hence a generic timing
+routine was developed to meet the above requirements.
+
+The generic routine provides a generic method to calculate gpmc timings
+from gpmc peripheral timings. The struct gpmc_device_timings fields have
+to be updated with timings from the datasheet of the peripheral that is
+connected to gpmc. A few of the peripheral timings can be given either
+in time or in cycles; provision to handle this has been made (refer to
+the struct gpmc_device_timings definition). A timing specified by the
+peripheral datasheet may not be present in the timing structure; in
+that case, try to correlate the peripheral timing to one that is
+available. If that doesn't work, add a new field as required by the
+peripheral, teach the generic timing routine to handle it, and make
+sure that it does not break anything existing.
+Where the peripheral datasheet doesn't mention certain fields of
+struct gpmc_device_timings, zero those entries.
+
+The generic timing routine has been verified to work properly on
+multiple onenand and tusb6010 peripherals.
+
+A word of caution: the generic timing routine was developed based
+on an understanding of gpmc timings, peripheral timings and available
+custom timing routines, by a kind of reverse engineering without
+most of the datasheets & hardware (to be exact, none of those supported
+in mainline that have a custom timing routine), and by simulation.
+
+gpmc timing dependency on peripheral timings:
+
+[<gpmc_timing>: <peripheral timing1>, <peripheral timing2> ...]
+
+1. common
+
+cs_on:
+	t_ceasu
+adv_on:
+	t_avdasu, t_ceavd
+
+2. sync common
+
+sync_clk:
+	clk
+page_burst_access:
+	t_bacc
+clk_activation:
+	t_ces, t_avds
+
+3. read async muxed
+
+adv_rd_off:
+	t_avdp_r
+oe_on:
+	t_oeasu, t_aavdh
+access:
+	t_iaa, t_oe, t_ce, t_aa
+rd_cycle:
+	t_rd_cycle, t_cez_r, t_oez
+
+4. read async non-muxed
+
+adv_rd_off:
+	t_avdp_r
+oe_on:
+	t_oeasu
+access:
+	t_iaa, t_oe, t_ce, t_aa
+rd_cycle:
+	t_rd_cycle, t_cez_r, t_oez
+
+5. read sync muxed
+
+adv_rd_off:
+	t_avdp_r, t_avdh
+oe_on:
+	t_oeasu, t_ach, cyc_aavdh_oe
+access:
+	t_iaa, cyc_iaa, cyc_oe
+rd_cycle:
+	t_cez_r, t_oez, t_ce_rdyz
+
+6. read sync non-muxed
+
+adv_rd_off:
+	t_avdp_r
+oe_on:
+	t_oeasu
+access:
+	t_iaa, cyc_iaa, cyc_oe
+rd_cycle:
+	t_cez_r, t_oez, t_ce_rdyz
+
+7. write async muxed
+
+adv_wr_off:
+	t_avdp_w
+we_on, wr_data_mux_bus:
+	t_weasu, t_aavdh, cyc_aavhd_we
+we_off:
+	t_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_wr_cycle
+
+8. write async non-muxed
+
+adv_wr_off:
+	t_avdp_w
+we_on, wr_data_mux_bus:
+	t_weasu
+we_off:
+	t_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_wr_cycle
+
+9. write sync muxed
+
+adv_wr_off:
+	t_avdp_w, t_avdh
+we_on, wr_data_mux_bus:
+	t_weasu, t_rdyo, t_aavdh, cyc_aavhd_we
+we_off:
+	t_wpl, cyc_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_ce_rdyz
+
+10. write sync non-muxed
+
+adv_wr_off:
+	t_avdp_w
+we_on, wr_data_mux_bus:
+	t_weasu, t_rdyo
+we_off:
+	t_wpl, cyc_wpl
+cs_wr_off:
+	t_wph
+wr_cycle:
+	t_cez_w, t_ce_rdyz
+
+
+Note:
+  Many gpmc timings depend on other gpmc timings (a few gpmc
+  timings depend purely on other gpmc timings, which is why some
+  gpmc timings are missing above). This results in an indirect
+  dependency of peripheral timings on gpmc timings other than those
+  mentioned above; refer to the timing routine for more details. To
+  know what these peripheral timings correspond to, see the
+  explanations in the struct gpmc_device_timings definition. For gpmc
+  timings, refer to the IP details (link above).
diff --git a/Documentation/memory-devices/ti-emif.rst b/Documentation/memory-devices/ti-emif.rst
deleted file mode 100644
index c9242294e63c..000000000000
--- a/Documentation/memory-devices/ti-emif.rst
+++ /dev/null
@@ -1,64 +0,0 @@
-:orphan:
-
-===============================
-TI EMIF SDRAM Controller Driver
-===============================
-
-Author
-======
-Aneesh V <aneesh@ti.com>
-
-Location
-========
-driver/memory/emif.c
-
-Supported SoCs:
-===============
-TI OMAP44xx
-TI OMAP54xx
-
-Menuconfig option:
-==================
-Device Drivers
-	Memory devices
-		Texas Instruments EMIF driver
-
-Description
-===========
-This driver is for the EMIF module available in Texas Instruments
-SoCs. EMIF is an SDRAM controller that, based on its revision,
-supports one or more of DDR2, DDR3, and LPDDR2 SDRAM protocols.
-This driver takes care of only LPDDR2 memories presently. The
-functions of the driver includes re-configuring AC timing
-parameters and other settings during frequency, voltage and
-temperature changes
-
-Platform Data (see include/linux/platform_data/emif_plat.h)
-===========================================================
-DDR device details and other board dependent and SoC dependent
-information can be passed through platform data (struct emif_platform_data)
-
-- DDR device details: 'struct ddr_device_info'
-- Device AC timings: 'struct lpddr2_timings' and 'struct lpddr2_min_tck'
-- Custom configurations: customizable policy options through
-  'struct emif_custom_configs'
-- IP revision
-- PHY type
-
-Interface to the external world
-===============================
-EMIF driver registers notifiers for voltage and frequency changes
-affecting EMIF and takes appropriate actions when these are invoked.
-
-- freq_pre_notify_handling()
-- freq_post_notify_handling()
-- volt_notify_handling()
-
-Debugfs
-=======
-The driver creates two debugfs entries per device.
-
-- regcache_dump : dump of register values calculated and saved for all
-  frequencies used so far.
-- mr4 : last polled value of MR4 register in the LPDDR2 device. MR4
-  indicates the current temperature level of the device.
-- 
cgit v1.2.3-55-g7522


From 7e042736faab9457dd754668b9db2a1113cd322b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 28 Jun 2019 07:13:34 -0300
Subject: docs: add SPDX tags to new index files

All those new files I added are under GPL v2.0 license.

Add the corresponding SPDX headers to them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/admin-guide/blockdev/drbd/figures.rst | 2 ++
 Documentation/admin-guide/blockdev/index.rst        | 2 ++
 Documentation/admin-guide/laptops/index.rst         | 1 +
 Documentation/admin-guide/namespaces/index.rst      | 2 ++
 Documentation/admin-guide/perf/index.rst            | 2 ++
 Documentation/arm/index.rst                         | 2 ++
 Documentation/arm/nwfpe/index.rst                   | 2 ++
 Documentation/arm/omap/index.rst                    | 2 ++
 Documentation/arm/sa1100/index.rst                  | 2 ++
 Documentation/arm/samsung-s3c24xx/index.rst         | 2 ++
 Documentation/arm/samsung/index.rst                 | 2 ++
 Documentation/driver-api/early-userspace/index.rst  | 2 ++
 Documentation/driver-api/md/index.rst               | 2 ++
 Documentation/driver-api/memory-devices/index.rst   | 2 ++
 Documentation/driver-api/mmc/index.rst              | 2 ++
 Documentation/driver-api/mtd/index.rst              | 2 ++
 Documentation/driver-api/nfc/index.rst              | 2 ++
 Documentation/driver-api/nvdimm/index.rst           | 2 ++
 Documentation/driver-api/phy/index.rst              | 2 ++
 Documentation/driver-api/rapidio/index.rst          | 2 ++
 Documentation/ia64/index.rst                        | 2 ++
 21 files changed, 41 insertions(+)

diff --git a/Documentation/admin-guide/blockdev/drbd/figures.rst b/Documentation/admin-guide/blockdev/drbd/figures.rst
index 3e3fd4b8a478..bd9a4901fe46 100644
--- a/Documentation/admin-guide/blockdev/drbd/figures.rst
+++ b/Documentation/admin-guide/blockdev/drbd/figures.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 .. The here included files are intended to help understand the implementation
 
 Data flows that Relate some functions, and write packets
diff --git a/Documentation/admin-guide/blockdev/index.rst b/Documentation/admin-guide/blockdev/index.rst
index 20a738d9d047..b903cf152091 100644
--- a/Documentation/admin-guide/blockdev/index.rst
+++ b/Documentation/admin-guide/blockdev/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===========================
 The Linux RapidIO Subsystem
 ===========================
diff --git a/Documentation/admin-guide/laptops/index.rst b/Documentation/admin-guide/laptops/index.rst
index 6b554e39863b..cd9a1c2695fd 100644
--- a/Documentation/admin-guide/laptops/index.rst
+++ b/Documentation/admin-guide/laptops/index.rst
@@ -1,3 +1,4 @@
+.. SPDX-License-Identifier: GPL-2.0
 
 ==============
 Laptop Drivers
diff --git a/Documentation/admin-guide/namespaces/index.rst b/Documentation/admin-guide/namespaces/index.rst
index 713ec4949fa7..384f2e0f33d2 100644
--- a/Documentation/admin-guide/namespaces/index.rst
+++ b/Documentation/admin-guide/namespaces/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ==========
 Namespaces
 ==========
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 9d445451ea18..ee4bfd2a740f 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===========================
 Performance monitor support
 ===========================
diff --git a/Documentation/arm/index.rst b/Documentation/arm/index.rst
index 9c2f781f4685..5fc072dd0c5e 100644
--- a/Documentation/arm/index.rst
+++ b/Documentation/arm/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ================
 ARM Architecture
 ================
diff --git a/Documentation/arm/nwfpe/index.rst b/Documentation/arm/nwfpe/index.rst
index 21fa8ce192ae..3c4d2f9aa10e 100644
--- a/Documentation/arm/nwfpe/index.rst
+++ b/Documentation/arm/nwfpe/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===================================
 NetWinder's floating point emulator
 ===================================
diff --git a/Documentation/arm/omap/index.rst b/Documentation/arm/omap/index.rst
index f1e9c11d9f9b..8b365b212e49 100644
--- a/Documentation/arm/omap/index.rst
+++ b/Documentation/arm/omap/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 =======
 TI OMAP
 =======
diff --git a/Documentation/arm/sa1100/index.rst b/Documentation/arm/sa1100/index.rst
index fb2385b3accf..68c2a280a745 100644
--- a/Documentation/arm/sa1100/index.rst
+++ b/Documentation/arm/sa1100/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ====================
 Intel StrongARM 1100
 ====================
diff --git a/Documentation/arm/samsung-s3c24xx/index.rst b/Documentation/arm/samsung-s3c24xx/index.rst
index 6c7b241cbf37..5b8a7f9398d8 100644
--- a/Documentation/arm/samsung-s3c24xx/index.rst
+++ b/Documentation/arm/samsung-s3c24xx/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ﻿==========================
 Samsung S3C24XX SoC Family
 ==========================
diff --git a/Documentation/arm/samsung/index.rst b/Documentation/arm/samsung/index.rst
index f54d95734362..8142cce3d23e 100644
--- a/Documentation/arm/samsung/index.rst
+++ b/Documentation/arm/samsung/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===========
 Samsung SoC
 ===========
diff --git a/Documentation/driver-api/early-userspace/index.rst b/Documentation/driver-api/early-userspace/index.rst
index 6f20c3c560d8..149c1822f06d 100644
--- a/Documentation/driver-api/early-userspace/index.rst
+++ b/Documentation/driver-api/early-userspace/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===============
 Early Userspace
 ===============
diff --git a/Documentation/driver-api/md/index.rst b/Documentation/driver-api/md/index.rst
index 205080891a1a..18f54a7d7d6e 100644
--- a/Documentation/driver-api/md/index.rst
+++ b/Documentation/driver-api/md/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ====
 RAID
 ====
diff --git a/Documentation/driver-api/memory-devices/index.rst b/Documentation/driver-api/memory-devices/index.rst
index 87549828f6ab..28101458cda5 100644
--- a/Documentation/driver-api/memory-devices/index.rst
+++ b/Documentation/driver-api/memory-devices/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 =========================
 Memory Controller drivers
 =========================
diff --git a/Documentation/driver-api/mmc/index.rst b/Documentation/driver-api/mmc/index.rst
index 9aaf64951a8c..7339736ac774 100644
--- a/Documentation/driver-api/mmc/index.rst
+++ b/Documentation/driver-api/mmc/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ========================
 MMC/SD/SDIO card support
 ========================
diff --git a/Documentation/driver-api/mtd/index.rst b/Documentation/driver-api/mtd/index.rst
index 2e0e7cc4055e..436ba5a851d7 100644
--- a/Documentation/driver-api/mtd/index.rst
+++ b/Documentation/driver-api/mtd/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ==============================
 Memory Technology Device (MTD)
 ==============================
diff --git a/Documentation/driver-api/nfc/index.rst b/Documentation/driver-api/nfc/index.rst
index 3afb2c0c2e3c..b6e9eedbff29 100644
--- a/Documentation/driver-api/nfc/index.rst
+++ b/Documentation/driver-api/nfc/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ========================
 Near Field Communication
 ========================
diff --git a/Documentation/driver-api/nvdimm/index.rst b/Documentation/driver-api/nvdimm/index.rst
index 19dc8ee371dc..a4f8f98aeb94 100644
--- a/Documentation/driver-api/nvdimm/index.rst
+++ b/Documentation/driver-api/nvdimm/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===================================
 Non-Volatile Memory Device (NVDIMM)
 ===================================
diff --git a/Documentation/driver-api/phy/index.rst b/Documentation/driver-api/phy/index.rst
index fce9ffae2812..69ba1216de72 100644
--- a/Documentation/driver-api/phy/index.rst
+++ b/Documentation/driver-api/phy/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 =====================
 Generic PHY Framework
 =====================
diff --git a/Documentation/driver-api/rapidio/index.rst b/Documentation/driver-api/rapidio/index.rst
index 4c5e51a05134..a41b4242d16f 100644
--- a/Documentation/driver-api/rapidio/index.rst
+++ b/Documentation/driver-api/rapidio/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ===========================
 The Linux RapidIO Subsystem
 ===========================
diff --git a/Documentation/ia64/index.rst b/Documentation/ia64/index.rst
index ef99475f672b..0436e1034115 100644
--- a/Documentation/ia64/index.rst
+++ b/Documentation/ia64/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 ==================
 IA-64 Architecture
 ==================
-- 
cgit v1.2.3-55-g7522


From 113094f743fc97559c068ad20fd2808b64f6989d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 28 Jun 2019 08:36:50 -0300
Subject: docs: add some directories to the main documentation index

The contents of those directories were orphaned from the documentation
body.

While those directories could likely be moved to be inside some guide,
I'm opting to just add their indexes to the main one, removing the
:orphan: and adding the SPDX header.

For the drivers, the rationale is that the documentation contains
a mix of Kernelspace, uAPI and admin-guide. So, better to keep them in
separate directories, as we've been doing with similar subsystem-specific
docs that were not split yet.

For the others, well... I'm too lazy to do the move. Also, it
seems to make sense to keep at least some of those at the main
dir (like kbuild, for example). In any case, a later patch
could do the move.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
---
 Documentation/cdrom/index.rst           |  2 +-
 Documentation/fault-injection/index.rst |  2 +-
 Documentation/fb/index.rst              |  2 +-
 Documentation/fpga/index.rst            |  2 +-
 Documentation/ide/index.rst             |  2 +-
 Documentation/index.rst                 | 13 +++++++++++++
 Documentation/kbuild/index.rst          |  2 +-
 Documentation/livepatch/index.rst       |  2 +-
 Documentation/netlabel/index.rst        |  2 +-
 Documentation/pcmcia/index.rst          |  2 +-
 Documentation/target/index.rst          |  2 +-
 Documentation/timers/index.rst          |  2 +-
 Documentation/watchdog/index.rst        |  2 +-
 13 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/Documentation/cdrom/index.rst b/Documentation/cdrom/index.rst
index efbd5d111825..338ad5f94e7c 100644
--- a/Documentation/cdrom/index.rst
+++ b/Documentation/cdrom/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 =====
 cdrom
diff --git a/Documentation/fault-injection/index.rst b/Documentation/fault-injection/index.rst
index 92b5639ed07a..8408a8a91b34 100644
--- a/Documentation/fault-injection/index.rst
+++ b/Documentation/fault-injection/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ===============
 fault-injection
diff --git a/Documentation/fb/index.rst b/Documentation/fb/index.rst
index d47313714635..baf02393d8ee 100644
--- a/Documentation/fb/index.rst
+++ b/Documentation/fb/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ============
 Frame Buffer
diff --git a/Documentation/fpga/index.rst b/Documentation/fpga/index.rst
index 2c87d1ea084f..f80f95667ca2 100644
--- a/Documentation/fpga/index.rst
+++ b/Documentation/fpga/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ====
 fpga
diff --git a/Documentation/ide/index.rst b/Documentation/ide/index.rst
index 45bc12d3957f..813dfe611a31 100644
--- a/Documentation/ide/index.rst
+++ b/Documentation/ide/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ==================================
 Integrated Drive Electronics (IDE)
diff --git a/Documentation/index.rst b/Documentation/index.rst
index dbfec00ba535..0cd4c3901456 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -35,6 +35,7 @@ trying to get it to work optimally on a given system.
    :maxdepth: 2
 
    admin-guide/index
+   kbuild/index
 
 Firmware-related documentation
 ------------------------------
@@ -77,6 +78,9 @@ merged much easier.
    kernel-hacking/index
    trace/index
    maintainer/index
+   fault-injection/index
+   livepatch/index
+
 
 Kernel API documentation
 ------------------------
@@ -94,11 +98,20 @@ needed).
    core-api/index
    accounting/index
    block/index
+   cdrom/index
+   ide/index
+   fb/index
+   fpga/index
    hid/index
    iio/index
    leds/index
    media/index
+   netlabel/index
    networking/index
+   pcmcia/index
+   target/index
+   timers/index
+   watchdog/index
    input/index
    hwmon/index
    gpu/index
diff --git a/Documentation/kbuild/index.rst b/Documentation/kbuild/index.rst
index 42d4cbe4460c..e323a3f2cc81 100644
--- a/Documentation/kbuild/index.rst
+++ b/Documentation/kbuild/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ===================
 Kernel Build System
diff --git a/Documentation/livepatch/index.rst b/Documentation/livepatch/index.rst
index edd291d51847..17674a9e21b2 100644
--- a/Documentation/livepatch/index.rst
+++ b/Documentation/livepatch/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ===================
 Kernel Livepatching
diff --git a/Documentation/netlabel/index.rst b/Documentation/netlabel/index.rst
index 47f1e0e5acd1..984e1b191b12 100644
--- a/Documentation/netlabel/index.rst
+++ b/Documentation/netlabel/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ========
 NetLabel
diff --git a/Documentation/pcmcia/index.rst b/Documentation/pcmcia/index.rst
index 779c8527109e..7ae1f62fca14 100644
--- a/Documentation/pcmcia/index.rst
+++ b/Documentation/pcmcia/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ======
 pcmcia
diff --git a/Documentation/target/index.rst b/Documentation/target/index.rst
index b68f48982392..4b24f81f747e 100644
--- a/Documentation/target/index.rst
+++ b/Documentation/target/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ==================
 TCM Virtual Device
diff --git a/Documentation/timers/index.rst b/Documentation/timers/index.rst
index 91f6f8263c48..df510ad0c989 100644
--- a/Documentation/timers/index.rst
+++ b/Documentation/timers/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ======
 timers
diff --git a/Documentation/watchdog/index.rst b/Documentation/watchdog/index.rst
index 33a0de631e84..c177645081d8 100644
--- a/Documentation/watchdog/index.rst
+++ b/Documentation/watchdog/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 ======================
 Linux Watchdog Support
-- 
cgit v1.2.3-55-g7522


From 4c68060bf6d3eac6e86b995a200eb21b847236da Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 28 Jun 2019 07:29:15 -0300
Subject: docs: locking: add it to the main index

The locking directory is part of the Kernel API bookset. Add
it to the index file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/index.rst         | 1 +
 Documentation/locking/index.rst | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/index.rst b/Documentation/index.rst
index 0cd4c3901456..eb5db850c4ef 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -96,6 +96,7 @@ needed).
 
    driver-api/index
    core-api/index
+   locking/index
    accounting/index
    block/index
    cdrom/index
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index ef5da7fe9aac..626a463f7e42 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -1,4 +1,4 @@
-:orphan:
+.. SPDX-License-Identifier: GPL-2.0
 
 =======
 locking
-- 
cgit v1.2.3-55-g7522


From c2746a1eb741759590e8766958232d06a71840d5 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Fri, 28 Jun 2019 08:14:42 -0300
Subject: docs: gpio: add sysfs interface to the admin-guide

Although it is marked as obsolete, the sysfs interface described
there is still valid, and belongs in the admin-guide.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
---
 Documentation/ABI/obsolete/sysfs-gpio             |   2 +-
 Documentation/admin-guide/gpio/index.rst          |  17 +++
 Documentation/admin-guide/gpio/sysfs.rst          | 167 ++++++++++++++++++++++
 Documentation/admin-guide/index.rst               |   1 +
 Documentation/firmware-guide/acpi/enumeration.rst |   2 +-
 Documentation/gpio/index.rst                      |  17 ---
 Documentation/gpio/sysfs.rst                      | 167 ----------------------
 Documentation/translations/zh_CN/gpio.txt         |   4 +-
 MAINTAINERS                                       |   2 +-
 9 files changed, 190 insertions(+), 189 deletions(-)
 create mode 100644 Documentation/admin-guide/gpio/index.rst
 create mode 100644 Documentation/admin-guide/gpio/sysfs.rst
 delete mode 100644 Documentation/gpio/index.rst
 delete mode 100644 Documentation/gpio/sysfs.rst
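
Note for reviewers who have not used the interface described in the moved
sysfs.rst: the export/direction/value sequence it documents can be sketched
roughly as below. This is only an illustration, not part of the patch. The
helper names export_output() and read_value() and the "root" parameter are
hypothetical, introduced so the sketch can be tried against a scratch
directory; only the file names ("export", "direction", "value") and the
"low"/"high" strings come from the document itself.

```python
import os


def export_output(gpio, level, root="/sys/class/gpio"):
    """Export GPIO `gpio` and configure it as an output at `level`.

    `level` must be "low" or "high"; the document says writing either
    string to "direction" sets the pin as an output with that initial
    value, avoiding a glitch.
    """
    if level not in ("low", "high"):
        raise ValueError("level must be 'low' or 'high'")

    node = os.path.join(root, "gpio%d" % gpio)

    # Writing the GPIO number to "export" asks the kernel to create the
    # gpioN node; skip it if the node is already there.
    if not os.path.isdir(node):
        with open(os.path.join(root, "export"), "w") as f:
            f.write(str(gpio))

    with open(os.path.join(node, "direction"), "w") as f:
        f.write(level)
    return node


def read_value(gpio, root="/sys/class/gpio"):
    """Return 0 (low) or 1 (high) as read from the "value" attribute."""
    with open(os.path.join(root, "gpio%d" % gpio, "value")) as f:
        return int(f.read().strip())
```

On a system with the sysfs interface enabled, export_output(19, "high")
followed by read_value(19) would mirror the "echo 19 > export" example in
the document. Per the document, the export fails if kernel code has already
requested the GPIO, and "direction" may be absent; the sketch does not
handle either case.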

diff --git a/Documentation/ABI/obsolete/sysfs-gpio b/Documentation/ABI/obsolete/sysfs-gpio
index 40d41ea1a3f5..e0d4e5e2dd90 100644
--- a/Documentation/ABI/obsolete/sysfs-gpio
+++ b/Documentation/ABI/obsolete/sysfs-gpio
@@ -11,7 +11,7 @@ Description:
   Kernel code may export it for complete or partial access.
 
   GPIOs are identified as they are inside the kernel, using integers in
-  the range 0..INT_MAX.  See Documentation/gpio for more information.
+  the range 0..INT_MAX.  See Documentation/admin-guide/gpio for more information.
 
     /sys/class/gpio
 	/export ... asks the kernel to export a GPIO to userspace
diff --git a/Documentation/admin-guide/gpio/index.rst b/Documentation/admin-guide/gpio/index.rst
new file mode 100644
index 000000000000..a244ba4e87d5
--- /dev/null
+++ b/Documentation/admin-guide/gpio/index.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====
+gpio
+====
+
+.. toctree::
+    :maxdepth: 1
+
+    sysfs
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/admin-guide/gpio/sysfs.rst b/Documentation/admin-guide/gpio/sysfs.rst
new file mode 100644
index 000000000000..ec09ffd983e7
--- /dev/null
+++ b/Documentation/admin-guide/gpio/sysfs.rst
@@ -0,0 +1,167 @@
+GPIO Sysfs Interface for Userspace
+==================================
+
+.. warning::
+
+  THIS ABI IS DEPRECATED, THE ABI DOCUMENTATION HAS BEEN MOVED TO
+  Documentation/ABI/obsolete/sysfs-gpio AND NEW USERSPACE CONSUMERS
+  ARE SUPPOSED TO USE THE CHARACTER DEVICE ABI. THIS OLD SYSFS ABI WILL
+  NOT BE DEVELOPED (NO NEW FEATURES), IT WILL JUST BE MAINTAINED.
+
+Refer to the examples in tools/gpio/* for an introduction to the new
+character device ABI. Also see the userspace header in
+include/uapi/linux/gpio.h
+
+The deprecated sysfs ABI
+------------------------
+Platforms which use the "gpiolib" implementors framework may choose to
+configure a sysfs user interface to GPIOs. This is different from the
+debugfs interface, since it provides control over GPIO direction and
+value instead of just showing a gpio state summary. Plus, it could be
+present on production systems without debugging support.
+
+Given appropriate hardware documentation for the system, userspace could
+know for example that GPIO #23 controls the write protect line used to
+protect boot loader segments in flash memory. System upgrade procedures
+may need to temporarily remove that protection, first importing a GPIO,
+then changing its output state, then updating the code before re-enabling
+the write protection. In normal use, GPIO #23 would never be touched,
+and the kernel would have no need to know about it.
+
+Again depending on appropriate hardware documentation, on some systems
+userspace GPIO can be used to determine system configuration data that
+standard kernels won't know about. And for some tasks, simple userspace
+GPIO drivers could be all that the system really needs.
+
+DO NOT ABUSE SYSFS TO CONTROL HARDWARE THAT HAS PROPER KERNEL DRIVERS.
+PLEASE READ THE DOCUMENT AT Documentation/driver-api/gpio/drivers-on-gpio.rst
+TO AVOID REINVENTING KERNEL WHEELS IN USERSPACE. I MEAN IT. REALLY.
+
+Paths in Sysfs
+--------------
+There are three kinds of entries in /sys/class/gpio:
+
+   -	Control interfaces used to get userspace control over GPIOs;
+
+   -	GPIOs themselves; and
+
+   -	GPIO controllers ("gpio_chip" instances).
+
+That's in addition to standard files including the "device" symlink.
+
+The control interfaces are write-only:
+
+    /sys/class/gpio/
+
+	"export" ...
+		Userspace may ask the kernel to export control of
+		a GPIO to userspace by writing its number to this file.
+
+		Example:  "echo 19 > export" will create a "gpio19" node
+		for GPIO #19, if that's not requested by kernel code.
+
+	"unexport" ...
+		Reverses the effect of exporting to userspace.
+
+		Example:  "echo 19 > unexport" will remove a "gpio19"
+		node exported using the "export" file.
+
+GPIO signals have paths like /sys/class/gpio/gpio42/ (for GPIO #42)
+and have the following read/write attributes:
+
+    /sys/class/gpio/gpioN/
+
+	"direction" ...
+		reads as either "in" or "out". This value may
+		normally be written. Writing as "out" defaults to
+		initializing the value as low. To ensure glitch free
+		operation, values "low" and "high" may be written to
+		configure the GPIO as an output with that initial value.
+
+		Note that this attribute *will not exist* if the kernel
+		doesn't support changing the direction of a GPIO, or
+		it was exported by kernel code that didn't explicitly
+		allow userspace to reconfigure this GPIO's direction.
+
+	"value" ...
+		reads as either 0 (low) or 1 (high). If the GPIO
+		is configured as an output, this value may be written;
+		any nonzero value is treated as high.
+
+		If the pin can be configured as an interrupt-generating input
+		and if it has been configured to generate interrupts (see the
+		description of "edge"), you can poll(2) on that file and
+		poll(2) will return whenever the interrupt was triggered. If
+		you use poll(2), set the events POLLPRI and POLLERR. If you
+		use select(2), set the file descriptor in exceptfds. After
+		poll(2) returns, either lseek(2) to the beginning of the sysfs
+		file and read the new value or close the file and re-open it
+		to read the value.
+
+	"edge" ...
+		reads as either "none", "rising", "falling", or
+		"both". Write these strings to select the signal edge(s)
+		that will make poll(2) on the "value" file return.
+
+		This file exists only if the pin can be configured as an
+		interrupt generating input pin.
+
+	"active_low" ...
+		reads as either 0 (false) or 1 (true). Write
+		any nonzero value to invert the value attribute both
+		for reading and writing. Existing and subsequent
+		poll(2) support configuration via the edge attribute
+		for "rising" and "falling" edges will follow this
+		setting.
+
+GPIO controllers have paths like /sys/class/gpio/gpiochip42/ (for the
+controller implementing GPIOs starting at #42) and have the following
+read-only attributes:
+
+    /sys/class/gpio/gpiochipN/
+
+	"base" ...
+		same as N, the first GPIO managed by this chip
+
+	"label" ...
+		provided for diagnostics (not always unique)
+
+	"ngpio" ...
+		how many GPIOs this manages (N to N + ngpio - 1)
+
+Board documentation should in most cases cover what GPIOs are used for
+what purposes. However, those numbers are not always stable; GPIOs on
+a daughtercard might be different depending on the base board being used,
+or other cards in the stack. In such cases, you may need to use the
+gpiochip nodes (possibly in conjunction with schematics) to determine
+the correct GPIO number to use for a given signal.
+
+
+Exporting from Kernel code
+--------------------------
+Kernel code can explicitly manage exports of GPIOs which have already been
+requested using gpio_request()::
+
+	/* export the GPIO to userspace */
+	int gpiod_export(struct gpio_desc *desc, bool direction_may_change);
+
+	/* reverse gpiod_export() */
+	void gpiod_unexport(struct gpio_desc *desc);
+
+	/* create a sysfs link to an exported GPIO node */
+	int gpiod_export_link(struct device *dev, const char *name,
+		      struct gpio_desc *desc);
+
+After a kernel driver requests a GPIO, it may only be made available in
+the sysfs interface by gpiod_export(). The driver can control whether the
+signal direction may change. This helps drivers prevent userspace code
+from accidentally clobbering important system state.
+
+This explicit exporting can help with debugging (by making some kinds
+of experiments easier), or can provide an always-there interface that's
+suitable for documenting as part of a board support package.
+
+After the GPIO has been exported, gpiod_export_link() allows creating
+symlinks from elsewhere in sysfs to the GPIO sysfs node. Drivers can
+use this to provide the interface under their own device in sysfs with
+a descriptive name.
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 4e98f5596da0..280355d08af5 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -91,6 +91,7 @@ configure specific aspects of kernel behavior to your liking.
    cputopology
    device-mapper/index
    efi-stub
+   gpio/index
    highuid
    hw_random
    iostats
diff --git a/Documentation/firmware-guide/acpi/enumeration.rst b/Documentation/firmware-guide/acpi/enumeration.rst
index 1252617b520f..0a72b6321f5f 100644
--- a/Documentation/firmware-guide/acpi/enumeration.rst
+++ b/Documentation/firmware-guide/acpi/enumeration.rst
@@ -316,7 +316,7 @@ specifies the path to the controller. In order to use these GPIOs in Linux
 we need to translate them to the corresponding Linux GPIO descriptors.
 
 There is a standard GPIO API for that and is documented in
-Documentation/gpio/.
+Documentation/admin-guide/gpio/.
 
 In the above example we can get the corresponding two GPIO descriptors with
 a code like this::
diff --git a/Documentation/gpio/index.rst b/Documentation/gpio/index.rst
deleted file mode 100644
index 09a4a553f434..000000000000
--- a/Documentation/gpio/index.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-:orphan:
-
-====
-gpio
-====
-
-.. toctree::
-    :maxdepth: 1
-
-    sysfs
-
-.. only::  subproject and html
-
-   Indices
-   =======
-
-   * :ref:`genindex`
diff --git a/Documentation/gpio/sysfs.rst b/Documentation/gpio/sysfs.rst
deleted file mode 100644
index ec09ffd983e7..000000000000
--- a/Documentation/gpio/sysfs.rst
+++ /dev/null
@@ -1,167 +0,0 @@
-GPIO Sysfs Interface for Userspace
-==================================
-
-.. warning::
-
-  THIS ABI IS DEPRECATED, THE ABI DOCUMENTATION HAS BEEN MOVED TO
-  Documentation/ABI/obsolete/sysfs-gpio AND NEW USERSPACE CONSUMERS
-  ARE SUPPOSED TO USE THE CHARACTER DEVICE ABI. THIS OLD SYSFS ABI WILL
-  NOT BE DEVELOPED (NO NEW FEATURES), IT WILL JUST BE MAINTAINED.
-
-Refer to the examples in tools/gpio/* for an introduction to the new
-character device ABI. Also see the userspace header in
-include/uapi/linux/gpio.h
-
-The deprecated sysfs ABI
-------------------------
-Platforms which use the "gpiolib" implementors framework may choose to
-configure a sysfs user interface to GPIOs. This is different from the
-debugfs interface, since it provides control over GPIO direction and
-value instead of just showing a gpio state summary. Plus, it could be
-present on production systems without debugging support.
-
-Given appropriate hardware documentation for the system, userspace could
-know for example that GPIO #23 controls the write protect line used to
-protect boot loader segments in flash memory. System upgrade procedures
-may need to temporarily remove that protection, first importing a GPIO,
-then changing its output state, then updating the code before re-enabling
-the write protection. In normal use, GPIO #23 would never be touched,
-and the kernel would have no need to know about it.
-
-Again depending on appropriate hardware documentation, on some systems
-userspace GPIO can be used to determine system configuration data that
-standard kernels won't know about. And for some tasks, simple userspace
-GPIO drivers could be all that the system really needs.
-
-DO NOT ABUSE SYSFS TO CONTROL HARDWARE THAT HAS PROPER KERNEL DRIVERS.
-PLEASE READ THE DOCUMENT AT Documentation/driver-api/gpio/drivers-on-gpio.rst
-TO AVOID REINVENTING KERNEL WHEELS IN USERSPACE. I MEAN IT. REALLY.
-
-Paths in Sysfs
---------------
-There are three kinds of entries in /sys/class/gpio:
-
-   -	Control interfaces used to get userspace control over GPIOs;
-
-   -	GPIOs themselves; and
-
-   -	GPIO controllers ("gpio_chip" instances).
-
-That's in addition to standard files including the "device" symlink.
-
-The control interfaces are write-only:
-
-    /sys/class/gpio/
-
-	"export" ...
-		Userspace may ask the kernel to export control of
-		a GPIO to userspace by writing its number to this file.
-
-		Example:  "echo 19 > export" will create a "gpio19" node
-		for GPIO #19, if that's not requested by kernel code.
-
-	"unexport" ...
-		Reverses the effect of exporting to userspace.
-
-		Example:  "echo 19 > unexport" will remove a "gpio19"
-		node exported using the "export" file.
-
-GPIO signals have paths like /sys/class/gpio/gpio42/ (for GPIO #42)
-and have the following read/write attributes:
-
-    /sys/class/gpio/gpioN/
-
-	"direction" ...
-		reads as either "in" or "out". This value may
-		normally be written. Writing as "out" defaults to
-		initializing the value as low. To ensure glitch free
-		operation, values "low" and "high" may be written to
-		configure the GPIO as an output with that initial value.
-
-		Note that this attribute *will not exist* if the kernel
-		doesn't support changing the direction of a GPIO, or
-		it was exported by kernel code that didn't explicitly
-		allow userspace to reconfigure this GPIO's direction.
-
-	"value" ...
-		reads as either 0 (low) or 1 (high). If the GPIO
-		is configured as an output, this value may be written;
-		any nonzero value is treated as high.
-
-		If the pin can be configured as interrupt-generating interrupt
-		and if it has been configured to generate interrupts (see the
-		description of "edge"), you can poll(2) on that file and
-		poll(2) will return whenever the interrupt was triggered. If
-		you use poll(2), set the events POLLPRI and POLLERR. If you
-		use select(2), set the file descriptor in exceptfds. After
-		poll(2) returns, either lseek(2) to the beginning of the sysfs
-		file and read the new value or close the file and re-open it
-		to read the value.
-
-	"edge" ...
-		reads as either "none", "rising", "falling", or
-		"both". Write these strings to select the signal edge(s)
-		that will make poll(2) on the "value" file return.
-
-		This file exists only if the pin can be configured as an
-		interrupt generating input pin.
-
-	"active_low" ...
-		reads as either 0 (false) or 1 (true). Write
-		any nonzero value to invert the value attribute both
-		for reading and writing. Existing and subsequent
-		poll(2) support configuration via the edge attribute
-		for "rising" and "falling" edges will follow this
-		setting.
-
-GPIO controllers have paths like /sys/class/gpio/gpiochip42/ (for the
-controller implementing GPIOs starting at #42) and have the following
-read-only attributes:
-
-    /sys/class/gpio/gpiochipN/
-
-	"base" ...
-		same as N, the first GPIO managed by this chip
-
-	"label" ...
-		provided for diagnostics (not always unique)
-
-	"ngpio" ...
-		how many GPIOs this manages (N to N + ngpio - 1)
-
-Board documentation should in most cases cover what GPIOs are used for
-what purposes. However, those numbers are not always stable; GPIOs on
-a daughtercard might be different depending on the base board being used,
-or other cards in the stack. In such cases, you may need to use the
-gpiochip nodes (possibly in conjunction with schematics) to determine
-the correct GPIO number to use for a given signal.
-
-
-Exporting from Kernel code
---------------------------
-Kernel code can explicitly manage exports of GPIOs which have already been
-requested using gpio_request()::
-
-	/* export the GPIO to userspace */
-	int gpiod_export(struct gpio_desc *desc, bool direction_may_change);
-
-	/* reverse gpio_export() */
-	void gpiod_unexport(struct gpio_desc *desc);
-
-	/* create a sysfs link to an exported GPIO node */
-	int gpiod_export_link(struct device *dev, const char *name,
-		      struct gpio_desc *desc);
-
-After a kernel driver requests a GPIO, it may only be made available in
-the sysfs interface by gpiod_export(). The driver can control whether the
-signal direction may change. This helps drivers prevent userspace code
-from accidentally clobbering important system state.
-
-This explicit exporting can help with debugging (by making some kinds
-of experiments easier), or can provide an always-there interface that's
-suitable for documenting as part of a board support package.
-
-After the GPIO has been exported, gpiod_export_link() allows creating
-symlinks from elsewhere in sysfs to the GPIO sysfs node. Drivers can
-use this to provide the interface under their own device in sysfs with
-a descriptive name.
diff --git a/Documentation/translations/zh_CN/gpio.txt b/Documentation/translations/zh_CN/gpio.txt
index 4cb1ba8b8fed..a23ee14fc927 100644
--- a/Documentation/translations/zh_CN/gpio.txt
+++ b/Documentation/translations/zh_CN/gpio.txt
@@ -1,4 +1,4 @@
-Chinese translated version of Documentation/gpio
+Chinese translated version of Documentation/admin-guide/gpio
 
 If you have any comment or update to the content, please contact the
 original document maintainer directly.  However, if you have a problem
@@ -10,7 +10,7 @@ Maintainer: Grant Likely <grant.likely@secretlab.ca>
 		Linus Walleij <linus.walleij@linaro.org>
 Chinese maintainer: Fu Wei <tekkamanninja@gmail.com>
 ---------------------------------------------------------------------
-Documentation/gpio 的中文翻译
+Documentation/admin-guide/gpio 的中文翻译
 
 如果想评论或更新本文的内容，请直接联系原文档的维护者。如果你使用英文
 交流有困难的话，也可以向中文版维护者求助。如果本翻译更新不及时或者翻
diff --git a/MAINTAINERS b/MAINTAINERS
index 6571653ecb40..7fa2c2ca9791 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6867,7 +6867,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git
 S:	Maintained
 F:	Documentation/devicetree/bindings/gpio/
 F:	Documentation/driver-api/gpio/
-F:	Documentation/gpio/
+F:	Documentation/admin-guide/gpio/
 F:	Documentation/ABI/testing/gpio-cdev
 F:	Documentation/ABI/obsolete/sysfs-gpio
 F:	drivers/gpio/
-- 
cgit v1.2.3-55-g7522


From eddeed127b06ea2542dc18f2fe37d383b6369fec Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Sat, 6 Jul 2019 13:38:56 -0300
Subject: docs: don't use nested tables

Nested tables aren't supported for pdf output on Sphinx 1.7.9:

	admin-guide/laptops/sonypi:: nested tables are not yet implemented.
	admin-guide/laptops/toshiba_haps:: nested tables are not yet implemented.
	driver-api/nvdimm/btt:: nested tables are not yet implemented.
	s390/debugging390:: nested tables are not yet implemented.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # laptops
---
 Documentation/admin-guide/laptops/sonypi.rst       | 28 ++++++++++------------
 Documentation/admin-guide/laptops/toshiba_haps.rst |  8 +++----
 Documentation/driver-api/nvdimm/btt.rst            |  2 +-
 Documentation/s390/debugging390.rst                |  2 +-
 4 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/Documentation/admin-guide/laptops/sonypi.rst b/Documentation/admin-guide/laptops/sonypi.rst
index 2a1975ed7ee4..c6eaaf48f7c1 100644
--- a/Documentation/admin-guide/laptops/sonypi.rst
+++ b/Documentation/admin-guide/laptops/sonypi.rst
@@ -53,7 +53,7 @@ module or sonypi.<param>=<value> on the kernel boot line when sonypi is
 statically linked into the kernel). Those options are:
 
 	=============== =======================================================
-	minor: 		minor number of the misc device /dev/sonypi,
+	minor:		minor number of the misc device /dev/sonypi,
 			default is -1 (automatic allocation, see /proc/misc
 			or kernel logs)
 
@@ -89,24 +89,22 @@ statically linked into the kernel). Those options are:
 			set to 0xffffffff, meaning that all possible events
 			will be tried. You can use the following bits to
 			construct your own event mask (from
-			drivers/char/sonypi.h):
-
-				========================	======
-				SONYPI_JOGGER_MASK 		0x0001
-				SONYPI_CAPTURE_MASK 		0x0002
-				SONYPI_FNKEY_MASK 		0x0004
-				SONYPI_BLUETOOTH_MASK 		0x0008
-				SONYPI_PKEY_MASK 		0x0010
-				SONYPI_BACK_MASK 		0x0020
-				SONYPI_HELP_MASK 		0x0040
-				SONYPI_LID_MASK 		0x0080
-				SONYPI_ZOOM_MASK 		0x0100
-				SONYPI_THUMBPHRASE_MASK 	0x0200
+			drivers/char/sonypi.h)::
+
+				SONYPI_JOGGER_MASK		0x0001
+				SONYPI_CAPTURE_MASK		0x0002
+				SONYPI_FNKEY_MASK		0x0004
+				SONYPI_BLUETOOTH_MASK		0x0008
+				SONYPI_PKEY_MASK		0x0010
+				SONYPI_BACK_MASK		0x0020
+				SONYPI_HELP_MASK		0x0040
+				SONYPI_LID_MASK			0x0080
+				SONYPI_ZOOM_MASK		0x0100
+				SONYPI_THUMBPHRASE_MASK		0x0200
 				SONYPI_MEYE_MASK		0x0400
 				SONYPI_MEMORYSTICK_MASK		0x0800
 				SONYPI_BATTERY_MASK		0x1000
 				SONYPI_WIRELESS_MASK		0x2000
-				========================	======
 
 	useinput:	if set (which is the default) two input devices are
 			created, one which interprets the jogdial events as
diff --git a/Documentation/admin-guide/laptops/toshiba_haps.rst b/Documentation/admin-guide/laptops/toshiba_haps.rst
index 11dfc428c080..d28b6c3f2849 100644
--- a/Documentation/admin-guide/laptops/toshiba_haps.rst
+++ b/Documentation/admin-guide/laptops/toshiba_haps.rst
@@ -75,11 +75,11 @@ The sysfs files under /sys/devices/LNXSYSTM:00/LNXSYBUS:00/TOS620A:00/ are:
 protection_level   The protection_level is readable and writeable, and
 		   provides a way to let userspace query the current protection
 		   level, as well as set the desired protection level, the
-		   available protection levels are:
+		   available protection levels are::
 
-		   ============   =======   ==========   ========
-		   0 - Disabled   1 - Low   2 - Medium   3 - High
-		   ============   =======   ==========   ========
+		     ============   =======   ==========   ========
+		     0 - Disabled   1 - Low   2 - Medium   3 - High
+		     ============   =======   ==========   ========
 
 reset_protection   The reset_protection entry is writeable only, being "1"
 		   the only parameter it accepts, it is used to trigger
diff --git a/Documentation/driver-api/nvdimm/btt.rst b/Documentation/driver-api/nvdimm/btt.rst
index 2d8269f834bd..107395c042ae 100644
--- a/Documentation/driver-api/nvdimm/btt.rst
+++ b/Documentation/driver-api/nvdimm/btt.rst
@@ -83,7 +83,7 @@ flags, and the remaining form the internal block number.
 ======== =============================================================
 Bit      Description
 ======== =============================================================
-31 - 30	 Error and Zero flags - Used in the following way:
+31 - 30	 Error and Zero flags - Used in the following way::
 
 	   == ==  ====================================================
 	   31 30  Description
diff --git a/Documentation/s390/debugging390.rst b/Documentation/s390/debugging390.rst
index d49305fd5e1a..73ad0b06c666 100644
--- a/Documentation/s390/debugging390.rst
+++ b/Documentation/s390/debugging390.rst
@@ -170,7 +170,7 @@ currently running at.
 |        +----------------+-------------------------------------------------+
 |        |    32          | Basic Addressing Mode                           |
 |        |                |                                                 |
-|        |                | Used to set addressing mode                     |
+|        |                | Used to set addressing mode::                   |
 |        |                |                                                 |
 |        |                |    +---------+----------+----------+            |
 |        |                |    | PSW 31  | PSW 32   |          |            |
-- 
cgit v1.2.3-55-g7522


From 38cbfed28b3178dd9004064b1a72a992a3b31969 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 9 Jul 2019 12:22:41 -0300
Subject: docs: arm: fix a breakage with pdf output

Add an extra blank line, as otherwise XeLaTeX will complain with:

	! LaTeX Error: Too deeply nested.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/arm/spear/overview.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/arm/spear/overview.rst b/Documentation/arm/spear/overview.rst
index 8a1a87aca427..1a77f6b213b6 100644
--- a/Documentation/arm/spear/overview.rst
+++ b/Documentation/arm/spear/overview.rst
@@ -15,6 +15,7 @@ Introduction
   Hierarchy in SPEAr is as follows:
 
   SPEAr (Platform)
+
 	- SPEAr3XX (3XX SOC series, based on ARM9)
 		- SPEAr300 (SOC)
 			- SPEAr300 Evaluation Board
-- 
cgit v1.2.3-55-g7522


From 8bb0776b8b27d548c7e65828ec3a02cb31fe3eed Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 9 Jul 2019 12:36:09 -0300
Subject: docs: block: fix pdf output

Add an extra blank line and use list markup for the enumerated
list, in order to make it possible to build the block book
in PDF format.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/block/biodoc.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/block/biodoc.rst b/Documentation/block/biodoc.rst
index d6e30b680405..c8eb28401332 100644
--- a/Documentation/block/biodoc.rst
+++ b/Documentation/block/biodoc.rst
@@ -9,6 +9,7 @@ Notes on the Generic Block Layer Rewrite in Linux 2.5
 	here might still be useful.
 
 Notes Written on Jan 15, 2002:
+
 	- Jens Axboe <jens.axboe@oracle.com>
 	- Suparna Bhattacharya <suparna@in.ibm.com>
 
@@ -172,8 +173,8 @@ Some new queue property settings:
 
 New queue flags:
 
-	QUEUE_FLAG_CLUSTER (see 3.2.2)
-	QUEUE_FLAG_QUEUED (see 3.2.4)
+	- QUEUE_FLAG_CLUSTER (see 3.2.2)
+	- QUEUE_FLAG_QUEUED (see 3.2.4)
 
 
 ii. High-mem i/o capabilities are now considered the default
@@ -478,7 +479,7 @@ With this multipage bio design:
 - Splitting of an i/o request across multiple devices (as in the case of
   lvm or raid) is achieved by cloning the bio (where the clone points to
   the same bi_io_vec array, but with the index and size accordingly modified)
-- A linked list of bios is used as before for unrelated merges [*]_ - this
+- A linked list of bios is used as before for unrelated merges [#]_ - this
   avoids reallocs and makes independent completions easier to handle.
 - Code that traverses the req list can find all the segments of a bio
   by using rq_for_each_segment.  This handles the fact that a request
@@ -489,7 +490,7 @@ With this multipage bio design:
   [TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying
   bi_offset an len fields]
 
-.. [*]
+.. [#]
 
 	unrelated merges -- a request ends up containing two or more bios that
 	didn't originate from the same place.
-- 
cgit v1.2.3-55-g7522


From 168869492e7009b6861b615f1d030c99bc805e83 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab
Date: Tue, 9 Jul 2019 13:25:51 -0300
Subject: docs: kbuild: fix build with pdf and fix some minor issues

The ".. include" directive should be replaced by ".. literalinclude" in
issues.rst; otherwise TeX crashes due to excessive stack usage
with Sphinx 2.0.

While here, fix a few minor issues in the kbuild book output by
adding extra blank lines.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
---
 Documentation/kbuild/issues.rst           | 20 ++++++++++++--------
 Documentation/kbuild/kbuild.rst           |  3 ++-
 Documentation/kbuild/kconfig-language.rst | 12 ++++++++++++
 Documentation/kbuild/kconfig.rst          |  8 ++++++--
 Documentation/kbuild/makefiles.rst        |  1 +
 5 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/Documentation/kbuild/issues.rst b/Documentation/kbuild/issues.rst
index 9fdded4b681c..bdab01f733f6 100644
--- a/Documentation/kbuild/issues.rst
+++ b/Documentation/kbuild/issues.rst
@@ -1,11 +1,15 @@
-Recursion issue #1
-------------------
+================
+Recursion issues
+================
 
- .. include:: Kconfig.recursion-issue-01
-    :literal:
+issue #1
+--------
 
-Recursion issue #2
-------------------
+.. literalinclude:: Kconfig.recursion-issue-01
+   :language: kconfig
 
- .. include:: Kconfig.recursion-issue-02
-    :literal:
+issue #2
+--------
+
+.. literalinclude:: Kconfig.recursion-issue-02
+   :language: kconfig
diff --git a/Documentation/kbuild/kbuild.rst b/Documentation/kbuild/kbuild.rst
index b25548963d70..ce9b99c004ae 100644
--- a/Documentation/kbuild/kbuild.rst
+++ b/Documentation/kbuild/kbuild.rst
@@ -18,7 +18,7 @@ This file lists all modules that are built into the kernel. This is used
 by modprobe to not fail when trying to load something builtin.
 
 modules.builtin.modinfo
---------------------------------------------------
+-----------------------
 This file contains modinfo from all modules that are built into the kernel.
 Unlike modinfo of a separate module, all fields are prefixed with module name.
 
@@ -153,6 +153,7 @@ Install script called when using "make install".
 The default name is "installkernel".
 
 The script will be called with the following arguments:
+
    - $1 - kernel version
    - $2 - kernel image file
    - $3 - kernel map file
diff --git a/Documentation/kbuild/kconfig-language.rst b/Documentation/kbuild/kconfig-language.rst
index 2bc8a7803365..74bef19f69f0 100644
--- a/Documentation/kbuild/kconfig-language.rst
+++ b/Documentation/kbuild/kconfig-language.rst
@@ -53,6 +53,7 @@ A menu entry can have a number of attributes. Not all of them are
 applicable everywhere (see syntax).
 
 - type definition: "bool"/"tristate"/"string"/"hex"/"int"
+
   Every config option must have a type. There are only two basic types:
   tristate and string; the other types are based on these two. The type
   definition optionally accepts an input prompt, so these two examples
@@ -66,11 +67,13 @@ applicable everywhere (see syntax).
 	prompt "Networking support"
 
 - input prompt: "prompt" <prompt> ["if" <expr>]
+
   Every menu entry can have at most one prompt, which is used to display
   to the user. Optionally dependencies only for this prompt can be added
   with "if".
 
 - default value: "default" <expr> ["if" <expr>]
+
   A config option can have any number of default values. If multiple
   default values are visible, only the first defined one is active.
   Default values are not limited to the menu entry where they are
@@ -112,6 +115,7 @@ applicable everywhere (see syntax).
   Optionally dependencies for this default value can be added with "if".
 
 - dependencies: "depends on" <expr>
+
   This defines a dependency for this menu entry. If multiple
   dependencies are defined, they are connected with '&&'. Dependencies
   are applied to all other options within this menu entry (which also
@@ -127,6 +131,7 @@ applicable everywhere (see syntax).
 	default y
 
 - reverse dependencies: "select" <symbol> ["if" <expr>]
+
   While normal dependencies reduce the upper limit of a symbol (see
   below), reverse dependencies can be used to force a lower limit of
   another symbol. The value of the current menu symbol is used as the
@@ -146,6 +151,7 @@ applicable everywhere (see syntax).
 	the illegal configurations all over.
 
 - weak reverse dependencies: "imply" <symbol> ["if" <expr>]
+
   This is similar to "select" as it enforces a lower limit on another
   symbol except that the "implied" symbol's value may still be set to n
   from a direct dependency or with a visible prompt.
@@ -176,6 +182,7 @@ applicable everywhere (see syntax).
   configure that subsystem out without also having to unset these drivers.
 
 - limiting menu display: "visible if" <expr>
+
   This attribute is only applicable to menu blocks, if the condition is
   false, the menu block is not displayed to the user (the symbols
   contained there can still be selected by other symbols, though). It is
@@ -183,12 +190,14 @@ applicable everywhere (see syntax).
   entries. Default value of "visible" is true.
 
 - numerical ranges: "range" <symbol> <symbol> ["if" <expr>]
+
   This allows to limit the range of possible input values for int
   and hex symbols. The user can only input a value which is larger than
   or equal to the first symbol and smaller than or equal to the second
   symbol.
 
 - help text: "help" or "---help---"
+
   This defines a help text. The end of the help text is determined by
   the indentation level, this means it ends at the first line which has
   a smaller indentation than the first line of the help text.
@@ -197,6 +206,7 @@ applicable everywhere (see syntax).
   the file as an aid to developers.
 
 - misc options: "option" <symbol>[=<value>]
+
   Various less common options can be defined via this option syntax,
   which can modify the behaviour of the menu entry and its config
   symbol. These options are currently possible:
@@ -325,6 +335,7 @@ end a menu entry:
 The first five also start the definition of a menu entry.
 
 config::
+
 	"config" <symbol>
 	<config options>
 
@@ -332,6 +343,7 @@ This defines a config symbol <symbol> and accepts any of above
 attributes as options.
 
 menuconfig::
+
 	"menuconfig" <symbol>
 	<config options>
 
diff --git a/Documentation/kbuild/kconfig.rst b/Documentation/kbuild/kconfig.rst
index 88129af7e539..a9a855f894b3 100644
--- a/Documentation/kbuild/kconfig.rst
+++ b/Documentation/kbuild/kconfig.rst
@@ -264,6 +264,7 @@ NCONFIG_MODE
 This mode shows all sub-menus in one large tree.
 
 Example::
+
 	make NCONFIG_MODE=single_menu nconfig
 
 ----------------------------------------------------------------------
@@ -277,9 +278,12 @@ Searching in xconfig:
 	names, so you have to know something close to what you are
 	looking for.
 
-	Example:
+	Example::
+
 		Ctrl-F hotplug
-	or
+
+	or::
+
 		Menu: File, Search, hotplug
 
 	lists all config symbol entries that contain "hotplug" in
diff --git a/Documentation/kbuild/makefiles.rst b/Documentation/kbuild/makefiles.rst
index 093f2d79ab95..f31158457753 100644
--- a/Documentation/kbuild/makefiles.rst
+++ b/Documentation/kbuild/makefiles.rst
@@ -384,6 +384,7 @@ more details, with real examples.
 -----------------------
 
 	Kbuild tracks dependencies on the following:
+
 	1) All prerequisite files (both `*.c` and `*.h`)
 	2) `CONFIG_` options used in all prerequisite files
 	3) Command-line used to compile target
-- 
cgit v1.2.3-55-g7522