summaryrefslogtreecommitdiffstats
path: root/docs/devel
diff options
context:
space:
mode:
Diffstat (limited to 'docs/devel')
-rw-r--r--docs/devel/decodetree.rst221
-rw-r--r--docs/devel/index.rst2
-rw-r--r--docs/devel/s390-dasd-ipl.txt133
3 files changed, 355 insertions, 1 deletions
diff --git a/docs/devel/decodetree.rst b/docs/devel/decodetree.rst
new file mode 100644
index 0000000000..44ac621ea8
--- /dev/null
+++ b/docs/devel/decodetree.rst
@@ -0,0 +1,221 @@
+========================
+Decodetree Specification
+========================
+
+A *decodetree* is built from instruction *patterns*. A pattern may
+represent a single architectural instruction or a group of same, depending
+on what is convenient for further processing.
+
+Each pattern has both *fixedbits* and *fixedmask*, the combination of which
+describes the condition under which the pattern is matched::
+
+ (insn & fixedmask) == fixedbits
+
+Each pattern may have *fields*, which are extracted from the insn and
+passed along to the translator. Examples of such are registers,
+immediates, and sub-opcodes.
+
+In support of patterns, one may declare *fields*, *argument sets*, and
+*formats*, each of which may be re-used to simplify further definitions.
+
+Fields
+======
+
+Syntax::
+
+ field_def := '%' identifier ( unnamed_field )+ ( !function=identifier )?
+ unnamed_field := number ':' ( 's' ) number
+
+For *unnamed_field*, the first number is the least-significant bit position
+of the field and the second number is the length of the field. If the 's' is
+present, the field is considered signed. If multiple ``unnamed_fields`` are
+present, they are concatenated. In this way one can define disjoint fields.
+
+If ``!function`` is specified, the concatenated result is passed through the
+named function, taking and returning an integral value.
+
+FIXME: the fields of the structure into which this result will be stored
+is restricted to ``int``. Which means that we cannot expand 64-bit items.
+
+Field examples:
+
++---------------------------+---------------------------------------------+
+| Input | Generated code |
++===========================+=============================================+
+| %disp 0:s16 | sextract(i, 0, 16) |
++---------------------------+---------------------------------------------+
+| %imm9 16:6 10:3 | extract(i, 16, 6) << 3 | extract(i, 10, 3) |
++---------------------------+---------------------------------------------+
+| %disp12 0:s1 1:1 2:10 | sextract(i, 0, 1) << 11 | |
+| | extract(i, 1, 1) << 10 | |
+| | extract(i, 2, 10) |
++---------------------------+---------------------------------------------+
+| %shimm8 5:s8 13:1 | expand_shimm8(sextract(i, 5, 8) << 1 | |
+| !function=expand_shimm8 | extract(i, 13, 1)) |
++---------------------------+---------------------------------------------+
+
+Argument Sets
+=============
+
+Syntax::
+
+ args_def := '&' identifier ( args_elt )+ ( !extern )?
+ args_elt := identifier
+
+Each *args_elt* defines an argument within the argument set.
+Each argument set will be rendered as a C structure "arg_$name"
+with each of the fields being one of the member arguments.
+
+If ``!extern`` is specified, the backing structure is assumed
+to have been already declared, typically via a second decoder.
+
+Argument sets are useful when one wants to define helper functions
+for the translator functions that can perform operations on a common
+set of arguments. This can ensure, for instance, that the ``AND``
+pattern and the ``OR`` pattern put their operands into the same named
+structure, so that a common ``gen_logic_insn`` may be able to handle
+the operations common between the two.
+
+Argument set examples::
+
+ &reg3 ra rb rc
+ &loadstore reg base offset
+
+
+Formats
+=======
+
+Syntax::
+
+ fmt_def := '@' identifier ( fmt_elt )+
+ fmt_elt := fixedbit_elt | field_elt | field_ref | args_ref
+ fixedbit_elt := [01.-]+
+ field_elt := identifier ':' 's'? number
+ field_ref := '%' identifier | identifier '=' '%' identifier
+ args_ref := '&' identifier
+
+Defining a format is a handy way to avoid replicating groups of fields
+across many instruction patterns.
+
+A *fixedbit_elt* describes a contiguous sequence of bits that must
+be 1, 0, or don't care. The difference between '.' and '-'
+is that '.' means that the bit will be covered with a field or a
+final 0 or 1 from the pattern, and '-' means that the bit is really
+ignored by the cpu and will not be specified.
+
+A *field_elt* describes a simple field only given a width; the position of
+the field is implied by its position with respect to other *fixedbit_elt*
+and *field_elt*.
+
+If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
+Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.
+
+A *field_ref* incorporates a field by reference. This is the only way to
+add a complex field to a format. A field may be renamed in the process
+via assignment to another identifier. This is intended to allow the
+same argument set be used with disjoint named fields.
+
+A single *args_ref* may specify an argument set to use for the format.
+The set of fields in the format must be a subset of the arguments in
+the argument set. If an argument set is not specified, one will be
+inferred from the set of fields.
+
+It is recommended, but not required, that all *field_ref* and *args_ref*
+appear at the end of the line, not interleaving with *fixedbit_elf* or
+*field_elt*.
+
+Format examples::
+
+ @opr ...... ra:5 rb:5 ... 0 ....... rc:5
+ @opi ...... ra:5 lit:8 1 ....... rc:5
+
+Patterns
+========
+
+Syntax::
+
+ pat_def := identifier ( pat_elt )+
+ pat_elt := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
+ fmt_ref := '@' identifier
+ const_elt := identifier '=' number
+
+The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
+A pattern that does not specify a named format will have one inferred
+from a referenced argument set (if present) and the set of fields.
+
+A *const_elt* allows a argument to be set to a constant value. This may
+come in handy when fields overlap between patterns and one has to
+include the values in the *fixedbit_elt* instead.
+
+The decoder will call a translator function for each pattern matched.
+
+Pattern examples::
+
+ addl_r 010000 ..... ..... .... 0000000 ..... @opr
+ addl_i 010000 ..... ..... .... 0000000 ..... @opi
+
+which will, in part, invoke::
+
+ trans_addl_r(ctx, &arg_opr, insn)
+
+and::
+
+ trans_addl_i(ctx, &arg_opi, insn)
+
+Pattern Groups
+==============
+
+Syntax::
+
+ group := '{' ( pat_def | group )+ '}'
+
+A *group* begins with a lone open-brace, with all subsequent lines
+indented two spaces, and ending with a lone close-brace. Groups
+may be nested, increasing the required indentation of the lines
+within the nested group to two spaces per nesting level.
+
+Unlike ungrouped patterns, grouped patterns are allowed to overlap.
+Conflicts are resolved by selecting the patterns in order. If all
+of the fixedbits for a pattern match, its translate function will
+be called. If the translate function returns false, then subsequent
+patterns within the group will be matched.
+
+The following example from PA-RISC shows specialization of the *or*
+instruction::
+
+ {
+ {
+ nop 000010 ----- ----- 0000 001001 0 00000
+ copy 000010 00000 r1:5 0000 001001 0 rt:5
+ }
+ or 000010 rt2:5 r1:5 cf:4 001001 0 rt:5
+ }
+
+When the *cf* field is zero, the instruction has no side effects,
+and may be specialized. When the *rt* field is zero, the output
+is discarded and so the instruction has no effect. When the *rt2*
+field is zero, the operation is ``reg[rt] | 0`` and so encodes
+the canonical register copy operation.
+
+The output from the generator might look like::
+
+ switch (insn & 0xfc000fe0) {
+ case 0x08000240:
+ /* 000010.. ........ ....0010 010..... */
+ if ((insn & 0x0000f000) == 0x00000000) {
+ /* 000010.. ........ 00000010 010..... */
+ if ((insn & 0x0000001f) == 0x00000000) {
+ /* 000010.. ........ 00000010 01000000 */
+ extract_decode_Fmt_0(&u.f_decode0, insn);
+ if (trans_nop(ctx, &u.f_decode0)) return true;
+ }
+ if ((insn & 0x03e00000) == 0x00000000) {
+ /* 00001000 000..... 00000010 010..... */
+ extract_decode_Fmt_1(&u.f_decode1, insn);
+ if (trans_copy(ctx, &u.f_decode1)) return true;
+ }
+ }
+ extract_decode_Fmt_2(&u.f_decode2, insn);
+ if (trans_or(ctx, &u.f_decode2)) return true;
+ return false;
+ }
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 6b11e49caa..ebbab636ce 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -19,4 +19,4 @@ Contents:
migration
stable-process
testing
-
+ decodetree
diff --git a/docs/devel/s390-dasd-ipl.txt b/docs/devel/s390-dasd-ipl.txt
new file mode 100644
index 0000000000..9107e048e4
--- /dev/null
+++ b/docs/devel/s390-dasd-ipl.txt
@@ -0,0 +1,133 @@
+*****************************
+***** s390 hardware IPL *****
+*****************************
+
+The s390 hardware IPL process consists of the following steps.
+
+1. A READ IPL ccw is constructed in memory location 0x0.
+ This ccw, by definition, reads the IPL1 record which is located on the disk
+ at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw
+ so when it is complete another ccw will be fetched and executed from memory
+ location 0x08.
+
+2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00.
+ IPL1 data is 24 bytes in length and consists of the following pieces of
+ information: [psw][read ccw][tic ccw]. When the machine executes the Read
+ IPL ccw it read the 24-bytes of IPL1 to be read into memory starting at
+ location 0x0. Then the ccw program at 0x08 which consists of a read
+ ccw and a tic ccw is automatically executed because of the chain flag from
+ the original READ IPL ccw. The read ccw will read the IPL2 data into memory
+ and the TIC (Transfer In Channel) will transfer control to the channel
+ program contained in the IPL2 data. The TIC channel command is the
+ equivalent of a branch/jump/goto instruction for channel programs.
+ NOTE: The ccws in IPL1 are defined by the architecture to be format 0.
+
+3. Execute IPL2.
+ The TIC ccw instruction at the end of the IPL1 channel program will begin
+ the execution of the IPL2 channel program. IPL2 is stage-2 of the boot
+ process and will contain a larger channel program than IPL1. The point of
+ IPL2 is to find and load either the operating system or a small program that
+ loads the operating system from disk. At the end of this step all or some of
+ the real operating system is loaded into memory and we are ready to hand
+ control over to the guest operating system. At this point the guest
+ operating system is entirely responsible for loading any more data it might
+ need to function. NOTE: The IPL2 channel program might read data into memory
+ location 0 thereby overwriting the IPL1 psw and channel program. This is ok
+ as long as the data placed in location 0 contains a psw whose instruction
+ address points to the guest operating system code to execute at the end of
+ the IPL/boot process.
+ NOTE: The ccws in IPL2 are defined by the architecture to be format 0.
+
+4. Start executing the guest operating system.
+ The psw that was loaded into memory location 0 as part of the ipl process
+ should contain the needed flags for the operating system we have loaded. The
+ psw's instruction address will point to the location in memory where we want
+ to start executing the operating system. This psw is loaded (via LPSW
+ instruction) causing control to be passed to the operating system code.
+
+In a non-virtualized environment this process, handled entirely by the hardware,
+is kicked off by the user initiating a "Load" procedure from the hardware
+management console. This "Load" procedure crafts a special "Read IPL" ccw in
+memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking
+off the reading of IPL1 data. Since the channel program from IPL1 will be
+written immediately after the special "Read IPL" ccw, the IPL1 channel program
+will be executed immediately (the special read ccw has the chaining bit turned
+on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel
+program to be executed automatically. After this sequence completes the "Load"
+procedure then loads the psw from 0x0.
+
+**********************************************************
+***** How this all pertains to QEMU (and the kernel) *****
+**********************************************************
+
+In theory we should merely have to do the following to IPL/boot a guest
+operating system from a DASD device:
+
+1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on.
+2. Execute channel program at 0x0.
+3. LPSW 0x0.
+
+However, our emulation of the machine's channel program logic within the kernel
+is missing one key feature that is required for this process to work:
+non-prefetch of ccw data.
+
+When we start a channel program we pass the channel subsystem parameters via an
+ORB (Operation Request Block). One of those parameters is a prefetch bit. If the
+bit is on then the vfio-ccw kernel driver is allowed to read the entire channel
+program from guest memory before it starts executing it. This means that any
+channel commands that read additional channel commands will not work as expected
+because the newly read commands will only exist in guest memory and NOT within
+the kernel's channel subsystem memory. The kernel vfio-ccw driver currently
+requires this bit to be on for all channel programs. This is a problem because
+the IPL process consists of transferring control from the "Read IPL" ccw
+immediately to the IPL1 channel program that was read by "Read IPL".
+
+Not being able to turn off prefetch will also prevent the TIC at the end of the
+IPL1 channel program from transferring control to the IPL2 channel program.
+
+Lastly, in some cases (the zipl bootloader for example) the IPL2 program also
+transfers control to another channel program segment immediately after reading
+it from the disk. So we need to be able to handle this case.
+
+**************************
+***** What QEMU does *****
+**************************
+
+Since we are forced to live with prefetch we cannot use the very simple IPL
+procedure we defined in the preceding section. So we compensate by doing the
+following.
+
+1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit.
+2. Execute "Read IPL" at 0x0.
+
+ So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08.
+
+4. Write a custom channel program that will seek to the IPL2 record and then
+ execute the READ and TIC ccws from IPL1. Normally the seek is not required
+ because after reading the IPL1 record the disk is automatically positioned
+ to read the very next record which will be IPL2. But since we are not reading
+ both IPL1 and IPL2 as part of the same channel program we must manually set
+ the position.
+
+5. Grab the target address of the TIC instruction from the IPL1 channel program.
+ This address is where the IPL2 channel program starts.
+
+ Now IPL2 is loaded into memory somewhere, and we know the address.
+
+6. Execute the IPL2 channel program at the address obtained in step #5.
+
+ Because this channel program can be dynamic, we must use a special algorithm
+ that detects a READ immediately followed by a TIC and breaks the ccw chain
+ by turning off the chain bit in the READ ccw. When control is returned from
+ the kernel/hardware to the QEMU bios code we immediately issue another start
+ subchannel to execute the remaining TIC instruction. This causes the entire
+ channel program (starting from the TIC) and all needed data to be refetched
+ thereby stepping around the limitation that would otherwise prevent this
+ channel program from executing properly.
+
+ Now the operating system code is loaded somewhere in guest memory and the psw
+ in memory location 0x0 will point to entry code for the guest operating
+ system.
+
+7. LPSW 0x0.
+ LPSW transfers control to the guest operating system and we're done.