summaryrefslogtreecommitdiffstats
path: root/hw/block/nvme.h
Commit message (Collapse)AuthorAgeFilesLines
* hw/block/nvme: add support for the format nvm commandMinwoo Im2021-03-181-0/+1
| | | | | | | | | | | | | | | | | | | | Format NVM admin command can make a namespace or namespaces to be with different LBA size and metadata size with protection information types. This patch introduces Format NVM command with LBA format, Metadata, and Protection Information for the device. The secure erase operation things and support for formatting zoned namespaces are yet to be added. The parameter checks inside of this patch has been referred from Keith's old branch. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> [anaidu.gollu: rebased on e2e] Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: rebased for reworked aio tracking] Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: add non-mdts command size limit for verifyKlaus Jensen2021-03-181-0/+1
| | | | | | | | | Verify is not subject to MDTS, so a single Verify command may result in excessive amounts of allocated memory. Impose a limit on the data size by adding support for TP 4040 ("Non-MDTS Command Size Limits"). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: add verify commandGollu Appalanaidu2021-03-181-0/+1
| | | | | | | | | See NVM Express 1.4, section 6.14 ("Verify Command"). Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: rebased, refactored for e2e] Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: end-to-end data protectionKlaus Jensen2021-03-181-0/+31
| | | | | | | | | | | | | | | | Add support for namespaces formatted with protection information. The type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is selected with the `pi` nvme-ns device parameter. If the number of metadata bytes is larger than 8, the `pil` nvme-ns device parameter may be used to control the location of the 8-byte DIF tuple. The default `pil` value of '0', causes the DIF tuple to be transferred as the last 8 bytes of the metadata. Set to 1 to store this in the first eight bytes instead. Co-authored-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: assert namespaces array indicesKlaus Jensen2021-03-181-2/+8
| | | | | | | | | | | | | | | | | | | Coverity complains about a possible memory corruption in the nvme_ns_attach and _detach functions. While we should not (famous last words) be able to reach this function without nsid having previously been validated, this is still an open door for future misuse. Make Coverity and maintainers happy by asserting that the index into the array is valid. Also, while not detected by Coverity (yet), add an assert in nvme_subsys_ns and nvme_subsys_register_ns as well since a similar issue is exists there. Fixes: 037953b5b299 ("hw/block/nvme: support namespace detach") Fixes: CID 1450757 Fixes: CID 1450758 Cc: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
* hw/block/nvme: support changed namespace asynchronous eventMinwoo Im2021-03-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | If namespace inventory is changed due to some reasons (e.g., namespace attachment/detachment), controller can send out event notifier to the host to manage namespaces. This patch sends out the AEN to the host after either attach or detach namespaces from controllers. To support clear of the event from the controller, this patch also implemented Get Log Page command for Changed Namespace List log type. To return namespace id list through the command, when namespace inventory is updated, id is added to the per-controller list (changed_ns_list). To indicate the support of this async event, this patch set OAES(Optional Asynchronous Events Supported) in Identify Controller data structure. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: support namespace attachment commandMinwoo Im2021-03-091-0/+5
| | | | | | | | | | | | | | | | | | | | This patch supports Namespace Attachment command for the pre-defined nvme-ns device nodes. Of course, attach/detach namespace should only be supported in case 'subsys' is given. This is because if we detach a namespace from a controller, somebody needs to manage the detached, but allocated namespace in the NVMe subsystem. As command effect for the namespace attachment command is registered, the host will be notified that namespace inventory is changed so that host will rescan the namespace inventory after this command. For example, kernel driver manages this command effect via passthru IOCTL. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> [k.jensen: rebased for dma refactor] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: fix allocated namespace list to 256Minwoo Im2021-03-091-0/+6
| | | | | | | | | | | | | | | | | | | | Expand allocated namespace list (subsys->namespaces) to have 256 entries which is a value lager than at least NVME_MAX_NAMESPACES which is for attached namespace list in a controller. Allocated namespace list should at least larger than attached namespace list. n->num_namespaces = NVME_MAX_NAMESPACES; The above line will set the NN field by id->nn so that the subsystem should also prepare at least this number of namespace list entries. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: support namespace detachMinwoo Im2021-03-091-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given that now we have nvme-subsys device supported, we can manage namespace allocated, but not attached: detached. This patch introduced a parameter for nvme-ns device named 'detached'. This parameter indicates whether the given namespace device is detached from a entire NVMe subsystem('subsys' given case, shared namespace) or a controller('bus' given case, private namespace). - Allocated namespace 1) Shared ns in the subsystem 'subsys0': -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true 2) Private ns for the controller 'nvme0' of the subsystem 'subsys0': -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true 3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns: -device nvme,serial=foo,id=nvme0 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: try to deal with the iov/qsg dualityKlaus Jensen2021-03-091-2/+15
| | | | | | | | Introduce NvmeSg and try to deal with that pesky qsg/iov duality that haunts all the memory-related functions. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: report non-mdts command size limit for dsmGollu Appalanaidu2021-03-091-0/+2
| | | | | | | | | | Dataset Management is not subject to MDTS, but exceeded a certain size per range causes internal looping. Report this limit (DMRSL) in the NVM command set specific identify controller data structure. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: align zoned.zasl with mdtsKlaus Jensen2021-03-091-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data Transfer Size), that is, it is a value in units of the minimum memory page size (CAP.MPSMIN) and is reported as a power of two. The 'mdts' nvme device parameter is specified as in the spec, but the 'zoned.append_size_limit' parameter is specified in bytes. This is suboptimal for a number of reasons: 1. It is just plain confusing wrt. the definition of mdts. 2. There is a lot of complexity involved in validating the value; it must be a power of two, it should be larger than 4k, if it is zero we set it internally to mdts, but still report it as zero. 3. While "hw/block/nvme: improve invalid zasl value reporting" slightly improved the handling of the parameter, the validation is still wrong; it does not depend on CC.MPS, it depends on CAP.MPSMIN. And we are not even checking that it is actually less than or equal to MDTS, which is kinda the *one* condition it must satisfy. Fix this by defining zasl exactly like mdts and checking the one thing that it must satisfy (that it is less than or equal to mdts). Also, change the default value from 128KiB to 0 (aka, whatever mdts is). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: add simple copy commandKlaus Jensen2021-03-091-0/+1
| | | | | | | | | | | | | Add support for TP 4065a ("Simple Copy Command"), v2020.05.04 ("Ratified"). The implementation uses a bounce buffer to first read in the source logical blocks, then issue a write of that bounce buffer. The default maximum number of source logical blocks is 128, translating to 512 KiB for 4k logical blocks which aligns with the default value of MDTS. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: support for multi-controller in subsystemMinwoo Im2021-03-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | We have nvme-subsys and nvme devices mapped together. To support multi-controller scheme to this setup, controller identifier(id) has to be managed. Earlier, cntlid(controller id) used to be always 0 because we didn't have any subsystem scheme that controller id matters. This patch introduced 'cntlid' attribute to the nvme controller instance(NvmeCtrl) and make it allocated by the nvme-subsys device mapped to the controller. If nvme-subsys is not given to the controller, then it will always be 0 as it was. Added 'ctrls' array in the nvme-subsys instance to manage attached controllers to the subsystem with a limit(32). This patch didn't take list for the controllers to make it seamless with nvme-ns device. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: support to map controller to a subsystemMinwoo Im2021-03-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | nvme controller(nvme) can be mapped to a NVMe subsystem(nvme-subsys). This patch maps a controller to a subsystem by adding a parameter 'subsys' to the nvme device. To map a controller to a subsystem, we need to put nvme-subsys first and then maps the subsystem to the controller: -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 If 'subsys' property is not given to the nvme controller, then subsystem NQN will be created with serial (e.g., 'foo' in above example), Otherwise, it will be based on subsys id (e.g., 'subsys0' in above example). Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: move cmb logic to v1.4Padmakar Kalghatgi2021-02-081-2/+8
| | | | | | | | | | | | | | | Implement v1.4 logic for configuring the Controller Memory Buffer. By default, the v1.4 scheme will be used (CMB must be explicitly enabled by the host), so drivers that only support v1.3 will not be able to use the CMB anymore. To retain the v1.3 behavior, set the boolean 'legacy-cmb' nvme device parameter. Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Padmakar Kalghatgi <p.kalghatgi@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: add PMR RDS/WDS supportNaveen Nagar2021-02-081-1/+5
| | | | | | | | | Add support for the PMRMSCL and PMRMSCU MMIO registers. This allows adding RDS/WDS support for PMR as well. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: move msix table and pba to BAR 0Klaus Jensen2021-02-081-0/+1
| | | | | | | | | | | | | | | | | In the interest of supporting both CMB and PMR to be enabled on the same device, move the MSI-X table and pending bit array out of BAR 4 and into BAR 0. This is a simplified version of the patch contributed by Andrzej Jakowski (see [1]). Leaving the CMB at offset 0 removes the need for changes to CMB address mapping code. [1]: https://lore.kernel.org/qemu-devel/20200729220107.37758-3-andrzej.jakowski@linux.intel.com/ Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Tested-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: add smart_critical_warning propertyzhenwei pi2021-02-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a very low probability that hitting physical NVMe disk hardware critical warning case, it's hard to write & test a monitor agent service. For debugging purposes, add a new 'smart_critical_warning' property to emulate this situation. The orignal version of this change is implemented by adding a fixed property which could be initialized by QEMU command line. Suggested by Philippe & Klaus, rework like current version. Test with this patch: 1, change smart_critical_warning property for a running VM: #virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set", "arguments": { "path": "/machine/peripheral-anon/device[0]", "property": "smart_critical_warning", "value":16 } }' 2, run smartctl in guest #smartctl -H -l error /dev/nvme0n1 === START OF SMART DATA SECTION === SMART overall-health self-assessment test result: FAILED! - volatile memory backup device has failed Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: open code for volatile write cacheMinwoo Im2021-02-081-1/+0Star
| | | | | | | | | | | | | | | Volatile Write Cache(VWC) feature is set in nvme_ns_setup() in the initial time. This feature is related to block device backed, but this feature is controlled in controller level via Set/Get Features command. This patch removed dependency between nvme and nvme-ns to manage the VWC flag value. Also, it open coded the Get Features for VWC to check all namespaces attached to the controller, and if false detected, return directly false. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> [k.jensen: report write cache preset if present on ANY namespace] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: add missing string representations for commandsKlaus Jensen2021-02-081-0/+4
| | | | | | | | Add missing string representations for a couple of new commands. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
* hw/block/nvme: Support Zoned Namespace Command SetDmitry Fomichev2021-02-081-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The emulation code has been changed to advertise NVM Command Set when "zoned" device property is not set (default) and Zoned Namespace Command Set otherwise. Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator. Define trace events where needed in newly introduced code. In order to improve scalability, all open, closed and full zones are organized in separate linked lists. Consequently, almost all zone operations don't require scanning of the entire zone array (which potentially can be quite large) - it is only necessary to enumerate one or more zone lists. Handlers for three new NVMe commands introduced in Zoned Namespace Command Set specification are added, namely for Zone Management Receive, Zone Management Send and Zone Append. Device initialization code has been extended to create a proper configuration for zoned operation using device properties. Read/Write command handler is modified to only allow writes at the write pointer if the namespace is zoned. For Zone Append command, writes implicitly happen at the write pointer and the starting write pointer value is returned as the result of the command. Write Zeroes handler is modified to add zoned checks that are identical to those done as a part of Write flow. Subsequent commits in this series add ZDE support and checks for active and open zone limits. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
* hw/block/nvme: add the dataset management commandKlaus Jensen2021-02-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the Dataset Management command and the Deallocate attribute. Deallocation results in discards being sent to the underlying block device. Whether of not the blocks are actually deallocated is affected by the same factors as Write Zeroes (see previous commit). format | discard | dsm (512B) dsm (4KiB) dsm (64KiB) -------------------------------------------------------- qcow2 ignore n n n qcow2 unmap n n y raw ignore n n n raw unmap n y y Again, a raw format and 4KiB LBAs are preferable. In order to set the Namespace Preferred Deallocate Granularity and Alignment fields (NPDG and NPDA), choose a sane minimum discard granularity of 4KiB. If we are using a passthru device supporting discard at a 512B granularity, user should set the discard_granularity property explicitly. NPDG and NPDA will also account for the cluster_size of the block driver if required (i.e. for QCOW2). See NVM Express 1.3d, Section 6.7 ("Dataset Management command"). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: change controller pci idKlaus Jensen2020-10-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two reasons for changing this: 1. The nvme device currently uses an internal Intel device id. 2. Since commits "nvme: fix write zeroes offset and count" and "nvme: support multiple namespaces" the controller device no longer has the quirks that the Linux kernel think it has. As the quirks are applied based on pci vendor and device id, change them to get rid of the quirks. To keep backward compatibility, add a new 'use-intel-id' parameter to the nvme device to force use of the Intel vendor and device id. This is off by default but add a compat property to set this for 5.1 machines and older. If a 5.1 machine is booted (or the use-intel-id parameter is explicitly set to true), the Linux kernel will just apply these unnecessary quirks: 1. NVME_QUIRK_IDENTIFY_CNS which says that the device does not support anything else than values 0x0 and 0x1 for CNS (Identify Namespace and Identify Namespace). With multiple namespace support, this just means that the kernel will "scan" namespaces instead of using "Active Namespace ID list" (CNS 0x2). 2. NVME_QUIRK_DISABLE_WRITE_ZEROES. The nvme device started out with a broken Write Zeroes implementation which has since been fixed in commit 9d6459d21a6e ("nvme: fix write zeroes offset and count"). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
* hw/block/nvme: support multiple namespacesKlaus Jensen2020-10-271-25/+21Star
| | | | | | | | | | | | | | | | | | | | | | | This adds support for multiple namespaces by introducing a new 'nvme-ns' device model. The nvme device creates a bus named from the device name ('id'). The nvme-ns devices then connect to this and registers themselves with the nvme device. This changes how an nvme device is created. Example with two namespaces: -drive file=nvme0n1.img,if=none,id=disk1 -drive file=nvme0n2.img,if=none,id=disk2 -device nvme,serial=deadbeef,id=nvme0 -device nvme-ns,drive=disk1,bus=nvme0,nsid=1 -device nvme-ns,drive=disk2,bus=nvme0,nsid=2 The drive property is kept on the nvme device to keep the change backward compatible, but the property is now optional. Specifying a drive for the nvme device will always create the namespace with nsid 1. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
* hw/block/nvme: refactor aio submissionKlaus Jensen2020-10-271-0/+14
| | | | | | | | | | This pulls block layer aio submission/completion to common functions. For completions, additionally map an AIO error to the Unrecovered Read and Write Fault status codes. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: add symbolic command name to trace eventsKlaus Jensen2020-10-271-0/+28
| | | | | | | | | Add the symbolic command name to the pci_nvme_{io,admin}_cmd and pci_nvme_rw trace events. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: add a lba to bytes helperKlaus Jensen2020-10-271-0/+6
| | | | | | | | Add the nvme_l2b helper and use it for converting NLB and SLBA to byte counts and offsets. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
* hw/block/nvme: add ns/cmd references in NvmeRequestKlaus Jensen2020-09-021-0/+2
| | | | | | | | | Instead of passing around the NvmeNamespace and the NvmeCmd, add them as members in the NvmeRequest structure. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
* hw/block/nvme: add check for mdtsKlaus Jensen2020-09-021-0/+1
| | | | | | | | | Add 'mdts' device parameter to control the Maximum Data Transfer Size of the controller and check that it is respected. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
* hw/block/nvme: remove redundant has_sg memberKlaus Jensen2020-09-021-1/+0Star
| | | | | | | | Remove the has_sg member from NvmeRequest since it's redundant. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
* hw/block/nvme: enforce valid queue creation sequenceKlaus Jensen2020-09-021-0/+1
| | | | | | | | | | Support returning Command Sequence Error if Set Features on Number of Queues is called after queues have been created. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-17-its@irrelevant.dk>
* hw/block/nvme: move NvmeFeatureVal into hw/block/nvme.hKlaus Jensen2020-09-021-0/+8
| | | | | | | | | | | | | | The NvmeFeatureVal does not belong with the spec-related data structures in include/block/nvme.h that is shared between the block-level nvme driver and the emulated nvme device. Move it into the nvme device specific header file as it is the only user of the structure. Also, remove the unused members. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200706061303.246057-10-its@irrelevant.dk>
* hw/block/nvme: add support for the asynchronous event request commandKlaus Jensen2020-09-021-1/+9
| | | | | | | | | | | | | | | | | | Add support for the Asynchronous Event Request command. Required for compliance with NVMe revision 1.3d. See NVM Express 1.3d, Section 5.2 ("Asynchronous Event Request command"). Mostly imported from Keith's qemu-nvme tree. Modified with a max number of queued events (controllable with the aer_max_queued device parameter). The spec states that the controller *should* retain events, so we do best effort here. Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-9-its@irrelevant.dk>
* hw/block/nvme: add support for the get log page commandKlaus Jensen2020-09-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Add support for the Get Log Page command and basic implementations of the mandatory Error Information, SMART / Health Information and Firmware Slot Information log pages. In violation of the specification, the SMART / Health Information log page does not persist information over the lifetime of the controller because the device has no place to store such persistent state. Note that the LPA field in the Identify Controller data structure intentionally has bit 0 cleared because there is no namespace specific information in the SMART / Health information log page. Required for compliance with NVMe revision 1.3d. See NVM Express 1.3d, Section 5.14 ("Get Log Page command"). Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200706061303.246057-8-its@irrelevant.dk>
* hw/block/nvme: add temperature threshold featureKlaus Jensen2020-09-021-0/+1
| | | | | | | | | | | | | It might seem weird to implement this feature for an emulated device, but it is mandatory to support and the feature is useful for testing asynchronous event request support, which will be added in a later patch. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20200706061303.246057-6-its@irrelevant.dk>
* hw/block/nvme: add msix_qsize parameterKlaus Jensen2020-06-171-0/+1
| | | | | | | | | | | | | | | | | Decouple the requested maximum number of ioqpairs (param max_ioqpairs) from the number of MSI-X interrupt vectors by introducing a new msix_qsize parameter and initialize MSI-X with that. This allows emulating a device that has fewer vectors than I/O queue pairs and also allows more than 2048 queue pairs. To keep the device behaving as previously, use a msix_qsize default of 65 (default max_ioqpairs + 1). This decoupling was actually suggested by Maxim some time ago in a slightly different context, so adding a Suggested-by. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Message-Id: <20200609190333.59390-22-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* hw/block/nvme: add namespace helpersKlaus Jensen2020-06-171-0/+17
| | | | | | | | | | | Introduce some small helpers to make the next patches easier on the eye. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Message-Id: <20200609190333.59390-14-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* hw/block/nvme: remove redundant cmbloc/cmbsz membersKlaus Jensen2020-06-171-2/+0Star
| | | | | | | | | Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Message-Id: <20200609190333.59390-10-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* hw/block/nvme: add max_ioqpairs device parameterKlaus Jensen2020-06-171-1/+2
| | | | | | | | | | | | | | | | The num_queues device paramater has a slightly confusing meaning because it accounts for the admin queue pair which is not really optional. Secondly, it is really a maximum value of queues allowed. Add a new max_ioqpairs parameter that only accounts for I/O queue pairs, but keep num_queues for compatibility. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Message-Id: <20200609190333.59390-9-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* hw/block/nvme: fix pin-based interrupt behaviorKlaus Jensen2020-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | First, since the device only supports MSI-X or pin-based interrupt, if MSI-X is not enabled, it should not accept interrupt vectors different from 0 when creating completion queues. Secondly, the irq_status NvmeCtrl member is meant to be compared to the INTMS register, so it should only be 32 bits wide. And it is really only useful when used with multi-message MSI. Third, since we do not force a 1-to-1 correspondence between cqid and interrupt vector, the irq_status register should not have bits set according to cqid, but according to the associated interrupt vector. Fix these issues, but keep irq_status available so we can easily support multi-message MSI down the line. Fixes: 5e9aa92eb1a5 ("hw/block: Fix pin-based interrupt behaviour of NVMe") Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Message-Id: <20200609190333.59390-8-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* hw/block/nvme: move device parameters to separate structKlaus Jensen2020-06-171-3/+8
| | | | | | | | | | | | Move device configuration parameters to separate struct to make it explicit what is configurable and what is set internally. Signed-off-by: Klaus Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200609190333.59390-5-its@irrelevant.dk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* nvme: introduce PMR support from NVMe 1.4 specAndrzej Jakowski2020-04-301-0/+2
| | | | | | | | | | | | | | | This patch introduces support for PMR that has been defined as part of NVMe 1.4 spec. User can now specify a pmrdev option that should point to HostMemoryBackend. pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated NVMe device. Guest OS can perform mmio read and writes to the PMR region that will stay persistent across system reboot. Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200330164656.9348-1-andrzej.jakowski@linux.intel.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* nvme: add Get/Set Feature Timestamp supportKenneth Heitke2019-06-041-0/+2
| | | | | | Signed-off-by: Kenneth Heitke <kenneth.heitke@intel.com> Reviewed-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qemu/queue.h: leave head structs anonymous unless necessaryPaolo Bonzini2019-01-111-4/+4
| | | | | | | | | | | | Most list head structs need not be given a name. In most cases the name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV or reverse iteration, but this does not apply to lists of other kinds, and even for QTAILQ in practice this is only rarely needed. In addition, we will soon reimplement those macros completely so that they do not need a name for the head struct. So clean up everything, not giving a name except in the rare case where it is necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* hw/block/nvme: Include "qemu/cutils.h" directly in the source filePhilippe Mathieu-Daudé2018-06-011-1/+0Star
| | | | | | | | Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20180528232719.4721-16-f4bug@amsat.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* block: Move NVMe constants to a separate headerFam Zheng2018-02-081-697/+1Star
| | | | | | | Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20180116060901.17413-8-famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
* hw/block: Fix pin-based interrupt behaviour of NVMeHikaru Nishida2018-01-231-0/+1
| | | | | | | | | | | Pin-based interrupt of NVMe controller did not work properly because using an obsolated function pci_irq_pulse(). To fix this, change to use pci_irq_assert() / pci_irq_deassert() instead of pci_irq_pulse(). Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* nvme: Add support for Read Data and Write Data in CMBs.Stephen Bates2017-06-261-0/+1
| | | | | | | | | | | | | | | | | | | | Add the ability for the NVMe model to support both the RDS and WDS modes in the Controller Memory Buffer. Although not currently supported in the upstreamed Linux kernel a fork with support exists [1] and user-space test programs that build on this also exist [2]. Useful for testing CMB functionality in preperation for real CMB enabled NVMe devices (coming soon). [1] https://github.com/sbates130272/linux-p2pmem [2] https://github.com/sbates130272/p2pmem-test Signed-off-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* nvme: Add support for Controller Memory BuffersStephen Bates2017-05-261-0/+73
| | | | | | | | | | | | Implement NVMe Controller Memory Buffers (CMBs) which were added in version 1.2 of the NVMe Specification. This patch adds an optional argument (cmb_size_mb) which indicates the size of the CMB (in MB). Currently only the Submission Queue Support (SQS) is enabled which aligns with the current Linux driver for NVMe. Signed-off-by: Stephen Bates <sbates@raithlin.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>