summaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-zoned-metadata.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-5.3/dm-changes-2' of ↵Linus Torvalds2019-07-181-24/+0Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper updates from Mike Snitzer: - Fix zone state management race in DM zoned target by eliminating the unnecessary DMZ_ACTIVE state. - A couple fixes for issues the DM snapshot target's optional discard support added during first week of the 5.3 merge. - Increase default size of outstanding IO that is allowed for a each dm-kcopyd client and introduce tunable to allow user adjust. - Update DM core to use printk ratelimiting functions rather than duplicate them and in doing so fix an issue where DMDEBUG_LIMIT() rate limited KERN_DEBUG messages had excessive "callbacks suppressed" messages. * tag 'for-5.3/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: use printk ratelimiting functions dm kcopyd: Increase default sub-job size to 512KB dm snapshot: fix oversights in optional discard support dm zoned: fix zone state management race
| * dm zoned: fix zone state management raceDamien Le Moal2019-07-171-24/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm-zoned uses the zone flag DMZ_ACTIVE to indicate that a zone of the backend device is being actively read or written and so cannot be reclaimed. This flag is set as long as the zone atomic reference counter is not 0. When this atomic is decremented and reaches 0 (e.g. on BIO completion), the active flag is cleared and set again whenever the zone is reused and BIO issued with the atomic counter incremented. These 2 operations (atomic inc/dec and flag set/clear) are however not always executed atomically under the target metadata mutex lock and this causes the warning: WARN_ON(!test_bit(DMZ_ACTIVE, &zone->flags)); in dmz_deactivate_zone() to be displayed. This problem is regularly triggered with xfstests generic/209, generic/300, generic/451 and xfs/077 with XFS being used as the file system on the dm-zoned target device. Similarly, xfstests ext4/303, ext4/304, generic/209 and generic/300 trigger the warning with ext4 use. This problem can be easily fixed by simply removing the DMZ_ACTIVE flag and managing the "ACTIVE" state by directly looking at the reference counter value. To do so, the functions dmz_activate_zone() and dmz_deactivate_zone() are changed to inline functions respectively calling atomic_inc() and atomic_dec(), while the dmz_is_active() macro is changed to an inline function calling atomic_read(). Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Reported-by: Masato Suzuki <masato.suzuki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | block: Kill gfp_t argument of blkdev_report_zones()Damien Le Moal2019-07-121-4/+12
|/ | | | | | | | | | | | | Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In preparation of using vmalloc() for large report buffer and zone array allocations used by this function, remove its "gfp_t gfp_mask" argument and rely on the caller context to use memalloc_noio_save/restore() where necessary (block layer zone revalidation and dm-zoned I/O error path). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm zoned: Fix zone report handlingDamien Le Moal2019-04-181-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | The function blkdev_report_zones() returns success even if no zone information is reported (empty report). Empty zone reports can only happen if the report start sector passed exceeds the device capacity. The conditions for this to happen are either a bug in the caller code, or, a change in the device that forced the low level driver to change the device capacity to a value that is lower than the report start sector. This situation includes a failed disk revalidation resulting in the disk capacity being changed to 0. If this change happens while dm-zoned is in its initialization phase executing dmz_init_zones(), this function may enter an infinite loop and hang the system. To avoid this, add a check to disallow empty zone reports and bail out early. Also fix the function dmz_update_zone() to make sure that the report for the requested zone was correctly obtained. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Shaun Tancheff <shaun@tancheff.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm zoned: fix various dmz_get_mblock() issuesDamien Le Moal2018-10-181-24/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | dmz_fetch_mblock() called from dmz_get_mblock() has a race since the allocation of the new metadata block descriptor and its insertion in the cache rbtree with the READING state is not atomic. Two different contexts requesting the same block may end up each adding two different descriptors of the same block to the cache. Another problem for this function is that the BIO for processing the block read is allocated after the metadata block descriptor is inserted in the cache rbtree. If the BIO allocation fails, the metadata block descriptor is freed without first being removed from the rbtree. Fix the first problem by checking again if the requested block is not in the cache right before inserting the newly allocated descriptor, atomically under the mblk_lock spinlock. The second problem is fixed by simply allocating the BIO before inserting the new block in the cache. Finally, since dmz_fetch_mblock() also increments a block reference counter, rename the function to dmz_get_mblock_slow(). To be symmetric and clear, also rename dmz_lookup_mblock() to dmz_get_mblock_fast() and increment the block reference counter directly in that function rather than in dmz_get_mblock(). Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm zoned: fix metadata block ref countingDamien Le Moal2018-10-181-9/+11
| | | | | | | | | | | | Since the ref field of struct dmz_mblock is always used with the spinlock of struct dmz_metadata locked, there is no need to use an atomic_t type. Change the type of the ref field to an unsigne integer. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm: backfill missing calls to mutex_destroy()Mike Snitzer2018-01-171-0/+3
| | | | Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig2017-08-231-3/+3
| | | | | | | | | | | | | | | | | | | | This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm zoned: use GFP_NOIO in I/O pathDamien Le Moal2017-07-261-6/+6
| | | | | | | | | | Use GFP_NOIO for memory allocations in the I/O path. Other memory allocations in the initialization path can use GFP_KERNEL. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm zoned: fix overflow when converting zone ID to sectorsDamien Le Moal2017-07-051-2/+2
| | | | | | | | | | | | A zone ID is a 32 bits unsigned int which can overflow when doing the bit shifts in dmz_start_sect(). With a 256 MB zone size drive, the overflow happens for a zone ID >= 8192. Fix this by casting the zone ID to a sector_t before doing the bit shift. While at it, similarly fix dmz_start_block(). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm zoned: drive-managed zoned block device targetDamien Le Moal2017-06-191-0/+2509
The dm-zoned device mapper target provides transparent write access to zoned block devices (ZBC and ZAC compliant block devices). dm-zoned hides to the device user (a file system or an application doing raw block device accesses) any constraint imposed on write requests by the device, equivalent to a drive-managed zoned block device model. Write requests are processed using a combination of on-disk buffering using the device conventional zones and direct in-place processing for requests aligned to a zone sequential write pointer position. A background reclaim process implemented using dm_kcopyd_copy ensures that conventional zones are always available for executing unaligned write requests. The reclaim process overhead is minimized by managing buffer zones in a least-recently-written order and first targeting the oldest buffer zones. Doing so, blocks under regular write access (such as metadata blocks of a file system) remain stored in conventional zones, resulting in no apparent overhead. dm-zoned implementation focus on simplicity and on minimizing overhead (CPU, memory and storage overhead). For a 14TB host-managed disk with 256 MB zones, dm-zoned memory usage per disk instance is at most about 3 MB and as little as 5 zones will be used internally for storing metadata and performing buffer zone reclaim operations. This is achieved using zone level indirection rather than a full block indirection system for managing block movement between zones. dm-zoned primary target is host-managed zoned block devices but it can also be used with host-aware device models to mitigate potential device-side performance degradation due to excessive random writing. Zoned block devices can be formatted and checked for use with the dm-zoned target using the dmzadm utility available at: https://github.com/hgst/dm-zoned-tools Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [Mike Snitzer partly refactored Damien's original work to cleanup the code] Signed-off-by: Mike Snitzer <snitzer@redhat.com>