MD: raid5 trim support

Discard for raid4/5/6 has a limitation. If a discard request is small, we discard one disk, but we still need to calculate parity and write the parity disk. To calculate parity correctly, zero_after_discard must be guaranteed. Even if it is, we would discard one disk but write the other disks, which makes the parity disks wear out fast. This doesn't make sense. So an efficient discard for raid4/5/6 should discard all data disks and parity disks, which requires the write pattern to be (A, A+chunk_size, A+chunk_size*2...). If A's size is smaller than chunk_size, such a pattern is almost impossible in practice. So in this patch, I only handle the case where A's size equals chunk_size. That is, a discard request must be aligned to the stripe size and its size must be a multiple of the stripe size.

Since we can only handle requests with a specific alignment and size (or the part of a request that fits whole stripes), we can't guarantee zero_after_discard even if zero_after_discard is true in the low-level drives.

The block layer doesn't send down correctly aligned requests even when the correct discard alignment is set, so I must filter them out.

For raid4/5/6 parity calculation, if the data is 0, the parity is 0. So if zero_after_discard is true for all disks, the data is consistent after a discard. Otherwise, data might be lost. Consider this scenario: discard a stripe, then write data to one disk and write the parity disk. The stripe could still be inconsistent at that point, depending on whether data from the other data disks or from the parity disks is used to calculate the new parity. If a disk then breaks, we can't restore its data. So in this patch, we only enable discard support if all disks have zero_after_discard.

If discard fails on one disk, we face a similar inconsistency. This patch makes discard follow the same path as a normal write request. If discard fails, a resync will be scheduled to make the data consistent. The extra writes aren't ideal, but data consistency is important.
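The alignment rule and the zero_after_discard gate can be sketched in userspace C. This is a minimal sketch, not the md implementation. `struct member_disk`, `array_can_discard()` and `clip_to_stripes()` are hypothetical names. Offsets are assumed to be in 512-byte sectors, and the stripe size is assumed to be chunk_sectors multiplied by the number of data disks. The request is clipped to the whole stripes it fully covers, and any partial stripe at either end is simply not discarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a member disk's capability: does a
 * discard guarantee that the range reads back as zeros? */
struct member_disk {
	bool discard_zeroes_data;
};

/* Enable discard for the array only if every member guarantees
 * zero_after_discard. */
static bool array_can_discard(const struct member_disk *disks, int n)
{
	for (int i = 0; i < n; i++)
		if (!disks[i].discard_zeroes_data)
			return false;
	return true;
}

/*
 * Clip the discard request [start, start + len) (in sectors) to the
 * whole stripes it fully covers. stripe_sectors is assumed to be
 * chunk_sectors * data_disks. Assumes the arithmetic does not overflow.
 * Returns false if no full stripe remains.
 */
static bool clip_to_stripes(uint64_t start, uint64_t len,
			    uint64_t stripe_sectors,
			    uint64_t *out_start, uint64_t *out_len)
{
	assert(stripe_sectors > 0);

	/* Round the start up and the end down to stripe boundaries. */
	uint64_t first = (start + stripe_sectors - 1) / stripe_sectors
			 * stripe_sectors;
	uint64_t last = (start + len) / stripe_sectors * stripe_sectors;

	if (last <= first)
		return false;
	*out_start = first;
	*out_len = last - first;
	return true;
}
```

For example, with 64-sector chunks and three data disks (192-sector stripes), a request covering sectors 100 to 599 is clipped to sectors 192 to 575, which is two full stripes. A 150-sector request starting at sector 100 covers no full stripe and is dropped.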

If a subsequent read/write request hits the raid5 cache of a discarded stripe, the discarded dev page should be zero-filled so that the data is consistent. This patch always zeroes the dev page for a discarded request stripe. This isn't optimal, because a discard request doesn't need such a payload. The next patch will avoid it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
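A toy XOR-parity model can check the consistency argument: zero data gives zero parity, so discarded pages must read back as zero. This sketch assumes RAID5-style single XOR parity with three data disks and 8-byte "pages", and it does not model RAID6's Q syndrome. `compute_parity()`, `rmw_parity()` and `parity_consistent_after_discard()` are hypothetical helpers, not md functions. The model discards a stripe, writes new data to one disk, and compares two parities: the reconstruct-write parity (the XOR of all data disks) and the read-modify-write parity (old parity XOR old data XOR new data). The two agree only when the discarded members read back as a consistent stripe. zero_after_discard guarantees this, because every member returns zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NDATA 3 /* hypothetical layout: 3 data disks + 1 parity disk */
#define PAGE  8 /* tiny "page" size in bytes, for illustration only */

/* Reconstruct-write: parity is the XOR of every data page. */
static void compute_parity(uint8_t data[NDATA][PAGE], uint8_t parity[PAGE])
{
	memset(parity, 0, PAGE);
	for (int d = 0; d < NDATA; d++)
		for (int i = 0; i < PAGE; i++)
			parity[i] ^= data[d][i];
}

/* Read-modify-write: new parity = old parity ^ old data ^ new data. */
static void rmw_parity(const uint8_t old_parity[PAGE],
		       const uint8_t old_data[PAGE],
		       const uint8_t new_data[PAGE], uint8_t out[PAGE])
{
	for (int i = 0; i < PAGE; i++)
		out[i] = old_parity[i] ^ old_data[i] ^ new_data[i];
}

/*
 * Simulate: discard a whole stripe, then write new_data to data disk 0.
 * readback[m] is the byte member m returns after the discard
 * (members 0..NDATA-1 are data disks, member NDATA is parity).
 * With zero_after_discard every entry is 0, like a zero-filled dev page.
 * Returns true if both parity-update strategies give the same parity.
 */
static bool parity_consistent_after_discard(const uint8_t readback[NDATA + 1],
					    const uint8_t new_data[PAGE])
{
	uint8_t data[NDATA][PAGE], parity[PAGE], rcw[PAGE], rmw[PAGE];

	assert(readback != NULL && new_data != NULL);
	for (int d = 0; d < NDATA; d++)
		memset(data[d], readback[d], PAGE);
	memset(parity, readback[NDATA], PAGE);

	/* Read-modify-write uses disk 0's old contents and the old parity. */
	rmw_parity(parity, data[0], new_data, rmw);

	/* Reconstruct-write uses the other data disks instead. */
	memcpy(data[0], new_data, PAGE);
	compute_parity(data, rcw);

	return memcmp(rcw, rmw, PAGE) == 0;
}
```

When every member returns 0, the function returns true. Suppose instead the parity disk keeps returning a stale byte such as 0x5A while the data disks return zeros. The two strategies then disagree and the function returns false. This is the kind of inconsistency that requiring zero_after_discard on all disks is meant to rule out.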
author: Shaohua Li 2012-10-11 04:49:05 +0200
committer: NeilBrown 2012-10-11 04:49:05 +0200
commit: 620125f2bf8ff0c4969b79653b54d7bcc9d40637 (patch)
tree: 373257b7e9a236e66bc3ad99cd1d158e7430014e /drivers/md/raid5.h
parent: md/bitmap:Don't use IS_ERR to judge alloc_page(). (diff)
1 file changed, 1 insertion, 0 deletions
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index a9fc24901eda..18b2c4a8a1fd 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -298,6 +298,7 @@ enum r5dev_flags {
 	R5_WantReplace, /* We need to update the replacement, we have read
 			 * data in, and now is a good time to write it out.
 			 */
+	R5_Discard,	/* Discard the stripe */
 };
 
 /*